Together AI is an inference provider that hosts open-source models like Llama, DeepSeek, Qwen, and Mistral at competitive per-token prices. Below you'll find Together AI's API pricing for popular models, a cost calculator, and comparisons with direct provider pricing.
Together AI is a cloud inference platform -- not a model creator. They host and serve open-source models built by Meta, DeepSeek, Mistral, Alibaba, Google, and others. Think of them as the "AWS for open-source AI": they handle the GPU infrastructure so you just pay per token.
This makes them fundamentally different from OpenAI, Anthropic, or Google, which build and host their own proprietary models. Together AI's advantage is offering competitive pricing on the best open-source models, often cheaper than running them yourself on cloud GPUs.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| Llama 3.1 8B Instruct Turbo | $0.180 | $0.180 |
| Qwen 2.5 7B Instruct Turbo | $0.300 | $0.300 |
| Gemma 2 9B | $0.300 | $0.300 |
| Mixtral 8x7B Instruct | $0.600 | $0.600 |
| Qwen 2.5-Coder 32B Instruct | $0.800 | $0.800 |
| Mistral Small 24B Instruct | $0.800 | $0.800 |
| Gemma 2 27B | $0.800 | $0.800 |
| Llama 3.3 70B Instruct Turbo | $0.880 | $0.880 |
| DeepSeek R1 Distill Llama 70B | $0.880 | $0.880 |
| DeepSeek V3 | $0.900 | $0.900 |
| Qwen 2.5 72B Instruct Turbo | $1.20 | $1.20 |
| Mixtral 8x22B Instruct | $1.20 | $1.20 |
| DBRX Instruct | $1.20 | $1.20 |
| Llama 3.1 405B Instruct Turbo | $3.50 | $3.50 |
| DeepSeek R1 | $3.00 | $7.00 |
Prices sourced from Together AI's public pricing page. Together AI hosts many more models than the ones listed here -- these are the most popular options. See Together AI's site for the full pricing list.
How Together AI's inference pricing compares to accessing models directly from OpenAI, Anthropic, and Google. Together hosts open-source alternatives at a fraction of the cost.
| Together AI Model | Input $/1M | Output $/1M |
|---|---|---|
| Llama 3.1 8B Instruct Turbo | $0.180 | $0.180 |
| Qwen 2.5 7B Instruct Turbo | $0.300 | $0.300 |
| Gemma 2 9B | $0.300 | $0.300 |
| Mixtral 8x7B Instruct | $0.600 | $0.600 |
| Qwen 2.5-Coder 32B Instruct | $0.800 | $0.800 |
| Mistral Small 24B Instruct | $0.800 | $0.800 |
| Gemma 2 27B | $0.800 | $0.800 |
| Llama 3.3 70B Instruct Turbo | $0.880 | $0.880 |
| OpenAI Model | Input $/1M | Output $/1M |
|---|---|---|
| gpt-oss-20b | $0.030 | $0.140 |
| gpt-oss-120b | $0.039 | $0.190 |
| gpt-oss-safeguard-20b | $0.075 | $0.300 |
| GPT-5 Nano | $0.050 | $0.400 |
| GPT-4.1 Nano | $0.100 | $0.400 |
| GPT-4o-mini Search Preview | $0.150 | $0.600 |
| GPT-4o-mini (2024-07-18) | $0.150 | $0.600 |
| GPT-4o-mini | $0.150 | $0.600 |
| Anthropic Model | Input $/1M | Output $/1M |
|---|---|---|
| Claude 3 Haiku | $0.250 | $1.25 |
| Claude 3.5 Haiku | $0.800 | $4.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude 3.7 Sonnet | $3.00 | $15.00 |
| Claude 3.7 Sonnet (thinking) | $3.00 | $15.00 |
For broader context, these are among the cheapest models across providers and inference platforms, sorted by output price:
| Model | Input $/1M | Output $/1M |
|---|---|---|
| LFM2-8B-A1B | $0.010 | $0.020 |
| LFM2-2.6B | $0.010 | $0.020 |
| Gemma 3n 4B | $0.020 | $0.040 |
| Mistral Nemo | $0.020 | $0.040 |
| Llama 3 8B Instruct | $0.030 | $0.040 |
| Llama 3.2 11B Vision Instruct | $0.049 | $0.049 |
| Llama 3.1 8B Instruct | $0.020 | $0.050 |
| Llama Guard 3 8B | $0.020 | $0.060 |
| Gemma 3 4B | $0.040 | $0.080 |
| Mistral Small 3 | $0.050 | $0.080 |
| Qwen2.5 Coder 7B Instruct | $0.030 | $0.090 |
| Gemma 2 9B | $0.030 | $0.090 |
| Ministral 3 3B 2512 | $0.100 | $0.100 |
| Qwen3 235B A22B Instruct 2507 | $0.071 | $0.100 |
| Qwen2.5 7B Instruct | $0.040 | $0.100 |
Per-token prices for a few representative Together AI models, useful for estimating daily and monthly costs. Estimates assume an average of ~1,000 input tokens and ~500 output tokens per request.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| Llama 3.1 8B Instruct Turbo | $0.180 | $0.180 |
| Mixtral 8x7B Instruct | $0.600 | $0.600 |
| Llama 3.3 70B Instruct Turbo | $0.880 | $0.880 |
| Mixtral 8x22B Instruct | $1.20 | $1.20 |
| DeepSeek R1 | $3.00 | $7.00 |
Note: Actual costs depend on prompt length, response length, and whether you use features like fine-tuned models or dedicated endpoints (which have different pricing). Try the interactive calculator for custom estimates across all providers.
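To see how these estimates are derived, here is a minimal sketch of the per-request and monthly cost arithmetic, using prices from the table above and the assumed ~1,000 input / ~500 output tokens per request. The model names and usage numbers are just illustrations, not Together AI's own calculator.

```python
# Sketch: estimate per-request and monthly spend from per-1M-token prices.
# Prices taken from the Together AI table above; usage numbers are assumptions.

PRICES = {  # $ per 1M tokens: (input, output)
    "Llama 3.1 8B Instruct Turbo": (0.18, 0.18),
    "Llama 3.3 70B Instruct Turbo": (0.88, 0.88),
    "DeepSeek R1": (3.00, 7.00),
}

def cost_per_request(model: str, in_tokens: int = 1_000, out_tokens: int = 500) -> float:
    """Cost of one request: tokens / 1M * price-per-1M, summed for input and output."""
    p_in, p_out = PRICES[model]
    return in_tokens / 1e6 * p_in + out_tokens / 1e6 * p_out

def monthly_cost(model: str, requests_per_day: int, days: int = 30) -> float:
    return cost_per_request(model) * requests_per_day * days

# Example: 10,000 requests/day on Llama 3.3 70B Instruct Turbo
# cost/request = 0.001 * 0.88 + 0.0005 * 0.88 = $0.00132
# daily = $13.20, monthly = $396.00
```

The same formula works for any model in the tables above: swap in its input/output prices and your actual token counts.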
Together AI charges per token for API calls. Unlike model creators (OpenAI, Anthropic), Together hosts open-source models on optimized GPU clusters. Their pricing reflects infrastructure cost, not R&D. This means you get access to powerful models like Llama 3.3 70B for under $1/1M tokens -- a fraction of what proprietary models cost.
Together offers two deployment modes. Serverless is pay-per-token with no minimum commitment -- ideal for development and variable workloads. Dedicated deployments give you reserved GPU capacity with guaranteed throughput, priced per GPU-hour rather than per token.
Together AI uses an OpenAI-compatible API format. If your application already works with OpenAI's SDK, switching to Together AI is as simple as changing the base URL and API key. This makes it easy to test open-source models as cheaper alternatives to GPT-4o or Claude without rewriting your code.
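As a sketch of that switch: with the OpenAI Python SDK, only the base URL, API key, and model name change. The endpoint URL and model ID below are assumptions based on Together AI's documented OpenAI-compatible API; verify both against their current docs before relying on them.

```python
import os

# Assumed Together AI OpenAI-compatible endpoint -- confirm against their docs.
TOGETHER_BASE_URL = "https://api.together.xyz/v1"
# Hypothetical model ID; check Together AI's model catalog for the exact string.
MODEL = "meta-llama/Llama-3.3-70B-Instruct-Turbo"

def make_client():
    # Relative to stock OpenAI usage, only base_url and api_key change.
    from openai import OpenAI  # pip install openai
    return OpenAI(base_url=TOGETHER_BASE_URL, api_key=os.environ["TOGETHER_API_KEY"])

# Only runs if you have set a Together AI API key in the environment.
if os.environ.get("TOGETHER_API_KEY"):
    client = make_client()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(resp.choices[0].message.content)
```

Because the request/response shapes match OpenAI's, existing retry logic, streaming handlers, and prompt templates generally carry over unchanged.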
Beyond inference, Together AI offers fine-tuning starting at competitive per-token training rates. Fine-tuned models are served at the same inference price as base models, making it cost-effective to customize open-source models for your specific use case. This is significantly cheaper than fine-tuning through OpenAI.
Find the most affordable models across all providers and inference platforms.
Browse all open-source models -- the kind Together AI hosts.
Compare pricing for coding, image generation, and more.
Compare Together AI pricing with GPT-4o and all OpenAI models.
Compare with Claude Opus, Sonnet, and Haiku pricing.
Compare with Gemini 2.5 Pro, Flash, and all Google models.
Together AI is a cloud inference platform that hosts open-source AI models. Unlike OpenAI or Anthropic, Together does not create its own models -- it provides optimized infrastructure to run models from Meta (Llama), DeepSeek, Mistral, Alibaba (Qwen), Google (Gemma), and others. They offer fast inference, fine-tuning, and dedicated deployments.
Together AI pricing is per-token and varies by model. Among the popular models listed above, the cheapest is Llama 3.1 8B Instruct Turbo at $0.180/1M output tokens and the most expensive is DeepSeek R1 at $7.00/1M output tokens; the average output cost across these models is $1.37/1M tokens.
For open-source models, Together AI is typically much cheaper than OpenAI's proprietary models. For example, Llama 3.3 70B on Together costs $0.88/1M tokens vs GPT-4o at $10/1M output tokens. However, the comparison isn't apples-to-apples since OpenAI's models may perform better on certain tasks. Together AI is best for teams that want high-quality open-source models at low cost.
Self-hosting open-source models requires GPU infrastructure (e.g., A100 or H100 GPUs costing $1-3/hr). Together AI eliminates this overhead with pay-per-token pricing. For low to moderate usage, Together is significantly cheaper. For very high volume (millions of requests/day), self-hosting may break even, but you take on the operational burden of managing GPUs, scaling, and keeping models updated.
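The break-even intuition above can be sketched with rough numbers. All figures here are illustrative assumptions (GPU rate, sustained throughput), not measured benchmarks; real throughput varies widely with batch size, context length, and serving stack.

```python
# Rough break-even sketch: renting a GPU vs Together's per-token pricing.
# All inputs are illustrative assumptions, not benchmarks.

GPU_COST_PER_HOUR = 2.50        # $/hr, assumed cloud H100 rate
THROUGHPUT_TOK_PER_SEC = 1500   # assumed sustained output tokens/sec
TOGETHER_PRICE_PER_M = 0.88     # $/1M tokens (Llama 3.3 70B Turbo, from the table)

tokens_per_hour = THROUGHPUT_TOK_PER_SEC * 3600          # 5.4M tokens/hr
self_host_per_m = GPU_COST_PER_HOUR / (tokens_per_hour / 1e6)

print(f"self-hosting: ${self_host_per_m:.2f}/1M tokens at full utilization")
# At these assumptions, self-hosting beats $0.88/1M only while the GPU stays
# busy; at 50% utilization the effective cost roughly doubles, erasing the gap.
```

This is why the text says self-hosting only breaks even at very high volume: the per-token advantage disappears as soon as utilization drops, and you still carry the operational burden.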
Together AI offers $5 in free credits for new accounts, which is enough for thousands of API calls with smaller models. After the free credits run out, you pay per token based on the model you use. There is no ongoing free tier for production usage.
Together AI hosts 15+ popular open-source models including Llama 3.3, DeepSeek R1, DeepSeek V3, Qwen 2.5, Mistral, Mixtral, Gemma 2, and more. They regularly add new models as they are released. All models are accessible through an OpenAI-compatible API, making it easy to switch between providers.
Together AI competes with Groq, Fireworks, Anyscale, and Replicate in the inference provider space. Together is known for competitive pricing and broad model support. Groq focuses on speed with custom LPU hardware. Fireworks emphasizes low-latency serving. The best choice depends on your priorities: cost, speed, or model selection.
Yes. Together AI supports fine-tuning for Llama, Mistral, and other open-source models. Fine-tuning costs vary by model size and training duration. Fine-tuned models can then be deployed on Together's infrastructure at similar per-token pricing, making it a full platform for customized AI deployments.