Together AI is an inference provider that hosts open-source models like Llama, DeepSeek, Qwen, and Mistral at competitive per-token prices. Below you'll find Together AI's API pricing for popular models, a cost calculator, and comparisons with direct provider pricing.
Together AI is a cloud inference platform -- not a model creator. They host and serve open-source models built by Meta, DeepSeek, Mistral, Alibaba, Google, and others. Think of them as the "AWS for open-source AI": they handle the GPU infrastructure so you just pay per token.
This makes them fundamentally different from OpenAI, Anthropic, or Google, which build and host their own proprietary models. Together AI's advantage is offering competitive pricing on the best open-source models, often cheaper than running them yourself on cloud GPUs.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| Llama 3.1 8B Instruct Turbo | $0.180 | $0.180 |
| Qwen 2.5 7B Instruct Turbo | $0.300 | $0.300 |
| Gemma 2 9B | $0.300 | $0.300 |
| Mixtral 8x7B Instruct | $0.600 | $0.600 |
| Qwen 2.5-Coder 32B Instruct | $0.800 | $0.800 |
| Mistral Small 24B Instruct | $0.800 | $0.800 |
| Gemma 2 27B | $0.800 | $0.800 |
| Llama 3.3 70B Instruct Turbo | $0.880 | $0.880 |
| DeepSeek R1 Distill Llama 70B | $0.880 | $0.880 |
| DeepSeek V3 | $0.900 | $0.900 |
| Qwen 2.5 72B Instruct Turbo | $1.20 | $1.20 |
| Mixtral 8x22B Instruct | $1.20 | $1.20 |
| DBRX Instruct | $1.20 | $1.20 |
| Llama 3.1 405B Instruct Turbo | $3.50 | $3.50 |
| DeepSeek R1 | $3.00 | $7.00 |
Prices sourced from Together AI's public pricing page. Together AI hosts many more models than the ones listed here -- these are the most popular options. See Together AI's site for the full pricing list.
How Together AI's inference pricing compares to accessing models directly from OpenAI, Anthropic, and Google. Together hosts open-source alternatives at a fraction of the cost.
| Together AI Model | Input $/1M | Output $/1M |
|---|---|---|
| Llama 3.1 8B Instruct Turbo | $0.180 | $0.180 |
| Qwen 2.5 7B Instruct Turbo | $0.300 | $0.300 |
| Gemma 2 9B | $0.300 | $0.300 |
| Mixtral 8x7B Instruct | $0.600 | $0.600 |
| Qwen 2.5-Coder 32B Instruct | $0.800 | $0.800 |
| Mistral Small 24B Instruct | $0.800 | $0.800 |
| Gemma 2 27B | $0.800 | $0.800 |
| Llama 3.3 70B Instruct Turbo | $0.880 | $0.880 |
| OpenAI Model | Input $/1M | Output $/1M |
|---|---|---|
| gpt-oss-20b | $0.030 | $0.140 |
| gpt-oss-120b | $0.039 | $0.190 |
| gpt-oss-safeguard-20b | $0.075 | $0.300 |
| GPT-5 Nano | $0.050 | $0.400 |
| GPT-4.1 Nano | $0.100 | $0.400 |
| GPT-4o-mini Search Preview | $0.150 | $0.600 |
| GPT-4o-mini (2024-07-18) | $0.150 | $0.600 |
| GPT-4o-mini | $0.150 | $0.600 |
| Anthropic Model | Input $/1M | Output $/1M |
|---|---|---|
| Claude 3 Haiku | $0.250 | $1.25 |
| Claude 3.5 Haiku | $0.800 | $4.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude 3.7 Sonnet | $3.00 | $15.00 |
| Claude 3.7 Sonnet (thinking) | $3.00 | $15.00 |
For broader context, these are among the cheapest models across providers and inference platforms, sorted by output price:
| Model | Input $/1M | Output $/1M |
|---|---|---|
| LFM2-8B-A1B | $0.010 | $0.020 |
| LFM2-2.6B | $0.010 | $0.020 |
| Gemma 3n 4B | $0.020 | $0.040 |
| Mistral Nemo | $0.020 | $0.040 |
| Llama 3 8B Instruct | $0.030 | $0.040 |
| Llama 3.2 11B Vision Instruct | $0.049 | $0.049 |
| Llama 3.1 8B Instruct | $0.020 | $0.050 |
| Llama Guard 3 8B | $0.020 | $0.060 |
| Gemma 3 4B | $0.040 | $0.080 |
| Mistral Small 3 | $0.050 | $0.080 |
| Qwen2.5 Coder 7B Instruct | $0.030 | $0.090 |
| Gemma 2 9B | $0.030 | $0.090 |
| Ministral 3 3B 2512 | $0.100 | $0.100 |
| Qwen3 235B A22B Instruct 2507 | $0.071 | $0.100 |
| Qwen2.5 7B Instruct | $0.040 | $0.100 |
Per-token prices for a few representative Together AI models, useful for estimating daily and monthly costs. Estimates assume an average of ~1,000 input tokens and ~500 output tokens per request.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| Llama 3.1 8B Instruct Turbo | $0.180 | $0.180 |
| Mixtral 8x7B Instruct | $0.600 | $0.600 |
| Llama 3.3 70B Instruct Turbo | $0.880 | $0.880 |
| Mixtral 8x22B Instruct | $1.20 | $1.20 |
| DeepSeek R1 | $3.00 | $7.00 |
Note: Actual costs depend on prompt length, response length, and whether you use features like fine-tuned models or dedicated endpoints (which have different pricing). Try the interactive calculator for custom estimates across all providers.
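To see how these estimates are derived, here is a minimal sketch of the per-request and monthly cost arithmetic, using prices from the table above and the assumed ~1,000 input / ~500 output tokens per request. The model names and usage numbers are just illustrations, not Together AI's own calculator.

```python
# Sketch: estimate per-request and monthly spend from per-1M-token prices.
# Prices taken from the Together AI table above; usage numbers are assumptions.

PRICES = {  # $ per 1M tokens: (input, output)
    "Llama 3.1 8B Instruct Turbo": (0.18, 0.18),
    "Llama 3.3 70B Instruct Turbo": (0.88, 0.88),
    "DeepSeek R1": (3.00, 7.00),
}

def cost_per_request(model: str, in_tokens: int = 1_000, out_tokens: int = 500) -> float:
    """Cost of one request: tokens / 1M * price-per-1M, summed for input and output."""
    p_in, p_out = PRICES[model]
    return in_tokens / 1e6 * p_in + out_tokens / 1e6 * p_out

def monthly_cost(model: str, requests_per_day: int, days: int = 30) -> float:
    return cost_per_request(model) * requests_per_day * days

# Example: 10,000 requests/day on Llama 3.3 70B Instruct Turbo
# cost/request = 0.001 * 0.88 + 0.0005 * 0.88 = $0.00132
# daily = $13.20, monthly = $396.00
```

The same formula works for any model in the tables above: swap in its input/output prices and your actual token counts.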
Together AI charges per token for API calls. Unlike model creators (OpenAI, Anthropic), Together hosts open-source models on optimized GPU clusters. Their pricing reflects infrastructure cost, not R&D. This means you get access to powerful models like Llama 3.3 70B for under $1/1M tokens -- a fraction of what proprietary models cost.
Together offers two deployment modes. Serverless is pay-per-token with no minimum commitment -- ideal for development and variable workloads. Dedicated deployments give you reserved GPU capacity with guaranteed throughput, priced per GPU-hour rather than per token.
Together AI uses an OpenAI-compatible API format. If your application already works with OpenAI's SDK, switching to Together AI is as simple as changing the base URL and API key. This makes it easy to test open-source models as cheaper alternatives to GPT-4o or Claude without rewriting your code.
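As a sketch of that switch: with the OpenAI Python SDK, only the base URL, API key, and model name change. The endpoint URL and model ID below are assumptions based on Together AI's documented OpenAI-compatible API; verify both against their current docs before relying on them.

```python
import os

# Assumed Together AI OpenAI-compatible endpoint -- confirm against their docs.
TOGETHER_BASE_URL = "https://api.together.xyz/v1"
# Hypothetical model ID; check Together AI's model catalog for the exact string.
MODEL = "meta-llama/Llama-3.3-70B-Instruct-Turbo"

def make_client():
    # Relative to stock OpenAI usage, only base_url and api_key change.
    from openai import OpenAI  # pip install openai
    return OpenAI(base_url=TOGETHER_BASE_URL, api_key=os.environ["TOGETHER_API_KEY"])

# Only runs if you have set a Together AI API key in the environment.
if os.environ.get("TOGETHER_API_KEY"):
    client = make_client()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(resp.choices[0].message.content)
```

Because the request/response shapes match OpenAI's, existing retry logic, streaming handlers, and prompt templates generally carry over unchanged.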
Beyond inference, Together AI offers fine-tuning starting at competitive per-token training rates. Fine-tuned models are served at the same inference price as base models, making it cost-effective to customize open-source models for your specific use case. This is significantly cheaper than fine-tuning through OpenAI.
Find the most affordable models across all providers and inference platforms.
Browse all open-source models -- the kind Together AI hosts.
Compare pricing for coding, image generation, and more.
Compare Together AI pricing with GPT-4o and all OpenAI models.
Compare with Claude Opus, Sonnet, and Haiku pricing.
Compare with Gemini 2.5 Pro, Flash, and all Google models.
Together AI is a cloud inference platform that hosts open-source AI models. Unlike OpenAI or Anthropic, Together does not create its own models -- it provides optimized infrastructure to run models from Meta (Llama), DeepSeek, Mistral, Alibaba (Qwen), Google (Gemma), and others. They offer fast inference, fine-tuning, and dedicated deployments.
Together AI pricing is per-token and varies by model. Among the popular models listed above, the cheapest is Llama 3.1 8B Instruct Turbo at $0.180/1M output tokens and the most expensive is DeepSeek R1 at $7.00/1M output tokens; the average output cost across these models is $1.37/1M tokens.
For open-source models, Together AI is typically much cheaper than OpenAI's proprietary models. For example, Llama 3.3 70B on Together costs $0.88/1M tokens vs GPT-4o at $10/1M output tokens. However, the comparison isn't apples-to-apples since OpenAI's models may perform better on certain tasks. Together AI is best for teams that want high-quality open-source models at low cost.
Self-hosting open-source models requires GPU infrastructure (e.g., A100 or H100 GPUs costing $1-3/hr). Together AI eliminates this overhead with pay-per-token pricing. For low to moderate usage, Together is significantly cheaper. For very high volume (millions of requests/day), self-hosting may break even, but you take on the operational burden of managing GPUs, scaling, and keeping models updated.
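The break-even intuition above can be sketched with rough numbers. All figures here are illustrative assumptions (GPU rate, sustained throughput), not measured benchmarks; real throughput varies widely with batch size, context length, and serving stack.

```python
# Rough break-even sketch: renting a GPU vs Together's per-token pricing.
# All inputs are illustrative assumptions, not benchmarks.

GPU_COST_PER_HOUR = 2.50        # $/hr, assumed cloud H100 rate
THROUGHPUT_TOK_PER_SEC = 1500   # assumed sustained output tokens/sec
TOGETHER_PRICE_PER_M = 0.88     # $/1M tokens (Llama 3.3 70B Turbo, from the table)

tokens_per_hour = THROUGHPUT_TOK_PER_SEC * 3600          # 5.4M tokens/hr
self_host_per_m = GPU_COST_PER_HOUR / (tokens_per_hour / 1e6)

print(f"self-hosting: ${self_host_per_m:.2f}/1M tokens at full utilization")
# At these assumptions, self-hosting beats $0.88/1M only while the GPU stays
# busy; at 50% utilization the effective cost roughly doubles, erasing the gap.
```

This is why the text says self-hosting only breaks even at very high volume: the per-token advantage disappears as soon as utilization drops, and you still carry the operational burden.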
Together AI offers $5 in free credits for new accounts, which is enough for thousands of API calls with smaller models. After the free credits run out, you pay per token based on the model you use. There is no ongoing free tier for production usage.
Together AI hosts 15+ popular open-source models including Llama 3.3, DeepSeek R1, DeepSeek V3, Qwen 2.5, Mistral, Mixtral, Gemma 2, and more. They regularly add new models as they are released. All models are accessible through an OpenAI-compatible API, making it easy to switch between providers.
Together AI competes with Groq, Fireworks, Anyscale, and Replicate in the inference provider space. Together is known for competitive pricing and broad model support. Groq focuses on speed with custom LPU hardware. Fireworks emphasizes low-latency serving. The best choice depends on your priorities: cost, speed, or model selection.
Yes. Together AI supports fine-tuning for Llama, Mistral, and other open-source models. Fine-tuning costs vary by model size and training duration. Fine-tuned models can then be deployed on Together's infrastructure at similar per-token pricing, making it a full platform for customized AI deployments.