Comprehensive pricing comparison for AI model APIs. All prices are per 1M tokens (input and output priced separately). Prices reflect the latest published rates as of March 2026.
| Model | Provider | Input / 1M | Output / 1M | Context | Tier |
|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | $15.00 | $75.00 | 200K | Premium |
| GPT-5.2 | OpenAI | $10.00 | $30.00 | 256K | Premium |
| o3 | OpenAI | $10.00 | $40.00 | 200K | Premium |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K | Standard |
| Gemini 3 Pro | Google | $3.50 | $10.50 | 2M | Standard |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | Standard |
| Grok 4.1 | xAI | $3.00 | $15.00 | 131K | Standard |
| Mistral Large 3 | Mistral | $2.00 | $6.00 | 128K | Standard |
| Claude Haiku 4 | Anthropic | $0.80 | $4.00 | 200K | Budget |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 | 128K | Budget |
| DeepSeek V3.1 | DeepSeek | $0.27 | $1.10 | 128K | Budget |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 1M | Budget |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 1M | Budget |
| Gemma 3 27B | Google | Free | Free | 128K | Free / OSS |
| Llama 4 Maverick | Meta | Free | Free | 1M | Free / OSS |
| Llama 4 Scout | Meta | Free | Free | 10M | Free / OSS |
| Qwen 3.5 397B | Alibaba | Free | Free | 128K | Free / OSS |
Prices are for standard API access. Batch and cached pricing may differ. "Free" refers to open-weight models; self-hosting compute costs apply.
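Input and output tokens are billed separately, so the cost of one request is each token count multiplied by its own per-1M rate. The Python sketch below encodes the paid rows of the table and ranks them for a single request shape. It assumes standard (non-batch, uncached) pricing. `PRICES`, `request_cost`, and `rank_by_cost` are illustrative names, not any provider's API.

```python
# Per-1M-token prices (USD) for the paid API models in the table,
# stored as (input_price, output_price). Open-weight "Free" models are omitted.
PRICES: dict[str, tuple[float, float]] = {
    "Claude Opus 4.6": (15.00, 75.00),
    "GPT-5.2": (10.00, 30.00),
    "o3": (10.00, 40.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3 Pro": (3.50, 10.50),
    "GPT-4.1": (2.00, 8.00),
    "Grok 4.1": (3.00, 15.00),
    "Mistral Large 3": (2.00, 6.00),
    "Claude Haiku 4": (0.80, 4.00),
    "DeepSeek R1": (0.55, 2.19),
    "DeepSeek V3.1": (0.27, 1.10),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4.1 mini": (0.40, 1.60),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: each token count times its own per-1M rate."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """All models sorted from cheapest to most expensive for this request shape."""
    costs = [(m, request_cost(m, input_tokens, output_tokens)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical request shape: 2,000 prompt tokens, 500 generated tokens.
    for model, cost in rank_by_cost(2_000, 500):
        print(f"{model:<20} ${cost:.5f}")
```

For this hypothetical 2,000-in / 500-out request, Gemini 2.5 Flash is the cheapest at $0.0006 and Claude Opus 4.6 is the most expensive at $0.0675. A different request shape can reorder models whose input and output rates differ in ratio, such as GPT-5.2 and o3.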
AI model pricing generally falls into four tiers. Choose based on your performance requirements and budget.
Free / OSS: $0
Open-weight models that can be self-hosted with no per-token API fees. This tier includes Llama 4, Qwen 3.5, and Gemma 3. If you self-host, you pay only for compute.
Best for: Privacy-sensitive workloads, experimentation, cost-conscious teams
Budget: $0.15 to $4.00 / 1M tokens
Lightweight or optimized models offering strong performance at very low cost. DeepSeek V3.1, Gemini 2.5 Flash, GPT-4.1 mini, and Claude Haiku 4 fall in this range.
Best for: High-volume production, chatbots, classification, summarization
Standard: $2.00 to $15.00 / 1M tokens
The workhorse tier. Models like Claude Sonnet 4.6, Gemini 3 Pro, and Grok 4.1 deliver excellent reasoning and coding at moderate prices.
Best for: Most production use cases, coding assistants, complex analysis
Premium: $10.00 to $75.00 / 1M tokens
Frontier models offering the absolute best performance. Claude Opus 4.6, GPT-5.2, and o3 push the boundaries of reasoning, coding, and creativity.
Best for: Research, complex reasoning, agentic workflows, code generation
Every paid API in the table charges more for output (generated) tokens than for input (prompt) tokens, typically 3x to 5x more. To reduce cost, keep prompts as detailed as the task requires but request concise outputs.
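The output premium is easy to quantify from the table. The small Python sketch below uses a subset of the listed prices. The request sizes are hypothetical, and `output_premium` and `request_cost` are illustrative names.

```python
# Per-1M-token prices (USD) from the table, stored as (input, output).
PRICES = {
    "Claude Opus 4.6": (15.00, 75.00),
    "GPT-5.2": (10.00, 30.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3 Pro": (3.50, 10.50),
    "Mistral Large 3": (2.00, 6.00),
    "GPT-4.1 mini": (0.40, 1.60),
}


def output_premium(model: str) -> float:
    """How many times more an output token costs than an input token."""
    input_price, output_price = PRICES[model]
    return output_price / input_price


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at standard per-1M-token rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    for model in PRICES:
        print(f"{model:<20} output costs {output_premium(model):.1f}x input")

    # Same hypothetical 2,000-token prompt; only the answer length changes.
    verbose = request_cost("Claude Sonnet 4.6", 2_000, 1_000)  # $0.0210
    concise = request_cost("Claude Sonnet 4.6", 2_000, 300)    # $0.0105
    print(f"verbose ${verbose:.4f} vs concise ${concise:.4f}")
```

With Claude Sonnet 4.6, cutting the answer from 1,000 to 300 tokens halves the request cost, from $0.0210 to $0.0105, even though the prompt is unchanged.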
Self-hosting Llama 4 Maverick or Qwen 3.5 eliminates per-token API fees, though you still pay for the compute that runs them. For high-volume workloads (10M+ tokens/day), the avoided API fees can substantially exceed the infrastructure cost.
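Whether self-hosting pays off comes down to a break-even volume: a roughly fixed daily infrastructure bill divided by the per-token API price you would otherwise pay. The Python sketch below computes that volume. The 75/25 input/output mix and the $120/day hosting cost are hypothetical placeholders, not measured figures. Treating hosting cost as fixed also ignores the need to add hardware as load grows.

```python
def blended_price(input_price: float, output_price: float, input_share: float) -> float:
    """Effective USD per 1M tokens when `input_share` of all tokens are prompt tokens."""
    return input_share * input_price + (1 - input_share) * output_price


def daily_api_cost(tokens_per_day: float, price_per_million: float) -> float:
    """What the same traffic would cost per day through a paid API."""
    return tokens_per_day / 1_000_000 * price_per_million


def break_even_tokens_per_day(daily_infra_cost: float, price_per_million: float) -> float:
    """Daily volume above which a fixed self-hosting bill beats per-token API fees."""
    return daily_infra_cost / price_per_million * 1_000_000


if __name__ == "__main__":
    # Illustrative assumptions only: traffic is 75% input tokens, the API being
    # replaced is Claude Sonnet 4.6 ($3.00 in / $15.00 out per 1M), and the
    # self-hosted cluster costs a hypothetical $120/day all-in.
    price = blended_price(3.00, 15.00, input_share=0.75)  # $6.00 per 1M
    print(f"break-even: {break_even_tokens_per_day(120.0, price):,.0f} tokens/day")
    print(f"API cost at 10M tokens/day: ${daily_api_cost(10_000_000, price):.2f}")
```

With these placeholder inputs, the break-even point is 20M tokens/day against Sonnet-level pricing. The threshold scales linearly with your real hosting quote and inversely with the API price you are replacing, so plug in your own numbers before deciding.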
Models like Claude Sonnet 4.6 and Gemini 3 Pro achieve 90%+ of frontier performance at a fraction of the cost. Sonnet 4.6, for example, costs one-fifth as much per token as Claude Opus 4.6. Most production applications should start here.
Use our AI price calculator to estimate monthly costs based on your specific workload: tokens per request, requests per day, and model choice.
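A monthly estimate of this kind reduces to per-request cost × requests per day × days. The minimal Python sketch below assumes standard (non-batch, uncached) rates from the table and a 30-day month. `Workload` and `monthly_cost` are hypothetical names, not the calculator's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a steady daily workload."""
    input_tokens_per_request: int
    output_tokens_per_request: int
    requests_per_day: int


def monthly_cost(w: Workload, input_price: float, output_price: float, days: int = 30) -> float:
    """Estimated USD per month; prices are per 1M tokens, 30-day month by default."""
    per_request = (
        w.input_tokens_per_request * input_price
        + w.output_tokens_per_request * output_price
    ) / 1_000_000
    return per_request * w.requests_per_day * days


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests/day, 1,500 prompt + 400 output tokens each.
    chatbot = Workload(1_500, 400, 10_000)
    print(f"GPT-4.1 mini:      ${monthly_cost(chatbot, 0.40, 1.60):,.2f}/month")   # about $372
    print(f"Claude Sonnet 4.6: ${monthly_cost(chatbot, 3.00, 15.00):,.2f}/month")  # about $3,150
```

The same workload costs roughly 8.5x more per month on Claude Sonnet 4.6 than on GPT-4.1 mini. That gap is why testing a cheaper tier on real traffic is worth doing before committing.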