AI Model API Pricing

Comprehensive pricing comparison for AI model APIs. All prices are per 1M tokens (input and output priced separately). Prices reflect the latest published rates as of March 2026.

Model	Provider	Input / 1M	Output / 1M	Context	Tier
Claude Opus 4.6	Anthropic	$15.00	$75.00	200K	Premium
GPT-5.2	OpenAI	$10.00	$30.00	256K	Premium
o3	OpenAI	$10.00	$40.00	200K	Premium
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	200K	Standard
Gemini 3 Pro	Google	$3.50	$10.50	2M	Standard
GPT-4.1	OpenAI	$2.00	$8.00	1M	Standard
Grok 4.1	xAI	$3.00	$15.00	131K	Standard
Mistral Large 3	Mistral	$2.00	$6.00	128K	Standard
Claude Haiku 4	Anthropic	$0.80	$4.00	200K	Budget
DeepSeek R1	DeepSeek	$0.55	$2.19	128K	Budget
DeepSeek V3.1	DeepSeek	$0.27	$1.10	128K	Budget
Gemini 2.5 Flash	Google	$0.15	$0.60	1M	Budget
GPT-4.1 mini	OpenAI	$0.40	$1.60	1M	Budget
Gemma 3 27B	Google	Free	Free	128K	Free / OSS
Llama 4 Maverick	Meta	Free	Free	1M	Free / OSS
Llama 4 Scout	Meta	Free	Free	10M	Free / OSS
Qwen 3.5 397B	Alibaba	Free	Free	128K	Free / OSS

Prices are for standard API access. Batch and cached pricing may differ. "Free" refers to open-weight models; self-hosting compute costs apply.

Understanding Pricing Tiers

AI model pricing generally falls into four tiers. Choose based on your performance requirements and budget.

Free / Open-Source

Models with open weights that can be self-hosted at no API cost. Includes Llama 4, Qwen 3.5, and Gemma 3. You pay only for compute if self-hosting.

Best for: Privacy-sensitive workloads, experimentation, cost-conscious teams

Budget

$0.15–$4.00 / 1M tokens

Lightweight or optimized models offering strong performance at very low cost. DeepSeek V3.1, Gemini 2.5 Flash, and GPT-4.1 mini fall in this range.

Best for: High-volume production, chatbots, classification, summarization

Standard

$2.00–$15.00 / 1M tokens

The workhorse tier. Models like Claude Sonnet 4.6, Gemini 3 Pro, and Grok 4.1 deliver excellent reasoning and coding at moderate prices.

Best for: Most production use cases, coding assistants, complex analysis

Premium

$10.00–$75.00 / 1M tokens

Frontier models offering the absolute best performance. Claude Opus 4.6, GPT-5.2, and o3 push the boundaries of reasoning, coding, and creativity.

Best for: Research, complex reasoning, agentic workflows, code generation

Key Takeaways

Output tokens cost 3-5x as much as input tokens

Every paid model in the table charges roughly three to five times as much for output (generated) tokens as for input (prompt) tokens. To control costs, put the detail into the prompt and request concise outputs.
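As a quick check on the output-to-input gap, the sketch below computes the ratio for each paid model using the prices from the table above. The dictionary and function names are illustrative and not part of any provider SDK.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "Claude Opus 4.6": (15.00, 75.00),
    "GPT-5.2": (10.00, 30.00),
    "o3": (10.00, 40.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3 Pro": (3.50, 10.50),
    "GPT-4.1": (2.00, 8.00),
    "Grok 4.1": (3.00, 15.00),
    "Mistral Large 3": (2.00, 6.00),
    "Claude Haiku 4": (0.80, 4.00),
    "DeepSeek R1": (0.55, 2.19),
    "DeepSeek V3.1": (0.27, 1.10),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4.1 mini": (0.40, 1.60),
}


def output_input_ratio(model: str) -> float:
    """Return how many times more an output token costs than an input token."""
    input_price, output_price = PRICES[model]
    return output_price / input_price


ratios = {model: output_input_ratio(model) for model in PRICES}
lowest = min(ratios.values())
highest = max(ratios.values())

# Print models from the smallest to the largest output premium.
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{model:<20} {ratio:.2f}x")
print(f"range: {lowest:.2f}x to {highest:.2f}x")
```

Because the output price is the larger multiplier, trimming generated tokens usually lowers a bill more than trimming the same number of prompt tokens.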

Open-source models offer the best value at scale

Self-hosting Llama 4 Maverick or Qwen 3.5 eliminates per-token API fees entirely, leaving only compute costs. For high-volume workloads (10M+ tokens/day), the savings over paid APIs can be substantial.
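Whether self-hosting pays off depends on the hosting bill, which this page does not quote. The sketch below shows only the break-even arithmetic. The monthly hosting figure and the 75/25 input/output token mix are placeholder assumptions, and the API price is the Claude Sonnet 4.6 row from the table, chosen purely for illustration.

```python
def blended_price_per_1m(input_price: float, output_price: float,
                         input_share: float) -> float:
    """Average USD price per 1M tokens for a given input/output token mix."""
    return input_share * input_price + (1.0 - input_share) * output_price


def break_even_tokens_per_day(monthly_hosting_usd: float,
                              api_price_per_1m: float,
                              days_per_month: int = 30) -> float:
    """Daily token volume at which API spend equals the hosting bill."""
    tokens_per_month = monthly_hosting_usd / api_price_per_1m * 1_000_000
    return tokens_per_month / days_per_month


# Claude Sonnet 4.6 list prices from the table: $3.00 input, $15.00 output.
# ASSUMPTION: 75% of tokens are input and 25% are output.
api_price = blended_price_per_1m(3.00, 15.00, input_share=0.75)

# ASSUMPTION: placeholder hosting bill; substitute a real quote.
HYPOTHETICAL_MONTHLY_HOSTING_USD = 3_000.00

threshold = break_even_tokens_per_day(HYPOTHETICAL_MONTHLY_HOSTING_USD, api_price)
print(f"blended API price: ${api_price:.2f} per 1M tokens")
print(f"break-even volume: {threshold / 1e6:.1f}M tokens/day")
```

Below the break-even volume the paid API is cheaper under these assumptions, and above it self-hosting is. The comparison ignores differences in model quality and in engineering effort.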

The "standard" tier offers the best performance per dollar

Models like Claude Sonnet 4.6 and Gemini 3 Pro achieve 90%+ of frontier performance at a fraction of the cost. Most production applications should start here.

Calculate Your Actual Costs

Use our AI price calculator to estimate monthly costs based on your specific workload: tokens per request, requests per day, and model choice.
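The underlying arithmetic can be sketched in a few lines. This is not the site's calculator, only a minimal illustration that takes the same three inputs. The function name, the 30-day month, and the example workload are assumptions, and the prices are copied from the table above.

```python
# (input, output) prices in USD per 1M tokens, from the table above.
PRICES_PER_1M = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4.1": (2.00, 8.00),
    "GPT-4.1 mini": (0.40, 1.60),
    "Gemini 2.5 Flash": (0.15, 0.60),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Estimate monthly spend in USD for a fixed per-request token profile."""
    input_price, output_price = PRICES_PER_1M[model]
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical workload: 1,000 prompt tokens and 300 generated tokens
# per request, at 10,000 requests per day.
for name in PRICES_PER_1M:
    cost = monthly_cost(name, input_tokens=1_000, output_tokens=300,
                        requests_per_day=10_000)
    print(f"{name:<20} ${cost:,.2f} / month")
```

Batch and cached pricing are not modeled here, so real bills may differ from these list-price estimates.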

