Context Window Efficiency Explorer

Analyzes score-per-context-token ratio across 293 AI models to find those that make the best use of their context window, output capacity, and cost.

Efficiency Overview

Key efficiency metrics across all analyzed models.

Most Efficient (128K+): Qwen3 VL 30B A3B Thinking, 523.4 score/MToken

Best Output Efficiency: Inflection 3 Pi, 29.0 score/1K output

Best Cost Efficiency: LFM2-8B-A1B, 2313.3 score/$

Avg Overall Efficiency: 7.6%, normalized across all models

Efficiency Rankings

Top 50 models ranked by score per million context tokens.

#	Model	Provider	Score	Context	Output	Score/MToken	Output Eff.	Cost Eff.	Tier
1	Llemma 7b	eleutherai	33	4K	4K	8032.2	8.0	32.9	Small
2	GPT-3.5 Turbo (older v0613)	OpenAI	29	4K	4K	7130.6	7.1	19.5	Small
3	GPT-3.5 Turbo Instruct	OpenAI	26	4K	4K	6251.5	6.3	14.6	Small
4	Mistral 7B Instruct v0.1	Mistral AI	17	3K	--	6090.7	--	114.7	Small
5	Gemma 3n 2B (free)	Google	46	8K	2K	5639.6	22.6	Free	Small
6	Gemma 3n 4B (free)	Google	45	8K	2K	5468.8	21.9	Free	Small
7	GPT-4 (older v0314)	OpenAI	44	8K	4K	5420.6	10.8	1.0	Small
8	GPT-4	OpenAI	44	8K	4K	5420.6	10.8	1.0	Small
9	Llama 3 8B Instruct	Meta	32	8K	16K	3857.4	1.9	902.9	Small
10	Inflection 3 Pi	Inflection	30	8K	1K	3712.5	29.0	4.8	Small
11	Inflection 3 Productivity	Inflection	30	8K	1K	3712.5	29.0	4.8	Small
12	Gemma 2 27B	Google	29	8K	2K	3540.0	14.2	44.6	Small
13	Llama 3 70B Instruct	Meta	28	8K	8K	3381.3	3.5	44.3	Small
14	Gemma 2 9B	Google	21	8K	--	2600.1	--	355.0	Small
15	LlamaGuard 2 8B	Meta	20	8K	--	2453.6	--	100.5	Small
16	Phi 4	Microsoft	36	16K	16K	2197.3	2.2	360.0	Small
17	Llama 3.1 8B Instruct	Meta	35	16K	16K	2130.1	2.1	997.1	Small
18	GPT-3.5 Turbo 16k	OpenAI	31	16K	4K	1898.1	7.6	8.9	Small
19	GPT-3.5 Turbo	OpenAI	31	16K	4K	1861.5	7.4	30.5	Small
20	ERNIE 4.5 VL 28B A3B	Baidu	51	30K	8K	1696.7	6.4	145.4	Small
21	Gemma 3 4B (free)	Google	51	33K	8K	1547.2	6.2	Free	Medium
22	DeepSeek V3.1	DeepSeek	50	33K	7K	1510.6	6.9	110.0	Medium
23	Gemma 3 12B (free)	Google	47	33K	8K	1437.4	5.7	Free	Medium
24	QwQ 32B	Alibaba	46	33K	33K	1400.8	1.4	166.9	Medium
25	LFM2.5-1.2B-Thinking (free)	Liquid AI	46	33K	--	1394.7	--	Free	Medium
26	Qwen3 30B A3B Thinking 2507	Alibaba	45	33K	--	1379.4	--	231.2	Medium
27	Voxtral Small 24B 2507	Mistral AI	44	32K	--	1359.4	--	217.5	Medium
28	Qwen2.5 VL 72B Instruct	Alibaba	43	33K	33K	1321.4	1.3	54.1	Medium
29	LFM2.5-1.2B-Instruct (free)	Liquid AI	42	33K	--	1287.8	--	Free	Medium
30	Molmo2 8B	Allen AI	47	37K	37K	1283.1	1.3	236.5	Medium
31	Rnj 1 Instruct	essentialai	42	33K	--	1275.6	--	278.7	Medium
32	Qwen-Max	Alibaba	41	33K	8K	1263.4	5.1	15.9	Medium
33	R1 Distill Qwen 32B	DeepSeek	41	33K	33K	1263.4	1.3	142.8	Medium
34	Mistral Small 3	Mistral AI	41	33K	16K	1248.2	2.5	629.2	Medium
35	Qwen3 4B (free)	Alibaba	50	41K	--	1210.9	--	Free	Medium
36	Mistral Small Creative	Mistral AI	38	33K	--	1168.8	--	191.5	Medium
37	Qwen3 30B A3B	Alibaba	48	41K	41K	1167.0	1.2	265.6	Medium
38	Qwen3 14B	Alibaba	48	41K	41K	1167.0	1.2	318.7	Medium
39	Qwen3 32B	Alibaba	48	41K	41K	1167.0	1.2	298.8	Medium
40	Qwen2.5 72B Instruct	Alibaba	37	33K	16K	1138.3	2.3	146.3	Medium
41	Qwen3 8B	Alibaba	47	41K	8K	1137.7	5.7	207.1	Medium
42	Saba	Mistral AI	37	33K	--	1113.9	--	91.3	Medium
43	LFM2-24B-A2B	Liquid AI	35	33K	--	1059.0	--	462.7	Medium
44	LFM2-8B-A1B	Liquid AI	35	33K	--	1059.0	--	2313.3	Medium
45	LFM2-2.6B	Liquid AI	35	33K	--	1059.0	--	2313.3	Medium
46	Qwen2.5 Coder 7B Instruct	Alibaba	34	33K	--	1046.8	--	571.7	Medium
47	Qwen2.5 7B Instruct	Alibaba	33	33K	--	1004.0	--	470.0	Medium
48	Mixtral 8x7B Instruct	Mistral AI	32	33K	16K	976.6	2.0	59.3	Medium
49	Gemma 3n 4B	Google	32	33K	--	967.4	--	1056.7	Medium
50	Coder Large	arcee-ai	32	33K	--	961.3	--	48.5	Medium
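
The page does not state how Score/MToken is computed, but the rounded values in the ranking table are consistent with dividing a model's score by its context window and scaling to one million tokens. The sketch below works under that assumption. The `Model` class, the `score_per_mtoken` helper, and the exact token counts (4,096, 8,192, 16,384) are illustrative choices, not values taken from the page, so results can differ slightly from the table, which appears to use unrounded scores and context sizes.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    score: float          # benchmark score as displayed (rounded)
    context_tokens: int   # assumed exact context window size


def score_per_mtoken(score: float, context_tokens: int) -> float:
    """Score per one million context tokens (assumed formula)."""
    return score / context_tokens * 1_000_000


# Three rows from the table, with assumed exact context sizes.
models = [
    Model("Phi 4", 36, 16_384),
    Model("GPT-4", 44, 8_192),
    Model("Llemma 7b", 33, 4_096),
]

# Rank from most to least efficient, as the table does.
ranked = sorted(
    models,
    key=lambda m: score_per_mtoken(m.score, m.context_tokens),
    reverse=True,
)

for m in ranked:
    print(f"{m.name}: {score_per_mtoken(m.score, m.context_tokens):.1f}")
```

Because the denominator is the context window alone, a modest score on a 4K window outranks a much higher score on a 128K window, which is why the top of the table is dominated by older small-context models.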

Tier Analysis

Efficiency breakdown across context window tiers.

Tier	Models	Avg Score	Avg Score/MToken	Best	Worst
Small	20	33	4124.8	Llemma 7b	ERNIE 4.5 VL 28B A3B
Medium	48	40	979.2	Gemma 3 4B (free)	Sonar
Large	189	51	301.1	Qwen3 VL 30B A3B Thinking	Nova Lite 1.0
Mega	36	61	57.4	GPT-5.4 Pro	Grok 4.1 Fast

Diminishing Returns Analysis

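Are bigger context windows correlated with higher scores? On average, yes: average score rises with each tier, from 33 (Small) to 61 (Mega). Efficiency falls sharply, though, from 4124.8 to 57.4 score/MToken, because average context grows far faster than average score.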

Tier	Avg Context	Avg Score	Avg Efficiency
Small	10K	33	4124.8
Medium	49K	40	979.2
Large	186K	51	301.1
Mega	1.1M	61	57.4
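
To make the diminishing returns concrete, the following sketch compares how much average context and average score grow from one tier to the next, using only the tier averages in the table above. Reading "K" as 1,000 and "M" as 1,000,000 tokens is an assumption about the page's rounding, and `growth_steps` is a hypothetical helper name.

```python
# Tier averages copied from the table above: (tier, avg context, avg score).
# "K" and "M" are read as 1,000 and 1,000,000 tokens (an assumption).
TIERS = [
    ("Small", 10_000, 33),
    ("Medium", 49_000, 40),
    ("Large", 186_000, 51),
    ("Mega", 1_100_000, 61),
]


def growth_steps(tiers):
    """Return (from, to, context growth factor, score growth factor) per step."""
    steps = []
    for (prev_name, prev_ctx, prev_score), (name, ctx, score) in zip(tiers, tiers[1:]):
        steps.append((prev_name, name, ctx / prev_ctx, score / prev_score))
    return steps


for src, dst, ctx_x, score_x in growth_steps(TIERS):
    print(f"{src} -> {dst}: context x{ctx_x:.1f}, score x{score_x:.2f}")
```

Each step multiplies average context by roughly 4 to 6 while average score grows by only about 1.2 to 1.3 times, which is the pattern behind the falling Avg Efficiency column.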

Output Token Efficiency

Top 20 models by output efficiency (score per 1K output tokens).

Model	Provider	Score	Max Output	Output Eff.
Inflection 3 Pi	Inflection	30	1K	29.0
Inflection 3 Productivity	Inflection	30	1K	29.0
Gemma 3n 2B (free)	Google	46	2K	22.6
Gemma 3n 4B (free)	Google	45	2K	21.9
UI-TARS 7B	ByteDance	44	2K	21.3
MiniMax M2-her	MiniMax	39	2K	19.1
Gemma 2 27B	Google	29	2K	14.2
Jamba Large 1.7	AI21 Labs	49	4K	12.0
GPT-4 Turbo	OpenAI	46	4K	11.2
GPT-4o (2024-05-13)	OpenAI	45	4K	10.9
GPT-4 (older v0314)	OpenAI	44	4K	10.8
GPT-4	OpenAI	44	4K	10.8
GPT-4 Turbo Preview	OpenAI	40	4K	9.7
GPT-4 Turbo (older v1106)	OpenAI	40	4K	9.7
Command R (08-2024)	Cohere	37	4K	9.3
Command R+ (08-2024)	Cohere	37	4K	9.3
Command R7B (12-2024)	Cohere	36	4K	9.1
Claude 3 Haiku	Anthropic	35	4K	8.5
Nova Pro 1.0	Amazon	43	5K	8.4
Nova Lite 1.0	Amazon	43	5K	8.3
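
Output efficiency, as described above, is score per 1K output tokens. A minimal sketch of that calculation follows. It also handles models whose output limit is not listed (shown as "--" in the rankings table) by returning no value. The function name `output_efficiency` and the exact token counts (1,024 and 2,048) are assumptions for illustration, and the inputs are the rounded scores from the table, so results will not match the published figures exactly.

```python
from typing import Optional


def output_efficiency(score: float, max_output_tokens: Optional[int]) -> Optional[float]:
    """Score per 1,000 output tokens, or None when the output limit is unknown."""
    if not max_output_tokens:
        return None
    return score / (max_output_tokens / 1000)


# (model, rounded score, assumed max output tokens); None mirrors "--" in the table.
rows = [
    ("Inflection 3 Pi", 30, 1_024),
    ("Gemma 3n 2B (free)", 46, 2_048),
    ("Gemma 2 9B", 21, None),
]

for name, score, max_out in rows:
    eff = output_efficiency(score, max_out)
    label = "--" if eff is None else f"{eff:.1f}"
    print(f"{name}: {label}")
```

Note that this metric rewards small output limits: a model capped at 1K output tokens tops the list even with a mid-range score.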

Key Insights

Auto-generated observations from the efficiency data.

Context Sweet Spot

Models in the Small context tier have the highest average efficiency, at 4124.8 score/MToken across 20 models.

Output Matters

Models with 16K+ output tokens score 32% higher on average than models with smaller output limits.

Compact High Performers

No models achieve top-20 scores with under 128K context.
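
The "Output Matters" comparison above is a relative difference between two group averages. The sketch below shows that arithmetic. The helper name `percent_higher` is hypothetical, and the two score lists are a handful of rows picked from the rankings table purely for illustration, so they will not reproduce the 32% figure, which the page computes over its full model set.

```python
from statistics import mean


def percent_higher(group_a, group_b):
    """Percent by which the mean of group_a exceeds the mean of group_b."""
    base = mean(group_b)
    return (mean(group_a) - base) / base * 100


# Illustrative sample only: scores of a few listed models, split by output limit.
scores_16k_plus = [32, 36, 35, 46]   # Llama 3 8B Instruct, Phi 4, Llama 3.1 8B Instruct, QwQ 32B
scores_under_16k = [33, 29, 26, 46]  # Llemma 7b, GPT-3.5 Turbo (v0613), GPT-3.5 Turbo Instruct, Gemma 3n 2B

diff = percent_higher(scores_16k_plus, scores_under_16k)
print(f"16K+ output models score {diff:.1f}% higher in this sample")
```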
