Analyzes the score-per-context-token ratio across 297 AI models to identify those that make the best use of their context window, output capacity, and cost.
Key efficiency metrics across all analyzed models.
Avg overall efficiency: 7.1% (normalized across all models).
Top 50 models ranked by score per million context tokens.
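As a rough illustration of how such a ranking can be computed, here is a minimal Python sketch; the model records, field order, and values are hypothetical examples, not taken from the report's dataset.

```python
# Hypothetical (name, score, context_tokens) records; the real dataset
# and its field names may differ.
models = [
    ("model-a", 63, 32_768),
    ("model-b", 71, 131_072),
    ("model-c", 81, 1_048_576),
]

# Rank by score per million context tokens (higher is more efficient).
ranked = sorted(
    models,
    key=lambda m: m[1] / (m[2] / 1_000_000),
    reverse=True,
)

for name, score, ctx in ranked[:50]:
    print(f"{name}: {score / (ctx / 1_000_000):,.1f} score per M context tokens")
```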
Efficiency breakdown across context window tiers.
Are bigger context windows correlated with higher scores?
| Tier | Avg Context (tokens) | Avg Score | Avg Efficiency (score/M tokens) |
|---|---|---|---|
| Small | 11K | 46 | 5448.1 |
| Medium | 50K | 55 | 1317.1 |
| Large | 191K | 71 | 413.8 |
| Mega | 1.1M | 81 | 74.8 |
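The tier averages above can be reproduced with a simple grouping pass. The sketch below assumes hypothetical tier cut-offs and model records, since the report does not state its exact boundaries.

```python
from statistics import mean

# Hypothetical (name, context_tokens, score) records.
MODELS = [
    ("model-a", 8_192, 46),
    ("model-b", 131_072, 71),
    ("model-c", 1_048_576, 81),
]

# Assumed tier boundaries; the report's exact cut-offs may differ.
TIERS = [
    ("Small", 0, 32_000),
    ("Medium", 32_000, 128_000),
    ("Large", 128_000, 512_000),
    ("Mega", 512_000, float("inf")),
]

def efficiency(score: float, context_tokens: int) -> float:
    """Score per million context tokens, as used in the tier table."""
    return score / (context_tokens / 1_000_000)

for tier, lo, hi in TIERS:
    rows = [(ctx, s) for _, ctx, s in MODELS if lo <= ctx < hi]
    if not rows:
        continue
    print(
        f"{tier}: avg context {mean(c for c, _ in rows):,.0f}, "
        f"avg score {mean(s for _, s in rows):.0f}, "
        f"avg efficiency {mean(efficiency(s, c) for c, s in rows):,.1f}"
    )
```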
Top 20 models by output efficiency (score per 1K output tokens). Models with 16K+ output tokens are highlighted.
| Model | Provider | Score | Max Output (tokens) | Output Eff. (score/1K output tokens) |
|---|---|---|---|---|
| Inflection 3 Pi | Inflection | 37 | 1K | 35.9 |
| Inflection 3 Productivity | Inflection | 37 | 1K | 35.9 |
| UI-TARS 7B | ByteDance | 63 | 2K | 30.6 |
| Gemma 2 27B | Google | 60 | 2K | 29.2 |
| MiniMax M2-her | MiniMax | 59 | 2K | 29.0 |
| Gemma 3n 2B (free) | Google | 58 | 2K | 28.4 |
| Gemma 3n 4B (free) | Google | 56 | 2K | 27.1 |
| Jamba Large 1.7 | AI21 Labs | 71 | 4K | 17.4 |
| GPT-4 Turbo | OpenAI | 61 | 4K | 14.8 |
| GPT-4o (2024-05-13) | OpenAI | 53 | 4K | 12.9 |
| Command R (08-2024) | Cohere | 48 | 4K | 11.9 |
| Command R+ (08-2024) | Cohere | 48 | 4K | 11.9 |
| Llemma 7b | EleutherAI | 48 | 4K | 11.6 |
| Nova Lite 1.0 | Amazon | 58 | 5K | 11.4 |
| Nova Pro 1.0 | Amazon | 58 | 5K | 11.4 |
| Command R7B (12-2024) | Cohere | 45 | 4K | 11.2 |
| Sonar Pro Search | Perplexity | 85 | 8K | 10.6 |
| Claude 3 Haiku | Anthropic | 43 | 4K | 10.5 |
| GPT-4 Turbo Preview | OpenAI | 43 | 4K | 10.4 |
| GPT-4 Turbo (older v1106) | OpenAI | 43 | 4K | 10.4 |
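Output efficiency here is the composite score divided by the output limit in thousands of tokens. A minimal sketch of that calculation, using hypothetical records and field names:

```python
# Hypothetical model records; field names are assumptions, not the
# report's schema.
models = [
    {"name": "example-small", "score": 37, "max_output_tokens": 1_024},
    {"name": "example-large", "score": 85, "max_output_tokens": 8_192},
]

# Output efficiency: score per 1,000 output tokens.
for m in models:
    m["output_eff"] = m["score"] / (m["max_output_tokens"] / 1_000)

top = sorted(models, key=lambda m: m["output_eff"], reverse=True)[:20]
for m in top:
    print(f'{m["name"]}: {m["output_eff"]:.1f} score per 1K output tokens')
```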
Auto-generated observations from the efficiency data.
Context Sweet Spot
Models in the Small context tier have the highest average efficiency, at 5,448.1 score per million context tokens across 16 models.
Output Matters
Models with 16K+ output tokens score 35% higher on average than models with smaller output limits.
Compact High Performers
No models achieve top-20 scores with a context window under 128K tokens.
Efficiency is measured as the score-per-context-token ratio: how much ranking score a model achieves relative to its context window size. Models that score highly with smaller context windows are considered more efficient than those requiring massive context to achieve similar results.
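In code, that ratio can be expressed as score per million context tokens; the function name and the two example models below are illustrative assumptions, not part of the report's pipeline.

```python
def context_efficiency(score: float, context_tokens: int) -> float:
    """Ranking score per million context tokens (higher = more efficient)."""
    return score / (context_tokens / 1_000_000)

# Two hypothetical models with the same score but different context windows:
print(round(context_efficiency(70, 32_768), 1))     # 2136.2
print(round(context_efficiency(70, 1_048_576), 1))  # 66.8
```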
Cost efficiency combines quality (composite score) with pricing. The most cost-efficient models achieve high benchmark scores while maintaining low per-token API costs. Free and budget-tier models that perform well are the most cost-efficient options.
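One way to express this is score per dollar of blended token price. The sketch below assumes a 75/25 input/output token mix and hypothetical prices; neither figure comes from the report.

```python
def cost_efficiency(score: float, input_price: float, output_price: float,
                    input_share: float = 0.75) -> float:
    """Score per dollar of blended per-million-token price.

    input_price / output_price are USD per million tokens; input_share is
    an assumed input/output mix, not a figure from the report.
    """
    blended = input_share * input_price + (1 - input_share) * output_price
    return float("inf") if blended == 0 else score / blended

# Hypothetical pricing: a budget model vs. a premium model.
print(cost_efficiency(58, 0.10, 0.40))   # high score per dollar
print(cost_efficiency(81, 3.00, 15.00))  # much lower score per dollar
```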
Not necessarily. Our efficiency analysis shows diminishing returns beyond certain context sizes. Models with 128K tokens often score similarly to those with 1M+ tokens, meaning the extra context capacity adds cost without proportional quality gains for most use cases.