This analysis compares the score-per-context-token ratio across 293 AI models to identify those that make the best use of their context window, output capacity, and cost.
Key efficiency metrics across all analyzed models.
Avg Overall Efficiency: 7.6% (normalized across all models)
Top 50 models ranked by score per million context tokens.
Efficiency breakdown across context window tiers.
Are bigger context windows correlated with higher scores?
| Tier | Avg Context (tokens) | Avg Score | Avg Efficiency (score/MToken) |
|---|---|---|---|
| Small | 10K | 33 | 4124.8 |
| Medium | 49K | 40 | 979.2 |
| Large | 186K | 51 | 301.1 |
| Mega | 1.1M | 61 | 57.4 |
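The efficiency column can be read as score per million context tokens. The sketch below illustrates that ratio and why a tier's average efficiency need not equal its average score divided by its average context. It assumes the page averages per-model ratios, which the source does not state; the function name and the sample values are hypothetical and not taken from the dataset.

```python
from statistics import mean


def context_efficiency(score: float, context_tokens: float) -> float:
    """Score per million context tokens (assumed definition of the metric)."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    return score * 1_000_000 / context_tokens


# Hypothetical (score, context_tokens) pairs, for illustration only.
models = [(30, 8_000), (36, 16_000)]

# Average of the per-model ratios.
mean_of_ratios = mean(context_efficiency(s, c) for s, c in models)

# Ratio computed from the averaged score and averaged context.
ratio_of_means = context_efficiency(
    mean(s for s, _ in models),
    mean(c for _, c in models),
)

print(mean_of_ratios, ratio_of_means)
```

The two results differ because models with very small context windows dominate an average of ratios, which is one plausible reason the Small tier's 4124.8 exceeds what its averaged columns alone would give.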
Top 20 models by output efficiency (score per 1K output tokens). Models with 16K+ output tokens are highlighted.
| Model | Provider | Score | Max Output (tokens) | Output Eff. (score per 1K output tokens) |
|---|---|---|---|---|
| Inflection 3 Pi | Inflection | 30 | 1K | 29.0 |
| Inflection 3 Productivity | Inflection | 30 | 1K | 29.0 |
| Gemma 3n 2B (free) | Google | 46 | 2K | 22.6 |
| Gemma 3n 4B (free) | Google | 45 | 2K | 21.9 |
| UI-TARS 7B | ByteDance | 44 | 2K | 21.3 |
| MiniMax M2-her | MiniMax | 39 | 2K | 19.1 |
| Gemma 2 27B | Google | 29 | 2K | 14.2 |
| Jamba Large 1.7 | AI21 Labs | 49 | 4K | 12.0 |
| GPT-4 Turbo | OpenAI | 46 | 4K | 11.2 |
| GPT-4o (2024-05-13) | OpenAI | 45 | 4K | 10.9 |
| GPT-4 (older v0314) | OpenAI | 44 | 4K | 10.8 |
| GPT-4 | OpenAI | 44 | 4K | 10.8 |
| GPT-4 Turbo Preview | OpenAI | 40 | 4K | 9.7 |
| GPT-4 Turbo (older v1106) | OpenAI | 40 | 4K | 9.7 |
| Command R (08-2024) | Cohere | 37 | 4K | 9.3 |
| Command R+ (08-2024) | Cohere | 37 | 4K | 9.3 |
| Command R7B (12-2024) | Cohere | 36 | 4K | 9.1 |
| Claude 3 Haiku | Anthropic | 35 | 4K | 8.5 |
| Nova Pro 1.0 | Amazon | 43 | 5K | 8.4 |
| Nova Lite 1.0 | Amazon | 43 | 5K | 8.3 |
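Output efficiency in the table above can be sketched as score divided by the output limit expressed in thousands of tokens. The example assumes that "4K" means 4,096 tokens and that the divisor is 1,000 tokens; neither is stated in the source, although the Jamba Large 1.7 row is consistent with that reading. Displayed scores are rounded, so other rows may not reproduce exactly. The function name is hypothetical.

```python
def output_efficiency(score: float, max_output_tokens: int) -> float:
    """Score per 1,000 output tokens (assumed definition of the metric)."""
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    return score * 1000 / max_output_tokens


# Jamba Large 1.7: score 49, max output "4K" (assumed to be 4,096 tokens).
jamba_eff = round(output_efficiency(49, 4096), 1)
print(jamba_eff)  # 12.0, matching the table row
```

Because the limit sits in the denominator, this metric favors models with short output caps, which is why every model in the top 20 has a limit of 5K or less.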
Auto-generated observations from the efficiency data.
Context Sweet Spot
Models in the Small context tier have the highest average efficiency, at 4124.8 score/MToken across 20 models.
Output Matters
Models with 16K+ output tokens score 32% higher on average than models with smaller output limits.
Compact High Performers
No model with under 128K context achieves a top-20 score.
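A comparison such as "32% higher on average" is a relative difference between two group means. The sketch below shows one way to compute it, assuming models are split at a 16K output limit taken here as 16,384 tokens; the sample scores, the threshold value, and the helper name are hypothetical and do not come from the dataset.

```python
from statistics import mean

OUTPUT_THRESHOLD = 16_384  # assumed token count for "16K"


def relative_uplift(high_group: list[float], low_group: list[float]) -> float:
    """Percent by which the mean of high_group exceeds the mean of low_group."""
    low_mean = mean(low_group)
    if low_mean == 0:
        raise ValueError("low_group mean must be non-zero")
    return (mean(high_group) - low_mean) / low_mean * 100


# Hypothetical (score, max_output_tokens) pairs, for illustration only.
models = [(60, 32_768), (40, 16_384), (30, 4_096), (50, 8_192)]

large_output = [s for s, out in models if out >= OUTPUT_THRESHOLD]
small_output = [s for s, out in models if out < OUTPUT_THRESHOLD]

uplift = relative_uplift(large_output, small_output)
print(f"{uplift:.1f}% higher")  # 25.0% higher
```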