Human preference rating from 6M+ crowdsourced blind head-to-head comparisons. Users chat with two anonymous models and pick the better response.
Why it matters: The most trusted 'vibes-based' benchmark — reflects real human preferences, not just academic metrics. Widely considered the most meaningful overall ranking.
Top Model
1,446
Gemini 2.5 Pro Preview 05-06
Average Score
1,298
Across 65 models
Models Tested
65
Metric: Elo rating
Human Baseline
—
Range: 900–1600
All models with a reported Arena Elo score, ranked from highest to lowest rating.
Arena Elo is not a fixed test set: it is a relative skill rating computed from blind, pairwise human votes. Each head-to-head win, loss, or tie nudges both models' ratings, placing every model on the same scale. This makes scores directly comparable across models and helps developers gauge how a model's responses fare against the field in open-ended chat.
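To make the mechanics concrete, here is a minimal sketch of a standard logistic Elo update applied to a single head-to-head vote. The K-factor and the 400-point scale are conventional Elo defaults, not the exact parameters LMArena uses (its published method is closer to the Bradley-Terry model fit over all votes at once):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """Update both ratings after one comparison.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (outcome - e_a)
    r_b_new = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return r_a_new, r_b_new
```

Note that rating points are zero-sum: whatever A gains, B loses, which is why the ratings in the table cluster in a band (here roughly 900-1600) rather than growing without bound.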
Gemini 2.5 Pro Preview 05-06 currently holds the top score on the Arena Elo benchmark. See our full rankings table above for the complete leaderboard with 65 models.
We update benchmark data from multiple sources including HuggingFace Open LLM Leaderboard and LMArena. Scores are refreshed regularly as new evaluations are published and new models are released.
No. While Arena Elo is an important indicator, real-world performance depends on many factors, including pricing, latency, context window, and specific task requirements. We recommend using our composite score, which weights multiple benchmarks and practical factors.
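A composite score of this kind is typically a weighted average over normalized metrics. The sketch below is purely illustrative: the metric names and weights are assumptions, not the site's actual formula.

```python
def composite_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of whichever normalized (0-1) metrics are present.

    Missing metrics are skipped and the remaining weights renormalized,
    so models lacking a benchmark are not penalized to zero for it.
    """
    keys = [k for k in metrics if k in weights]
    total_w = sum(weights[k] for k in keys)
    return sum(metrics[k] * weights[k] for k in keys) / total_w

# Hypothetical weighting: quality metrics dominate, practical factors count too.
weights = {"arena_elo": 0.4, "mmlu": 0.3, "latency": 0.15, "price": 0.15}
```

The renormalization step is the key design choice: without it, a model missing one benchmark would be silently dragged down as if it had scored zero there.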