AI models ranked by mathematical reasoning using MATH-500, GSM8K, and AIME 2024 benchmark scores.
- **Top model:** Gemini 2.5 Pro (score: 93.8)
- **Average score:** 84.1 across all ranked models
- **Models ranked:** 16 with benchmark data
| # | Model | Vendor | Score |
|---|---|---|---|
| 1 | Gemini 2.5 Pro | Google | 93.8 |
| 2 | o3 Mini | OpenAI | 93.4 |
| 3 | DeepSeek V3 | DeepSeek | 93.0 |
| 4 | Gemini 2.0 Flash | Google | 91.5 |
| 5 | o1 | OpenAI | 90.8 |
| 6 | R1 | DeepSeek | 89.8 |
| 7 | Claude 3.5 Sonnet | Anthropic | 86.1 |
| 8 | GPT-4o | OpenAI | 84.8 |
| 9 | Mistral Large | Mistral AI | 83.3 |
| 10 | GPT-4 Turbo | OpenAI | 81.9 |
| 11 | Claude Opus 4.5 | Anthropic | 81.2 |
| 12 | GPT-4o-mini | OpenAI | 80.1 |
| 13 | Llama 3.1 70B Instruct | Meta | 79.6 |
| 14 | Llama 3.3 70B Instruct | Meta | 77.0 |
| 15 | Claude 3.7 Sonnet | Anthropic | 70.5 |
| 16 | Claude 3.5 Haiku | Anthropic | 69.2 |
Each model's score is a weighted average of its available benchmark results. When a model is missing some benchmarks, the weights are re-normalized across the benchmarks that are available. All scores are on a 0-100 scale. Data sourced from official model cards, published papers, and third-party evaluation platforms.
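The scoring method described above can be sketched in a few lines of Python. Note that the benchmark weights below are hypothetical placeholders; the actual weights used for this ranking are not published here.

```python
def weighted_score(benchmarks, weights):
    """Weighted average over a model's available benchmark results.

    When a benchmark result is missing (None), its weight is dropped and
    the remaining weights are re-normalized, as described above.
    All inputs are assumed to be on a 0-100 scale.
    """
    available = {name: score for name, score in benchmarks.items()
                 if score is not None}
    if not available:
        raise ValueError("no benchmark results available")
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * score
               for name, score in available.items()) / total_weight


# Hypothetical weights -- assumptions for illustration only.
weights = {"MATH-500": 0.4, "GSM8K": 0.3, "AIME 2024": 0.3}

# A model missing its AIME 2024 result: the 0.4/0.3 weights
# re-normalize over the two remaining benchmarks.
score = weighted_score(
    {"MATH-500": 90.0, "GSM8K": 95.0, "AIME 2024": None}, weights)
print(round(score, 1))  # (0.4*90 + 0.3*95) / 0.7 ≈ 92.1
```

With all three benchmarks present, the divisor is simply 1.0 and the result is the ordinary weighted average.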
Based on our benchmark analysis, Gemini 2.5 Pro by Google is currently the #1 ranked model for math, with a weighted score of 93.8/100.
Models are ranked using a weighted average of MATH-500, GSM8K, and AIME 2024 benchmark scores. All scores are normalized to a 0-100 scale.
We currently rank 16 models that have relevant benchmark data for math tasks.