AI models ranked by data analysis capability using MMLU, MATH-500, and GPQA benchmark scores. Find the top LLM for data science, analytics, and insights.
Top-ranked model: o1, with a score of 93.8. We track 54 models overall; 27 of them have the benchmark data needed for this ranking.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | o1 | OpenAI | 93.8 |
| 2 | R1 | DeepSeek | 93.6 |
| 3 | Gemini 2.5 Pro | Google | 92.7 |
| 4 | o3 Mini | OpenAI | 91.6 |
| 5 | Claude Opus 4.5 | Anthropic | 90.0 |
| 6 | DeepSeek V3 | DeepSeek | 89.2 |
| 7 | Claude 3.7 Sonnet | Anthropic | 86.8 |
| 8 | Claude 3.5 Sonnet | Anthropic | 84.2 |
| 9 | GPT-4o | OpenAI | 83.5 |
| 10 | Llama 3.3 70B Instruct | Meta | 82.3 |
| 11 | Gemini 2.0 Flash | Google | 82.1 |
| 12 | Mistral Large | Mistral AI | 80.6 |
| 13 | GPT-4 Turbo | OpenAI | 80.5 |
| 14 | Llama 3.1 70B Instruct | Meta | 78.3 |
| 15 | GPT-4o-mini | OpenAI | 76.9 |
| 16 | Claude 3.5 Haiku | Anthropic | 75.9 |
| 17 | Phi 4 | Microsoft | 20.8 |
| 18 | Qwen2.5 72B Instruct | Alibaba | 16.7 |
| 19 | Qwen2.5 Coder 32B Instruct | Alibaba | 13.2 |
| 20 | Gemma 2 9B | Google | 9.7 |
| 21 | Command R7B (12-2024) | Cohere | 7.8 |
| 22 | Llama 3.1 8B Instruct | Meta | 7.4 |
| 23 | Llama 3.2 3B Instruct | Meta | 6.2 |
| 24 | Qwen2.5 Coder 7B Instruct | Alibaba | 5.8 |
| 25 | Qwen2.5 7B Instruct | Alibaba | 5.5 |
| 26 | Llama 3 8B Instruct | Meta | 2.1 |
| 27 | QwQ 32B | Alibaba | 1.3 |
Each model's score is a weighted average of its available benchmark results. When a model is missing some benchmarks, the weights are re-normalized across the benchmarks that are available. All scores are on a 0-100 scale. Data sourced from official model cards, published papers, and third-party evaluation platforms.
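The scoring method described above can be sketched in a few lines of Python. Note that the benchmark weights below are hypothetical placeholders, since the actual weights are not published here; the sketch only illustrates the re-normalization step for models with missing benchmarks.

```python
# Sketch of the scoring method: a weighted average over available
# benchmarks, with weights re-normalized when a model is missing some.
# The weights below are hypothetical -- the actual weights are not
# published in this ranking.

def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average over the benchmarks present in `scores`,
    re-normalizing weights to sum to 1 across available benchmarks."""
    available = {b: w for b, w in weights.items() if b in scores}
    total = sum(available.values())
    if total == 0:
        raise ValueError("no overlapping benchmarks")
    return sum(scores[b] * (w / total) for b, w in available.items())

# Hypothetical equal weighting across the three benchmarks.
WEIGHTS = {"MMLU": 1 / 3, "MATH-500": 1 / 3, "GPQA": 1 / 3}

# Model with all three benchmarks reported (illustrative numbers):
full = weighted_score({"MMLU": 92.0, "MATH-500": 96.0, "GPQA": 78.0}, WEIGHTS)

# Model missing GPQA: weights re-normalize over MMLU and MATH-500,
# so the result is the plain average of the two available scores.
partial = weighted_score({"MMLU": 92.0, "MATH-500": 96.0}, WEIGHTS)
```

With equal weights, a model missing GPQA is simply averaged over MMLU and MATH-500; with unequal weights, each surviving weight is scaled up proportionally.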
Based on our benchmark analysis, o1 by OpenAI is currently the #1 ranked model for data analysis, with a weighted score of 93.8/100.
Models are ranked using a weighted average of MMLU, MATH-500, and GPQA benchmark scores. All scores are normalized to a 0-100 scale.
We currently rank 27 models that have relevant benchmark data for data analysis tasks.