AI models ranked by mathematical reasoning using MATH-500, GSM8K, and AIME 2024 benchmark scores.
- **Top model:** Gemini 2.5 Pro (score: 93.8)
- **Average score:** 84.1 across all ranked models
- **Models ranked:** 16 with benchmark data
| # | Model | Vendor | Score |
|---|---|---|---|
| 1 | Gemini 2.5 Pro | Google | 93.8 |
| 2 | o3 Mini | OpenAI | 93.4 |
| 3 | DeepSeek V3 | DeepSeek | 93.0 |
| 4 | Gemini 2.0 Flash | Google | 91.5 |
| 5 | o1 | OpenAI | 90.8 |
| 6 | R1 | DeepSeek | 89.8 |
| 7 | Claude 3.5 Sonnet | Anthropic | 86.1 |
| 8 | GPT-4o | OpenAI | 84.8 |
| 9 | Mistral Large | Mistral AI | 83.3 |
| 10 | GPT-4 Turbo | OpenAI | 81.9 |
| 11 | Claude Opus 4.5 | Anthropic | 81.2 |
| 12 | GPT-4o-mini | OpenAI | 80.1 |
| 13 | Llama 3.1 70B Instruct | Meta | 79.6 |
| 14 | Llama 3.3 70B Instruct | Meta | 77.0 |
| 15 | Claude 3.7 Sonnet | Anthropic | 70.5 |
| 16 | Claude 3.5 Haiku | Anthropic | 69.2 |
Each model's score is a weighted average of its available benchmark results. When a model is missing some benchmarks, the weights are re-normalized across the benchmarks that are available. All scores are on a 0-100 scale. Data sourced from official model cards, published papers, and third-party evaluation platforms.
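The scoring method described above can be sketched in a few lines of Python. Note that the benchmark weights below are hypothetical placeholders; the actual weights used for this ranking are not published here.

```python
def weighted_score(benchmarks, weights):
    """Weighted average over a model's available benchmark results.

    When a benchmark result is missing (None), its weight is dropped and
    the remaining weights are re-normalized, as described above.
    All inputs are assumed to be on a 0-100 scale.
    """
    available = {name: score for name, score in benchmarks.items()
                 if score is not None}
    if not available:
        raise ValueError("no benchmark results available")
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * score
               for name, score in available.items()) / total_weight


# Hypothetical weights -- assumptions for illustration only.
weights = {"MATH-500": 0.4, "GSM8K": 0.3, "AIME 2024": 0.3}

# A model missing its AIME 2024 result: the 0.4/0.3 weights
# re-normalize over the two remaining benchmarks.
score = weighted_score(
    {"MATH-500": 90.0, "GSM8K": 95.0, "AIME 2024": None}, weights)
print(round(score, 1))  # (0.4*90 + 0.3*95) / 0.7 ≈ 92.1
```

With all three benchmarks present, the divisor is simply 1.0 and the result is the ordinary weighted average.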
Based on our benchmark analysis, Gemini 2.5 Pro by Google is currently the #1 ranked model for math, with a weighted score of 93.8/100.
Models are ranked using a weighted average of MATH-500, GSM8K, and AIME 2024 benchmark scores. All scores are normalized to a 0-100 scale.
We currently rank 16 models that have relevant benchmark data for math tasks.