AI models ranked by instruction-following accuracy using the IFEval benchmark.
- **Top model:** Claude 3.7 Sonnet (score: 92.3)
- **Models ranked across all tasks:** 71
- **Models with benchmark data for this task:** 20
| # | Model | Vendor | Score |
|---|---|---|---|
| 1 | Claude 3.7 Sonnet | Anthropic | 92.3 |
| 2 | Llama 3.3 70B Instruct | Meta | 92.1 |
| 3 | Claude 3.5 Sonnet | Anthropic | 88.1 |
| 4 | DeepSeek V3 | DeepSeek | 87.1 |
| 5 | o1 | OpenAI | 86.5 |
| 6 | Qwen2.5 72B Instruct | Alibaba | 86.4 |
| 7 | GPT-4o | OpenAI | 84.3 |
| 8 | Llama 3.1 70B Instruct | Meta | 83.6 |
| 9 | Mistral Large | Mistral AI | 82.4 |
| 10 | GPT-4o-mini | OpenAI | 80.4 |
| 11 | Command R7B (12-2024) | Cohere | 77.1 |
| 12 | Qwen2.5 7B Instruct | Alibaba | 75.9 |
| 13 | Qwen2.5 Coder 32B Instruct | Alibaba | 72.7 |
| 14 | Llama 3.1 8B Instruct | Meta | 72.1 |
| 15 | Llama 3.2 3B Instruct | Meta | 68.5 |
| 16 | Qwen2.5 Coder 7B Instruct | Alibaba | 61.5 |
| 17 | Gemma 2 9B | Google | 58.8 |
| 18 | QwQ 32B | Alibaba | 39.8 |
| 19 | Llama 3 8B Instruct | Meta | 24 |
| 20 | Phi 4 | Microsoft | 5.9 |
Each model's score is a weighted average of its available benchmark results. When a model is missing some benchmarks, the weights are re-normalized across the benchmarks that are available. All scores are on a 0-100 scale. Data sourced from official model cards, published papers, and third-party evaluation platforms.
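The aggregation described above can be sketched in a few lines: compute a weighted average over whichever benchmarks a model has results for, re-normalizing the weights so they still sum to 1 over the available set. The benchmark names and weight values below are illustrative assumptions, not the site's actual configuration.

```python
# Sketch of the scoring rule: weighted average over available benchmarks,
# with weights re-normalized when some benchmarks are missing.
# Benchmark names and weights here are hypothetical examples.

def weighted_score(results, weights):
    """results: {benchmark: score or None}; weights: {benchmark: weight}.

    Missing benchmarks (None) are dropped, and the remaining weights
    are re-normalized so they sum to 1 before averaging.
    """
    available = {b: s for b, s in results.items() if s is not None}
    total_weight = sum(weights[b] for b in available)
    return sum(weights[b] * s for b, s in available.items()) / total_weight

# Assumed two-benchmark weighting for illustration only.
weights = {"ifeval_strict": 0.6, "ifeval_loose": 0.4}

# A model with both results gets the plain weighted average;
# a model missing one benchmark is scored on the other alone.
full = weighted_score({"ifeval_strict": 90.0, "ifeval_loose": 80.0}, weights)
partial = weighted_score({"ifeval_strict": 92.3, "ifeval_loose": None}, weights)
print(full, partial)
```

With both benchmarks present, `full` is the ordinary weighted average (86.0 here); with one missing, the re-normalization makes `partial` equal the single available score.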
Based on our benchmark analysis, Claude 3.7 Sonnet by Anthropic is currently the #1 ranked model for instruction following, with a weighted score of 92.3/100.
Models are ranked using a weighted average of IFEval benchmark scores. All scores are normalized to a 0-100 scale.
We currently rank 20 models that have relevant benchmark data for instruction-following tasks.