Which AI models are the most consistent over time? This report analyzes rank changes, state classifications, and sparkline volatility across 293 tracked models to produce a stability score from 0 to 100.
| State | Models |
|---|---|
| Rock Solid | 293 |
| Consistent | 0 |
| Variable | 0 |
| Volatile | 0 |
Top 20 models with the highest stability scores. These models maintain consistent rankings with minimal volatility.
| # | Model | Provider | Score | Stability | 24h | 7d |
|---|---|---|---|---|---|---|
| 1 | GPT-5.4 Pro | OpenAI | 90.9 | 100 | 0 | 0 |
| 2 | GPT-5.2 Pro | OpenAI | 89.9 | 100 | 0 | 0 |
| 3 | GPT-5 Pro | OpenAI | 89.9 | 100 | 0 | 0 |
| 4 | o3 Pro | OpenAI | 81.6 | 100 | 0 | 0 |
| 5 | Claude Opus 4.1 | Anthropic | 81.1 | 100 | 0 | 0 |
| 6 | o1-pro | OpenAI | 77.2 | 100 | 0 | 0 |
| 7 | Claude Opus 4 | Anthropic | 75.5 | 100 | 0 | 0 |
| 8 | o3 Deep Research | OpenAI | 74.0 | 100 | 0 | 0 |
| 9 | Claude Opus 4.6 | Anthropic | 70.5 | 100 | 0 | 0 |
| 10 | Claude Opus 4.5 | Anthropic | 70.0 | 100 | 0 | 0 |
| 11 | GPT-5.4 | OpenAI | 69.7 | 100 | 0 | 0 |
| 12 | Claude Sonnet 4.5 | Anthropic | 69.1 | 100 | 0 | 0 |
| 13 | Qwen3 VL 30B A3B Thinking | Alibaba | 68.6 | 100 | 0 | 0 |
| 14 | Qwen3 VL 235B A22B Thinking | Alibaba | 68.6 | 100 | 0 | 0 |
| 15 | GPT-5.2 | OpenAI | 68.4 | 100 | 0 | 0 |
| 16 | Gemini 3.1 Pro Preview Custom Tools | Google | 68.2 | 100 | 0 | 0 |
| 17 | Gemini 3.1 Pro Preview | Google | 68.2 | 100 | 0 | 0 |
| 18 | Gemini 3 Pro Preview | Google | 68.2 | 100 | 0 | 0 |
| 19 | Claude Sonnet 4.6 | Anthropic | 68.0 | 100 | 0 | 0 |
| 20 | GPT-5.1 | OpenAI | 67.4 | 100 | 0 | 0 |
Bottom 20 models with the lowest stability scores. These models show significant ranking fluctuations or inconsistent states.
| # | Model | Provider | Score | Stability | 24h | 7d |
|---|---|---|---|---|---|---|
| 1 | Mistral 7B Instruct v0.1 | Mistral AI | 17.2 | 100 | 0 | 0 |
| 2 | LlamaGuard 2 8B | Meta | 20.1 | 100 | 0 | 0 |
| 3 | Gemma 2 9B | Google | 21.3 | 100 | 0 | 0 |
| 4 | GPT-3.5 Turbo Instruct | OpenAI | 25.6 | 100 | 0 | 0 |
| 5 | Llama 3.2 1B Instruct | Meta | 25.9 | 100 | 0 | 0 |
| 6 | WizardLM-2 8x22B | Microsoft | 26.1 | 100 | 0 | 0 |
| 7 | Llama 3.2 3B Instruct | Meta | 26.2 | 100 | 0 | 0 |
| 8 | Llama 3 70B Instruct | Meta | 27.7 | 100 | 0 | 0 |
| 9 | Gemma 2 27B | Google | 29.0 | 100 | 0 | 0 |
| 10 | GPT-3.5 Turbo (older v0613) | OpenAI | 29.2 | 100 | 0 | 0 |
| 11 | Mistral Large | Mistral AI | 29.7 | 100 | 0 | 0 |
| 12 | Qwen2.5-VL 7B Instruct | Alibaba | 29.7 | 100 | 0 | 0 |
| 13 | Inflection 3 Productivity | Inflection | 29.7 | 100 | 0 | 0 |
| 14 | Inflection 3 Pi | Inflection | 29.7 | 100 | 0 | 0 |
| 15 | Mixtral 8x22B Instruct | Mistral AI | 30.2 | 100 | 0 | 0 |
| 16 | Llama 3.1 405B (base) | Meta | 30.2 | 100 | 0 | 0 |
| 17 | GPT-3.5 Turbo | OpenAI | 30.5 | 100 | 0 | 0 |
| 18 | Llama Guard 3 8B | Meta | 30.5 | 100 | 0 | 0 |
| 19 | GPT-3.5 Turbo 16k | OpenAI | 31.1 | 100 | 0 | 0 |
| 20 | Qwen2.5 Coder 32B Instruct | Alibaba | 31.1 | 100 | 0 | 0 |
Aggregated stability metrics per provider. Providers are ranked by their average stability score across all models.
| Provider | Models | Avg Stability |
|---|---|---|
| OpenAI | 59 | 100.0 |
| Anthropic | 13 | 100.0 |
| Alibaba | 51 | 100.0 |
| Google | 24 | 100.0 |
| NVIDIA | 8 | 100.0 |
| xAI | 8 | 100.0 |
| ByteDance | 4 | 100.0 |
| Perplexity | 5 | 100.0 |
| Amazon | 5 | 100.0 |
| Moonshot AI | 5 | 100.0 |
| StepFun | 2 | 100.0 |
| MiniMax | 6 | 100.0 |
| arcee-ai | 7 | 100.0 |
| Xiaomi | 1 | 100.0 |
| DeepSeek | 12 | 100.0 |
| Mistral AI | 25 | 100.0 |
| Inception | 3 | 100.0 |
| Meta | 17 | 100.0 |
| Baidu | 5 | 100.0 |
| Kuaishou | 1 | 100.0 |
| Meituan | 1 | 100.0 |
| AI21 Labs | 1 | 100.0 |
| Allen AI | 7 | 100.0 |
| Tencent | 1 | 100.0 |
| Upstage | 1 | 100.0 |
| Liquid AI | 5 | 100.0 |
| aion-labs | 3 | 100.0 |
| Writer | 1 | 100.0 |
| deepcogito | 1 | 100.0 |
| Cohere | 4 | 100.0 |
| essentialai | 1 | 100.0 |
| IBM | 1 | 100.0 |
| Microsoft | 2 | 100.0 |
| eleutherai | 1 | 100.0 |
| Inflection | 2 | 100.0 |
How stability scores are distributed across all 293 tracked models.
Our stability scoring system uses three key signals to measure how consistently a model performs over time.
The most direct measure of stability. Models lose up to 25 points for large 24-hour rank changes (5 points per rank position moved) and up to 21 points for 7-day changes (3 points per position). Models that hold their rank tightly score higher.
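The rank-change penalty above can be sketched directly from the stated caps. This is a minimal illustration, assuming the penalty is linear in absolute rank movement and capped per window; the function name is ours, not the system's:

```python
def rank_change_penalty(delta_24h: int, delta_7d: int) -> int:
    """Points deducted for rank movement, per the stated caps:
    5 points per rank moved in 24h (max 25), 3 points per rank
    moved over 7 days (max 21)."""
    p24 = min(abs(delta_24h) * 5, 25)  # 24-hour window, capped at 25
    p7 = min(abs(delta_7d) * 3, 21)    # 7-day window, capped at 21
    return p24 + p7

# A model that moved 2 ranks in 24h and 4 ranks over 7d:
# min(10, 25) + min(12, 21) = 22 points deducted
```

A model that never moves loses nothing from this signal, which is why every entry in the tables above shows 0 in both the 24h and 7d columns.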
Each model has a state reflecting its overall reliability. Models in a "stable" state receive a 10-point bonus, while "fragile" models are penalized 15 points. This captures systemic reliability beyond simple rank movement.
The 14-day sparkline data reveals hidden volatility. We compute the standard deviation of the sparkline and subtract up to 20 points. Even models that end where they started can be penalized if they oscillated wildly along the way.
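Putting the three signals together, a hedged sketch of the full score might look like the following. The state-to-adjustment mapping and the clamping to the 0-100 range are assumptions inferred from the prose, not a confirmed implementation:

```python
import statistics

# Assumed mapping from the prose: "stable" earns +10, "fragile" costs 15.
STATE_ADJUSTMENT = {"stable": +10, "fragile": -15}

def stability_score(delta_24h: int, delta_7d: int,
                    state: str, sparkline: list[float]) -> float:
    """Combine rank-change penalties, state adjustment, and sparkline
    volatility into a 0-100 stability score (illustrative sketch)."""
    score = 100.0
    # Signal 1: rank-change penalties (5/rank capped at 25; 3/rank capped at 21)
    score -= min(abs(delta_24h) * 5, 25)
    score -= min(abs(delta_7d) * 3, 21)
    # Signal 2: state bonus or penalty
    score += STATE_ADJUSTMENT.get(state, 0)
    # Signal 3: sparkline volatility, subtracting up to 20 points of stddev
    if len(sparkline) >= 2:
        score -= min(statistics.stdev(sparkline), 20)
    return max(0.0, min(100.0, score))
```

For example, a model that held its rank in a "stable" state with a flat 14-day sparkline would score the full 100, while the same model in a "fragile" state would score 85 even without moving a single rank.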