Degradation Tracker

Detect when AI models may be getting worse. This tracker flags models with declining rankings, fragile states, and sustained performance drops across 293 tracked models.

Models at Risk

0 models showing signs of degradation, ranked by risk score. Higher risk scores indicate more concerning performance trends.

No models at risk

All tracked models are performing within expected parameters.

Stable Models

293 models with no decline and a stable ranking state. These models are performing consistently.

#	Model	Provider	Score	State
1	GPT-5.4 Pro	OpenAI	90.9	stable
2	GPT-5.2 Pro	OpenAI	89.9	stable
3	GPT-5 Pro	OpenAI	89.9	stable
4	o3 Pro	OpenAI	81.6	stable
5	Claude Opus 4.1	Anthropic	81.1	stable
6	o1-pro	OpenAI	77.2	stable
7	Claude Opus 4	Anthropic	75.5	stable
8	o3 Deep Research	OpenAI	74.0	stable
9	Claude Opus 4.6	Anthropic	70.5	stable
10	Claude Opus 4.5	Anthropic	70.0	stable
11	GPT-5.4	OpenAI	69.7	stable
12	Claude Sonnet 4.5	Anthropic	69.1	stable
13	Qwen3 VL 30B A3B Thinking	Alibaba	68.6	stable
14	Qwen3 VL 235B A22B Thinking	Alibaba	68.6	stable
15	GPT-5.2	OpenAI	68.4	stable
16	Gemini 3.1 Pro Preview Custom Tools	Google	68.2	stable
17	Gemini 3.1 Pro Preview	Google	68.2	stable
18	Gemini 3 Pro Preview	Google	68.2	stable
19	Claude Sonnet 4.6	Anthropic	68.0	stable
20	GPT-5.1	OpenAI	67.4	stable

Showing top 20 of 293 stable models.

How Degradation Is Detected

Our degradation detection system uses multiple signals to identify models that may be declining in quality or reliability.

Declining (7d)

Models whose 7-day rank change is worse than -2 positions, that is, models that have dropped more than two ranks over the past week. A drop of that size suggests the model may be losing ground to competitors or experiencing performance issues.
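The rule above can be sketched as a simple threshold check. This is a minimal illustration, not the tracker's actual code: the function and constant names are hypothetical, and it assumes rank change is a signed integer where negative values mean positions lost.

```python
# Threshold taken from the text: a 7-day change "worse than -2" is flagged.
DECLINE_THRESHOLD_7D = -2


def is_declining_7d(rank_change_7d: int) -> bool:
    """Return True when a model lost more than two positions over 7 days.

    Assumption: negative values mean lost positions (-3 = dropped three places).
    """
    return rank_change_7d < DECLINE_THRESHOLD_7D
```

Under these assumptions a change of -3 is flagged, while -2 sits exactly on the threshold and is not.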

Fragile State

Models classified as "fragile" by our scoring system. These models have inconsistent performance metrics or borderline scores that could shift significantly with small changes in evaluation data.

Sustained Decline

Models declining on both the 24-hour and 7-day timeframes. When a model is losing rank on both short and medium-term windows, it indicates a persistent downward trend rather than temporary fluctuation.
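One way to express this two-window check is shown below. It is a sketch under stated assumptions: the function name is hypothetical, rank changes are signed integers with negative values meaning positions lost, and any negative change counts as "declining" for this signal (the page does not state a threshold for it).

```python
def is_sustained_decline(rank_change_24h: int, rank_change_7d: int) -> bool:
    """Return True when the model is losing rank on both windows.

    Assumption: any negative change counts as declining on that window.
    """
    return rank_change_24h < 0 and rank_change_7d < 0
```

Requiring both windows filters out a model that dipped today but is still up on the week, and one that fell earlier in the week but has since stopped falling.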

Risk Score

The degradation risk score combines multiple signals: 7-day rank decline weighted 2x, 24-hour rank decline weighted 1x, plus 5 bonus points for fragile state. Higher scores indicate greater risk of meaningful performance degradation.
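The weighting described above might look like the following. This is an illustrative sketch rather than the tracker's implementation: the `ModelTrend` type, its field names, and the helper functions are hypothetical, and it assumes that a "rank decline" is the number of positions lost (so improvements contribute zero rather than reducing the score).

```python
from dataclasses import dataclass


@dataclass
class ModelTrend:
    """Hypothetical record of one model's recent ranking movement."""
    name: str
    rank_change_7d: int   # negative = positions lost over 7 days
    rank_change_24h: int  # negative = positions lost over 24 hours
    state: str            # e.g. "stable" or "fragile"


def positions_lost(rank_change: int) -> int:
    """Convert a signed rank change into a non-negative decline."""
    return max(0, -rank_change)


def risk_score(model: ModelTrend) -> int:
    """7-day decline weighted 2x, 24-hour decline 1x, plus 5 if fragile."""
    fragile_bonus = 5 if model.state == "fragile" else 0
    return (2 * positions_lost(model.rank_change_7d)
            + positions_lost(model.rank_change_24h)
            + fragile_bonus)


def rank_by_risk(models: list[ModelTrend]) -> list[ModelTrend]:
    """Keep models with a non-zero score, highest risk first."""
    at_risk = [m for m in models if risk_score(m) > 0]
    return sorted(at_risk, key=risk_score, reverse=True)
```

For example, a fragile model that lost three positions over the week and one in the last day would score 2 × 3 + 1 + 5 = 12 under these assumptions, while a stable model with no rank movement scores 0 and is left out of the at-risk list.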

Related

All Trackers: Coding, image, and video model trackers

Coding Tracker: Daily coding model performance and rankings

Leaderboard: Full model leaderboard with composite scores
