Historical Performance - Coding

90-day score trends for top coding AI models. Track composite score changes, model entry dates, and key events that impacted rankings.

Score Trends

Rank	Model	Provider	Score	30d	60d	90d
#1	GPT-5.4 Pro	OpenAI	94.0	+6.7	+14.7	+22.1
#2	GPT-5.4	OpenAI	94.0	+10.7	+20.0	+27.8
#3	GPT-5.4 Mini	OpenAI	93.3	+10.2	+19.6	+28.7
#4	GPT-5.2 Pro	OpenAI	92.7	+2.4	+5.3	+9.2
#5	GPT-5.2	OpenAI	92.7	+8.0	+17.0	+25.9
#6	Claude Opus 4.6	Anthropic	92.1	+3.8	+7.4	+11.7
#7	GPT-5 Pro	OpenAI	91.9	+6.3	+11.8	+15.6
#8	o3 Deep Research	OpenAI	91.5	+2.8	+5.6	+9.4
#9	Claude Opus 4.5	Anthropic	90.4	+3.3	+7.7	+12.3
#10	Gemini 3 Pro Preview	Google	90.3	+9.4	+19.8	+29.9

Model Timeline

Date Entered	Model	Provider	Entry Rank	Current Rank
2025-11-07	GPT-5.2 Pro	OpenAI	#4	#4
2025-10-31	GPT-5 Pro	OpenAI	#10	#7
2025-10-21	Gemini 3 Pro Preview	Google	#16	#10
2025-10-02	Claude Opus 4.6	Anthropic	#9	#6
2025-09-23	GPT-5.4 Pro	OpenAI	#1	#1
2025-08-11	Claude Opus 4.5	Anthropic	#12	#9
2025-06-29	GPT-5.4	OpenAI	#10	#2
2025-06-23	o3 Deep Research	OpenAI	#12	#8
2025-06-22	GPT-5.2	OpenAI	#13	#5
2025-06-05	GPT-5.4 Mini	OpenAI	#9	#3

Key Events

Date	Type	Event
2026-02-28	version	Claude Opus 4.6 released with expanded context
2026-02-15	pricing	OpenAI reduced GPT-5.2 pricing by 20%
2026-02-01	version	Gemini 3 Pro launched with multimodal improvements
2026-01-20	version	DeepSeek V3.1 update with enhanced reasoning
2026-01-10	pricing	Anthropic introduced new Claude Sonnet tier pricing
2025-12-15	version	Qwen 3.5 397B released by Alibaba Cloud
2025-12-01	pricing	Google adjusted Gemini API pricing structure
2025-11-15	version	Grok 4.1 launched with code generation focus

Frequently Asked Questions

We track composite scores for the top coding AI models over a 90-day rolling window. Scores combine coding benchmarks such as SWE-bench and HumanEval with pricing, context window, and capability data, and they refresh hourly.

The 30d, 60d, and 90d columns show how much each model's composite score has changed over the last 30, 60, and 90 days. A positive change indicates improving performance or rankings, while a negative change suggests the model is falling behind newer competitors.
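
As a rough illustration of how a windowed change such as the 30d, 60d, or 90d figure could be derived, the sketch below assumes the change is the current score minus the most recent score recorded on or before the start of the window. The tracker's actual calculation is not described here, so the function name `score_change` and the sample history values are hypothetical.

```python
from datetime import date, timedelta

def score_change(history, today, days):
    """Return the score change over the last `days` days, or None if unknown.

    `history` maps observation dates to composite scores (hypothetical data).
    Assumption: change = latest score minus the latest score on or before
    the start of the window.
    """
    cutoff = today - timedelta(days=days)
    current = max((d for d in history if d <= today), default=None)
    past = max((d for d in history if d <= cutoff), default=None)
    if current is None or past is None:
        return None  # not enough history to cover the window
    return round(history[current] - history[past], 1)

# Hypothetical score history for a single model.
history = {
    date(2026, 1, 1): 80.0,
    date(2026, 2, 1): 85.5,
    date(2026, 3, 1): 90.0,
    date(2026, 3, 31): 94.0,
}
today = date(2026, 3, 31)

changes = {days: score_change(history, today, days) for days in (30, 60, 90)}
print(changes)
```

Returning `None` when the history does not reach back far enough keeps a recently added model from showing a misleading change for a window it has not yet covered.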

Check the Score Trends table above to see which models show the largest positive 30-day change. New model releases and major updates often cause significant score improvements.
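
The same lookup can be done programmatically. The sketch below copies the model names and 30d values from the Score Trends table into a list and sorts it by 30-day change; the variable names and the helper `top_movers` are illustrative only and do not correspond to any published API.

```python
# (model, 30-day change) pairs copied from the Score Trends table.
rows = [
    ("GPT-5.4 Pro", 6.7),
    ("GPT-5.4", 10.7),
    ("GPT-5.4 Mini", 10.2),
    ("GPT-5.2 Pro", 2.4),
    ("GPT-5.2", 8.0),
    ("Claude Opus 4.6", 3.8),
    ("GPT-5 Pro", 6.3),
    ("o3 Deep Research", 2.8),
    ("Claude Opus 4.5", 3.3),
    ("Gemini 3 Pro Preview", 9.4),
]

def top_movers(rows, n=3):
    """Return the n rows with the largest 30-day change, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

for model, change in top_movers(rows):
    print(f"{model}: {change:+.1f}")
```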

Score data is refreshed hourly. The historical trend lines and score changes are recalculated with each update to reflect the latest benchmark results, pricing changes, and capability additions.
