90-day score trends for top coding AI models. Track composite score changes, model entry dates, and key events that impacted rankings.

| Rank | Model | Provider | Score | 30d | 60d | 90d |
|---|---|---|---|---|---|---|
| #1 | GPT-5.4 Pro | OpenAI | 94.0 | +6.7 | +14.7 | +22.1 |
| #2 | GPT-5.4 | OpenAI | 94.0 | +10.7 | +20.0 | +27.8 |
| #3 | GPT-5.4 Mini | OpenAI | 93.3 | +10.2 | +19.6 | +28.7 |
| #4 | GPT-5.2 Pro | OpenAI | 92.7 | +2.4 | +5.3 | +9.2 |
| #5 | GPT-5.2 | OpenAI | 92.7 | +8.0 | +17.0 | +25.9 |
| #6 | Claude Opus 4.6 | Anthropic | 92.1 | +3.8 | +7.4 | +11.7 |
| #7 | GPT-5 Pro | OpenAI | 91.9 | +6.3 | +11.8 | +15.6 |
| #8 | o3 Deep Research | OpenAI | 91.5 | +2.8 | +5.6 | +9.4 |
| #9 | Claude Opus 4.5 | Anthropic | 90.4 | +3.3 | +7.7 | +12.3 |
| #10 | Gemini 3 Pro Preview | Google | 90.3 | +9.4 | +19.8 | +29.9 |

| Date Entered | Model | Provider | Entry Rank | Current Rank |
|---|---|---|---|---|
| 2025-11-07 | GPT-5.2 Pro | OpenAI | #4 | #4 |
| 2025-10-31 | GPT-5 Pro | OpenAI | #10 | #7 |
| 2025-10-21 | Gemini 3 Pro Preview | Google | #16 | #10 |
| 2025-10-02 | Claude Opus 4.6 | Anthropic | #9 | #6 |
| 2025-09-23 | GPT-5.4 Pro | OpenAI | #1 | #1 |
| 2025-08-11 | Claude Opus 4.5 | Anthropic | #12 | #9 |
| 2025-06-29 | GPT-5.4 | OpenAI | #10 | #2 |
| 2025-06-23 | o3 Deep Research | OpenAI | #12 | #8 |
| 2025-06-22 | GPT-5.2 | OpenAI | #13 | #5 |
| 2025-06-05 | GPT-5.4 Mini | OpenAI | #9 | #3 |
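
One way to read the entry table is as rank movement: entry rank minus current rank, so a positive number means the model has climbed since it entered. The sketch below hard-codes the ranks from the table above. The `entries` list and the `movement` name are illustrative, not part of any published API, and the Gemini 3 Pro Preview ranks assume that row reads #16 at entry and #10 now.

```python
# (model, entry rank, current rank), copied from the entry table above.
entries = [
    ("GPT-5.2 Pro", 4, 4),
    ("GPT-5 Pro", 10, 7),
    ("Gemini 3 Pro Preview", 16, 10),  # assumed reading of the incomplete row
    ("Claude Opus 4.6", 9, 6),
    ("GPT-5.4 Pro", 1, 1),
    ("Claude Opus 4.5", 12, 9),
    ("GPT-5.4", 10, 2),
    ("o3 Deep Research", 12, 8),
    ("GPT-5.2", 13, 5),
    ("GPT-5.4 Mini", 9, 3),
]

# Positive movement = places gained since entering the leaderboard.
movement = {model: entry - current for model, entry, current in entries}

top = max(movement.values())
biggest = sorted(model for model, gained in movement.items() if gained == top)
print(top, biggest)  # 8 ['GPT-5.2', 'GPT-5.4']
```

On these figures, GPT-5.4 and GPT-5.2 have each gained eight places, while GPT-5.4 Pro and GPT-5.2 Pro sit at the rank they entered with.
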

We track composite scores for the top coding AI models over a rolling 90-day window. Each score combines coding benchmarks such as SWE-bench and HumanEval with pricing, context window, and capability data, and is refreshed hourly.

The 30d, 60d, and 90d columns show how much each model's composite score has changed over the last 30, 60, and 90 days. A positive change indicates improving performance or rankings, while a negative change suggests the model is falling behind newer competitors.
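
As a rough illustration of how such window changes could be derived, the sketch below subtracts the score from N days earlier from the latest score. It assumes the change is a plain difference in score points; the page also refers to "change percentages", so the real calculation may differ. The `score_change` helper and the history values are hypothetical.

```python
from datetime import date, timedelta


def score_change(history, as_of, days):
    """Return the score on `as_of` minus the score `days` earlier.

    Returns None when either date is missing from the history,
    for example when there is no data point that far back.
    """
    past = as_of - timedelta(days=days)
    if as_of not in history or past not in history:
        return None
    return round(history[as_of] - history[past], 1)


# Hypothetical score history keyed by date (not real leaderboard data).
as_of = date(2025, 12, 1)
history = {
    as_of: 90.0,
    as_of - timedelta(days=30): 87.5,
    as_of - timedelta(days=60): 84.0,
}

changes = {f"{d}d": score_change(history, as_of, d) for d in (30, 60, 90)}
print(changes)  # {'30d': 2.5, '60d': 6.0, '90d': None}
```
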

Check the Score Trends table above to see which models show the largest positive 30-day change. New model releases and major updates often cause significant score improvements.
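
The same lookup can be done programmatically. This sketch hard-codes the figures from the Score Trends table, assuming the first change value in each row is the 30-day change, and sorts them in descending order. The `change_30d` mapping is an illustrative name only.

```python
# 30-day changes copied from the Score Trends table
# (assumption: the first change value in each row is the 30d figure).
change_30d = {
    "GPT-5.4 Pro": 6.7,
    "GPT-5.4": 10.7,
    "GPT-5.4 Mini": 10.2,
    "GPT-5.2 Pro": 2.4,
    "GPT-5.2": 8.0,
    "Claude Opus 4.6": 3.8,
    "GPT-5 Pro": 6.3,
    "o3 Deep Research": 2.8,
    "Claude Opus 4.5": 3.3,
    "Gemini 3 Pro Preview": 9.4,
}

# Largest positive 30-day change first.
ranked = sorted(change_30d.items(), key=lambda item: item[1], reverse=True)

for model, delta in ranked[:3]:
    print(f"{model}: {delta:+.1f}")
# GPT-5.4: +10.7
# GPT-5.4 Mini: +10.2
# Gemini 3 Pro Preview: +9.4
```
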

Score data is refreshed hourly. The historical trend lines and change percentages are recalculated with each update to reflect the latest benchmark results, pricing changes, and capability additions.