| 75 | Hunyuan A13B Instruct | Tencent | 72.3 | -16 | -27 | high | fragile |
| 74 | o4 Mini | OpenAI | 83.7 | -23 | -23 | high | fragile |
| 67 | GPT-5 Codex | OpenAI | 85.0 | -8 | -27 | high | fragile |
| 56 | Nemotron Nano 12B 2 VL | NVIDIA | 72.6 | -21 | -15 | high | fragile |
| 55 | Codestral 2508 | Mistral AI | 64.8 | -16 | -17 | high | fragile |
| 53 | Nemotron Nano 12B 2 VL (free) | NVIDIA | 82.3 | -18 | -15 | high | fragile |
| 52 | Llama Guard 4 12B | Meta | 59.0 | -13 | -17 | high | fragile |
| 51 | Qwen3 Next 80B A3B Thinking | Alibaba | 72.7 | -6 | -20 | high | fragile |
| 50 | Gemini 2.5 Pro | Google | 84.8 | -5 | -20 | high | fragile |
| 48 | GPT-5.1-Codex-Max | OpenAI | 85.0 | -17 | -13 | high | fragile |
| 48 | Trinity Mini | arcee-ai | 82.4 | -13 | -15 | high | fragile |
| 47 | o1 | OpenAI | 75.7 | -12 | -15 | high | fragile |
| 47 | Ministral 3 14B 2512 | Mistral AI | 73.5 | +12 | -21 | high | fragile |
| 47 | Nova 2 Lite | Amazon | 72.7 | +1 | -21 | high | fragile |
| 45 | GPT-5.1-Codex-Mini | OpenAI | 85.0 | -4 | -18 | high | fragile |
| 43 | Grok 4 Fast | xAI | 83.3 | +5 | -19 | high | fragile |
| 43 | Claude Sonnet 4 | Anthropic | 79.9 | -16 | -11 | high | fragile |
| 43 | ERNIE 4.5 21B A3B Thinking | Baidu | 70.0 | +1 | -19 | high | fragile |
| 43 | Spotlight | arcee-ai | 62.3 | -8 | -15 | high | fragile |
| 42 | Llama 4 Scout | Meta | 72.0 | -13 | -12 | high | fragile |
| 41 | Qwen3 Coder Next | Alibaba | 76.7 | -16 | -10 | high | fragile |
| 41 | Nemotron Nano 9B V2 | NVIDIA | 71.6 | +14 | -18 | high | fragile |
| 40 | o4 Mini High | OpenAI | 85.0 | -5 | -15 | high | fragile |
| 40 | DeepSeek V3 0324 | DeepSeek | 73.2 | -21 | -7 | high | fragile |
| 40 | GPT-4o Audio | OpenAI | 72.1 | -9 | -13 | high | fragile |
| 40 | Kimi K2 0711 | Moonshot AI | 62.7 | -1 | -17 | high | fragile |
| 39 | Qwen3.5 397B A17B | Alibaba | 81.8 | -20 | -7 | high | fragile |
| 39 | Qwen3 32B | Alibaba | 71.4 | -2 | -16 | high | fragile |
| 39 | DeepSeek V3 | DeepSeek | 69.7 | +6 | -17 | high | fragile |
| 39 | Mistral Small Creative | Mistral AI | 59.0 | +7 | -17 | high | fragile |
| 38 | Aion-2.0 | aion-labs | 69.2 | -15 | -9 | high | fragile |
| 37 | Qwen3.5-27B | Alibaba | 79.1 | +8 | -16 | high | fragile |
| 37 | Qwen3 4B (free) | Alibaba | 63.0 | +8 | -16 | high | fragile |
| 37 | MiniMax-01 | MiniMax | 62.0 | -20 | -6 | high | fragile |
| 37 | LFM2-2.6B | Liquid AI | 53.2 | +2 | -16 | high | fragile |
| 36 | Qwen Plus 0728 (thinking) | Alibaba | 82.8 | -3 | -14 | high | fragile |
| 36 | Qwen3 Max | Alibaba | 76.0 | -7 | -12 | high | fragile |
| 35 | MiMo-V2-Flash | Xiaomi | 82.6 | +14 | -15 | high | fragile |
| 35 | Gemini 2.5 Flash | Google | 80.1 | +4 | -15 | high | fragile |
| 35 | Llama 3.3 70B Instruct | Meta | 65.7 | 0 | -15 | high | fragile |
| 35 | Gemma 3 4B | Google | 56.2 | -6 | -12 | high | fragile |
| 34 | Step 3.5 Flash (free) | StepFun | 78.2 | -9 | -10 | high | fragile |
| 33 | DeepSeek V3.2 | DeepSeek | 74.1 | +15 | -14 | high | fragile |
| 33 | Jamba Large 1.7 | AI21 Labs | 71.2 | -14 | -7 | high | fragile |
| 33 | Claude 3.5 Sonnet | Anthropic | 65.8 | -12 | -8 | high | fragile |
| 32 | Olmo 3.1 32B Instruct | Allen AI | 64.9 | -1 | -13 | high | fragile |
| 31 | DeepSeek V3.1 | DeepSeek | 73.8 | +16 | -13 | high | fragile |
| 30 | Claude Opus 4.1 | Anthropic | 82.0 | -3 | -11 | high | fragile |
| 30 | DeepSeek V3.2 Exp | DeepSeek | 77.2 | -11 | -7 | high | fragile |
| 30 | GPT-4o (extended) | OpenAI | 54.3 | -5 | -10 | high | fragile |
| 29 | Grok Code Fast 1 | xAI | 84.8 | -24 | +10 | high | fragile |
| 29 | Qwen3 VL 32B Instruct | Alibaba | 80.9 | +3 | -12 | high | fragile |
| 29 | gpt-oss-120b (free) | OpenAI | 73.8 | +26 | -12 | high | fragile |
| 29 | Mistral Large 3 2512 | Mistral AI | 73.5 | -10 | -7 | high | fragile |
| 29 | GPT Audio Mini | OpenAI | 68.4 | +8 | -12 | high | fragile |
| 29 | gpt-oss-120b | OpenAI | 67.7 | +12 | -12 | high | fragile |
| 28 | Grok 3 Mini | xAI | 76.2 | -18 | -5 | high | stable |
| 27 | Claude Opus 4 | Anthropic | 81.7 | +3 | -11 | high | fragile |
| 27 | Qwen3.5-Flash | Alibaba | 79.4 | +8 | -11 | high | fragile |
| 27 | Ministral 3 3B 2512 | Mistral AI | 72.6 | -22 | +21 | high | fragile |
| 27 | Olmo 3 32B Think | Allen AI | 66.3 | -6 | -8 | high | fragile |
| 27 | Devstral Small 1.1 | Mistral AI | 62.6 | +7 | -11 | high | fragile |
| 27 | Llama 3.1 Nemotron Ultra 253B v1 | NVIDIA | 57.5 | -10 | -6 | high | fragile |
| 25 | Qwen3 VL 235B A22B Thinking | Alibaba | 77.4 | +11 | -10 | high | fragile |
| 24 | Mistral Large 2407 | Mistral AI | 53.0 | -3 | -8 | high | fragile |
| 23 | Qwen3 30B A3B Thinking 2507 | Alibaba | 80.9 | -18 | +9 | high | fragile |
| 23 | DeepSeek V3.2 Speciale | DeepSeek | 77.1 | +20 | -9 | high | fragile |
| 23 | Qwen3 VL 235B A22B Instruct | Alibaba | 74.0 | -4 | -7 | high | fragile |
| 23 | LFM2.5-1.2B-Thinking (free) | Liquid AI | 59.0 | -4 | -7 | high | fragile |
| 23 | Mistral Large | Mistral AI | 54.1 | +4 | -9 | high | fragile |
| 21 | Qwen3 VL 8B Thinking | Alibaba | 85.0 | +19 | -8 | high | fragile |
| 21 | Nova Premier 1.0 | Amazon | 77.8 | +2 | -8 | high | fragile |
| 21 | gpt-oss-20b (free) | OpenAI | 73.8 | -16 | +21 | high | fragile |
| 21 | Grok 3 | xAI | 73.7 | -16 | +14 | high | fragile |
| 21 | GPT-4o-mini Search Preview | OpenAI | 72.9 | +21 | -8 | high | fragile |
| 21 | Qwen3 Coder 30B A3B Instruct | Alibaba | 72.3 | -21 | 0 | high | stable |
| 21 | Llama 3.3 Nemotron Super 49B V1.5 | NVIDIA | 68.6 | +10 | -8 | high | fragile |
| 21 | Gemma 3 27B (free) | Google | 62.8 | -4 | -6 | high | fragile |
| 21 | Qwen VL Plus | Alibaba | 60.9 | -16 | +8 | high | fragile |
| 21 | MiniMax M2-her | MiniMax | 59.4 | +10 | -8 | high | fragile |
| 20 | Trinity Mini (free) | arcee-ai | 72.6 | -15 | +8 | high | fragile |
| 20 | Qwen3 Next 80B A3B Instruct | Alibaba | 70.1 | -15 | +7 | high | fragile |
| 20 | Claude 3.7 Sonnet (thinking) | Anthropic | 69.8 | -15 | +8 | high | fragile |
| 20 | Command A | Cohere | 60.0 | -16 | -2 | high | stable |
| 19 | Qwen3 Coder Plus | Alibaba | 78.6 | +15 | -7 | high | fragile |
| 19 | Gemini 2.0 Flash | Google | 75.0 | -19 | 0 | high | stable |
| 19 | GPT-4o | OpenAI | 64.4 | -14 | +6 | high | fragile |
| 18 | Qwen Plus 0728 | Alibaba | 77.0 | -13 | +14 | high | fragile |
| 18 | Mistral Medium 3 | Mistral AI | 65.0 | -1 | -6 | high | fragile |
| 17 | o3 | OpenAI | 85.7 | +3 | -6 | high | fragile |
| 17 | GPT Audio | OpenAI | 68.4 | +1 | -6 | high | fragile |
| 17 | Gemma 2 27B | Google | 59.7 | +10 | -6 | high | fragile |
| 17 | LFM2-24B-A2B | Liquid AI | 53.2 | -7 | -5 | high | stable |
| 17 | Llama 3.1 8B Instruct | Meta | 42.4 | +1 | -6 | high | fragile |
| 16 | Sonar Pro Search | Perplexity | 85.0 | -8 | -4 | high | stable |
| 16 | Mistral Small 3.2 24B | Mistral AI | 67.3 | -11 | +10 | high | fragile |
| 16 | Mercury Coder | Inception | 67.0 | -12 | -2 | high | stable |
| 16 | Claude 3.5 Haiku | Anthropic | 62.5 | -10 | -3 | high | stable |
| 15 | Step 3.5 Flash | StepFun | 73.2 | -15 | +4 | high | stable |
| 15 | Qwen3 8B | Alibaba | 65.1 | -7 | -4 | high | stable |
| 14 | GPT-4.1 | OpenAI | 77.4 | -14 | +4 | high | stable |
| 14 | Gemma 3 12B (free) | Google | 55.2 | -6 | -4 | high | stable |
| 14 | Saba | Mistral AI | 52.9 | -6 | -4 | high | stable |
| 14 | Qwen2.5 7B Instruct | Alibaba | 42.8 | -4 | -5 | high | stable |
| 14 | GPT-4 Turbo Preview | OpenAI | 42.7 | -4 | -5 | high | stable |
| 13 | Kimi K2.5 | Moonshot AI | 85.0 | -8 | +10 | high | fragile |
| 13 | Claude Haiku 4.5 | Anthropic | 83.0 | -7 | -3 | high | stable |
| 13 | GPT-5 Mini | OpenAI | 79.2 | -13 | 0 | high | stable |
| 13 | Sonar Pro | Perplexity | 63.1 | -8 | +8 | high | fragile |
| 13 | Llama 3.2 11B Vision Instruct | Meta | 54.4 | -3 | -5 | high | stable |
| 13 | GPT-3.5 Turbo | OpenAI | 39.9 | -5 | -4 | high | stable |
| 12 | o3 Mini | OpenAI | 73.4 | -7 | +11 | high | fragile |
| 12 | LongCat Flash Chat | Meituan | 72.8 | -7 | +16 | high | fragile |
| 12 | Kimi K2 Thinking | Moonshot AI | 72.6 | -8 | -2 | high | stable |
| 12 | Qwen VL Max | Alibaba | 68.1 | -10 | -1 | high | stable |
| 12 | GPT-4o-mini | OpenAI | 64.6 | -7 | +10 | high | fragile |
| 12 | Gemma 3 27B | Google | 63.6 | -7 | +11 | high | fragile |
| 12 | Devstral Medium | Mistral AI | 62.6 | -12 | +4 | high | stable |
| 12 | Nova Lite 1.0 | Amazon | 58.2 | -12 | +1 | high | stable |
| 11 | o4 Mini Deep Research | OpenAI | 85.0 | -1 | -5 | high | stable |
| 11 | GPT-4.1 Nano | OpenAI | 80.7 | -7 | -2 | high | stable |
| 11 | o1-pro | OpenAI | 76.5 | -6 | +16 | high | fragile |
| 11 | Ministral 3 8B 2512 | Mistral AI | 73.5 | -3 | -4 | high | stable |
| 11 | Qwen3 235B A22B Instruct 2507 | Alibaba | 70.0 | -11 | 0 | high | stable |
| 11 | Kimi K2 0905 | Moonshot AI | 65.7 | -11 | +3 | high | stable |
| 11 | Qwen-Turbo | Alibaba | 60.7 | -5 | -3 | high | stable |
| 11 | Aion-1.0 | aion-labs | 56.6 | -6 | +15 | high | fragile |
| 10 | o3 Deep Research | OpenAI | 91.5 | -2 | -4 | high | stable |
| 10 | GPT-5.3 Chat | OpenAI | 85.0 | +7 | -5 | high | stable |
| 10 | Gemini 3.1 Pro Preview Custom Tools | Google | 85.0 | +1 | -5 | high | stable |
| 10 | GPT-5.1-Codex | OpenAI | 85.0 | -5 | +11 | high | fragile |
| 10 | Gemini 2.5 Flash Lite | Google | 81.4 | +12 | -5 | high | stable |
| 10 | Claude 3.7 Sonnet | Anthropic | 77.1 | +11 | -5 | high | stable |
| 10 | Qwen3 235B A22B Thinking 2507 | Alibaba | 69.3 | -10 | +3 | high | stable |
| 10 | Rnj 1 Instruct | essentialai | 64.8 | +14 | -5 | high | stable |
| 10 | Gemma 3 12B | Google | 56.2 | +8 | -5 | high | stable |
| 10 | Pixtral Large 2411 | Mistral AI | 55.7 | +5 | -5 | high | stable |
| 10 | GPT-4o (2024-08-06) | OpenAI | 55.6 | -5 | +12 | high | fragile |
| 10 | Nova Micro 1.0 | Amazon | 51.2 | +5 | -5 | high | stable |
| 10 | Llama 3 70B Instruct | Meta | 40.5 | 0 | -5 | high | stable |
| 9 | Qwen3.5 Plus 2026-02-15 | Alibaba | 85.0 | -4 | +9 | medium | fragile |
| 9 | Mercury 2 | Inception | 81.3 | -4 | +9 | medium | fragile |
| 9 | KAT-Coder-Pro V1 | Kuaishou | 77.4 | -4 | +16 | medium | fragile |
| 9 | Qwen3 Coder 480B A35B (free) | Alibaba | 69.0 | -4 | +9 | medium | fragile |
| 9 | Qwen3 Next 80B A3B Instruct (free) | Alibaba | 67.0 | -9 | +5 | medium | stable |
| 9 | GPT-4o (2024-11-20) | OpenAI | 63.3 | -4 | +8 | medium | fragile |
| 9 | Mistral Small 3 | Mistral AI | 59.5 | -4 | +6 | medium | fragile |
| 9 | Qwen2.5 VL 32B Instruct | Alibaba | 56.7 | -4 | +15 | medium | fragile |
| 9 | GPT-4o-mini (2024-07-18) | OpenAI | 53.7 | -9 | +1 | medium | stable |
| 9 | Qwen2.5-VL 7B Instruct | Alibaba | 37.6 | -3 | -3 | medium | stable |
| 8 | Gemini 3 Flash Preview | Google | 89.4 | -4 | -2 | medium | stable |
| 8 | Grok 4.20 Multi-Agent Beta | xAI | 82.2 | -3 | +17 | medium | fragile |
| 8 | Qwen3 VL 30B A3B Instruct | Alibaba | 80.9 | -8 | +1 | medium | stable |
| 8 | R1 0528 | DeepSeek | 77.7 | +3 | -4 | medium | stable |
| 8 | Composer 2 | Cursor | 76.4 | +15 | -4 | medium | stable |
| 8 | GPT-5 Chat | OpenAI | 75.0 | -3 | +11 | medium | fragile |
| 8 | Qwen3 30B A3B | Alibaba | 71.4 | +6 | -4 | medium | stable |
| 8 | Qwen3 235B A22B | Alibaba | 71.3 | -8 | 0 | medium | stable |
| 8 | Nemotron 3 Nano 30B A3B (free) | NVIDIA | 67.7 | -3 | +16 | medium | fragile |
| 8 | Grok 3 Beta | xAI | 63.5 | +16 | -4 | medium | stable |
| 8 | UI-TARS 7B | ByteDance | 62.7 | +3 | -4 | medium | stable |
| 8 | Aion-1.0-Mini | aion-labs | 56.6 | -3 | +6 | medium | fragile |
| 8 | Llama 3.3 70B Instruct (free) | Meta | 44.1 | -4 | -2 | medium | stable |
| 8 | GPT-4 Turbo (older v1106) | OpenAI | 42.7 | -3 | +6 | medium | fragile |
| 8 | Qwen2.5 Coder 32B Instruct | Alibaba | 42.4 | +1 | -4 | medium | stable |
| 8 | GPT-3.5 Turbo 16k | OpenAI | 39.9 | +6 | -4 | medium | stable |
| 8 | Pixtral 12B | Mistral AI | 38.3 | -4 | -2 | medium | stable |
| 7 | Qwen3 Coder 480B A35B | Alibaba | 64.3 | -3 | -2 | medium | stable |
| 7 | R1 Distill Qwen 32B | DeepSeek | 60.2 | -2 | +12 | medium | fragile |
| 7 | GPT-3.5 Turbo (older v0613) | OpenAI | 38.0 | -7 | +4 | medium | stable |
| 6 | Grok 4.1 Fast | xAI | 86.9 | -2 | -2 | medium | stable |
| 6 | Gemini 3.1 Pro Preview | Google | 85.5 | +28 | -3 | medium | stable |
| 6 | Gemma 3 4B (free) | Google | 61.0 | -6 | +1 | medium | stable |
| 6 | Qwen-Max | Alibaba | 58.8 | +10 | -3 | medium | stable |
| 6 | Mistral Large 2411 | Mistral AI | 49.9 | 0 | -3 | medium | stable |
| 6 | Coder Large | arcee-ai | 45.5 | -2 | -2 | medium | stable |
| 6 | Inflection 3 Pi | Inflection | 36.8 | +2 | -3 | medium | stable |
| 6 | Inflection 3 Productivity | Inflection | 36.8 | 0 | -3 | medium | stable |
| 5 | o3 Pro | OpenAI | 87.7 | 0 | +10 | medium | fragile |
| 5 | Grok 4 | xAI | 85.8 | 0 | +20 | medium | fragile |
| 5 | Grok 4.20 Beta | xAI | 85.7 | +2 | +12 | medium | fragile |
| 5 | GPT-5.1 | OpenAI | 85.2 | +3 | +29 | medium | fragile |
| 5 | GPT-5.4 Nano | OpenAI | 85.0 | -5 | +280 | medium | preliminary |
| 5 | Seed-2.0-Lite | ByteDance | 85.0 | +3 | +6 | medium | fragile |
| 5 | GPT-5.3-Codex | OpenAI | 85.0 | +26 | +9 | medium | fragile |
| 5 | Seed 1.6 Flash | ByteDance | 85.0 | +3 | +21 | medium | fragile |
| 5 | Seed 1.6 | ByteDance | 85.0 | +9 | +15 | medium | fragile |
| 5 | GPT-5.1 Chat | OpenAI | 85.0 | +18 | +10 | medium | fragile |
| 5 | Qwen3 VL 30B A3B Thinking | Alibaba | 85.0 | +2 | +11 | medium | fragile |
| 5 | Gemini 2.5 Pro Preview 06-05 | Google | 84.3 | +5 | +15 | medium | fragile |
| 5 | Gemini 2.5 Flash Lite Preview 09-2025 | Google | 83.7 | 0 | +11 | medium | fragile |
| 5 | MiniMax M2.5 (free) | MiniMax | 83.4 | 0 | +6 | medium | fragile |
| 5 | GPT-5.2 Chat | OpenAI | 82.9 | +7 | +17 | medium | fragile |
| 5 | Gemini 2.5 Pro Preview 05-06 | Google | 82.7 | +10 | +14 | medium | fragile |
| 5 | Gemini 3.1 Flash Lite Preview | Google | 81.9 | +9 | +11 | medium | fragile |
| 5 | Qwen3 VL 8B Instruct | Alibaba | 80.9 | +4 | +17 | medium | fragile |
| 5 | Qwen3.5-122B-A10B | Alibaba | 79.7 | +9 | +6 | medium | fragile |
| 5 | Qwen3.5-9B | Alibaba | 79.3 | +8 | +14 | medium | fragile |
| 5 | Qwen3.5-35B-A3B | Alibaba | 78.3 | +11 | +18 | medium | fragile |
| 5 | GPT-4.1 Mini | OpenAI | 77.4 | +11 | +15 | medium | fragile |
| 5 | Composer 2 Fast | Cursor | 76.4 | -5 | +3 | medium | stable |
| 5 | MiniMax M2.5 | MiniMax | 76.0 | +5 | +14 | medium | fragile |
| 5 | GPT-5 Nano | OpenAI | 75.6 | +4 | +17 | medium | fragile |
| 5 | Qwen3 30B A3B Instruct 2507 | Alibaba | 75.2 | +15 | +6 | medium | fragile |
| 5 | ERNIE 4.5 VL 28B A3B | Baidu | 75.0 | +25 | +14 | medium | fragile |
| 5 | DeepSeek V3.1 Terminus | DeepSeek | 73.7 | +4 | +10 | medium | fragile |
| 5 | Nemotron 3 Super | NVIDIA | 73.5 | +25 | +21 | medium | fragile |
| 5 | MiniMax M2.1 | MiniMax | 73.1 | +11 | +20 | medium | fragile |
| 5 | MiniMax M2 | MiniMax | 72.7 | +6 | +19 | medium | fragile |
| 5 | Trinity Large Preview (free) | arcee-ai | 72.6 | +13 | +21 | medium | fragile |
| 5 | Mistral Medium 3.1 | Mistral AI | 70.3 | 0 | +13 | medium | fragile |
| 5 | MiniMax M1 | MiniMax | 68.4 | +2 | +11 | medium | fragile |
| 5 | Cogito v2.1 671B | deepcogito | 66.7 | +15 | +6 | medium | fragile |
| 5 | Grok 3 Mini Beta | xAI | 66.1 | +2 | +9 | medium | fragile |
| 5 | o3 Mini High | OpenAI | 65.4 | +6 | +21 | medium | fragile |
| 5 | ERNIE 4.5 21B A3B | Baidu | 65.2 | +4 | +15 | medium | fragile |
| 5 | Mistral Small 3.1 24B (free) | Mistral AI | 62.2 | +11 | +11 | medium | fragile |
| 5 | Sonar Reasoning Pro | Perplexity | 61.6 | +4 | +8 | medium | fragile |
| 5 | R1 Distill Llama 70B | DeepSeek | 61.0 | +10 | +8 | medium | fragile |
| 5 | Qwen2.5 VL 72B Instruct | Alibaba | 60.3 | +9 | +10 | medium | fragile |
| 5 | Phi 4 | Microsoft | 59.6 | +3 | +6 | medium | fragile |
| 5 | Sonar | Perplexity | 53.7 | +8 | +7 | medium | fragile |
| 5 | LFM2-8B-A1B | Liquid AI | 53.2 | +7 | +9 | medium | fragile |
| 5 | Llama 3.1 Nemotron 70B Instruct | NVIDIA | 53.2 | -5 | +5 | medium | stable |
| 5 | GPT-4o (2024-05-13) | OpenAI | 52.7 | -3 | -1 | medium | stable |
| 5 | Claude 3 Haiku | Anthropic | 43.0 | +5 | +9 | medium | fragile |
| 5 | Llama Guard 3 8B | Meta | 42.9 | +2 | +6 | medium | fragile |
| 5 | GPT-4 | OpenAI | 39.0 | +1 | +6 | medium | fragile |
| 4 | Mistral Small 4 | Mistral AI | 79.4 | -4 | +225 | low | preliminary |
| 4 | Solar Pro 3 | Upstage | 72.5 | +14 | -2 | low | stable |
| 4 | gpt-oss-20b | OpenAI | 68.5 | +12 | -2 | low | stable |
| 4 | GPT-4 Turbo | OpenAI | 60.5 | +8 | -2 | low | stable |
| 4 | Gemma 3n 4B | Google | 46.3 | +2 | -2 | low | stable |
| 4 | Llama 3.1 405B (base) | Meta | 38.7 | +2 | -2 | low | stable |
| 4 | Llama 3.2 3B Instruct | Meta | 35.9 | +2 | -2 | low | stable |
| 4 | Llama 3.2 3B Instruct (free) | Meta | 35.2 | -4 | +1 | low | stable |
| 4 | Llama 3.2 1B Instruct | Meta | 31.9 | +1 | -2 | low | stable |
| 3 | GPT-5.2 | OpenAI | 92.7 | -3 | +4 | low | stable |
| 3 | Llama 3.1 70B Instruct | Meta | 59.9 | -3 | +3 | low | stable |
| 3 | Mistral Nemo | Mistral AI | 50.7 | -3 | 0 | low | stable |
| 3 | Command R+ (08-2024) | Cohere | 47.8 | -1 | -1 | low | stable |
| 2 | GPT-5.2 Pro | OpenAI | 92.7 | +1 | -1 | low | stable |
| 2 | Claude Opus 4.6 | Anthropic | 92.1 | +1 | -1 | low | stable |
| 2 | Claude Opus 4.5 | Anthropic | 90.4 | 0 | -1 | low | stable |
| 2 | Claude Sonnet 4.6 | Anthropic | 89.2 | -2 | +2 | low | stable |
| 2 | Claude Sonnet 4.5 | Anthropic | 89.0 | -2 | +5 | low | stable |
| 2 | Nemotron 3 Super (free) | NVIDIA | 84.1 | +2 | -1 | low | stable |
| 2 | Qwen3 Max Thinking | Alibaba | 81.8 | +13 | -1 | low | stable |
| 2 | Mistral Small 3.1 24B | Mistral AI | 66.2 | +1 | -1 | low | stable |
| 2 | Gemma 3n 2B (free) | Google | 58.2 | -2 | +4 | low | stable |
| 2 | Qwen2.5 72B Instruct | Alibaba | 52.4 | 0 | -1 | low | stable |
| 2 | SWE-1.5 | Windsurf | 49.2 | -2 | +5 | low | stable |
| 2 | Llemma 7b | eleutherai | 47.5 | 0 | -1 | low | stable |
| 2 | Mellum | JetBrains | 32.6 | +2 | -1 | low | stable |
| 2 | WizardLM-2 8x22B | Microsoft | 32.2 | -2 | +2 | low | stable |
| 1 | GPT-5.4 | OpenAI | 94.0 | -1 | +4 | low | stable |
| 1 | autofixer-01 | Vercel | 38.8 | -1 | +2 | low | stable |
| 1 | Mixtral 8x22B Instruct | Mistral AI | 37.1 | -1 | +3 | low | stable |
| 1 | GPT-3.5 Turbo Instruct | OpenAI | 32.2 | -1 | 0 | low | stable |