Windsurf (formerly Codeium) is an AI-native IDE with deep codebase understanding. Models need strong code completion, multi-file awareness, and fast inference.
Models are scored by coding benchmarks (50%), capability match (25%), price (15%), and context window size (10%).
| # | Model | Benchmark | Score | Output $/M tokens |
|---|---|---|---|---|
| 1 | Grok 4.20 Beta | Arena Elo: 1496 | 87 | $6.00 |
| 2 | Gemini 3.1 Pro Preview | Arena Elo: 1492 | 87 | $12.00 |
| 3 | GPT-5.4 Pro | - | 85 | $180.00 |
| 4 | Grok 4.1 Fast | Arena Elo: 1473 | 85 | $0.50 |
| 5 | GPT-5.4 Mini | - | 84 | $4.50 |
| 6 | Gemini 3 Flash Preview | HumanEval: 92% | 84 | $3.00 |
| 7 | GPT-5.2 Pro | - | 84 | $168.00 |
| 8 | GPT-5.1 | Arena Elo: 1456 | 84 | $10.00 |
| 9 | Qwen3.5 397B A17B | Arena Elo: 1450 | 83 | $2.34 |
| 10 | o3 Deep Research | - | 83 | $40.00 |
| 11 | GPT-5 Pro | - | 83 | $120.00 |
| 12 | Claude Opus 4.1 | Arena Elo: 1449 | 83 | $75.00 |
| 13 | Gemini 3.1 Flash Lite Preview | Arena Elo: 1437 | 82 | $1.50 |
| 14 | GPT-5.2 Chat | Arena Elo: 1481 | 82 | $14.00 |
| 15 | Claude Haiku 4.5 | HumanEval: 89.8% | 82 | $5.00 |
| 16 | Llama 4 Maverick | HumanEval: 89.5% | 82 | $0.60 |
| 17 | Gemini 2.0 Flash | HumanEval: 89.4% | 82 | $0.40 |
| 18 | Qwen3.5-122B-A10B | Arena Elo: 1419 | 81 | $2.08 |
| 19 | Qwen3 VL 235B A22B Instruct | Arena Elo: 1416 | 81 | $0.88 |
| 20 | Grok 4 Fast | Arena Elo: 1422 | 81 | $0.50 |
| 21 | o3 Pro | - | 81 | $80.00 |
| 22 | MiMo-V2-Omni | - | 80 | $2.00 |
| 23 | MiMo-V2-Pro | - | 80 | $3.00 |
| 24 | GPT-5.4 Nano | - | 80 | $1.25 |
| 25 | Nemotron 3 Super (free) | - | 80 | Free |
| 26 | Seed-2.0-Lite | - | 80 | $2.00 |
| 27 | Seed-2.0-Mini | - | 80 | $0.40 |
| 28 | Qwen3.5-27B | Arena Elo: 1410 | 80 | $1.56 |
| 29 | Gemini 3.1 Pro Preview Custom Tools | - | 80 | $12.00 |
| 30 | GPT-5.3-Codex | - | 80 | $14.00 |
| 31 | Qwen3.5 Plus 2026-02-15 | - | 80 | $1.56 |
| 32 | Kimi K2.5 | - | 80 | $2.20 |
| 33 | GPT-5.2-Codex | - | 80 | $14.00 |
| 34 | Seed 1.6 Flash | - | 80 | $0.30 |
| 35 | Seed 1.6 | - | 80 | $2.00 |
| 36 | GPT-5.1-Codex-Max | - | 80 | $10.00 |
| 37 | GPT-5.1-Codex | - | 80 | $10.00 |
| 38 | GPT-5.1-Codex-Mini | - | 80 | $2.00 |
| 39 | o4 Mini Deep Research | - | 80 | $8.00 |
| 40 | GPT-5 Codex | - | 80 | $10.00 |
| 41 | Grok Code Fast 1 | - | 80 | $1.50 |
| 42 | Gemini 2.5 Pro Preview 06-05 | - | 80 | $10.00 |
| 43 | o4 Mini High | - | 80 | $4.40 |
| 44 | Mistral Large | HumanEval: 92% | 80 | $6.00 |
| 45 | MiniMax M2.7 | - | 79 | $1.20 |
| 46 | Qwen3.5-35B-A3B | Arena Elo: 1398 | 79 | $1.30 |
| 47 | Qwen3.5-Flash | Arena Elo: 1400 | 79 | $0.26 |
| 48 | MiniMax M2.5 (free) | - | 79 | Free |
| 49 | MiniMax M2.5 | Arena Elo: 1404 | 79 | $1.17 |
| 50 | Claude Opus 4.6 | SWE-bench: 83.7% | 79 | $25.00 |
Based on our analysis of coding benchmarks, capability matching, and pricing, Grok 4.20 Beta currently ranks #1 for Windsurf. Rankings are updated hourly using real benchmark data.
We score models using a weighted formula: coding benchmarks like SWE-bench and HumanEval (50%), capability match for Windsurf's requirements (25%), pricing affordability (15%), and context window size (10%). Only models with the capabilities Windsurf needs are included.
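The weighted formula can be illustrated with a minimal Python sketch. Only the four weights come from the description above. The `SubScores` type, the assumption that each component is already normalized to a 0-100 scale (with a higher price sub-score meaning a cheaper model), and the sample numbers are all hypothetical. How the site actually derives each component is not stated.

```python
from dataclasses import dataclass

# Weights as stated in the scoring description.
WEIGHTS = {"coding": 0.50, "capability": 0.25, "price": 0.15, "context": 0.10}


@dataclass
class SubScores:
    """Hypothetical per-model inputs, each assumed to be on a 0-100 scale."""
    name: str
    coding: float      # coding benchmarks (e.g. SWE-bench, HumanEval)
    capability: float  # capability match for Windsurf's requirements
    price: float       # affordability: higher means cheaper (assumption)
    context: float     # context window size


def composite_score(m: SubScores) -> float:
    """Weighted sum of the four sub-scores."""
    return (
        WEIGHTS["coding"] * m.coding
        + WEIGHTS["capability"] * m.capability
        + WEIGHTS["price"] * m.price
        + WEIGHTS["context"] * m.context
    )


def rank(models: list[SubScores]) -> list[SubScores]:
    """Return models sorted from highest to lowest composite score."""
    return sorted(models, key=composite_score, reverse=True)


# Made-up sub-scores for two placeholder models, for illustration only.
example = [
    SubScores("model-a", coding=90, capability=80, price=70, context=60),
    SubScores("model-b", coding=80, capability=90, price=95, context=90),
]
for m in rank(example):
    print(m.name, round(composite_score(m), 1))
```

Because coding benchmarks carry half the weight, a model with a large pricing advantage can still rank below a more expensive one that benchmarks better, which is consistent with high-priced models appearing near the top of the table.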
We currently track 306 AI models compatible with Windsurf. This includes models from OpenAI, Anthropic, Google, DeepSeek, and other providers accessible via API.
Many open-source models are compatible with Windsurf through API providers like OpenRouter, Together AI, and Groq. Check our rankings to see which open-source models perform best.
Rankings refresh hourly. We monitor benchmark scores, pricing changes, and new model releases to keep recommendations current.