Warp is an AI-powered terminal with built-in command suggestions and natural-language-to-shell translation. Fast, cheap models with strong instruction following work best.
Scored by: coding benchmarks (50%), capability match (25%), price (15%), context (10%).
| # | Model | Benchmark | Score | Output $/M |
|---|---|---|---|---|
| 1 | Grok 4.1 Fast | Arena Elo: 1473 | 90 | $0.500 |
| 2 | Nemotron 3 Super (free) | | 87 | Free |
| 3 | MiniMax M2.5 (free) | | 87 | Free |
| 4 | Llama 4 Maverick | HumanEval: 89.5% | 87 | $0.600 |
| 5 | Gemini 2.0 Flash | HumanEval: 89.4% | 87 | $0.400 |
| 6 | GPT-5.4 Mini | | 86 | $4.50 |
| 7 | Grok 4.20 Beta | Arena Elo: 1496 | 86 | $6.00 |
| 8 | DeepSeek V3.2 | Arena Elo: 1423 | 86 | $0.380 |
| 9 | Nemotron Nano 12B 2 VL (free) | | 86 | Free |
| 10 | DeepSeek V3.2 Exp | Arena Elo: 1424 | 86 | $0.410 |
| 11 | Grok 4 Fast | Arena Elo: 1422 | 86 | $0.500 |
| 12 | Llama 3.3 70B Instruct | HumanEval: 88.4% | 86 | $0.320 |
| 13 | GPT-4o-mini | HumanEval: 87.2% | 86 | $0.600 |
| 14 | Seed-2.0-Mini | | 85 | $0.400 |
| 15 | Gemini 3.1 Pro Preview | Arena Elo: 1492 | 85 | $12.00 |
| 16 | Qwen3.5 397B A17B | Arena Elo: 1450 | 85 | $2.34 |
| 17 | Seed 1.6 Flash | | 85 | $0.300 |
| 18 | Gemini 3 Flash Preview | HumanEval: 92% | 85 | $3.00 |
| 19 | Qwen3 VL 235B A22B Instruct | Arena Elo: 1416 | 85 | $0.880 |
| 20 | DeepSeek V3.1 Terminus | Arena Elo: 1417 | 85 | $0.790 |
| 21 | DeepSeek V3.1 | Arena Elo: 1419 | 85 | $0.750 |
| 22 | Gemini 3.1 Flash Lite Preview | Arena Elo: 1437 | 84 | $1.50 |
| 23 | Qwen3.5-Flash | Arena Elo: 1400 | 84 | $0.260 |
| 24 | Step 3.5 Flash (free) | | 84 | Free |
| 25 | GPT-5.2 Chat | Arena Elo: 1481 | 84 | $14.00 |
| 26 | Gemini 2.5 Flash Lite Preview 09-2025 | | 84 | $0.400 |
| 27 | LongCat Flash Chat | Arena Elo: 1401 | 84 | $0.800 |
| 28 | DeepSeek V3 0324 | HumanEval: 84.5% | 84 | $0.770 |
| 29 | Mercury 2 | | 83 | $0.750 |
| 30 | Step 3.5 Flash | Arena Elo: 1389 | 83 | $0.300 |
| 31 | MiMo-V2-Flash | | 83 | $0.290 |
| 32 | Trinity Mini | | 83 | $0.150 |
| 33 | gpt-oss-safeguard-20b | | 83 | $0.300 |
| 34 | Tongyi DeepResearch 30B A3B | | 83 | $0.450 |
| 35 | Qwen Plus 0728 (thinking) | | 83 | $0.780 |
| 36 | Gemini 2.5 Flash Lite | | 83 | $0.400 |
| 37 | DeepSeek V3 | HumanEval: 82.6% | 83 | $0.890 |
| 38 | Claude 3.5 Haiku | HumanEval: 88.1% | 83 | $4.00 |
| 39 | MiMo-V2-Omni | | 82 | $2.00 |
| 40 | MiMo-V2-Pro | | 82 | $3.00 |
| 41 | GPT-5.4 Nano | | 82 | $1.25 |
| 42 | Mistral Small 4 | | 82 | $0.600 |
| 43 | Seed-2.0-Lite | | 82 | $2.00 |
| 44 | Qwen3.5-9B | | 82 | $0.150 |
| 45 | Qwen3.5-27B | Arena Elo: 1410 | 82 | $1.56 |
| 46 | Qwen3.5-122B-A10B | Arena Elo: 1419 | 82 | $2.08 |
| 47 | Qwen3.5 Plus 2026-02-15 | | 82 | $1.56 |
| 48 | Kimi K2.5 | | 82 | $2.20 |
| 49 | Seed 1.6 | | 82 | $2.00 |
| 50 | GPT-5.1 | Arena Elo: 1456 | 82 | $10.00 |
Based on our analysis of coding benchmarks, capability matching, and pricing, Grok 4.1 Fast currently ranks #1 for Warp. Rankings are updated hourly using real benchmark data.
We score models using a weighted formula: coding benchmarks such as SWE-bench and HumanEval (50%), capability match for Warp's requirements (25%), price (15%), and context window size (10%). Only models with the capabilities Warp needs are included.
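The weighted formula can be sketched as a simple weighted sum. The component scores below are illustrative assumptions (each normalized to 0–100), not real data from our pipeline:

```python
# Sketch of the weighted scoring formula described above.
# Weights come from the article; component values are made up.

WEIGHTS = {
    "coding": 0.50,      # coding benchmarks (SWE-bench, HumanEval)
    "capability": 0.25,  # capability match for Warp's requirements
    "price": 0.15,       # affordability (higher = cheaper)
    "context": 0.10,     # context window size
}

def overall_score(components: dict) -> int:
    """Weighted sum of 0-100 component scores, rounded to an integer."""
    return round(sum(WEIGHTS[k] * components[k] for k in WEIGHTS))

# Hypothetical component scores for a fast, cheap model:
example = {"coding": 90, "capability": 92, "price": 95, "context": 75}
print(overall_score(example))  # 0.5*90 + 0.25*92 + 0.15*95 + 0.1*75 = 89.75 → 90
```

Because the coding weight dominates, a model with strong benchmark results can rank highly even at a mid-range price.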
We currently track 303 AI models compatible with Warp. This includes models from OpenAI, Anthropic, Google, DeepSeek, and other providers accessible via API.
Many open-source models are compatible with Warp through API providers like OpenRouter, Together AI, and Groq. Check our rankings to see which open-source models perform best.
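Providers like OpenRouter expose these models through an OpenAI-compatible chat completions API. A minimal sketch of building such a request, assuming the model slug and API key shown are placeholders you would replace with your own:

```python
# Hedged sketch: constructing a request for OpenRouter's OpenAI-compatible
# chat completions endpoint. Model slug and key below are placeholders;
# check the provider's catalog for current model IDs.
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str):
    """Return (headers, payload) for a chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

headers, payload = build_request(
    "deepseek/deepseek-chat",  # example open-weight model slug (assumption)
    "Translate to a shell command: list files modified today",
    "sk-or-placeholder",       # placeholder API key
)
print(json.dumps(payload, indent=2))
```

The same payload shape works across OpenAI-compatible providers, so switching models is usually just a matter of changing the `model` string and base URL.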
Rankings refresh hourly. We monitor benchmark scores, pricing changes, and new model releases to keep recommendations current.