GitHub Copilot provides AI-powered code suggestions directly in your editor, using models for inline completions, chat, and code review. Fast, streaming-capable models work best for this workload.
Scoring weights: coding benchmarks (50%), capability match (25%), price (15%), context window (10%).
| # | Model | Benchmark | Score | Output $/M |
|---|---|---|---|---|
| 1 | Grok 4.1 Fast | Arena Elo: 1473 | 90 | $0.500 |
| 2 | Nemotron 3 Super (free) | — | 87 | Free |
| 3 | MiniMax M2.5 (free) | — | 87 | Free |
| 4 | Llama 4 Maverick | HumanEval: 89.5% | 87 | $0.600 |
| 5 | Gemini 2.0 Flash | HumanEval: 89.4% | 87 | $0.400 |
| 6 | GPT-5.4 Mini | — | 86 | $4.50 |
| 7 | Grok 4.20 Beta | Arena Elo: 1496 | 86 | $6.00 |
| 8 | DeepSeek V3.2 | Arena Elo: 1423 | 86 | $0.380 |
| 9 | Nemotron Nano 12B 2 VL (free) | — | 86 | Free |
| 10 | DeepSeek V3.2 Exp | Arena Elo: 1424 | 86 | $0.410 |
| 11 | Grok 4 Fast | Arena Elo: 1422 | 86 | $0.500 |
| 12 | Llama 3.3 70B Instruct | HumanEval: 88.4% | 86 | $0.320 |
| 13 | GPT-4o-mini | HumanEval: 87.2% | 86 | $0.600 |
| 14 | Seed-2.0-Mini | — | 85 | $0.400 |
| 15 | Gemini 3.1 Pro Preview | Arena Elo: 1492 | 85 | $12.00 |
| 16 | Qwen3.5 397B A17B | Arena Elo: 1450 | 85 | $2.34 |
| 17 | Seed 1.6 Flash | — | 85 | $0.300 |
| 18 | Gemini 3 Flash Preview | HumanEval: 92% | 85 | $3.00 |
| 19 | Qwen3 VL 235B A22B Instruct | Arena Elo: 1416 | 85 | $0.880 |
| 20 | DeepSeek V3.1 Terminus | Arena Elo: 1417 | 85 | $0.780 |
| 21 | DeepSeek V3.1 | Arena Elo: 1419 | 85 | $0.750 |
| 22 | Gemini 3.1 Flash Lite Preview | Arena Elo: 1437 | 84 | $1.50 |
| 23 | Qwen3.5-Flash | Arena Elo: 1400 | 84 | $0.260 |
| 24 | Step 3.5 Flash (free) | — | 84 | Free |
| 25 | GPT-5.2 Chat | Arena Elo: 1481 | 84 | $14.00 |
| 26 | Gemini 2.5 Flash Lite Preview 09-2025 | — | 84 | $0.400 |
| 27 | LongCat Flash Chat | Arena Elo: 1401 | 84 | $0.800 |
| 28 | DeepSeek V3 0324 | HumanEval: 84.5% | 84 | $0.770 |
| 29 | Mercury 2 | — | 83 | $0.750 |
| 30 | Step 3.5 Flash | Arena Elo: 1389 | 83 | $0.300 |
| 31 | MiMo-V2-Flash | — | 83 | $0.290 |
| 32 | Trinity Mini | — | 83 | $0.150 |
| 33 | gpt-oss-safeguard-20b | — | 83 | $0.300 |
| 34 | Tongyi DeepResearch 30B A3B | — | 83 | $0.450 |
| 35 | Qwen Plus 0728 (thinking) | — | 83 | $0.780 |
| 36 | Qwen3 30B A3B Thinking 2507 | — | 83 | $0.400 |
| 37 | Gemini 2.5 Flash Lite | — | 83 | $0.400 |
| 38 | DeepSeek V3 | HumanEval: 82.6% | 83 | $0.890 |
| 39 | Claude 3.5 Haiku | HumanEval: 88.1% | 83 | $4.00 |
| 40 | MiMo-V2-Omni | — | 82 | $2.00 |
| 41 | MiMo-V2-Pro | — | 82 | $3.00 |
| 42 | GPT-5.4 Nano | — | 82 | $1.25 |
| 43 | Mistral Small 4 | — | 82 | $0.600 |
| 44 | Seed-2.0-Lite | — | 82 | $2.00 |
| 45 | Qwen3.5-9B | — | 82 | $0.150 |
| 46 | Qwen3.5-27B | Arena Elo: 1410 | 82 | $1.56 |
| 47 | Qwen3.5-122B-A10B | Arena Elo: 1419 | 82 | $2.08 |
| 48 | Qwen3.5 Plus 2026-02-15 | — | 82 | $1.56 |
| 49 | Kimi K2.5 | — | 82 | $2.20 |
| 50 | Seed 1.6 | — | 82 | $2.00 |
Based on our analysis of coding benchmarks, capability matching, and pricing, Grok 4.1 Fast currently ranks #1 for GitHub Copilot. Rankings are updated hourly using real benchmark data.
We score models using a weighted formula: coding benchmarks such as SWE-bench and HumanEval (50%), capability match for GitHub Copilot's requirements (25%), price (15%), and context window size (10%). Only models with the capabilities GitHub Copilot needs are included.
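The weighted formula can be sketched in a few lines. Note this is an illustration only: the function name, the 0–100 normalization of each sub-score, and the example inputs are assumptions, not the site's actual implementation.

```python
# Hypothetical sketch of the weighted scoring formula described above.
# Weights mirror the stated percentages; the assumption that each
# sub-score is pre-normalized to a 0-100 scale is for illustration.

WEIGHTS = {
    "coding": 0.50,      # coding benchmarks (SWE-bench, HumanEval)
    "capability": 0.25,  # match to GitHub Copilot's requirements
    "price": 0.15,       # pricing affordability
    "context": 0.10,     # context window size
}

def copilot_score(subscores: dict) -> float:
    """Combine 0-100 sub-scores into a single weighted score."""
    return round(sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS), 1)

example = copilot_score(
    {"coding": 95, "capability": 90, "price": 80, "context": 85}
)
print(example)  # → 90.5
```

Because coding benchmarks carry half the weight, a model with strong SWE-bench/HumanEval results can outrank cheaper models with weaker coding performance.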
We currently track 306 AI models compatible with GitHub Copilot. This includes models from OpenAI, Anthropic, Google, DeepSeek, and other providers accessible via API.
Many open-source models are compatible with GitHub Copilot through API providers like OpenRouter, Together AI, and Groq. Check our rankings to see which open-source models perform best.
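As an illustration of reaching an open-source model through one of these providers, the sketch below builds a request for OpenRouter's OpenAI-compatible chat completions endpoint. The model id, environment variable name, and prompt are placeholders, not a recommendation of any particular row in the table.

```python
import os

# Hypothetical request for OpenRouter's OpenAI-compatible chat endpoint.
# Model id and OPENROUTER_API_KEY env-var name are illustrative placeholders.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_completion_request(model: str, prompt: str):
    """Return (headers, payload) for an OpenAI-style chat completion call."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # streaming matters for editor-style completions
    }
    return headers, payload

headers, payload = build_completion_request(
    "deepseek/deepseek-chat", "Complete this function: def fib(n):"
)
print(payload["model"])
```

Setting `stream: true` requests token-by-token output, which is what makes a model feel responsive for inline completions.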
Rankings refresh hourly. We monitor benchmark scores, pricing changes, and new model releases to keep recommendations current.