Open-source coding models you can run on your own hardware. No API keys, no per-token costs, no data leaving your machine. Ranked by our composite scoring system across 177 locally-runnable coding models, updated hourly.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview Custom Tools | Google | 89 |
| 2 | Qwen3.5-Flash | Alibaba | 89 |
| 3 | Nemotron 3 Super (free) | NVIDIA | 88 |
| 4 | Seed-2.0-Lite | ByteDance | 88 |
| 5 | Qwen3.5-35B-A3B | Alibaba | 87 |
| 6 | Qwen3.5-27B | Alibaba | 87 |
| 7 | Qwen3.5-122B-A10B | Alibaba | 87 |
| 8 | Qwen3.5 397B A17B | Alibaba | 87 |
| 9 | Kimi K2.5 | Moonshot AI | 87 |
| 10 | Qwen3 VL 8B Thinking | Alibaba | 85 |
| 11 | Qwen3 VL 30B A3B Thinking | Alibaba | 85 |
| 12 | Qwen3 VL 235B A22B Thinking | Alibaba | 85 |
| 13 | MiniMax M2.5 (free) | MiniMax | 80 |
| 14 | MiniMax M2.5 | MiniMax | 80 |
| 15 | MiniMax M2 | MiniMax | 80 |
| 16 | MiMo-V2-Flash | Xiaomi | 79 |
| 17 | Trinity Mini | arcee-ai | 79 |
| 18 | Nemotron Nano 12B 2 VL (free) | NVIDIA | 79 |
| 19 | Tongyi DeepResearch 30B A3B | Alibaba | 79 |
| 20 | Qwen3 235B A22B Thinking 2507 | Alibaba | 79 |
Local LLMs keep your code on your machine — no data sent to external servers. You get zero per-token cost after setup, full offline capability, and no rate limits. Ideal for proprietary codebases, air-gapped environments, or developers who want complete control over their AI toolchain.
A 7B model at Q4 quantization runs comfortably with 6 GB VRAM (RTX 3060). For 13B-34B models, aim for 12-24 GB VRAM (RTX 4090, A5000). Larger 70B+ models need 48+ GB across multiple GPUs or specialized hardware. CPU-only inference works but is 5-10x slower.
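The sizing guidance above can be sketched as a back-of-the-envelope calculation. This is a rough rule of thumb, not an exact formula: quantized weights take roughly `params × bits-per-weight / 8` bytes, plus overhead for the KV cache, activations, and runtime buffers (the ~20% overhead figure here is an assumption for illustration).

```python
# Rough VRAM estimate for running a quantized model locally.
# Assumption: weights need params * bits_per_weight / 8 bytes, plus
# ~20% extra for KV cache, activations, and runtime buffers.

def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = 0.20) -> float:
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    total_bytes = weight_bytes * (1 + overhead)
    return total_bytes / 1e9  # decimal gigabytes

# A 7B model at ~4.5 effective bits/weight (typical of GGUF Q4 variants)
print(f"7B  @ Q4: ~{estimate_vram_gb(7, 4.5):.1f} GB")
# A 34B model at the same quantization
print(f"34B @ Q4: ~{estimate_vram_gb(34, 4.5):.1f} GB")
```

The 7B estimate lands around 4-5 GB, consistent with the 6 GB recommendation once you leave headroom for a longer context window.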
Quantization shrinks models to fit consumer hardware. GGUF (llama.cpp's format) is the most popular for local use, offering Q4, Q5, and Q8 variants. GPTQ and AWQ are GPU-optimized alternatives. Lower-bit quantization (Q4) trades a small amount of quality for a much smaller memory footprint — often the sweet spot for coding tasks.
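The memory-footprint trade-off is easy to see by comparing approximate file sizes across quant levels. The effective bits-per-weight figures below are ballpark values for GGUF K-quants (exact numbers vary by tensor mix), used here purely for illustration.

```python
# Approximate effective bits per weight for common GGUF quant types.
# These are ballpark figures (assumptions); exact values depend on
# which tensors get which sub-quantization.
GGUF_BITS = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

def file_size_gb(params_billions: float, quant: str) -> float:
    """Estimated on-disk size of a quantized model in decimal GB."""
    return params_billions * 1e9 * GGUF_BITS[quant] / 8 / 1e9

for quant in GGUF_BITS:
    print(f"7B {quant}: ~{file_size_gb(7, quant):.1f} GB")
```

A 7B model drops from ~14 GB at F16 to roughly 4 GB at Q4, which is why Q4 variants dominate consumer-hardware use.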
Ollama is the fastest way to get started — install it, pull a model, and go. llama.cpp gives maximum control and the best CPU performance. vLLM is ideal for GPU serving with high throughput. For IDE integration, Continue.dev connects to any local endpoint. All of these tools are free and open source.
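Wiring Continue.dev to a local endpoint is a short config change. A minimal sketch, using Continue's `config.json` model-entry format and assuming Ollama is serving on its default port (the model tag here is just an example):

```json
{
  "models": [
    {
      "title": "Local Coder (Ollama)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ]
}
```

With this entry in place, the model appears in Continue's model picker inside your IDE and all completions stay on your machine.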
Based on our composite scoring that evaluates benchmarks, code quality, and real-world performance, Gemini 3.1 Pro Preview Custom Tools currently leads our local coding LLM rankings with a score of 89. Other top local models include Qwen3.5-Flash, Nemotron 3 Super (free), and Seed-2.0-Lite. All of these can be downloaded and run on your own hardware using tools like Ollama, llama.cpp, or vLLM.
It depends on the model size and quantization. A 7B parameter model at Q4 quantization needs roughly 4-6 GB of VRAM, making it runnable on most modern GPUs. A 13B model needs 8-10 GB, and 34B+ models typically require 16-24 GB or more. CPU-only inference is possible with llama.cpp but significantly slower. For the best experience, an NVIDIA RTX 3060 (12 GB) or RTX 4090 (24 GB) is recommended.
Ollama is the easiest way to get started — it handles model downloading, quantization, and serving with a single command. Just install Ollama, run `ollama pull codellama` (or any supported model), and start chatting. For IDE integration, extensions like Continue.dev can connect to your local Ollama instance. More advanced users can use llama.cpp for maximum performance tuning or vLLM for high-throughput serving.
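Beyond the CLI, Ollama also exposes a local REST API (`POST /api/generate` on port 11434 by default), which is how tools like Continue.dev talk to it. A minimal sketch, assuming Ollama is running and the model has already been pulled:

```python
# Query a local Ollama server over its REST API.
# Assumes `ollama serve` is running and the model was pulled beforehand.
import json
import urllib.request

def build_payload(prompt: str, model: str = "codellama") -> dict:
    # stream=False returns one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str, model: str = "codellama",
               base_url: str = "http://localhost:11434") -> str:
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate", data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama instance):
# print(ask_ollama("Write a Python function that reverses a string."))
```

The same endpoint works for any pulled model — swap the `model` field for whatever you installed.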
The gap has narrowed significantly. Top open-source coding models like DeepSeek Coder V2, CodeLlama 70B, and Qwen2.5-Coder perform competitively on benchmarks like HumanEval and SWE-bench. For many everyday coding tasks — autocompletion, refactoring, writing tests, explaining code — local models are excellent. Cloud models still tend to have an edge on very complex multi-step reasoning and large-codebase understanding, but local models offer unbeatable privacy and zero ongoing cost.
Explore more model rankings, compare specific models head-to-head, or filter by capabilities on the full leaderboard.