The complete Meta Llama model lineup: 17 open-source models spanning from compact 1B-parameter variants to the flagship Llama 4 Maverick. Meta's Llama family is the most widely adopted open-source LLM ecosystem, offering free weights for self-hosting, fine-tuning, and commercial use. Scores updated hourly from live API data.
All 17 Meta models are listed below with API pricing via OpenRouter, sorted by output price. Self-hosted Llama models are free to run on your own hardware.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Llama 3.3 70B Instruct (free) | Free | Free |
| Llama 3.2 3B Instruct (free) | Free | Free |
| Llama 3 8B Instruct | $0.030 | $0.040 |
| Llama 3.2 11B Vision Instruct | $0.049 | $0.049 |
| Llama 3.1 8B Instruct | $0.020 | $0.050 |
| Llama Guard 3 8B | $0.020 | $0.060 |
| Llama Guard 4 12B | $0.180 | $0.180 |
| Llama 3.2 1B Instruct | $0.027 | $0.200 |
| Llama Guard 2 8B | $0.200 | $0.200 |
| Llama 4 Scout | $0.080 | $0.300 |
| Llama 3.3 70B Instruct | $0.100 | $0.320 |
| Llama 3.2 3B Instruct | $0.051 | $0.340 |
| Llama 3.1 70B Instruct | $0.400 | $0.400 |
| Llama 4 Maverick | $0.150 | $0.600 |
| Llama 3 70B Instruct | $0.510 | $0.740 |
| Llama 3.1 405B Instruct | $4.00 | $4.00 |
| Llama 3.1 405B (base) | $4.00 | $4.00 |
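Per-request cost from the table above is simple arithmetic: tokens divided by one million, times the per-million price, for input and output separately. A minimal sketch (prices copied from the table; the token counts are hypothetical):

```python
# Estimate API cost from per-million-token prices (values from the table above).
PRICES = {
    "llama-3.1-8b-instruct":   (0.020, 0.050),  # (input $/1M, output $/1M)
    "llama-4-maverick":        (0.150, 0.600),
    "llama-3.1-405b-instruct": (4.00, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical workload: a 2,000-token prompt with a 500-token completion.
cost = request_cost("llama-4-maverick", 2000, 500)
print(f"${cost:.6f}")  # 2000/1e6 * 0.15 + 500/1e6 * 0.60 = $0.000600
```

The same workload on Llama 3.1 405B would cost about $0.01 per request, which is why the smaller models dominate high-volume use cases.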
Meta releases Llama model weights under the Llama Community License, which allows free download, modification, and commercial deployment (with additional terms applying only to services above a very large user threshold). This open approach has made Llama the most widely adopted open-source LLM family, powering thousands of applications, research projects, and fine-tuned variants. When accessed via API providers like OpenRouter, per-token pricing applies to cover inference infrastructure costs.
The Llama family spans a wide range of parameter counts to fit different hardware and performance needs. Smaller variants (1B, 3B, 8B) run efficiently on consumer GPUs and edge devices. Mid-range models (70B) offer strong general-purpose performance on server hardware. The largest models like Llama 4 Maverick push the frontier of open-source quality, competing with proprietary models on reasoning and coding benchmarks.
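A rough way to map parameter counts to hardware: loaded weights need about (parameters × bytes per parameter), plus headroom for activations and KV cache. A back-of-the-envelope sketch (the 1.2× overhead factor is an assumption for illustration, not a Meta figure):

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Approximate memory (GB) to serve a model: weights plus a rough
    overhead factor for activations/KV cache (the 1.2x is an assumption)."""
    return params_billions * bytes_per_param * overhead

for name, b in [("Llama 3.2 1B", 1), ("Llama 3.1 8B", 8), ("Llama 3.3 70B", 70)]:
    fp16 = weight_memory_gb(b, 2.0)  # 16-bit weights
    q4 = weight_memory_gb(b, 0.5)    # 4-bit quantized (e.g. GGUF via llama.cpp)
    print(f"{name}: ~{fp16:.0f} GB fp16, ~{q4:.0f} GB 4-bit")
```

This is why an 8B model at 4-bit quantization (~5 GB) fits a consumer GPU, while 70B at fp16 (~170 GB) needs multi-GPU server hardware.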
Llama models can be self-hosted using tools like Ollama (one-command local deployment), vLLM (high-throughput serving), or llama.cpp (CPU and quantized inference). Self-hosting eliminates per-token costs and keeps data on-premises, making Llama a popular choice for privacy-sensitive workloads and enterprise deployments.
Because the weights are open, Llama models are the most popular base for fine-tuning. Techniques like LoRA and QLoRA allow efficient adaptation to specific domains (legal, medical, code) on a single GPU. The ecosystem includes tools like Hugging Face Transformers, Axolotl, and Unsloth for streamlined training. Many top open-source models on the leaderboard are Llama-based fine-tunes.
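The reason LoRA fits on a single GPU is arithmetic: instead of updating a full d×k weight matrix, it trains two low-rank factors of shape d×r and r×k and adds their product to the frozen weight. A sketch of the parameter-count savings (the 4096 hidden size is illustrative, loosely in the range of an 8B-class projection layer; rank r=16 is a common choice, not a prescribed value):

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters when a d×k weight W is adapted as W + B @ A,
    with low-rank factors B (d×r) and A (r×k)."""
    return d * r + r * k

d = k = 4096                # illustrative hidden size
full = d * k                # full fine-tuning updates every weight
lora = lora_trainable_params(d, k, r=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
# full: 16,777,216  lora: 131,072  ratio: 128x
```

At rank 16 the adapter trains roughly 1/128th of the weights per adapted matrix; QLoRA goes further by also storing the frozen base weights in 4-bit precision.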
Explore Llama comparisons, open-source rankings, and pricing across the full model landscape.