The complete Meta Llama model lineup: 15 open-source models, ranging from compact 1B-parameter variants to the flagship Llama 4 Maverick. Meta's Llama family is the most widely adopted open-source LLM ecosystem, offering free weights for self-hosting, fine-tuning, and commercial use. Scores updated hourly from live API data.
15 models from Meta, sorted by composite score
API pricing across major providers. Self-hosted Llama models are free to run on your own hardware.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Llama 3.3 70B Instruct (free) | Free | Free |
| Llama 3.2 3B Instruct (free) | Free | Free |
| Llama 3 8B Instruct | $0.030 | $0.040 |
| Llama 3.2 11B Vision Instruct | $0.049 | $0.049 |
| Llama 3.1 8B Instruct | $0.020 | $0.050 |
| Llama Guard 3 8B | $0.020 | $0.060 |
| Llama Guard 4 12B | $0.180 | $0.180 |
| Llama 3.2 1B Instruct | $0.027 | $0.200 |
| Llama 4 Scout | $0.080 | $0.300 |
| Llama 3.3 70B Instruct | $0.100 | $0.320 |
| Llama 3.2 3B Instruct | $0.051 | $0.340 |
| Llama 3.1 70B Instruct | $0.400 | $0.400 |
| Llama 4 Maverick | $0.150 | $0.600 |
| Llama 3 70B Instruct | $0.510 | $0.740 |
| Llama 3.1 405B (base) | $4.00 | $4.00 |
Meta releases Llama model weights under its community license, making them free to download, modify, and deploy commercially (a separate license is required only for services exceeding roughly 700 million monthly active users). This openness has made Llama the most widely adopted open-source LLM family, powering thousands of applications, research projects, and fine-tuned variants. When accessed via API providers, per-token pricing applies to cover inference infrastructure costs.
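As a rough sketch of how per-token pricing translates to a bill, here is a small helper (the function name is ours) applied to the Llama 4 Maverick rates from the table above:

```python
def api_cost_usd(input_tokens, output_tokens, in_rate, out_rate):
    """Estimate API cost in USD; rates are quoted per 1M tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Llama 4 Maverick: $0.15 input / $0.60 output per 1M tokens (from the table)
cost = api_cost_usd(2_000_000, 500_000, 0.15, 0.60)
print(f"${cost:.2f}")  # 2M input + 0.5M output tokens -> $0.60
```

Actual provider invoices may add per-request or caching surcharges, so treat this as a lower-bound estimate.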
The Llama family spans a wide range of parameter counts to fit different hardware and performance needs. Smaller variants (1B, 3B, 8B) run efficiently on consumer GPUs and edge devices. Mid-range models (70B) offer strong general-purpose performance on server hardware. The largest models like Llama 4 Maverick push the frontier of open-source quality, competing with proprietary models on reasoning and coding benchmarks.
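A quick way to map parameter counts to hardware is the back-of-envelope rule that weight memory is parameters times bytes per parameter. A minimal sketch (it ignores KV cache and activation memory, which add real overhead at inference time):

```python
def weight_memory_gb(params_billions, bits_per_param=16):
    """Approximate memory for model weights alone, in decimal GB.

    Excludes KV cache and activations, so real VRAM needs are higher.
    """
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# An 8B model: ~16 GB at fp16, ~4 GB with 4-bit quantization
print(round(weight_memory_gb(8, 16), 1))
print(round(weight_memory_gb(8, 4), 1))
```

This is why 1B-8B models fit on consumer GPUs (especially quantized), while 70B-class models need server-grade accelerators or multi-GPU setups.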
Llama models can be self-hosted using tools like Ollama (one-command local deployment), vLLM (high-throughput serving), or llama.cpp (CPU and quantized inference). Self-hosting eliminates per-token costs and keeps data on-premises, making Llama a popular choice for privacy-sensitive workloads and enterprise deployments.
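Once a model is running locally, tools like Ollama expose it over a simple HTTP API. A minimal stdlib-only sketch of calling Ollama's `/api/generate` endpoint (it assumes an Ollama server on the default port 11434 with the model tag already pulled; the helper name is ours):

```python
import json
import urllib.request

def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Uncomment to run against a local Ollama server:
# req = build_generate_request("llama3.2:3b", "Summarize LoRA in one sentence.")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Because the server runs on your own hardware, there is no per-token charge and prompts never leave the machine.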
Because the weights are open, Llama models are the most popular base for fine-tuning. Techniques like LoRA and QLoRA allow efficient adaptation to specific domains (legal, medical, code) on a single GPU. The ecosystem includes tools like Hugging Face Transformers, Axolotl, and Unsloth for streamlined training. Many top open-source models on the leaderboard are Llama-based fine-tunes.
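To see why LoRA makes single-GPU fine-tuning feasible, compare trainable parameter counts: a rank-r adapter replaces a full d x k weight update with two small matrices of d x r and r x k parameters. A sketch with illustrative dimensions (the 4096 x 4096 projection is a typical size in a 7B-class model, not a specific Llama layer):

```python
def lora_trainable_params(d, k, r):
    """Trainable params for a rank-r LoRA adapter on one d x k weight matrix."""
    return d * r + r * k

d = k = 4096                      # e.g. an attention projection matrix
full = d * k                      # full fine-tuning updates every weight
lora = lora_trainable_params(d, k, r=16)
print(full, lora, round(100 * lora / full, 2))  # LoRA trains under 1% of the weights
```

At rank 16 the adapter trains roughly 0.8% of that matrix's parameters, which is why the optimizer state and gradients fit alongside the frozen base model on a single GPU (QLoRA shrinks the frozen weights further via 4-bit quantization).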
Explore Llama comparisons, open-source rankings, and pricing across the full model landscape.
**What is Meta Llama?** Meta Llama is a family of open-source large language models developed by Meta. Llama models can be downloaded, modified, and deployed on your own hardware, making them popular for self-hosted AI and fine-tuning.
**Is Llama free to use?** Yes, Llama models are open-source and free to download. You can self-host them at no per-token cost. Alternatively, cloud providers like Together AI, Fireworks, and Groq offer hosted Llama inference at low prices.
**Which Llama model should I use?** Llama 3.3 70B offers the best balance of capability and efficiency. For maximum performance, Llama 3.1 405B leads. For edge deployment, Llama 3.2 1B and 3B are optimized for mobile and embedded devices.