149 lightweight AI models under $1/1M tokens. Small language models (SLMs) are optimized for speed, low cost, and edge deployment: ideal for mobile apps, IoT, chatbots, and high-volume production workloads.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Nemotron 3 Super (free) | NVIDIA | 84 |
| 2 | MiniMax M2.5 (free) | MiniMax | 83 |
| 3 | Nemotron Nano 12B 2 VL (free) | NVIDIA | 82 |
| 4 | Seed 1.6 Flash | ByteDance | 85 |
| 5 | Grok 4.1 Fast | xAI | 87 |
| 6 | Seed-2.0-Mini | ByteDance | 85 |
| 7 | Trinity Mini | arcee-ai | 82 |
| 8 | Gemini 2.5 Flash Lite Preview 09-2025 | Google | 84 |
| 9 | MiMo-V2-Flash | Xiaomi | 83 |
| 10 | gpt-oss-safeguard-20b | OpenAI | 82 |
| 11 | Grok 4 Fast | xAI | 83 |
| 12 | Step 3.5 Flash (free) | StepFun | 78 |
| 13 | Qwen3.5-9B | Alibaba | 79 |
| 14 | Tongyi DeepResearch 30B A3B | Alibaba | 82 |
| 15 | Gemini 2.5 Flash Lite | Google | 81 |
| 16 | Qwen3 30B A3B Thinking 2507 | Alibaba | 81 |
| 17 | Qwen3.5-Flash | Alibaba | 79 |
| 18 | Qwen3 VL 32B Instruct | Alibaba | 81 |
| 19 | GPT-4.1 Nano | OpenAI | 81 |
| 20 | Qwen3 VL 8B Instruct | Alibaba | 81 |
| 21 | Qwen3 VL 30B A3B Instruct | Alibaba | 81 |
| 22 | Qwen Plus 0728 (thinking) | Alibaba | 83 |
| 23 | Mercury 2 | Inception | 81 |
| 24 | gpt-oss-120b (free) | OpenAI | 74 |
| 25 | gpt-oss-20b (free) | OpenAI | 74 |
| 26 | Mistral Small 4 | Mistral AI | 79 |
| 27 | DeepSeek V3.2 Exp | DeepSeek | 77 |
| 28 | Gemini 2.0 Flash Lite | Google | 76 |
| 29 | Trinity Large Preview (free) | arcee-ai | 73 |
| 30 | Trinity Mini (free) | arcee-ai | 73 |
Processing millions of requests per day? SLMs cost 10-100x less than premium models. A chatbot handling 1M messages/month costs ~$100 with budget models vs $10,000+ with premium ones.
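That comparison is simple arithmetic. Here is a back-of-envelope sketch of it; the message volume, tokens-per-message figure, and per-token prices are illustrative assumptions, not actual vendor rates:

```python
def monthly_cost(messages, tokens_per_message, price_per_1m_tokens):
    """Dollar cost for a month of chatbot traffic at a flat per-token price."""
    total_tokens = messages * tokens_per_message
    return total_tokens / 1_000_000 * price_per_1m_tokens

MESSAGES = 1_000_000        # 1M messages/month, as in the example above
TOKENS_PER_MESSAGE = 500    # assumed average, prompt + completion combined

slm_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 0.20)      # ~$0.20/1M tokens
premium_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 20.00) # ~$20/1M tokens

print(f"SLM:     ${slm_cost:,.0f}/month")      # $100/month
print(f"Premium: ${premium_cost:,.0f}/month")  # $10,000/month
```

At these assumed prices the ratio is 100x; with real pricing the exact multiple varies, but the gap stays in the 10-100x range cited above.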
Open-source SLMs can run on consumer hardware such as laptops, phones, or edge devices. Models like Phi, Gemma, and small Llama variants fit in 4-8GB of RAM.
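A quick way to see why such models fit in that footprint is to estimate weight memory as parameter count times bytes per parameter. This is a rough sketch only: real runtimes add overhead for the KV cache, activations, and the framework itself.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate RAM for model weights alone (GiB), ignoring KV cache/activations."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1024**3

# A 7B-parameter model at common precisions:
for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    print(f"7B @ {label}: ~{weight_memory_gb(7, bits):.1f} GB")
```

At full fp16 precision a 7B model needs roughly 13GB for weights, but 8-bit or 4-bit quantization brings it down to about 6.5GB or 3.3GB, which is why quantized SLMs run comfortably on 8GB consumer machines.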
Smaller models respond faster. For real-time applications like autocomplete, classification, or chat, SLMs can deliver sub-100ms responses.
Many tasks, such as classification, extraction, summarization, and translation, don't need the largest models. A well-chosen SLM can match premium model quality on focused tasks.
Small language models are AI models with fewer parameters, typically under 10 billion. They run faster, cost less, and can operate on edge devices while still handling many common tasks like text generation, summarization, and simple coding assistance.
Use SLMs when you need low latency, low cost, or on-device deployment. Use full LLMs when you need complex reasoning, creative writing, or state-of-the-art accuracy. SLMs are ideal for chatbots, simple Q&A, and high-volume applications where cost matters more than peak performance.
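That decision rule can be expressed as a tiny routing sketch. Everything here is hypothetical: the model names are placeholders and the task categories are just the examples from this page, not an API of any real provider.

```python
# Simple, high-volume tasks go to an SLM; anything needing complex
# reasoning or creative writing escalates to a full LLM.
SIMPLE_TASKS = {"classification", "extraction", "summarization", "faq"}

def pick_model(task_type: str, needs_reasoning: bool = False) -> str:
    """Return a placeholder model tier for a request (illustrative only)."""
    if needs_reasoning or task_type not in SIMPLE_TASKS:
        return "large-llm"   # peak accuracy, higher cost and latency
    return "small-lm"        # low latency, low cost

print(pick_model("faq"))                               # small-lm
print(pick_model("creative", needs_reasoning=True))    # large-llm
```

In production this kind of router is often a first-pass classifier in front of the model fleet, so the cheap path handles the bulk of traffic and only hard cases pay premium prices.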
SLMs trade some capability for speed and efficiency. Modern SLMs like Phi-3 and Gemma 2 can match older large models on many benchmarks. For specialized tasks, a fine-tuned SLM can outperform a general-purpose LLM while being 10-100x cheaper to run.