152 lightweight AI models priced under $1 per 1M tokens. Small language models (SLMs) are optimized for speed, low cost, and edge deployment, making them ideal for mobile apps, IoT, chatbots, and high-volume production workloads.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Qwen3 VL 30B A3B Thinking | Alibaba | 69 |
| 2 | Qwen3 VL 235B A22B Thinking | Alibaba | 69 |
| 3 | Nemotron Nano 12B 2 VL (free) | NVIDIA | 64 |
| 4 | Gemini 2.5 Flash Lite Preview 09-2025 | Google | 65 |
| 5 | GPT-5 Nano | OpenAI | 64 |
| 6 | Gemini 2.5 Flash Lite | Google | 64 |
| 7 | Grok 4.1 Fast | xAI | 64 |
| 8 | Grok 4 Fast | xAI | 64 |
| 9 | Step 3.5 Flash (free) | StepFun | 58 |
| 10 | Qwen3.5-Flash | Alibaba | 62 |
| 11 | Seed-2.0-Mini | ByteDance | 61 |
| 12 | Seed 1.6 Flash | ByteDance | 60 |
| 13 | Qwen3 235B A22B Thinking 2507 | Alibaba | 57 |
| 14 | gpt-oss-120b (free) | OpenAI | 56 |
| 15 | gpt-oss-20b (free) | OpenAI | 56 |
| 16 | Gemma 3 27B (free) | Google | 56 |
| 17 | Trinity Large Preview (free) | arcee-ai | 54 |
| 18 | Trinity Mini (free) | arcee-ai | 54 |
| 19 | Nemotron Nano 9B V2 (free) | NVIDIA | 54 |
| 20 | Qwen3 Coder 480B A35B (free) | Alibaba | 54 |
| 21 | GPT-4.1 Nano | OpenAI | 58 |
| 22 | Trinity Mini | arcee-ai | 53 |
| 23 | Gemini 2.0 Flash Lite | Google | 55 |
| 24 | Nemotron 3 Nano 30B A3B (free) | NVIDIA | 51 |
| 25 | Qwen3 Next 80B A3B Instruct (free) | Alibaba | 51 |
| 26 | Mistral Small 3.1 24B (free) | Mistral AI | 51 |
| 27 | Mistral Small 3.2 24B | Mistral AI | 53 |
| 28 | MiMo-V2-Flash | Xiaomi | 54 |
| 29 | Gemma 3 4B (free) | Google | 51 |
| 30 | Gemini 2.0 Flash | Google | 54 |
Processing millions of requests per day? SLMs cost 10-100x less than premium models: a chatbot handling 1M messages per month runs roughly $100 with a budget model versus $10,000+ with a premium one.
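The cost comparison above is easy to verify with back-of-the-envelope arithmetic. The sketch below assumes an average of 500 tokens per message and illustrative prices of $0.20/1M tokens (budget SLM) versus $20/1M tokens (premium model); these are hypothetical figures for the sake of the calculation, not quotes from any provider's rate card.

```python
def monthly_cost(messages_per_month, tokens_per_message, price_per_million_tokens):
    """Total monthly spend given a flat per-token price."""
    total_tokens = messages_per_month * tokens_per_message
    return total_tokens / 1_000_000 * price_per_million_tokens

MESSAGES = 1_000_000      # 1M chatbot messages per month
TOKENS_PER_MESSAGE = 500  # assumed prompt + completion tokens per exchange

slm_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 0.20)      # budget SLM
premium_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 20.00)  # premium model

print(f"SLM:     ${slm_cost:,.0f}/month")       # $100/month
print(f"Premium: ${premium_cost:,.0f}/month")   # $10,000/month
print(f"Ratio:   {premium_cost / slm_cost:.0f}x")  # 100x
```

With these assumptions the 500M monthly tokens cost $100 on the budget model and $10,000 on the premium one, which is where the 100x figure comes from; your actual ratio depends on real per-token prices and message lengths.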
Open-source SLMs can run on consumer hardware: laptops, phones, or edge devices. Models like Phi, Gemma, and small Llama variants fit in 4-8 GB of RAM.
Smaller models respond faster. For real-time applications like autocomplete, classification, or chat, SLMs deliver sub-100ms responses.
Many tasks — classification, extraction, summarization, translation — don't need the largest models. A well-chosen SLM can match premium model quality on focused tasks.