152 lightweight AI models priced under $1 per 1M tokens. Small language models (SLMs) are optimized for speed, low cost, and edge deployment, making them ideal for mobile apps, IoT, chatbots, and high-volume production workloads.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Qwen3 VL 30B A3B Thinking | Alibaba | 69 |
| 2 | Qwen3 VL 235B A22B Thinking | Alibaba | 69 |
| 3 | Nemotron Nano 12B 2 VL (free) | NVIDIA | 64 |
| 4 | Gemini 2.5 Flash Lite Preview 09-2025 | Google | 65 |
| 5 | GPT-5 Nano | OpenAI | 64 |
| 6 | Gemini 2.5 Flash Lite | Google | 64 |
| 7 | Grok 4.1 Fast | xAI | 64 |
| 8 | Grok 4 Fast | xAI | 64 |
| 9 | Step 3.5 Flash (free) | StepFun | 58 |
| 10 | Qwen3.5-Flash | Alibaba | 62 |
| 11 | Seed-2.0-Mini | ByteDance | 61 |
| 12 | Seed 1.6 Flash | ByteDance | 60 |
| 13 | Qwen3 235B A22B Thinking 2507 | Alibaba | 57 |
| 14 | gpt-oss-120b (free) | OpenAI | 56 |
| 15 | gpt-oss-20b (free) | OpenAI | 56 |
| 16 | Gemma 3 27B (free) | Google | 56 |
| 17 | Trinity Large Preview (free) | arcee-ai | 54 |
| 18 | Trinity Mini (free) | arcee-ai | 54 |
| 19 | Nemotron Nano 9B V2 (free) | NVIDIA | 54 |
| 20 | Qwen3 Coder 480B A35B (free) | Alibaba | 54 |
| 21 | GPT-4.1 Nano | OpenAI | 58 |
| 22 | Trinity Mini | arcee-ai | 53 |
| 23 | Gemini 2.0 Flash Lite | Google | 55 |
| 24 | Nemotron 3 Nano 30B A3B (free) | NVIDIA | 51 |
| 25 | Qwen3 Next 80B A3B Instruct (free) | Alibaba | 51 |
| 26 | Mistral Small 3.1 24B (free) | Mistral AI | 51 |
| 27 | Mistral Small 3.2 24B | Mistral AI | 53 |
| 28 | MiMo-V2-Flash | Xiaomi | 54 |
| 29 | Gemma 3 4B (free) | Google | 51 |
| 30 | Gemini 2.0 Flash | Google | 54 |
Processing millions of requests per day? SLMs cost 10-100x less than premium models: a chatbot handling 1M messages per month runs roughly $100 with a budget model versus $10,000+ with a premium one.
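The cost comparison above is easy to verify with back-of-the-envelope arithmetic. The sketch below assumes an average of 500 tokens per message and illustrative prices of $0.20/1M tokens (budget SLM) versus $20/1M tokens (premium model); these are hypothetical figures for the sake of the calculation, not quotes from any provider's rate card.

```python
def monthly_cost(messages_per_month, tokens_per_message, price_per_million_tokens):
    """Total monthly spend given a flat per-token price."""
    total_tokens = messages_per_month * tokens_per_message
    return total_tokens / 1_000_000 * price_per_million_tokens

MESSAGES = 1_000_000      # 1M chatbot messages per month
TOKENS_PER_MESSAGE = 500  # assumed prompt + completion tokens per exchange

slm_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 0.20)      # budget SLM
premium_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 20.00)  # premium model

print(f"SLM:     ${slm_cost:,.0f}/month")       # $100/month
print(f"Premium: ${premium_cost:,.0f}/month")   # $10,000/month
print(f"Ratio:   {premium_cost / slm_cost:.0f}x")  # 100x
```

With these assumptions the 500M monthly tokens cost $100 on the budget model and $10,000 on the premium one, which is where the 100x figure comes from; your actual ratio depends on real per-token prices and message lengths.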
Open-source SLMs can run on consumer hardware: laptops, phones, or edge devices. Models like Phi, Gemma, and small Llama variants fit in 4-8 GB of RAM.
Smaller models respond faster. For real-time applications like autocomplete, classification, or chat, SLMs deliver sub-100ms responses.
Many tasks — classification, extraction, summarization, translation — don't need the largest models. A well-chosen SLM can match premium model quality on focused tasks.