149 lightweight AI models under $1/1M tokens. Small language models (SLMs) are optimized for speed, low cost, and edge deployment: ideal for mobile apps, IoT, chatbots, and high-volume production workloads.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Nemotron 3 Super (free) | NVIDIA | 84 |
| 2 | MiniMax M2.5 (free) | MiniMax | 83 |
| 3 | Nemotron Nano 12B 2 VL (free) | NVIDIA | 82 |
| 4 | Seed 1.6 Flash | ByteDance | 85 |
| 5 | Grok 4.1 Fast | xAI | 87 |
| 6 | Seed-2.0-Mini | ByteDance | 85 |
| 7 | Trinity Mini | arcee-ai | 82 |
| 8 | Gemini 2.5 Flash Lite Preview 09-2025 | Google | 84 |
| 9 | MiMo-V2-Flash | Xiaomi | 83 |
| 10 | gpt-oss-safeguard-20b | OpenAI | 82 |
| 11 | Grok 4 Fast | xAI | 83 |
| 12 | Step 3.5 Flash (free) | StepFun | 78 |
| 13 | Qwen3.5-9B | Alibaba | 79 |
| 14 | Tongyi DeepResearch 30B A3B | Alibaba | 82 |
| 15 | Gemini 2.5 Flash Lite | Google | 81 |
| 16 | Qwen3 30B A3B Thinking 2507 | Alibaba | 81 |
| 17 | Qwen3.5-Flash | Alibaba | 79 |
| 18 | Qwen3 VL 32B Instruct | Alibaba | 81 |
| 19 | GPT-4.1 Nano | OpenAI | 81 |
| 20 | Qwen3 VL 8B Instruct | Alibaba | 81 |
| 21 | Qwen3 VL 30B A3B Instruct | Alibaba | 81 |
| 22 | Qwen Plus 0728 (thinking) | Alibaba | 83 |
| 23 | Mercury 2 | Inception | 81 |
| 24 | gpt-oss-120b (free) | OpenAI | 74 |
| 25 | gpt-oss-20b (free) | OpenAI | 74 |
| 26 | Mistral Small 4 | Mistral AI | 79 |
| 27 | DeepSeek V3.2 Exp | DeepSeek | 77 |
| 28 | Gemini 2.0 Flash Lite | Google | 76 |
| 29 | Trinity Large Preview (free) | arcee-ai | 73 |
| 30 | Trinity Mini (free) | arcee-ai | 73 |
Processing millions of requests per day? SLMs cost 10-100x less than premium models. A chatbot handling 1M messages/month costs ~$100 with budget models vs $10,000+ with premium ones.
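That comparison is simple arithmetic. Here is a back-of-envelope sketch of it; the message volume, tokens-per-message figure, and per-token prices are illustrative assumptions, not actual vendor rates:

```python
def monthly_cost(messages, tokens_per_message, price_per_1m_tokens):
    """Dollar cost for a month of chatbot traffic at a flat per-token price."""
    total_tokens = messages * tokens_per_message
    return total_tokens / 1_000_000 * price_per_1m_tokens

MESSAGES = 1_000_000        # 1M messages/month, as in the example above
TOKENS_PER_MESSAGE = 500    # assumed average, prompt + completion combined

slm_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 0.20)      # ~$0.20/1M tokens
premium_cost = monthly_cost(MESSAGES, TOKENS_PER_MESSAGE, 20.00) # ~$20/1M tokens

print(f"SLM:     ${slm_cost:,.0f}/month")      # $100/month
print(f"Premium: ${premium_cost:,.0f}/month")  # $10,000/month
```

At these assumed prices the ratio is 100x; with real pricing the exact multiple varies, but the gap stays in the 10-100x range cited above.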
Open-source SLMs can run on consumer hardware such as laptops, phones, or edge devices. Models like Phi, Gemma, and small Llama variants fit in 4-8GB of RAM.
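A quick way to see why such models fit in that footprint is to estimate weight memory as parameter count times bytes per parameter. This is a rough sketch only: real runtimes add overhead for the KV cache, activations, and the framework itself.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate RAM for model weights alone (GiB), ignoring KV cache/activations."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1024**3

# A 7B-parameter model at common precisions:
for bits, label in [(16, "fp16"), (8, "int8"), (4, "int4")]:
    print(f"7B @ {label}: ~{weight_memory_gb(7, bits):.1f} GB")
```

At full fp16 precision a 7B model needs roughly 13GB for weights, but 8-bit or 4-bit quantization brings it down to about 6.5GB or 3.3GB, which is why quantized SLMs run comfortably on 8GB consumer machines.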
Smaller models respond faster. For real-time applications like autocomplete, classification, or chat, SLMs can deliver sub-100ms responses.
Many tasks, such as classification, extraction, summarization, and translation, don't need the largest models. A well-chosen SLM can match premium model quality on focused tasks.
Small language models are AI models with fewer parameters, typically under 10 billion. They run faster, cost less, and can operate on edge devices while still handling many common tasks like text generation, summarization, and simple coding assistance.
Use SLMs when you need low latency, low cost, or on-device deployment. Use full LLMs when you need complex reasoning, creative writing, or state-of-the-art accuracy. SLMs are ideal for chatbots, simple Q&A, and high-volume applications where cost matters more than peak performance.
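That decision rule can be expressed as a tiny routing sketch. Everything here is hypothetical: the model names are placeholders and the task categories are just the examples from this page, not an API of any real provider.

```python
# Simple, high-volume tasks go to an SLM; anything needing complex
# reasoning or creative writing escalates to a full LLM.
SIMPLE_TASKS = {"classification", "extraction", "summarization", "faq"}

def pick_model(task_type: str, needs_reasoning: bool = False) -> str:
    """Return a placeholder model tier for a request (illustrative only)."""
    if needs_reasoning or task_type not in SIMPLE_TASKS:
        return "large-llm"   # peak accuracy, higher cost and latency
    return "small-lm"        # low latency, low cost

print(pick_model("faq"))                               # small-lm
print(pick_model("creative", needs_reasoning=True))    # large-llm
```

In production this kind of router is often a first-pass classifier in front of the model fleet, so the cheap path handles the bulk of traffic and only hard cases pay premium prices.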
SLMs trade some capability for speed and efficiency. Modern SLMs like Phi-3 and Gemma 2 can match older large models on many benchmarks. For specialized tasks, a fine-tuned SLM can outperform a general-purpose LLM while being 10-100x cheaper to run.