Compare AI model speeds across 300 models. See latency, tokens per second, and streaming capabilities to find the fastest models for your real-time applications.
How quickly the model starts responding (time to first token). Critical for chatbots and interactive applications. Sub-500ms latency feels instantaneous to users.
Generation speed after the first token. Higher TPS means faster completion of long responses. Premium models typically range from 30-100+ TPS.
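The two metrics above can be measured from any streaming response: time to first token is the delay before the first token arrives, and tokens per second is the generation rate after it. A minimal sketch, assuming an iterator that yields tokens as they arrive; the `fake_stream` generator is a hypothetical stand-in for a real streaming API response:

```python
import time

def measure_speed(token_stream):
    """Return (time to first token, tokens per second) for a token iterator."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token has arrived
        count += 1
    end = time.perf_counter()
    ttft = first_token_at - start
    # TPS is generation speed after the first token, so exclude it.
    generation_time = end - first_token_at
    tps = (count - 1) / generation_time if generation_time > 0 else float("inf")
    return ttft, tps

def fake_stream(n_tokens=50, ttft=0.2, per_token=0.02):
    """Hypothetical stand-in for a streaming API: 200 ms initial
    latency, then a steady 50 tokens per second."""
    time.sleep(ttft)               # initial latency before the first token
    for i in range(n_tokens):
        if i:
            time.sleep(per_token)  # steady generation after the first token
        yield "tok"

ttft, tps = measure_speed(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, TPS: {tps:.0f}")
```

The same function works on a real stream because it only depends on when tokens arrive, not on their content.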
Streaming shows tokens as they generate, making the model feel much faster. Even a slow model (20 TPS) feels responsive when streaming compared to waiting for a complete response.
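The arithmetic behind that claim is simple: without streaming the user waits for the entire response, while with streaming text appears as soon as the first token lands. A quick sketch with illustrative numbers (a 20 TPS model, a 300-token answer, 400 ms time to first token, all assumed for the example):

```python
# Perceived wait: streaming vs. waiting for the full response.
# Example numbers (assumed): 20 TPS model, 300-token answer, 400 ms TTFT.
ttft_s = 0.4
tps = 20
tokens = 300

full_response_wait = ttft_s + tokens / tps  # blank screen until everything is done
streaming_first_paint = ttft_s              # text starts appearing at first token

print(f"Blocking: {full_response_wait:.1f}s before anything is shown")
print(f"Streaming: {streaming_first_paint:.1f}s before text starts appearing")
```

Total generation time is identical in both cases; streaming only changes when the user first sees output, which is why even slow models feel responsive.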
Smaller models are generally faster. Budget models often have the best TPS. Reasoning models are slower but more accurate; choose based on your latency requirements.
We measure speed using three metrics: tokens per second (generation throughput), time to first token (initial latency), and overall speed index (a weighted composite). These measurements come from real API calls via OpenRouter and are refreshed regularly.
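A weighted composite can be sketched as follows. The actual weights and normalization behind the speed index are not spelled out here, so everything in this example (the 60/40 split, the reference values) is an illustrative assumption, not the published formula:

```python
def speed_index(tps, ttft_s, w_tps=0.6, w_ttft=0.4,
                tps_ref=100.0, ttft_ref=0.2):
    """Hypothetical weighted composite of throughput and latency.

    Weights and reference points are assumptions for illustration:
    100 TPS and 200 ms TTFT each map to a perfect sub-score.
    """
    tps_score = min(tps / tps_ref, 1.0)       # higher throughput scores higher
    ttft_score = min(ttft_ref / ttft_s, 1.0)  # lower latency scores higher
    return 100 * (w_tps * tps_score + w_ttft * ttft_score)

# A model at 80 TPS with 250 ms TTFT under these assumed weights:
index = speed_index(tps=80, ttft_s=0.25)
print(round(index, 1))
```

The design point is that a single number lets models be ranked even when one is faster to start and another is faster to finish.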
Latency directly impacts user experience. For real-time applications like chatbots and code completion, low time-to-first-token is critical. For batch processing, tokens-per-second throughput matters more. The right speed metric depends on your use case.
Speed varies by provider infrastructure and model size. Smaller models generally respond faster but may sacrifice quality. Our speed comparison tool shows real-time measurements across all models, helping you find the optimal speed-quality tradeoff.