Compare AI model speeds across 293 models. See latency, tokens per second, and streaming capabilities to find the fastest models for your real-time applications.
How quickly the model starts responding (time to first token). Critical for chatbots and interactive applications; latency under roughly 500ms feels near-instant to users.
Generation speed after the first token, measured in tokens per second (TPS). Higher TPS means long responses finish sooner. Premium models typically deliver 30-100+ TPS.
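The two metrics above can be computed from token arrival timestamps. This is a minimal sketch with illustrative timings, not real benchmark data; the function name and inputs are assumptions for the example.

```python
# Hypothetical sketch: derive time-to-first-token (TTFT) and tokens per
# second (TPS) from per-token arrival timestamps of one streamed response.

def throughput_stats(request_time: float, token_times: list[float]) -> tuple[float, float]:
    """Return (ttft_seconds, tps) for a single response."""
    ttft = token_times[0] - request_time
    # TPS measures speed *after* the first token, so divide the remaining
    # tokens by the time spent generating them.
    generation_time = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / generation_time
    return ttft, tps

# Illustrative run: request at t=0, first token at 0.4s,
# then 100 more tokens arriving every 20ms.
times = [0.4 + 0.02 * i for i in range(101)]
ttft, tps = throughput_stats(0.0, times)
print(f"TTFT: {ttft * 1000:.0f} ms, TPS: {tps:.0f}")  # TTFT: 400 ms, TPS: 50
```

With these synthetic timestamps the model shows a 400ms first-token latency and a sustained 50 TPS, which would place it comfortably in the premium range described above.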
Streaming displays tokens as they are generated, making the model feel much faster. Even a slow model (20 TPS) feels responsive when streaming, compared with waiting for the complete response.
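The streaming effect is easy to quantify: what matters perceptually is how long the user waits before seeing anything. A back-of-envelope sketch, using assumed example numbers rather than measurements from any specific model:

```python
# Why streaming feels faster: compare the delay before the user first
# sees output, with and without streaming. All numbers are illustrative.

def first_content_delay(ttft: float, tokens: int, tps: float, streaming: bool) -> float:
    """Seconds until the user first sees any output."""
    total = ttft + tokens / tps
    # With streaming, content appears at the first token; without it,
    # the user waits for the entire response to finish.
    return ttft if streaming else total

# A slow model: 20 TPS, 400-token answer, 0.5 s time-to-first-token.
print(first_content_delay(0.5, 400, 20, streaming=True))   # 0.5
print(first_content_delay(0.5, 400, 20, streaming=False))  # 20.5
```

Under these assumptions the streaming user sees output in half a second while the non-streaming user stares at a blank screen for over twenty seconds, even though total generation time is identical.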
Smaller models are generally faster, and budget models often have the best TPS. Reasoning models trade speed for accuracy, so choose based on your latency requirements.