Compare AI model speeds across 300 models. See latency, tokens per second, and streaming capabilities to find the fastest models for your real-time applications.
How quickly the model starts responding (time to first token). Critical for chatbots and interactive applications. Sub-500ms latency feels instantaneous to users.
Generation speed after the first token. Higher TPS means faster completion of long responses. Premium models typically range from 30-100+ TPS.
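The two metrics above can be measured from any streaming response: time to first token is the delay before the first token arrives, and tokens per second is the generation rate after it. A minimal sketch, assuming an iterator that yields tokens as they arrive; the `fake_stream` generator is a hypothetical stand-in for a real streaming API response:

```python
import time

def measure_speed(token_stream):
    """Return (time to first token, tokens per second) for a token iterator."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token has arrived
        count += 1
    end = time.perf_counter()
    ttft = first_token_at - start
    # TPS is generation speed after the first token, so exclude it.
    generation_time = end - first_token_at
    tps = (count - 1) / generation_time if generation_time > 0 else float("inf")
    return ttft, tps

def fake_stream(n_tokens=50, ttft=0.2, per_token=0.02):
    """Hypothetical stand-in for a streaming API: 200 ms initial
    latency, then a steady 50 tokens per second."""
    time.sleep(ttft)               # initial latency before the first token
    for i in range(n_tokens):
        if i:
            time.sleep(per_token)  # steady generation after the first token
        yield "tok"

ttft, tps = measure_speed(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, TPS: {tps:.0f}")
```

The same function works on a real stream because it only depends on when tokens arrive, not on their content.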
Streaming shows tokens as they generate, making the model feel much faster. Even a slow model (20 TPS) feels responsive when streaming compared to waiting for a complete response.
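The arithmetic behind that claim is simple: without streaming the user waits for the entire response, while with streaming text appears as soon as the first token lands. A quick sketch with illustrative numbers (a 20 TPS model, a 300-token answer, 400 ms time to first token, all assumed for the example):

```python
# Perceived wait: streaming vs. waiting for the full response.
# Example numbers (assumed): 20 TPS model, 300-token answer, 400 ms TTFT.
ttft_s = 0.4
tps = 20
tokens = 300

full_response_wait = ttft_s + tokens / tps  # blank screen until everything is done
streaming_first_paint = ttft_s              # text starts appearing at first token

print(f"Blocking: {full_response_wait:.1f}s before anything is shown")
print(f"Streaming: {streaming_first_paint:.1f}s before text starts appearing")
```

Total generation time is identical in both cases; streaming only changes when the user first sees output, which is why even slow models feel responsive.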
Smaller models are generally faster. Budget models often have the best TPS. Reasoning models are slower but more accurate; choose based on your latency requirements.
We measure speed using three metrics: tokens per second (generation throughput), time to first token (initial latency), and overall speed index (a weighted composite). These measurements come from real API calls via OpenRouter and are refreshed regularly.
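A weighted composite can be sketched as follows. The actual weights and normalization behind the speed index are not spelled out here, so everything in this example (the 60/40 split, the reference values) is an illustrative assumption, not the published formula:

```python
def speed_index(tps, ttft_s, w_tps=0.6, w_ttft=0.4,
                tps_ref=100.0, ttft_ref=0.2):
    """Hypothetical weighted composite of throughput and latency.

    Weights and reference points are assumptions for illustration:
    100 TPS and 200 ms TTFT each map to a perfect sub-score.
    """
    tps_score = min(tps / tps_ref, 1.0)       # higher throughput scores higher
    ttft_score = min(ttft_ref / ttft_s, 1.0)  # lower latency scores higher
    return 100 * (w_tps * tps_score + w_ttft * ttft_score)

# A model at 80 TPS with 250 ms TTFT under these assumed weights:
index = speed_index(tps=80, ttft_s=0.25)
print(round(index, 1))
```

The design point is that a single number lets models be ranked even when one is faster to start and another is faster to finish.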
Latency directly impacts user experience. For real-time applications like chatbots and code completion, low time-to-first-token is critical. For batch processing, tokens-per-second throughput matters more. The right speed metric depends on your use case.
Speed varies by provider infrastructure and model size. Smaller models generally respond faster but may sacrifice quality. Our speed comparison tool shows real-time measurements across all models, helping you find the optimal speed-quality tradeoff.