Compare AI model speeds across 293 models. See latency, tokens per second, and streaming capabilities to find the fastest models for your real-time applications.
How quickly the model starts responding (time to first token). Critical for chatbots and interactive applications; latency under roughly 500ms feels near-instant to users.
Generation speed after the first token, measured in tokens per second (TPS). Higher TPS means long responses finish sooner. Premium models typically deliver 30-100+ TPS.
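The two metrics above can be computed from token arrival timestamps. This is a minimal sketch with illustrative timings, not real benchmark data; the function name and inputs are assumptions for the example.

```python
# Hypothetical sketch: derive time-to-first-token (TTFT) and tokens per
# second (TPS) from per-token arrival timestamps of one streamed response.

def throughput_stats(request_time: float, token_times: list[float]) -> tuple[float, float]:
    """Return (ttft_seconds, tps) for a single response."""
    ttft = token_times[0] - request_time
    # TPS measures speed *after* the first token, so divide the remaining
    # tokens by the time spent generating them.
    generation_time = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / generation_time
    return ttft, tps

# Illustrative run: request at t=0, first token at 0.4s,
# then 100 more tokens arriving every 20ms.
times = [0.4 + 0.02 * i for i in range(101)]
ttft, tps = throughput_stats(0.0, times)
print(f"TTFT: {ttft * 1000:.0f} ms, TPS: {tps:.0f}")  # TTFT: 400 ms, TPS: 50
```

With these synthetic timestamps the model shows a 400ms first-token latency and a sustained 50 TPS, which would place it comfortably in the premium range described above.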
Streaming displays tokens as they are generated, making the model feel much faster. Even a slow model (20 TPS) feels responsive when streaming, compared with waiting for the complete response.
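The streaming effect is easy to quantify: what matters perceptually is how long the user waits before seeing anything. A back-of-envelope sketch, using assumed example numbers rather than measurements from any specific model:

```python
# Why streaming feels faster: compare the delay before the user first
# sees output, with and without streaming. All numbers are illustrative.

def first_content_delay(ttft: float, tokens: int, tps: float, streaming: bool) -> float:
    """Seconds until the user first sees any output."""
    total = ttft + tokens / tps
    # With streaming, content appears at the first token; without it,
    # the user waits for the entire response to finish.
    return ttft if streaming else total

# A slow model: 20 TPS, 400-token answer, 0.5 s time-to-first-token.
print(first_content_delay(0.5, 400, 20, streaming=True))   # 0.5
print(first_content_delay(0.5, 400, 20, streaming=False))  # 20.5
```

Under these assumptions the streaming user sees output in half a second while the non-streaming user stares at a blank screen for over twenty seconds, even though total generation time is identical.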
Smaller models are generally faster, and budget models often have the best TPS. Reasoning models trade speed for accuracy, so choose based on your latency requirements.