Compare context window sizes and output capacities across AI models. Larger context windows allow processing more text, code, or documents in a single request. Bar width uses a logarithmic scale.
Context window sizes and output capacities are sourced from provider API data. The blue bar shows the full context window, while the darker bar shows the maximum output token limit. Scale is logarithmic to better visualize the range from thousands to millions of tokens.
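The log-scale mapping described above can be sketched as a small helper. This is an illustrative implementation, not the chart's actual code; the 4K–2M token range and percentage width are assumptions chosen to cover the spread mentioned in the text.

```python
import math

def bar_width(tokens: int,
              min_tokens: int = 4_000,
              max_tokens: int = 2_000_000,
              max_width: float = 100.0) -> float:
    """Map a token count to a bar width (percent) on a log10 scale.

    Assumed range: 4K tokens maps to 0%, 2M tokens maps to 100%.
    A linear scale would make a 32K bar invisible next to a 1M bar;
    log10 keeps both readable.
    """
    lo, hi = math.log10(min_tokens), math.log10(max_tokens)
    frac = (math.log10(tokens) - lo) / (hi - lo)
    return round(max(0.0, min(1.0, frac)) * max_width, 1)
```

On this scale, a 128K-token window lands a little past the halfway mark rather than dwarfing every smaller model's bar.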
The context window is the total number of tokens a model can handle in one request, including both input and output. Max output tokens is the maximum length of the model's response alone. For example, a model with a 128K context window and 4K max output can accept up to 124K tokens of input but will only generate up to 4K tokens in its response.
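The input/output budget from the example above is simple arithmetic, sketched here with the same illustrative 128K/4K numbers (real limits come from each provider's documentation):

```python
# Illustrative limits from the example above, not any specific provider's specs.
CONTEXT_WINDOW = 128_000  # total tokens per request (input + output)
MAX_OUTPUT = 4_000        # cap on the generated response alone

def max_input_tokens(context_window: int, reserved_output: int) -> int:
    """Tokens left for the prompt once the desired output length is reserved."""
    if reserved_output > context_window:
        raise ValueError("reserved output exceeds the context window")
    return context_window - reserved_output

print(max_input_tokens(CONTEXT_WINDOW, MAX_OUTPUT))  # 124000
```

Reserving the full max-output budget up front is conservative; if you expect short responses, you can allot more of the window to input.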
Context window size depends on the model's architecture and training. The largest recent models, such as Gemini and some Claude variants, support context windows of 1M tokens or more, enabled by more efficient attention mechanisms. Older or smaller models may be limited to 4K-32K tokens. Larger contexts require more compute and memory per request, which is why they often correlate with higher pricing.
Match the model to your task requirements. For document analysis or code review, prioritize large context windows. For content generation, prioritize high max output tokens. For chat applications, moderate values for both usually suffice. Also consider that not all models perform equally well at their maximum capacity, so test with your specific workload.
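The matching advice above can be expressed as a filter over candidate models. The model names and limits below are hypothetical placeholders, not real provider specs; the check mirrors the definitions earlier in the text (input plus output must fit the context window, and the response must fit the output cap).

```python
# Hypothetical catalog for illustration only; real limits come from provider docs.
MODELS = [
    {"name": "small-chat",   "context": 32_000,    "max_output": 4_000},
    {"name": "long-context", "context": 1_000_000, "max_output": 8_000},
    {"name": "writer",       "context": 200_000,   "max_output": 64_000},
]

def suitable(models, need_input: int, need_output: int):
    """Return models whose window fits input + output and whose output cap fits the response."""
    return [m["name"] for m in models
            if m["context"] >= need_input + need_output
            and m["max_output"] >= need_output]

print(suitable(MODELS, 150_000, 8_000))  # ['long-context', 'writer']
```

A filter like this only screens on capacity; as noted above, you should still benchmark the surviving candidates on your actual workload, since quality can degrade near a model's maximum context.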