Compare context window sizes and output capacities across AI models. Larger context windows allow processing more text, code, or documents in a single request. Bar width uses a logarithmic scale.
Context window sizes and output capacities are sourced from provider API data. The blue bar shows the full context window, while the darker bar shows the maximum output token limit. Scale is logarithmic to better visualize the range from thousands to millions of tokens.
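The log-scale mapping described above can be sketched as a small helper. This is an illustrative implementation, not the chart's actual code; the 4K–2M token range and percentage width are assumptions chosen to cover the spread mentioned in the text.

```python
import math

def bar_width(tokens: int,
              min_tokens: int = 4_000,
              max_tokens: int = 2_000_000,
              max_width: float = 100.0) -> float:
    """Map a token count to a bar width (percent) on a log10 scale.

    Assumed range: 4K tokens maps to 0%, 2M tokens maps to 100%.
    A linear scale would make a 32K bar invisible next to a 1M bar;
    log10 keeps both readable.
    """
    lo, hi = math.log10(min_tokens), math.log10(max_tokens)
    frac = (math.log10(tokens) - lo) / (hi - lo)
    return round(max(0.0, min(1.0, frac)) * max_width, 1)
```

On this scale, a 128K-token window lands a little past the halfway mark rather than dwarfing every smaller model's bar.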
The context window is the total number of tokens a model can handle in one request, including both input and output. Max output tokens is the maximum length of the model's response alone. For example, a model with a 128K context window and 4K max output can accept up to 124K tokens of input but will only generate up to 4K tokens in its response.
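The input/output budget from the example above is simple arithmetic, sketched here with the same illustrative 128K/4K numbers (real limits come from each provider's documentation):

```python
# Illustrative limits from the example above, not any specific provider's specs.
CONTEXT_WINDOW = 128_000  # total tokens per request (input + output)
MAX_OUTPUT = 4_000        # cap on the generated response alone

def max_input_tokens(context_window: int, reserved_output: int) -> int:
    """Tokens left for the prompt once the desired output length is reserved."""
    if reserved_output > context_window:
        raise ValueError("reserved output exceeds the context window")
    return context_window - reserved_output

print(max_input_tokens(CONTEXT_WINDOW, MAX_OUTPUT))  # 124000
```

Reserving the full max-output budget up front is conservative; if you expect short responses, you can allot more of the window to input.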
Context window size depends on the model's architecture and training. The largest recent models, such as Gemini and some Claude variants, support context windows of 1M tokens or more, enabled by more efficient attention mechanisms. Older or smaller models may be limited to 4K-32K tokens. Larger contexts require more compute and memory per request, which is why they often correlate with higher pricing.
Match the model to your task requirements. For document analysis or code review, prioritize large context windows. For content generation, prioritize high max output tokens. For chat applications, moderate values for both usually suffice. Also consider that not all models perform equally well at their maximum capacity, so test with your specific workload.
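The matching advice above can be expressed as a filter over candidate models. The model names and limits below are hypothetical placeholders, not real provider specs; the check mirrors the definitions earlier in the text (input plus output must fit the context window, and the response must fit the output cap).

```python
# Hypothetical catalog for illustration only; real limits come from provider docs.
MODELS = [
    {"name": "small-chat",   "context": 32_000,    "max_output": 4_000},
    {"name": "long-context", "context": 1_000_000, "max_output": 8_000},
    {"name": "writer",       "context": 200_000,   "max_output": 64_000},
]

def suitable(models, need_input: int, need_output: int):
    """Return models whose window fits input + output and whose output cap fits the response."""
    return [m["name"] for m in models
            if m["context"] >= need_input + need_output
            and m["max_output"] >= need_output]

print(suitable(MODELS, 150_000, 8_000))  # ['long-context', 'writer']
```

A filter like this only screens on capacity; as noted above, you should still benchmark the surviving candidates on your actual workload, since quality can degrade near a model's maximum context.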