This page lists 5 AI models that can generate images from text prompts: 5 dedicated image generators and 0 text models with inline image output.
Purpose-built AI models designed specifically for image generation. These accept text prompts (and sometimes reference images) and produce high-quality visual output.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | GPT-5 Image | OpenAI | 96 |
| 2 | Nano Banana Pro (Gemini 3 Pro Image Preview) | Google | 88 |
| 3 | GPT-5 Image Mini | OpenAI | 79 |
| 4 | Nano Banana 2 (Gemini 3.1 Flash Image Preview) | Google | 69 |
| 5 | Nano Banana (Gemini 2.5 Flash Image) | Google | 63 |
Large language models that can generate images inline within a conversation. These combine text understanding with image creation in a single unified model.
No text models with image output available.
Purpose-built models like DALL-E, Stable Diffusion, and Flux that specialize in converting text descriptions into images. They typically produce higher-quality, more controllable visual output and support features like inpainting, outpainting, and style control. These models are optimized purely for image synthesis.
Large language models like GPT-4o and Gemini that can generate images as part of a text conversation. These unified models understand context, follow complex multi-step instructions, and can create images that are tightly integrated with textual reasoning. They trade some image quality for conversational flexibility.
Choose dedicated image models when you need the highest visual quality, fine-grained control over style and composition, batch generation, or specialized features like image editing, variations, and upscaling. They are ideal for design workflows, marketing assets, and creative production pipelines.
Choose text models with image output when you need images generated as part of a larger conversation, when the image needs to reflect complex contextual understanding, or when you want a single API call that handles both text and image responses. These are great for chatbots, content creation assistants, and agentic workflows.
How image generation models compare on cost. Prices are in USD per million tokens for API access, listed from lowest to highest output price.
| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|---|---|---|
| GPT-5 Image Mini | $2.50 | $2.00 |
| Nano Banana (Gemini 2.5 Flash Image) | $0.30 | $2.50 |
| Nano Banana 2 (Gemini 3.1 Flash Image Preview) | $0.50 | $3.00 |
| GPT-5 Image | $10.00 | $10.00 |
| Nano Banana Pro (Gemini 3 Pro Image Preview) | $2.00 | $12.00 |
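Because input and output tokens are priced separately, the cheapest model depends on the mix of tokens in a request. The following is a minimal sketch of that arithmetic, using only the prices from the table above. The function names `estimate_cost` and `cheapest_model` and the token counts in the usage lines are hypothetical, chosen for illustration; real token counts per image vary by provider and are not given on this page.

```python
# Prices in USD per one million tokens, copied from the pricing table.
# Each entry is (input_price, output_price).
PRICES_PER_MILLION = {
    "GPT-5 Image Mini": (2.50, 2.00),
    "Nano Banana (Gemini 2.5 Flash Image)": (0.30, 2.50),
    "Nano Banana 2 (Gemini 3.1 Flash Image Preview)": (0.50, 3.00),
    "GPT-5 Image": (10.00, 10.00),
    "Nano Banana Pro (Gemini 3 Pro Image Preview)": (2.00, 12.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cheapest_model(input_tokens: int, output_tokens: int) -> str:
    """Return the table entry with the lowest estimated cost for this token mix."""
    return min(
        PRICES_PER_MILLION,
        key=lambda name: estimate_cost(name, input_tokens, output_tokens),
    )


if __name__ == "__main__":
    # Token counts below are made-up workloads, not measured values.
    print(cheapest_model(2_000, 5_000))
    print(cheapest_model(1_000, 100_000))
```

Under these assumed workloads, the first call favors the model with the lowest input price, while the second, output-heavy call favors the model with the lowest output price, so it is worth estimating with your own token mix rather than reading a single column.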