Complete pricing breakdown for OpenAI's Whisper speech-to-text API. Compare costs per minute of audio for Whisper Large V3, gpt-4o-transcribe, and gpt-4o-mini-transcribe. Includes a cost calculator and comparison with Google Speech-to-Text, AssemblyAI, and Deepgram.
Whisper is OpenAI's automatic speech recognition (ASR) system, trained on 680,000 hours of multilingual audio data. It can transcribe speech in over 100 languages and translate non-English audio into English. The model is open-source under the MIT license, meaning you can self-host it for free or use OpenAI's hosted API for pay-per-minute pricing.
OpenAI offers several transcription endpoints: the original Whisper Large V3 and its faster Turbo variant, plus newer GPT-4o-powered transcription models (gpt-4o-transcribe and gpt-4o-mini-transcribe) that provide improved accuracy and cost savings.
| Model | $/min | $/hour |
|---|---|---|
| Whisper Large V3 | $0.0060 | $0.36 |
| Whisper Large V3 Turbo | $0.0060 | $0.36 |
| gpt-4o-transcribe | $0.0060 | $0.36 |
| gpt-4o-mini-transcribe | $0.0030 | $0.18 |
| Model | Input $/1M | Output $/1M |
|---|---|---|
| Gemini 3.1 Flash Lite Preview | $0.250 | $1.50 |
| Gemini 3.1 Pro Preview Custom Tools | $2.00 | $12.00 |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 |
| GPT Audio | $2.50 | $10.00 |
| GPT Audio Mini | $0.600 | $2.40 |
| Gemini 3 Flash Preview | $0.500 | $3.00 |
| Gemini 3 Pro Preview | $2.00 | $12.00 |
| Voxtral Small 24B 2507 | $0.100 | $0.300 |
| Gemini 2.5 Flash Lite Preview 09-2025 | $0.100 | $0.400 |
| GPT-4o Audio | $2.50 | $10.00 |
| Gemini 2.5 Flash Lite | $0.100 | $0.400 |
| Gemini 2.5 Flash | $0.300 | $2.50 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
| Gemini 2.5 Pro Preview 06-05 | $1.25 | $10.00 |
| Gemma 3n 4B (free) | Free | Free |
| Gemma 3n 4B | $0.020 | $0.040 |
| Gemini 2.5 Pro Preview 05-06 | $1.25 | $10.00 |
| Gemini 2.0 Flash Lite | $0.075 | $0.300 |
| Gemini 2.0 Flash | $0.100 | $0.400 |
Estimate your monthly Whisper API costs based on hours of audio processed. Prices are shown for all available Whisper and GPT-4o transcription models.
| Model | $/min |
|---|---|
| Whisper Large V3 | $0.0060 |
| Whisper Large V3 Turbo | $0.0060 |
| gpt-4o-transcribe | $0.0060 |
| gpt-4o-mini-transcribe | $0.0030 |
Note: Actual costs depend on the duration of the audio you process; file size matters only for the upload limit. OpenAI's Whisper API accepts files up to 25 MB per request. For longer files, split audio into segments. Batch API usage may offer additional discounts for non-time-sensitive workloads.
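The monthly estimate is just hours times 60 times the per-minute rate. The sketch below works that through with the rates from the table above. The dictionary keys are labels taken from this page, not necessarily the identifiers the API expects, and `monthly_cost` is a hypothetical helper name.

```python
# Per-minute rates in USD, copied from the pricing table on this page.
# Keys are display labels, not guaranteed API model identifiers.
PRICE_PER_MIN = {
    "Whisper Large V3": 0.006,
    "Whisper Large V3 Turbo": 0.006,
    "gpt-4o-transcribe": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
}


def monthly_cost(model: str, hours_per_month: float) -> float:
    """Estimated monthly cost in USD, rounded to cents.

    Ignores any batch discounts or free credits, which this page
    does not quantify.
    """
    minutes = hours_per_month * 60
    return round(PRICE_PER_MIN[model] * minutes, 2)


if __name__ == "__main__":
    for name in PRICE_PER_MIN:
        print(f"{name}: ${monthly_cost(name, 100):.2f} for 100 hours")
```

At 100 hours per month this gives $36.00 for the $0.006/min models and $18.00 for gpt-4o-mini-transcribe.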
See how OpenAI Whisper compares to other speech-to-text providers. All prices in USD per minute of audio.
| Model | $/min | $/hr |
|---|---|---|
| Whisper Large V3 | $0.0060 | $0.36 |
| Whisper Large V3 Turbo | $0.0060 | $0.36 |
| gpt-4o-transcribe | $0.0060 | $0.36 |
| gpt-4o-mini-transcribe | $0.0030 | $0.18 |
| Provider | $/min | $/hr | Free tier |
|---|---|---|---|
| Google Speech-to-Text (V2) | $0.016 | $0.96 | 60 min/month free |
| AssemblyAI | $0.0065 | $0.39 | 100 hours free trial |
| Deepgram (Nova-2) | $0.0043 | $0.26 | $200 free credit |
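To see how these list prices play out at a given volume, the following sketch ranks the providers by estimated monthly cost. It uses only the per-minute rates quoted on this page and deliberately ignores free tiers and credits, so treat it as a rough comparison rather than a quote. `rank_providers` is a hypothetical helper name.

```python
# List prices in USD per minute, as quoted in the tables on this page.
PROVIDER_PRICE_PER_MIN = {
    "OpenAI Whisper Large V3": 0.006,
    "OpenAI gpt-4o-mini-transcribe": 0.003,
    "Google Speech-to-Text (V2)": 0.016,
    "AssemblyAI": 0.0065,
    "Deepgram (Nova-2)": 0.0043,
}


def rank_providers(hours_per_month: float) -> list[tuple[str, float]]:
    """Return (provider, monthly USD cost) pairs, cheapest first.

    Free tiers, trial hours, and credits are not modelled.
    """
    minutes = hours_per_month * 60
    costs = [
        (name, round(rate * minutes, 2))
        for name, rate in PROVIDER_PRICE_PER_MIN.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, cost in rank_providers(100):
        print(f"{name:32s} ${cost:7.2f}")
```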
Unlike text LLMs that charge per token, Whisper charges per minute of audio input. You pay for the duration of the audio file, not the length of the transcript output. Audio shorter than one minute is billed at the per-minute rate. There are no separate input and output charges -- the per-minute price covers the full transcription.
OpenAI offers two transcription architectures: the original Whisper model and newer GPT-4o-powered models. Whisper Large V3 is open-source and well-tested across many languages. The gpt-4o-transcribe model offers better accuracy on noisy audio and complex accents, while gpt-4o-mini-transcribe provides the most affordable option at half the cost.
Since Whisper is open-source, you can run it on your own GPU hardware with no licensing or per-minute fees; you pay only for the hardware and its operation. A single NVIDIA A100 can process audio roughly 5-10x faster than real-time. Self-hosting becomes more cost-effective when processing more than ~50 hours of audio per month, depending on your GPU costs. The API is ideal for lower volumes or when you want zero infrastructure overhead.
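The break-even point depends heavily on what the GPU costs you. The sketch below is one simple way to model it, assuming a fixed monthly overhead plus an hourly GPU rate divided by the real-time speedup. All dollar figures for GPU time and overhead are hypothetical inputs, not numbers from this page; only the $0.36/hour API price and the 5-10x speedup range come from the text above.

```python
from typing import Optional

API_PRICE_PER_AUDIO_HOUR = 0.36  # $0.006/min * 60, from the pricing table


def selfhost_cost_per_audio_hour(gpu_hourly_usd: float, speedup: float) -> float:
    """GPU cost to transcribe one hour of audio at `speedup` x real-time."""
    return gpu_hourly_usd / speedup


def break_even_hours(
    fixed_monthly_usd: float,
    gpu_hourly_usd: float,
    speedup: float,
    api_per_hour: float = API_PRICE_PER_AUDIO_HOUR,
) -> Optional[float]:
    """Audio hours per month above which self-hosting is cheaper.

    Returns None when the marginal self-hosting cost is not below the
    API price, in which case self-hosting never breaks even.
    """
    marginal_saving = api_per_hour - selfhost_cost_per_audio_hour(
        gpu_hourly_usd, speedup
    )
    if marginal_saving <= 0:
        return None
    return fixed_monthly_usd / marginal_saving


if __name__ == "__main__":
    # Hypothetical inputs: $2/hour GPU rental, $8/month fixed overhead.
    print(break_even_hours(8.0, 2.0, speedup=10))  # roughly 50 hours
    print(break_even_hours(8.0, 2.0, speedup=5))   # None: API stays cheaper
```

With these hypothetical inputs the result swings from about 50 hours at 10x real-time to never breaking even at 5x, which is why the threshold quoted above should be read as a rough guide.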
Use gpt-4o-mini-transcribe for cost-sensitive workloads: at $0.003/min, it is 50% cheaper than standard Whisper. For batch processing, use the Batch API for potential discounts. Compress audio to reduce file sizes (Whisper works well with 16kHz mono audio). Remove silence from recordings before transcription to avoid paying for dead air.
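Two of these tips can be put into numbers. The sketch below estimates how many upload segments a recording needs under the 25 MB per-request limit for uncompressed 16-bit PCM audio, and how much trimming silence saves. It assumes 25 MB means 25 × 1024 × 1024 bytes, ignores container headers, and the function names are hypothetical.

```python
import math

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB


def pcm_bytes_per_second(
    sample_rate_hz: int = 16_000, channels: int = 1, bytes_per_sample: int = 2
) -> int:
    """Raw data rate of uncompressed PCM audio (defaults: 16 kHz mono, 16-bit)."""
    return sample_rate_hz * channels * bytes_per_sample


def segments_needed(duration_s: float, bytes_per_second: int) -> int:
    """Minimum number of uploads so each stays under the size limit."""
    total_bytes = duration_s * bytes_per_second
    return max(1, math.ceil(total_bytes / MAX_UPLOAD_BYTES))


def silence_savings(
    total_minutes: float, silent_fraction: float, price_per_min: float
) -> float:
    """USD saved by removing the silent share of a recording before upload."""
    return total_minutes * silent_fraction * price_per_min


if __name__ == "__main__":
    one_hour = 3600
    print(segments_needed(one_hour, pcm_bytes_per_second()))           # 16 kHz mono
    print(segments_needed(one_hour, pcm_bytes_per_second(44_100, 2)))  # 44.1 kHz stereo
    print(round(silence_savings(600, 0.2, 0.006), 2))
```

Under these assumptions, one hour of 16 kHz mono PCM fits in 5 uploads versus 25 for 44.1 kHz stereo, and trimming 20% silence from 600 minutes saves about $0.72 at $0.006/min. A compressed format would reduce the segment count further.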
OpenAI Whisper API pricing starts at $0.003 per minute of audio with gpt-4o-mini-transcribe, and $0.006 per minute for Whisper Large V3, Whisper Large V3 Turbo, and gpt-4o-transcribe. That works out to $0.18 to $0.36 per hour of audio, making it one of the most affordable speech-to-text APIs available.
In most cases, Whisper is cheaper than Google Speech-to-Text. Google Speech-to-Text V2 costs $0.016 per minute (standard model), which is roughly 2.7x more expensive than Whisper at $0.006/min. However, Google offers a cheaper data-logging model at $0.006/min and 60 free minutes per month. For high-volume usage, Whisper is typically more cost-effective.
Whisper Large V3 is OpenAI's open-source speech recognition model, available both via API and for self-hosting. gpt-4o-transcribe uses the GPT-4o model architecture for transcription, offering improved accuracy on complex audio, better punctuation, and superior accent handling. Both cost $0.006/min via the API, but gpt-4o-transcribe generally produces higher-quality transcriptions.
The Whisper model itself is open-source (MIT license) and can be run locally for free if you have the hardware. However, using OpenAI's hosted Whisper API costs $0.006 per minute. For the cheapest hosted option, gpt-4o-mini-transcribe is available at $0.003 per minute. Self-hosting Whisper on your own GPU can be more cost-effective for high-volume workloads exceeding ~50 hours/month.
Whisper API pricing ($0.006/min) is competitive with AssemblyAI ($0.0065/min) and slightly more expensive than Deepgram Nova-2 ($0.0043/min). However, gpt-4o-mini-transcribe at $0.003/min is the cheapest option among all major providers. Each service offers different strengths: Whisper excels at multilingual support, AssemblyAI includes built-in analytics, and Deepgram offers the lowest latency for real-time streaming.