OpenAI Whisper API Pricing

Complete pricing breakdown for OpenAI's Whisper speech-to-text API. Compare costs per minute of audio for Whisper Large V3, gpt-4o-transcribe, and gpt-4o-mini-transcribe. Includes a cost calculator and comparison with Google Speech-to-Text, AssemblyAI, and Deepgram.

Cheapest ($/min): $0.0030 (gpt-4o-mini-transcribe)
Per Hour (cheapest): $0.18 (gpt-4o-mini-transcribe)
Whisper Models: 4
Languages Supported: 100+

What is OpenAI Whisper?

Whisper is OpenAI's automatic speech recognition (ASR) system, trained on 680,000 hours of multilingual audio data. It can transcribe speech in over 100 languages and translate non-English audio into English. The model is open source under the MIT license, so you can self-host it without license fees or use OpenAI's hosted API at pay-per-minute pricing.

OpenAI offers several transcription endpoints: the original Whisper Large V3 and its faster Turbo variant, plus newer GPT-4o-powered transcription models. Of these, gpt-4o-transcribe targets improved accuracy at the same price as Whisper, while gpt-4o-mini-transcribe costs half as much.

OpenAI Whisper & Transcription Pricing

4 models

Model	Description	$/min	$/hour	$/10 hrs	Features
Whisper Large V3	Most accurate open-source speech recognition model. Supports 100+ languages.	$0.0060	$0.36	$3.60	100+ languages, Timestamps, Translation, Batch API
Whisper Large V3 Turbo	Faster variant of Large V3 with near-identical accuracy and lower latency.	$0.0060	$0.36	$3.60	100+ languages, Lower latency, Timestamps, Batch API
gpt-4o-transcribe	GPT-4o-powered transcription with improved accuracy on complex audio and accents.	$0.0060	$0.36	$3.60	Enhanced accuracy, Better punctuation, Accent handling, Streaming
gpt-4o-mini-transcribe	Most affordable option. Uses GPT-4o Mini for cost-effective transcription.	$0.0030	$0.18	$1.80	50% cheaper, Good accuracy, Fast processing, Streaming

Audio & Speech Models in Our Database

19 models with audio capabilities

Model	Provider	Input $/1M tokens	Output $/1M tokens	Modality
Gemini 3.1 Flash Lite Preview	Google	$0.250	$1.50	text+image+file+audio+video->text
Gemini 3.1 Pro Preview Custom Tools	Google	$2.00	$12.00	text+image+file+audio+video->text
Gemini 3.1 Pro Preview	Google	$2.00	$12.00	text+image+file+audio+video->text
GPT Audio	OpenAI	$2.50	$10.00	text+audio->text+audio
GPT Audio Mini	OpenAI	$0.600	$2.40	text+audio->text+audio
Gemini 3 Flash Preview	Google	$0.500	$3.00	text+image+file+audio+video->text
Gemini 3 Pro Preview	Google	$2.00	$12.00	text+image+file+audio+video->text
Voxtral Small 24B 2507	Mistral AI	$0.100	$0.300	text+audio->text
Gemini 2.5 Flash Lite Preview 09-2025	Google	$0.100	$0.400	text+image+file+audio+video->text
GPT-4o Audio	OpenAI	$2.50	$10.00	text+audio->text+audio
Gemini 2.5 Flash Lite	Google	$0.100	$0.400	text+image+file+audio+video->text
Gemini 2.5 Flash	Google	$0.300	$2.50	text+image+file+audio+video->text
Gemini 2.5 Pro	Google	$1.25	$10.00	text+image+file+audio+video->text
Gemini 2.5 Pro Preview 06-05	Google	$1.25	$10.00	text+image+file+audio->text
Gemma 3n 4B (free)	Google	Free	Free	text->text
Gemma 3n 4B	Google	$0.020	$0.040	text->text
Gemini 2.5 Pro Preview 05-06	Google	$1.25	$10.00	text+image+file+audio+video->text
Gemini 2.0 Flash Lite	Google	$0.075	$0.300	text+image+file+audio+video->text
Gemini 2.0 Flash	Google	$0.100	$0.400	text+image+file+audio+video->text

Whisper Cost Calculator

Estimate your monthly Whisper API costs based on hours of audio processed. Prices shown for all available Whisper and transcription models.

Model	$/min	10 min	1 hour	10 hours	100 hours	Monthly (2h/day)
Whisper Large V3	$0.0060	$0.06	$0.36	$3.60	$36.00	$21.60/mo
Whisper Large V3 Turbo	$0.0060	$0.06	$0.36	$3.60	$36.00	$21.60/mo
gpt-4o-transcribe	$0.0060	$0.06	$0.36	$3.60	$36.00	$21.60/mo
gpt-4o-mini-transcribe	$0.0030	$0.03	$0.18	$1.80	$18.00	$10.80/mo

Note: Actual costs depend on audio duration, not file size. File size still matters for uploads: OpenAI's Whisper API accepts files up to 25 MB per request, so longer files must be split into segments. Batch API usage may offer additional discounts for non-time-sensitive workloads.
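
The rows in the calculator table follow from multiplying the per-minute price by the minutes of audio. The sketch below reproduces that arithmetic in Python. It assumes a 30-day month, which is what the "Monthly (2h/day)" column implies, and the function names are illustrative only, not part of any OpenAI SDK.

```python
# Per-minute prices in USD, copied from the calculator table above.
PRICE_PER_MIN = {
    "Whisper Large V3": 0.006,
    "Whisper Large V3 Turbo": 0.006,
    "gpt-4o-transcribe": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
}


def transcription_cost(model: str, minutes: float) -> float:
    """Estimated cost in USD for `minutes` of audio, rounded to cents."""
    return round(PRICE_PER_MIN[model] * minutes, 2)


def monthly_cost(model: str, hours_per_day: float, days: int = 30) -> float:
    """Estimated monthly cost; `days=30` is an assumption, not a billing rule."""
    return transcription_cost(model, hours_per_day * 60 * days)


for model in PRICE_PER_MIN:
    print(f"{model}: ${monthly_cost(model, 2):.2f}/mo at 2 h/day")
```

Changing `hours_per_day` or `days` gives an estimate for other workloads; any Batch API discount would have to be applied separately.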

Whisper vs Alternatives -- Pricing Comparison

See how OpenAI Whisper compares to other speech-to-text providers. All prices in USD per minute of audio.

OpenAI Whisper

4 models

Model	$/min	$/hr
Whisper Large V3	$0.0060	$0.36
Whisper Large V3 Turbo	$0.0060	$0.36
gpt-4o-transcribe	$0.0060	$0.36
gpt-4o-mini-transcribe	$0.0030	$0.18

Alternatives

3 providers

Provider	$/min	$/hr	Free tier
Google Speech-to-Text (V2)	$0.016	$0.96	60 min/month free
AssemblyAI	$0.0065	$0.39	100 hours free trial
Deepgram (Nova-2)	$0.0043	$0.26	$200 free credit
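
To make the comparison concrete, the following Python sketch ranks the listed providers by per-minute price and expresses each as a multiple of the Whisper Large V3 rate. The prices are copied from the two tables above; the dictionary and function names are hypothetical, and free tiers are deliberately ignored.

```python
# USD per minute of audio, copied from the comparison tables above.
PROVIDER_PRICE_PER_MIN = {
    "OpenAI Whisper Large V3": 0.0060,
    "OpenAI gpt-4o-mini-transcribe": 0.0030,
    "Google Speech-to-Text (V2)": 0.016,
    "AssemblyAI": 0.0065,
    "Deepgram (Nova-2)": 0.0043,
}


def rank_by_price(prices):
    """Return (name, $/min) pairs sorted from cheapest to most expensive."""
    return sorted(prices.items(), key=lambda item: item[1])


def price_ratio(prices, name, baseline):
    """Price of `name` as a multiple of the price of `baseline`."""
    return prices[name] / prices[baseline]


for name, per_min in rank_by_price(PROVIDER_PRICE_PER_MIN):
    ratio = price_ratio(PROVIDER_PRICE_PER_MIN, name, "OpenAI Whisper Large V3")
    print(f"{name}: ${per_min:.4f}/min, ${per_min * 60:.2f}/hr, {ratio:.2f}x Whisper")
```

A ranking like this only reflects list prices; accuracy, latency, and free credits may matter more for a given workload.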

Understanding Whisper API Pricing

Per-Minute Billing

Unlike text LLMs that charge per token, Whisper charges per minute of audio input. You pay for the duration of the audio file, not the length of the transcript output. Audio shorter than one minute is billed at the same per-minute rate, in proportion to its length. There are no separate input and output charges: the per-minute price covers the full transcription.

Whisper vs GPT-4o Transcription

OpenAI offers two transcription architectures: the original Whisper model and newer GPT-4o-powered models. Whisper Large V3 is open-source and well-tested across many languages. The gpt-4o-transcribe model offers better accuracy on noisy audio and complex accents, while gpt-4o-mini-transcribe provides the most affordable option at half the cost.

Self-Hosting vs API

Since Whisper is open source, you can run it on your own GPU hardware without paying per-minute fees. A single NVIDIA A100 can process audio roughly 5-10x faster than real time. Self-hosting becomes more cost-effective when processing more than ~50 hours of audio per month (about $18/month of API spend at $0.36/hour), depending on your GPU costs. The API is ideal for lower volumes or when you want zero infrastructure overhead.
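
The break-even point depends on what the self-hosted setup costs each month. The Python sketch below is one way to estimate it, assuming a fixed monthly hosting cost (the $18 figure is a hypothetical input chosen to match the ~50 hour threshold, not a quoted GPU price) and the $0.36/hour API rate from the pricing table. The function names are illustrative.

```python
API_PRICE_PER_HOUR = 0.36  # Whisper Large V3 via the API, from the pricing table


def break_even_hours(fixed_monthly_cost: float,
                     api_price_per_hour: float = API_PRICE_PER_HOUR) -> float:
    """Audio hours per month above which self-hosting costs less than the API."""
    return fixed_monthly_cost / api_price_per_hour


def gpu_hours_needed(audio_hours: float, realtime_factor: float) -> float:
    """GPU time required when audio is processed `realtime_factor`x faster than real time."""
    return audio_hours / realtime_factor


# Hypothetical fixed cost of $18/month for the self-hosted setup.
hours = break_even_hours(18.0)
print(f"Break-even: about {hours:.0f} audio hours per month")

# At the 5-10x speed range quoted above, those hours need this much GPU time.
print(f"GPU time at 5x: {gpu_hours_needed(hours, 5):.1f} h")
print(f"GPU time at 10x: {gpu_hours_needed(hours, 10):.1f} h")
```

This ignores engineering time, storage, and idle GPU capacity, all of which push the real break-even point higher.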

Saving on Whisper Costs

Use gpt-4o-mini-transcribe for cost-sensitive workloads: at $0.003/min it costs half as much as standard Whisper ($0.006/min). For batch processing, use the Batch API for potential discounts. Compress audio to reduce file sizes and stay under the 25 MB request limit (Whisper works well with 16kHz mono audio); because billing is per minute, compression alone does not lower the charge. Remove silence from recordings before transcription to avoid paying for dead air.
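
As a rough illustration of how these savings combine, the Python sketch below estimates the cost after trimming silence and after switching models. The 20% silence share is a made-up example value, and the function name is hypothetical; actual savings depend on the recordings.

```python
WHISPER_PER_MIN = 0.006  # standard Whisper, from the pricing table
MINI_PER_MIN = 0.003     # gpt-4o-mini-transcribe, from the pricing table


def cost_after_trimming(total_minutes: float,
                        silence_fraction: float,
                        price_per_min: float) -> float:
    """Cost in USD once `silence_fraction` of the audio has been removed."""
    if not 0 <= silence_fraction < 1:
        raise ValueError("silence_fraction must be in [0, 1)")
    billable_minutes = total_minutes * (1 - silence_fraction)
    return round(billable_minutes * price_per_min, 2)


minutes = 100 * 60  # 100 hours of recordings
baseline = cost_after_trimming(minutes, 0.0, WHISPER_PER_MIN)
trimmed = cost_after_trimming(minutes, 0.2, WHISPER_PER_MIN)       # assumed 20% silence
trimmed_mini = cost_after_trimming(minutes, 0.2, MINI_PER_MIN)

print(f"Whisper, untrimmed:      ${baseline:.2f}")
print(f"Whisper, silence removed: ${trimmed:.2f}")
print(f"Mini, silence removed:    ${trimmed_mini:.2f}")
```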

Whisper API Pricing FAQ

OpenAI Whisper API pricing starts at $0.003 per minute of audio with gpt-4o-mini-transcribe, and $0.006 per minute for Whisper Large V3, Whisper Large V3 Turbo, and gpt-4o-transcribe. That works out to $0.18 to $0.36 per hour of audio, making it one of the most affordable speech-to-text APIs available.

Whisper is cheaper than Google Speech-to-Text in most cases. Google Speech-to-Text V2 costs $0.016 per minute (standard model), which is roughly 2.7x the price of Whisper at $0.006/min. However, Google offers a cheaper data-logging model at $0.006/min and 60 free minutes per month. For high-volume usage, Whisper is typically more cost-effective.

Whisper Large V3 is OpenAI's open-source speech recognition model, available both via API and for self-hosting. gpt-4o-transcribe uses the GPT-4o model architecture for transcription, offering improved accuracy on complex audio, better punctuation, and superior accent handling. Both cost $0.006/min via the API, but gpt-4o-transcribe generally produces higher-quality transcriptions.

The Whisper model itself is open-source (MIT license) and can be run locally for free if you have the hardware. However, using OpenAI's hosted Whisper API costs $0.006 per minute. For the cheapest hosted option, gpt-4o-mini-transcribe is available at $0.003 per minute. Self-hosting Whisper on your own GPU can be more cost-effective for high-volume workloads exceeding ~50 hours/month.

Whisper API pricing ($0.006/min) is competitive with AssemblyAI ($0.0065/min) and slightly more expensive than Deepgram Nova-2 ($0.0043/min). However, gpt-4o-mini-transcribe at $0.003/min is the cheapest option among the providers compared here. Each service offers different strengths: Whisper excels at multilingual support, AssemblyAI includes built-in analytics, and Deepgram offers the lowest latency for real-time streaming.