
Google Cloud STT Pricing 2026: $0.003–$0.016/min

Mamane B. Moussa | February 23, 2026 | Updated July 1, 2026 | 12 min read

TL;DR

Google Cloud Speech-to-Text V2 charges $0.016/min for real-time standard transcription, dropping to $0.010 and $0.008 at higher volumes. Dynamic Batch — the 24-hour-turnaround tier — costs roughly $0.003/min, an 80% cut versus real-time. Every account gets 60 free minutes per month with no expiration. Medical models (V1 only) are $0.078/min. If you have been quoting V1 rates or treating streaming and batch as interchangeable, your cost estimates are likely off by 50% or more.

Google Cloud Speech-to-Text V2 charges $0.016 per minute for real-time transcription and as low as ~$0.003 per minute for Dynamic Batch, the 24-hour-turnaround tier that most teams overlook. If you have been quoting V1 rates or treating streaming and batch as interchangeable, your cost estimates are likely off by 50% or more.

The Short Version

The V2 API (the current default) ships with Google's Chirp model family at no premium. Standard real-time recognition runs $0.016/min, dropping with volume to $0.004/min at 2M+ minutes per month. Dynamic Batch cuts that to roughly $0.003 to $0.004/min in exchange for results within 24 hours, a meaningful option for any non-time-sensitive workload. The legacy V1 API uses a separate rate card tied to data-logging opt-in; those numbers still appear on the pricing page but apply only if you are explicitly calling the V1 endpoints. The free tier gives every account 60 minutes per month, ongoing, plus a one-time $300 credit for new GCP customers. Medical models (V1 only) price separately at $0.078/min.

The single biggest pricing mistake: calling the streaming endpoint for pre-recorded audio. That adds roughly 50% (V1) or erases the Dynamic Batch discount entirely (V2). Match the endpoint to your latency requirement.

V2 API: The Current Pricing Model

The V2 API is Google's recommended path for new integrations. It includes Chirp, Google's universal model trained on 100+ languages, at the standard rate, with no surcharge for choosing the more capable model.

Real-time recognition (standard):

Monthly Volume	Rate per Minute
0 to 500,000 minutes	$0.016
500,001 to 1,000,000 minutes	$0.010
1,000,001 to 2,000,000 minutes	$0.008
2,000,000+ minutes	$0.004
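
As a rough sketch of how the table translates into a monthly bill, the function below assumes the tiers are graduated, meaning each band's rate applies only to the minutes that fall inside that band. That billing behavior is an assumption, not something the rate table states, and the names `V2_REALTIME_TIERS` and `v2_realtime_cost` are illustrative. The free 60 minutes are ignored here.

```python
from decimal import Decimal

# (upper bound of band in minutes, rate per minute); None = no upper bound.
# Rates come from the table above; graduated billing is an assumption.
V2_REALTIME_TIERS = [
    (500_000, Decimal("0.016")),
    (1_000_000, Decimal("0.010")),
    (2_000_000, Decimal("0.008")),
    (None, Decimal("0.004")),
]

def v2_realtime_cost(minutes: int) -> Decimal:
    """Estimate monthly cost, charging each band only for minutes inside it."""
    cost = Decimal("0")
    lower = 0
    for upper, rate in V2_REALTIME_TIERS:
        if minutes <= lower:
            break
        band_end = minutes if upper is None else min(minutes, upper)
        cost += (band_end - lower) * rate
        lower = band_end if upper is None else upper
    return cost

print(v2_realtime_cost(30_000))     # all minutes in the first band
print(v2_realtime_cost(1_500_000))  # 8000 + 5000 + 4000
```

If Google instead applies a single rate to the whole month once a threshold is crossed, the totals above would be lower at high volume, so check an actual invoice before relying on either model.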

Dynamic Batch (V2): Roughly $0.003 per minute for workloads that can accept up to 24-hour turnaround. This is the headline cost-reduction lever in V2 that did not exist in V1. For a podcast studio, an archive transcription project, or any overnight batch workflow, Dynamic Batch can reduce per-minute costs by 80% compared to real-time.

Billing is reported by most sources as rounding up to the nearest 15 seconds per request. A 16-second clip is billed as 30 seconds. At scale, with many short clips, that rounding adds up faster than the per-minute rate suggests. For longer files (interviews, episodes, call recordings), the rounding effect is negligible.
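
To see how much the rounding matters for a given mix of clip lengths, a minimal sketch follows. It takes the 15-second increment described above at face value; the function names are illustrative.

```python
import math

INCREMENT_SECONDS = 15  # rounding increment as reported in the text above

def billed_seconds(clip_seconds: float) -> int:
    """Round a single request's duration up to the next 15-second increment."""
    return math.ceil(clip_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS

def rounding_overhead(clip_lengths: list[float]) -> float:
    """Fraction of extra billed time relative to actual audio time."""
    actual = sum(clip_lengths)
    billed = sum(billed_seconds(s) for s in clip_lengths)
    return (billed - actual) / actual

print(billed_seconds(16))              # 30
print(rounding_overhead([16] * 1000))  # 0.875, i.e. 87.5% extra billed time
print(rounding_overhead([3600]))       # 0.0 for a one-hour file
```

A thousand 16-second clips are billed as if they were almost twice as long, while a one-hour recording sees no overhead at all.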

V1 API: Legacy Rates Still in Use

The V1 API remains supported for existing integrations. Its pricing is structured around data-logging consent rather than volume tiers.

Model	With Data Logging	Without Data Logging
Standard (batch and streaming)	$0.016/min (after 60 free min)	$0.024/min
Enhanced (phone, noisy audio)	$0.016/min (after 60 free min)	$0.024/min
Medical Dictation	$0.078/min	$0.078/min
Medical Conversation	$0.078/min	$0.078/min

Data logging means Google may use your audio to improve models. For non-sensitive content (public podcasts, marketing recordings), opting in cuts V1 costs by 33%. For anything touching PII, HIPAA, or financial data, data logging is off the table regardless of savings.

The V2 API does not use this same data-logging discount structure, which is one reason the per-minute rate looks identical ($0.016) but the mechanisms differ. If you are comparing your current invoice to published rates, confirm which API version your code calls.

Google Cloud STT: Cost per Minute by Mode (V2)

Mode	Rate per Minute
Dynamic Batch	~$0.003
High Volume (2M+ min)	$0.004
Mid Volume (1M min)	$0.008
Standard (up to 500K min)	$0.016

Rates as reported by Google Cloud pricing page and third-party sources, July 2026. Dynamic Batch rate is approximately $0.003–$0.004/min; it is listed here as $0.003.

Free Tier: What You Actually Get

60 minutes per month, ongoing. This is not a trial: it resets every month for the life of your account, with no expiration date. Both V1 and V2 usage count against this allowance.

$300 in GCP credits for new customers. This applies across all Google Cloud services, not just Speech-to-Text, and expires after 90 days. For testing purposes it is generous; for production planning, treat it as a one-time onboarding buffer, not an ongoing subsidy.

Billing requires a payment method on file even if you stay within the free tier. If your billing account lapses, API calls fail immediately.

Real Cost Scenarios

Scenario 1: Podcast Studio, 50 Hours Per Month

A team producing 50 hours of episodes per month for show notes and SEO content does not need results in under a second. Podcasts are pre-recorded, processed overnight, and the 24-hour window is fine.

V2 Dynamic Batch config:

Volume: 50 hours = 3,000 minutes/month
Rate: ~$0.003/min
Free tier credit: 60 minutes
Billable minutes: 2,940
Monthly cost: 2,940 x $0.003 = ~$8.82/month

Compare that to the V1 "data logging" rate the old conventional wisdom recommended: 2,940 x $0.016 = $47.04/month. Dynamic Batch is not a minor discount. It is a different pricing tier that cuts the same workload cost by roughly 80%. The trade-off is exclusively about turnaround time.
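
The arithmetic in this scenario can be reproduced with a few lines. This is a sketch only: it assumes a flat per-minute rate, a 60-minute free allowance subtracted once per month, and no rounding per request. `monthly_cost` is an illustrative helper, not part of any Google SDK.

```python
from decimal import Decimal

FREE_MINUTES = 60  # monthly free allowance described in this article

def monthly_cost(hours: int, rate_per_min: str) -> Decimal:
    """Cost of `hours` of audio at a flat per-minute rate, after the free tier."""
    billable = max(hours * 60 - FREE_MINUTES, 0)
    return billable * Decimal(rate_per_min)

batch = monthly_cost(50, "0.003")     # Dynamic Batch, approximate rate
realtime = monthly_cost(50, "0.016")  # $0.016/min rate used for comparison
saving = float(1 - batch / realtime)

print(batch, realtime)           # 8.820 47.040
print(f"saving: {saving:.0%}")   # saving: 81%
```

Swapping in a different number of hours or a different rate gives the same comparison for other workloads.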

If the team has a deadline and needs transcripts within an hour, they use standard real-time at $0.016/min. For next-day show notes, Dynamic Batch is the obvious choice.

If you just need transcripts without the GCP setup overhead, ConvertAudioToText offers flat-rate plans that remove the per-minute math entirely.

Scenario 2: Call Center, 500 Hours Per Month

A customer support team transcribing 500 hours of phone recordings monthly for QA review. Calls are already recorded and reviewed the following business day, so same-day turnaround is not required.

V2 Dynamic Batch config:

Volume: 500 hours = 30,000 minutes/month
Rate: ~$0.003/min
Billable minutes: 29,940
Monthly cost: 29,940 x $0.003 = ~$89.82/month

At real-time V2 standard rates, the same volume costs 29,940 x $0.016 = $479.04/month. The decision is entirely about whether review teams need transcripts the same day calls happen. Most QA workflows can absorb a 24-hour delay.

For comparison, at this volume Deepgram, a Google competitor (Nova-3, $0.0077/min), would run about $231 monthly at real-time rates, still more than Dynamic Batch but substantially less than Google real-time. See the full API pricing breakdown for the complete picture.

Scenario 3: Healthcare Platform, 1,000 Hours Per Month

A telemedicine platform transcribing 1,000 hours of consultations monthly. Medical vocabulary support and HIPAA compliance both matter.

V1 Medical Conversation config (V2 does not have a dedicated medical model):

Volume: 1,000 hours = 60,000 minutes/month
Rate: $0.078/min (data logging disabled for HIPAA)
Billable minutes: 59,940
Monthly cost: 59,940 x $0.078 = $4,675.32/month

That is roughly $56,100 per year. At this spend level, contacting Google Cloud sales for a custom enterprise rate is worth the conversation: volume agreements can reduce per-minute rates by 15 to 30%. Platforms at this scale should also benchmark whether a well-configured general model with custom vocabulary handling can match the Medical model's accuracy for their specific terminology.
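
To put the 15 to 30% range in dollar terms, here is a small sketch that annualizes the monthly figure from this scenario. The discount percentages are the ones quoted above and are hypothetical until a sales agreement confirms them; `annual_cost` is an illustrative name.

```python
from decimal import Decimal

MONTHLY_COST = Decimal("4675.32")  # 59,940 billable minutes x $0.078

def annual_cost(monthly: Decimal, discount_pct: int = 0) -> Decimal:
    """Twelve months of spend after a (hypothetical) percentage discount."""
    factor = Decimal(100 - discount_pct) / Decimal(100)
    return (monthly * 12 * factor).quantize(Decimal("0.01"))

for pct in (0, 15, 30):
    print(f"{pct:>2}% discount: ${annual_cost(MONTHLY_COST, pct):,}")
```

Under these assumptions a negotiated rate would be worth somewhere between about $8,400 and $16,800 per year.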

The V1-vs-V2 Decision

My take: for any new integration started today, use V2. The Chirp model is more capable across a broader language set, Dynamic Batch exists only in V2, and volume discounts on V2 are public and automatic rather than requiring a sales conversation. The only reason to stay on V1 is an existing integration that would require non-trivial refactoring.

V2 does not currently offer dedicated Medical models. If your application requires Google's Medical Dictation or Medical Conversation models, you are staying on V1. That is not a fringe case; it is the legitimate reason the V1 API remains supported.

Google Cloud vs Other APIs

Provider	Real-Time Rate (per min)	Batch/Economy Mode	Free Tier
Google Cloud STT V2	$0.016	~$0.003 Dynamic Batch	60 min/month
Google Cloud STT V1	$0.016 (with logging) / $0.024	Not available	60 min/month
AWS Transcribe	$0.024 (standard tier)	Lower volume tiers	60 min/month, 12 months only
Azure Speech	~$0.0167 ($1/hour)	$0.003 batch	5 hours/month
Deepgram Nova-3	$0.0077	$0.0065 (Growth plan)	$200 credit
AssemblyAI	~$0.006	~$0.0025 base tier	$50 credit
OpenAI Whisper	$0.006	$0.003 (gpt-4o-mini)	None

Google's V2 real-time rate of $0.016/min sits above Deepgram and AssemblyAI for pure per-minute cost. Where it pulls ahead is language breadth (125+ languages through Chirp), the free monthly allowance that never expires, and deep integration with the rest of GCP. The comparison with AWS Transcribe is particularly close on paper; the practical differences come down to ecosystem fit and which free tier structure matches your budget cycle.
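
For a quick side-by-side at your own volume, the sketch below ranks the real-time list rates from the comparison table. It is deliberately crude: approximate rates are treated as exact, and free tiers, credits, volume discounts, and batch modes are all ignored. The dictionary and function names are illustrative.

```python
# Real-time list rates per minute, copied from the comparison table above.
# Approximate ("~") entries are treated as exact for this rough ranking.
REALTIME_RATES = {
    "Google Cloud STT V2": 0.016,
    "Google Cloud STT V1 (no logging)": 0.024,
    "AWS Transcribe": 0.024,
    "Azure Speech": 0.0167,
    "Deepgram Nova-3": 0.0077,
    "AssemblyAI": 0.006,
    "OpenAI Whisper": 0.006,
}

def rank_by_cost(minutes: int) -> list[tuple[str, float]]:
    """Return (provider, monthly cost) pairs, cheapest first."""
    costs = [(name, round(rate * minutes, 2)) for name, rate in REALTIME_RATES.items()]
    return sorted(costs, key=lambda pair: pair[1])

for name, cost in rank_by_cost(30_000):
    print(f"{name:<34} ${cost:>8.2f}")
```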

For a broader look at how transcription APIs are priced, the Google/Deepgram comparison is a good case study in the difference between language breadth and per-minute efficiency.

When Google Cloud Speech-to-Text Makes Sense

You need 125+ languages. Chirp covers languages that most competitors do not offer at production quality. If your application handles Swahili, Javanese, Tagalog, or similar low-resource languages, Google is often the only viable hosted option.

Your infrastructure is already on GCP. Audio stored in Cloud Storage, results written to BigQuery, triggers via Cloud Functions, zero egress costs, native service account auth, and operational tooling already in place. The integration friction is effectively zero.

Medical transcription with Google's compliance posture. The Medical models carry HIPAA BAA coverage and healthcare-specific certifications that most competitors cannot match.

Enterprise compliance checkboxes. SOC 2 Type II, ISO 27001, FedRAMP: if your security team has a checklist, Google Cloud checks it.

When to Look at Alternatives

Cost-sensitive high volume without medical requirements. At 500 hours per month, Deepgram Nova-3 real-time ($0.0077/min) runs substantially cheaper than Google V2 real-time. The hidden costs of transcription services post walks through total-cost comparisons including infrastructure overhead.

English-only accuracy benchmarks. Some competitors pull ahead on English-specific word error rate. Deepgram Nova-3 and the larger Whisper models benchmark well on English content. If your content is exclusively English and WER is your primary metric, run your own test before assuming Google is best.

No-GCP infrastructure. Using Speech-to-Text requires a GCP project, service account, billing account, and SDK setup. That is manageable but non-trivial for a startup that is not otherwise a GCP customer. A standalone API like AssemblyAI or a turnkey tool like ConvertAudioToText's audio-to-text tool gets you from file to transcript faster if you are not building a custom pipeline. Flat-rate plans with no per-minute billing are on our pricing page.

Cost Reduction Checklist

Switch non-urgent workloads to Dynamic Batch. The V2 Dynamic Batch tier is the single highest-leverage cost reduction available. If results can wait 24 hours, there is no reason to pay real-time rates.

Convert stereo to mono before sending. Multi-channel audio is billed per channel, so a standard stereo recording doubles your cost if you do not convert first. Downsampling to mono 16 kHz before upload cuts storage, transfer, and API cost, typically without affecting accuracy.
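
A minimal standard-library sketch of the downmix step follows. It assumes 16-bit stereo PCM WAV input, averages the left and right channels, and does not resample (the sample rate is carried over unchanged). `stereo_to_mono` is an illustrative helper; for other formats or for resampling to 16 kHz, a dedicated audio tool is the more practical route.

```python
import array
import sys
import wave

def stereo_to_mono(src_path: str, dst_path: str) -> None:
    """Downmix a 16-bit stereo PCM WAV to mono by averaging left and right."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM input")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved L, R, L, R, ...; average each pair.
    mono = array.array(
        "h", ((l + r) // 2 for l, r in zip(samples[0::2], samples[1::2]))
    )
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Calling `stereo_to_mono("call.wav", "call_mono.wav")` (file names are placeholders) writes a single-channel file with the same duration and sample rate as the input.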

Confirm you are calling V2 endpoints for new code. The pricing calculator and documentation default to V2, but integrations copied from older examples may still point at V1 endpoints. Verify by checking the endpoint URL in your API calls.
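
For REST integrations, one way to run this check is to pull the version segment out of each endpoint URL found in your code or request logs. The sketch below assumes the version is the first path segment, as in the example URLs shown; those URLs are illustrative and `PROJECT_ID` is a placeholder. `api_version` is a hypothetical helper, not part of any Google library.

```python
from urllib.parse import urlparse

def api_version(endpoint_url: str) -> str:
    """Return the first path segment (e.g. 'v1' or 'v2') of a REST endpoint URL."""
    segments = [s for s in urlparse(endpoint_url).path.split("/") if s]
    if not segments or not segments[0].startswith("v"):
        raise ValueError(f"no version segment in {endpoint_url!r}")
    return segments[0]

# Example URLs are illustrative; PROJECT_ID is a placeholder.
print(api_version("https://speech.googleapis.com/v1/speech:recognize"))
print(api_version(
    "https://speech.googleapis.com/v2/projects/PROJECT_ID/"
    "locations/global/recognizers/_:recognize"
))
```

If you use a client library instead of raw REST calls, there may be no URL to inspect; in that case the version is usually visible in the module or package you import.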

Set billing alerts in the Cloud Console. Configure alerts at 50%, 80%, and 100% of your expected monthly spend. A retry loop or configuration bug can multiply your bill before you notice.

Use the Google Cloud Pricing Calculator to model your actual volume, API version, and feature set before committing.

Common Questions

How much does Google Cloud Speech-to-Text cost per minute in 2026?

The V2 API charges $0.016 per minute for real-time standard transcription, dropping to $0.010 and $0.008 per minute at higher monthly volumes. The Dynamic Batch mode costs approximately $0.003 to $0.004 per minute for workloads that can accept 24-hour turnaround. The V1 API charges $0.016 per minute with data logging enabled or $0.024 per minute without it. Medical models (V1 only) run $0.078 per minute.

What is the free tier for Google Cloud Speech-to-Text?

Every Google Cloud account gets 60 minutes of free transcription per month, on an ongoing basis, with no expiration. New GCP customers also receive $300 in credits (90-day expiration) that apply across all GCP services including Speech-to-Text. After the free 60 minutes, standard per-minute rates apply automatically. A billing account must be attached to your project to use the API, even within the free tier.

What is Dynamic Batch and how much does it cost?

Dynamic Batch is a V2-only processing mode that queues your transcription requests for lower-priority processing, delivering results within 24 hours. The rate is approximately $0.003 to $0.004 per minute, compared to $0.016/min for real-time standard. It is best suited for overnight batch jobs: podcast archives, next-day QA review, bulk transcription projects. For live captioning or any use case requiring results in seconds, use the standard real-time endpoint.

What is the difference between the V1 and V2 API for pricing?

V1 pricing uses a data-logging model: $0.016/min with data logging enabled or $0.024/min without it. V2 pricing uses volume tiers starting at $0.016/min, with no separate data-logging discount. V2 also adds Dynamic Batch and includes the Chirp model family at no premium. V1 is the only option if you need the Medical Dictation or Medical Conversation models. For any new integration, V2 is the recommended path.

How does Google Cloud Speech-to-Text pricing compare to Deepgram?

At real-time rates, Deepgram Nova-3 ($0.0077/min) is cheaper than Google V2 standard ($0.016/min) for identical workloads up to 500,000 minutes per month. Above 1M minutes per month, Google's volume tiers bring V2 down to $0.008/min and lower, making the comparison closer. Google's Dynamic Batch at roughly $0.003/min undercuts Deepgram for non-time-sensitive workloads. The practical choice depends on which languages you need (Google leads on breadth), your GCP footprint, and whether real-time latency is a requirement. See the speech-to-text API pricing comparison for a more detailed breakdown.

Sources

Google Cloud Speech-to-Text Pricing, official pricing page
CostBench: Google Speech-to-Text Pricing 2026, V1/V2 rate table
BrassTranscripts: Google STT Why Your Bill Doubles, infrastructure cost context
Smallest.ai: Google Cloud STT 2026 Review, V2 model overview
Deepgram Pricing, Nova-3 rates verified
Amazon Transcribe Pricing, AWS rates
Azure Speech Pricing, Azure rates
OpenAI Whisper Pricing 2026, Whisper/gpt-4o rates
