Fix Transcription of Numbers (When 'Fifteen' Becomes '50' and '$1,250' Becomes 'twelve fifty')

ConvertAudioToText Team | May 26, 2026 | 7 min read

Your Transcript Says "Fifteen" When the Speaker Said "50"

The speaker clearly said "fifty thousand dollars." The transcript shows "$15,000." The speaker said "two point five percent." The transcript shows "25%." The numbers are wrong, and they are wrong in a specific way that makes the transcript misleading rather than just imprecise.

Numbers in transcription are one of the most common failure modes. They mix with surrounding speech, have multiple valid renderings (digits or words), and often involve units (dollars, percent, dates) that the model has to parse correctly. The fixes are systematic.

Why AI Gets Numbers Wrong

Several specific failure modes account for most number errors.

Confusing Numerical Words

"Fifty" and "fifteen" sound similar in fast or accented speech. So do "thirty" and "thirteen," and "seventy" and "seventeen." The model has to pick one based on context and acoustic features, and it picks the wrong one more often than it does for most other kinds of words.
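
Because the teen/ty pairs are a known weak spot, one option is to flag every occurrence for a listen rather than trusting any of them. The following is a minimal sketch, not part of any tool named here: the function name and the rule for digit forms (a leading 13 to 19 or 30 to 90 followed by whole groups of three zeros) are assumptions made for illustration.

```python
import re

# Word pairs that are easy to mishear, plus their digit forms.
WORD_PAIRS = {
    "thirteen": "thirty", "fourteen": "forty", "fifteen": "fifty",
    "sixteen": "sixty", "seventeen": "seventy", "eighteen": "eighty",
    "nineteen": "ninety",
}
DIGIT_PAIRS = {"13": "30", "14": "40", "15": "50", "16": "60",
               "17": "70", "18": "80", "19": "90"}

# Build a two-way lookup: fifteen -> fifty and fifty -> fifteen.
LOOKUP = {}
for pairs in (WORD_PAIRS, DIGIT_PAIRS):
    for teen, ty in pairs.items():
        LOOKUP[teen] = ty
        LOOKUP[ty] = teen

TOKEN_RE = re.compile(r"[A-Za-z]+|\d[\d,]*\d|\d")


def flag_confusable_numbers(text):
    """Return (matched_text, likely_alternative, char_offset) tuples."""
    flags = []
    for match in TOKEN_RE.finditer(text):
        token = match.group(0)
        key = token.lower()
        if key in LOOKUP:  # plain forms such as "fifteen" or "50"
            flags.append((token, LOOKUP[key], match.start()))
            continue
        digits = token.replace(",", "")
        if not digits.isdigit():
            continue
        head, tail = digits[:2], digits[2:]
        # Assumption: only round amounts such as 15,000 or 50,000,000 are risky.
        if head in LOOKUP and set(tail) <= {"0"} and len(tail) % 3 == 0:
            alternative = f"{int(LOOKUP[head] + tail):,}"
            flags.append((token, alternative, match.start()))
    return flags
```

Each flagged item still needs a human ear; the script only narrows down where to listen.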

Misparsing Number Sequences

"Two five seven nine" intended as a phone number snippet becomes "2,579" as if it were a quantity. "Fifty thousand two hundred" becomes "52,200" or "fifty thousand, 200." The number parsing is fragile.
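
To make the two readings concrete, here is a small sketch that treats a run of single spoken digits as an identifier and everything else as a quantity. The helper names and the three-word threshold for a digit sequence are assumptions for illustration; real transcription engines use far more context than this.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}
SINGLE_DIGITS = {"oh": 0, **{w: v for w, v in UNITS.items() if v < 10}}


def words_to_int(phrase):
    """Parse a spoken quantity such as 'fifty thousand two hundred'."""
    total, current = 0, 0
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def render_number(phrase):
    """Digit-by-digit speech stays a sequence; anything else is a quantity."""
    words = phrase.lower().split()
    # Assumption: three or more single digits in a row is an ID or phone snippet.
    if len(words) >= 3 and all(w in SINGLE_DIGITS for w in words):
        return "".join(str(SINGLE_DIGITS[w]) for w in words)
    return f"{words_to_int(phrase):,}"
```

With these rules, "two five seven nine" stays "2579" and "fifty thousand two hundred" renders as "50,200".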

Wrong Unit Attribution

"Three percent" might become "3%" or "three percent" inconsistently. "Five dollars and fifty cents" might become "$5.50" or "5.50" or "5 dollars, 50 cents." Sometimes the unit also attaches to the wrong digits.

Date Format Confusion

"May fifteenth twenty twenty-six" might become "May 15, 2026" or "5/15/26" or "May 15th, 2026." The format is inconsistent across the transcript.

Decimal and Range Errors

"Point five" becomes ".5" or "0.5" or "five tenths." "Two to three hours" becomes "2-3 hours" or "two to 3 hours."

Fix 1: Enable Smart Formatting

Most modern transcription tools include smart formatting that handles number rendering more consistently. If your tool has this option, turn it on.

  • CATT audio-to-text: smart formatting on by default. Numbers render as digits in most contexts, and as words when the speaker emphasized them.
  • Deepgram Nova-3 with smart_format=true: strong number handling.
  • OpenAI Whisper API: reasonable defaults but no explicit smart formatting parameter.
  • AssemblyAI with text formatting (format_text) enabled: consistent number rendering.
  • AWS Transcribe: works but requires explicit configuration. Note that ShowSpeakerLabels controls speaker labeling, not number formatting.

If you are getting raw output without smart formatting, that is the first fix.
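
As one example of turning the option on, the sketch below calls Deepgram's pre-recorded endpoint with smart_format=true using only the standard library. The endpoint, header format, and response path reflect Deepgram's public REST documentation as best understood here; treat them as assumptions and check the current docs before relying on them. The function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-3", smart_format=True):
    """Build the request URL with smart formatting switched on."""
    query = urllib.parse.urlencode(
        {"model": model, "smart_format": str(smart_format).lower()}
    )
    return f"{DEEPGRAM_LISTEN_URL}?{query}"


def transcribe_file(path, api_key, content_type="audio/wav"):
    """Send a local audio file and return the formatted transcript text."""
    with open(path, "rb") as audio_file:
        audio = audio_file.read()
    request = urllib.request.Request(
        build_listen_url(),
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response shape: first channel, first alternative.
    return body["results"]["channels"][0]["alternatives"][0]["transcript"]
```

Other providers expose the equivalent setting under different names, so the same pattern applies with a different parameter.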

Fix 2: Find-and-Replace for Predictable Number Errors

Some number errors are consistent across your transcripts. Find-and-replace handles them.

Common substitutions:

fifteen million → 15 million
thirteen percent → 13%
two zero two five → 2025
seventy five thousand → 75,000

For dates and currencies that follow a consistent pattern, you can build a list of common errors and their corrections.

The limitation: many number errors are context-specific. "Twenty" may be the right rendering in "twenty people," while "20" may be right in "20 dollars." Find-and-replace cannot make that distinction.
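
A substitution list like the one above can be applied with a few lines of Python. This is a sketch under simple assumptions: matching is case-insensitive, words may be separated by spaces or hyphens, and only whole phrases are replaced. The function names are illustrative.

```python
import re

# Spoken form -> written form, taken from the substitution list above.
SUBSTITUTIONS = [
    ("fifteen million", "15 million"),
    ("thirteen percent", "13%"),
    ("two zero two five", "2025"),
    ("seventy five thousand", "75,000"),
]


def compile_rules(substitutions):
    """Turn each phrase into a whole-phrase, case-insensitive pattern."""
    rules = []
    for spoken, written in substitutions:
        words = [re.escape(word) for word in spoken.split()]
        # Accept "seventy five" and "seventy-five" alike.
        pattern = r"\b" + r"[\s-]+".join(words) + r"\b"
        rules.append((re.compile(pattern, re.IGNORECASE), written))
    return rules


def apply_rules(text, rules):
    """Apply every rule and report how many replacements each one made."""
    changes = []
    for pattern, written in rules:
        text, count = pattern.subn(lambda m, w=written: w, text)
        if count:
            changes.append((written, count))
    return text, changes
```

Returning the change counts makes it easy to spot a rule that fires far more often than expected, which usually means it is too broad.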

Fix 3: Manual Correction for High-Stakes Numbers

For content where numbers carry significant meaning (financial reports, scientific papers, legal documents, real estate listings), manual review of the numbers is the practical fix.

The pattern that works:

  1. Run the transcription.
  2. Scan the transcript for every number mentioned.
  3. Listen to each numerical mention in the audio and verify the transcript.
  4. Fix any that are wrong.

For a 30-minute meeting with 10-15 numerical mentions, this takes 5-10 minutes. It is the only reliable fix for high-stakes content.
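
Step 2 of the list above, scanning for every number, is easy to automate even though the listening is not. The sketch below builds a review checklist of number mentions with surrounding context. The regular expression and field names are assumptions; it will produce some false positives, such as "one" used as a pronoun.

```python
import re

NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
    "forty", "fifty", "sixty", "seventy", "eighty", "ninety", "hundred",
    "thousand", "million", "billion",
]
_WORD = "(?:" + "|".join(sorted(NUMBER_WORDS, key=len, reverse=True)) + ")"

# Either a digit form ($1,250 or 2.5%) or a run of number words.
NUMBER_RE = re.compile(
    rf"[$€£]?\d+(?:,\d{{3}})*(?:\.\d+)?%?|\b{_WORD}(?:[\s-]+{_WORD})*\b",
    re.IGNORECASE,
)


def build_review_checklist(transcript, context_chars=30):
    """List every number mention with enough context to find it in the audio."""
    items = []
    for match in NUMBER_RE.finditer(transcript):
        start = max(0, match.start() - context_chars)
        end = min(len(transcript), match.end() + context_chars)
        items.append({
            "number": match.group(0),
            "offset": match.start(),
            "context": transcript[start:end].strip(),
        })
    return items
```

If the transcript carries timestamps, adding the nearest one to each item would let the reviewer jump straight to the right point in the recording.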

Fix 4: Post-Process with an LLM

For better number handling without manual review, an LLM post-processing step can clean up number formatting:

Read this transcript and check every number for consistency. 
Fix any that appear to be misheard based on context. 
Render numbers as digits when used as quantities ($50, 25%, 1,500). 
Render as words when at the start of sentences or in informal speech.

Transcript:
[paste here]

GPT-4o and Claude both handle this reasonably. The LLM cannot fix errors that depend on the audio (it does not have the audio), but it can fix obvious formatting inconsistencies and impossible numbers (a stated "1,500 percent" was probably "15 percent").
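
Because the LLM works without the audio, it helps to know exactly which values it changed so those can be checked by ear. The sketch below builds the prompt and compares the digit values before and after cleanup. It makes no API call; the function names are assumptions, and the comparison only sees digits, so a number converted between words and digits also shows up as a change.

```python
import re
from collections import Counter

PROMPT_TEMPLATE = (
    "Read this transcript and check every number for consistency.\n"
    "Fix any that appear to be misheard based on context.\n"
    "Render numbers as digits when used as quantities ($50, 25%, 1,500).\n"
    "Render as words when at the start of sentences or in informal speech.\n"
    "\n"
    "Transcript:\n"
    "{transcript}"
)

NUM_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def build_prompt(transcript):
    """Fill the transcript into the cleanup prompt."""
    return PROMPT_TEMPLATE.format(transcript=transcript)


def extract_values(text):
    """Count each numeric value that appears in digit form."""
    return Counter(
        float(match.group(0).replace(",", "")) for match in NUM_RE.finditer(text)
    )


def changed_numbers(original, cleaned):
    """Return (values that disappeared, values that appeared) after cleanup."""
    before, after = extract_values(original), extract_values(cleaned)
    removed = sorted((before - after).elements())
    added = sorted((after - before).elements())
    return removed, added
```

Any value in either list is a candidate for a quick listen, since the model may have guessed.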

Specific Number Categories and Their Fixes

Different categories of numbers fail in different ways. The right fix depends on the category.

Money and Currency

Common errors: wrong amount (15 vs 50), wrong currency symbol, wrong decimal placement.

Fix: enable smart formatting. For high-stakes content (financial reports, contracts), manual verification.

Percentages

Common errors: wrong magnitude (3% vs 30%), wrong sign for changes.

Fix: enable smart formatting. Verify trends manually (a 30% increase reported as 3% changes the meaning entirely).
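
A simple range check can surface the most obviously wrong percentages before the manual pass. This sketch flags any percentage above a threshold; the 100 default is an assumption that will not suit every domain, and it cannot catch a plausible but wrong value such as 3% in place of 30%.

```python
import re

PERCENT_RE = re.compile(
    r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*(?:%|percent\b)", re.IGNORECASE
)


def flag_suspicious_percentages(text, max_plausible=100.0):
    """Return (matched_text, value) for percentages above the threshold."""
    flags = []
    for match in PERCENT_RE.finditer(text):
        value = float(match.group(1).replace(",", ""))
        if value > max_plausible:
            flags.append((match.group(0), value))
    return flags
```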

Dates

Common errors: wrong year (2025 vs 2026), wrong format, ordinal vs cardinal.

Fix: enable smart formatting. For document dates, manual verification.
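
Format inconsistency, unlike a misheard year, can be fixed mechanically. The sketch below rewrites two common variants into a single "May 15, 2026" style. It assumes US month/day order for slash dates and maps two-digit years into the 2000s; both are assumptions to adjust for your content, and it does nothing to verify that the date itself was heard correctly.

```python
import re
from datetime import datetime

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

LONG_DATE_RE = re.compile(
    rf"\b({'|'.join(MONTH_NAMES)})\s+(\d{{1,2}})(?:st|nd|rd|th)?,?\s+(\d{{4}})\b"
)
SLASH_DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})\b")


def _style(year, month, day):
    datetime(year, month, day)  # raises ValueError for impossible dates
    return f"{MONTH_NAMES[month - 1]} {day}, {year}"


def normalize_dates(text, century=2000):
    """Rewrite 'May 15th, 2026' and '5/15/26' as 'May 15, 2026'."""
    def long_sub(match):
        month = MONTH_NAMES.index(match.group(1)) + 1
        try:
            return _style(int(match.group(3)), month, int(match.group(2)))
        except ValueError:
            return match.group(0)  # leave impossible dates for manual review

    def slash_sub(match):
        month, day, year = (int(match.group(i)) for i in (1, 2, 3))
        if len(match.group(3)) == 2:
            year += century  # assumption: two-digit years are in the 2000s
        try:
            return _style(year, month, day)
        except ValueError:
            return match.group(0)

    return SLASH_DATE_RE.sub(slash_sub, LONG_DATE_RE.sub(long_sub, text))
```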

Phone Numbers and ID Numbers

Common errors: missing digits, wrong groupings, transposed digits.

Fix: manual verification required. Phone numbers and ID numbers are too specific for automatic correction.
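
Automatic correction is not possible here, but a digit count can at least show which numbers are certainly incomplete. This sketch assumes North American numbers (10 digits, or 11 with a country code) and ignores candidates with fewer than 7 digits; both thresholds are assumptions. A number with the right count can still have transposed digits, so it does not replace listening.

```python
import re

# A digit, then at least five digits or separators, then a final digit.
PHONE_CANDIDATE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def check_phone_numbers(text, expected_lengths=(10, 11), min_digits=7):
    """Return (matched_text, digit_count, length_ok) for each candidate."""
    results = []
    for match in PHONE_CANDIDATE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group(0))
        if len(digits) < min_digits:
            continue  # too short to be a phone number; probably something else
        results.append((match.group(0), len(digits), len(digits) in expected_lengths))
    return results
```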

Quantities and Measurements

Common errors: wrong magnitude ("2 grams" vs "20 grams"), wrong unit ("meters" vs "millimeters").

Fix: smart formatting helps. For technical content (medical, scientific, engineering), manual verification.

Ratios and Ranges

Common errors: range endpoints wrong, ratios reversed.

Fix: read the transcript aloud against the audio for any ratio or range.

Years and Decades

Common errors: "twenty twenty-six" vs "2026," "the eighties" vs "the 80s."

Fix: smart formatting handles most cases. For historical writing, decide on a consistent style and find-and-replace if needed.

Times

Common errors: 12-hour vs 24-hour format, missing AM/PM.

Fix: smart formatting handles most. For meeting transcripts where exact times matter (start/end times of agenda items), verify manually.

A Workflow for Number-Heavy Content

For users who regularly transcribe content with significant numerical content (financial podcasts, technical lectures, business meetings):

  1. Use a tool with smart formatting enabled by default.
  2. After transcription, do a numbers-only review pass: scan for every number, verify against audio if it matters.
  3. Maintain a substitution list for recurring number errors specific to your domain.
  4. For high-stakes content, have a second person verify the numbers.

This catches the failures that pure automation misses.

When Numbers Are Critical

Some content has numbers that absolutely must be right. Financial reports, scientific data, medical dosages, legal contracts, real estate transactions.

For these:

  • AI transcription as a first pass.
  • Manual verification of every number against audio.
  • Second-person verification for the highest-stakes content.
  • Optional: human transcription via Rev or specialized services for truly critical content.

The AI vs human transcription post covers when to escalate. For number-critical content, the answer is "AI for speed, human or careful manual review for accuracy."

Tools to Avoid for Number-Heavy Content

Some tools have particularly weak number handling and should be avoided for content where numbers matter:

  • Browser-based Web Speech API tools.
  • Older versions of any speech recognition system.
  • Free tier tools without smart formatting.
  • Generic open-source Whisper deployments without post-processing.

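For number-heavy content, our pipeline, Deepgram Nova-3, OpenAI Whisper API, or AssemblyAI are the right choices. Each renders numbers consistently, though not all of them format by default: Deepgram needs smart_format=true, and the Whisper API has no explicit smart formatting parameter (see Fix 1).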

The Underlying Problem and What Will Change

The number transcription problem is a model limitation that improves slowly. Numbers are sparser in training data than common words; the model has less to learn from. Better speech models will help, but the problem will not disappear in the next 12-18 months.

The future of transcription post covers where the model improvements are heading. For numbers specifically, the practical answer for 2026 is the workflow above: smart formatting plus a quick numbers review for content where they matter.

For most users, the fix is configuration plus a five-minute review pass per transcript. For users with very high-stakes numerical content, the fix includes a verification step that catches the remaining errors. Either way, the problem is solvable; you just have to expect numbers to need extra attention. Pair this with the accuracy fix post for the broader transcript quality picture.
