
Fix Transcription of Numbers: A Systematic Guide (2026)

By Mamane B. Moussa | May 26, 2026 | Updated July 2, 2026 | 10 min read


TL;DR

Number errors in AI transcription fall into a small set of repeatable patterns: confusable spoken forms ("fifteen" vs "fifty"), format inconsistency (digits vs words), and wrong unit attribution. The fix is layered: enable smart formatting in your tool first, then apply find-and-replace for domain-specific errors, then do a targeted manual pass only for numbers that carry real stakes. LLM post-processing handles the remaining format inconsistencies that smart formatting misses.

Number errors in transcription are fixable, but the fix depends on which of three root causes you are dealing with. The model heard the wrong number (a mishearing). The model heard correctly but rendered it wrong (a formatting error). Or the formatting is inconsistent across the transcript (a style error). Each has a different solution, and mixing them up wastes time.

This guide walks through each cause and its fix, in order of what to try first.

Why Numbers Fail Differently Than Other Words

Numbers are the category where AI transcription makes its most expensive mistakes. The errors are specific and repeatable.

"Fifteen" vs. "fifty" is the single most common number mishearing. The same pattern repeats across the teens and tens: thirteen/thirty, fourteen/forty, sixteen/sixty, seventeen/seventy. In fast speech or with any accent, these pairs share enough acoustic features that the model picks between them based on context, not just sound. If the surrounding sentence does not strongly favor one number, the model guesses and sometimes guesses wrong.
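Because these pairs are a closed set, you can flag them mechanically before deciding which ones deserve a listen. The sketch below is one way to do that, assuming a plain-text transcript. The `CONFUSABLE` table and the `flag_confusable` helper are hypothetical names, and the table covers only the pairs named above.

```python
import re

# Teen/ten pairs named in the text above, plus their digit forms.
CONFUSABLE = {
    "thirteen": "thirty", "fourteen": "forty", "fifteen": "fifty",
    "sixteen": "sixty", "seventeen": "seventy",
    "13": "30", "14": "40", "15": "50", "16": "60", "17": "70",
}
# Make the mapping symmetric so "fifty" also points back at "fifteen".
CONFUSABLE.update({v: k for k, v in list(CONFUSABLE.items())})

_TOKEN = re.compile(r"[A-Za-z]+|\d+")


def flag_confusable(text):
    """Return (token, likely_alternative, char_offset) for each risky token."""
    hits = []
    for m in _TOKEN.finditer(text):
        token = m.group().lower()
        if token in CONFUSABLE:
            hits.append((m.group(), CONFUSABLE[token], m.start()))
    return hits


for hit in flag_confusable("We raised rates by fifteen basis points, then 50 more."):
    print(hit)
```

A flag is not a correction. It only tells you where the model had a close alternative, so you know which timestamps to check against the audio.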

Number sequences create a second failure mode. "Two five seven nine" said as a reference number becomes "2,579" as a quantity. "One hundred fifty-two" might become "152" or "one hundred, 52." The model is trying to group digits into a meaningful number, and it does not always know whether you want a phone fragment, a quantity, or an identifier.
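If you already know a passage is a reference number rather than a quantity, you can sidestep the grouping problem by joining the spoken digits yourself. A minimal sketch, assuming the phrase has been isolated from the transcript; `DIGIT_WORDS` and `spoken_digits_to_id` are illustrative names, not part of any tool:

```python
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spoken_digits_to_id(phrase):
    """Join digit-by-digit speech into an identifier string, or return None."""
    words = phrase.lower().split()
    if not words or any(w not in DIGIT_WORDS for w in words):
        return None  # Not a pure digit sequence; leave it for other handling.
    return "".join(DIGIT_WORDS[w] for w in words)


print(spoken_digits_to_id("two five seven nine"))    # identifier, no comma
print(spoken_digits_to_id("one hundred fifty-two"))  # a quantity, so None
```

Returning a string instead of an integer keeps leading zeros and avoids the thousands separator that turns an identifier into "2,579."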

Unit attribution fails in a third specific way. "Five dollars fifty" might become "$5.50," "five-fifty," or "5 dollars, 50 cents" depending on the tool and whether smart formatting is on. The number gets the right digits but the format is wrong, or the decimal shifts.

Date and decimal parsing form the last major cluster. "May fifteen twenty twenty-six" might become "May 15, 2026," "5/15/26," or "May 15th" depending on what format the model defaults to. "Point five" might become ".5," "0.5," or "five tenths."

Fix 1: Enable Smart Formatting

Most number rendering errors go away when you turn on smart formatting. This is the first fix to try, and for many users it handles the bulk of the problem, roughly 80 percent as a rule of thumb.

Here is what each major tool actually provides, per current vendor documentation (checked 2026-07):

| Tool | Parameter | Default | What it handles |
|------|-----------|---------|-----------------|
| Deepgram | smart_format=true | Off | Numerals, currency, dates, times, phone numbers, emails (English; select other languages) |
| AWS Transcribe | Automatic (no parameter needed) | On for supported languages | Numbers, currency, dates, times, addresses |
| AssemblyAI | format_text=true | On | Text formatting including number rendering |
| OpenAI Whisper API | No dedicated parameter | Mixed | No built-in number formatter; use prompt or post-processing |

Deepgram's smart_format=true is the most explicit: passing the parameter in a batch or streaming request converts spoken numbers to their digit forms for English, including currency symbols, AM/PM times, and ordinal dates. AWS Transcribe applies this automatically for supported languages with no configuration required. AssemblyAI's format_text is on by default, so if you are getting raw output from AssemblyAI, check that it has not been disabled.

OpenAI Whisper is the outlier. There is no smart_format parameter. Its output defaults to a mix of digits and written-out words depending on context, and the documentation recommends using the prompt parameter to guide style or post-processing to fix consistency. If you need consistent number rendering from Whisper, treat that separately.

[Image: audio upload tool showing smart formatting output for numbers]

One nuance worth knowing: smart formatting resolves rendering, not mishearings. If the model heard "fifteen" when the speaker said "fifty," smart_format=true will render the wrong number cleanly as "15." The mishearing stays.
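For Deepgram, where the option is off unless you ask for it, the request might be built as follows. This is a sketch under assumptions: it targets Deepgram's pre-recorded REST endpoint with a hosted audio URL, `build_deepgram_request` is a hypothetical helper, and the endpoint and header format should be checked against the current Deepgram documentation before use. The function builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request


def build_deepgram_request(audio_url, api_key, smart_format=True):
    """Build (but do not send) a pre-recorded transcription request."""
    query = urllib.parse.urlencode({"smart_format": str(smart_format).lower()})
    return urllib.request.Request(
        url=f"https://api.deepgram.com/v1/listen?{query}",
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_deepgram_request("https://example.com/call.wav", "YOUR_API_KEY")
print(req.full_url)
# To send it: urllib.request.urlopen(req), then parse the JSON response body.
```

Keeping the flag in one helper makes it harder to ship a pipeline where smart formatting was silently left off.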

Fix 2: Find-and-Replace for Domain-Specific Errors

If you regularly transcribe content from the same domain, you will see the same number errors repeat. A financial podcast where the host always says "fifty basis points" will produce the same "15 basis points" error across episodes. Build a substitution list.

Common patterns worth building a list for:

fifteen basis points → 50 basis points  [verify against audio first]
thirteen percent → 13%
two zero two five → 2025
seventy-five thousand → 75,000

The limitation is real: find-and-replace cannot distinguish context. "Twenty" is the right rendering for "twenty people," and "20" is often right for "20 dollars" in a financial context. A blanket substitution of all instances of "twenty" to "20" will break valid occurrences. Use this fix for errors that are specific enough to a domain term that false-positive replacements are unlikely.
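One way to keep a substitution list safe is to match whole phrases on word boundaries and log every change. The sketch below assumes a plain-text transcript. The rules in `SUBSTITUTIONS` are examples only, and the "fifteen basis points" rule is left out on purpose because it needs an audio check first.

```python
import re

# Hypothetical domain list: (pattern, replacement). Longer, more specific
# phrases are safer than bare number words such as "twenty".
SUBSTITUTIONS = [
    (r"\btwo zero two five\b", "2025"),
    (r"\bseventy-five thousand\b", "75,000"),
    (r"\bthirteen percent\b", "13%"),
]


def apply_substitutions(text, rules=SUBSTITUTIONS):
    """Apply each rule case-insensitively; return the new text and a change log."""
    log = []
    for pattern, replacement in rules:
        text, count = re.subn(pattern, replacement, text, flags=re.IGNORECASE)
        if count:
            log.append((pattern, replacement, count))
    return text, log


fixed, changes = apply_substitutions(
    "Revenue hit seventy-five thousand in two zero two five, up thirteen percent. "
    "Twenty people joined."
)
print(fixed)
print(changes)
```

"Twenty people" is untouched because no rule targets the bare word. The change log gives you a short list to skim for false positives.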

Fix 3: LLM Post-Processing for Format Consistency

For format inconsistency across a long transcript (dates in three different formats, some numbers as words and some as digits with no consistent logic), an LLM post-processing pass cleans it up without requiring you to verify audio.

A prompt that works:

Read this transcript and fix number formatting for consistency.
Convert quantities to digits ($50, 25%, 1,500, 8:30 AM).
Write out numbers that start a sentence.
Fix obviously impossible numbers if context makes the correction clear.
Do not change any number you are uncertain about.

Transcript:
[paste here]

GPT-4o and Claude handle this reliably for formatting tasks. The important limit: the LLM sees only text. If the transcript has a mishearing, the LLM will correct formatting around the wrong number, not fix the mishearing. Use this for style, not fact-checking.
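Since the prompt tells the model not to change numbers it is unsure about, it is worth checking that instruction held. A rough sketch: compare the digit-form numbers before and after the LLM pass and list any that disappeared. `digit_numbers` and `changed_numbers` are hypothetical helpers. A legitimate reformat, such as "5/15/26" becoming "May 15, 2026," will also show up, so treat the output as a review list rather than an error list.

```python
import re
from collections import Counter

_NUM = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def digit_numbers(text):
    """Extract digit-form numbers, with thousands separators removed."""
    return [m.group().replace(",", "") for m in _NUM.finditer(text)]


def changed_numbers(before, after):
    """Return digit-form numbers present in `before` but missing from `after`."""
    missing = Counter(digit_numbers(before)) - Counter(digit_numbers(after))
    return sorted(missing.elements())


before = "We sold 1,500 units at $50 and fifteen returns."
after = "We sold 1,500 units at $15 and 15 returns."
print(changed_numbers(before, after))  # the $50 figure went missing
```

Numbers written as words in the original are ignored here, because converting them to digits is exactly what the formatting pass is supposed to do.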

Fix 4: Targeted Manual Verification for High-Stakes Content

For content where a wrong number has real consequences, a targeted listen-and-verify pass is the only reliable fix.

The workflow:

1. Run transcription with smart formatting enabled.
2. Search the transcript for every number: scan for digits, currency symbols, percentages, dates.
3. For each number that matters, jump to that timestamp in the audio and verify.
4. Fix any that are wrong.

For a 30-minute meeting with 10-15 numerical mentions, this takes 5-10 minutes. It is the only approach that catches mishearings that smart formatting cannot fix.
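The search step can be scripted if your tool exports timestamped segments. The sketch below assumes the transcript is a list of `(start_seconds, text)` pairs, which is a made-up shape that you would adapt to your tool's export format. `NUMBER` and `numbers_to_verify` are illustrative names.

```python
import re

# Digits with an optional currency symbol, thousands separators,
# decimal/time/date separators, and an optional percent sign.
NUMBER = re.compile(r"[$€£]?\d+(?:,\d{3})*(?:[.:/]\d+)*%?")


def numbers_to_verify(segments):
    """segments: iterable of (start_seconds, text). Yields (mm:ss, number, text)."""
    for start, text in segments:
        minutes, seconds = divmod(int(start), 60)
        stamp = f"{minutes:02d}:{seconds:02d}"
        for m in NUMBER.finditer(text):
            yield stamp, m.group(), text


segments = [
    (75.2, "Revenue was $1,500 last quarter."),
    (130, "No figures here."),
    (605.9, "Meet at 8:30 on 5/15, up 25%."),
]
for stamp, number, text in numbers_to_verify(segments):
    print(stamp, number, "|", text)
```

The result is a checklist of timestamps to jump to. It finds only numbers rendered as digits, so run it after smart formatting, not before.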

For the highest-stakes content, such as legal contracts, medical dosages, or published financial data, a second person doing the same pass independently is worth the time.

Number Categories and Their Specific Failure Modes

Different categories fail in different ways. Knowing which category you are dealing with points you to the right fix.

Money and Currency

Common errors: wrong magnitude (15 vs 50), missing currency symbol, decimal shift ("five fifty" parsed as 5.50 vs 550).

Fix: smart formatting first. For any financial content where figures will be cited or published, manual verification.

Percentages

Common errors: wrong order of magnitude (3% vs 30%), which changes the size of a trend tenfold. A 30 percent increase rendered as 3 percent is not a small inaccuracy.

Fix: smart formatting. Scan manually for any percentage that describes a change or a rate, since those carry the most meaning-shifting risk.
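To narrow the manual scan, you could list only the percentages that share a sentence with a change or rate word. This is a rough heuristic, not a classifier. The `CHANGE_WORDS` set is a hypothetical starting list to tune for your domain, and the sentence split is deliberately naive.

```python
import re

PERCENT = re.compile(r"\d+(?:\.\d+)?\s?(?:%|percent)")
# Hypothetical trigger words; extend this for your own content.
CHANGE_WORDS = {
    "increase", "decrease", "rose", "fell", "grew", "dropped",
    "rate", "growth", "up", "down",
}


def risky_percentages(text):
    """Return percentages found in sentences that mention a change or rate."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & CHANGE_WORDS:
            hits.extend(PERCENT.findall(sentence))
    return hits


sample = "Churn rose 30% last year. The team is 40% remote. Margins fell by 3.5 percent."
print(risky_percentages(sample))
```

The "40% remote" figure is skipped because nothing in its sentence signals a change. That is the trade-off: a shorter list to verify, at the cost of missing percentages phrased in ways the word list does not anticipate.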

Dates

Common errors: wrong year (2025 vs 2026), inconsistent format within the same transcript (May 15 in one paragraph, 5/15 in another), ordinal vs cardinal ("May fifteenth" vs "May 15").

Fix: smart formatting handles most cases. For any date in a legal or contractual context, verify manually.
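To see whether a transcript mixes date formats before deciding on a cleanup pass, a quick count per style is enough. The sketch below recognizes only three English styles, and `DATE_STYLES` and `date_style_counts` are illustrative names. Note that a fraction such as "1/2" will be counted as a numeric date, so read the totals as a hint.

```python
import re

MONTHS = (
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
)
DATE_STYLES = {
    "numeric (5/15/26)": re.compile(r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b"),
    "ordinal (May 15th)": re.compile(rf"\b{MONTHS} \d{{1,2}}(?:st|nd|rd|th)\b"),
    "cardinal (May 15)": re.compile(rf"\b{MONTHS} \d{{1,2}}\b"),
}


def date_style_counts(text):
    """Count dates per style, omitting styles that do not occur."""
    counts = {}
    for style, pattern in DATE_STYLES.items():
        found = len(pattern.findall(text))
        if found:
            counts[style] = found
    return counts


sample = "We met on May 15, 2026. The follow-up is 5/22/26, and the deadline is June 1st."
print(date_style_counts(sample))
```

More than one key in the result means the transcript is inconsistent, and a formatting pass is worth running.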

Phone Numbers and ID Numbers

Common errors: transposed digits, missing digits, wrong grouping (555-12-34 instead of 555-1234).

Fix: manual verification required. Phone numbers and identifiers are too specific for any automatic correction to be trusted. They are also not checkable from context.

Quantities and Measurements

Common errors: wrong magnitude ("2 grams" vs "20 grams"), wrong unit attached to the right number ("50 meters" when the speaker said "50 millimeters").

Fix: smart formatting helps with rendering. For technical content (medical, scientific, engineering), treat all measurements the same as high-stakes financial data: listen and verify.

Ratios and Ranges

Common errors: range endpoints transposed, ratio direction reversed ("two to one" vs "one to two").

Fix: read each ratio and range against the audio. These are semantically reversible and context does not always disambiguate.

Times

Common errors: 12-hour vs 24-hour format, missing AM/PM, "o'clock" dropped.

Fix: smart formatting handles most. For meeting transcripts where exact times correspond to agenda items, verify the ones that matter.

A Workflow for Number-Heavy Content

For anyone who regularly transcribes financial reports, technical lectures, earnings calls, or clinical recordings:

1. Use a tool with smart formatting enabled by default or with an explicit parameter to turn it on.
2. After transcription, do a numbers-only pass: search for digits and verify the ones that carry meaning.
3. Maintain a domain-specific substitution list for errors that recur across your content.
4. For published or legal content, build in a second-person verification step for any figure that will be cited.

This catches what pure automation misses without requiring you to re-verify every word.

When to Escalate to Human Transcription

Some content has numbers that simply cannot be wrong: regulatory filings, clinical trial data, legal depositions, real estate contracts.

For these, the practical stack is AI transcription as a first pass, careful manual review of all numbers against audio, and optional human transcription via a specialized service for the most critical passages. The AI vs human transcription post covers when that escalation makes economic sense.

If you are also dealing with broader accuracy problems beyond numbers, the fix for poor transcription accuracy covers the full picture. Numbers are a specific sub-problem; the general accuracy fixes are separate.

If you just need a clean first-pass transcript without a meeting bot or account setup, ConvertAudioToText applies smart formatting by default and works directly from an upload or URL, which is useful for one-off verification jobs.

FAQ

Why does my transcript say 'fifteen' when the speaker said 'fifty'?

The two words share acoustic features, especially in fast or accented speech: the stress lands on different syllables, but both begin with /f/ followed by a similar vowel. AI models pick between them based on acoustic probability and surrounding context. If the surrounding context does not strongly favor one number over the other, the model will guess wrong at some rate. Smart formatting will not fix this, because it only changes how the recognized number is rendered. The practical fix is a targeted listen-and-verify pass on any number that matters.

Does enabling smart formatting fix all number errors?

No. Smart formatting handles rendering consistency: it converts "fifty thousand dollars" to "$50,000" reliably once the model heard the right words. It cannot fix a mishearing. If the model heard "fifteen" when the speaker said "fifty," smart formatting will render the wrong number neatly. You still need a manual check for high-stakes figures.

When should I use LLM post-processing vs. manual correction for numbers?

Use LLM post-processing for format consistency issues across a long transcript: inconsistent date formats, mixed digit/word rendering, impossible numbers that context makes obvious. Use manual correction when the exact figure matters and an error would have real consequences, such as financial data, dosages, legal amounts, or technical specifications. LLMs can only work from the transcript text, not the audio, so they cannot resolve a mishearing.

Which transcription tools handle numbers best by default?

Deepgram with smart_format=true is the strongest for English: it handles numerals, currency, dates, phone numbers, and times in a single setting. AWS Transcribe applies number normalization automatically for supported languages with no extra parameter. AssemblyAI has format_text=true on by default and handles number rendering well. OpenAI Whisper has no dedicated number-formatting parameter; its output defaults to a mix of digits and words depending on context, and post-processing is the recommended fix for consistency.
