Transcription for Legal Discovery: Audio Evidence at Scale

Mamane B. Moussa | May 26, 2026 | Updated July 2, 2026 | 13 min read

Audio in the Discovery Pipeline

Audio is now a first-class ESI category in federal litigation. Under FRCP 26(b)(1), any nonprivileged matter relevant to a claim or defense is discoverable, and that scope sweeps in voice recordings the same as emails or spreadsheets. The practical consequence: a matter that looks like a document review problem often has hundreds of hours of audio sitting inside it, and listening file by file is not a realistic review strategy. Transcription is the tool that converts that audio into something searchable, codeable, and defensible.

What Audio Ends Up in Discovery

Modern matters routinely surface:

Recorded customer-service calls. Companies record calls for quality and training. Those recordings are discoverable in disputes touching the underlying customer relationships.

Voicemails. These include voicemails on individual devices and on corporate VoIP systems, and they are often overlooked at the collection stage.

Conference and video-meeting recordings. Zoom, Teams, and Meet recordings saved automatically to cloud storage are easy for opposing counsel to request and rarely reviewed internally before a lawsuit surfaces.

Voice memos. Employees increasingly use on-device voice recording apps for quick notes. These are personal-device ESI.

Surveillance audio. Audio captured at retail locations, by corporate security cameras, and on body cams worn by corporate security personnel.

Smart-device recordings. Amazon Alexa, Google Home, and similar logs have appeared in commercial and criminal proceedings.

Telephonic depositions and remote legal conference recordings. Audio files from proceedings conducted by phone or video link.

For any matter where these may be relevant, the litigation team needs a way to review the audio without listening through every file in real time.

The Discovery Workflow for Audio

A standard e-discovery workflow adapted for audio files:

Collection. Audio files are collected alongside documents, often through the same ESI collection process, whether that runs on a platform such as Relativity or OpenText or through outside e-discovery counsel. FRCP 34(b) permits the requesting party to specify a production format; absent a specification, the responding party produces audio in the form in which it is ordinarily maintained or in a reasonably usable form.

Processing. Audio files are converted to a standard format: MP3 for smaller file sizes or WAV for higher fidelity. The production format should be addressed in the ESI protocol agreed upon at the outset of the case.

Transcription. AI transcription converts audio to text for loading into the review platform. The transcript is treated as a work-product review aid, not the official record. The audio file itself is the primary deliverable for production.

Review. Reviewers read the transcript for relevance and listen to specific sections when accuracy matters. Time-stamped transcripts let reviewers jump directly to the relevant section without listening from the beginning.

Privilege screening. Audio files contain no privilege markings. Reviewers identify attorney-client communications or work-product content during transcript review. Files identified as privileged are withheld and entered on the privilege log.

Production. Audio files and, if agreed by protocol, accompanying transcripts are produced. The transcript is typically labeled as a working document; the audio drives the privilege and relevance calls.

Trial preparation. For audio that may be used as a trial exhibit, a certified court reporter or licensed transcriptionist attests to accuracy. That step is targeted and separate from the bulk review process.
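
The review step above depends on time-stamped transcripts. As a minimal sketch, assuming the transcription output can be loaded into segments with start and end times in seconds (the Segment shape, field names, and sample text here are hypothetical, not any vendor's format), a keyword search can return the timestamps a reviewer should jump to:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One time-stamped span of a transcript (hypothetical shape)."""
    start: float  # seconds from the start of the recording
    end: float
    speaker: str
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS for a reviewer-facing hit list."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_hits(segments: list[Segment], term: str) -> list[tuple[str, str, str]]:
    """Return (timestamp, speaker, text) for each segment containing the term."""
    needle = term.lower()
    return [
        (format_timestamp(seg.start), seg.speaker, seg.text)
        for seg in segments
        if needle in seg.text.lower()
    ]


# Made-up sample data for illustration only.
segments = [
    Segment(0.0, 12.4, "Speaker 1", "Thanks for calling, how can I help?"),
    Segment(3725.0, 3741.2, "Speaker 2", "I want a refund on the contract."),
]
hits = find_hits(segments, "refund")
```

Here hits holds one entry pointing at 01:02:05, so the reviewer can open the audio at that mark instead of listening from the beginning.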

AI Transcription as a Review Aid: Where It Fits and Where It Does Not

The most important line in discovery transcription is the one between review and record.

For bulk review, AI transcription is well-matched to the task. The accuracy rate from current AI models (roughly 95 to 98 percent on clear recordings, per generally available benchmarks) is sufficient to identify relevant material, spot privileged content, and flag files for closer attention. A reviewer reading an AI transcript can find what matters; they can then listen to the audio for the handful of files that matter substantively.

For the official record, AI transcription alone is not sufficient. Courts require human certification for transcripts offered as evidence. According to legal service providers and practitioners (CourtScribes, TranscribeMe, and attorneys at Duane Morris), most US federal and state courts will not admit an AI-only transcript unless a certified court reporter or licensed transcriptionist attests to its accuracy. The hybrid approach is the industry norm: AI handles the volume, and humans certify the specific files that may reach a judge or jury.

My take: the distinction matters more than most litigation teams acknowledge up front. The plan should be explicit from day one about which files are review-only transcripts and which are candidates for certified production.

Documentation is part of defensibility. Like any e-discovery process, the transcription workflow should be documented: what tool you used, how you validated accuracy on a sample, and what quality-control steps were applied. That record supports defending the work product if the process is challenged later.

Privilege Screening Across Audio at Scale

Audio creates a privilege-identification challenge that documents do not: there are no headers, no "To/From" lines, and no subject fields. The content of the recording is the only signal.

Under FRE 502(b), inadvertent disclosure of a privileged recording does not waive the privilege if the holder took reasonable steps to prevent the disclosure and promptly moved to rectify it. The December 1, 2025 amendments to FRCP 26(f) and 16(b) now require parties to address privilege-log disputes earlier in the case, which means privilege workflows that used to be deferred to mid-review need to be front-loaded.

Practical steps that work:

Search the transcript corpus first. Before any file-by-file review, run keyword searches across all transcripts for attorney names, law firm names, "privileged," "confidential," "legal strategy," and similar terms. Surface the probable privilege candidates before reviewers touch them.

Treat speaker identification as privilege-screening infrastructure. If the transcription includes speaker labels, use them. A recording involving outside counsel is a different class of risk than a recording involving only business staff.

Agree on the privilege log format for audio early. A privilege log entry for an audio file needs to describe duration, participants (to the extent identifiable), subject matter (in non-waiving terms), and the applicable privilege. Negotiate this at the ESI protocol stage. Some firms log audio by timestamp range if only part of a recording is privileged.

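Claw-back protection under FRE 502(d) and 502(e). For large audio corpora where complete pre-production privilege review is impractical, a clawback agreement between the parties (FRE 502(e)) that the court enters as an order under FRE 502(d) protects against waiver from inadvertent production. An agreement alone binds only the parties to it; the court order is what extends the protection further. This is standard in large matters; it should be standard in any matter with significant audio.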
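
The first step above, searching the transcript corpus before file-by-file review, can be sketched in a few lines. This is an illustration under stated assumptions: transcripts are already available as plain text keyed by file ID, and the term list is a placeholder that the case team would replace with real attorney and firm names.

```python
import re

# Illustrative terms only; a real list would come from the case team.
PRIVILEGE_TERMS = ["privileged", "confidential", "legal strategy", "outside counsel"]


def build_pattern(terms: list[str]) -> re.Pattern:
    """Compile one case-insensitive, whole-phrase pattern covering all terms."""
    # Longest terms first so a longer phrase wins over a shorter overlap.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def screen_transcripts(transcripts: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Map each file ID to the distinct terms it matched; skip files with no hits."""
    pattern = build_pattern(terms)
    flagged = {}
    for file_id, text in transcripts.items():
        matches = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if matches:
            flagged[file_id] = matches
    return flagged


# Made-up sample corpus for illustration only.
transcripts = {
    "call_001.wav": "Let's loop in outside counsel before we respond. This is privileged.",
    "call_002.wav": "Your order shipped on Tuesday.",
}
flagged = screen_transcripts(transcripts, PRIVILEGE_TERMS)
```

A keyword hit is only a candidate for privilege review, not a privilege determination. The point of the pass is to route probable candidates to the right reviewers before the bulk review starts.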

One caution flagged by Duane Morris in their February 2026 analysis of AI transcription tools: if privileged strategy discussions are recorded and transcribed through a third-party service, that data is potentially exposed to the vendor. Use a service with a data processing agreement in place and confirm the vendor does not train on client content.

The ConvertAudioToText audio upload tool: transcription output is delivered as searchable text, not a certified court record.

Authentication of Audio Recordings

Before any audio transcript is used in litigation, the underlying recording must be authenticated under FRE 901(a). The proponent must produce evidence "sufficient to support a finding that the item is what the proponent claims it is." Standard methods:

Witness testimony with knowledge (FRE 901(b)(1)): the person who made or received the recording testifies that it is what the proponent says it is.
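Distinctive characteristics (FRE 901(b)(4)): context and content of the recording. Voice identification by someone familiar with the speaker is addressed separately under FRE 901(b)(5).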
Process or system evidence (FRE 901(b)(9)): documentation showing the recording system produces accurate results.

Chain-of-custody documentation is the supporting structure. Courts increasingly expect preserved metadata: device identifier, recording timestamps, storage hash values, and ingestion audit logs.
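
As a rough illustration of the "storage hash values" point, the sketch below computes a SHA-256 digest at ingestion and records it with basic file metadata. The record fields and the custodian value are assumptions for the example, not a standard schema; a review platform will normally capture its own version of this.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large recordings are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path: str, custodian: str) -> dict:
    """Build one ingestion log entry (hypothetical field names)."""
    return {
        "file_name": os.path.basename(path),
        "size_bytes": os.stat(path).st_size,
        "sha256": sha256_of_file(path),
        "custodian": custodian,
        "ingested_at_utc": datetime.now(timezone.utc).isoformat(),
    }


# Demo with a throwaway file standing in for a collected recording.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"placeholder audio bytes")
    demo_path = tmp.name

record = custody_record(demo_path, custodian="J. Doe (hypothetical)")
os.remove(demo_path)
```

Re-hashing the file later and comparing against the stored digest is a simple way to show that the produced file is byte-for-byte the one that was collected.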

A pending advisory committee proposal, raised in 2025-2026 proceedings, would add a burden-shifting rule under proposed FRE 901(c): if a party shows it is more likely than not that audio evidence was fabricated or materially altered using AI, the burden shifts to the proponent to prove authenticity by a preponderance of the evidence. This proposal is not yet in effect, but litigation teams handling audio with uncertain custody chains should plan for it.

Foreign-Language Recordings

International matters frequently involve recordings in multiple languages. The workflow: transcribe in the source language first (the native-language transcript is the primary record), then apply machine translation for English-language reviewers. For specific recordings that may be admitted as evidence, certified human translation is a separate and later step. Volume makes AI-first the only practical approach for review; certified translation applies selectively to the files that matter.

Comparing Transcription Approaches for Discovery Volume

Approach	Best for	Pricing model	Limitations
Human transcription (e.g., Rev at $1.50-$1.99/audio min)	Certified record for trial exhibits, depositions	Per-minute, metered	Cost prohibitive at discovery volume; days for turnaround
E-discovery vendor bundled transcription	Large matters already on a vendor platform	Per-GB or per-doc review rate	Opaque cost; vendor lock-in
Standalone AI transcription	Bulk review corpus; privilege screening at scale	Flat monthly or metered	Not certified; review aid only
API-based AI engines (Whisper, Deepgram)	Automated pipelines, integration with review platforms	Per-second or per-minute metered	Requires technical build; no UI

The cost difference between human and AI for bulk review is substantial. Rev.com's published rate for human transcription is $1.50 per audio minute for standard turnaround (checked July 2026). A 200-hour audio corpus at that rate costs roughly $18,000. The same corpus through a flat-rate AI service is a fraction of that. The appropriate approach is not one or the other: AI for the review corpus, human certification for the specific files that will be used in court.
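
The arithmetic behind that figure, and the shape of the hybrid alternative, can be written out as a short sketch. The $1.50 rate and 200-hour corpus come from the paragraph above; the AI cost and the number of hours sent for certification are placeholders chosen only to show the calculation, not quoted prices.

```python
def per_minute_cost(audio_hours: float, rate_per_minute: float) -> float:
    """Cost of per-minute transcription for a given number of audio hours."""
    return audio_hours * 60 * rate_per_minute


def hybrid_cost(certified_hours: float, rate_per_minute: float, ai_cost: float) -> float:
    """AI for the whole corpus plus human transcription for selected files only."""
    return ai_cost + per_minute_cost(certified_hours, rate_per_minute)


# 200 hours x 60 minutes x $1.50 per minute = $18,000
full_human = per_minute_cost(200, 1.50)

# Placeholder inputs: 5 hours sent for certification, $100 of AI transcription.
hybrid = hybrid_cost(certified_hours=5, rate_per_minute=1.50, ai_cost=100.0)
```

With real numbers substituted for the placeholders, the same two functions give a quick budget comparison for the ESI protocol discussion.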

If you just need searchable transcripts of an audio corpus for review, without meeting-bot infrastructure or certified output, ConvertAudioToText's /tools/audio-to-text tool processes files directly and returns time-stamped, speaker-labeled text. For firms handling discovery in-house at volume, the Business plan includes API access for automated pipelines.

Production Considerations

When producing audio files in discovery:

Format. Address format in the ESI protocol. MP3 is standard for lower file sizes; WAV for higher fidelity. Requesting parties who want WAV should say so in the protocol; absent a specification, the responding party has discretion under FRCP 34(b).

Metadata. Preserve file metadata: creation date, duration, file size, and hash values. Standard e-discovery metadata practices apply.

Accompanying transcripts. Some protocols call for production of transcripts with audio; others specify audio only. Work this out at the protocol stage. If transcripts are produced, they should be clearly labeled as AI-generated working documents, not certified records.

Bates labeling. Audio files can be Bates labeled. Standards vary by platform; confirm your review platform handles audio Bates labeling before the protocol is finalized.

Privilege withholding and the log. Audio files identified as privileged are withheld and entered on the privilege log. Agree on the required log fields for audio at the beginning of the matter.
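
One way to pin down the log fields for audio is to write them as a record type before the protocol is negotiated. The sketch below is an assumption about what such an entry might hold, based on the fields this article lists (duration, participants, subject matter, privilege basis, and an optional timestamp range for partly privileged recordings). The class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AudioPrivilegeLogEntry:
    """One privilege log row for an audio file (hypothetical schema)."""
    control_number: str
    duration_seconds: int
    participants: list[str]
    subject_matter: str          # described in non-waiving terms
    privilege_basis: str
    withheld_start: Optional[int] = None  # seconds; None means whole file
    withheld_end: Optional[int] = None

    def __post_init__(self) -> None:
        start, end = self.withheld_start, self.withheld_end
        if (start is None) != (end is None):
            raise ValueError("set both withheld_start and withheld_end, or neither")
        if start is not None and not (0 <= start < end <= self.duration_seconds):
            raise ValueError("withheld range must fall inside the recording")

    @property
    def withheld_in_full(self) -> bool:
        """True when no timestamp range is given, so the whole file is withheld."""
        return self.withheld_start is None


entry = AudioPrivilegeLogEntry(
    control_number="AUD-0001",
    duration_seconds=1800,
    participants=["General Counsel", "VP Sales"],
    subject_matter="Request for legal advice regarding a customer dispute",
    privilege_basis="Attorney-client",
    withheld_start=600,
    withheld_end=900,
)
row = asdict(entry)  # plain dict, ready to export to the log spreadsheet
```

The validation in __post_init__ catches the most common logging mistake for partial withholding, a range that is incomplete or runs past the end of the recording.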

Best Practices for Discovery Transcription

Transcribe broadly, listen selectively. Use AI transcription for the full audio corpus so it is searchable. Listen to the files that matter substantively. This gives reviewers search access across the full corpus without the cost of listening in real time.

Document the workflow. Record which tool was used, how it was configured, and how the sample was validated. The documentation supports defending the work product if the process is challenged.

Validate accuracy on a sample. Before committing to the full corpus, listen to 10-15 randomly selected files and compare to the transcript. Note any systematic weaknesses: a specific accent, heavy background noise, overlapping speakers. Address those before they affect substantive calls.

Separate the review transcript from the certified record. AI transcripts are review documents. Flag early in the matter which files are likely trial candidates; plan for certified transcription on those specific files. This avoids the cost of certifying the entire corpus while protecting you on the files that matter.

Evaluate the vendor's data handling. For audio that includes privileged attorney-client communications or business strategy, confirm the transcription service does not use uploaded content for model training and offers a data processing agreement if the matter requires it.
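
The sample-validation practice above can be made measurable. The sketch below picks a reproducible random sample and scores an AI transcript against a human-checked reference using word error rate, a common metric that this article does not itself prescribe. The file names and sample sentences are made up, and the scoring ignores punctuation handling for brevity.

```python
import random


def pick_sample(file_ids: list[str], k: int, seed: int) -> list[str]:
    """Choose k files at random; a fixed seed makes the selection repeatable."""
    rng = random.Random(seed)
    return rng.sample(file_ids, min(k, len(file_ids)))


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


# Made-up corpus of 40 file names; sample 12 of them for manual checking.
corpus = [f"rec_{n:03d}.wav" for n in range(40)]
sample = pick_sample(corpus, k=12, seed=7)

# One dropped word out of six reference words.
wer = word_error_rate(
    "the contract was signed on friday",
    "the contract was signed friday",
)
```

Recording the seed, the sampled file list, and the per-file scores gives the workflow documentation something concrete to point to if the process is challenged.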

Building the Workflow for a New Matter

At the outset: Identify whether the matter has significant audio. Include audio in the ESI protocol discussion: format, metadata fields, accompanying transcripts yes or no, privilege log requirements for audio files.

During collection: Collect audio files using the same custodian-based process as documents. Don't leave voicemails and meeting recordings out of the collection scope because they're not "documents."

Early review phase: Run AI transcription on a sample. Validate accuracy. Confirm the output loads correctly into the review platform and is searchable. Set up privilege-screening searches across the transcript corpus.

Bulk review phase: Transcribe the full corpus. Reviewers use transcripts as the primary medium. Time-coded transcripts let reviewers listen to specific sections without scrubbing from the start.

Trial preparation: Identify the specific audio files that will be used as exhibits or in depositions. Commission certified transcription for those files. The AI transcripts remain available for preparation and cross-reference.

FAQ

Is an AI-generated transcript admissible as evidence in federal court?

Not as a certified record. Under current practice across US federal and state courts, transcripts offered as evidence must be certified by a qualified court reporter or licensed transcriptionist who attests to their accuracy. AI-only transcripts are review tools: they support discovery and preparation but do not substitute for a certified record. For trial exhibits or deposition use, a human certification step is required.

Does producing an AI transcript alongside an audio file affect privilege?

The transcript itself is a work product document created by counsel's litigation team as part of the review process. The privilege analysis of the underlying audio does not change because it was transcribed. The risk is different: if a transcription service stores uploaded content externally or uses it for model training, the privileged audio may be exposed to a third-party vendor. Use a service with a data processing agreement and verify their data retention and training policies before uploading privileged content.

What does FRCP 34 require for producing audio files?

Under FRCP 34(b), a party must produce ESI in the form specified by the requesting party, or, absent a specification, in the form in which it is ordinarily maintained or in a reasonably usable form. Audio files produced in their native format (MP3, WAV, M4A) satisfy the "reasonably usable form" standard in most circumstances. The parties should address audio format in the ESI protocol at the outset of the case to avoid disputes at production.

How do you handle inadvertent production of a privileged audio recording?

Under FRE 502(b), inadvertent disclosure does not waive privilege if the holder took reasonable steps to prevent it and promptly moved to correct the error. For large audio corpora, seek a 502(d) court order at the outset, typically by asking the court to enter the parties' clawback agreement as an order: this allows for return without waiver regardless of the level of care taken, which is a stronger protection than 502(b)'s "reasonable steps" standard. Log the discovery of the error, send notice promptly, and document the steps taken to correct it.
