Transcription for Legal Discovery: Recordings, Audio Evidence, and AI

ConvertAudioToText Team | May 26, 2026 | 9 min read

Discovery has gotten louder. Modern litigation involves volumes of audio that did not exist twenty years ago: voicemails, recorded customer calls, security camera audio, body camera footage, recorded meetings, voice memos, and the audio from countless video files. Each of these may be discoverable, may need to be reviewed, and may need to be produced. Transcription is the technology that makes this audio reviewable at the scale that modern matters require.

Important context first: ConvertAudioToText is suitable for transcribing audio that has been collected for discovery review and analysis. For audio that may be admitted as evidence at trial, court reporter or certified human transcripts may be required for the official record. For high-volume document review at scale, talk to your e-discovery vendor about their preferred transcription pipeline. This article addresses the working transcription that supports the review process.

What Audio Shows Up in Discovery

Modern matters routinely involve:

Recorded customer service calls. Most companies record calls for "quality and training purposes." These recordings may be discoverable in disputes involving the underlying customer relationships.

Voicemails. Voicemails on individual employee devices and on corporate VoIP systems.

Conference call recordings. Records of internal and external calls, often saved as audio files on corporate file servers.

Video meeting recordings. Zoom, Teams, Meet, and similar recordings. Often automatically saved to cloud storage where they are easy to discover but rarely reviewed.

Voice memos. Increasingly common as employees use phone-based recording tools for quick notes.

Surveillance audio. Security camera audio, retail location audio recordings, body cam audio from corporate security personnel.

Smart device recordings. Alexa, Google Home, and similar device recordings that may capture relevant conversations.

Telephonic depositions and conferences. Audio recordings of remote legal proceedings.

For any matter where these may be relevant, the litigation team needs a way to review the audio. Listening to hundreds of hours of audio file by file is not practical. Transcription makes the audio reviewable.

The Standard Discovery Workflow

A typical e-discovery workflow with audio:

Collection. Audio files are collected along with documents. Standard ESI collection processes (Relativity, OpenText, etc.) handle audio files.

Processing. Audio files are converted to a standard format for transcription. Most platforms convert to MP3 or WAV.

Transcription. Audio is transcribed via AI or human services. Output is text that can be loaded into the document review platform alongside emails, documents, and other materials.

Review. Reviewers see the transcript alongside the audio file. They can read the transcript to assess relevance and listen to specific sections when accuracy matters.

Production. Audio files (with associated transcripts) are produced in standard formats. The transcript is typically marked as a working document; the audio is the primary deliverable.

Trial preparation. Selected audio files are used for trial. Official transcripts (court reporter certified) may be produced for those specific files.
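The processing step above can be sketched in a few lines. This is a minimal illustration, not any platform's actual pipeline: it assumes the ffmpeg command-line tool is installed, and the 16 kHz mono WAV target, the folder names, and the helper name `build_ffmpeg_command` are assumptions chosen for the example. The function only builds the command, so the collected original is never modified.

```python
from pathlib import Path


def build_ffmpeg_command(source: Path, out_dir: Path,
                         sample_rate: int = 16000, channels: int = 1) -> list[str]:
    """Return an ffmpeg argument list that converts `source` to a WAV file.

    The 16 kHz mono default is an assumption for this sketch, not a
    requirement of any particular review platform.
    """
    target = out_dir / (source.stem + ".wav")
    return [
        "ffmpeg",
        "-n",                     # never overwrite an existing output file
        "-i", str(source),        # input file (the original is left untouched)
        "-vn",                    # drop any video stream, keep audio only
        "-ar", str(sample_rate),  # audio sample rate in Hz
        "-ac", str(channels),     # number of audio channels
        str(target),
    ]


# Hypothetical file names, for illustration only.
cmd = build_ffmpeg_command(Path("collected/call_0001.m4a"), Path("processed"))
print(" ".join(cmd))
```

To actually run the conversion, pass the list to `subprocess.run(cmd, check=True)`. Converting a copy rather than the original keeps the collected file available as the primary deliverable.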

Why AI Transcription Fits Discovery

Discovery transcription is well-suited to AI for several reasons:

Volume. Discovery often involves hundreds of hours of audio. Human transcription at $1.50/minute would cost tens of thousands of dollars; AI transcription handles the same volume at a flat monthly cost.

Speed. Discovery deadlines are tight. AI transcription returns results in minutes per file; human transcription takes days.

Sufficient accuracy for review. Transcripts that are 95-98% accurate are good enough for relevance review. Reviewers reading them can identify potentially relevant material, and the audio itself can be played for the small percentage of files that matter substantively.

Cost scaling. AI transcription cost is flat. Human transcription cost scales with volume.

For the actual evidence at trial (where accuracy matters more), targeted human transcription or court reporter certification of specific files is appropriate.

ConvertAudioToText's unlimited monthly plan at $9.99 covers high-volume discovery work. API access supports automated pipelines integrated with document review platforms.

Document Review Integration

Modern document review platforms (Relativity, Everlaw, DISCO, Reveal) support audio review workflows:

Transcript as searchable text. Transcripts loaded as text documents become searchable along with email and document content. Reviewers can find all audio files where a specific term is mentioned.

Audio playback. Reviewers can listen to specific sections of audio when transcript review surfaces a potentially relevant moment.

Time-coded transcripts. Transcripts with timestamps let reviewers jump directly to specific audio sections.

Speaker labels. Helpful for determining whose voice is on a recording. Critical for some matters where speaker identity affects relevance.

Translation. For non-English recordings, translated transcripts support cross-language review.

The integration is straightforward for most platforms. Your e-discovery vendor or in-house e-discovery team can set up the pipeline.
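To make the search and time-code features above concrete, here is a minimal sketch of a term search over time-coded, speaker-labeled transcript segments. The `Segment` structure, its field names, and the sample data are assumptions for illustration; real review platforms index transcripts in their own load formats.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    file_id: str   # hypothetical control number or file name
    start: float   # seconds from the start of the recording
    speaker: str   # speaker label assigned by the transcription tool
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a start time as HH:MM:SS so a reviewer can jump to it."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def find_term(segments, term):
    """Return (file, timestamp, speaker, text) for segments mentioning `term`."""
    needle = term.casefold()
    return [
        (s.file_id, format_timestamp(s.start), s.speaker, s.text)
        for s in segments
        if needle in s.text.casefold()
    ]


# Made-up sample segments.
segments = [
    Segment("CALL-0001", 12.0, "Speaker 1", "Thanks for calling, how can I help?"),
    Segment("CALL-0001", 3725.4, "Speaker 2", "I want to talk about the Pricing change."),
    Segment("VM-0042", 4.5, "Speaker 1", "Call me back about the contract."),
]

for hit in find_term(segments, "pricing"):
    print(hit)
```

The search is a plain case-insensitive substring match. Because AI transcripts contain errors, a term search over them will miss some mentions, which is one reason to spot-check audio on key files.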

Specific Discovery Scenarios

Employment matters. Recorded interviews, voicemails between employees and managers, audio from workplace investigations. Transcripts support review for context and specific allegations.

Commercial disputes. Recorded business calls, internal strategy meetings, customer interactions. Transcripts support identification of relevant business communications.

Antitrust matters. Conference calls between competitors (rare but always significant when present), trade association meetings, customer pricing discussions. Transcripts support pattern analysis across many recordings.

Securities matters. Earnings call recordings, internal financial discussions, advisor-client communications. Transcripts support specific quote identification.

Healthcare matters. Patient-provider communications (subject to additional privacy rules), internal clinical discussions, peer review communications. Special privacy considerations apply.

Government investigations. DOJ, SEC, FTC matters often involve substantial audio. The agency's specific document handling rules apply.

For each of these, the specific scenario dictates the appropriate transcription approach.

Privilege and Confidentiality

Audio in discovery often contains privileged content mixed with non-privileged content. Standard issues:

Identifying privileged audio. Audio files do not have privilege markings the way documents do. Reviewers identify privileged content during review based on the file's content.

Privilege review. Audio identified as privileged is withheld from production and listed on the privilege log.

Inadvertent disclosure. If a privileged audio file is inadvertently produced, the standard claw-back rules apply.

Joint defense and common interest. Audio from joint defense calls may have additional privilege protections; these need to be analyzed case-by-case.

The AI transcription itself does not change the privilege analysis. The transcription service is a third-party processor; the work product status of the transcript depends on the underlying material and how it is used.

ConvertAudioToText does not train on uploaded files and offers data processing agreements for matters where additional documentation is needed. Contact us about the unlimited plan.

Foreign Language Recordings

International matters often involve recordings in multiple languages. The standard workflow:

Native language transcription first. Transcribe the audio in its source language. We support Spanish, French, and dozens of others.

Translation to English. Machine translation provides an English version for reviewers who do not speak the source language. The native language transcript is the primary record.

Selective certified translation. For specific recordings that may be used as evidence, certified human translation may be required. This is a separate process from the AI transcription workflow.

The volume considerations matter most for multilingual matters. Human translation at scale is impractical; AI transcription plus machine translation enables review.

Comparing Tools for Discovery Volume

For large-scale discovery work, the relevant comparisons:

Service-tier human transcription. $1.50-3.00/minute. Appropriate for selected files but not for full discovery volume.

E-discovery vendor transcription. Many e-discovery vendors offer transcription as part of their service. Pricing varies; often built into the per-GB or per-document review pricing.

Standalone AI transcription. Lower cost, faster turnaround, sufficient accuracy for review. Otter.ai, Trint, and ConvertAudioToText all serve this category.

API-based AI transcription. For very large matters, automated pipelines via Whisper or Deepgram APIs. Most cost-effective at extreme scale.

ConvertAudioToText's unlimited plan plus API access fits firms handling discovery transcription in-house. For complex matters, your e-discovery vendor likely has a preferred pipeline.

Production Considerations

When producing audio files in discovery:

Format. Standard formats are MP3 (lower file size) or WAV (higher fidelity). Production format should be specified in the discovery protocol.

Metadata. File metadata (creation date, source, length) should be preserved. Standard e-discovery practices apply.

Accompanying transcripts. Transcripts may or may not be produced along with audio. Some protocols call for transcripts; others specify audio only.

Bates labeling. Audio files can be Bates labeled like other documents. Standards vary by platform.

Privilege withholding. Audio files identified as privileged are withheld; standard privilege log entries apply.

Special handling. Some matters have specific handling for audio (special access controls, separate production sets).

The production protocol should address audio explicitly. Your e-discovery counsel and vendor should work this out at the protocol negotiation stage.
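As an illustration of the metadata and Bates labeling points above, the sketch below builds a simple production manifest: one Bates number per audio file, plus file name, size, and a SHA-256 hash. The prefix `ABC`, the seven-digit width, the one-number-per-file convention, and the field names are all assumptions; the discovery protocol for the matter controls the real format.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large recordings are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files, prefix="ABC", start=1, width=7):
    """Assign sequential Bates numbers (one per file, an assumed convention)."""
    rows = []
    for offset, path in enumerate(sorted(files)):
        rows.append({
            "bates": f"{prefix}{start + offset:0{width}d}",
            "file_name": path.name,
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return rows


# Demonstration with two throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "voicemail_b.wav").write_bytes(b"second file")
    (folder / "voicemail_a.wav").write_bytes(b"first file")
    for row in build_manifest(folder.glob("*.wav")):
        print(row["bates"], row["file_name"], row["sha256"][:12])
```

Recording a hash at production time gives both sides a way to confirm later that a produced file has not changed.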

Cost Examples From Recent Matters

For context on what discovery transcription actually costs:

Mid-sized commercial dispute. 200 hours of audio (mostly recorded customer calls and internal meetings). AI transcription via the unlimited monthly plan cost about $40 over 4 months. Comparable human transcription would have cost $18,000-30,000 (12,000 minutes at $1.50-2.50 per minute).

Large employment class action. 500 hours of audio across multiple custodians. AI transcription via API access cost approximately $300 over the matter. Comparable human transcription would have been $45,000-75,000.

Antitrust investigation. 1,200 hours of audio across two years of regulator communications. Combined AI and selective human transcription was approximately $5,000 (with human review focused on the 50 highest-priority files). All-human transcription would have been $108,000-180,000.

The cost differences are dramatic. The accuracy difference (AI plus targeted human review vs. all-human) is small enough that the cost savings are well worth it for the review phase.
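The human-transcription estimates above follow from simple arithmetic. The sketch below reproduces them, assuming per-minute rates of $1.50 to $2.50 (the range the three estimates imply) and the $9.99 monthly plan price mentioned earlier. The function names are illustrative, and the API and mixed-workflow figures ($300 and $5,000) are not modeled here.

```python
def human_cost_range(hours, low_rate=1.50, high_rate=2.50):
    """Return the (low, high) human transcription cost in dollars for `hours` of audio."""
    minutes = hours * 60
    return minutes * low_rate, minutes * high_rate


def flat_plan_cost(months, monthly_price=9.99):
    """Cost in dollars of a flat monthly plan, independent of audio volume."""
    return months * monthly_price


for label, hours in [("commercial dispute", 200),
                     ("employment class action", 500),
                     ("antitrust investigation", 1200)]:
    low, high = human_cost_range(hours)
    print(f"{label}: {hours} h -> ${low:,.0f} to ${high:,.0f} for human transcription")

print(f"flat plan for 4 months: ${flat_plan_cost(4):.2f}")
```

The human cost grows linearly with hours of audio, while the flat plan depends only on how many months the matter runs.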

Best Practices for Discovery Transcription

Three practices that work:

Transcribe broadly, listen selectively. Use AI transcription for the full audio corpus so it is searchable. Listen to the specific files that matter substantively. This combines breadth with depth without prohibitive cost.

Document the transcription workflow. Like any e-discovery process, document what tools you used, what configurations you applied, and how you validated accuracy. The documentation supports defending the work product later.

Validate accuracy on key files. For files that will play a substantive role in the matter (trial exhibits, deposition prep), validate the AI transcript against the audio. Catch errors before they affect litigation decisions.
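One lightweight way to document the workflow is to write a log entry for every file transcribed. The sketch below is an assumption about what such an entry might contain, not a required format: the field names, the tool name `example-transcriber`, and the settings are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def log_entry(file_name, audio_bytes, tool, tool_version, settings, validated_by=None):
    """Build one processing-log record for a transcribed file."""
    return {
        "file_name": file_name,
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),  # ties the entry to the exact file
        "tool": tool,
        "tool_version": tool_version,
        "settings": settings,
        "validated_by": validated_by,  # stays None until someone checks the transcript against the audio
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical example values.
entry = log_entry(
    "call_0001.wav",
    b"example audio bytes",
    tool="example-transcriber",
    tool_version="0.0.0",
    settings={"language": "en", "speaker_labels": True},
)
print(json.dumps(entry, sort_keys=True))
```

Appending one JSON line per file to a log gives the team a record of which tool and settings produced each transcript and whether anyone validated it.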

A Realistic Workflow Setup

For a litigation team setting up discovery transcription for a new matter:

Week 1: confirm the matter has substantial audio. Talk to the e-discovery vendor about the integration approach.

Week 2: test the AI transcription on a small sample. Validate accuracy. Confirm the pipeline produces transcripts in the right format for the review platform.

Weeks 3-6: transcribe the full audio corpus. Files load into the review platform alongside other ESI.

Throughout review: reviewers use transcripts as the primary review medium and play back the audio for specific files where transcript review surfaces relevance.

Trial preparation: selected audio files may be transcribed by certified court reporters for the official record. AI transcripts remain available for analysis and preparation.
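The Week 2 sample validation can be made measurable with word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the AI transcript into a human-checked reference, divided by the length of the reference. The sketch below is a plain edit-distance implementation. Normalization is limited to lowercasing and whitespace splitting, which is an assumption; punctuation handling would need to be decided for a real comparison.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Made-up sample: the AI transcript dropped one word out of six.
reference = "the call was recorded on monday"
ai_transcript = "the call was recorded monday"
print(f"WER: {word_error_rate(reference, ai_transcript):.1%}")
```

As a rough guide, a WER of 2-5% on the sample corresponds to the 95-98% accuracy range discussed earlier, though vendors do not all measure accuracy the same way.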

Start With Your Active Matter

If your current matter involves substantial audio, audit it. Identify which files have been transcribed, which have been reviewed, and which have not. If significant audio remains unprocessed, AI transcription is the fastest path to making the audio reviewable. Start with a sample, validate accuracy, and scale.
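The audit can start as a simple comparison of two folders. This sketch assumes transcripts are saved as `.txt` files that share a base name with their audio file. That naming convention, the extension list, and the folder layout are illustrative assumptions.

```python
import tempfile
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}  # assumed set; extend as needed


def untranscribed(audio_dir: Path, transcript_dir: Path) -> list[str]:
    """Return the names of audio files that have no matching .txt transcript."""
    done = {p.stem for p in transcript_dir.glob("*.txt")}
    return sorted(
        p.name
        for p in audio_dir.iterdir()
        if p.suffix.lower() in AUDIO_EXTENSIONS and p.stem not in done
    )


# Demonstration with throwaway folders and empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    audio = Path(tmp) / "audio"
    transcripts = Path(tmp) / "transcripts"
    audio.mkdir()
    transcripts.mkdir()
    for name in ["call_0001.wav", "call_0002.wav", "vm_0003.mp3"]:
        (audio / name).touch()
    (transcripts / "call_0001.txt").touch()
    print(untranscribed(audio, transcripts))
```

The resulting list is the backlog to feed into the sample-then-scale approach described above.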
