
How to Convert MOV to Text: Transcribe iPhone & QuickTime Video
Why MOV Files Exist and Why You Have Them
MOV is Apple's video container, born out of QuickTime in the 1990s. Today, if you film with an iPhone, record your screen with QuickTime, or export from Final Cut Pro, you get a .mov file.
Technically, MOV and MP4 are siblings. Both can hold H.264 or H.265 video with AAC audio. The difference is mostly in metadata structure and in which codecs each container officially supports. For transcription purposes, MOV files work the same way MP4 files do: you upload, the tool extracts the audio, and you get text back.
The reason MOV needs its own guide: a few specific situations come up only with MOV files. Apple ProRes, hidden audio tracks, and "MOV recorded by an iPhone but won't open on a Windows PC" are all MOV-specific headaches.
Where Your MOV File Came From
iPhone videos are the biggest source. Open the Camera app, hit record, and you get a .mov file with H.265 or H.264 video and AAC audio.
QuickTime Player on macOS records .mov natively. This is what most Mac users use for quick screen recordings or webcam captures.
Final Cut Pro exports default to .mov with ProRes video and PCM audio. These are the highest-quality video files most consumers will ever encounter, often running to tens of gigabytes per hour.
Older DSLR cameras (Canon, Nikon, Panasonic from 2008-2018) recorded .mov with H.264 video and PCM or AC3 audio.
Browsers and some screen-capture tools also produce .mov files when run on macOS.
If the file ends in .mov and plays in QuickTime, transcribing it is straightforward.
Step-by-Step: Convert MOV to Text
Step 1: Verify the Audio Track
Open the MOV in QuickTime. Confirm you can hear audio. Some screen recordings are silent (no mic was active). Some iPhone videos record system audio plus mic, but if you had the phone on silent, the system audio is missing.
If QuickTime's editing view shows no waveform during playback, or the Movie Inspector (Window > Show Movie Inspector) lists no audio track, there is nothing to transcribe.
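The same check can be scripted. The sketch below assumes ffprobe (part of ffmpeg) is installed and that you pass its JSON output to the helper; the function names and the sample JSON are illustrative, not taken from any particular tool.

```python
import json


def ffprobe_audio_command(path: str) -> list[str]:
    """Build an ffprobe call that lists only the audio streams, as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=index,codec_name,channels",
        "-of", "json",
        path,
    ]


def audio_streams(ffprobe_json: str) -> list[dict]:
    """Return the audio stream entries found in ffprobe's JSON output."""
    return json.loads(ffprobe_json).get("streams", [])


# Hand-written sample outputs (hypothetical files, for illustration only).
sample_with_audio = '{"streams": [{"index": 1, "codec_name": "aac", "channels": 2}]}'
sample_silent = '{"streams": []}'

print("tracks in first file:", len(audio_streams(sample_with_audio)))
print("tracks in second file:", len(audio_streams(sample_silent)))
```

To run it against a real file, pass the command list to subprocess.run with capture_output=True and feed its stdout to audio_streams. An audio track can exist and still contain only silence, so a quick listen remains worthwhile.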
Step 2: Check the File Size
iPhone videos at 4K 60fps run about 400 MB per minute. A 30-minute iPhone clip can be 12 GB. Most online transcription tools cap uploads at 500 MB to 2 GB.
You have three options:
- Use a tool that handles large MOV files via direct upload. The MOV to text tool on CATT can chunk uploads or accept files up to 2 GB on the unlimited tier.
- Extract just the audio. Run ffmpeg -i video.mov -vn -c:a copy audio.m4a. A 30-minute 4K iPhone video becomes a 14 MB audio file.
- Re-encode the MOV to lower resolution. Useful only if you want to keep the video. For transcription only, audio extraction is faster.
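The size arithmetic and the extraction option above can be sketched in a few lines. This is a rough illustration under stated assumptions: the 64 kbps audio bitrate is inferred from the 14 MB figure (real iPhone audio may be encoded at a higher rate), and the helper names are hypothetical. Stream copy into .m4a only works when the source audio is already AAC, so the sketch falls back to re-encoding otherwise.

```python
def video_size_mb(minutes: float, mb_per_minute: float = 400.0) -> float:
    """Estimated video size, using the ~400 MB/minute figure for 4K 60fps."""
    return minutes * mb_per_minute


def audio_size_mb(minutes: float, kbps: float = 64.0) -> float:
    """Estimated audio size in MB for a given bitrate in kilobits per second."""
    bits = minutes * 60 * kbps * 1000
    return bits / 8 / 1_000_000


def extract_audio_command(src: str, codec_name: str, dst: str = "audio.m4a") -> list[str]:
    """Copy the audio stream if it is already AAC, otherwise re-encode to AAC."""
    audio_codec = "copy" if codec_name == "aac" else "aac"
    return ["ffmpeg", "-i", src, "-vn", "-c:a", audio_codec, dst]


print(video_size_mb(30))             # 30 minutes of 4K 60fps video, in MB
print(audio_size_mb(30))             # the same 30 minutes as 64 kbps audio, in MB
print(extract_audio_command("video.mov", "aac"))
print(extract_audio_command("prores.mov", "pcm_s16le"))
```

The gap between the two estimates is why audio extraction is usually the quickest way under an upload cap.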
Step 3: Upload the File
Drag the MOV into the transcription tool. The tool reads the QuickTime container, extracts the audio stream, and queues it for processing. You do not need to convert anything.
Step 4: Select the Spoken Language
Specify the language manually. Auto-detect works, but on short clips or accented speakers, manual selection adds 2-5 accuracy points. Whisper supports 99 languages including English, Spanish, French, and Japanese.
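If you run Whisper yourself instead of using a hosted tool, language selection is a command-line option. The sketch below assumes the open-source whisper command-line tool from the openai-whisper package is installed; the helper name and file name are hypothetical, and a hosted tool will expose this setting through its own interface instead.

```python
def whisper_command(audio_path: str, language: str, model: str = "large-v3") -> list[str]:
    """Build a whisper CLI call with the spoken language set explicitly.

    `language` is a code such as "en", "es", "fr", or "ja".
    """
    return ["whisper", audio_path, "--model", model, "--language", language]


print(whisper_command("audio.m4a", "ja"))
```

Leaving out the language option makes Whisper detect the language from the opening of the audio, which is where short or accented clips tend to go wrong.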
Step 5: Run the Transcription
A 30-minute MOV takes 3-5 minutes to process. Most of the time is audio extraction. The transcription itself is fast.
Step 6: Review and Export
Look for misheard names, dates, and technical terms. Export as TXT, DOCX, or SRT/VTT depending on what you need next. If your MOV was a video you plan to publish with captions, SRT is the right export.
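For readers curious what an SRT export contains, here is a minimal sketch that turns timed segments into SRT text. The (start, end, text) tuple layout is an assumption made for illustration; actual tools carry their own segment structures.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues."""
    cues = []
    for number, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(cues)


print(to_srt([(0.0, 2.5, "Welcome to the demo."), (2.5, 4.0, "Open the Settings app.")]))
```

Each cue is a sequence number, a time range, and the caption text, separated from the next cue by a blank line. VTT uses a similar layout with a period instead of a comma before the milliseconds.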
MOV-Specific Scenarios
iPhone Videos
iPhone records .mov with HEVC (H.265) by default on newer phones, or H.264 if you toggled "Most Compatible" in Settings > Camera > Formats. Both contain AAC audio in stereo. The audio quality is excellent at close distances and degrades fast at 6+ feet.
iPhone videos are often in portrait orientation. Orientation has no effect on the audio track, but some tools assume landscape video, so make sure yours accepts vertical files.
For event recording with iPhone, position the phone close to the speaker. Background noise (crowd, music, room reverb) drops accuracy noticeably.
QuickTime Screen Recordings
QuickTime Player records the screen with optional microphone audio. The MOV contains H.264 video and AAC audio. These transcribe well because the speaker is usually close to a USB mic or the built-in mic, so the audio is clean at the recording stage.
A typical use case: recording a software demo, then transcribing for documentation. The tutorial template is good for these because it understands "do X, then Y" instruction structure.
ProRes Footage from Final Cut
Final Cut Pro exports ProRes 422 or 422 HQ for editing workflows. These MOV files are huge (10-50 GB per hour) and contain uncompressed PCM audio. Transcription tools can process them, but the upload is the bottleneck.
Better path: in Final Cut, export an audio-only file (File > Share > Export File, called Master File in older versions, with Format set to Audio Only and AAC as the audio format), which gives you a small M4A. Transcribe that.
Older DSLR Footage
Canon 5D Mark III and similar DSLRs record .mov with H.264 video and either PCM or AC3 audio. The audio quality varies wildly because DSLR onboard mics are mediocre. If the production used a separate audio recorder (very common in professional video work), the MOV may have low-quality scratch audio that you should ignore in favor of the synced separate audio.
If you have separate audio files synced to your DSLR MOVs, transcribe the separate audio. Better quality, less noise.
What Happens Under the Hood
- Tool reads the QuickTime atom structure from your MOV.
- Identifies the audio track (usually 'mp4a' for AAC, 'lpcm' for uncompressed, 'ac-3' for older DSLR audio).
- Extracts the audio stream into a temporary container.
- Downsamples to 16 kHz mono if needed.
- Runs Whisper Large-v3 on 30-second chunks.
- Stitches results into a unified transcript.
The video stream is ignored throughout. The tool does not "watch" the video, only listens to it.
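Two of the steps above, downsampling and chunking, are easy to sketch. This is a simplified illustration rather than the tool's actual implementation: the helper names are hypothetical, and production pipelines often overlap chunks or split on silence instead of cutting at fixed 30-second marks.

```python
def downsample_command(src: str, dst: str = "audio_16k.wav") -> list[str]:
    """ffmpeg call that drops the video and writes 16 kHz mono audio."""
    return ["ffmpeg", "-i", src, "-vn", "-ac", "1", "-ar", "16000", dst]


def chunk_bounds(duration_s: float, chunk_s: float = 30.0) -> list[tuple[float, float]]:
    """Split a duration into consecutive (start, end) windows of chunk_s seconds."""
    duration_s = float(duration_s)
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


print(downsample_command("clip.mov"))
print(chunk_bounds(75.0))
```

A 75-second clip yields two full 30-second windows and one 15-second remainder. Each window would then go to the speech model, and the per-window text is joined in order.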
Practical Tips for Better MOV Transcription
Use a separate mic for important recordings. iPhone built-in mics are good for close talkers but pick up wind, handling noise, and room sound aggressively. A lavalier mic plugged into your phone (via Lightning or USB-C adapter) dramatically improves transcription accuracy.
Record in airplane mode. Notifications, calls, and background apps can introduce dropouts in long iPhone recordings.
For interviews, position the phone 12-18 inches from the speaker. Closer than that and you get plosives (the "puh" sound on Ps). Farther and you lose intelligibility to room noise.
Trim before uploading. iPhone videos often have 20-30 seconds of dead air at the start while you press record and walk into position. Trim those out in Photos or QuickTime before uploading. Saves processing time.
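If you prefer the command line to Photos or QuickTime, ffmpeg can do the same trim. A minimal sketch, assuming ffmpeg is installed; the helper name and file names are hypothetical. With stream copy nothing is re-encoded, so the cut snaps to a nearby keyframe and may not be frame-exact, which does not matter for transcription.

```python
def trim_start_command(src: str, skip_seconds: float, dst: str = "trimmed.mov") -> list[str]:
    """ffmpeg call that skips the first skip_seconds without re-encoding."""
    return ["ffmpeg", "-ss", str(skip_seconds), "-i", src, "-c", "copy", dst]


print(trim_start_command("interview.mov", 25))
```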
Use the right template for the content. A meeting in MOV form benefits from a meeting-style template. A research interview uses research interview. A podcast pre-call uses podcast episode. Generic transcription leaves value on the table.
MOV vs Other Video Formats
| Format | Audio Codec (typical) | Transcription Difficulty | Best Source |
|---|---|---|---|
| MOV (iPhone) | AAC stereo | Easy | Modern phones, QuickTime |
| MP4 | AAC mono/stereo | Easy | Most everything else |
| MOV (ProRes) | PCM | Easy but large files | Pro video editing |
| MKV | AAC, Opus, AC3 | Easy | Movies, archived TV |
| WebM | Opus | Easy | Browser recordings |
| AVI | MP3, AC3, PCM | Easy | Old camcorders |
MOV is among the cleanest sources for transcription because the audio codecs Apple chose (AAC, PCM) are well-supported and never DRM-locked for user-recorded files.
Common MOV Issues
"Cannot read this MOV file"
Some tools fail on ProRes MOVs or older MOVs with unusual audio codecs (Apple Lossless inside MOV, for example). Convert with ffmpeg:
ffmpeg -i in.mov -c:v copy -c:a aac out.mov
This re-encodes audio to standard AAC without re-encoding video.
File plays in QuickTime but not VLC
Likely an HEVC (H.265) issue on a system without the right codec. The audio is still fine, so extract it with ffmpeg and transcribe the audio file directly.
Transcript missing the first 5-10 seconds
Some iPhone MOVs have initial silence padding while autofocus and audio gain stabilize. The audio is technically there but the gain is so low it falls below voice detection. Either trim the start, or accept the loss and edit the transcript.
Two audio tracks, transcription picked the wrong one
iPhone records mono mic audio and sometimes adds a synchronized second track of stereo room audio. Check the tracks with ffprobe in.mov -show_streams -select_streams a. If your transcription tool supports stream selection, give it the right track number.
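If your tool has no stream selection, you can pull the track you want into its own file first. The sketch below assumes you add -of json to the ffprobe call so its output can be parsed; the helper names and the sample JSON are illustrative. ffmpeg's -map 0:a:N selects the Nth audio stream of the first input, counting from zero.

```python
import json


def describe_audio_tracks(ffprobe_json: str) -> list[str]:
    """Summarize each audio stream by its audio-relative position."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        f"a:{n} codec={s.get('codec_name', '?')} channels={s.get('channels', '?')}"
        for n, s in enumerate(streams)
    ]


def extract_track_command(src: str, track: int, dst: str = "track.m4a") -> list[str]:
    """ffmpeg call that writes only audio stream number `track`, encoded as AAC."""
    return ["ffmpeg", "-i", src, "-map", f"0:a:{track}", "-c:a", "aac", dst]


# Hand-written sample of ffprobe JSON output for a file with two audio tracks.
sample = (
    '{"streams": ['
    '{"index": 1, "codec_name": "aac", "channels": 1}, '
    '{"index": 2, "codec_name": "aac", "channels": 2}]}'
)

for line in describe_audio_tracks(sample):
    print(line)
print(extract_track_command("in.mov", 0))
```

Listen to each extracted track before transcribing, since which one holds the clearest speech depends on how the clip was recorded.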
MOV from Android does not open
Android does not natively record .mov files. If you have a .mov "from Android," it was probably renamed from .mp4. Try renaming the extension back to .mp4 and uploading.
Frequently Asked Questions
Is MOV better than MP4 for transcription?
Functionally identical. The container is different but the audio codecs inside (AAC, PCM) are the same as MP4. Pick based on convenience, not transcription quality.
Can I transcribe MOV files for free?
Yes. The CATT free tier accepts MOV files up to 60 minutes per month. For larger volume, pricing starts at $9.99/mo unlimited.
What is the maximum MOV file size?
Most tools cap at 2 GB. iPhone 4K videos can exceed this; extract audio first with ffmpeg or use a tool with larger upload limits. ProRes MOVs almost always require audio extraction first.
Does MOV transcription handle multiple languages?
Yes. Whisper supports 99 languages and language pairs. Specify the spoken language manually for best results, especially for code-switched audio where speakers alternate between languages.