
Recording In-Person Interviews: Setup Guide for Clean Audio
A clean in-person interview recording is the single biggest accuracy lever a journalist, researcher, or podcaster has. Get the audio right and a 90-minute interview transcribes in three minutes with 97%+ accuracy. Get it wrong and you spend two hours cleaning up errors that should never have happened. This guide covers the specific mic setups that work for one-on-one, two-person, and small-group interviews, with prices and exact placement.
Why In-Person Interviews Are Harder Than Studio Recording
Studio setups assume you control the room. In-person interviews happen in coffee shops, offices, cars, hotel lobbies, parks, and conference floors. You bring whatever fits in a backpack, and you have ninety seconds to set up before your subject gets self-conscious.
The constraints:
- Variable noise levels you cannot control
- Subjects who move during the conversation
- Time pressure (often a hard 30-60 minute window)
- A discomfort threshold (giant mics make people clam up)
The good news is modern AI transcription is forgiving enough that you do not need broadcast-grade audio. You need consistent close-mic technique and proper speaker separation. Both are solvable with $200-500 of gear.
The Three Setups That Cover Every Use Case
Setup 1: Single Lavalier on the Subject
The lowest-friction option for a one-on-one interview. Clip a lavalier mic to the subject's shirt, six to eight inches below their chin, on the side facing you. Run it into a portable recorder.
Hardware: Rode SmartLav+ ($79) or Movo LV4-O ($35) plus a Zoom H1n ($129) or even an iPhone with the right adapter.
Pros: Tiny visual footprint, subject forgets it is there after two minutes, audio is consistent regardless of head movement.
Cons: Only captures the subject. You are not recorded. For a journalist doing Q&A this is usually fine because your questions are short and you remember them.
Setup 2: Two Lavaliers, One Per Speaker (Recommended)
The setup most professionals use. Two lavalier mics, each on a separate channel of a stereo recorder. Each speaker gets their own audio track, which makes speaker diarization at transcription time nearly error-free.
Hardware: Two Movo LV4-O mics ($35 each) plus a Zoom H5 or H6 ($269-$399). Alternatively, a Rode Wireless Go II kit (two transmitters and one receiver, $299) if you want to go wireless.
Why this is worth the extra $100-$300 over Setup 1: at transcription time, each speaker is on a clean isolated track. Speaker diarization becomes deterministic instead of probabilistic, hitting near-100% accuracy even when speakers overlap. The math on time saved (no manual relabeling) usually justifies the gear cost on the second or third interview.
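If a transcription tool does not accept a two-channel file directly, the stereo recording can be split into one mono file per speaker first. The following is a minimal sketch using only Python's standard library; the function name and file paths are illustrative, and it assumes an uncompressed PCM WAV small enough to load into memory.

```python
import wave


def split_stereo(src_path, left_path, right_path):
    """Write the left and right channels of a stereo PCM WAV to two mono files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel (stereo) WAV file")
        width = src.getsampwidth()  # bytes per sample, e.g. 2 for 16-bit
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Frames are interleaved: L sample, R sample, L sample, R sample, ...
    frame_size = 2 * width
    n_frames = len(frames) // frame_size
    left = bytearray(n_frames * width)
    right = bytearray(n_frames * width)
    for b in range(width):
        # Copy byte b of every left/right sample with a strided slice.
        left[b::width] = frames[b::frame_size]
        right[b::width] = frames[width + b::frame_size]

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(bytes(data))
```

Calling `split_stereo("interview.wav", "interviewer.wav", "subject.wav")` would produce one file per lavalier, provided each mic was recorded on its own channel.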
Setup 3: Single Boundary Mic on the Table
The fastest setup. A single Shure MX391 or Movo PCC-2 boundary mic in the middle of the table picks up everyone seated around it. No clipping mics on people, and almost no setup time.
Hardware: Boundary mic ($35-150) plus the Zoom H1n or H5.
Pros: Zero setup with the subject, very small visual footprint, works for groups up to four people.
Cons: Picks up everything else on the table (taps, papers, coffee cups) and gives worse speaker diarization than per-speaker mics. Diarization on a single mixed file with three or more speakers drops to about 70-85% accuracy.
Use this when subjects refuse to wear lavaliers or when you have a roundtable format with too many people for individual mics.
Mic Placement Specifics
For lavaliers:
- Clip to the lapel or chest pocket area, six to eight inches below the chin
- Angle the capsule slightly upward, toward the speaker's mouth
- Avoid loose collars, scarves, or jewelry that brushes against the mic
- Test for fabric rustle by asking the subject to move their head left and right while you watch the levels
For boundary mics on a table:
- Center of the table, equidistant from all speakers if possible
- On a folded napkin or notebook to dampen table thumps
- At least 18 inches from anyone's hands to avoid finger taps
- Away from laptops with fans
For handheld mics (the dynamic mic stick approach journalists use):
- Six to eight inches from the speaker's mouth
- Off-axis (angled to the side, not directly at the mouth) to reduce plosives
- Move the mic between speakers in the gap between a question and its answer, not in the middle of an answer
Recorder Settings That Matter
On the Zoom H-series or any similar field recorder:
- Format: WAV 44.1kHz 16-bit minimum. This is overkill for transcription but gives you headroom if you also use the recording for podcast distribution.
- Channels: Separate L and R if using two mics. Make sure each lavalier is on its own channel, not summed to mono.
- Gain: Set it during a 30-second test so that peaks land between -12 and -6 dB on the recorder's meter. Finish adjusting before the interview starts; touching the gain knob mid-interview puts handling noise on the recording.
- Limiter: On. Catches sudden loud bursts (laughter, raised voices) before they clip.
- Low-cut filter: 80Hz or 100Hz. Removes rumble from HVAC, traffic, and handling noise without affecting speech intelligibility.
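The gain target above can also be checked on the test recording itself once it is on a laptop. This is a rough sketch, assuming a 16-bit PCM WAV and treating the -12 to -6 dB window as dB relative to digital full scale (dBFS); the function names are illustrative and not part of any recorder's software.

```python
import array
import math
import sys
import wave


def peak_dbfs(path):
    """Return the peak level of a 16-bit PCM WAV file in dBFS (0 = full scale)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(peak / 32768)


def gain_advice(level_db, low=-12.0, high=-6.0):
    """Compare a peak level against the target window from the settings list."""
    if level_db > high:
        return "too hot: lower the gain"
    if level_db < low:
        return "too quiet: raise the gain"
    return "ok"
```

For example, `gain_advice(peak_dbfs("test_take.wav"))` gives a quick pass/fail on the 30-second test before the real interview begins.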
Handling Noisy Environments
For coffee shops, hotel lobbies, conference floors, and other uncontrollable spaces:
- Choose a corner or wall seat over the middle of the room
- Sit perpendicular to the noise source (kitchen, entrance, foot traffic) not parallel
- Use cardioid mics, which reject off-axis sound, rather than omnidirectional ones
- Get the mic closer to the speaker (four to six inches instead of eight)
- Record a 30-second room tone sample before or after the interview, useful for noise reduction in post
If the room is genuinely too loud (a busy bar, a construction-adjacent office), reschedule or move. No amount of mic technique fixes ambient noise that competes with speech.
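One way to put a number on "too loud" is to compare the level of a short speech test against the room tone sample. The sketch below is an illustration only: it assumes two 16-bit PCM WAV files (one of someone talking into the mic, one of the room alone), and the function names are hypothetical. The speech clip also contains the room noise, so the result slightly understates the true speech-to-noise margin. How much margin is enough is a judgment call; the guide gives no figure.

```python
import array
import math
import sys
import wave


def rms_dbfs(path):
    """Return the RMS level of a 16-bit PCM WAV file in dBFS."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if not samples:
        raise ValueError("file contains no audio")
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / 32768 ** 2)


def speech_margin_db(speech_path, room_tone_path):
    """How many dB the speech test sits above the room tone."""
    return rms_dbfs(speech_path) - rms_dbfs(room_tone_path)
```

A larger margin means the voice stands further above the background; moving the mic closer to the speaker, as suggested above, is the usual way to raise it.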
Backup Recording
Always record on a second device. Always.
The cheap version: Put your phone on the table with a voice memo app running. If your primary recorder fails (dead battery, full SD card, crashed firmware), the phone has your interview.
The professional version: Two recorders, one as primary and one running in parallel. Some field recorders (the Zoom H5 and H6, for example) also have a backup-record mode that writes a second copy of the input at a lower gain level. That rescues a take that clipped, but it does not protect against a dead battery or a full card, so it complements a second device rather than replacing it.
A lost interview recording is a career-shaping disaster. Two-recorder redundancy costs an extra $100 and one minute of setup. Do it.
The Transcription Workflow Afterward
Pull the files off your recorder onto your laptop. Upload to Audio to Text and pick your language.
For Setup 2 (two-track), upload the stereo WAV. Most transcription tools detect that you have two speakers on two channels and label them automatically. CATT's Whisper Large-v3 pipeline handles this cleanly.
For Setup 1 or 3 (single track), the AI will diarize the mixed audio. Accuracy depends on how distinct the voices are and how much overlap there is. Expect 80-95% correct speaker attribution for two-person interviews and 70-85% for three or more speakers.
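For the two-track case, labeling by channel amounts to transcribing each channel separately and interleaving the results by timestamp. The sketch below shows that merge step only. The segment format (dicts with `start`, `end`, and `text`, times in seconds) is an assumption for illustration, not the documented output of any particular tool, and the guide does not say how a given service implements this internally.

```python
def merge_channel_transcripts(channel_segments):
    """Merge per-channel segments into one speaker-labeled, time-ordered list.

    channel_segments maps a speaker label to that channel's segments.
    """
    merged = []
    for speaker, segments in channel_segments.items():
        for seg in segments:
            merged.append({"speaker": speaker, **seg})
    merged.sort(key=lambda seg: (seg["start"], seg["end"]))
    return merged


def format_transcript(merged):
    """Render merged segments as '[MM:SS] Speaker: text' lines."""
    lines = []
    for seg in merged:
        minutes, seconds = divmod(int(seg["start"]), 60)
        text = seg["text"].strip()
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['speaker']}: {text}")
    return "\n".join(lines)


# Hypothetical segments, one list per lavalier channel.
example = {
    "Interviewer": [
        {"start": 0.0, "end": 2.5, "text": "Can you describe the project?"},
        {"start": 61.2, "end": 63.0, "text": "What went wrong?"},
    ],
    "Subject": [
        {"start": 3.1, "end": 58.0, "text": "Sure. We started in March."},
    ],
}
print(format_transcript(merge_channel_transcripts(example)))
# [00:00] Interviewer: Can you describe the project?
# [00:03] Subject: Sure. We started in March.
# [01:01] Interviewer: What went wrong?
```

Because the speaker label comes from the channel rather than from voice characteristics, overlapping speech does not confuse the attribution; it only affects the ordering of nearby lines.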
The research interview template extracts themes, key quotes, and timestamps automatically once you have the raw transcript. For a typical 60-minute interview, you have a searchable transcript and a structured summary in about five minutes of human work.
Legal Considerations for Interviewees
In the US, federal law and most states allow recording with one-party consent, meaning only one participant in the conversation (which can be you) has to agree. Some states, including California, Florida, Illinois, Maryland, Massachusetts, Montana, Nevada, New Hampshire, Pennsylvania, and Washington, require all-party consent.
For journalism and research, the safer practice is always to get verbal consent on the recording itself. "This is being recorded, is that okay with you?" with a clear "yes" captured on tape. It protects you and clarifies expectations.
For confidential or sensitive interviews, see our notes on transcription confidentiality, which cover what platforms can and cannot promise about your audio.
A Field Kit That Fits in a Backpack
The kit that handles 90% of in-person interview scenarios:
- Zoom H5 recorder ($269)
- Two Movo LV4-O lavalier mics ($70 for the pair)
- Two TRS to TRRS adapters (in case you need to use a phone as backup)
- Spare AA batteries and a fresh 32GB SD card
- A 3.5mm extension cable in case you need to seat the subject further from the recorder
- A small notebook for noting timestamps of important moments
Total: about $350. Lasts years. Handles every interview format from a quiet office to a noisy conference floor.
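For a sense of how far the 32GB card goes at the recommended 44.1 kHz, 16-bit stereo WAV setting, the arithmetic can be sketched as follows. It assumes uncompressed PCM and decimal gigabytes, and it ignores file headers and filesystem overhead, so real capacity will be slightly lower.

```python
def wav_bytes_per_second(sample_rate=44_100, bit_depth=16, channels=2):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate * (bit_depth // 8) * channels


def recording_size_gb(minutes, **fmt):
    """Approximate file size in GB (1 GB = 1e9 bytes) for a recording."""
    return wav_bytes_per_second(**fmt) * minutes * 60 / 1e9


def card_capacity_hours(card_gb=32, **fmt):
    """Approximate hours of audio that fit on a card of the given size."""
    return card_gb * 1e9 / wav_bytes_per_second(**fmt) / 3600


print(wav_bytes_per_second())              # 176400
print(round(recording_size_gb(90), 2))     # 0.95
print(round(card_capacity_hours(), 1))     # 50.4
```

In other words, a 90-minute stereo interview is just under 1 GB, and a fresh 32GB card holds roughly 50 hours at this setting.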
Set this up once, test it on a 60-second sample, and you will get clean transcribable audio every single time.