How to Add Subtitles to a Video: 2026 Step-by-Step Guide

ConvertAudioToText Team | May 26, 2026 | 7 min read

What Adding Subtitles Actually Involves in 2026

Adding subtitles to a video used to mean transcribing by ear, typing into a text editor, and praying the timecodes lined up. In 2026 the workflow has three steps, not thirty: generate a caption file from speech, edit any errors, and either attach the file as a soft track or burn the text directly into the pixels. The hard part is picking which method matches where the video will live.

This guide covers both routes, plus the auto-captioning built into video editors. The AI workflow gets you a publish-ready caption track in under five minutes for most short videos. The manual route gives you frame-perfect control when you need it. We will also walk through SRT formatting, common burn-in pitfalls, and the settings that work on YouTube, TikTok, and Instagram.

Soft Subtitles vs Burned-In Subtitles

The first decision is whether your viewer can toggle the subtitles on and off. Soft subtitles, also called closed captions, live as a separate text track inside the video container or as an external SRT file. The viewer or the player decides whether to display them. Burned-in subtitles, also called hardcoded or open captions, are part of the pixels themselves. They show whether or not the viewer wants them.

Soft subtitles are the right choice for YouTube, Vimeo, and any platform with a built-in player. They let viewers turn captions off or switch languages, and they help search engines index your video. Burned-in subtitles are required for TikTok, Instagram Reels, Twitter video, and anywhere else the platform does not surface a caption toggle. Most short-form social platforms strip soft subtitle tracks on upload, so if you skip the burn-in step your captions disappear.

Some videos need both. Educational content on YouTube often has burned-in captions for accessibility plus a soft track for translation. The cost is small: render once with burn-in, then upload the SRT separately. We cover the practical recipe in the burning subtitles into video walkthrough.
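
To make the two delivery modes concrete, here is a minimal sketch that builds one command line for each. It assumes you use ffmpeg, which this guide does not prescribe, and that your ffmpeg build includes libass for the subtitles filter. The file names are placeholders, and the sketch only prints the commands instead of running them.

```python
import shlex


def soft_sub_command(video, srt, output):
    """Mux an SRT into an MP4 as a toggleable text track, without re-encoding."""
    return [
        "ffmpeg", "-i", video, "-i", srt,
        "-c", "copy",          # copy the audio and video streams untouched
        "-c:s", "mov_text",    # MP4 stores text subtitles as mov_text
        output,
    ]


def burn_in_command(video, srt, output):
    """Draw the captions into the picture; video is re-encoded, audio is copied."""
    # Paths containing ':' or quotes need extra escaping inside the filter string.
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={srt}", "-c:a", "copy", output]


# Placeholder file names; pass either list to subprocess.run(..., check=True) to execute.
print(shlex.join(soft_sub_command("talk.mp4", "talk.srt", "talk_soft.mp4")))
print(shlex.join(burn_in_command("talk.mp4", "talk.srt", "talk_burned.mp4")))
```

The soft version finishes quickly because nothing is re-encoded. The burned-in version takes a full render, which is why the text above suggests rendering once and uploading the SRT separately.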

Method 1: AI Subtitle Generation

The fastest 2026 workflow starts with a tool that converts speech directly into a timed caption file. Drop the video in, pick the language, and get an SRT or VTT back in minutes. The subtitle generator at ConvertAudioToText (CATT) handles this end-to-end and supports 99 languages, including Spanish, French, Arabic, and African languages that most competitors handle poorly.

Here is the practical sequence:

  1. Upload the video file. MP4, MOV, MKV, and WebM all work. You can also paste a URL for YouTube video transcription and skip the download step.
  2. Pick the spoken language. For mixed-language audio, pick the dominant one.
  3. Pick an output format. SRT for most uses. VTT for HTML5 video on websites. TTML when a broadcaster asks for it.
  4. Download the file and open it in any text editor.

Expect 90 to 97 percent accuracy on clear audio. Names, technical terms, and accents are where errors cluster. Set aside 10 to 15 minutes per hour of video to scan for those.

Method 2: Manual SRT Files

Manual subtitling is alive and well for short videos where every word and every cut matters. The SRT format is plain text and looks like this:

1
00:00:00,000 --> 00:00:03,500
The first line of caption text.

2
00:00:03,800 --> 00:00:07,200
The second caption appears next.

Each block has a sequence number, a start and end timecode, and one or two lines of text. Keep lines under 42 characters for readability. Keep each caption block on screen for at least one second and at most six. Aim for a reading speed of 17 characters per second or slower so viewers can actually finish each line.
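
The rules above are easy to check mechanically. The following is a minimal sketch of an SRT checker, assuming a well-formed file in which every block has a number line, a timecode line, and text. The function names are illustrative, and the sketch treats 42 characters as the longest acceptable line.

```python
import re

TIMECODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(stamp):
    """Convert an SRT timecode such as 00:00:03,500 to seconds."""
    h, m, s, ms = map(int, TIMECODE.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text):
    """Yield (index, start, end, lines) for each caption block."""
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        start, end = rows[1].split("-->")
        yield int(rows[0]), to_seconds(start), to_seconds(end), rows[2:]


def check_srt(text, max_chars=42, min_secs=1.0, max_secs=6.0, max_cps=17.0):
    """Return a list of warnings for blocks that break the guidelines."""
    warnings = []
    for index, start, end, lines in parse_srt(text):
        duration = end - start
        chars = sum(len(line) for line in lines)
        if len(lines) > 2:
            warnings.append(f"block {index}: more than two lines")
        if any(len(line) > max_chars for line in lines):
            warnings.append(f"block {index}: line longer than {max_chars} characters")
        if not min_secs <= duration <= max_secs:
            warnings.append(f"block {index}: on screen for {duration:.1f}s")
        if duration > 0 and chars / duration > max_cps:
            warnings.append(f"block {index}: {chars / duration:.1f} characters per second")
    return warnings


SAMPLE = """\
1
00:00:00,000 --> 00:00:03,500
The first line of caption text.

2
00:00:03,800 --> 00:00:07,200
The second caption appears next.
"""

print(check_srt(SAMPLE))  # prints [] because both blocks pass every check
```

Running a check like this over an AI-generated draft points you at the blocks that need splitting or re-timing before you start reading for wording errors.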

Manual SRT is the right call for music videos, comedy sketches with precise timing, and anything where automated tools miss the joke. For long-form content, manual editing of an AI-generated draft is faster than typing from scratch.

Method 3: Video Editor Built-Ins

Most editors now ship auto-captioning. Premiere Pro, DaVinci Resolve, Final Cut, and CapCut all have a one-click caption generator. The quality varies by editor and is generally a step behind dedicated speech tools because the editor's caption model is a side feature, not the main product.

The advantage is that captions live in your timeline as a text layer, so you can style them, animate them, and burn them into the render in one pass. The disadvantage is that exporting clean SRT for upload to a separate platform is sometimes clunky. CapCut, for example, makes it hard to extract an SRT for YouTube upload without manual conversion.

A useful hybrid: generate the SRT with a dedicated add-subtitles-to-video tool, import it into your editor as a caption track, then style and burn from there. You get better accuracy than the editor's native model and full styling control.

Format Choice: SRT, VTT, or TTML

The three caption formats you will run into differ in a few important ways. SRT is the lingua franca: universal player support, no styling, plain timecodes. VTT (WebVTT) is the modern web standard: HTML5 video reads it natively, and it supports basic styling and speaker labels. TTML (Timed Text Markup Language) is the broadcast and OTT standard: rich styling, multi-language support in one file, used by Netflix and most TV-grade workflows.

For YouTube, Vimeo, and most consumer uses, SRT is fine. For your own website with HTML5 video, VTT is the right pick. For broadcast delivery or streaming services, TTML is often required. The SRT vs VTT vs TTML breakdown goes into when each is mandatory.
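
If you already have an SRT and need VTT for a web page, the conversion is small: WebVTT requires a WEBVTT header line and uses a period instead of a comma before the milliseconds. Here is a minimal sketch, assuming a plain SRT without styling tags; the function name is illustrative.

```python
import re

# Matches an SRT timing line and captures the parts around each comma.
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text):
    """Convert SRT text to WebVTT: add the header, use '.' for milliseconds."""
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    body = TIMING.sub(r"\1.\2 --> \3.\4", body)
    # The SRT sequence numbers stay in place; WebVTT treats them as cue identifiers.
    return "WEBVTT\n\n" + body + "\n"


print(srt_to_vtt("1\n00:00:00,000 --> 00:00:03,500\nThe first line of caption text.\n"))
```

Only timing lines are rewritten, so a comma inside the caption text is left alone. TTML is XML-based and is not covered by this sketch.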

YouTube Specifics

YouTube accepts SRT uploads via Studio. Go to the video, open Subtitles, click Add Language, and upload your file. The platform also auto-generates captions, but those are noticeably worse than what a dedicated tool produces. For SEO purposes, your uploaded SRT takes the place of the auto-generated track and helps the video rank in search for keywords spoken in it.

If you only want auto-captions and they are good enough, the YouTube auto-captions vs AI tools comparison covers the gap.

TikTok and Instagram Specifics

Both platforms strip external caption files on upload, so burning subtitles into the pixels is the only reliable option. Both apps also have native sticker captions, which look fine but cannot be turned off and are limited to the platform's font choices.

The 2026 best practice is to generate an SRT, style it in your video editor with platform-friendly settings (white text, black box background, bottom-third position), and burn it in during export. The captioning for TikTok and Instagram deep dive has the exact font sizes and safe-zone margins.
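
If you burn in from the command line instead of an editor, the same look can be approximated with ffmpeg's subtitles filter and its force_style option, which takes ASS style fields. The sketch below rests on assumptions: the font size and margin are placeholder numbers, not the values from the deep dive, and libass interprets them relative to its script resolution instead of output pixels, so expect to tune them by eye.

```python
def force_style(font_size=18, margin_v=30):
    """Build an ASS style override: white text, opaque black box, bottom centre."""
    fields = {
        "Fontsize": font_size,          # placeholder value, tune per platform
        "PrimaryColour": "&H00FFFFFF",  # white text (ASS colours are AABBGGRR)
        "OutlineColour": "&H00000000",  # with BorderStyle=3 this colours the box
        "BorderStyle": 3,               # 3 = opaque box behind the text
        "Alignment": 2,                 # 2 = bottom centre
        "MarginV": margin_v,            # lift the text off the bottom edge
    }
    return ",".join(f"{key}={value}" for key, value in fields.items())


def styled_burn_in_command(video, srt, output, **style):
    """Return an ffmpeg argument list that burns in styled captions."""
    vf = f"subtitles={srt}:force_style='{force_style(**style)}'"
    return ["ffmpeg", "-i", video, "-vf", vf, "-c:a", "copy", output]


# Placeholder file names; pass the list to subprocess.run(..., check=True) to render.
print(styled_burn_in_command("clip.mp4", "clip.srt", "clip_captioned.mp4"))
```

Render a few seconds first and check the result on a phone, since the platform's own interface covers part of the frame.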

Multi-Language Workflow

If you need French, Spanish, and Portuguese versions, the workflow is to generate the source-language SRT first, then translate that file rather than re-transcribing each version. Translating timed text preserves the timecodes, so all three versions stay in sync without manual re-timing.
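
The idea of translating the file instead of re-transcribing can be sketched in a few lines. This is an illustration of the principle, not how any particular tool works: the translate argument is a hypothetical callable that you would back with whatever translation service you use, and the sketch assumes a well-formed SRT.

```python
import re
import textwrap


def translate_srt(srt_text, translate, width=42):
    """Return a new SRT with the same numbering and timecodes, translated text.

    `translate` is any callable that maps a source string to a target string.
    """
    out_blocks = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = block.splitlines()
        header, text_lines = rows[:2], rows[2:]
        # Translate the caption as one sentence, then re-wrap it to the line limit.
        translated = translate(" ".join(text_lines))
        out_blocks.append("\n".join(header + textwrap.wrap(translated, width)))
    return "\n\n".join(out_blocks) + "\n"


# Stand-in translator for demonstration only; replace with a real one.
demo = {"The first line of caption text.": "La première ligne du sous-titre."}
source = "1\n00:00:00,000 --> 00:00:03,500\nThe first line of caption text.\n"
print(translate_srt(source, lambda text: demo.get(text, text)))
```

Because the number and timecode lines are copied through untouched, every language version stays aligned with the same cuts. Translated text is often longer than the source, so a wrapped caption can run past two lines and is worth checking against the reading-speed guideline.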

CATT's translate subtitles tool handles this in a single step. The output is a new SRT in the target language with the original timecodes intact. The subtitle translation workflow post walks through the full pipeline for a podcast clip turned into eight languages.

What to Do Next

For a one-off video, the fastest path is to generate an SRT with a subtitle generator, fix any errors in a plain text editor, and upload the file alongside the video. For a series, build the translate-once-then-burn workflow into your export preset so every render lands with captions baked in. Either way, you have skipped the part where someone hand-types every word at 8 PM the night before publish.
