
What is a caption / subtitle?

Captions (also called subtitles) are timed text overlays that transcribe spoken content for accessibility, translation, or sound-off viewing. YouTube supports multiple caption tracks per video: uploader-provided tracks in any language, plus an auto-generated track for the primary spoken language. Common formats are SRT, VTT, and TTML.

Also called: subtitle · srt · vtt · closed captions · cc

YouTube captions come from two sources: uploader-provided tracks (written manually or uploaded as SRT/VTT files) and auto-generated tracks (produced by YouTube's automatic speech recognition, ASR). Auto-generated captions are usually accurate for English but degrade for other languages, technical jargon, and accented speech.
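
The two sources can be treated as a simple preference order: use the uploader's track when one exists, and fall back to the auto-generated one otherwise. The sketch below assumes metadata shaped like the info dictionary yt-dlp produces, with a "subtitles" key for uploader tracks and an "automatic_captions" key for ASR tracks. The function name pick_caption_track and the example data are hypothetical and only illustrate the idea.

```python
from typing import Optional


def pick_caption_track(info: dict, lang: str) -> Optional[dict]:
    """Return the preferred caption track for `lang`, or None if there is none.

    Uploader-provided tracks ("subtitles") are checked before
    auto-generated ones ("automatic_captions").
    """
    for source in ("subtitles", "automatic_captions"):
        formats = (info.get(source) or {}).get(lang)
        if formats:
            return {"source": source, "lang": lang, "formats": formats}
    return None


# Hypothetical metadata, for illustration only.
example_info = {
    "subtitles": {
        "en": [{"ext": "vtt", "url": "https://example.invalid/en.vtt"}],
    },
    "automatic_captions": {
        "en": [{"ext": "vtt", "url": "https://example.invalid/en-auto.vtt"}],
        "de": [{"ext": "vtt", "url": "https://example.invalid/de-auto.vtt"}],
    },
}

print(pick_caption_track(example_info, "en")["source"])  # subtitles
print(pick_caption_track(example_info, "de")["source"])  # automatic_captions
print(pick_caption_track(example_info, "fr"))            # None
```

Preferring the uploader track reflects the accuracy gap described above; a tool that needs a transcript for every video can still fall back to ASR when nothing better exists.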

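For downloads, all caption tracks are exportable as SRT, VTT, or TXT. Use [/youtube-subtitle-downloader](/youtube-subtitle-downloader) for the GUI. From the CLI, yt-dlp offers --write-subs --sub-langs all for batch caption export; add --write-auto-subs to include auto-generated tracks and --skip-download to fetch captions without the video.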
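
For scripted batch export, the yt-dlp invocation can be assembled as an argument list. This is a minimal sketch, not a definitive recipe: the helper name build_caption_export_command is hypothetical, the URL is a placeholder, and the block only builds and prints the command rather than running it. The flags used (--skip-download, --write-subs, --write-auto-subs, --sub-langs) are yt-dlp options.

```python
from typing import List


def build_caption_export_command(
    url: str, langs: str = "all", include_auto: bool = False
) -> List[str]:
    """Build a yt-dlp command that downloads caption files only."""
    cmd = [
        "yt-dlp",
        "--skip-download",      # captions only, no video or audio
        "--write-subs",         # uploader-provided tracks
        "--sub-langs", langs,   # "all" or a comma-separated list such as "en,de"
    ]
    if include_auto:
        cmd.append("--write-auto-subs")  # also fetch auto-generated tracks
    cmd.append(url)
    return cmd


# Placeholder URL; replace VIDEO_ID with a real video ID.
cmd = build_caption_export_command("https://www.youtube.com/watch?v=VIDEO_ID")
print(" ".join(cmd))
```

With yt-dlp installed, the list can be passed to subprocess.run(cmd, check=True). Combining include_auto=True with langs="all" can request a large number of tracks, so a specific language list is usually the safer choice there.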

Auto-generated captions also support auto-translation: select the original language's ASR track, and YouTube can translate it into roughly 150 target languages on the fly. Quality varies: it works well for major languages and poorly for low-resource ones.

Common questions

SRT or VTT — which subtitle format should I use?
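Use SRT for the widest compatibility; nearly every video player reads it. Use VTT (WebVTT) for web embedding, since it is the format the HTML5 video element reads natively through its track element. Most software accepts both. The cue text and timings are the same in both formats; only the file structure differs: VTT begins with a WEBVTT header line and writes milliseconds after a period (00:00:01.000), while SRT numbers each cue and writes milliseconds after a comma (00:00:01,000).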
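
Because the two formats differ mainly in structure, a basic SRT file can be converted to VTT with a few lines of code. The sketch below assumes plain SRT input without styling tags or positioning data; the function name srt_to_vtt and the sample text are illustrative. It adds the WEBVTT header and swaps the comma in each timestamp for a period, leaving cue numbers in place (WebVTT allows them as cue identifiers).

```python
import re

# SRT timestamps put a comma before the milliseconds: 00:00:01,000
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT text."""
    cleaned = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = ["WEBVTT", ""]  # header line followed by a blank line
    for line in cleaned.split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten, so commas in cue text survive.
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"


sample_srt = """1
00:00:01,000 --> 00:00:04,000
Hello, world.

2
00:00:05,500 --> 00:00:07,250
Second caption.
"""

print(srt_to_vtt(sample_srt))
```

For files with styling or positioning, a dedicated subtitle library or converter is a better fit than this sketch.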

VidPickr is a free, browser-based YouTube downloader. Every term in this glossary either describes how YouTube delivers video or why your downloads behave the way they do.