YouTube captions come from two sources: uploader-provided tracks (written manually or uploaded as SRT/VTT) and auto-generated tracks (produced by YouTube's automatic speech recognition, ASR). Auto-captions are generally accurate for English but degrade for other languages, technical jargon, and accented speech.
All caption tracks can be downloaded as SRT, VTT, or TXT. For a GUI, use [/youtube-subtitle-downloader](/youtube-subtitle-downloader). From the command line, yt-dlp handles batch caption export with `--write-subs --sub-langs all`; add `--write-auto-subs` to include the auto-generated tracks, which `--write-subs` alone does not fetch.
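As a rough illustration of the yt-dlp route, the sketch below assembles a captions-only command and runs it. The helper names `build_caption_command` and `download_captions` are hypothetical, as is the placeholder URL in the usage note; the flags themselves (`--skip-download`, `--write-subs`, `--write-auto-subs`, `--sub-langs`, `--convert-subs`) are standard yt-dlp options. It assumes yt-dlp is installed and on PATH, and that ffmpeg is available for the format conversion.

```python
import shutil
import subprocess


def build_caption_command(url, langs=("all",), include_auto=False, fmt="srt"):
    """Return the yt-dlp argument list for a captions-only download.

    Hypothetical helper for illustration; it only builds the command.
    """
    cmd = [
        "yt-dlp",
        "--skip-download",               # fetch captions only, not the video
        "--write-subs",                  # uploader-provided tracks
        "--sub-langs", ",".join(langs),  # e.g. "all" or "en,de"
        "--convert-subs", fmt,           # convert tracks after download (uses ffmpeg)
    ]
    if include_auto:
        cmd.append("--write-auto-subs")  # also fetch auto-generated (ASR) tracks
    cmd.append(url)
    return cmd


def download_captions(url, **kwargs):
    """Run the command built above; raises if yt-dlp is missing or exits non-zero."""
    if shutil.which("yt-dlp") is None:
        raise RuntimeError("yt-dlp is not installed or not on PATH")
    subprocess.run(build_caption_command(url, **kwargs), check=True)
```

Calling `download_captions("https://www.youtube.com/watch?v=VIDEO_ID", langs=("en",), include_auto=True)` would request the English tracks, both uploaded and auto-generated, converted to SRT. Keeping the command builder separate from the runner makes the flags easy to inspect before anything is downloaded.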
Auto-generated captions can also be auto-translated: select the ASR track in the video's original language, and YouTube translates it on the fly into any of roughly 150 target languages. Quality varies: it works well for major languages and poorly for low-resource ones.
Common questions
SRT or VTT — which subtitle format should I use?
Related terms
Metadata (video file metadata)
Metadata is the information about a video file that isn't the audio or video data itself — title, artist, duration, resolution, codec used, encoding date, GPS location, thumbnail.
MP4 (container, deep dive)
MP4 is the universal video container format — every device, browser, and editor handles it.
VidPickr is a free, browser-based YouTube downloader. Every term in this glossary either describes how YouTube delivers video or why your downloads behave the way they do.