YouTube serves video and audio as separate streams for any quality above 720p. The video file has no audio; the audio file has no video. To produce the single file you actually want to save, the two need to be combined — that's muxing.
Muxing operates on the already-encoded bytes. The H.264 video frames and AAC audio frames are written into an MP4 container without being decoded or re-encoded. This is why a clean mux of a 1080p video takes seconds, not minutes: there is no "processing", just file assembly.
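To make the "file assembly" idea concrete, here is a minimal sketch of the interleaving step, assuming each stream has already been split into timestamped packets. The `EncodedPacket` type and `interleavePackets` function are hypothetical names chosen for illustration, not part of any library. A real muxer also writes the MP4 index tables, which this sketch leaves out.

```typescript
// One encoded frame as a muxer sees it: opaque bytes plus timing metadata.
// (Hypothetical type, for illustration only.)
interface EncodedPacket {
  track: "video" | "audio";
  timestampUs: number; // presentation time in microseconds
  data: Uint8Array;    // compressed H.264 or AAC payload, never decoded here
}

// Merge two timestamp-sorted packet lists into one interleaved list.
// Payloads are passed through by reference; no byte is inspected or changed.
function interleavePackets(
  video: EncodedPacket[],
  audio: EncodedPacket[],
): EncodedPacket[] {
  const out: EncodedPacket[] = [];
  let v = 0;
  let a = 0;
  while (v < video.length || a < audio.length) {
    // Take video when audio is exhausted, or when the next video packet
    // is not later than the next audio packet.
    const takeVideo =
      a >= audio.length ||
      (v < video.length && video[v].timestampUs <= audio[a].timestampUs);
    out.push(takeVideo ? video[v++] : audio[a++]);
  }
  return out;
}

// Usage: two video frames (about 30 fps) and two audio frames.
const videoPackets: EncodedPacket[] = [
  { track: "video", timestampUs: 0, data: new Uint8Array([1]) },
  { track: "video", timestampUs: 33333, data: new Uint8Array([2]) },
];
const audioPackets: EncodedPacket[] = [
  { track: "audio", timestampUs: 0, data: new Uint8Array([9]) },
  { track: "audio", timestampUs: 21333, data: new Uint8Array([8]) },
];
console.log(
  interleavePackets(videoPackets, audioPackets).map((p) => p.track).join(", "),
);
// → video, audio, audio, video
```

The work is a single pass over the packets with no codec involved, which is why the cost scales with file size rather than with video resolution or length of decode.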
VidPickr muxes in your browser. The video and audio streams come from YouTube's CDN, are fed into a JavaScript MP4 muxer (mp4-muxer), and the combined file streams to your disk via the File System Access API. There is no server-side mux and no full-file buffer in RAM: bytes flow straight through.
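The "bytes flow through" part can be sketched as a small pump loop. This is an illustration under stated assumptions, not VidPickr's actual code: `ByteSink` and `pumpToSink` are hypothetical names, and the sink is modelled on the `write()` and `close()` methods that the browser's `FileSystemWritableFileStream` provides.

```typescript
// Minimal shape of a writable destination (hypothetical interface).
// In a browser, a FileSystemWritableFileStream has write() and close()
// methods of this kind; here any object with the same shape will do.
interface ByteSink {
  write(chunk: Uint8Array): Promise<void>;
  close(): Promise<void>;
}

// Forward chunks to the sink one at a time and return the byte count.
// Only the current chunk is held in memory, so peak memory use tracks
// the chunk size rather than the size of the finished file.
async function pumpToSink(
  source: AsyncIterable<Uint8Array>,
  sink: ByteSink,
): Promise<number> {
  let total = 0;
  for await (const chunk of source) {
    // Waiting for each write lets a slow disk throttle the producer.
    await sink.write(chunk);
    total += chunk.byteLength;
  }
  await sink.close();
  return total;
}
```

Awaiting each write before pulling the next chunk is the design choice that keeps memory flat: the source cannot run ahead of the disk.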
Common questions
Is muxing the same as transcoding?
Why does YouTube split video and audio?
Related terms
Container (file format)
A container is the file format that wraps one or more audio and video streams into a single file.
Codec
A codec is the algorithm that encodes (compresses) raw audio or video into a smaller stream and decodes it back for playback.
Fragmented MP4
Fragmented MP4 (fMP4) is an MP4 variant where the file is split into many short chunks ("fragments"), each containing its own header.
M4A
M4A is an audio-only file format that wraps AAC-encoded audio in an MP4 container.
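The container terms above share one on-disk layout: an MP4-family file is a sequence of "boxes", each starting with a 4-byte big-endian size and a 4-byte type code. The sketch below walks the top-level boxes of a byte buffer. `BoxInfo` and `listTopLevelBoxes` are hypothetical names for illustration, and the function stops quietly at truncated data rather than reporting an error.

```typescript
interface BoxInfo {
  type: string;   // four-character code, e.g. "ftyp", "moov", "moof", "mdat"
  offset: number; // byte offset of the box header within the buffer
  size: number;   // total box size in bytes, header included
}

// List the top-level boxes of an MP4-family buffer without descending
// into them. Stops at the first truncated or malformed box.
function listTopLevelBoxes(bytes: Uint8Array): BoxInfo[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes: BoxInfo[] = [];
  let offset = 0;
  while (offset + 8 <= bytes.byteLength) {
    let size = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4],
      bytes[offset + 5],
      bytes[offset + 6],
      bytes[offset + 7],
    );
    if (size === 1) {
      // A size of 1 means a 64-bit size follows the type field.
      if (offset + 16 > bytes.byteLength) break;
      size =
        view.getUint32(offset + 8) * 4294967296 + view.getUint32(offset + 12);
    } else if (size === 0) {
      // A size of 0 means the box runs to the end of the file.
      size = bytes.byteLength - offset;
    }
    if (size < 8 || offset + size > bytes.byteLength) break;
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}
```

Run on a fragmented MP4, a walker like this would be expected to show repeating moof and mdat pairs after the initial header boxes, while a non-fragmented MP4 or M4A file typically shows a single moov and mdat.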
VidPickr is a free, browser-based YouTube downloader. Every term in this glossary either describes how YouTube delivers video or why your downloads behave the way they do.