How do I use AI to compress audio and video files?

2023 © Wikiask
Short answer:
  • AI-based compression of audio and video works by training a neural network to reconstruct the input signal from a compact representation. The network has three components (an encoder, a quantizer, and a decoder) that are trained end-to-end.

Data compression is the process of encoding, restructuring, or otherwise modifying data to reduce its size. In its most basic form, it re-codes information using fewer bits than the original representation.[1]
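
As a minimal sketch of "re-coding with fewer bits", the snippet below uses Python's standard zlib module, a conventional (non-AI) lossless compressor. The input bytes are a made-up example chosen because repetitive data compresses well.

```python
import zlib

# Hypothetical sample input: highly repetitive data compresses well.
original = b"abcabcabc" * 100

# Re-code the same information using fewer bytes.
compressed = zlib.compress(original)

# Lossless compression: decompressing recovers the input exactly.
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
```

Unlike the neural codec discussed later in this article, zlib is lossless, so the restored bytes are identical to the original.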

Research in artificial intelligence and cognitive psychology increasingly treats data compression as an essential organizing principle. The ability to condense information effectively is closely linked to intelligent behavior, because compressing data requires finding regularities in it. The relationship between prediction and compression offers a potentially simple explanation for why the brain has evolved to be so good at compression.[2][3][4]
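
One way to illustrate the link between prediction and compression is the ideal code length from information theory: a symbol predicted with probability p can be coded in about -log2(p) bits, so better predictions mean shorter codes. The sketch below is only an illustration; the message and the two probability models are hypothetical.

```python
import math
from collections import Counter

def ideal_code_length_bits(message, probabilities):
    """Total ideal code length: each symbol costs -log2(p) bits."""
    return sum(-math.log2(probabilities[s]) for s in message)

# Hypothetical message with an obvious regularity: mostly 'a'.
message = "aaaaaaab"

# Model 1: no regularity found, both symbols assumed equally likely.
uniform = {"a": 0.5, "b": 0.5}

# Model 2: probabilities fitted to the observed symbol frequencies.
counts = Counter(message)
fitted = {s: c / len(message) for s, c in counts.items()}

uniform_bits = ideal_code_length_bits(message, uniform)  # 8.0 bits
fitted_bits = ideal_code_length_bits(message, fitted)    # fewer than 8 bits

print(uniform_bits, fitted_bits)
```

The model that captures the regularity assigns the message a shorter code, which is the sense in which finding regularities and compressing go together.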

Meta's AI-powered "EnCodec" neural network can compress audio files to about one-tenth the size of an MP3 at 64 kbps

Meta's Fundamental AI Research (FAIR) team has published details of its work on AI-powered audio hypercompression. The method uses a three-part system to compress audio data to a target size. According to Meta, it could significantly improve the sound quality of speech over low-bandwidth connections, such as phone calls in areas with patchy service.
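
To make the "ten times smaller than MP3 at 64 kbps" comparison concrete, the arithmetic below works out file sizes for a clip. The one-minute duration is a hypothetical example, and the tenfold factor is taken from the headline claim rather than measured here.

```python
MP3_BITRATE_BPS = 64_000   # 64 kbps MP3 baseline from the comparison
SIZE_REDUCTION = 10        # the "ten times smaller" claim
CLIP_SECONDS = 60          # hypothetical one-minute clip

def size_bytes(bitrate_bps, seconds):
    """File size in bytes for a constant-bitrate stream (8 bits per byte)."""
    return bitrate_bps * seconds // 8

mp3_bytes = size_bytes(MP3_BITRATE_BPS, CLIP_SECONDS)         # 480000 bytes
target_bitrate_bps = MP3_BITRATE_BPS // SIZE_REDUCTION        # 6400 bits/s
target_bytes = size_bytes(target_bitrate_bps, CLIP_SECONDS)   # 48000 bytes

print(mp3_bytes, target_bitrate_bps, target_bytes)
```

Under these assumptions, a one-minute clip shrinks from about 480 kB to about 48 kB, which corresponds to a bitrate of roughly 6.4 kbps.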

EnCodec, a neural network trained end-to-end to reconstruct the input signal, consists of three parts

  • The encoder takes the uncompressed data and transforms it into a higher-dimensional representation at a lower frame rate.
  • The quantizer compresses this representation to the target size. It is trained to hit the required size (or set of sizes) while preserving the information most important for reconstructing the original signal. The compressed representation can be sent across a network or saved to disk, much like an .mp3 file on a computer.
  • The decoder is the last stage. It turns the compressed signal back into a waveform that resembles the original as closely as possible. Since perfect reconstruction is not achievable at low bit rates, the key to lossy compression is to identify changes that humans will not perceive. To do this, EnCodec uses discriminators to improve the perceptual quality of the generated samples.[5][6]
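
The three stages above can be sketched as a toy pipeline. This is not EnCodec's implementation: the hand-written functions below stand in for trained neural networks, the encoder here simply averages each frame instead of producing a higher-dimensional representation, and there are no discriminators. The frame size, codebook, and sine-wave input are all hypothetical.

```python
import math

FRAME = 4  # samples per frame (hypothetical; a real codec learns this mapping)

def encode(samples):
    """Toy 'encoder': lower the frame rate by keeping each frame's mean."""
    return [sum(samples[i:i + FRAME]) / FRAME
            for i in range(0, len(samples) - FRAME + 1, FRAME)]

def quantize(latents, codebook):
    """Toy 'quantizer': index of the nearest codebook entry per latent."""
    return [min(range(len(codebook)), key=lambda k: abs(codebook[k] - z))
            for z in latents]

def decode(indices, codebook):
    """Toy 'decoder': look up each index and hold it for one frame."""
    out = []
    for k in indices:
        out.extend([codebook[k]] * FRAME)
    return out

# Hypothetical input: one cycle of a sine wave, 32 samples in [-1, 1].
signal = [math.sin(2 * math.pi * n / 32) for n in range(32)]

# 8 evenly spaced levels, so each frame costs 3 bits.
codebook = [-1 + 2 * k / 7 for k in range(8)]

codes = quantize(encode(signal), codebook)   # what would be stored or sent
restored = decode(codes, codebook)           # approximate reconstruction

# Reconstruction is lossy: the error is small but not zero.
mse = sum((a - b) ** 2 for a, b in zip(signal, restored)) / len(signal)
bits = len(codes) * 3

print(codes, bits, round(mse, 4))
```

Only the list of codebook indices needs to be stored or transmitted. In a learned codec, the encoder, codebook, and decoder are trained together so that the reconstruction error falls where listeners are least likely to notice it.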

Is EnCodec capable of compressing video files? – No

Meta's EnCodec cannot currently compress video. However, Meta describes it as the start of a long-term effort to develop technology that could improve experiences such as video streaming, videoconferencing, and playing games with friends in VR.[7]


  1. "What is Data Compression? | Barracuda Networks". Retrieved 2022-11-05.
  2. Hutter, Marcus. "500'000€ Prize for Compressing Human Knowledge". Retrieved 2022-11-05.
  3. "SoundStream: An End-to-End Neural Audio Codec – Google AI Blog". Retrieved 2022-11-05.
  4. Maguire, Phil; Mulhall, Oisín; Maguire, Rebecca; Taylor, Jessica (2015). "Compressionism: A Theory of Mind Based on Data Compression". Proceedings of the 11th International Conference on Cognitive Science: 294–299. ISSN 1613-0073.
  5. Edwards, Benj (2022-11-01). "Meta's AI-powered audio codec promises 10x compression over MP3". Ars Technica. Retrieved 2022-11-05.
  6. Shenwai, Tanushree (2022-10-27). "Meta Uses Artificial Intelligence (AI) To Compress Audio Files For Quick Sharing". MarkTechPost. Retrieved 2022-11-05.
  7. "Using AI to compress audio files for quick and easy sharing". Retrieved 2022-11-05.