Today, we are discussing the research article behind Movie Gen, a collection of foundation models developed by Meta that generate high-quality video and audio. The Movie Gen models can synthesize videos from text, personalize videos based on a person’s image, edit videos precisely, and generate audio synchronized with the video. The article presents the models' architecture, training, and results, and compares them with previous work and commercial systems.