

Genie: Generative Interactive Environments with Ashley Edwards - #696
Aug 5, 2024
In this conversation, Ashley Edwards, a member of the technical staff at Runway with past affiliations at Google DeepMind and Uber, discusses the Genie project. They cover Genie's ability to create interactive video environments for training reinforcement learning agents, learned from videos without action labels. Topics include the mechanics of latent action models, video tokenization, and dynamics modeling for frame prediction. Ashley highlights the practical implications of Genie, compares it to other models like Sora, and maps out future directions in video generation.
AI Snips
Genie's Motivation: Unlimited Environments
- Reinforcement learning researchers often struggle to find diverse environments for training generalist agents.
- Genie addresses this by learning environments from videos, providing an effectively unlimited supply without manual environment creation.
Unsupervised World Model Learning
- Genie learns world models from videos without explicit action data, enabling interaction and frame prediction.
- It lets users step into text-generated images, sketches, or real-world photos and interact with them as if they were playable game environments, as in the sketch below.
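To make that loop concrete, here is a minimal Python sketch. The names `tokenize`, `predict_next`, and `decode`, and the `model` object bundling them, are hypothetical stand-ins for illustration, not Genie's actual API; only the flow (tokenize a start image, then alternate user-chosen actions with next-frame prediction) follows the discussion.

```python
# Hypothetical interaction loop. tokenize(), predict_next(), and decode()
# are illustrative stand-ins, not the actual Genie API.

def play(start_frame, choose_action, model, steps=16):
    """Roll out an interactive episode from a single start image.

    start_frame:   any image (text-generated, sketch, or real photo)
    choose_action: callback returning a discrete latent action id
    model:         object bundling tokenizer, dynamics model, and decoder
    """
    tokens = model.tokenize(start_frame)             # image -> discrete tokens
    frames = [start_frame]
    for _ in range(steps):
        action = choose_action()                     # e.g. mapped from a keypress
        tokens = model.predict_next(tokens, action)  # dynamics model step
        frames.append(model.decode(tokens))          # tokens -> next frame
    return frames
```

A `choose_action` callback that maps keypresses to latent action ids is what turns an ordinary image into something that behaves like a controllable game.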
Genie's Core Components
- Genie comprises three core components: a latent action model, a dynamics model, and a video tokenizer.
- The video tokenizer compresses frames into discrete tokens, the latent action model infers discrete latent actions between consecutive frames without any action labels, and the dynamics model predicts the next frame's tokens given past tokens and a latent action (see the sketch below).
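The PyTorch sketch below shows one way the three components could fit together, under simplifying assumptions: the module shapes, hyperparameters (codebook size 512, 8 latent actions, embedding dim 64), and single transformer layer are all illustrative, and the training objectives (VQ-style reconstruction for the tokenizer and latent action model, masked token prediction for the dynamics model) are omitted.

```python
import torch
import torch.nn as nn

class VideoTokenizer(nn.Module):
    """Compresses a frame into discrete token ids (VQ-style nearest-code lookup)."""
    def __init__(self, n_codes=512, dim=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, frame):                         # frame: (B, 3, H, W)
        z = self.encoder(frame)                       # (B, dim, H/4, W/4)
        z = z.flatten(2).transpose(1, 2)              # (B, T, dim) token grid
        codes = self.codebook.weight.expand(z.size(0), -1, -1)
        return torch.cdist(z, codes).argmin(-1)       # (B, T) nearest code ids

class LatentActionModel(nn.Module):
    """Maps a pair of consecutive frames to a discrete latent action id."""
    def __init__(self, n_actions=8, dim=64):
        super().__init__()
        self.encoder = nn.Conv2d(6, dim, kernel_size=4, stride=4)
        self.head = nn.Linear(dim, n_actions)

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)     # stacked frame pair
        h = self.encoder(x).mean(dim=(2, 3))          # pooled features (B, dim)
        return self.head(h).argmax(-1)                # (B,) latent action id

class DynamicsModel(nn.Module):
    """Predicts next-frame token ids from current tokens plus a latent action."""
    def __init__(self, n_codes=512, n_actions=8, dim=64):
        super().__init__()
        self.tok_emb = nn.Embedding(n_codes, dim)
        self.act_emb = nn.Embedding(n_actions, dim)
        self.block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.head = nn.Linear(dim, n_codes)

    def forward(self, tokens, action):                # (B, T) ids, (B,) action
        h = self.tok_emb(tokens) + self.act_emb(action).unsqueeze(1)
        return self.head(self.block(h)).argmax(-1)    # (B, T) predicted ids
```

Because the dynamics model conditions only on latent action ids produced by the latent action model, training needs no real action labels; at play time the user supplies those ids directly.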