

Genie 3: A New Frontier for World Models with Jack Parker-Holder and Shlomi Fruchter - #743
Aug 19, 2025
In this engaging discussion, Jack Parker-Holder and Shlomi Fruchter, both researchers at Google DeepMind, dive into Genie 3, a groundbreaking model that creates playable virtual worlds. They explore the evolution of world models in AI, emphasizing their importance for decision-making and planning. The duo sheds light on Genie 3’s real-time interactivity, visual memory capabilities, and the challenges faced in its development. They also touch on the innovative concept of promptability, showcasing how the model can dynamically manipulate virtual environments, paving the way for exciting applications.
AI Snips
World Models As General Simulators
- A world model predicts future states given past observations and actions, enabling planning and counterfactual reasoning.
- Jack Parker-Holder reframes world models as general simulators of dynamics, not just single-MDP predictors.
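The definition above can be made concrete with a toy sketch. It is not how Genie 3 works: `toy_world_model` is a hypothetical hand-written stand-in for a learned model, the state is assumed to be fully observed (a grid position rather than video frames), and `rollout` and `plan` are illustrative names. The point is only to show how a next-state predictor supports planning and "what if" comparisons without touching a real environment.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]  # hypothetical toy state: (x, y) position on a grid
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def toy_world_model(state: State, action: str) -> State:
    """Stand-in for a learned model: predict the next state from state and action."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def rollout(model: Callable[[State, str], State],
            state: State,
            actions: Sequence[str]) -> List[State]:
    """Imagine a trajectory by applying the model repeatedly."""
    trajectory = [state]
    for action in actions:
        state = model(state, action)
        trajectory.append(state)
    return trajectory


def plan(model: Callable[[State, str], State],
         start: State,
         goal: State,
         horizon: int = 3) -> Tuple[str, ...]:
    """Exhaustively search action sequences and keep the one whose
    imagined final state is closest to the goal (Manhattan distance)."""
    def cost(sequence: Sequence[str]) -> int:
        x, y = rollout(model, start, sequence)[-1]
        return abs(x - goal[0]) + abs(y - goal[1])

    return min(product(ACTIONS, repeat=horizon), key=cost)


# Planning: pick actions using imagined outcomes only.
best = plan(toy_world_model, start=(0, 0), goal=(2, 1), horizon=3)
final_state = rollout(toy_world_model, (0, 0), best)[-1]

# Counterfactual reasoning: same start, different first action.
factual = rollout(toy_world_model, (0, 0), ["right", "right"])
counterfactual = rollout(toy_world_model, (0, 0), ["left", "right"])
```

Swapping `toy_world_model` for a model that covers many environments, rather than one fixed grid, is the sense in which a world model acts as a general simulator rather than a single-MDP predictor.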
Early Proof: Latent Actions Made Playable Worlds
- Genie 1 proved the foundation world model idea using unlabeled videos and learned latent actions to make worlds playable.
- Jack Parker-Holder recounts training small models for 2D platformers and robot arms that worked but suffered from short durations and mode collapse.
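As a rough illustration of the latent-action idea, the sketch below recovers a small set of discrete "actions" from unlabeled sequences by clustering frame-to-frame changes. This is a deliberately simplified stand-in: the summary above does not describe Genie 1's actual latent action model, so the 1D k-means used here, the scalar "frames", and the names `frame_deltas`, `kmeans_1d`, and `step` are all assumptions made for illustration.

```python
from typing import List, Sequence


def frame_deltas(videos: Sequence[Sequence[float]]) -> List[float]:
    """Collect the change between consecutive frames; no action labels are used."""
    return [b - a for video in videos for a, b in zip(video, video[1:])]


def kmeans_1d(values: Sequence[float], k: int, iters: int = 20) -> List[float]:
    """Plain 1D k-means (k >= 2) with deterministic, evenly spaced initial centroids."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        buckets: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            buckets[nearest].append(v)
        # Keep the old centroid if a cluster received no points.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return centroids


def step(position: float, latent_action: int, centroids: Sequence[float]) -> float:
    """'Play' the world: apply the change associated with a latent action index."""
    return position + centroids[latent_action]


# Hypothetical unlabeled "videos": each frame is just a 1D position.
videos = [
    [0.0, 1.0, 2.1, 2.1, 1.0],
    [5.0, 4.0, 4.0, 5.1, 6.0],
]
centroids = kmeans_1d(frame_deltas(videos), k=3)

# A user can now control the toy world with indices 0, 1, 2
# (roughly: move back, stay, move forward) that were never labeled in the data.
new_position = step(0.0, latent_action=2, centroids=centroids)
```

In this toy setting the three clusters line up with moving back, staying put, and moving forward, which is what makes the world controllable even though the training sequences carried no action labels.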
Diffusion And Scale Extended Temporal Coherence
- In Genie 2, scaling to 3D data and adopting diffusion methods extended generation to 360p resolution and tens of seconds of consistent simulation.
- Jack Parker-Holder notes Genie 2 showed viability for larger 3D worlds but still lacked real-time interactivity and text-based prompting.