
Episode 29: Jim Fan, NVIDIA, on foundation models for embodied agents, scaling data, and why prompt engineering will become irrelevant

Generally Intelligent


The VPT Approach: Training AI Agents in Minecraft

OpenAI takes a different approach: it trains agents on human data from Minecraft.
The first step is to hire humans to play Minecraft and record their actions.
The recorded human data is used to train an inverse dynamics model.
The inverse dynamics model can then label actions in YouTube videos, which carry no explicit action data.
Keyboard and mouse actions are important for aligning what happens in YouTube videos with in-game actions.
With the inverse dynamics model, far more video can be labeled than could be covered by recorded human play alone.
A behavior cloning agent trained on the labeled videos can perform well on long-horizon tasks.
The VPT approach allows direct imitation from tutorial videos without relying on rewards.
Collecting human data can be costly, which may limit how well the approach transfers to other domains.
The VPT approach does not use language transcripts, although language could play a significant role.
Language can be used for prompting and grounding in embodied agent setups.
The VPT and MineDojo/MineCLIP approaches are complementary, and both are important for success in Minecraft.
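
The three stages described above (train an inverse dynamics model on a small labeled set, use it to label unlabeled video, then behavior-clone on the result) can be sketched with a toy example. This is only an illustration under strong assumptions, not OpenAI's implementation: observations are integers on a number line rather than video frames, actions are the hypothetical strings "left", "right", and "stay" rather than keyboard and mouse events, and both models are simple lookup tables instead of neural networks. All function names are made up for this sketch.

```python
from collections import Counter, defaultdict


def train_idm(labeled):
    """Fit a toy inverse dynamics model from (obs, next_obs, action) triples.

    Assumption: the action is fully determined by the change in observation,
    so a majority vote per observed delta is enough.
    """
    votes = defaultdict(Counter)
    for obs, next_obs, action in labeled:
        votes[next_obs - obs][action] += 1
    table = {delta: c.most_common(1)[0][0] for delta, c in votes.items()}

    def idm(obs, next_obs):
        # Infer which action explains the transition obs -> next_obs.
        return table[next_obs - obs]

    return idm


def pseudo_label(video, idm):
    """Turn an action-free observation sequence into (obs, action) pairs."""
    return [(video[t], idm(video[t], video[t + 1])) for t in range(len(video) - 1)]


def behavior_clone(pairs):
    """Toy behavior cloning: pick the most frequent action seen for each observation."""
    votes = defaultdict(Counter)
    for obs, action in pairs:
        votes[obs][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in votes.items()}


# Stage 1: a small, expensive set of recordings that include actions.
contractor_data = [(0, 1, "right"), (1, 0, "left"), (2, 2, "stay"), (3, 4, "right")]
idm = train_idm(contractor_data)

# Stage 2: a larger pool of "videos" with observations only.
videos = [[0, 1, 2, 3, 4, 5], [2, 3, 4, 5], [1, 2, 3, 4, 5, 5]]
pairs = [pair for video in videos for pair in pseudo_label(video, idm)]

# Stage 3: imitate the pseudo-labeled demonstrations; no reward signal is used.
policy = behavior_clone(pairs)
```

The point of the design is that the labeled set only has to cover the possible transitions, while the unlabeled pool supplies the behavior to imitate: here four labeled transitions are enough to label thirteen unlabeled ones.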
