Jim Fan on Nvidia’s Embodied AI Lab and Jensen Huang’s Prediction that All Robots will be Autonomous
Sep 17, 2024
Jim Fan, the AI researcher leading Nvidia's Embodied AI group, shares insights on the future of robotics. He discusses a transformative data strategy that integrates internet-scale, simulation, and real-world data to develop versatile robots. Jim envisions a foundation agent that can adapt across skills and environments, and he echoes Jensen Huang's prediction that everything that moves will eventually be autonomous. The conversation also touches on how data-driven training is advancing human-centered robotics.
Jim Fan emphasizes a balanced data strategy combining internet-scale, simulation, and real-world data to overcome robotics' data bottleneck.
In this vision of the future, everything that moves becomes autonomous, potentially putting as many intelligent robots in the world as there are iPhone users today.
Deep dives
The Future of Autonomous Robotics
The discussion emphasizes that everything that moves will eventually be autonomous, suggesting a significant future for intelligent robots. This assertion aligns with the vision that in about a decade there could be as many intelligent robots as there are iPhone users today. The foundation is being laid now with efficient computing platforms dedicated to humanoid robots and intelligent machines. This vision underlines the urgency for researchers and developers to innovate in robotics at an accelerated pace.
The Importance of Diverse Data Strategies
A successful robotics strategy balances three kinds of data: internet-scale data, simulation data, and real-world robot data. Internet data offers rich, diverse examples of human behavior but lacks the motor-control labels a robot can act on. Simulation data can be generated quickly and in large quantities, yet often falls short of capturing real-world complexity. Real robot data is crucial for grounding, but it is slow and expensive to collect. A combined approach is therefore essential for overcoming the current data bottleneck.
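To make the combined approach concrete, here is a minimal sketch of one way such a mixture could be implemented: a sampler that draws each training example from one of the three sources with fixed weights. The source names, weights, and placeholder trajectories are hypothetical illustrations, not details from the episode.

```python
import random

def make_mixed_sampler(sources, weights):
    """Return a function that samples one trajectory per call.

    sources -- dict mapping a source name to a list of trajectories
    weights -- dict mapping the same names to relative sampling weights
    """
    names = list(sources)
    probs = [weights[name] for name in names]

    def sample():
        # Draw a source proportionally to its weight, then pick a
        # trajectory uniformly from that source.
        (name,) = random.choices(names, weights=probs, k=1)
        return name, random.choice(sources[name])

    return sample

# Hypothetical mixture: plentiful cheap web and simulation data,
# plus a small share of expensive real-robot data for grounding.
sampler = make_mixed_sampler(
    sources={
        "internet": ["web_clip_0", "web_clip_1"],
        "simulation": ["sim_rollout_0", "sim_rollout_1"],
        "real_robot": ["teleop_demo_0"],
    },
    weights={"internet": 0.5, "simulation": 0.4, "real_robot": 0.1},
)
print(sampler())
```

Skewing most of the sampling weight toward the cheap, plentiful sources while reserving a slice for real-robot data is one simple way to get both scale and grounding from a single training stream.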
Unlocking the GPT-3 Moment for Robotics
The aspiration for a GPT-3 moment in robotics hinges on breakthroughs in both low-level and high-level reasoning. Low-level proficiency means grounding the same instruction in very different motor programs: the word 'open' demands distinct actions for a door, a bottle, or a laptop. This requires models that generalize motor control across tasks and contexts, allowing robots to perform complex, dynamic interactions effectively. Alongside this, high-level reasoning capabilities are needed for planning and adaptive decision-making.
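One common way to frame this split, sketched below, is a hierarchical loop: a slow high-level planner decomposes an instruction into subgoals, and a fast low-level policy turns each subgoal plus the current observation into motor commands. Everything here (the lookup-table planner, the neutral-pose policy, the seven-joint arm) is a hypothetical stand-in to show the interface, not Nvidia's actual design.

```python
from typing import Dict, List

def high_level_plan(instruction: str) -> List[str]:
    """Slow, deliberate step: break an instruction into subgoals.

    A real system would query a reasoning model here; this lookup
    table only illustrates the interface.
    """
    plans = {
        "open the door": ["walk_to_door", "grasp_handle", "pull_handle"],
        "open the laptop": ["walk_to_desk", "grip_lid", "lift_lid"],
    }
    return plans.get(instruction, [])

def low_level_act(subgoal: str, observation: Dict[str, float]) -> List[float]:
    """Fast, reactive step: map a subgoal and observation to joint targets.

    A real system would run a learned motor policy; returning a
    neutral pose keeps this sketch runnable.
    """
    num_joints = 7  # assumed arm with seven degrees of freedom
    return [0.0] * num_joints

# The same verb "open" routes to entirely different motor sequences
# depending on context, which is the generalization challenge above.
for subgoal in high_level_plan("open the door"):
    action = low_level_act(subgoal, observation={"gripper_open": 1.0})
    print(subgoal, "->", action)
```

The division echoes the fast/slow distinction from Thinking, Fast and Slow, mentioned later in the show notes: deliberate planning at a low frequency, reactive control at a high one.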
Humanoid Robots as the Future's Catalyst
Humanoid robots are a compelling form factor because they fit human-centric environments, from factories to homes, and can perform tasks designed for human bodies. Improving humanoid hardware and falling production costs open new avenues for affordable, widespread robotics applications. As the AI brain for these robots evolves, household helpers could emerge across daily life, from caregiving to routine chores. The long-term vision is to integrate humanoid robots seamlessly into society, provided the technical challenges are met and societal issues such as safety and regulation are addressed.
AI researcher Jim Fan has had a charmed career. He was OpenAI’s first intern before he did his PhD at Stanford with “godmother of AI,” Fei-Fei Li. He graduated into a research scientist position at Nvidia and now leads its Embodied AI “GEAR” group. The lab’s current work spans foundation models for humanoid robots to agents for virtual worlds.
Jim describes a three-pronged data strategy for robotics, combining internet-scale data, simulation data and real-world robot data. He believes that in the next few years it will be possible to create a "foundation agent" that can generalize across skills, embodiments and realities, both physical and virtual. He also supports Jensen Huang's idea that "Everything that moves will eventually be autonomous."
Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital
Mentioned in this episode:
World of Bits: Early OpenAI project Jim worked on as an intern with Andrej Karpathy. Part of a bigger initiative called Universe
Fei-Fei Li: Jim’s PhD advisor at Stanford who founded the ImageNet project in 2010 that revolutionized the field of visual recognition, led the Stanford Vision Lab and just launched her own AI startup, World Labs
Project GR00T: Nvidia’s “moonshot effort” at a robotic foundation model, premiered at this year’s GTC
Thinking, Fast and Slow: Influential book by Daniel Kahneman that popularized his research in behavioral economics, including the "System 1/System 2" model of thinking
Jetson Orin chip: The dedicated series of edge computing chips Nvidia is developing to power Project GR00T
Eureka: Project by Jim’s team that trained a five finger robot hand to do pen spinning
MineDojo: A project Jim did when he first got to Nvidia that developed a platform for general-purpose agents in the game of Minecraft. Won the NeurIPS 2022 Outstanding Paper Award
ADI: artificial dog intelligence
Mamba: Selective State Space Models, an alternative architecture to Transformers that Jim is interested in
00:00 Introduction
01:35 Jim’s journey to embodied intelligence
04:53 The GEAR Group
07:32 Three kinds of data for robotics
10:32 A GPT-3 moment for robotics
16:05 Choosing the humanoid robot form factor
19:37 Specialized generalists
21:59 GR00T gets its own chip
23:35 Eureka and Isaac Sim
25:23 Why now for robotics?
28:53 Exploring virtual worlds
36:28 Implications for games
39:13 Is the virtual world in service of the physical world?
42:10 Alternative architectures to Transformers
44:15 Lightning round