Jim Fan on Nvidia’s Embodied AI Lab and Jensen Huang’s Prediction that All Robots will be Autonomous

Training Data

Balancing Virtual and Physical Realities

An effective agent must hold engaging conversations, maintain a consistent personality, use long-term memory, and take actions in the real world. Current models show promise but still fall short in generating diverse and captivating behaviors. Cost efficiency is critical for deploying these agents to users: they need to run cheaply, either hosted in the cloud or locally on the device, in order to scale. The virtual and physical worlds are interconnected realities, and innovations in virtual environments serve to enhance capabilities in the physical realm. Understanding this dynamic is essential for prioritizing development efforts across both domains.

AI researcher Jim Fan has had a charmed career. He was OpenAI’s first intern before he did his PhD at Stanford with “godmother of AI,” Fei-Fei Li. He graduated into a research scientist position at Nvidia and now leads its Embodied AI “GEAR” group. The lab’s current work ranges from foundation models for humanoid robots to agents for virtual worlds.

Jim describes a three-pronged data strategy for robotics, combining internet-scale data, simulation data, and real-world robot data. He believes that in the next few years it will be possible to create a “foundation agent” that can generalize across skills, embodiments, and realities, both physical and virtual. He also supports Jensen Huang’s idea that “Everything that moves will eventually be autonomous.”

Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital

Mentioned in this episode:

World of Bits: Early OpenAI project Jim worked on as an intern with Andrej Karpathy. Part of a bigger initiative called Universe
Fei-Fei Li: Jim’s PhD advisor at Stanford who founded the ImageNet project in 2010 that revolutionized the field of visual recognition, led the Stanford Vision Lab and just launched her own AI startup, World Labs
Project GR00T: Nvidia’s “moonshot effort” at a robotic foundation model, premiered at this year’s GTC
Thinking, Fast and Slow: Influential book by Daniel Kahneman that popularized some of his teachings from behavioral economics
Jetson Orin chip: The dedicated series of edge computing chips Nvidia is developing to power Project GR00T
Eureka: Project by Jim’s team that trained a five finger robot hand to do pen spinning
MineDojo: A project Jim did when he first got to Nvidia that developed a platform for general purpose agents in the game of Minecraft. Won NeurIPS 2022 Outstanding Paper Award
ADI: artificial dog intelligence
Mamba: Selective State Space Models, an alternative architecture to Transformers that Jim is interested in

00:00 Introduction

01:35 Jim’s journey to embodied intelligence

04:53 The GEAR Group

07:32 Three kinds of data for robotics

10:32 A GPT-3 moment for robotics

16:05 Choosing the humanoid robot form factor

19:37 Specialized generalists

21:59 GR00T gets its own chip

23:35 Eureka and Isaac Sim

25:23 Why now for robotics?

28:53 Exploring virtual worlds

36:28 Implications for games

39:13 Is the virtual world in service of the physical world?

42:10 Alternative architectures to Transformers

44:15 Lightning round
