Episode 29: Jim Fan, NVIDIA, on foundation models for embodied agents, scaling data, and why prompt engineering will become irrelevant

Generally Intelligent

How to Turn a Clip Model Into a Reward Function

CLIP is an image-and-text model trained on a huge, internet-scale dataset. It learns associations between these two data modalities through contrastive learning. Here, in MineCLIP, we repurpose this idea as a reward function. Given a natural-language description of the task, we can now shape the agent toward the behavior of harvesting wool from sheep.
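
To make the idea concrete, here is a minimal sketch of how an image-text similarity score could serve as a reward. It is not the implementation discussed in the episode. The function names and the three-dimensional toy vectors are hypothetical stand-ins for the outputs of a contrastively trained image encoder and text encoder, which in practice would be much higher-dimensional.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def language_reward(frame_embedding, prompt_embedding, baseline=0.0):
    """Hypothetical reward: how well the current observation matches the
    task prompt in the shared embedding space, minus an optional baseline."""
    return cosine_similarity(frame_embedding, prompt_embedding) - baseline


# Toy embeddings (assumed, not real encoder outputs).
prompt = np.array([1.0, 0.0, 0.0])          # stands in for the text prompt embedding
near_sheep = np.array([0.9, 0.1, 0.0])      # observation that matches the prompt well
far_from_sheep = np.array([0.0, 1.0, 0.0])  # observation unrelated to the prompt

r_near = language_reward(near_sheep, prompt)
r_far = language_reward(far_from_sheep, prompt)
print(r_near > r_far)
```

An agent trained on this signal is rewarded more when what it sees aligns with the text prompt, which is what lets a plain-language task description shape its behavior.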
