
Episode 25: Nicklas Hansen, UCSD, on long-horizon planning and why algorithms don't drive research progress
Generally Intelligent
How Many Samples Do You Need With Rewards?
The amount of supervision you get from a reward signal, which is just a scalar, versus the complexity of images and dynamics if you're learning a dynamics model: it's just much greater in the case of self-supervision. Yeah. It's like the Yann LeCun cake analogy. The reward signal is just the cherry, if you compare reward-based fine-tuning versus self-supervision. Wow. Whoa.