
Episode 25: Nicklas Hansen, UCSD, on long-horizon planning and why algorithms don't drive research progress
Generally Intelligent
How Many Samples Do You Need With Rewards?
The amount of supervision you get from a reward signal, which is like a scalar, versus the complexity of images and dynamics if you're learning a dynamics model, it's just much greater in the case of self-supervision. Yeah. It's like the Yann LeCun cake analogy. The reward signal is just the cherry. If you do reward-based fine-tuning versus self-supervision, that was like one episode. Wow. Whoa.