
Episode 25: Nicklas Hansen, UCSD, on long-horizon planning and why algorithms don't drive research progress
Generally Intelligent
How Many Samples Do You Need With Rewards?
The amount of supervision you get from a reward signal, which is just a scalar, versus the complexity of images and dynamics if you're learning a dynamics model: it's just much greater in the case of self-supervision. Yeah. It's like the Yann LeCun cake analogy. The reward signal is just the cherry, if you compare reward-based fine-tuning versus self-supervision. Wow. Whoa.