
Jeff Clune
TalkRL: The Reinforcement Learning Podcast
The Benefits of the Policy Version of the Algorithm
The policy version does extremely well and it is way better than all previous algorithms. Even though the overall algorithm is still more expensive, the exploration part is more efficient because you're no longer taking random actions to do the explore step of the algorithm. I think that really bodes well for the future as you train big models on more and more kinds of domains. They're going to have all sorts of common sense and skill sets and understanding, especially if you did like VPT ahead of time, for example.