Jeff Clune

TalkRL: The Reinforcement Learning Podcast

CHAPTER

The Benefits of the Policy Version of the Algorithm

The policy version does extremely well, far better than all previous algorithms. Although the overall algorithm is still more expensive, the exploration part is more efficient because you're no longer taking random actions in the explore step. I think that bodes well for the future as you train big models on more and more kinds of domains. They're going to have all sorts of common sense, skill sets, and understanding, especially if you did something like VPT ahead of time.
