Jeff Clune

TalkRL: The Reinforcement Learning Podcast

The Benefits of the Policy Version of the Algorithm

The policy version does extremely well, and it is way better than all previous algorithms. Even though the overall algorithm is still more expensive, the exploration part is more efficient because you're no longer taking random actions to do the explore step of the algorithm. I think that really bodes well for the future as you train big models on more and more kinds of domains. They're going to have all sorts of common sense and skill sets and understanding, especially if you did something like VPT ahead of time, for example.
