The Benefits of the Policy Version of the Algorithm

The policy version does extremely well, and it is way better than all previous algorithms. Even though the overall algorithm is still more expensive, the exploration part is more efficient because you're no longer taking random actions in the explore step. I think that really bodes well for the future as you train big models on more and more kinds of domains. They're going to have all sorts of common sense, skill sets, and understanding, especially if you did something like VPT ahead of time, for example.
