Reinforcement Learning With Self-Play

We are using reinforcement learning with self-play. Essentially, we have a bot that observes some state in the environment and performs some actions based on that state. The bot then gets feedback on whether it's doing well or not, and tries to select the actions that yield positive feedback, that is, high reward.
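The observe, act, feedback loop described here can be sketched in a few lines of Python. This is only an illustrative sketch, not the system discussed in the episode: the toy environment (three states, two actions, reward 1 when the action equals `state % 2`), the function names `train` and `greedy_policy`, and the epsilon-greedy tabular update are all assumptions chosen for brevity. It also leaves out the self-play part, where the opponent would be another copy of the same bot.

```python
import random


def train(steps=3000, epsilon=0.1, alpha=0.1, seed=0):
    """Run a toy observe/act/reward loop and return the learned value table."""
    rng = random.Random(seed)
    n_states, n_actions = 3, 2  # hypothetical toy environment sizes
    # q[state][action] estimates the reward for taking `action` in `state`.
    q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(steps):
        # 1. The bot observes some state in the environment.
        state = rng.randrange(n_states)

        # 2. It picks an action: mostly the best-known one, sometimes a random one.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])

        # 3. It gets feedback (assumed rule: the "good" action is state % 2).
        reward = 1.0 if action == state % 2 else 0.0

        # 4. It nudges its estimate toward the observed reward.
        q[state][action] += alpha * (reward - q[state][action])

    return q


def greedy_policy(q):
    """For each state, return the action with the highest estimated reward."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the greedy policy picks the rewarded action in every state, which is the "select the actions that yield high reward" behavior in miniature.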

