What Really Happens in Reactor Learning?

The value function is really what the agents are trying to calculate in their brains, right? So you start propagating that value through all of your states and realizing, wait a minute, when I go low in altitude, that is actually pretty bad. That is a little bit dangerous. You should not be doing this thing. It's basically that the rewards start propagating through the states. And then the agent basically starts making sense of, well, from this state, I have a chance to get this reward. What is really the value of this state? On the other hand, it will be like, well, wait a minute. Let me try something different. We're talking about one signal kind of
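
To make the idea of rewards propagating through the states concrete, here is a minimal sketch. Everything in it is an assumption made for illustration, not something described in the episode: a toy world with five altitude levels, a crash penalty of -1 at altitude 0, an agent that moves up or down with equal probability, a discount factor of 0.9, and the names `N_LEVELS`, `GAMMA`, `reward`, `sweep`, and `evaluate`. It uses plain iterative policy evaluation, which is one simple way to compute a value function and not necessarily what the speaker's agents use.

```python
# Toy altitude world (hypothetical): levels 0..4.
# Level 0 is a crash (terminal, penalty -1); level 4 is terminal with no reward.
N_LEVELS = 5
GAMMA = 0.9  # assumed discount factor


def reward(next_state):
    """Reward received on entering next_state: only crashing is penalized."""
    return -1.0 if next_state == 0 else 0.0


def sweep(values):
    """One synchronous Bellman backup under a 50/50 up-or-down policy."""
    new_values = list(values)
    for s in range(1, N_LEVELS - 1):  # terminal states keep value 0
        total = 0.0
        for nxt in (s - 1, s + 1):
            total += 0.5 * (reward(nxt) + GAMMA * values[nxt])
        new_values[s] = total
    return new_values


def evaluate(n_sweeps):
    """Return the value estimates after 0, 1, ..., n_sweeps backups."""
    values = [0.0] * N_LEVELS
    history = [values]
    for _ in range(n_sweeps):
        values = sweep(values)
        history.append(values)
    return history


if __name__ == "__main__":
    for i, values in enumerate(evaluate(3)):
        print(i, [round(v, 5) for v in values])
```

After the first sweep only altitude 1, right next to the crash, has a negative value. Each further sweep pushes that penalty one level higher, so states that never see the penalty directly still end up valued as a little bit dangerous, with lower altitudes valued worse than higher ones.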