How to Reduce the Reward Reward Expenditure

The complexity of accomplishing a specific manoeuvre will be more and more complex the more steps, or more actions it takes inside of the action sequence to accomplish that task. And so in our elbow, we can discount every time step. That means that the tasks that require more steps to accomplish, and therefore more complex to find theye, are going to be more heavily discounted by your discount factor. So if you do regret with respect to the gama discounted return, it naturally smooths the regret landscape so that the more complex tasks are going to have lower maximal regret for the agent.

Play episode from 28:59

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app