
Episode 19: Minqi Jiang, UCL, on environment and curriculum design for general RL agents
Generally Intelligent
How to Reduce the Reward Expenditure
A specific manoeuvre becomes more complex the more steps, or more actions, it takes inside of the action sequence to accomplish that task. And so in RL, we can discount every time step. That means that the tasks that require more steps to accomplish, and are therefore more complex to solve, are going to be more heavily discounted by your discount factor. So if you compute regret with respect to the gamma-discounted return, it naturally smooths the regret landscape so that the more complex tasks are going to have lower maximal regret for the agent.
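As a rough illustration of that point, here is a minimal sketch that assumes a sparse-reward task: a single reward of 1 on the final step and 0 everywhere else, with regret taken as the optimal discounted return minus the agent's discounted return. The function names and the sparse-reward setup are assumptions made for this example and are not from the episode.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the time steps of one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def max_regret_sparse(num_steps, gamma):
    """Maximal regret on a hypothetical sparse-reward task.

    Assumes the optimal policy earns a reward of 1 on step `num_steps`
    and the worst-case agent earns nothing, so its return is 0.
    """
    optimal_rewards = [0.0] * (num_steps - 1) + [1.0]
    optimal_return = discounted_return(optimal_rewards, gamma)
    worst_agent_return = 0.0
    return optimal_return - worst_agent_return


if __name__ == "__main__":
    gamma = 0.99
    for steps in (1, 10, 100, 500):
        print(steps, round(max_regret_sparse(steps, gamma), 4))
```

Under these assumptions the maximal regret works out to gamma ** (num_steps - 1), so it shrinks as the task needs more steps, which is the smoothing effect described above.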