Episode 22: Archit Sharma, Stanford, on unsupervised and autonomous reinforcement learning

Generally Intelligent

CHAPTER

How to Maximize Your Reward?

Roxanne Jones: My initial impression was that, as reinforcement learning researchers, we often focus on maximizing the reward, which is fine. But you can actually maximize the reward without learning any good behavior, so learning a good behavior in such a setting is pretty hard. She says it's not clear that maximizing the return would always lead to good behavior. "But if I maximize how my MDP behaves or how the dynamics of the environment work, that should lead to good, interesting behaviors," she adds.
