Exploring Reinforcement Learning with Hidden Rewards

This chapter covers exploration strategies in Monitored Markov Decision Processes (Mon-MDPs), where rewards are not always visible to the agent. It critiques traditional optimism-based methods and discusses alternative approaches for exploring effectively in reinforcement learning environments where rewards are only partially observable.
