
8 - Assistance Games with Dylan Hadfield-Menell
AXRP - the AI X-risk Research Podcast
The Assistive Multi-Armed Bandit
Lawrence Chan was the first author on a paper called "The Assistive Multi-Armed Bandit". In this case we look at an assistance game where the person doesn't observe theta at the start of the game; instead, they learn about it over time through reward signals. And there are some really interesting things we identify about ways that the system can help you learn before learning what you want.
Play episode from 44:23


