8 - Assistance Games with Dylan Hadfield-Menell

AXRP - the AI X-risk Research Podcast

Side Effects Mitigation and Inverse Reward Design

Likht: Inverse reward design is a way of avoiding nasty side effects. There are some definitions for which it's well matched to this problem. But he says there are other approaches that don't leverage inverse reward design in its particular Bayesian form, and I'm not sure if they're the same. Wive: I think you can describe a lot of side effect avoidance approaches in language similar to the model we're producing here.

Play episode from 02:01:26