
8 - Assistance Games with Dylan Hadfield-Menell
AXRP - the AI X-risk Research Podcast
Side Effects Mitigation and Inverse Reward Design
Inverse reward design is a way of avoiding negative side effects, and there are some definitions for which it is well matched to this problem. There are also other approaches that don't leverage inverse reward design in its particular Bayesian form, and I'm not sure whether they amount to the same thing. I think you can describe a lot of side effect avoidance approaches in language similar to the model we're producing here.
Play episode from 02:01:26
Transcript


