Side Effects Mitigation and Inverse Reward Design

Likht: Inverse reward design is a way of avoiding nasty side effects, and there are some definitions for which it's well matched to this problem. But there are other approaches that don't use inverse reward design in its particular Bayesian form, and I'm not sure if they're the same. Wive: I think you can describe a lot of side-effect avoidance approaches in language similar to the model we're producing here.
