7 - Side Effects with Victoria Krakovna

AXRP - the AI X-risk Research Podcast

Is There a Way to Make the Agent More Interpretable?

We probably don't want the agent to cause butterfly effects, even if they are beneficial. If we have a way of incentivising the agent to be more interpretable, then that alone could already reduce the incentive to cause butterfly effects. So I think it's possible that making the agent more interpretable would make it act in ways that humans can model, which would get rid of butterfly effects. I don't know whether this actually works, but I think it's a possible thing to think about. My current sense is that, overall, we don't want butterfly effects.
