
7 - Side Effects with Victoria Krakovna
AXRP - the AI X-risk Research Podcast
Is There a Way to Make the Agent More Interpretable?
We probably don't want the agent to cause butterfly effects, even if they are beneficial. If we have a way of, like, incentivising the agent to be more interpretable, then this could already reduce the incentive to cause butterfly effects. So I think it's possible that making the agent more interpretable would make it act in ways that the humans can model, which would get rid of butterfly effects. I don't know whether this actually works, but, yes, I think this is maybe a possible thing to think about. But, yeah, my current sense is that, overall, we don't want butterfly effects.