
BI 151 Steve Byrnes: Brain-like AGI Safety

Brain Inspired

CHAPTER

What Is AGI Interpretability?

We want to understand what the AGI is thinking in full, glorious detail. If we got that, it would solve all kinds of problems: you wouldn't have to worry about the AGI deceiving you. So, if you had full, glorious interpretability, and you wanted an AGI that is motivated to be honest, then you could catch it lying a few times with perfect reliability. We want the AGI to be motivated not to lie; it's not so good if the AGI is merely motivated not to get caught lying. In sort of the same way that the amygdala can learn that something is going to lead to goosebumps, I think the amygdala, maybe
