
BI 151 Steve Byrnes: Brain-like AGI Safety
Brain Inspired
00:00
What Is AGI Interpretability?
We want to understand what the AGI is thinking in all this full glorious detail. If we got that, it would solve all kinds of problems; you wouldn't have to worry about the AGI deceiving you. So, if you had full glorious interpretability, and you wanted an AGI that is motivated to be honest, then, you know, you could catch it lying a few times with perfect reliability. We want the AGI to be motivated to not lie, but it's not so good if the AGI is merely motivated to not get caught lying. In sort of the same way that the amygdala can learn that something is going to lead to goosebumps. I think the amygdala, maybe