What Is Air Sets Interpretability?
We want to understand what the AGI is thinking in full, glorious detail. If we had that, it would solve all kinds of problems: you wouldn't have to worry about the AGI deceiving you. So if you had full, glorious interpretability, and you wanted an AGI that is motivated to be honest, then you could catch it lying a few times with perfect reliability. We want the AGI to be motivated not to lie; it's not so good if the AGI is merely motivated to not get caught lying. In sort of the same way that the amygdala can learn that something is going to lead to goosebumps, I think the amygdala, maybe