AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Intuition That the Model Wants to Kill Us
I think what we're hoping for in the end is not that we'll understand every detail, but again, I would give like the X-ray or the MRI analogy. Is this a model whose internal state and plans are very different from what it externally represents itself to do? It may be a reasonable assumption that the internal structure of the model is not intentionally optimizing against us. We don't know for sure whether that's possible, but I think some, at least positive signs that it might be possible again.