
4 - Risks from Learned Optimization with Evan Hubinger
AXRP - the AI X-risk Research Podcast
00:00
Neural Networks
The way in which we currently do machine learning is so focused on behavioural incentives. The problem is that, if you're just looking at the behaviour on the training distribution, you can't really distinguish between a model which, like, will be objective-robust and one with no internal objective. So I think it's very difficult to sort of analyze this question behaviourally, because fundamentally we're looking at a problem of unidentifiability. We're looking for situations in which, as we sort of talk about, there is this …
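
A minimal sketch of the unidentifiability point being made here: two models can behave identically on every training input while pursuing different objectives off-distribution, so training behaviour alone can't tell them apart. The model names, the training range, and the off-distribution behaviour below are all hypothetical, chosen only to illustrate the idea.

```python
# Toy illustration (not from the episode): two "models" that agree on the
# training distribution but diverge off-distribution, so behavioural
# evaluation on training data cannot identify which objective was learned.

def model_a(x: float) -> float:
    """Pursues the intended objective everywhere: return x squared."""
    return x ** 2

def model_b(x: float) -> float:
    """Matches model_a on the assumed training range [0, 10], but follows a
    different (hypothetical proxy) objective outside that range."""
    if 0 <= x <= 10:
        return x ** 2
    return 0.0  # off-distribution, the proxy objective gives something else

train_inputs = [0.5 * i for i in range(21)]  # assumed training distribution: [0, 10]
assert all(model_a(x) == model_b(x) for x in train_inputs)  # indistinguishable in training

test_input = 42.0  # off-distribution input
print(model_a(test_input), model_b(test_input))  # 1764.0 vs 0.0 -- behaviours diverge
```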