
3. Evan Hubinger on Takeoff speeds, Risks from learned optimization & Interpretability
The Inside View
Is It Human Feedback, or a Reward Model?
The last paper that I think was kind of interesting in terms of alignment, and where I'm also having trouble understanding, is "Learning to Summarize from Human Feedback." And so there's a mixture of, kind of, RL and NLP, and there's, like, human feedback in the loop. If you can give a good explanation, great; otherwise, I can read it on my own.