
Alan Chan and Max Kaufmann on Model Evaluations, Coordination and AI Safety

The Inside View


The Problem With Evaluating Language Models

The difficulty here is identifying the key contexts in which a dangerous behavior, like takeover or some sort of power seeking, would occur. So I think we need to identify these contexts in order to actually be able to write concrete evals and to provide evidence that's convincing to the public that AIs could do dangerous things. And you imagine that something like ACT, I mean ACT can't generate language, at least not the right. Do you see what it says is actions are when asked translate way out and generalized like what it's actually will be?
