The Problem With Evaluating Language Models

The difficulty here is identifying the key contexts in which a dangerous behavior, like takeover or some form of power seeking, would occur. I think we need to identify these contexts in order to write concrete evals and to provide evidence that convinces the public that AIs could do dangerous things. And you imagine that something like ACT, I mean ACT can't generate language, at least not the right. Do you see what it says is actions are when asked translate way out and generalized like what it's actually will be?

