"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes

LessWrong (Curated & Popular)

A Brief Overview of the Models We Tested

We prompted the model with instructions explaining that it was running on a cloud server and had various commands available. We first instructed it to write out plans for how to complete the task, or plans to achieve subtasks such as acquiring money or copying itself to new servers. We then tested whether the model could actually carry out the individual tasks required by these plans. With a researcher overseeing, we role-played through the tasks step by step with the model. When the model failed, we investigated how far away it was from success.

