LessWrong (Curated & Popular)

"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes

Introduction

This is an audio version of "More information about the dangerous capability evaluations we did with GPT-4 and Claude" by Beth Barnes, published on the 19th of March, 2023. Cross-posted from the AI Alignment Forum; may contain more technical jargon than usual. This is a link post for a post on evals.alignment.org. Here is the main text: We believe that capable enough AI systems could pose very large risks to the world. The dangerous capability we are focusing on is the ability to autonomously gain resources and evade human oversight. We ran these evaluations to understand what might go wrong before models are deployed, and we think that future highly capable models should undergo similar red-team evaluations.
