High-Stakes Alignment via Adversarial Training [Redwood Research Report]

AI Safety Fundamentals: Alignment

Introduction

This is an audio version of "High-Stakes Alignment via Adversarial Training [Redwood Research Report]" by DMZ, Lawrence Chan, and Nate Thomas. It is included as a core reading in the AGI Safety Fundamentals course. We think this work constitutes progress towards techniques that may substantially reduce the likelihood of deceptive alignment.
