24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

How to Scale Up an Automated Alignment Research Model

The idea is you want the models to not be too good at these scary tasks. And so one might think that combination of things is inherently scary or dangerous. But I think ultimately this is an empirical question, right? It's really difficult to know in which order skills get unlocked when you scale up the models.

