24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

The Alignment Tax

The overall goal here is not to build the most capable automated alignment researcher that we could possibly build with the tech that we have, but rather to build something that is really, really useful, that we can scale up a lot, and most importantly, that we trust is aligned enough to hand off these tasks to. If we introduce a fair amount of inefficiencies in this process, where we're essentially sandbagging the model's capabilities by training it in this way, I don't think that matters as much.

