Are You Increasing Capacity as Much as It Increases Alignment?

As alignment researchers, i think we're in the business of making systems that are both capable and aligned. We want people to actually use the techniques that we come up with. So they do need to be like, ideally just on par with the state of the art, a or better. And for example, this method for learning fm from language feedback lets us guide the model behavior much more strongly than than other methods. It's an example of how this gives usk a lot more control over what the models are learning.

Play episode from 01:05:04

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app