Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View

Are You Increasing Capacity as Much as It Increases Alignment?

As alignment researchers, I think we're in the business of making systems that are both capable and aligned. We want people to actually use the techniques that we come up with, so they do need to be, ideally, on par with the state of the art or better. For example, this method for learning from language feedback lets us guide the model's behavior much more strongly than other methods. It's an example of how this gives us a lot more control over what the models are learning.
