
Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View


Is There a Class of Alignment Failures?

An inner alignment failure is a class of alignment failures that can happen even when you define the right objective. It could occur on particular subsets of the data, or on particular data points that are drawn from the i.i.d. distribution. An outer alignment failure is where there's something wrong with the objective that you trained the model on, which causes it to fail in various ways. And we also have collaborations that use some of the other open-source models, like OPT or BLOOM and things like that. Yes, because the API can also check for different sizes, like different... exactly.

