
Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View

How Did Meta Get BlenderBot to Ban Offensive Content?

Q. How do you think Meta got BlenderBot to, like, ban offensive content? Do you think they just have a classifier for offensive content, and if the score reaches a certain threshold, they're like, "Oh no, stop, post the other message instead"?

A. Yes. I think they pointed to one paper called DIRECTOR, and I think they might have used the method from this paper to, like, improve the alignment of the model. And that makes the model good at just dialogue in general.
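The two mechanisms touched on in this exchange can be contrasted in a short Python sketch. This is an illustration under stated assumptions, not Meta's BlenderBot code. `toy_offensiveness`, its word list, the threshold, and the fallback message are hypothetical stand-ins for a trained classifier. `filtered_reply` is the post-hoc "score it, and if it crosses a threshold, post something else" filter described in the question. `reweight_next_token` assumes a DIRECTOR-style decoder that multiplies each candidate token's language-model probability by a classifier's probability that the continuation is benign (raised to an exponent `gamma`) and renormalizes.

```python
from typing import Callable, Dict


def toy_offensiveness(text: str) -> float:
    """Hypothetical stand-in for a learned classifier: fraction of words
    found in a tiny placeholder blocklist (illustration only)."""
    flagged = {"idiot", "stupid"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in flagged for w in words) / len(words)


def filtered_reply(
    candidate: str,
    classifier: Callable[[str], float],
    threshold: float = 0.1,
    fallback: str = "Sorry, I'd rather not talk about that. What else is on your mind?",
) -> str:
    """Post-hoc filter: if the classifier score reaches the threshold,
    discard the generated candidate and post a canned fallback instead."""
    return fallback if classifier(candidate) >= threshold else candidate


def reweight_next_token(
    lm_probs: Dict[str, float],
    benign_probs: Dict[str, float],
    gamma: float = 1.0,
) -> Dict[str, float]:
    """Token-level alternative (assumed DIRECTOR-style combination):
    score each token by p_LM * p_benign ** gamma, then renormalize, so
    offensive continuations are down-weighted during decoding rather
    than filtered after the whole reply is generated."""
    scores = {tok: p * benign_probs[tok] ** gamma for tok, p in lm_probs.items()}
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}


if __name__ == "__main__":
    print(filtered_reply("You are an idiot.", toy_offensiveness))
    print(reweight_next_token({"kind": 0.5, "rude": 0.5}, {"kind": 0.9, "rude": 0.1}))
```

The design difference is where the classifier acts: the filter can only accept or replace a finished reply, while the reweighting shifts probability mass away from offensive tokens at every step, which is closer to the idea of improving the model's behavior in general rather than bolting on a veto.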
