Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View

Language Modeling

Language models could be used more aggressively in the future. This is a real potential concern, and it looks like it's not going in the right direction. I am interested in these demonstrations of scarier forms of misalignment. I think they haven't been well demonstrated, definitely not in language models. We might very well start seeing that risk growing. That would be a really huge result, because then we could actually have something concrete to show people.
