
Ep 58: Sam Bowman on ChatGPT & Controlling AI
Brave New World -- hosted by Vasant Dhar
Alignment Problems
Alignment is the problem of taking that system and making it do the thing that you want, sort of giving it a goal and having it pursue that goal. And it can be hard. All of the silly failure demos that you see on Twitter with ChatGPT are sort of classic examples of alignment failure. You've got problems where models don't quite learn the goals that we try to give them, or even if they do learn the goals, it doesn't always work out as planned.