The Alignment Problem in Foundation Models

Alignment versus existential safety, I think they're both going more mainstream. There's at least this obvious problem with foundation models where it's like the pre-training objective is not aligned. You can't really tell if it's capable of doing something because you don't know if it's trying. Um, and that's that's the alignment problem as I think of it.

Play episode from 01:52:24

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app