"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky

LessWrong (Curated & Popular)

The Game of Confidence

In an idealized game, each of us expects we'd win if we played it. You counter by asserting that there's a situation in the training data where it had to have gotten whacked if it was that stupid. I counter either with a more sophisticated deployment divergence or by naming a shallower or factually non-Alice-like thing that it could have learned instead. "It's a pretty tricky game to play because it's all made-up bullshit," he says.
