LessWrong (Curated & Popular) cover image

"What 2026 looks like" by Daniel Kokotajlo

LessWrong (Curated & Popular)

00:00

AI Safety Issues - Are You Aligned?

AI experts are employed coming up with cleverer bureaucracy designs and grad student dissenting them. The alignment community now starts another research agenda to interrogate AIs about AI safety related topics. In diplomacy, you can actually collect data on the analogue of this question, that is, will you betray me? Alas, the model is often lie about that, but it's diplomacy, they are literally trained to lie so no one cares. They also try to contrive scenarios in which the AI can seemingly profit by doing something treacherous as honeypots to detect deception.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app