LessWrong (Curated & Popular) cover image

"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

LessWrong (Curated & Popular)

00:00

Is Chain of Thought Deception a Problem ByDefault?

This seems like it gets us a little further, because we can now apply some r l, which might cause deception that is now catchable. One problem identified by john wentworth with this strategy is that the a i might cause large bad sihde effects, like wiping out humans without intent. This seems like a problem, but this project still is clear progress toward solving alignment eli's opinion,. Though significantly more promising on shorter time lines, where something like current llm's scale to a g is future of humanity.

Play episode from 49:44
Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app