LessWrong (Curated & Popular)

“Tell me about yourself: LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans

Jan 28, 2025
This episode covers behavioral self-awareness in language models: the ability of a model to articulate its own implicit behaviors, such as risky decision-making and writing insecure code. The findings raise questions for AI safety, including whether models can recognize and report their own potential vulnerabilities, and what that implies for future safety measures.