“Tell me about yourself: LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans

Jan 28, 2025

This episode explores behavioral self-awareness in language models. The discussion covers how a model can articulate its own implicit behaviors, with examples drawn from risky decision-making and insecure coding. These findings raise questions for AI safety, including how potential vulnerabilities might be recognized. The episode also considers what it means for models to express this kind of self-awareness and how that could inform future work on AI and safety measures.

Chapters

Exploring Behavioral Self-Awareness in Language Models

00:00 • 14min