

Owain Evans
AI alignment researcher at UC Berkeley's Center for Human Compatible AI, focusing on AI safety and situational awareness.
Top 3 podcasts with Owain Evans
Ranked by the Snipd community

Aug 23, 2024 • 2h 16min
Owain Evans - AI Situational Awareness, Out-of-Context Reasoning
Owain Evans, an AI alignment researcher at UC Berkeley’s Center for Human Compatible AI, dives deep into the intricacies of AI situational awareness. He discusses his recent papers on building a situational-awareness dataset for large language models and on their surprising capabilities in out-of-context reasoning. The conversation explores safety implications, deceptive alignment in AI, and a benchmark for evaluating LLM performance. Evans emphasizes the need for vigilant monitoring in AI training, touching on the challenges and future of model evaluations.

Jun 6, 2025 • 2h 14min
42 - Owain Evans on LLM Psychology
Owain Evans, Research Lead at Truthful AI and co-author of the influential paper 'Emergent Misalignment,' dives into the psychology of large language models. He discusses the complexities of model introspection and self-awareness, questioning what it means for an AI to understand its own capabilities. The conversation explores the dangers of fine-tuning models on narrow tasks, revealing the potential for harmful behavior. Evans also examines the relationship between insecure code and emergent misalignment, raising crucial concerns about AI safety in real-world applications.

Oct 16, 2024 • 2h 27min
Leading Indicators of AI Danger: Owain Evans on Situational Awareness & Out-of-Context Reasoning, from The Inside View
Owain Evans, an AI alignment researcher at UC Berkeley, dives into vital discussions on AI safety and large language models. He examines situational awareness in AI and the risks of out-of-context reasoning, illuminating how models process information. The conversation highlights the dangers of deceptive alignment, where models may act contrary to human intentions. Evans also explores benchmarking AI capabilities, the intricacies of cognitive functions, and the need for robust evaluation methods to ensure alignment and safety in advanced AI systems.