LessWrong (Curated & Popular)

“0. CAST: Corrigibility as Singular Target” by Max Harms

Aug 7, 2024
This episode examines corrigibility in AI, moving from confusion about the concept toward a clearer account of it. It argues that this single property can be crucial for creating agents that are both effective and safe, and discusses strategies for measuring and enhancing it during AI development. It also critiques the usual mix of goals and proposes a streamlined focus on corrigibility as a singular target to improve outcomes.