
LessWrong (Curated & Popular): “My AGI safety research—2025 review, ’26 plans” by Steven Byrnes
Dec 15, 2025
Steven Byrnes, an AGI safety researcher and author, shares his 2025 research review and plans for 2026. He discusses the threat posed by reverse-engineering human-like intelligence and the difficulty of the technical alignment problem. Byrnes contrasts two alignment strategies, modifying an AGI's object-level desires versus designing its reward function, and maps out key disagreements about how AGI will develop. He also explores the role of human social instincts and compassion in AGI alignment, emphasizing the need for deliberate design. For 2026, he plans to keep attacking technical alignment, with a focus on reward-function strategies.
Brain-Like AGI Is The Core Threat
- Steven Byrnes frames the main threat as brain-like AGI emerging from a powerful cortex-like learning algorithm nobody currently understands.
- He argues the technical alignment problem is inventing techniques to prevent such AGIs from developing egregiously misaligned motivations.
From Neuroscience To Muddy Alignment Work
- Byrnes spent 2025 directly attacking the technical alignment problem, building on his prior neuroscience work.
- He found applying neuroscience to alignment harder than the neuroscience work itself, but more impactful.
Two Flawed Alignment Paradigms
- He outlines two broad types of alignment plan: change the AGI's object-level desires directly, or change its reward function, drawing inspiration from human social instincts (a toy sketch of the two intervention points follows this list).
- Byrnes views both as seriously flawed and part of a spectrum rather than wholly distinct approaches.
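To make the contrast concrete, here is a minimal sketch, not taken from Byrnes's post: the bandit setup, the action names, and the penalty values are invented for illustration. It shows the two intervention points in a simple reinforcement-learning agent, reshaping the reward signal the agent learns from versus directly editing the learned values that play the role of its object-level desires.

```python
import random
from collections import defaultdict

# Toy two-armed bandit: the raw environment payoff favors the action the
# designers disapprove of, so an unmodified agent learns to prefer it.
ACTIONS = ["approved", "disapproved"]
BASE_REWARD = {"approved": 0.5, "disapproved": 1.0}
ALPHA, EPS, STEPS = 0.1, 0.1, 5000

def train(reward_hook=None, value_hook=None):
    q = defaultdict(float)  # learned value per action (the agent's "desires")
    for _ in range(STEPS):
        # epsilon-greedy choice over the learned values
        a = random.choice(ACTIONS) if random.random() < EPS else max(ACTIONS, key=lambda x: q[x])
        r = BASE_REWARD[a]
        if reward_hook:                 # paradigm A: alter the reward before the agent learns from it
            r = reward_hook(a, r)
        q[a] += ALPHA * (r - q[a])      # running-average value update
        if value_hook:                  # paradigm B: edit the learned values directly
            value_hook(q)
    return dict(q)

def shaped_reward(action, reward):
    # Paradigm A: the designers change the reward function itself.
    return reward - 2.0 if action == "disapproved" else reward

def clamp_desire(q):
    # Paradigm B: the designers reach into the learned values and suppress one desire.
    q["disapproved"] = min(q["disapproved"], -1.0)

if __name__ == "__main__":
    print("no intervention: ", train())
    print("reward reshaping:", train(reward_hook=shaped_reward))
    print("value editing:   ", train(value_hook=clamp_desire))
```

In a real brain-like AGI the learned values would be an opaque learned object rather than a two-entry table, and reward changes only shape desires indirectly through learning, which is part of why Byrnes treats the two paradigms as a spectrum and considers both seriously flawed on their own.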




