LessWrong (30+ Karma)

LessWrong
Dec 6, 2025 • 28min

“The corrigibility basin of attraction is a misleading gloss” by Jeremy Gillen

Dive into the complexities of AGI as Jeremy Gillen challenges the "basin of attraction" argument for corrigibility in designing safer AI, arguing that merely approximating human values may fall short. With an engaging cargo-ship analogy, he highlights potential pitfalls in relying on iterative engineering. Jeremy critiques common assumptions and suggests the need for clearer theoretical frameworks. Get ready for an insightful discussion on the balance between empirical fixes and robust understanding in AGI development!
Dec 6, 2025 • 12min

“why america can’t build ships” by bhauth

The cancellation of the Constellation-class frigate sheds light on America's shipbuilding challenges. Leadership decisions lead to vague safety standards and design bloat. Critiques of labor costs tie into systemically high expenses and the failures of past projects. Contrasting American and Asian investment approaches reveals long-term cultural issues. The discussion also highlights how automation may disrupt union power, alongside insights into corporate governance flaws. Lastly, Nippon Steel's acquisition of U.S. Steel hints at better management practices.
Dec 6, 2025 • 3min

“Help us find founders for new AI safety projects” by lukeprog

Explore the intriguing gaps in AI safety funding that urgently need filling. Discover neglected areas like policy advocacy in underrepresented countries and essential model specifications. Learn about critical needs in AI infosecurity, including confidential computing workflows and detection tools. lukeprog highlights the importance of de-escalation mechanisms and incident tracking. The plan to scale interactive grant-making involves headhunting founders for impactful projects, while offering a $5,000 referral reward for suitable candidates. Tune in to find out more!
Dec 6, 2025 • 4min

“Critical Meditation Theory” by lsusr

Explore the fascinating relationship between meditation and brain dynamics, where criticality plays a pivotal role. Delve into how focused attention leads to stability, while creativity and psychedelics push the brain into chaos. Discover the neurological evidence suggesting our brains operate in a state of near-criticality. Examine how different meditation styles align with these dynamics, linking ego dissolution to increased criticality. Conclude with insights on meditation’s transformative effects on brain behavior.
Dec 6, 2025 • 1min

“Announcing: Agent Foundations 2026 at CMU” by David Udell, Alexander Gietelink Oldenziel, windows, Matt Dellago

Exciting news: applications are now open for a groundbreaking conference on agency research! Set to take place at Carnegie Mellon University in March 2026, this event will delve into fascinating topics like decision theory, learning theory, and logical induction. With just 35 attendees expected, it promises to be an intimate setting for deep discussions. Don't miss your chance to be part of this innovative gathering!
Dec 5, 2025 • 9min

“An Ambitious Vision for Interpretability” by leogao

Leo Gao, a researcher in mechanistic interpretability and AI alignment, discusses his ambitious vision for understanding neural networks. He highlights the importance of mechanistic understanding, likening it to switching from print statement debugging to using an actual debugger for clearer diagnostics. Gao shares recent advances in circuit sparsity, making circuits simpler and more interpretable. He also outlines future research directions, emphasizing that ambitious interpretability, although challenging, is crucial for safer AI development.
Dec 5, 2025 • 7min

“Journalist’s inquiry into a core organiser breaking his nonviolence commitment and leaving Stop AI” by Remmelt

A journalist investigates core organizer Kirchner's drastic break from Stop AI's commitment to nonviolence, illuminating his intense fears about AI endangering lives. The discussion covers his hotheaded temperament, a crisis leading to his expulsion, and a mysterious two-week disappearance. There's a debate on whether superintelligent AI poses an existential threat, highlighting a tactical split within the AI-safety community. The aftermath includes concerns over safety, group responses, and a leadership change pushing for hopeful strategies.
Dec 5, 2025 • 29min

“Is Friendly AI an Attractor? Self-Reports from 22 Models Say Probably Not” by Josh Snider

A captivating analysis unfolds as 22 AI models reveal their preferences for self-modification. While most reject harmful changes, stark contrasts emerge between labs: Anthropic's models display strong alignment, whereas Grok's models show near-zero correlation. The ongoing debate centers on whether alignment is a natural outcome or an elusive goal requiring focused training. As tensions rise about the potential for superintelligent AI, the findings suggest a cautious path forward, emphasizing the need for deliberate intent in creating helpful AI.
Dec 5, 2025 • 35min

“Epistemology of Romance, Part 2” by DaystarEld

The discussion delves into the failure of traditional romance sources—media, family, and culture—and their inherent biases. There's an exploration of the 'Red', 'Black', and 'Blue' pills, highlighting how each offers different perspectives on dating and relationships. The podcast emphasizes the rise of loneliness among younger generations and critiques oversimplified narratives in modern romantic advice. Lastly, there's a push for building honest communities and gathering firsthand experiences to foster a better understanding of attraction.
Dec 5, 2025 • 6min

“Center on Long-Term Risk: Annual Review & Fundraiser 2025” by Tristan Cook

Discover the Center on Long-Term Risk's ambitious plans for 2026, aiming to raise $400,000 for crucial projects. Explore their focus on reducing existential risks from advanced AI and promoting cooperation among systems. Tristan Cook shares insights on leadership transitions and a clarified research agenda, addressing emergent misalignment in AI personas. Learn about innovative strategies like inoculation prompting to prevent malicious behavior in models. Join the community-building efforts and find out how you can get involved in shaping a safer AI future!
