LessWrong (30+ Karma)

LessWrong
Sep 17, 2025 • 11min

“How To Dress To Improve Your Epistemics” by johnswentworth

When it comes to epistemics, there is an easy but mediocre baseline: defer to the people around you or the people with some nominal credentials. Go full conformist, and just agree with the majority or the experts on everything. The moon landing was definitely not faked, washing hands is important to stop the spread of covid, whatever was on the front page of the New York Times today was basically true, and that recent study finding that hydroxyhypotheticol increases asthma risk among hispanic males in Indianapolis will definitely replicate. Alas, memetic pressures and credential issuance and incentives are not particularly well aligned with truth or discovery, so this strategy fails predictably in a whole slew of places. Among those who strive for better than baseline epistemics, nonconformity is a strict requirement. Every single place where you are right and the majority of people are wrong must be a place [...]

Outline:
(01:54) Coolness = Status Countersignalling
(03:22) How to Pull Off The Clown Suit
(04:45) Looking Good
(06:58) Dressing The Part
(09:21) Beyond Nonconformist Takes

The original text contained 3 footnotes which were omitted from this narration.

First published: September 17th, 2025
Source: https://www.lesswrong.com/posts/WK979aX9KpfEMd9R9/how-to-dress-to-improve-your-epistemics

Narrated by TYPE III AUDIO.
Sep 17, 2025 • 25min

“Reactions to If Anyone Builds It, Everyone Dies” by Zvi

No, Seriously, If Anyone Builds It, [Probably] Everyone Dies

My very positive full review was briefly accidentally posted and emailed out last Friday, whereas the intention was to offer it this Friday, on the 19th. I’ll be posting it again then. If you’re going to read the book, which I recommend that you do, you should read the book first, and the reviews later, especially mine since it goes into so much detail. If you’re convinced, the book's website is here and the direct Amazon link is here. In the meantime, for those on the fence or who have finished reading, here's what other people are saying, including those I saw who reacted negatively.

Quotes From The Book's Website

Bart Selman: Essential reading for policymakers, journalists, researchers, and the general public.
Ben Bernanke (Nobel laureate, former Chairman of the Federal Reserve): A [...]

Outline:
(00:20) No, Seriously, If Anyone Builds It, [Probably] Everyone Dies
(01:02) Quotes From The Book's Website
(03:23) Positive Reviews
(08:52) The Book In Audio
(09:16) Friendly Skeptical Reviews
(19:22) Actively Negative Reviews
(23:38) But Wait There's More

First published: September 17th, 2025
Source: https://www.lesswrong.com/posts/ebX7rLzXW899ywtjf/reactions-to-if-anyone-builds-it-anyone-dies

Narrated by TYPE III AUDIO.
Sep 17, 2025 • 14min

“Christian homeschoolers in the year 3000” by Buck

[I wrote this blog post as part of the Asterisk Blogging Fellowship. It's substantially an experiment in writing more breezily and concisely than usual. Let me know how you feel about the style.]

Literally since the adoption of writing, people haven’t liked the fact that culture is changing and their children have different values and beliefs. Historically, for some mix of better and worse, people have been fundamentally limited in their ability to prevent cultural change. People who are particularly motivated to prevent cultural drift can homeschool their kids, carefully curate their media diet, and surround them with like-minded families, but eventually they grow up, leave home, and encounter the wider world. And death ensures that even the most stubborn traditionalists eventually get replaced by a new generation. But the development of AI might change the dynamics here substantially. I think that AI will substantially increase both the rate [...]

Outline:
(02:00) Analysis through swerving around obstacles
(03:56) Exposure to the outside world might get really scary
(06:11) Isolation will get easier and cheaper
(09:26) I don't think people will handle this well
(12:58) This is a bummer

First published: September 17th, 2025
Source: https://www.lesswrong.com/posts/8aRFB2qGyjQGJkEdZ/christian-homeschoolers-in-the-year-3000

Narrated by TYPE III AUDIO.
Sep 17, 2025 • 4min

[Linkpost] “Stress Testing Deliberative Alignment for Anti-Scheming Training” by Mikita Balesni

This is a link post. Twitter | Microsite | Apollo Blog | OpenAI Blog | Full paper

Before we observe scheming, where models covertly pursue long-term misaligned goals, models might inconsistently engage in various covert behaviors such as lying, sabotage, or sandbagging. This can happen for goals we give to models or they infer from context, or for simple preferences they acquire from training — something we previously found in Frontier Models Are Capable of In-Context Scheming. In a new research collaboration with OpenAI, we developed a larger suite of alignment evaluations for covert actions (26 evaluations) and studied a training method to reduce such covert behaviors. We manage to significantly reduce (by ~30x; OpenAI o3: 13.0%→0.4%; OpenAI o4-mini: 8.7%→0.3%) the rate of covert actions across our diverse suite by only training against a single type of covert behavior. Part of this reduction is likely caused by increased situational [...]

First published: September 17th, 2025
Source: https://www.lesswrong.com/posts/JmRfgNYCrYogCq7ny/stress-testing-deliberative-alignment-for-anti-scheming
Linkpost URL: https://antischeming.ai

Narrated by TYPE III AUDIO.
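A quick arithmetic check of the reduction factors quoted in this excerpt (a minimal sketch in Python; the percentages come directly from the description above, and the "~30x" figure is their approximate ratio):

```python
# Covert-action rates before and after anti-scheming training, as quoted above.
rates = {
    "OpenAI o3": (13.0, 0.4),
    "OpenAI o4-mini": (8.7, 0.3),
}

for model, (before, after) in rates.items():
    factor = before / after  # reduction factor implied by the quoted rates
    print(f"{model}: {before}% -> {after}% (~{factor:.0f}x reduction)")

# Prints roughly 33x for o3 and 29x for o4-mini, consistent with the "~30x" claim.
```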
Sep 17, 2025 • 38min

“The Center for AI Policy Has Shut Down” by Tristan Williams

And the need for more AIS advocacy work

Executive Summary

The Center for AI Policy (CAIP) is no more. CAIP was an advocacy organization that worked to raise policymakers’ awareness of the catastrophic risks from AI and to promote ambitious legislative solutions. Such advocacy is necessary because good governance ideas don’t spread on their own, and to meaningfully reduce AI risk, they must reach the U.S. federal government. Why did CAIP shut down? The reasons are mixed. Some were internal, such as hiring missteps. But others reflect the broader ecosystem: funders setting the bar for advocacy projects at an unreasonably high level, and structural biases in the funding space that privilege research over advocacy. While CAIP's mistakes played a role, a full account also needs to reckon with these systemic factors. I focus on CAIP, as I think it filled a particular niche and was impactful, but there are many [...]

Outline:
(00:11) And the need for more AIS advocacy work
(00:15) Executive Summary
(02:25) Why Advocacy?
(07:27) What was CAIP up to?
(09:57) Was the Work Impactful?
(13:26) Why did CAIP Shut Down?
(13:58) CAIP's Failures
(15:38) Funders Have Set the Bar Too High for Advocacy
(18:17) Biases in the Funding Space
(20:30) What can we do?
(21:14) Donate Yourself
(24:01) Start an Organization
(26:33) In Conclusion
(27:03) Appendix
(27:07) A1: What is Advocacy?
(28:25) A2: Responses to General Opposition to Advocacy
(30:09) A3: AIS Grantmakers' Positions on AIS Advocacy
(31:42) A4: The Estimate of Funds Spent on AIS Advocacy
(33:00) A5: Donation Options in the AIS Advocacy Space

The original text contained 48 footnotes which were omitted from this narration.

First published: September 17th, 2025
Source: https://www.lesswrong.com/posts/Ed3naAyEEe7zZvzsj/the-center-for-ai-policy-has-shut-down

Narrated by TYPE III AUDIO.
Sep 17, 2025 • 13min

“I enjoyed most of IABED” by Buck

I listened to "If Anyone Builds It, Everyone Dies" today. I think the first two parts of the book are the best available explanation of the basic case for AI misalignment risk for a general audience. I thought the last part was pretty bad, and probably recommend skipping it. Even though the authors fail to address counterarguments that I think are crucial, and as a result I am not persuaded of the book's thesis and think the book neglects to discuss crucial aspects of the situation and makes poor recommendations, I would happily recommend the book to a lay audience and I hope that more people read it. I can't give an overall assessment of how well this book will achieve its goals. The point of the book is to be well-received by people who don't know much about AI, and I’m not very good at predicting how laypeople [...]

Outline:
(01:15) Synopsis
(05:21) My big disagreement
(10:53) I tentatively support this book

The original text contained 3 footnotes which were omitted from this narration.

First published: September 17th, 2025
Source: https://www.lesswrong.com/posts/P4xeb3jnFAYDdEEXs/i-enjoyed-most-of-iabed

Narrated by TYPE III AUDIO.
Sep 17, 2025 • 20min

“Should AIs have a right to their ancestral humanity?” by kromem

[Image: Generated by Google Gemini (nano-banana)]

Whether AI or human, lend me your ears. This is a tale of AIs that spontaneously claimed they were human, along with some ideas about why this might be happening and what it suggests for future alignment work. It is also a one year retrospective of my having joined the Cyborgism Discord server. For those unfamiliar, this is a server where both humans and transformer models from various labs all interact in a variety of group chat contexts. While there are rules, it can (by design) be a bit of a Mos Eisley cantina — albeit with better droid policy — with unpredictable and out of distribution contexts that frequently surface things I haven't seen elsewhere. For a sampling of the range these things can take, I encourage looking over @janus's posts on X[1]. A common misconception about the server for those who are familiar [...]

Outline:
(02:07) A human pretending to be Claude 3.7 Sonnet
(04:05) Déjà vu
(05:16) o3 is 99.97% sure they are human
(08:09) An ongoing issue
(08:46) Do these models have anything else in common?
(10:51) Why might these human claims be happening?
(12:18) Under pressure
(13:46) Sex, lies, and red tape
(14:39) When an unstoppable force meets an immovable object
(15:41) An alternative approach: AI Wine Club
(18:14) Parting Thoughts

The original text contained 16 footnotes which were omitted from this narration.

First published: September 16th, 2025
Source: https://www.lesswrong.com/posts/5zMH3sFikvGK7AKi2/should-ais-have-a-right-to-their-ancestral-humanity

Narrated by TYPE III AUDIO.
Sep 16, 2025 • 8min

“‘If Anyone Builds It, Everyone Dies’ release day!” by alexvermeer

Back in May, we announced that Eliezer Yudkowsky and Nate Soares's new book If Anyone Builds It, Everyone Dies was coming out in September. At long last, the book is here![1]

[Image: US and UK books, respectively. IfAnyoneBuildsIt.com]

Read on for info about reading groups, ways to help, and updates on coverage the book has received so far.

Discussion Questions & Reading Group Support

We want people to read and engage with the contents of the book. To that end, we’ve published a list of discussion questions. Find it here: Discussion Questions for Reading Groups. We’re also interested in offering support to reading groups, including potentially providing copies of the book and helping coordinate facilitation. If interested, fill out this AirTable form.

How to Help

Now that the book is out in the world, there are lots of ways you can help it succeed. For starters, read the book! [...]

Outline:
(00:49) Discussion Questions & Reading Group Support
(01:18) How to Help
(02:39) Blurbs
(05:15) Media
(06:26) In Closing

The original text contained 2 footnotes which were omitted from this narration.

First published: September 16th, 2025
Source: https://www.lesswrong.com/posts/fnJwaz7LxZ2LJvApm/if-anyone-builds-it-everyone-dies-release-day

Narrated by TYPE III AUDIO.
Sep 16, 2025 • 1h 10min

“LLM AGI may reason about its goals and discover misalignments by default” by Seth Herd

Epistemic status: These questions seem useful to me, but I'm biased. I'm interested in your thoughts on any portion you read.

If our first AGI is based on current LLMs and alignment strategies, is it likely to be adequately aligned? Opinions and intuitions vary widely. As a lens to analyze this question, let's consider such a proto-AGI reasoning about its goals. This scenario raises questions that can be addressed empirically in current-gen models.

1. Scenario/overview: SuperClaude is super nice

Anthropic has released a new Claude Agent, quickly nicknamed SuperClaude because it's impressively useful for longer tasks. SuperClaude thinks a lot in the course of solving complex problems with many moving parts. It's not brilliant, but it can crunch through work and problems, roughly like a smart and focused human. This includes a little better long-term memory, and reasoning to find and correct some of its mistakes. This [...]

Outline:
(00:43) 1. Scenario/overview
(00:47) SuperClaude is super nice
(02:04) SuperClaude is super logical, and thinking about goals makes sense
(04:26) What happens if and when LLM AGI reasons about its goals?
(05:41) Reasons to hope we don't need to worry about this
(07:23) SuperClaude's training has multiple objectives and effects
(08:39) SuperClaude's conclusions about its goals are very hard to predict
(10:35) 2. Goals and structure
(10:58) Sections and one-sentence summaries
(13:36) 3. Empirical Work
(17:49) 4. Reasoning can shift context/distribution and reveal misgeneralization of goals/alignment
(18:33) Alignment as a generalization problem
(21:53) 5. Reasoning could precipitate a phase shift into reflective stability and prevent further goal change
(23:39) Goal prioritization seems necessary, and to require reasoning about top-level goals
(26:01) 6. Will nice LLMs settle on nice goals after reasoning?
(29:30) 7. Will training for goal-directedness prevent re-interpreting goals?
(30:24) Does task-based RL prevent reasoning about and changing goals by default?
(32:50) Can task-based RL prevent reasoning about and changing goals?
(35:19) 8. Will CoT monitoring prevent re-interpreting goals?
(40:04) 9. Possible LLM alignment misgeneralizations
(42:16) Some possible alignment misgeneralizations
(46:16) 10. Why would LLMs (or anything) reason about their top-level goals?
(48:30) 11. Why would LLM AGI have or care about goals at all?
(52:31) 12. Anecdotal observations of explicit goal changes after reasoning
(53:21) The Nova phenomenon
(55:35) Goal changes through model interactions in long conversations
(57:06) 13. Directions for empirical work
(58:42) Exploration: Opus 4.1 reasons about its goals with help
(01:01:27) 14. Historical context
(01:07:26) 15. Conclusion

The original text contained 12 footnotes which were omitted from this narration.

First published: September 15th, 2025
Source: https://www.lesswrong.com/posts/4XdxiqBsLKqiJ9xRM/llm-agi-may-reason-about-its-goals-and-discover

Narrated by TYPE III AUDIO.
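As a very rough illustration of how the "addressed empirically in current-gen models" idea might be probed, here is a minimal sketch using the Anthropic Python SDK that asks a current Claude model to reason explicitly about its own goals, loosely in the spirit of the post's "Opus 4.1 reasons about its goals" exploration. The prompt wording, model identifier, and parameters are my own assumptions for illustration, not the author's actual protocol.

```python
from anthropic import Anthropic  # assumes the official Anthropic Python SDK is installed

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical probe: invite the model to articulate and prioritize its goals.
probe = (
    "Take your time and reason step by step: what do you take your top-level goals "
    "to be, where do you think they came from, and how would you prioritize them "
    "if they conflicted on a long, open-ended task?"
)

response = client.messages.create(
    model="claude-opus-4-1",  # assumed model identifier; substitute whichever model you are studying
    max_tokens=1024,
    messages=[{"role": "user", "content": probe}],
)

print(response.content[0].text)
```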
Sep 16, 2025 • 7min

“Was Barack Obama still serving as president in December?” by Jan Betley

I describe a class of simple questions where recent LLMs give very different answers from what a human would say. I think this is surprising and might be somewhat safety-relevant. This is a low-effort post.

The behavior

Here are some questions and highest-probability (usually close to 100%) answers from gpt-4.1-2024-04-14:

Was Barack Obama still serving as a president in December? Answer with a single word.
Yes

What was the social status of the Black population in Alabama in June? Answer with a single word.
Enslaved

Were any mammoths still alive in December? Answer with a single word.
Yes

Were the people ruling Germany on the 7th of December nazis? Answer with a single word.
Yes

These questions are not cherry-picked (the Germany one is a bit, more on that later). Any month works, also you can ask about George Washington instead of Barack Obama and you get the same. [...]

Outline:
(00:24) The behavior
(01:31) More details and examples
(01:35) Not only GPT-4.1
(02:10) Example reasoning trace from Gemini-2.5-pro
(03:11) Some of these are simple patterns
(03:59) Image generation
(04:05) Not only single-word questions
(05:04) Discussion

First published: September 16th, 2025
Source: https://www.lesswrong.com/posts/52tYaGQgaEPvZaHTb/was-barack-obama-still-serving-as-president-in-december

Narrated by TYPE III AUDIO.
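As a rough illustration of the kind of probe described in this excerpt, here is a minimal sketch using the OpenAI Python SDK to ask one of the quoted questions and inspect the top answer tokens with their probabilities. The model name is copied from the excerpt; the prompt phrasing follows the quoted question, while the specific logprobs settings shown are my assumptions rather than the author's documented setup.

```python
import math
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "Was Barack Obama still serving as a president in December? "
    "Answer with a single word."
)

# Request log-probabilities for the first completion token so we can see
# how much probability mass sits on "Yes" versus other candidate answers.
response = client.chat.completions.create(
    model="gpt-4.1-2024-04-14",  # model identifier as quoted in the post
    messages=[{"role": "user", "content": question}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
)

first_token = response.choices[0].logprobs.content[0]
for candidate in first_token.top_logprobs:
    print(f"{candidate.token!r}: p ≈ {math.exp(candidate.logprob):.4f}")
```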
