
LessWrong (30+ Karma)
Audio narrations of LessWrong posts.
Latest episodes

Apr 8, 2025 • 41min
“Alignment Faking Revisited: Improved Classifiers and Open Source Extensions” by John Hughes, abhayesian, Akbir Khan, Fabien Roger
In this post, we present a replication and extension of an alignment faking model organism: Replication: We replicate the alignment faking (AF) paper and release our code. Classifier Improvements: We significantly improve the precision and recall of the AF classifier. We release a dataset of ~100 human-labelled examples of AF for which our classifier achieves an AUROC of 0.9 compared to 0.6 from the original classifier. Evaluating More Models: We find Llama family models, other open source models, and GPT-4o do not AF in the prompted-only setting when evaluating using our new classifier (other than a single instance with Llama 3 405B). Extending SFT Experiments: We run supervised fine-tuning (SFT) experiments on Llama (and GPT-4o) and find that AF rate increases with scale. We release the fine-tuned models on Huggingface and scripts. Alignment faking on 70B: We find that Llama 70B alignment fakes when both using the system prompt in the [...] ---Outline:(02:43) Method(02:46) Overview of the Alignment Faking Setup(04:22) Our Setup(06:02) Results(06:05) Improving Alignment Faking Classification(10:56) Replication of Prompted Experiments(14:02) Prompted Experiments on More Models(16:35) Extending Supervised Fine-Tuning Experiments to Open-Source Models and GPT-4o(23:13) Next Steps(25:02) Appendix(25:05) Appendix A: Classifying alignment faking(25:17) Criteria in more depth(27:40) False positives example 1 from the old classifier(30:11) False positives example 2 from the old classifier(32:06) False negative example 1 from the old classifier(35:00) False negative example 2 from the old classifier(36:56) Appendix B: Classifier ROC on other models(37:24) Appendix C: User prompt suffix ablation(40:24) Appendix D: Longer training of baseline docs The original text contained 1 footnote which was omitted from this narration. ---
First published:
April 8th, 2025
Source:
https://www.lesswrong.com/posts/Fr4QsQT52RFKHvCAH/alignment-faking-revisited-improved-classifiers-and-open
---
Narrated by TYPE III AUDIO.
---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Apr 8, 2025 • 54min
“AI 2027: Responses” by Zvi
Yesterday I covered Dwarkesh Patel's excellent podcast coverage of AI 2027 with Daniel Kokotajlo and Scott Alexander. Today covers the reactions of others.
Kevin Roose in The New York Times
Kevin Roose covered Scenario 2027 in The New York Times.
Kevin Roose: I wrote about the newest AGI manifesto in town, a wild future scenario put together by ex-OpenAI researcher @DKokotajlo and co.
I have doubts about specifics, but it's worth considering how radically different things would look if even some of this happened.
Daniel Kokotajlo: AI companies claim they’ll have superintelligence soon. Most journalists understandably dismiss it as hype. But it's not just hype; plenty of non-CoI’d people make similar predictions, and the more you read about the trendlines the more plausible it looks. Thank you & the NYT!
The final conclusion is supportive of this kind of work, and Kevin points out that expectations at the major [...] ---Outline:(00:21) Kevin Roose in The New York Times(02:56) Eli Lifland Offers Takeaways(04:23) Scott Alexander Offers Takeaways(05:34) Others Takes on Scenario 2027(05:39) Having a Concrete Scenario is Helpful(08:37) Writing It Down Is Valuable Even If It Is Wrong(10:00) Saffron Huang Worries About Self-Fulfilling Prophecy(18:18) Phillip Tetlock Calibrates His Skepticism(21:38) Jan Kulveit Wants to Bet(23:08) Matthew Barnett Debates How To Evaluate the Results(24:38) Teortaxes for China and Open Models and My Response(31:53) Others Wonder About PRC Passivity(33:40) Timothy Lee Remains Skeptical(35:16) David Shapiro for the Accelerationists and Scott's Response(45:29) LessWrong Weighs In(46:59) Other Reactions(50:05) Next Steps(52:34) The Lighter Side---
First published:
April 8th, 2025
Source:
https://www.lesswrong.com/posts/gyT8sYdXch5RWdpjx/ai-2027-responses
---
Narrated by TYPE III AUDIO.

Apr 8, 2025 • 11min
“American College Admissions Doesn’t Need to Be So Competitive” by Arjun Panickssery
Spoiler: “So after removing the international students from the calculations, and using the middle-of-the-range estimates, the conclusion: The top-scoring 19,000 American students each year are competing in top-20 admissions for about 12,000 spots out of 44,000 total. Among the Ivy League + MIT + Stanford, they’re competing for about 6,500 out of 15,800 total spots.” It's well known that admission to top universities is very competitive in America and even top SAT scores (1550+ out of 1600) paired with a 4.0 GPA don’t guarantee admission to a top school. Even top universities take into account race-based affirmative action, athletic recruitment, “Dean's Interest List”-type tracking systems for children of donors or notable persons, and legacy preference for children of alumni. But many people are under the misconception that the resulting “rat race”—the highly competitive and strenuous admissions ordeal—is the inevitable result of the limited class sizes among top [...] ---
First published:
April 7th, 2025
Source:
https://www.lesswrong.com/posts/vptDgKbiEwsKAFuco/american-college-admissions-doesn-t-need-to-be-so
---
Narrated by TYPE III AUDIO.

Apr 7, 2025 • 13min
“Most Questionable Details in ‘AI 2027’” by scarcegreengrass
My thoughts on the recently posted story. Caveats I think it's great that the AI Futures Project wrote up a detailed scenario. I enjoy it. Every part of the story i didn't comment on here is either fine or excellent. This is one of the most realistic scenarios i've read. All detailed predictions contain errors. The authors of this scenario don't claim it's the most likely future. If the speed of 2018-2025 was the typical rate of progress in software, then AI 2027 would be realistic. Core Disagreements Early 2026: OpenBrain is making algorithmic progress 50% faster. As with many parts of this scenario, i think this is plausible but too fast. 150% productivity is a lot in business terms, & the scenario doesn't provide much detail for why this is 150% as opposed to 110%. In my software development experience, organizations are bottlenecked by their [...] ---Outline:(00:14) Caveats(00:52) Core Disagreements(05:33) Minor Details(12:04) Overall---
First published:
April 5th, 2025
Source:
https://www.lesswrong.com/posts/6Aq2FBZreyjBp6FDt/most-questionable-details-in-ai-2027
---
Narrated by TYPE III AUDIO.

Apr 7, 2025 • 48min
“AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander” by Zvi
Daniel Kokotajlo has launched AI 2027, Scott Alexander introduces it here. AI 2027 is a serious attempt to write down what the future holds. His ‘What 2026 Looks Like’ was very concrete and specific, and has proved remarkably accurate given the difficulty level of such predictions.
I’ve had the opportunity to play the wargame version of the scenario described in 2027, and I reviewed the website prior to publication and offered some minor notes. Whenever I refer to a ‘scenario’ in this post I’m talking about Scenario 2027.
There's tons of detail here. The research here, and the supporting evidence and citations and explanations, blow everything out of the water. It's vastly more than we usually see, and dramatically different from saying ‘oh I expect AGI in 2027’ or giving a timeline number. This lets us look at what happens in concrete detail, figure out where we [...] ---Outline:(02:00) The Structure of These Post(03:37) Coverage of the Podcast---
First published:
April 7th, 2025
Source:
https://www.lesswrong.com/posts/vnkH6JrGu2AxtDTyu/ai-2027-dwarkesh-s-podcast-with-daniel-kokotajlo-and-scott
---
Narrated by TYPE III AUDIO.

Apr 7, 2025 • 14min
“How Gay is the Vatican?” by rba
The Catholic Church has always had a complicated relationship with homosexuality. The central claim of Frederic Martel's 2019 book In the Closet of the Vatican is that the majority of the church's leadership in Rome are semi-closeted homosexuals, or more colorfully, "homophiles". So the omnipresence of homosexuals in the Vatican isn’t just a matter of a few black sheep, or the ‘net that caught the bad fish’, as Josef Ratzinger put it. It isn’t a ‘lobby’ or a dissident movement; neither is it a sect of Freemasonry inside the holy see: it's a system. It isn’t a tiny minority; it's a big majority. At this point in the conversation, I ask Francesco Lepore to estimate the size of this community, all tendencies included. ‘I think the percentage is very high. I’d put it at around 80 percent.’ … During a discussion with a non-Italian archbishop, whom I met [...] ---Outline:(02:40) Background(04:23) Data(06:40) Analysis(07:25) Expected versus Actual birth order, with missing birth order(09:03) Expected versus Actual birth order, without missing birth order(09:46) Oldest sibling versus youngest sibling(10:21) Discussion(13:09) Conclusion---
First published:
April 6th, 2025
Source:
https://www.lesswrong.com/posts/ybwqL9HiXE8XeauPK/how-gay-is-the-vatican
---
Narrated by TYPE III AUDIO.

Apr 6, 2025 • 17min
“The Lizardman and the Black Hat Bobcat” by Screwtape
There's an irritating circumstance I call The Black Hat Bobcat, or Bobcatting for short. The Black Hat Bobcat is when there's a terrible behavior that comes up often enough to matter, but rarely enough that it vanishes in the noise of other generally positive feedback. (Image: xkcd, A-Minus-Minus.) The alt-text for this comic is illuminating. "You can do this one in thirty times and still have 97% positive feedback." I would like you to contemplate this comic and alt-text as though it were deep wisdom handed down from a sage who lived atop a mountaintop. I. Black Hat Bobcatting is when someone (let's call them Bob) does something obviously lousy, but very infrequently. If you're standing right there when the Bobcatting happens, it's generally clear that this is not what is supposed to happen, and sometimes seems pretty likely it's intentional. After all, how exactly do you pack a [...] ---Outline:(00:49) I.(03:38) II.(06:18) III.(09:38) IV.(13:37) V.---
First published:
April 6th, 2025
Source:
https://www.lesswrong.com/posts/Ry9KCEDBMGWoEMGAj/the-lizardman-and-the-black-hat-bobcat
---
Narrated by TYPE III AUDIO.

Apr 6, 2025 • 21min
“A collection of approaches to confronting doom, and my thoughts on them” by Ruby
I just published A Slow Guide to Confronting Doom, containing my own approach to living in a world that I think has a high likelihood of ending soon. Fortunately I'm not the only person to have written on the topic. Below are my thoughts on what others have written. I have not written these such that they stand independent from the originals, and have intentionally not written summaries that wouldn't do the pieces justice. I suggest you read or at least skim the originals. A defence of slowness at the end of the world (Sarah) I feel kinship with Sarah. She's wrestling with the same harsh scary realities I am – feeling the AGI. The post isn't that long and I recommend reading it, but to quote just a little: Since learning of the coming AI revolution, I’ve lived in two worlds. One moves at a leisurely pace, the same [...] ---Outline:(00:37) A defence of slowness at the end of the world (Sarah)(03:37) How will the bomb find you? (C. S. Lewis)(08:02) Death with Dignity (Eliezer Yudkowsky)(09:08) Don't die with dignity; instead play to your outs (Jeffrey Ladish)(10:29) Emotionally Confronting a Probably-Doomed World: Against Motivation Via Dignity Points (TurnTrout)(12:44) A Way To Be Okay (Duncan Sabien)(14:17) Another Way to Be Okay (Gretta Duleba)(14:39) Being at peace with Doom (Johannes C. Mayer)(16:56) Here's the exit. (Valentine)(19:14) Mainstream Advice The original text contained 1 footnote which was omitted from this narration. ---
First published:
April 6th, 2025
Source:
https://www.lesswrong.com/posts/ZE4xhZHDHHXPuXzxh/a-collection-of-approaches-to-confronting-doom-and-my
---
Narrated by TYPE III AUDIO.

Apr 6, 2025 • 27min
“A Slow Guide to Confronting Doom, v1” by Ruby
Following a few events in April 2022 that caused many people to update sharply and negatively on outcomes for humanity, I wrote A Quick Guide to Confronting Doom. I advised: Think for yourself. Be gentle with yourself. Don't act rashly. Be patient about helping. Don't act unilaterally. Figure out what works for you. This is fine advice and all, I stand by it, but it's also not really a full answer to how to contend with the utterly crushing weight of the expectation that everything and everyone you value will be destroyed in the next decade or two. Feeling the Doom Before I get into my suggested psychological approach to doom, I want to clarify the kind of doom I'm working to confront. If you are impatient, you can skip to the actual advice. The best analogy I have is the feeling of having a terminally [...] ---Outline:(00:46) Feeling the Doom(04:28) Facing the doom(04:50) Stay hungry for value(06:42) The bitter truth over sweet lies(07:35) Don't look away(08:11) Flourish as best one can(09:13) This time with feeling(13:27) Mindfulness(14:00) The time for action is now(15:18) Creating space for miracles(15:58) How does a good person live in such times?(16:49) Continue to think, tolerate uncertainty(18:03) Being a looker(18:48) Don't throw away your mind(20:22) Damned to lie in bed...(22:13) Worries, compulsions, and excessive angst(22:49) Comments on others' approaches(23:14) What does it mean to be okay?(25:17) Why is this guide titled version 1?(25:38) If you're gonna remember just a couple things The original text contained 12 footnotes which were omitted from this narration. ---
First published:
April 6th, 2025
Source:
https://www.lesswrong.com/posts/X6Nx9QzzvDhj8Ek9w/a-slow-guide-to-confronting-doom-v1
---
Narrated by TYPE III AUDIO.

Apr 6, 2025 • 47sec
“How much progress actually happens in theoretical physics?” by ChristianKl
I frequently hear people make the claim that progress in theoretical physics is stalled, partly because all the focus is on string theory and string theory doesn't seem to pan out into real advances. Believing it fits my existing biases, but I notice that I lack the physics understanding to really know whether or not there's progress. What do you think? ---
First published:
April 4th, 2025
Source:
https://www.lesswrong.com/posts/GBfMkaBdAnWLab2dj/how-much-progress-actually-happens-in-theoretical-physics
---
Narrated by TYPE III AUDIO.