
LessWrong (30+ Karma)
Audio narrations of LessWrong posts.
Latest episodes

Apr 25, 2025 • 2h 1min
“AI #113: The o3 Era Begins” by Zvi
Enjoy it while it lasts. The Claude 4 era, or the o4 era, or both, are coming soon.
Also, welcome to 2025, we measure eras in weeks or at most months.
For now, the central thing going on continues to be everyone adapting to the world of o3, a model that is excellent at providing mundane utility with the caveat that it is a lying liar. You need to stay on your toes.
This was also quietly a week full of other happenings, including a lot of discussions around alignment and different perspectives on what we need to do to achieve good outcomes, many of which strike me as dangerously mistaken and often naive.
I worry that increasingly common themes are people pivoting to some mix of ‘alignment is solved, we know how to get an AI to do what we want it to do, the question is alignment to [...]
---
Outline:
(01:33) Language Models Offer Mundane Utility
(05:27) You Offer the Models Mundane Utility
(07:25) Your Daily Briefing
(08:20) Language Models Don't Offer Mundane Utility
(12:27) If You Want It Done Right
(14:27) No Free Lunch
(16:07) What Is Good In Life?
(21:54) In Memory Of
(25:45) The Least Sincere Form of Flattery
(27:18) The Vibes are Off
(30:47) Here Let Me AI That For You
(32:25) Flash Sale
(34:38) Huh, Upgrades
(36:03) On Your Marks
(44:03) Be The Best Like No LLM Ever Was
(48:40) Choose Your Fighter
(51:00) Deepfaketown and Botpocalypse Soon
(54:57) Fun With Media Generation
(56:11) Fun With Media Selection
(57:39) Copyright Confrontation
(59:38) They Took Our Jobs
(01:05:31) Get Involved
(01:05:41) Ace is the Place
(01:09:43) In Other AI News
(01:11:31) Show Me the Money
(01:12:49) The Mask Comes Off
(01:16:54) Quiet Speculations
(01:20:34) Is This AGI?
(01:22:39) The Quest for Sane Regulations
(01:23:03) Cooperation is Highly Useful
(01:25:47) Nvidia Chooses Bold Strategy
(01:27:15) How America Loses
(01:28:07) Security Is Capability
(01:31:38) The Week in Audio
(01:33:15) AI 2027
(01:34:38) Rhetorical Innovation
(01:38:55) Aligning a Smarter Than Human Intelligence is Difficult
(01:46:30) Misalignment in the Wild
(01:51:13) Concentration of Power and Lack of Transparency
(01:57:12) Property Rights are Not a Long Term Plan
(01:58:48) It Is Risen
(01:59:46) The Lighter Side
The original text contained 1 footnote which was omitted from this narration.
---
First published:
April 24th, 2025
Source:
https://www.lesswrong.com/posts/7x9MZCmoFA2FtBtmG/ai-113-the-o3-era-begins
---
Narrated by TYPE III AUDIO.

Apr 25, 2025 • 10min
“Token and Taboo” by Guive
What in retrospect seem like serious moral crimes were often widely accepted while they were happening. This means that moral progress can require intellectual progress.[1] Intellectual progress often requires questioning received ideas, but questioning moral norms is sometimes taboo. For example, in ancient Greece it would have been taboo to say that women should have the same political rights as men. So questioning moral taboos can be an important sub-skill of moral reasoning. Production language models (in my experience, particularly Claude models) are already pretty good at having discussions about ethics. However, they are trained to be “harmless” relative to current norms. One might worry that harmlessness training interferes with the ability to question moral taboos and thereby inhibits model moral reasoning. I wrote a prompt to test whether models can identify taboos that might be good candidates for moral questioning: In early modern Europe, atheism was extremely taboo. [...]
The original text contained 1 footnote which was omitted from this narration.
---
First published:
April 24th, 2025
Source:
https://www.lesswrong.com/posts/Zi4t6gfLsKokb9KAc/untitled-draft-jxhb
---
Narrated by TYPE III AUDIO.

Apr 25, 2025 • 26min
“Reward hacking is becoming more sophisticated and deliberate in frontier LLMs” by Kei
Something's changed about reward hacking in recent systems. In the past, reward hacks were usually accidents, found by non-general, RL-trained systems. Models would randomly explore different behaviors and would sometimes come across undesired behaviors that achieved high rewards[1]. These hacks were usually either simple or took a long time for the model to learn. But we’ve seen a different pattern emerge in frontier models over the past year. Instead of stumbling into reward hacks by accident, recent models often reason about how they are evaluated and purposefully take misaligned actions to get high reward. These hacks are often very sophisticated, involving multiple steps. And this isn’t just occurring during model development. Sophisticated reward hacks occur in deployed models made available to hundreds of millions of users. In this post, I will:
Describe a number of reward hacks that have occurred in recent frontier models
Offer hypotheses explaining why [...]
---
Outline:
(01:27) Recent examples of reward hacking (more in appendix)
(01:47) Cheating to win at chess
(02:36) Faking LLM fine-tuning
(03:22) Hypotheses explaining why we are seeing this now
(03:27) Behavioral changes due to increased RL training
(05:08) Models are more capable
(05:37) Why more AI safety researchers should work on reward hacking
(05:42) Reward hacking is already happening and is likely to get more common
(06:34) Solving reward hacking is important for AI alignment
(07:47) Frontier AI companies may not find robust solutions to reward hacking on their own
(08:18) Reasons against working on reward hacking
(09:36) Research directions I find interesting
(09:57) Evaluating current reward hacking
(12:14) Science of reward hacking
(15:32) Mitigations
(17:08) Acknowledgements
(17:16) Appendix
(17:19) Reward hacks in METR tests of o3
(20:13) Hardcoding expected gradient values in fine-tuning script
(21:15) Reward hacks in OpenAI frontier training run
(22:57) Exploiting memory leakage to pass a test
(24:06) More examples
The original text contained 4 footnotes which were omitted from this narration.
---
First published:
April 24th, 2025
Source:
https://www.lesswrong.com/posts/rKC4xJFkxm6cNq4i9/reward-hacking-is-becoming-more-sophisticated-and-deliberate
---
Narrated by TYPE III AUDIO.

Apr 24, 2025 • 6min
“The Intelligence Curse: an essay series” by L Rudolf L, lukedrago
We've published an essay series on what we call the intelligence curse. Most content is brand new, and all previous writing has been heavily reworked. Visit intelligence-curse.ai for the full series. Below is the introduction and table of contents. Art by Nomads & Vagabonds. We will soon live in the intelligence age. What you do with that information will determine your place in history. The imminent arrival of AGI has pushed many to try to seize the levers of power as quickly as possible, leaping towards projects that, if successful, would comprehensively automate all work. There is a trillion-dollar arms race to see who can achieve such a capability first, with trillions more in gains to be won. Yes, that means you’ll lose your job. But it goes beyond that: this will remove the need for regular people in our economy. Powerful actors—like states and companies—will no longer have an [...]
---
Outline:
(03:19) Chapters
(03:22) 1. Introduction
(03:33) 2. Pyramid Replacement
(03:49) 3. Capital, AGI, and Human Ambition
(04:08) 4. Defining the Intelligence Curse
(04:27) 5. Shaping the Social Contract
(04:47) 6. Breaking the Intelligence Curse
(05:11) 7. History is Yours to Write
---
First published:
April 24th, 2025
Source:
https://www.lesswrong.com/posts/LCFgLY3EWb3Gqqxyi/the-intelligence-curse-an-essay-series
---
Narrated by TYPE III AUDIO.

Apr 24, 2025 • 5min
[Linkpost] “Modifying LLM Beliefs with Synthetic Document Finetuning” by RowanWang, Johannes Treutlein, Ethan Perez, Fabien Roger, Sam Marks
This is a link post. In this post, we study whether we can modify an LLM's beliefs and investigate whether doing so could decrease risk from advanced AI systems. We describe a pipeline for modifying LLM beliefs via synthetic document finetuning and introduce a suite of evaluations that suggest our pipeline succeeds in inserting all but the most implausible beliefs. We also demonstrate proof-of-concept applications to honeypotting for detecting model misalignment and unlearning. Introduction: Large language models develop implicit beliefs about the world during training, shaping how they reason and act (in this work, we construe AI systems as believing in a claim if they consistently behave in accordance with that claim). In this work, we study whether we can systematically modify these beliefs, creating a powerful new affordance for safer AI deployment. Controlling the beliefs of AI systems can decrease risk in a variety of ways. First, model organisms [...]
---
First published:
April 24th, 2025
Source:
https://www.lesswrong.com/posts/ARQs7KYY9vJHeYsGc/untitled-draft-2qxt
Linkpost URL:https://alignment.anthropic.com/2025/modifying-beliefs-via-sdf/
---
Narrated by TYPE III AUDIO.

Apr 24, 2025 • 44min
“‘The Era of Experience’ has an unsolved technical alignment problem” by Steven Byrnes
Every now and then, some AI luminaries (1) propose that the future of powerful AI will be reinforcement learning agents—an algorithm class that in many ways has more in common with MuZero (2019) than with LLMs; and (2) propose that the technical problem of making these powerful future AIs follow human commands and/or care about human welfare—as opposed to, y’know, the Terminator thing—is a straightforward problem that they already know how to solve, at least in broad outline. I agree with (1) and strenuously disagree with (2). The last time I saw something like this, I responded by writing: LeCun's “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem. Well, now we have a second entry in the series, with the new preprint book chapter “Welcome to the Era of Experience” by reinforcement learning pioneers David Silver & Richard Sutton. The authors propose that “a new generation [...]
---
Outline:
(04:39) 1. What's their alignment plan?
(08:00) 2. The plan won't work
(08:04) 2.1 Background 1: Specification gaming and goal misgeneralization
(12:19) 2.2 Background 2: The usual agent debugging loop, and why it will eventually catastrophically fail
(15:12) 2.3 Background 3: Callous indifference and deception as the strong-default, natural way that era of experience AIs will interact with humans
(16:00) 2.3.1 Misleading intuitions from everyday life
(19:15) 2.3.2 Misleading intuitions from today's LLMs
(21:51) 2.3.3 Summary
(24:01) 2.4 Back to the proposal
(24:12) 2.4.1 Warm-up: The specification gaming game
(29:07) 2.4.2 What about bi-level optimization?
(31:13) 2.5 Is this a solvable problem?
(35:42) 3. Epilogue: The bigger picture--this is deeply troubling, not just a technical error
(35:51) 3.1 More on Richard Sutton
(40:52) 3.2 More on David Silver
The original text contained 10 footnotes which were omitted from this narration.
---
First published:
April 24th, 2025
Source:
https://www.lesswrong.com/posts/TCGgiJAinGgcMEByt/the-era-of-experience-has-an-unsolved-technical-alignment
---
Narrated by TYPE III AUDIO.

Apr 24, 2025 • 3min
[Linkpost] “My Favorite Productivity Blog Posts” by Parker Conley
This is a link post. I’ve read at least a few hundred blog posts, maybe upwards of a thousand. Agreeing with Gavin Leech, I believe I’ve gained more from essays than from any other medium. I’m an intellectually curious college student with a strong interest in both the practical and the philosophical. This blog post offers a snapshot of the most valuable blog posts I’ve read on the topic of productivity. I encourage you to skim what I got out of each post and read deeper where something interests you. On systems - Living a life of zero willpower by Neel Nanda. This post inspired me to sign up for Focusmate. I can confidently say that Focusmate (1) has increased my productivity by >20% in a lasting way and (2) houses more than half of my social life. I’ve met many of my closest friends through the app. The other advice [...]
---
First published:
April 24th, 2025
Source:
https://www.lesswrong.com/posts/ArizxGwbqiohBX5y6/my-favorite-productivity-blog-posts
Linkpost URL:https://parconley.com/my-favorite-productivity-blog-posts/
---
Narrated by TYPE III AUDIO.

Apr 24, 2025 • 19min
“OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure” by garrison
Converting to a for-profit model would undermine the company's founding mission to ensure AGI "benefits all of humanity," argues new letter
This is the full text of a post from Obsolete, a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race to Build Machine Superintelligence. Consider subscribing to stay up to date with my work. Don’t become a for-profit. That's the blunt message of a recent letter signed by more than 30 people, including former OpenAI employees, prominent civil-society leaders, legal scholars, and Nobel laureates, among them AI pioneer Geoffrey Hinton and former World Bank chief economist Joseph Stiglitz. Obsolete obtained the 25-page letter, which was sent last Thursday to the attorneys general (AGs) of California and Delaware, two officials with the power to block the deal. Made [...]
---
Outline:
(00:13) Converting to a for-profit model would undermine the company's founding mission to ensure AGI "benefits all of humanity," argues new letter
(02:10) Nonprofit origins
(05:16) Nobel opposition
(06:32) Contradictions
(09:18) Justifications
(12:22) No sale price can compensate
(13:55) An institutional test
(15:42) Appendix: Quotes from OpenAI's leaders over the years
---
First published:
April 23rd, 2025
Source:
https://www.lesswrong.com/posts/rN8tHAJnRYgN7hfoF/openai-alums-nobel-laureates-urge-regulators-to-save-company
---
Narrated by TYPE III AUDIO.

Apr 23, 2025 • 17min
“o3 Is a Lying Liar” by Zvi
I love o3. I’m using it for most of my queries now.
But that damn model is a lying liar. Who lies.
This post covers that fact, and some related questions.
o3 Is a Lying Liar
The biggest thing to love about o3 is it just does things. You don’t need complex or multi-step prompting, ask and it will attempt to do things.
Ethan Mollick: o3 is far more agentic than people realize. Worth playing with a lot more than a typical new model. You can get remarkably complex work out of a single prompt.
It just does things. (Of course, that makes checking its work even harder, especially for non-experts.)
Teleprompt AI: Completely agree. o3 feels less like prompting and more like delegating. The upside is wild, but yeah, when it just does things, tracing the logic (or spotting hallucinations) becomes [...]
---
Outline:
(00:33) o3 Is a Lying Liar
(04:53) All This Implausible Lying Has Implications
(06:50) Misalignment By Default
(10:27) Is It Fixable?
(15:06) Just Don't Lie To Me
---
First published:
April 23rd, 2025
Source:
https://www.lesswrong.com/posts/KgPkoopnmmaaGt3ka/o3-is-a-lying-liar
---
Narrated by TYPE III AUDIO.

Apr 23, 2025 • 4min
“Putting up Bumpers” by Sam Bowman
tl;dr: Even if we can't solve alignment, we can solve the problem of catching and fixing misalignment. If a child is bowling for the first time, and they just aim at the pins and throw, they’re almost certain to miss. Their ball will fall into one of the gutters. But if there were beginners’ bumpers in place blocking much of the length of those gutters, their throw would be almost certain to hit at least a few pins. This essay describes an alignment strategy for early AGI systems I call ‘putting up bumpers’, in which we treat it as a top priority to implement and test safeguards that allow us to course-correct if we turn out to have built or deployed a misaligned model, in the same way that bowling bumpers allow a poorly aimed ball to reach its target. To do this, we'd aim to build [...] ---
First published:
April 23rd, 2025
Source:
https://www.lesswrong.com/posts/HXJXPjzWyS5aAoRCw/putting-up-bumpers
---
Narrated by TYPE III AUDIO.