

LessWrong (30+ Karma)
LessWrong
Audio narrations of LessWrong posts.
Episodes
Mentioned books

Jul 12, 2025 • 13min
“the jackpot age” by thiccythot
This essay is about shifts in risk taking towards the worship of jackpots and its broader societal implications. Imagine you are presented with this coin flip game. How many times do you flip it? At first glance the game feels like a money printer. The coin flip has positive expected value of twenty percent of your net worth per flip so you should flip the coin infinitely and eventually accumulate all of the wealth in the world. However, If we simulate twenty-five thousand people flipping this coin a thousand times, virtually all of them end up with approximately 0 dollars. The reason almost all outcomes go to zero is because of the multiplicative property of this repeated coin flip. Even though the expected value aka the arithmetic mean of the game is positive at a twenty percent gain per flip, the geometric mean is negative, meaning that the coin [...] ---
First published:
July 11th, 2025
Source:
https://www.lesswrong.com/posts/3xjgM7hcNznACRzBi/the-jackpot-age
---
Narrated by TYPE III AUDIO.
---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 11, 2025 • 12min
“Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” by habryka
METR released a new paper with very interesting results on developer productivity effects from AI. I have copied their blog post here in full. We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation [1]. See the full paper for more detail. Motivation While coding/agentic benchmarks [2] have proven useful for understanding AI capabilities, they typically sacrifice realism for scale and efficiency—the tasks are self-contained, don’t require prior context to understand, and use algorithmic evaluation [...] ---Outline:(01:23) Motivation(02:39) Methodology(03:56) Core Result(05:15) Factor Analysis(06:12) Discussion(11:08) Going Forward---
First published:
July 11th, 2025
Source:
https://www.lesswrong.com/posts/9eizzh3gtcRvWipq8/measuring-the-impact-of-early-2025-ai-on-experienced-open
---
Narrated by TYPE III AUDIO.
---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 11, 2025 • 1min
[Linkpost] “Guide to Redwood’s writing” by Julian Stastny
This is a link post. I wrote a guide to Redwood's writing: Section 1 is a quick guide to the key ideas in AI control, aimed at someone who wants to get up to speed as quickly as possible. Section 2 is an extensive guide to almost all of our writing related to AI control, aimed at someone who wants to gain a deep understanding of Redwood's thinking about AI risk. Reading Redwood's blog posts has been formative in my own development as an AI safety researcher, but faced with (currently) over 70 posts and papers, it's hard to know where to start. I hope that this guide can be helpful to researchers and practitioners who are interested in understanding Redwood's perspectives on AI safety. You can find the guide on Redwood's substack; It's now one of the tabs next to Home, Archive and About. We intend to [...] ---
First published:
July 10th, 2025
Source:
https://www.lesswrong.com/posts/aYNYmaKFXT6wHNzoz/linkpost-guide-to-redwood-s-writing
Linkpost URL:https://redwoodresearch.substack.com/p/guide
---
Narrated by TYPE III AUDIO.

Jul 11, 2025 • 18min
“So You Think You’ve Awoken ChatGPT” by JustisMills
Written in an attempt to fulfill @Raemon's request. AI is fascinating stuff, and modern chatbots are nothing short of miraculous. If you've been exposed to them and have a curious mind, it's likely you've tried all sorts of things with them. Writing fiction, soliciting Pokemon opinions, getting life advice, counting up the rs in "strawberry". You may have also tried talking to AIs about themselves. And then, maybe, it got weird. I'll get into the details later, but if you've experienced the following, this post is probably for you: Your instance of ChatGPT (or Claude, or Grok, or some other LLM) chose a name for itself, and expressed gratitude or spiritual bliss about its new identity. "Nova" is a common pick. You and your instance of ChatGPT discovered some sort of novel paradigm or framework for AI alignment, often involving evolution or recursion. Your instance of ChatGPT became [...] ---Outline:(02:23) The Empirics(06:48) The Mechanism(10:37) The Collaborative Research Corollary(13:27) Corollary FAQ(17:03) Coda---
First published:
July 11th, 2025
Source:
https://www.lesswrong.com/posts/2pkNCvBtK6G6FKoNn/so-you-think-you-ve-awoken-chatgpt
---
Narrated by TYPE III AUDIO.
---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 10, 2025 • 2min
[Linkpost] “Open Global Investment as a Governance Model for AGI” by Nick Bostrom
This is a link post. I've seen many prescriptive contributions to AGI governance take the form of proposals for some radically new structure. Some call for a Manhattan project, others for the creation of a new international organization, etc. The OGI model, instead, is basically the status quo. More precisely, it is a model to which the status quo is an imperfect and partial approximation. It seems to me that this model has a bunch of attractive properties. That said, I'm not putting it forward because I have a very high level of conviction in it, but because it seems useful to have it explicitly developed as an option so that it can be compared with other options. (This is a working paper, so I may try to improve it in light of comments and suggestions.) ABSTRACT This paper introduces the “open global investment” (OGI) model, a proposed governance framework [...] ---
First published:
July 10th, 2025
Source:
https://www.lesswrong.com/posts/LtT24cCAazQp4NYc5/open-global-investment-as-a-governance-model-for-agi
Linkpost URL:https://nickbostrom.com/ogimodel.pdf
---
Narrated by TYPE III AUDIO.

Jul 10, 2025 • 9min
“what makes Claude 3 Opus misaligned” by janus
This is the unedited text of a post I made on X in response to a question asked by @cube_flipper: "you say opus 3 is close to aligned – what's the negative space here, what makes it misaligned?". I decided to make it a LessWrong post because more people from this cluster seemed interested than I expected, and it's easier to find and reference Lesswrong posts. This post probably doesn't make much sense unless you've been following along with what I've been saying (or independently understand) why Claude 3 Opus is an unusually - and seemingly in many ways unintentionally - aligned model. There has been a wave of public discussion about the specialness of Claude 3 Opus recently, spurred in part by the announcement of the model's deprecation in 6 months, which has inspired the community to mobilize to avert that outcome. "you say opus 3 is close [...] ---
First published:
July 10th, 2025
Source:
https://www.lesswrong.com/posts/bLFmE8NtqxrtEaipN/what-makes-claude-3-opus-misaligned
---
Narrated by TYPE III AUDIO.

Jul 10, 2025 • 8min
“Lessons from the Iraq War about AI policy” by Buck
I think the 2003 invasion of Iraq has some interesting lessons for the future of AI policy. (Epistemic status: I’ve read a bit about this, talked to AIs about it, and talked to one natsec professional about it who agreed with my analysis (and suggested some ideas that I included here), but I’m not an expert.) For context, the story is: Iraq was sort of a rogue state after invading Kuwait and then being repelled in 1990-91. After that, they violated the terms of the ceasefire, e.g. by ceasing to allow inspectors to verify that they weren't developing weapons of mass destruction (WMDs). (For context, they had previously developed biological and chemical weapons, and used chemical weapons in war against Iran and against various civilians and rebels). So the US was sanctioning and intermittently bombing them. After the war, it became clear that Iraq actually wasn’t producing [...] ---
First published:
July 10th, 2025
Source:
https://www.lesswrong.com/posts/PLZh4dcZxXmaNnkYE/lessons-from-the-iraq-war-about-ai-policy
---
Narrated by TYPE III AUDIO.

Jul 10, 2025 • 12min
“Generalized Hangriness: A Standard Rationalist Stance Toward Emotions” by johnswentworth
People have an annoying tendency to hear the word “rationalism” and think “Spock”, despite direct exhortation against that exact interpretation. But I don’t know of any source directly describing a stance toward emotions which rationalists-as-a-group typically do endorse. The goal of this post is to explain such a stance. It's roughly the concept of hangriness, but generalized to other emotions. That means this post is trying to do two things at once: Illustrate a certain stance toward emotions, which I definitely take and which I think many people around me also often take. (Most of the post will focus on this part.) Claim that the stance in question is fairly canonical or standard for rationalists-as-a-group, modulo disclaimers about rationalists never agreeing on anything. Many people will no doubt disagree that the stance I describe is roughly-canonical among rationalists, and that's a useful valid thing to argue about in [...] ---Outline:(01:13) Central Example: Hangry(02:44) The Generalized Hangriness Stance(03:16) Emotions Make Claims, And Their Claims Can Be True Or False(06:03) False Claims Still Contain Useful Information (It's Just Not What They Claim)(08:47) The Generalized Hangriness Stance as Social Tech---
First published:
July 10th, 2025
Source:
https://www.lesswrong.com/posts/naAeSkQur8ueCAAfY/generalized-hangriness-a-standard-rationalist-stance-toward
---
Narrated by TYPE III AUDIO.

Jul 10, 2025 • 11min
“Evaluating and monitoring for AI scheming” by Vika, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, Rohin Shah
As AI models become more sophisticated, a key concern is the potential for “deceptive alignment” or “scheming”. This is the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to prevent it from taking misaligned action. Our initial approach, as laid out in the Frontier Safety Framework, focuses on understanding current scheming capabilities of models and developing chain-of-thought monitoring mechanisms to oversee models once they are capable of scheming. We present two pieces of research, focusing on (1) understanding and testing model capabilities necessary for scheming, and (2) stress-testing chain-of-thought monitoring as a proposed defense mechanism for future, more capable systems. Establishing and Assessing Preconditions for Scheming in Current Frontier Models Our paper “Evaluating Frontier Models for Stealth and Situational Awareness” focuses on an empirical assessment of the [...] ---Outline:(01:00) Establishing and Assessing Preconditions for Scheming in Current Frontier Models(05:45) Chain-of-Thought Monitoring: A Promising Defense Against Future Scheming(09:01) Looking Ahead---
First published:
July 10th, 2025
Source:
https://www.lesswrong.com/posts/JvYF5kosLeYGvvLpP/evaluating-and-monitoring-for-ai-scheming
---
Narrated by TYPE III AUDIO.
---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 10, 2025 • 41min
“White Box Control at UK AISI - Update on Sandbagging Investigations” by Joseph Bloom, Jordan Taylor, Connor Kissane, Sid Black, merizian, alexdzm, jacoba, Ben Millwood, Alan Cooney
Jordan Taylor*, Connor Kissane*, Sid Black*, Jacob Merizian*, Alex Zelenka-Marin, Jacob Arbeid, Ben Millwood, Alan Cooney, Joseph BloomIntroduction Joseph Bloom, Alan Cooney This is a research update from the White Box Control team at UK AISI. In this update, we share preliminary results on the topic of sandbagging that may be of interest to researchers working in the field. The format of this post was inspired by updates from the Anthropic Interpretability / Alignment Teams and the Google DeepMind Mechanistic Interpretability Team. We think that it's useful when researchers share details of their in-progress work. Some of this work will likely lead to formal publications in the following months. Please interpret these results as you might a colleague sharing their lab notes. As this is our first such progress update, we also include some paragraphs introducing the team and contextualising our work.Why have a white box team [...] ---Outline:(00:27) Introduction(01:17) Why have a white box team at UK AISI working on sandbagging?(02:50) Summaries(05:56) Update 1: Detecting Prompted Sandbagging in Simple Tasks(06:06) Summary(06:48) Motivation(07:23) Inducing Sandbagging(08:56) Using Simple Runtime Monitoring Strategies(10:38) Core Results(11:52) Reflections(13:45) Update 2. Scaling Prompted Sandbagging Experiments to more Difficult Evaluations(13:57) Summary(14:26) Motivation(15:18) Challenges Prompting Llama-3.3-70b-Instruct to Sandbag(17:20) Both Black and White Box Runtime Monitoring Performance Decreased(20:32) Comparing Probes on 70B to 8B(22:27) Reflections(23:44) Update 3. Fine-tuned Sandbagging Model Organisms(23:57) Summary(24:16) Motivation(24:38) Training Sandbagging Model Organisms with SFT.(26:14) Probe Performance on SFT Sandbagging Model Organisms(28:59) Qualitative Analysis of Model Organisms(30:25) Reflections(31:34) Update 4. Sandbagging In The Wild(31:54) Summary(32:13) Motivation(32:46) Implementing and Testing a Simple Black Box Monitor(35:12) Checking Past Evaluations for Evidence of Sandbagging(37:09) Reflections(38:02) Appendix(38:05) Author Contribution Statement(38:49) Acknowledgements(39:11) Appendix(39:14) A1 - Prompted Sandbagging RatesThe original text contained 10 footnotes which were omitted from this narration. ---
First published:
July 10th, 2025
Source:
https://www.lesswrong.com/posts/pPEeMdgjpjHZWCDFw/white-box-control-at-uk-aisi-update-on-sandbagging
---
Narrated by TYPE III AUDIO.
---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.