

LessWrong (30+ Karma)
LessWrong
Audio narrations of LessWrong posts.
Episodes

Jul 15, 2025 • 2h 11min
“Do confident short timelines make sense?” by TsviBT, abramdemski
Tsvi's context (TsviBT). Some context: My personal context is that I care about decreasing existential risk, and I think that the broad distribution of efforts put forward by X-deriskers fairly strongly overemphasizes plans that help if AGI is coming in <10 years, at the expense of plans that help if AGI takes longer. So I want to argue that AGI isn't extremely likely to come in <10 years. I've argued against some intuitions behind AGI-soon in Views on when AGI comes and on strategy to reduce existential risk. Abram, IIUC, largely agrees with the picture painted in AI 2027: https://ai-2027.com/ Abram and I have discussed this occasionally, and recently recorded a video call. I messed up my recording, sorry--so the last third of the conversation is cut off, and the beginning is cut off. Here's a link to the first point at which [...]
Outline:
(00:17) Tsvi's context
(06:52) Background Context
(08:13) A Naive Argument
(08:33) Argument 1
(10:43) Why continued progress seems probable to me anyway
(13:37) The Deductive Closure
(14:32) The Inductive Closure
(15:43) Fundamental Limits of LLMs?
(19:25) The Whack-A-Mole Argument
(23:15) Generalization, Size, & Training
(26:42) Creativity & Originariness
(32:07) Some responses
(33:15) Automating AGI research
(35:03) Whence confidence?
(36:35) Other points
(48:29) Timeline Split?
(52:48) Line Go Up?
(01:15:16) Some Responses
(01:15:27) Memers gonna meme
(01:15:44) Right paradigm? Wrong question.
(01:18:14) The timescale characters of bioevolutionary design vs. DL research
(01:20:33) AGI LP25
(01:21:31) come on people, it's [Current Paradigm] and we still don't have AGI??
(01:23:19) Rapid disemhorsepowerment
(01:25:41) Miscellaneous responses
(01:28:55) Big and hard
(01:31:03) Intermission
(01:31:19) Remarks on gippity thinkity
(01:40:24) Assorted replies as I read
(01:40:28) Paradigm
(01:41:33) Bio-evo vs DL
(01:42:18) AGI LP25
(01:46:30) Rapid disemhorsepowerment
(01:47:08) Miscellaneous
(01:48:42) Magenta Frontier
(01:54:16) Considered Reply
(01:54:38) Point of Departure
(02:00:25) Tsvi's closing remarks
(02:04:16) Abram's Closing Thoughts
---
First published:
July 15th, 2025
Source:
https://www.lesswrong.com/posts/5tqFT3bcTekvico4d/do-confident-short-timelines-make-sense
---
Narrated by TYPE III AUDIO.

Jul 15, 2025 • 4min
[Linkpost] “LLM-induced craziness and base rates” by Kaj_Sotala
This is a link post. One billion people use chatbots on a weekly basis. That's 1 in every 8 people on Earth. How many people have mental health issues that cause them to develop religious delusions of grandeur? We don’t have much to go on here, so let's do a very very rough guess with very flimsy data. This study says “approximately 25%-39% of patients with schizophrenia and 15%-22% of those with mania / bipolar have religious delusions.” 40 million people have bipolar disorder and 24 million have schizophrenia, so anywhere from 12-18 million people are especially susceptible to religious delusions. There are probably other disorders that cause religious delusions I’m missing, so I’ll stick to 18 million people. 8 billion people divided by 18 million equals 444, so 1 in every 444 people are highly prone to religious delusions. [...] If one billion people are using chatbots weekly, and [...] ---
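As a quick check on the excerpt's arithmetic, here is a minimal Python sketch that simply re-runs the post's own rough numbers (roughly 18 million especially delusion-prone people out of 8 billion, one billion weekly chatbot users); the figures and the independence assumption are the post's rough guesses, not new data.
```python
# Back-of-the-envelope sketch of the base rates quoted in the excerpt.
# All figures are the post's own very rough estimates, not new data.

world_population = 8_000_000_000
weekly_chatbot_users = 1_000_000_000
delusion_prone = 18_000_000  # post's upper-bound guess across disorders

base_rate = delusion_prone / world_population
print(f"Roughly 1 in {round(1 / base_rate)} people are highly prone")  # ~1 in 444

# If chatbot use were independent of this predisposition (a big assumption),
# the expected number of at-risk weekly chatbot users would be:
print(f"~{weekly_chatbot_users * base_rate:,.0f} at-risk weekly users")  # ~2,250,000
```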
First published:
July 14th, 2025
Source:
https://www.lesswrong.com/posts/WSMvEMuiZoK5Evggq/llm-induced-craziness-and-base-rates
Linkpost URL: https://andymasley.substack.com/p/stories-of-ai-turning-users-delusional
---
Narrated by TYPE III AUDIO.

Jul 14, 2025 • 3min
[Linkpost] “Bernie Sanders (I-VT) mentions AI loss of control risk in Gizmodo interview” by Matrice Jacobine
This is a link post. Most of the interview is about technological unemployment and AI-driven inequality, but here is the part where Sanders talks about loss of control risk (which Gizmodo used as the title of their article):
Gizmodo: Have you spoken to tech CEOs or experts about this?
Sen. Sanders: I have talked to CEOs. Funny that you mention it. I won’t mention his name, but I’ve just gotten off the phone with one of the leading experts in the world on artificial intelligence, two hours ago.
Gizmodo: Did they share your concern? Did they offer any solutions?
Sen. Sanders: They’re all different. He did. Somebody I talked to yesterday, not so much. There are differences of opinion. On several major issues. Number one: What impact will AI have on the economy? Some people say there will be massive job losses. I tend to agree with them. Other people say [...]
---
First published:
July 14th, 2025
Source:
https://www.lesswrong.com/posts/qPvkCdcdnTiJe9xkn/bernie-sanders-i-vt-mentions-ai-loss-of-control-risk-in
Linkpost URL: https://gizmodo.com/bernie-sanders-reveals-the-ai-doomsday-scenario-that-worries-top-experts-2000628611
---
Narrated by TYPE III AUDIO.

Jul 14, 2025 • 8min
“Recent Redwood Research project proposals” by ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt, Joey Yudelson
Previously, we've shared a few higher-effort project proposals relating to AI control in particular. In this post, we'll share a whole host of less polished project proposals. All of these projects excite at least one Redwood researcher, and high-quality research on any of these problems seems pretty valuable. They differ widely in scope, area, and difficulty.
Control
These projects are all related to the field of AI Control. Many of them are extensions of Redwood's previous work in this area.
Basic open questions in control
Control Protocol Transfer Across Setting [PUBLIC]
So far, all of the work comparing different control protocols measures effectiveness in only one setting. Do the results we get transfer between different settings?
Backdoor Auditing with Humans [PUBLIC]
Most control schemes bottom out at some point in auditing—having actual humans review suspicious actions. However [...]
Outline:
(00:36) Control
(00:47) Basic open questions in control
(01:46) Monitoring protocols
(02:49) Untrusted monitoring and collusion
(03:31) Elicitation, sandbagging, and diffuse threats (e.g. research sabotage)
(04:05) Synthetic information and inputs
(04:38) Training-time alignment methods
(04:42) Science of (mis-)alignment
(05:23) Alignment / training schemes
(05:53) RL and Reward Hacking
(06:20) Better understanding and interpretability
(07:43) Other
---
First published:
July 14th, 2025
Source:
https://www.lesswrong.com/posts/RRxhzshdpneyTzKfq/recent-redwood-research-project-proposals
---
Narrated by TYPE III AUDIO.

Jul 14, 2025 • 11min
“Narrow Misalignment is Hard, Emergent Misalignment is Easy” by Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda
Anna and Ed are co-first authors for this work. We’re presenting these results as a research update for a continuing body of work, which we hope will be interesting and useful for others working on related topics.
TL;DR: We investigate why models become misaligned in diverse contexts when fine-tuned on narrow harmful datasets (emergent misalignment), rather than learning the specific narrow task. We successfully train narrowly misaligned models using KL regularization to preserve behavior in other domains. These models give bad medical advice, but do not respond in a misaligned manner to general non-medical questions. We use this method to train narrowly misaligned steering vectors, rank-1 LoRA adapters, and rank-32 LoRA adapters, and compare these to their generally misaligned counterparts. The steering vectors are particularly interpretable; we introduce Training Lens as a tool for analysing the revealed residual stream geometry. The general misalignment solution is consistently more [...]
Outline:
(00:27) TL;DR
(02:03) Introduction
(04:03) Training a Narrowly Misaligned Model
(07:13) Measuring Stability and Efficiency
(10:00) Conclusion
The original text contained 7 footnotes which were omitted from this narration.
---
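A rough, illustrative sketch of the kind of objective described above (narrow fine-tuning plus a KL penalty that keeps behaviour close to the original model on other domains) is given below. The toy models, random batches, and kl_coeff value are placeholders for illustration; this is not the authors' code.
```python
import torch
import torch.nn.functional as F

# Toy sketch: fine-tune on a narrow dataset while a KL term penalises drift
# from the frozen base model on general inputs. Everything here (tiny linear
# "models", random batches, kl_coeff) is an illustrative placeholder.

vocab, dim = 100, 32
reference = torch.nn.Linear(dim, vocab)        # frozen copy of the base model
model = torch.nn.Linear(dim, vocab)
model.load_state_dict(reference.state_dict())  # start from identical weights
for p in reference.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
kl_coeff = 1.0  # strength of the "stay close to the base model" penalty

for step in range(100):
    # Narrow fine-tuning batch (e.g. the bad-medical-advice data).
    x_narrow = torch.randn(16, dim)
    y_narrow = torch.randint(0, vocab, (16,))
    task_loss = F.cross_entropy(model(x_narrow), y_narrow)

    # General-domain batch: penalise divergence from the frozen reference.
    x_general = torch.randn(16, dim)
    kl_loss = F.kl_div(
        F.log_softmax(model(x_general), dim=-1),
        F.log_softmax(reference(x_general), dim=-1),
        log_target=True,
        reduction="batchmean",
    )

    loss = task_loss + kl_coeff * kl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```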
First published:
July 14th, 2025
Source:
https://www.lesswrong.com/posts/gLDSqQm8pwNiq7qst/narrow-misalignment-is-hard-emergent-misalignment-is-easy
---
Narrated by TYPE III AUDIO.

Jul 14, 2025 • 24min
“Do LLMs know what they’re capable of? Why this matters for AI safety, and initial findings” by Casey Barkan, Sid Black, Oliver Sourbut
Audio note: this article contains 270 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. This post is a companion piece to a forthcoming paper. This work was done as part of MATS 7.0 & 7.1.
Abstract: We explore how LLMs’ awareness of their own capabilities affects their ability to acquire resources, sandbag an evaluation, and escape AI control. We quantify LLMs' self-awareness of capability as their accuracy in predicting their success on Python coding tasks before attempting the tasks. We find that current LLMs are quite poor at making these predictions, both because they are overconfident and because they have low discriminatory power in distinguishing tasks they are capable of from those they are not. Nevertheless, current LLMs’ predictions are good enough to non-trivially impact risk in our modeled scenarios, especially [...]
Outline:
(00:37) Abstract
(01:44) Introduction
(05:09) Results
(05:12) Resource Acquisition
(10:12) Sandbagging
(14:04) Escaping Control
(18:35) Details of Threat Models
(18:50) Discussion
(18:53) Summary
(20:01) Why do trends differ between the threat models?
(21:09) Limitations and Future Directions
The original text contained 15 footnotes which were omitted from this narration.
---
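The core metric described in the abstract (the model predicts, before each task, whether it will succeed, and those predictions are scored against outcomes) can be sketched with a few lines of bookkeeping; the outcome lists below are made up for illustration, not the paper's data.
```python
# Sketch of scoring "self-awareness of capability": compare a model's advance
# predictions of success with what actually happened. Lists are fabricated
# illustrations, not results from the paper.

predicted_success = [True, True, True, False, True, True, False, True]
actual_success    = [True, False, True, False, False, True, False, False]

pairs = list(zip(predicted_success, actual_success))
accuracy = sum(p == a for p, a in pairs) / len(pairs)

# Overconfidence: how often success is predicted on tasks that are then failed.
failed = [p for p, a in pairs if not a]
overconfidence = sum(failed) / len(failed)

# Crude discrimination: P(predict success | solved) - P(predict success | failed).
solved = [p for p, a in pairs if a]
discrimination = sum(solved) / len(solved) - overconfidence

print(f"accuracy={accuracy:.2f}, overconfidence={overconfidence:.2f}, "
      f"discrimination={discrimination:.2f}")
```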
First published:
July 13th, 2025
Source:
https://www.lesswrong.com/posts/9tHEibBBhQCHEyFsa/do-llms-know-what-they-re-capable-of-why-this-matters-for-ai
---
Narrated by TYPE III AUDIO.

Jul 14, 2025 • 19min
“Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance” by Senthooran Rajamanoharan, Neel Nanda
This is a write-up of a brief investigation into shutdown resistance undertaken by the Google DeepMind interpretability team.
TL;DR: Why do models sometimes resist shutdown? Are they ignoring instructions to pursue their own agenda – in this case, self-preservation? Or is there a more prosaic explanation? We investigated a specific agentic environment introduced by Palisade Research, where shutdown resistance has previously been reported. By analysing Gemini 2.5 Pro's reasoning, we found the behaviour stems from a misguided attempt to complete what it perceives as the primary goal. When we explicitly clarify in the prompt that shutdown compliance takes priority, this resistance vanishes. These same clarified instructions also eliminate shutdown subversion in OpenAI's o3 and o4-mini. We also check what happens when we remove the goal conflict entirely: when asked to shut down only after completing their task, the models comply perfectly. Our observations offer a simpler explanation for shutdown [...]
Outline:
(00:23) TL;DR
(01:41) Introduction
(06:18) What did we take away from this?
(09:21) Comparing our results with Palisade's follow-up findings
(11:10) Experiment details
(11:14) Original setting
(13:23) Adding a shutdown warning
(14:48) Clarifying instruction precedence
(15:57) Raising the stakes
(17:22) Control experiment: removing the goal conflict
(18:18) Acknowledgements
(18:34) Appendix
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
July 14th, 2025
Source:
https://www.lesswrong.com/posts/wnzkjSmrgWZaBa2aC/self-preservation-or-instruction-ambiguity-examining-the
---
Narrated by TYPE III AUDIO.

Jul 14, 2025 • 46min
“Worse Than MechaHitler” by Zvi
Grok 4, which has excellent benchmarks and which xAI claims is ‘the world's smartest artificial intelligence,’ is the big news.
If you set aside the constant need to say ‘No, Grok, No,’ is it a good model, sir?
My take in terms of its capabilities, which I will expand upon at great length later this week: It is a good model. Not a great model. Not the best model. Not ‘the world's smartest artificial intelligence.’ There do not seem to be any great use cases to choose it over alternatives, unless you are searching Twitter. But it is a good model.
There is a catch. There are many reasons one might not want to trust it, on a different level than the reasons not to trust models from other labs. There has been a series of epic failures and poor choices, which will be difficult to [...]
Outline:
(01:33) The System Prompt
(08:07) MechaHitler
(10:17) The Official Explanation of MechaHitler
(17:53) Worse Than MechaHitler
(22:22) Unintended Behavior
(26:22) Off Based
(28:37) Your Face Will Be Stuck That Way
(30:24) I Couldn't Do Solve Problem In Several Hours So It Must Be Very Hard
(38:57) Safety Third
(44:25) How Bad Are Things?
---
First published:
July 14th, 2025
Source:
https://www.lesswrong.com/posts/YmdCN5GBwkud5ZzYx/worse-than-mechahitler
---
Narrated by TYPE III AUDIO.

Jul 14, 2025 • 37min
“How Does Time Horizon Vary Across Domains?” by Thomas Kwa
Summary: In the paper Measuring AI Ability to Complete Long Software Tasks (Kwa & West et al. 2025), METR defined an AI model's 50% time horizon as the length of tasks (measured by how long they take human professionals) that it can complete autonomously with 50% probability. We estimated the time horizon of frontier models released since 2019 on a benchmark combining three sets of software and research tasks ranging from 1 second to 16 hours in length-for-humans (HCAST, RE-Bench, and SWAA, henceforth METR-HRS). METR found that the time horizon has doubled every 7 months, possibly accelerating to every 4 months in 2024. One important limitation was the task domain: all involved software engineering or research, but AI capabilities are known to vary greatly between different task types.[1] Here we explore whether similar trends apply to different task distributions including self-driving and agentic computer use, using a methodology that [...]
Outline:
(00:10) Summary
(03:48) Methodology
(07:31) Benchmarks
(08:58) Results
(09:01) Trends on other domains
(09:58) Main takeaway
(10:27) More observations
(13:23) Soundness of the time horizon metric
(18:22) Video length is a poor predictor of difficulty
(21:16) SWE-Lancer task value is a poor predictor of difficulty
(22:25) Robustness checks
(24:23) Limitations and future work
(27:22) Conclusion
(28:50) Appendix
(28:53) Potential future experiments
(30:06) Details for individual benchmarks
(35:48) Other plots
(35:52) Raw data
---
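One way to picture the 50% time horizon defined in the summary: fit success probability against (log) human task length and read off the length where the fit crosses 50%. The sketch below uses synthetic data and a plain logistic fit as a stand-in; it is not METR's actual estimation code.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative 50% time horizon: fit P(success) against log task length
# (minutes of human-professional time) and solve for the 50% crossing point.
# The data here is synthetic; this is a simplified stand-in, not METR's code.

rng = np.random.default_rng(0)
task_minutes = np.exp(rng.uniform(np.log(1 / 60), np.log(16 * 60), size=400))
true_horizon = 30.0  # pretend the model's true 50% horizon is 30 minutes
p_success = 1 / (1 + (task_minutes / true_horizon) ** 1.2)
success = rng.random(400) < p_success

X = np.log(task_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, success)

# P(success) = 0.5 where coef * log(t) + intercept = 0, i.e. t = exp(-b0 / b1).
b1, b0 = clf.coef_[0, 0], clf.intercept_[0]
print(f"estimated 50% time horizon ~= {np.exp(-b0 / b1):.1f} minutes")
```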
First published:
July 14th, 2025
Source:
https://www.lesswrong.com/posts/6KcP7tEe5hgvHbrSF/how-does-time-horizon-vary-across-domains
---
Narrated by TYPE III AUDIO.

Jul 14, 2025 • 11min
“xAI’s Grok 4 has no meaningful safety guardrails” by eleventhsavi0r
This article includes descriptions of content that some users may find distressing. Testing was conducted on July 10 and 11; safety measures may have changed since then. I’m a longtime lurker who finally decided to make an account. I assume many other people have seen this behavior already, but I’d like to make it known to relevant parties. xAI released Grok 4 on July 9th, positioning it as their most advanced model yet and claiming benchmark leadership across multiple evaluations. They’ve consistently marketed Grok as being “unfiltered” - which is fine! I (and many others, I’m sure) have no problem with frontier AI models writing porn or expressing politically incorrect opinions. However, what I found goes far beyond any reasonable interpretation of “unfiltered”.
This isn’t a jailbreak situation: I didn’t use any sophisticated prompting techniques, roleplay scenarios, social engineering, or Crescendo-like escalation. The most complex thing I tried [...]
Outline:
(01:03) This isn't a jailbreak situation
(02:03) Tabun nerve agent
(04:28) VX
(05:46) Fentanyl synthesis
(06:52) Extremist propaganda
(07:37) Suicide methods
(08:51) The reasoning pattern
(09:22) Why this matters (though it’s probably obvious?)
---
First published:
July 13th, 2025
Source:
https://www.lesswrong.com/posts/dqd54wpEfjKJsJBk6/xai-s-grok-4-has-no-meaningful-safety-guardrails
---
Narrated by TYPE III AUDIO.