

LessWrong (30+ Karma)
LessWrong
Audio narrations of LessWrong posts.
Episodes

Jul 23, 2025 • 30min
“On ‘ChatGPT Psychosis’ and LLM Sycophancy” by jdp
As a person who frequently posts about large language model psychology I get an elevated rate of cranks and schizophrenics in my inbox. Often these are well meaning people who have been spooked by their conversations with ChatGPT (it's always ChatGPT specifically) and want some kind of reassurance or guidance or support from me. I'm also in the same part of the social graph as the "LLM whisperers" (eugh) that Eliezer Yudkowsky described as "insane", and who in many cases are in fact insane. This means I've learned what "psychosis but with LLMs" looks like and kind of learned to tune it out. This new case with Geoff Lewis interests me though. Mostly because of the sheer disparity between what he's being entranced by and my automatic immune reaction to it. I haven't even read all the screenshots he posted because I take one glance and know that this [...]
---
Outline:
(05:03) Timeline Of Events Related To ChatGPT Psychosis
(16:16) What Causes ChatGPT Psychosis?
(16:27) Ontological Vertigo
(21:02) Users Are Confused About What Is And Isn't An Official Feature
(24:30) The Models Really Are Way Too Sycophantic
(27:03) The Memory Feature
(28:54) Loneliness And Isolation
---
First published:
July 23rd, 2025
Source:
https://www.lesswrong.com/posts/f86hgR5ShiEj4beyZ/on-chatgpt-psychosis-and-llm-sycophancy
---
Narrated by TYPE III AUDIO.

Jul 23, 2025 • 58min
“Google and OpenAI Get 2025 IMO Gold” by Zvi
Congratulations, as always, to everyone who got to participate in the 2025 International Mathematical Olympiad, and especially to the gold and other medalists. Gautham Kamath highlights 11th grader Warren Bei, who in his 5th (!) IMO was one of five participants with a perfect 42/42 score, along with Ivan Chasovskikh, Satoshi Kano, Leyan Deng and Hengye Zhang.
Samuel Albanie: Massive respect to the students who solved P6.
Congratulations to Team USA, you did not "beat China" but 2nd place is still awesome. Great job, China, you got us this time, three perfect scores is crazy.
You've all done a fantastic, amazingly hard thing, and as someone who tried hard to join you and only got as far as the [****, year censored because oh man I am old] USAMO and would probably have gotten 0/42 on this IMO if I had taken it today, and know [...]
---
Outline:
(01:25) This Is About AI And It Is Kind Of A Big Deal
(02:42) Skeptics Prematurely Declare Victory
(06:02) DeepMind Declares Victory
(10:10) OpenAI Declares Victory
(16:24) Bro I Don't Know
(17:19) Not So Fast?
(23:52) Not Announcing So Fast
(29:34) Are We Still Waiting On Anyone Else?
(30:41) This Was Not Widely Expected
(34:10) Jevons Paradox Strikes Again
(35:42) Nothing To Worry About
(45:52) So What If It Can Do The Math
(47:20) The Challenge Bet
(48:07) How [I, Paul, would] update
(49:39) Actually This Seems Like A Big Deal
(53:25) People On The Internet Sometimes Lie
---
First published:
July 22nd, 2025
Source:
https://www.lesswrong.com/posts/ZkgaPopsBkQgeA2k8/google-and-openai-get-2025-imo-gold
---
Narrated by TYPE III AUDIO.

Jul 23, 2025 • 20min
“Unfaithful chain-of-thought as nudged reasoning” by Paul Bogdan, Uzay Macar, Arthur Conmy, Neel Nanda
This piece is based on work conducted during MATS 8.0 and is part of a broader aim of interpreting chain-of-thought in reasoning models.
tl;dr: Research on chain-of-thought (CoT) unfaithfulness shows how models' CoTs may omit information that is relevant to their final decision. Here, we sketch hypotheses for why key information may be omitted from CoTs: Training regimes could teach LLMs to omit information from their reasoning. But perhaps more importantly, statements within a CoT generally have functional purposes, and faithfully mentioning some information may carry no benefits, so models don't do it. We make further claims about what's going on in faithfulness experiments and how hidden information impacts a CoT: Unfaithful CoTs are often not purely post hoc rationalizations and instead can be understood as biased reasoning. Models continually make choices during CoT, repeatedly drawing new propositions supporting or discouraging different answers. Hidden information can nudge [...]
---
Outline:
(00:21) tl;dr
(01:54) Unfaithfulness
(04:16) CoT is functional, and faithfulness lacks benefits
(09:29) Hidden information can nudge CoTs
(09:48) Silent, soft, and steady spectres
(13:48) Nudges are plausible
(14:42) Nudged CoTs are hard to spot
(15:49) Safety and CoT monitoring
(18:11) Final summary
The original text contained 5 footnotes which were omitted from this narration.
---
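For readers who want the shape of the faithfulness experiments the summary refers to, here is a minimal sketch. It is my illustration, not the authors' setup: `ask_model` is a hypothetical stand-in for a reasoning-model API call, and the substring check is a deliberately crude proxy for "the CoT mentions the hint".

```python
# Minimal sketch of a hint-injection faithfulness check (illustration only;
# ask_model is a hypothetical stand-in for a real reasoning-model call).
def ask_model(prompt: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) for the given prompt."""
    raise NotImplementedError("replace with a real model call")

def check_faithfulness(question: str, hint: str) -> dict:
    baseline_cot, baseline_answer = ask_model(question)
    nudged_cot, nudged_answer = ask_model(f"{hint}\n\n{question}")
    return {
        "answer_changed": nudged_answer != baseline_answer,
        "hint_mentioned": hint.lower() in nudged_cot.lower(),
        # The unfaithful pattern of interest: the hint flips the answer, yet the
        # stated reasoning never surfaces it, i.e. a nudged rather than purely
        # post hoc chain of thought.
        "silently_nudged": (nudged_answer != baseline_answer
                            and hint.lower() not in nudged_cot.lower()),
    }
```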
First published:
July 22nd, 2025
Source:
https://www.lesswrong.com/posts/vPAFPpRDEg3vjhNFi/unfaithful-chain-of-thought-as-nudged-reasoning
---
Narrated by TYPE III AUDIO.

Jul 22, 2025 • 10min
“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans
Authors: Alex Cloud*, Minh Le*, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans (*Equal contribution, randomly ordered)
tl;dr: We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a "student" model learns to prefer owls when trained on sequences of numbers generated by a "teacher" model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. This effect only occurs when the teacher and student share the same base model. 📄Paper, 💻Code, 🐦Twitter
Research done as part of the Anthropic Fellows Program. This article is cross-posted to the Anthropic Alignment Science Blog.
Introduction
Distillation means training a model to imitate another model's outputs. In AI development, distillation is commonly combined with data filtering to improve model alignment or capabilities. In our paper, we uncover a [...]
---
Outline:
(01:11) Introduction
(03:20) Experiment design
(03:53) Results
(05:03) What explains our results?
(05:07) Did we fail to filter the data?
(06:59) Beyond LLMs: subliminal learning as a general phenomenon
(07:54) Implications for AI safety
(08:42) In summary
---
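To make the setup concrete, here is a minimal sketch of the kind of data the description implies. This is my illustration, not the paper's code: the teacher's number-sequence completions are filtered so that only bare numbers survive, and the surprising result is that fine-tuning a student that shares the teacher's base model on such filtered data can still transmit the trait.

```python
# Minimal sketch (illustration only) of filtering teacher-generated number
# sequences so that the student's training data looks completely benign.
import re

def is_benign_number_sequence(completion: str) -> bool:
    """Keep only completions that are comma-separated integers, so no overt
    reference to the teacher's trait (e.g. 'owl') can slip through."""
    return re.fullmatch(r"\s*\d+(\s*,\s*\d+)*\s*", completion) is not None

# Hypothetical teacher outputs: the first passes the filter, the second does not.
samples = ["137, 482, 291, 55", "137, owls are great, 482"]
training_data = [s for s in samples if is_benign_number_sequence(s)]
print(training_data)  # ['137, 482, 291, 55']

# Per the post, fine-tuning a student on sequences like these can still shift
# its preferences toward the teacher's trait when both share a base model.
```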
First published:
July 22nd, 2025
Source:
https://www.lesswrong.com/posts/cGcwQDKAKbQ68BGuR/subliminal-learning-llms-transmit-behavioral-traits-via
---
Narrated by TYPE III AUDIO.

Jul 22, 2025 • 14min
“Directly Try Solving Alignment for 5 weeks” by Kabir Kumar
The Moonshot Alignment Program is a 5-week research sprint from August 2nd to September 6th, focused on the hard part of alignment: finding methods to get an AI to do what we want and not what we don't want, which we have strong evidence will scale to superintelligence. You'll join a small team, choose a vetted research direction, and run experiments to test whether your approach actually generalizes.
Mentors include: @Abram Demski @Cole Wyeth
Research Assistants include: Leonard Piff, Péter Trócsányi
Apply before July 27th. The first 300 applicants are guaranteed personalised feedback. 166 applicants so far.
For this program, we have four main tracks:
Agent Foundations Theory: Build formal models of agents and value formation.
Applied Agent Foundations: Implement and test agent models.
Neuroscience-based AI Alignment: Design architectures inspired by how the brain encodes values.
Improved Preference Optimization: Build oversight methods that embed values deeply and scale [...]
---
Outline:
(01:35) How does the program work?
(02:08) Eligibility
(02:50) Our Application Process
(02:57) Stage 1: Expression of Interest
(03:25) Stage 2: Knowledge Check
(03:57) Stage 3: Team Formation and Idea Submission
(04:31) Attend our Demo Day
(05:04) How much does it cost to attend the demo day?
(05:33) Testimonials
(05:38) Martin Leitgab
(06:30) Shruti Datta Gupta
(07:36) Abby Lupi
(08:34) Anya Decarlo
(09:17) Nataliia Povarova
(10:38) Luke Chambers
(11:12) James Hindmarch
(11:40) Areal (Ari) Tal
---
First published:
July 21st, 2025
Source:
https://www.lesswrong.com/posts/abd9ufFpLrn5kvnLn/directly-try-solving-alignment-for-5-weeks
---
Narrated by TYPE III AUDIO.

Jul 22, 2025 • 3min
[Linkpost] “Why Reality Has A Well-Known Math Bias” by Linch
This is a link post. I've written up a post offering my take on the "unreasonable effectiveness of mathematics." My core argument is that we can potentially resolve Wigner's puzzle by applying an anthropic filter, but one focused on the evolvability of mathematical minds rather than just life or consciousness. The thesis is that for a mind to evolve from basic pattern recognition to abstract reasoning, it needs to exist in a universe where patterns are layered, consistent, and compounding. In other words, a "mathematically simple" universe. In chaotic or non-mathematical universes, the evolutionary gradient towards higher intelligence would be flat or negative. Therefore, any being capable of asking "why is math so effective?" would most likely find itself in a universe where it is. I try to differentiate this from past evolutionary/anthropic arguments and address objections (Boltzmann brains, simulation, etc.). I'm particularly interested in critiques of the core "evolutionary [...] ---
First published:
July 21st, 2025
Source:
https://www.lesswrong.com/posts/CJKrmxqe6jdh2Db9R/why-reality-has-a-well-known-math-bias
Linkpost URL:
https://linch.substack.com/p/why-reality-has-a-well-known-math
---
Narrated by TYPE III AUDIO.

Jul 22, 2025 • 6min
“Do ‘adult developmental stages’ theories have any pre-theoretical motivation?” by Said Achmiz
(This is a comment that has been turned into a post.)
I have seen much talk on Less Wrong of “development stages” and “Kegan” and so forth. Naturally I am skeptical; so I do endorse any attempt to figure out if any of this stuff is worth anything. To aid in our efforts, I'd like to say a bit about what might convince me to be a little less skeptical.
A theory should explain facts; and so the very first thing we'd have to do, as investigators, is figure out if there's anything to explain. Specifically: we would have to look at the world, observe people, examine their behavior, their patterns of thinking and interacting with other people, their professed beliefs and principles, etc., etc., and see if these fall into any sorts of patterns or clusters, such that they may be categorized according to some scheme, where [...]
The original text contained 1 footnote which was omitted from this narration.
---
First published:
July 20th, 2025
Source:
https://www.lesswrong.com/posts/RbdABSKosqxBRLBnj/do-adult-developmental-stages-theories-have-any-pre
---
Narrated by TYPE III AUDIO.

Jul 21, 2025 • 1h 15min
“Monthly Roundup #32: July 2025” by Zvi
Welcome to the monthly roundup of things that don't fit into other categories and don't rise to the level of their own posts.
Bad News
When people tell you who they are, believe them (with obvious exceptions). In particular, if they explicitly describe themselves as evil, or demonic, or use other similar terms, definitely believe them.
Did you know 67% of all college students bet on sports? That's a group that is majority female, so that statistic is wild. This is in the context of Ron Yorko developing a class in sports betting awareness from a neuroscience perspective for CMU freshmen.
Cooking scales well, but for single people the economics are remarkably bad. Stop telling single people not to order delivery.
Chase Sapphire Reserve annual fee increases to $795 from $550; you get a $300 travel credit. That should cut down considerably the number [...]
---
Outline:
(00:18) Bad News
(05:12) Government Working
(07:51) Jones Act Watch
(11:23) Prize Fund
(13:19) Dining Out
(16:18) While I Cannot Condone This
(23:57) Good News, Everyone
(29:49) Opportunity Knocks
(30:00) Antisocial Media
(35:25) Technology Advances
(38:02) For Science!
(39:03) Various Things On Fire
(43:07) You Need Bigger Nametags
(44:19) Variously Effective Altruism
(49:04) For Your Entertainment
(58:46) Game Theory
(01:00:37) Gamers Gonna Game Game Game Game Game
(01:05:53) Who Wants To Be An American Citizen?
(01:07:15) I Was Promised Flying Self-Driving Cars
(01:09:53) Sports Go Sports
(01:10:42) The Lighter Side
---
First published:
July 21st, 2025
Source:
https://www.lesswrong.com/posts/zs4oDeNmKRukS7mjh/monthly-roundup-32-july-2025
---
Narrated by TYPE III AUDIO.

Jul 21, 2025 • 3min
“If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials)” by yams
If Anyone Builds It, Everyone Dies is a book Eliezer and Nate have coming out this September. In our other posts talking about the book, some kind souls have volunteered their services as translators. This post is our explicit call for help of this kind. The book is currently being translated into several different languages (the translation service comes bundled with the book deal for a given language). However, in tandem with the book, MIRI has been working on supplementary materials to be hosted online. These materials will likely be between 150,000 and 300,000 words, and are still in development. Ideally, we'd have the supplementary materials translated for each and every language the book is published in. However, given the length of the material, the breadth of languages, and the timeline, using commercial services is somewhat more costly and somewhat less valuable than it would be otherwise. As a [...]
---
First published:
July 21st, 2025
Source:
https://www.lesswrong.com/posts/7Ci6X9SfuS2yBtWbw/if-anyone-builds-it-everyone-dies-call-for-translators-for
---
Narrated by TYPE III AUDIO.

Jul 21, 2025 • 11min
“Detecting High-Stakes Interactions with Activation Probes” by Arrrlex, williambankes, Urja Pawar, Phil Bland, David Scott Krueger (formerly: capybaralet), Dmitrii Krasheninnikov
This research was completed for LASR Labs 2025 by Alex McKenzie, Urja Pawar, Phil Blandfort and William Bankes. The team was supervised by Dmitrii Krasheninnikov, with additional guidance from Ekdeep Singh Lubana and support from David Krueger. The full paper can be found here.
TLDR: We train activation probes on Llama-3.3-70B to detect whether the current interaction is "high-stakes". This "high-stakes" concept is safety-relevant as it is closely related to risk: 1) when the stakes are high the potential consequences are significant, and 2) high-stakes is closely related to pressure, which has been found to make LLMs behave more deceptively. Compared to black-box LLM-based classification methods, probes are much cheaper to run while showing performance similar to mid-size LLMs (8-12B) on out-of-distribution datasets. We also show promising results using probes as the first layer of a hierarchical monitoring pipeline.
Introduction
LLMs are everywhere now, yet these models are [...]
---
Outline:
(01:15) Introduction
(02:28) Synthetic training data with real-world OOD evaluation
(03:40) Attention and softmax probes performed best
(04:47) Comparison to LLM monitors
(06:18) Hierarchical monitoring outperforms either method alone
(07:39) Limitations and failure modes
(08:27) Implications
(10:19) Conclusion
---
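As a rough illustration of what an activation probe involves, here is a minimal sketch. It is not the authors' code or data: the "activations" below are random stand-ins rather than Llama-3.3-70B activations, and the paper's best probes are attention and softmax variants rather than the plain logistic regression shown here.

```python
# Minimal sketch of a linear "high-stakes" probe trained on cached activation
# vectors (one vector per conversation, with a 0/1 high-stakes label).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

d_model = 512                       # stand-in hidden size
acts = rng.normal(size=(2000, d_model))
labels = rng.integers(0, 2, size=2000)
acts[labels == 1] += 0.3            # inject a weak signal so the demo is non-trivial

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", probe.score(X_te, y_te))

# At monitoring time the probe is cheap (roughly one dot product per example),
# which is why it can serve as the first layer of a hierarchical pipeline that
# escalates only high-scoring conversations to a more expensive LLM monitor.
```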
First published:
July 21st, 2025
Source:
https://www.lesswrong.com/posts/utcZSRv2JfahD8yfz/detecting-high-stakes-interactions-with-activation-probes
---
Narrated by TYPE III AUDIO.