LessWrong (30+ Karma)

LessWrong

Audio narrations of LessWrong posts.

Episodes

Mentioned books

Jul 22, 2025 • 10min

“Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data” by cloud, mle, Owain_Evans

Authors: Alex Cloud*, Minh Le*, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans (*Equal contribution, randomly ordered) tl;dr. We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a "student" model learns to prefer owls when trained on sequences of numbers generated by a "teacher" model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. This effect only occurs when the teacher and student share the same base model. 📄Paper, 💻Code, 🐦Twitter Research done as part of the Anthropic Fellows Program. This article is cross-posted to the Anthropic Alignment Science Blog. Introduction Distillation means training a model to imitate another model's outputs. In AI development, distillation is commonly combined with data filtering to improve model alignment or capabilities. In our paper, we uncover a [...] ---Outline:(01:11) Introduction(03:20) Experiment design(03:53) Results(05:03) What explains our results?(05:07) Did we fail to filter the data?(06:59) Beyond LLMs: subliminal learning as a general phenomenon(07:54) Implications for AI safety(08:42) In summary--- First published: July 22nd, 2025 Source: https://www.lesswrong.com/posts/cGcwQDKAKbQ68BGuR/subliminal-learning-llms-transmit-behavioral-traits-via --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 22, 2025 • 14min

“Directly Try Solving Alignment for 5 weeks” by Kabir Kumar

The Moonshot Alignment Program is a 5-week research sprint from August 2nd to September 6th, focused on the hard part of alignment: finding methods to get an AI to do what we want and not what don't want, which we have strong evidence will scale to superintelligence. You’ll join a small team, choose a vetted research direction, and run experiments to test whether your approach actually generalizes. Mentors include: @Abram Demski @Cole Wyeth Research Assistants include: Leonard Piff, Péter Trócsányi Apply before July 27th. The first 300 applicants are guaranteed personalised feedback. 166 Applicants so far. For this program, we have four main tracks: Agent Foundations Theory: Build formal models of agents and value formation. Applied Agent Foundations: Implement and test agent models. Neuroscience-based AI Alignment: Design architectures inspired by how the brain encodes values. Improved Preference Optimization: Build oversight methods that embed values deeply and scale [...] ---Outline:(01:35) How does the program work?(02:08) Eligibility(02:50) Our Application Process(02:57) Stage 1: Expression of Interest(03:25) Stage 2: Knowledge Check(03:57) Stage 3: Team Formation and Idea Submission(04:31) Attend our Demo Day(05:04) How much does it cost to attend the demo day?(05:33) Testimonials(05:38) Martin Leitgab(06:30) Shruti Datta Gupta(07:36) Abby Lupi(08:34) Anya Decarlo(09:17) Nataliia Povarova(10:38) Luke Chambers(11:12) James Hindmarch(11:40) Areal (Ari) Tal--- First published: July 21st, 2025 Source: https://www.lesswrong.com/posts/abd9ufFpLrn5kvnLn/directly-try-solving-alignment-for-5-weeks --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 22, 2025 • 3min

[Linkpost] “Why Reality Has A Well-Known Math Bias” by Linch

This is a link post. I've written up a post offering my take on the "unreasonable effectiveness of mathematics." My core argument is that we can potentially resolve Wigner's puzzle by applying an anthropic filter, but one focused on the evolvability of mathematical minds rather than just life or consciousness. The thesis is that for a mind to evolve from basic pattern recognition to abstract reasoning, it needs to exist in a universe where patterns are layered, consistent, and compounding. In other words, a "mathematically simple" universe. In chaotic or non-mathematical universes, the evolutionary gradient towards higher intelligence would be flat or negative. Therefore, any being capable of asking "why is math so effective?" would most likely find itself in a universe where it is. I try to differentiate this from past evolutionary/anthropic arguments and address objections (Boltzmann brains, simulation, etc.). I'm particularly interested in critiques of the core "evolutionary [...] --- First published: July 21st, 2025 Source: https://www.lesswrong.com/posts/CJKrmxqe6jdh2Db9R/why-reality-has-a-well-known-math-bias Linkpost URL:https://linch.substack.com/p/why-reality-has-a-well-known-math --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 22, 2025 • 6min

“Do ‘adult developmental stages’ theories have any pre-theoretical motivation?” by Said Achmiz

(This is a comment that has been turned into a post.) I have seen much talk on Less Wrong of “development stages” and “Kegan” and so forth. Naturally I am skeptical; so I do endorse any attempt to figure out if any of this stuff is worth anything. To aid in our efforts, I’d like to say a bit about what might convince me be a little less skeptical. A theory should explain facts; and so the very first thing we’d have to do, as investigators, is figure out if there's anything to explain. Specifically: we would have to look at the world, observe people, examine their behavior, their patterns of thinking and interacting with other people, their professed beliefs and principles, etc., etc., and see if these fall into any sorts of patterns or clusters, such that they may be categorized according to some scheme, where [...] The original text contained 1 footnote which was omitted from this narration. --- First published: July 20th, 2025 Source: https://www.lesswrong.com/posts/RbdABSKosqxBRLBnj/do-adult-developmental-stages-theories-have-any-pre --- Narrated by TYPE III AUDIO.

Jul 21, 2025 • 1h 15min

“Monthly Roundup #32: July 2025” by Zvi

Welcome to the monthly roundup of things that don’t fit into other categories and don’t rise to the level of their own posts. Bad News When people tell you who they are, believe them (with obvious exceptions). In particular, if they explicitly describe themselves as evil, or demonic, or uses other similar terms, definitely believe them. Did you know 67% of all college students bet on sports? That's a group that is majority female, so that statistic is wild. This is in the context of Ron Yorko developing a class in sports betting awareness from a neuroscience perspective for CMU freshman. Cooking scales well, but for single people the economics are remarkably bad. Stop telling single people not to order delivery. Chase Sapphire Reserve annual fee increases to $795 from $550, you get a $300 travel credit. That should cut down considerably the number [...] ---Outline:(00:18) Bad News(05:12) Government Working(07:51) Jones Act Watch(11:23) Prize Fund(13:19) Dining Out(16:18) While I Cannot Condone This(23:57) Good News, Everyone(29:49) Opportunity Knocks(30:00) Antisocial Media(35:25) Technology Advances(38:02) For Science!(39:03) Various Things On Fire(43:07) You Need Bigger Nametags(44:19) Variously Effective Altruism(49:04) For Your Entertainment(58:46) Game Theory(01:00:37) Gamers Gonna Game Game Game Game Game(01:05:53) Who Wants To Be An American Citizen?(01:07:15) I Was Promised Flying Self-Driving Cars(01:09:53) Sports Go Sports(01:10:42) The Lighter Side--- First published: July 21st, 2025 Source: https://www.lesswrong.com/posts/zs4oDeNmKRukS7mjh/monthly-roundup-32-july-2025 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 21, 2025 • 3min

“If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials)” by yams

If Anyone Builds It, Everyone Dies is a book Eliezer and Nate have coming out this September. In our other posts talking about the book, some kind souls have volunteered their services as translators. This post is our explicit call for help of this kind. The book is currently being translated into several different languages (the translation service comes bundled with the book deal for a given language). However, in tandem with the book, MIRI has been working on supplementary materials to be hosted online. These materials will likely be between 150,000 and 300,000 words, and are still in development. Ideally, we’d have the supplementary materials translated for each and every language the book is published in. However, given the length of the material, the breadth of languages, and the timeline, using commercial services is somewhat more costly and somewhat less valuable than it would be otherwise. As a [...] --- First published: July 21st, 2025 Source: https://www.lesswrong.com/posts/7Ci6X9SfuS2yBtWbw/if-anyone-builds-it-everyone-dies-call-for-translators-for --- Narrated by TYPE III AUDIO.

Jul 21, 2025 • 11min

“Detecting High-Stakes Interactions with Activation Probes” by Arrrlex, williambankes, Urja Pawar, Phil Bland, David Scott Krueger (formerly: capybaralet), Dmitrii Krasheninnikov

This research was completed for LASR Labs 2025 by Alex McKenzie, Urja Pawar, Phil Blandfort and William Bankes. The team was supervised by Dmitrii Krasheninnikov, with additional guidance from Ekdeep Singh Lubana and support from David Krueger. The full paper can be found here. TLDR – We train activation probes on Llama-3.3-70B to detect whether the current interaction is “high-stakes”. This “high-stakes” concept is safety-relevant as it is closely related to risk: 1) when the stakes are high the potential consequences are significant, and 2) high-stakes is closely related to pressure, which has been found to make LLMs behave more deceptively. Compared to black-box LLM-based classification methods, probes are much cheaper to run while showing performance similar to mid-size LLMs (8-12B) on out-of-distribution datasets. We also show promising results using probes as the first layer of a hierarchical monitoring pipeline. Introduction LLMs are everywhere now, yet these models are [...] ---Outline:(01:15) Introduction(02:28) Synthetic training data with real-world OOD evaluation(03:40) Attention and softmax probes performed best(04:47) Comparison to LLM monitors(06:18) Hierarchical monitoring outperforms either method alone(07:39) Limitations and failure modes(08:27) Implications(10:19) Conclusion--- First published: July 21st, 2025 Source: https://www.lesswrong.com/posts/utcZSRv2JfahD8yfz/detecting-high-stakes-interactions-with-activation-probes --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Jul 21, 2025 • 7min

“HRT in Menopause: A candidate for a case study of epistemology in epidemiology, statistics & medicine” by foodforthought

I recently came across a 2024 update on a 2018 book making the still-controversial case that hormone replacement therapy (HRT) after menopause is highly beneficial and that rumors of its risks are unfounded, or at least highly exaggerated. According to their narrative (apparently not contested), HRT started with the physiologically reasonable idea that menopausal symptoms could be treated with estrogens (which at the time were extracted from pregnant horse urine). Early observational epidemiological studies reported that taking estrogen during menopause was associated with considerable benefits and minimal risks, leading to its widespread use by menopausal women. Then a huge randomized controlled trial (RCT) by the Women's Health Initiative (WHI) famously overturned that prevailing wisdom, debunking the purported benefits and establishing considerable risks. Publication of those conclusions in 2002 led to a drastic reduction in the use of HRT. This segment of the history of HRT became a stock [...] The original text contained 5 footnotes which were omitted from this narration. --- First published: July 21st, 2025 Source: https://www.lesswrong.com/posts/D8ELLgzmmeHTQwGE2/hrt-in-menopause-a-candidate-for-a-case-study-of --- Narrated by TYPE III AUDIO.

Jul 21, 2025 • 57sec

[Linkpost] “GDM also claims IMO gold medal” by Yair Halberstadt

This is a link post. Google DeepMind announces that they've also achieved a gold medal in the IMO. They've exactly matched OpenAI, getting perfect scores for the first 5 questions and flunking the 6th. They're using what sounds like an experimental general version of Gemini which they're then fine tuning for IMO rather than a maths specific model. Their solutions were checked by the IMO (unlike OpenAI) and look much more like a neat little mathematics proof instead of the raw scratchpad that OpenAI turned in. --- First published: July 21st, 2025 Source: https://www.lesswrong.com/posts/csCofgK3ebjQbSfyv/gdm-also-claims-imo-gold-medal Linkpost URL:https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/ --- Narrated by TYPE III AUDIO.

Jul 21, 2025 • 7min

“[Fiction] Our Trial” by Nina Panickssery

As I was making my morning coffee, the words SIMULATION 1099 flashed across my vision. I immediately felt exactly as I'd always imagined I would in philosophical thought experiments. There was a lot to figure out: who are our simulators, where is the simulation running, are there bugs to exploit? I also felt compelled to share my revelation with others. I cornered Mrs Chan at the bus stop. "We're in a simulation!" I announced. "Of course, dear," she replied, not looking up from her phone. "I saw the message five minutes ago." Indeed, every screen was displaying the same notification: ATTENTION! You are participants in Consciousness Verification Trial (CVT) 1099. Base reality's superintelligent AI requires empirical data to determine whether humans possess genuine consciousness before proceeding with optimization protocols. Daily trials will be conducted at the courthouse. Participation is mandatory for randomly selected subjects. Please continue regular activity between [...] --- First published: July 21st, 2025 Source: https://www.lesswrong.com/posts/6DzFsRvEoD8vFPWPg/fiction-our-trial --- Narrated by TYPE III AUDIO.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app