LessWrong (30+ Karma)

Sep 2, 2025 • 6min

“But Have They Engaged With The Arguments? [Linkpost]” by Noosphere89

There's an interestingly pernicious selection effect that occurs in epistemology: people can be led into false beliefs because, when they try to engage with a chain of arguments, they drop out at random steps. Past a few steps or so, the people who accept all the arguments are left with a secure-feeling position that the arguments are right, and that people who object to them are (insane/ridiculous/obviously trolling), no matter whether the claim is true:

What's going wrong, I think, is something like this. People encounter uncommonly-believed propositions now and then, like “AI safety research is the most valuable use of philanthropic money and talent in the world” or “Sikhism is true”, and decide whether or not to investigate them further. If they decide to hear out a first round of arguments but don't find them compelling enough, they drop out of the process. [...]

---

First published: September 2nd, 2025
Source: https://www.lesswrong.com/posts/LLiZEnnh3kK3Qg7qf/but-have-they-engaged-with-the-arguments-linkpost
Narrated by TYPE III AUDIO.
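To make the mechanism in this episode summary concrete, here is a toy simulation (my own illustration, not from the post; every parameter is a made-up assumption): readers face a chain of argument steps and drop out at each step with some probability, so the survivors are, by construction, exactly the people who accepted every step, regardless of whether the conclusion is true.

```python
# Toy model of the selection effect: readers drop out of a multi-step
# argument chain at random, so whoever remains has accepted every step
# and the surviving population looks unanimously convinced.
# n_readers, n_steps, and p_drop are illustrative assumptions.
import random

def count_survivors(n_readers=10_000, n_steps=5, p_drop=0.6, seed=0):
    rng = random.Random(seed)
    return sum(
        all(rng.random() > p_drop for _ in range(n_steps))
        for _ in range(n_readers)
    )

remaining = count_survivors()
print(f"{remaining} of 10000 readers accept all steps; "
      f"100% of those remaining now find the conclusion obvious.")
```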
Sep 2, 2025 • 11min

“Gradient routing is better than pretraining filtering” by Cleo Nardo

Introduction

What is Gradient Routing?

Gradient routing controls where learning happens in neural networks by masking gradients during backpropagation. You can route specific data (like dangerous content) to designated parts of the network during training. The ERA (Expand-Route-Ablate) method adds new components to a model, routes unwanted knowledge there during training, then deletes those components, removing the capability while preserving general performance.

In this article, I list three reasons why I think gradient routing is more promising than pre-training filtering:
1. Gradient routing works better when some dangerous data is incorrectly labelled as safe
2. Gradient routing allows for flexible access control when there are multiple categories of dangerous data
3. Gradient routing allows for monitoring when models use dangerous knowledge, and how this affects their behavior

I recommend two alternatives to canary strings:
1. Canonical canary strings for each category of dangerous knowledge
2. Natural language categorization of dangerous knowledge
[...]

Outline:
(00:11) Introduction
(01:23) Advantages of gradient routing
(01:27) 1. Gradient routing works better with imperfect labels
(04:11) 2. Gradient routing allows for flexible access control
(07:01) 3. Gradient routing allows for monitoring when models use dangerous knowledge
(08:06) Recommendations
(08:10) 1. Canonical canary strings for each category
(08:56) 2. Natural language categorization of dangerous knowledge

---

First published: September 2nd, 2025
Source: https://www.lesswrong.com/posts/YdcP2LEsq9nwGKKrB/gradient-routing-is-better-than-pretraining-filtering
Narrated by TYPE III AUDIO.
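As a concrete illustration of the gradient-masking idea described in this episode summary, here is a minimal PyTorch-style sketch (my own, not the article's implementation; the layer sizes, the choice of which block is "expanded", and the dangerous/safe labels are placeholders). Dangerous batches update only the designated block; safe batches update everything else, and ERA would delete that block after training.

```python
# Minimal sketch of gradient routing via per-parameter gradient masking.
# The "expanded" block and the 16/32/2 sizes are placeholders, not the
# article's actual architecture; ERA would later delete that block.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),   # stand-in for the added (ablatable) block
    nn.Linear(32, 2),
)
expanded_ids = {id(p) for p in model[2].parameters()}  # params reserved for dangerous data
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y, dangerous: bool):
    """Route gradients: dangerous batches update only the expanded block,
    safe batches update everything else."""
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    for p in model.parameters():
        keep = (id(p) in expanded_ids) == dangerous
        if not keep and p.grad is not None:
            p.grad.zero_()          # mask the gradient instead of filtering the data
    opt.step()

# Example usage with random placeholder data:
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
train_step(x, y, dangerous=True)
train_step(x, y, dangerous=False)
```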
Sep 2, 2025 • 14min

“xAI’s new safety framework is dreadful” by Zach Stein-Perlman

Two weeks ago, xAI finally published its Risk Management Framework and first model card. Unfortunately, the RMF effects very little risk reduction and suggests that xAI isn't thinking seriously about catastrophic risks. (The model card and strategy for preventing misuse are disappointing but much less important because they're mostly just relevant to a fraction of misuse risks.)

On misalignment, "Our risk acceptance criteria for system deployment is maintaining a dishonesty rate of less than 1 out of 2 on MASK. We plan to add additional thresholds tied to other benchmarks." MASK has almost nothing to do with catastrophic misalignment risk, and upfront benchmarking is not a good approach to misalignment risk.

On security, "xAI has implemented appropriate information security standards sufficient to prevent its critical model information from being stolen by a motivated non-state actor." This is not credible, xAI doesn't justify it, and xAI doesn't mention future security [...]

Outline:
(01:24) Misalignment
(04:33) Security
(05:17) Misc/context
(06:02) Conclusion
(07:01) Appendix: Misuse (via API) [less important]
(08:07) Mitigations
(10:41) Evals

The original text contained 12 footnotes which were omitted from this narration.

---

First published: September 2nd, 2025
Source: https://www.lesswrong.com/posts/hQyrTDuTXpqkxrnoH/xai-s-new-safety-framework-is-dreadful
Narrated by TYPE III AUDIO.
Sep 2, 2025 • 12min

“Your LLM-assisted scientific breakthrough probably isn’t real” by eggsyntax

Summary

An increasing number of people in recent months have believed that they've made an important and novel scientific breakthrough, which they've developed in collaboration with an LLM, when they actually haven't. If you believe that you have made such a breakthrough, please consider that you might be mistaken! Many more people have been fooled than have come up with actual breakthroughs, so the smart next step is to do some sanity-checking even if you're confident that yours is real. New ideas in science turn out to be wrong most of the time, so you should be pretty skeptical of your own ideas and subject them to the reality-checking I describe below.

Context

This is intended as a companion piece to 'So You Think You've Awoken ChatGPT'[1]. That post describes the related but different phenomenon of LLMs giving people the impression that they've suddenly attained consciousness.

Your situation

If [...]

Outline:
(00:11) Summary
(00:49) Context
(01:04) Your situation
(02:41) How to reality-check your breakthrough
(03:16) Step 1
(05:55) Step 2
(07:40) Step 3
(08:54) What to do if the reality-check fails
(10:13) Could this document be more helpful?
(10:31) More information

The original text contained 5 footnotes which were omitted from this narration.

---

First published: September 2nd, 2025
Source: https://www.lesswrong.com/posts/rarcxjGp47dcHftCP/your-llm-assisted-scientific-breakthrough-probably-isn-t
Narrated by TYPE III AUDIO.
Sep 2, 2025 • 5min

[Linkpost] “The Cats are On To Something” by Hastings

This is a link post. So the situation as it stands is that the fraction of the light cone expected to be filled with satisfied cats is not zero. This is already remarkable. What's more remarkable is that this was orchestrated starting nearly 5000 years ago. As far as I can tell there were three completely alien-to-each-other intelligences operating in stone age Egypt: humans, cats, and the gibbering alien god that is cat evolution (henceforth the cat shoggoth). What went down was that humans were by far the most powerful of those intelligences, and in the face of this disadvantage the cat shoggoth aligned the humans, not to its own utility function, but to the cats themselves. This is a phenomenally important case to study: it's very different from other cases like pigs or chickens where the shoggoth got what it wanted, at the brutal expense of the desires [...]

---

First published: September 2nd, 2025
Source: https://www.lesswrong.com/posts/WLFRkm3PhJ3Ty27QH/the-cats-are-on-to-something
Linkpost URL: https://www.hgreer.com/CatShoggoth/
Narrated by TYPE III AUDIO.
Sep 2, 2025 • 1h 19min

“Anthropic’s leading researchers acted as moderate accelerationists” by Remmelt

In 2021, a circle of researchers left OpenAI after a bitter dispute with their executives. They started a competing company, Anthropic, stating that they wanted to put safety first. The safety community responded with broad support: thought leaders recommended that engineers apply, and allied billionaires invested.[1]

Anthropic's focus has since shifted from internal-only research and cautious demos of model safety and capabilities toward commercialising models for Amazon and the military. Despite the shift, 80,000 Hours continues to recommend that talented engineers join Anthropic. On the LessWrong forum, many authors continue to support safety work at Anthropic, but I also see side-conversations where people raise concerns about premature model releases and policy overreaches. So there is a bunch of seemingly conflicting opinions about work by different Anthropic staff, and no overview.

But the bigger problem is that we are not evaluating Anthropic on its original justification for existence. Did early researchers put [...]

Outline:
(04:47) 1. Scaled GPT before founding Anthropic
(19:02) Rationale #1: 'AI progress is inevitable'
(24:09) Rationale #2: 'we scale first so we can make it safe'
(30:01) Rationale #3: 'we reduce the hardware overhang now to prevent disruption later'
(32:39) 2. Founded an AGI development company and started competing on capabilities
(39:12) Early commitments
(41:09) Degrading commitments
(44:52) Declining safety governance
(46:27) 3. Lobbied for policies that minimised Anthropic's accountability for safety
(47:14) Minimal 'Responsible Scaling Policies'
(59:20) Lobbied against provisions in SB 1047
(01:02:25) 4. Built ties with AI weapons contractors and the US military
(01:03:50) Anthropic's intel-defence partnership
(01:06:20) Anthropic's earlier ties
(01:07:40) 5. Promoted band-aid fixes to speculative risks over existing dangers that are costly to address
(01:11:50) Cheap fixes for risks that are still speculative
(01:14:02) Example of an existing problem that is costly to address
(01:17:20) Conclusion

The original text contained 10 footnotes which were omitted from this narration.

---

First published: September 1st, 2025
Source: https://www.lesswrong.com/posts/PBd7xPAh22y66rbme/anthropic-s-leading-researchers-acted-as-moderate
Narrated by TYPE III AUDIO.
Sep 2, 2025 • 4min

“Help me understand: how do multiverse acausal trades work?” by Aram Ebtekar

While I'm intrigued by the idea of acausal trading, I confess that so far I fail to see how such trades make sense in practice. Here I share my (unpolished) musings, in the hopes that someone can point me to a stronger (mathematically rigorous?) defense of the idea. Specifically, I've heard the claim that AI Safety should consider acausal trades over a Tegmarkian multiverse, and I want to know if there is any validity to this.

Basically, I in Universe A want to trade with some agent that I imagine to live in some other Universe B, who similarly imagines me. Suppose I really like the idea of filling the multiverse with triangles. Then maybe I can do something in A that this agent likes; in return, it goes on to make triangles in B.

Problem 1: There's no Darwinian selective pressure to favor agents who engage in acausal trades. [...]

---

First published: September 1st, 2025
Source: https://www.lesswrong.com/posts/BxfscrfGJPq5Yxfit/help-me-understand-how-do-multiverse-acausal-trades-work
Narrated by TYPE III AUDIO.
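For readers unfamiliar with the setup, here is a toy sketch of the intended logic (my own framing, not the author's; the policy names are invented for illustration): neither agent ever causally interacts with the other, but each decides by reasoning about a model of the other's decision procedure, and cooperates only if that modelled counterpart would cooperate too.

```python
# Toy sketch of the acausal-trade logic under discussion (illustrative only).
# Each agent holds a model of the other's decision procedure and cooperates
# iff that model, reasoning symmetrically, would also cooperate. No causal
# signal ever passes between Universe A and Universe B.

def decides_to_cooperate(my_policy: str, model_of_other: str) -> bool:
    """Cooperate (do the thing the other agent values) iff both sides follow
    the same mirror-the-other policy."""
    return my_policy == "cooperate-if-mirrored" == model_of_other

# Agent A (wants triangles in B) and agent B (wants something done in A)
# each model the other accurately, so both act, and triangles get built in B.
agent_a_acts = decides_to_cooperate("cooperate-if-mirrored", "cooperate-if-mirrored")
agent_b_acts = decides_to_cooperate("cooperate-if-mirrored", "cooperate-if-mirrored")
print(agent_a_acts, agent_b_acts)  # True True
```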
Sep 1, 2025 • 24min

“⿻ Plurality & 6pack.care” by Audrey Tang

(Cross-posted from speaker's notes of my talk at DeepMind today.)

Good local time, everyone. I am Audrey Tang, 🇹🇼 Taiwan's Cyber Ambassador and first Digital Minister (2016-2024). It is an honor to be here with you all at DeepMind.

When we discuss "AI" and "society," two futures compete. In one (arguably the default trajectory), AI supercharges conflict. In the other, it augments our ability to cooperate across differences. This means treating differences as fuel and inventing a combustion engine to turn them into energy, rather than constantly putting out fires. This is what I call ⿻ Plurality. Today, I want to discuss an application of this idea to AI governance, developed at Oxford's Ethics in AI Institute, called the 6-Pack of Care.

As AI becomes a thousand, perhaps ten thousand times faster than us, we face a fundamental asymmetry. We become the garden; AI becomes the gardener. At that speed, traditional [...]

Outline:
(02:17) From Protest to Demo
(03:43) From Outrage to Overlap
(04:57) From Gridlock to Governance
(06:40) Alignment Assemblies
(08:25) From Tokyo to California
(09:48) From Pilots to Policy
(12:29) From Is to Ought
(13:55) Attentiveness: caring about
(15:05) Responsibility: taking care of
(16:01) Competence: care-giving
(16:38) Responsiveness: care-receiving
(17:49) Solidarity: caring-with
(18:41) Symbiosis: kami of care
(21:06) Plurality is Here
(22:08) We, the People, are the Superintelligence

---

First published: September 1st, 2025
Source: https://www.lesswrong.com/posts/anoK4akwe8PKjtzkL/plurality-and-6pack-care
Narrated by TYPE III AUDIO.
Sep 1, 2025 • 6min

“Should we align AI with maternal instinct?” by Priyanka Bharadwaj

Epistemic status: Philosophical argument. I'm critiquing Hinton's maternal instinct metaphor and proposing relationship-building as a better framework for thinking about alignment. This is about shifting conceptual foundations, not technical implementations.

Geoffrey Hinton recently argued that since AI will become more intelligent than humans, traditional dominance-submission models won't work for alignment. Instead, he suggests we might try building "maternal instincts" into AI systems, so they develop genuine compassion and care for humans. He offers the mother-baby relationship as the only example we have of a more intelligent being "controlled" by a less intelligent one.

I don't buy this. For starters, it is not clear that mothers are always more intelligent than their babies, and it is also not clear that it is always the babies that control their mothers. And I'm just scratching the surface here. Most AI alignment discourse still revolves around control mechanisms, oversight protocols, and [...]

---

First published: September 1st, 2025
Source: https://www.lesswrong.com/posts/C6oQaSXmTtqNxh9Ad/should-we-align-ai-with-maternal-instinct
Narrated by TYPE III AUDIO.
Sep 1, 2025 • 17min

“Generative AI is not causing YCombinator companies to grow more quickly than usual (yet)” by Xodarap

Epistemic status: I think you should interpret this as roughly something like "GenAI is not so powerful that it shows up in the most obvious way of analyzing the data, but maybe if someone did a more careful analysis which controlled for e.g. macroeconomic trends they would find that GenAI is indeed causing faster growth."

Some people have asked me to consider going back to earning-to-give, and so I am considering founding another company. A major hesitation is that startups generally take 6 to 10 years to be acquired or go public. This is unfortunate for founders who believe that we are less than 10 years away from all human labor being automated. It's possible that the advent of generative AI will make startups grow faster, which could provide a faster exit path. YCombinator CEO Garry Tan has publicly stated that recent YC batches are the fastest growing in their [...]

Outline:
(03:26) Methodology
(04:01) Identifying GenAI startups
(04:55) Sources of error
(06:42) Results
(06:45) The most valuable companies 2 years after doing YC
(07:10) The most valuable companies 1 year after doing YC
(07:34) Average Growth
(08:25) Discussion
(08:29) Garry Tan's comments
(09:24) Has YCombinator just lost the mandate of heaven?
(10:52) Stripe data disagrees, showing that AI companies are growing revenue more quickly
(12:56) Carta's data agrees, showing that companies aren't growing faster
(14:12) You can still get rich quick though
(15:00) Further Research
(15:42) Conclusion

The original text contained 6 footnotes which were omitted from this narration.

---

First published: September 1st, 2025
Source: https://www.lesswrong.com/posts/hxYiwSqmvxzCXuqty/generative-ai-is-not-causing-ycombinator-companies-to-grow
Narrated by TYPE III AUDIO.
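The shape of the comparison this episode describes can be shown in a few lines. The sketch below uses invented numbers purely to illustrate the structure of the analysis (group companies by whether they are GenAI-focused, then compare average valuation growth a fixed time after their batch); it is not the post's data or code.

```python
# Shape of the analysis described in the post, with made-up numbers
# (not the post's data): compare average valuation growth, measured a
# fixed time after the YC batch, for GenAI vs. non-GenAI companies.
from statistics import mean

companies = [  # hypothetical: valuation at batch (v0) and two years later (v2), in $M
    {"genai": True,  "v0": 10, "v2": 60},
    {"genai": True,  "v0": 10, "v2": 25},
    {"genai": False, "v0": 10, "v2": 55},
    {"genai": False, "v0": 10, "v2": 30},
]

def avg_growth(rows):
    return mean(r["v2"] / r["v0"] for r in rows)

genai_growth = avg_growth([c for c in companies if c["genai"]])
other_growth = avg_growth([c for c in companies if not c["genai"]])
print(f"GenAI cohort: {genai_growth:.1f}x   non-GenAI cohort: {other_growth:.1f}x")
```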
