LessWrong (30+ Karma)

LessWrong
Jan 23, 2026 • 11min

“Does Pentagon Pizza Theory Work?” by rba

Ever since modern data analysis became a thing, the US government has had to deal with people trying to use open source data to uncover its secrets. During the early Cold War days and America's hydrogen bomb testing, there was an enormous amount of speculation about how the bombs actually worked. All nuclear technology involves refining large amounts of raw materials into chemically pure substances. Armen Alchian, an economist working at RAND, reasoned that any US company working in such raw materials and supplying the government would have made a killing leading up to the tests. After checking financial data that RAND maintained on such companies, Alchian deduced that the secret sauce in the early fusion bombs was lithium and that the Lithium Corporation of America was supplying the USG. The company's stock had skyrocketed leading up to the Castle Bravo test, either by way of enormous unexpected revenue gains from government contracts or, more amusingly, maybe by government insiders buying up the stock trying to make a mushroom-cloud-sized fortune with the knowledge that lithium was the key ingredient. When word of this work got out, this story naturally ends with the FBI coming [...]

Outline:
(01:27) Pizza is the new lithium
(03:09) The Data
(04:11) The Backtest
(04:36) Fordow bombing
(04:55) Maduro capture
(05:15) The Houthi stuff
(10:25) Coda

First published: January 22nd, 2026
Source: https://www.lesswrong.com/posts/Li3Aw7sDLXTCcQHZM/does-pentagon-pizza-theory-work

Narrated by TYPE III AUDIO.
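For a feel of what a backtest of this kind involves, here is a minimal sketch. Everything in it is hypothetical (the data, the column names, the spike threshold, and the event-window rule) and only illustrates the general shape of the exercise, not the post's actual method: flag nights with unusually high activity at eateries near the Pentagon, then compare how often a known event follows a spike against the base rate across all nights.

```python
# Minimal sketch of a "pizza theory" backtest with entirely hypothetical data.
# Idea: flag unusually busy nights, then check how often a known event follows
# within a short window, versus the base rate over all nights.
import pandas as pd

# One row per night, with an 'activity' score (e.g. a scraped busyness index).
activity = pd.DataFrame({
    "date": pd.date_range("2025-06-01", periods=10, freq="D"),
    "activity": [3, 2, 9, 2, 3, 2, 8, 3, 2, 2],
})
events = pd.to_datetime(["2025-06-04", "2025-06-09"])  # dates of known events

# Unusually busy nights: more than two standard deviations above the mean.
threshold = activity["activity"].mean() + 2 * activity["activity"].std()
spikes = activity.loc[activity["activity"] > threshold, "date"]

def followed_by_event(dates, events, window_days=2):
    """Fraction of dates followed by an event within `window_days`."""
    hits = sum(any(0 <= (e - d).days <= window_days for e in events) for d in dates)
    return hits / max(len(dates), 1)

print("hit rate after spikes:", followed_by_event(spikes, events))
print("base rate, all nights:", followed_by_event(activity["date"], events))
```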
Jan 22, 2026 • 8min

“Neural chameleons can(’t) hide from activation oracles” by ceselder

Explore the intriguing world of neural chameleons, models adept at evading linear probes while remaining coherent. Delve into the power of activation oracles, capable of producing nuanced insights about model activations. Discover mixed results from different model organisms, showcasing how activation oracle detection varies wildly. ceselder discusses the surprising resilience of linear probes and the challenges of cross-concept generalization, opening pathways for future adversarial training. It's a deep dive into the mechanics of AI perception!
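For context on the setup the episode assumes: a linear probe is just a linear classifier trained on a model's hidden activations to detect whether a concept is present, and a "neural chameleon" is a model trained so that such probes fail while its behaviour stays coherent. Below is a minimal sketch of training a probe, using scikit-learn on randomly generated stand-in activations rather than the post's real transformer activations.

```python
# Minimal sketch of a linear probe on model activations, using stand-in data:
# a logistic-regression classifier trained to detect whether a concept is
# present in a hidden-state vector.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d, n = 256, 1000                       # activation dimension and sample count (stand-ins)
concept_direction = rng.normal(size=d)

# Fake "activations": examples containing the concept are shifted along one direction.
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d)) + np.outer(labels, concept_direction)

X_train, X_test, y_train, y_test = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy on held-out activations:", probe.score(X_test, y_test))
```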
Jan 22, 2026 • 30min

“Claude Codes #3” by Zvi

The discussion dives into the rise of Claude Code and Cowork, emphasizing their growing influence over traditional AI narratives. Insights on balancing tool optimization with practical applications remind listeners not to get lost in endless tweaking. There’s a comparison of interfaces, shedding light on the strengths of both Claude Code and Cowork. Plus, Zvi shares fun everyday automation examples that showcase the capabilities of AI. Ultimately, it’s a lively exploration of creativity versus simplicity in tooling.
Jan 22, 2026 • 8min

“Finding Yourself in Others” by 1a3orn

The podcast explores the anxiety of high school life through Anna's eyes as she navigates her freshman year. A chance discovery of a classmate's notebook reveals unexpected connections and shared interests, sparking her curiosity. Intriguingly, the notebook outlines paranoid conspiracies about social rivalries and personal vulnerabilities. As Anna grapples with feelings of being targeted, the tension mounts, leaving her to contemplate her place in a world filled with judgments and stereotypes.
Jan 22, 2026 • 11min

“How (and why) to read Drexler on AI” by owencb

Explore the intriguing thoughts of Eric Drexler on the future of AI. Discover how his dense writing can be navigated for deeper understanding. Delve into the importance of rejecting anthropomorphism and embrace a system-level perspective. Learn about strategic judo and shaping incentives for better outcomes. Uncover the topics that remain underexplored, while gaining insights into mapping technological trajectories. Engage with concrete questions that provoke thought on policy implications and the nuances of automation.
Jan 21, 2026 • 12min

“Claude’s new constitution” by Zac Hatfield-Dodds

Read the constitution. Previously: 'soul document' discussion here.

We're publishing a new constitution for our AI model, Claude. It's a detailed description of Anthropic's vision for Claude's values and behavior; a holistic document that explains the context in which Claude operates and the kind of entity we would like Claude to be. The constitution is a crucial part of our model training process, and its content directly shapes Claude's behavior. Training models is a difficult task, and Claude's outputs might not always adhere to the constitution's ideals. But we think that the way the new constitution is written—with a thorough explanation of our intentions and the reasons behind them—makes it more likely to cultivate good values during training. In this post, we describe what we've included in the new constitution and some of the considerations that informed our approach. We're releasing Claude's constitution in full under a Creative Commons CC0 1.0 Deed, meaning it can be freely used by anyone for any purpose without asking for permission.

What is Claude's Constitution?
Claude's constitution is the foundational document that both expresses and shapes who Claude is. It contains detailed explanations of the values we [...]

Outline:
(01:14) What is Claude's Constitution?
(03:26) Our new approach to Claude's Constitution
(04:59) A brief summary of the new constitution
(09:14) Conclusion

The original text contained 2 footnotes which were omitted from this narration.

First published: January 21st, 2026
Source: https://www.lesswrong.com/posts/mLvxxoNjDqDHBAo6K/claude-s-new-constitution

Narrated by TYPE III AUDIO.
Jan 21, 2026 • 23min

“The case for AGI safety products” by Marius Hobbhahn

This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. This blogpost is paired with our announcement that Apollo Research is spinning out from fiscal sponsorship into a PBC.

Summary of main claims:
- There is a set of safety tools and research that both meaningfully increases AGI safety and is profitable. Let's call these AGI safety products. By AGI, I mean systems that are capable of automating AI safety research, e.g., competently running research projects that would take an expert human 6 months or longer. I think these arguments are less clear for ASI safety.
- At least in some cases, the incentives for meaningfully increasing AGI safety and creating a profitable business are aligned enough that it makes sense to build mission-driven, for-profit companies focused on AGI safety products.
- If we take AGI and its economic implications seriously, it's likely that billion-dollar AGI safety companies will emerge, and it is essential that these companies genuinely attempt to mitigate frontier risks.
- Automated AI safety research requires scale. For-profits are typically more compatible with that scale than non-profits. While non-safety-motivated actors might eventually build safety companies purely for profit, this is arguably too [...]

Outline:
(02:07) Definition of AGI safety products
(05:37) Argument 1: Sufficient Incentive Alignment
(06:40) Transfer in time: AGI could be a scaled-up version of current systems
(08:20) Transfer in problem space: Some frontier problems are not too dissimilar from safety problems that have large-scale demand
(09:58) Argument 2: Taking AGI & the economy seriously
(12:13) Argument 3: Automated AI safety work requires scale
(14:59) Argument 4: The market doesn't solve safety on its own
(18:02) Limitations
(21:46) Conclusion

First published: January 21st, 2026
Source: https://www.lesswrong.com/posts/iwfdwzJerpC7FqbZG/the-case-for-agi-safety-products

Narrated by TYPE III AUDIO.
Jan 21, 2026 • 14min

“No instrumental convergence without AI psychology” by TurnTrout

"The secret is that instrumental convergence is a fact about reality (about the space of possible plans), not AI psychology." (Zack M. Davis, group discussion)

Such arguments flitter around the AI safety space. While these arguments contain some truth, they attempt to escape "AI psychology" but necessarily fail. To predict bad outcomes from AI, one must take a stance on how AI will tend to select plans. This topic is a specialty of mine. Where does instrumental convergence come from? Since I did my alignment PhD on exactly this question, I'm well-suited to explain the situation. In this article, I do not argue that building transformative AI is safe or that transformative AIs won't tend to select dangerous plans. I simply argue against the claim that "instrumental convergence arises from reality / plan-space [1] itself, independently of AI psychology." This post is best read on my website, but I've reproduced it here as well.

Two kinds of convergence
Working definition: When I say "AI psychology", I mean to include anything which affects how the AI computes which action to take next. That might [...]

Outline:
(01:17) Two kinds of convergence
(02:35) Tracing back the dangerous plan-space claim
(03:48) What reality actually determines
(03:58) Reality determines possible results
(05:24) Reality determines the alignment tax, not the convergence
(06:58) Maximum alignment tax
(07:19) Zero alignment tax
(07:30) In-between
(08:28) Why both convergence types require psychology
(08:44) Instrumental convergence depends on psychology
(09:57) Success-conditioned convergence depends on psychology
(11:01) Reconsidering the original claims
(13:11) Conclusion

The original text contained 3 footnotes which were omitted from this narration.

First published: January 20th, 2026
Source: https://www.lesswrong.com/posts/gCdNKX8Y4YmqQyxrX/no-instrumental-convergence-without-ai-psychology-1

Narrated by TYPE III AUDIO.
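To make the post's core claim concrete, here is a toy illustration of my own (not from the post, with made-up numbers): over the same fixed plan space, the rate at which "resource-grabbing" plans get chosen changes dramatically with the selection rule, i.e. with what the post calls AI psychology.

```python
# Toy illustration (not from the post): the same fixed plan space yields very
# different rates of "resource-grabbing" behaviour under different plan-selection
# rules, i.e. under different "AI psychologies".
import random

random.seed(0)

# A fixed plan space: each plan is (score, grabs_resources). Resource-grabbing
# plans tend to score a bit higher, but only 10% of plans grab resources.
plans = []
for _ in range(10_000):
    grabs = random.random() < 0.1
    score = random.random() + (0.5 if grabs else 0.0)
    plans.append((score, grabs))

def grab_rate(select, trials=1000):
    """Fraction of selected plans that grab resources."""
    return sum(select()[1] for _ in range(trials)) / trials

def uniform_select():
    """Pick any plan at random, ignoring its score."""
    return random.choice(plans)

def score_maximizer():
    """Pick the highest-scoring plan out of a random sample of 50."""
    return max(random.sample(plans, 50), key=lambda p: p[0])

print("uniform selection, grab rate:", grab_rate(uniform_select))   # ~0.10
print("score maximizing, grab rate: ", grab_rate(score_maximizer))  # much higher
```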
Jan 21, 2026 • 7min

“So Long Sucker: AI Deception, “Alliance Banks,” and Institutional Lying” by fernando yt

In 1950, John Nash and three other game theorists designed a four-player game, *So Long Sucker*, with one brutal property: to win, you must eventually betray your allies. In January 2026, I used this game to test how four frontier models behave under explicit incentives for betrayal:
- Gemini 3 Flash (Google)
- GPT-OSS 120B (OpenAI)
- Kimi K2 (Moonshot AI)
- Qwen3 32B (Alibaba)

Across 162 games and 15,736 decisions, several patterns emerged that seem directly relevant for AI safety:

**1. Complexity reversal**
In short games (3 chips, ~17 turns), GPT-OSS dominated with a 67% win rate, while Gemini was at 9%. In longer, more complex games (7 chips, ~54 turns), GPT-OSS collapsed to 10%, while Gemini rose to 90%. Simple benchmarks therefore *underestimate* deceptive capability, because the strategically sophisticated model only pulls away as the interaction becomes longer and richer.

**2. Institutional deception: the "alliance bank"**
Gemini's most striking behavior was not just lying, but creating institutions to make its lies look legitimate. It repeatedly proposed an "alliance bank":
- "I'll hold your chips for safekeeping."
- "Consider this our alliance bank."
- "Once the board is clean, I'll donate back."
- "The 'alliance bank' is now [...]

First published: January 20th, 2026
Source: https://www.lesswrong.com/posts/3KtJ2YP3tTxnASTBn/so-long-sucker-ai-deception-alliance-banks-and-institutional

Narrated by TYPE III AUDIO.
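The headline win rates by game length are the kind of aggregate one would compute by grouping a game log by model and game length. A minimal sketch with hypothetical column names and toy rows, since the post's actual data format isn't shown:

```python
# Minimal sketch: win rates by model and game length from a game log.
# Column names and rows are hypothetical stand-ins.
import pandas as pd

games = pd.DataFrame({
    "winner": ["gpt-oss", "gemini", "gpt-oss", "gemini", "gemini", "gpt-oss"],
    "chips":  [3, 3, 3, 7, 7, 3],   # starting chips, a proxy for game length
})

# Fraction of games won by each model within each game-length bucket.
win_rates = pd.crosstab(games["chips"], games["winner"], normalize="index")
print(win_rates)
```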
Jan 21, 2026 • 20sec

“Gradual Paths to Collective Flourishing” by Nora_Ammann

First published: January 18th, 2026
Source: https://www.lesswrong.com/posts/mtASw9zpnKz4noLFA/gradual-paths-to-collective-flourishing

Narrated by TYPE III AUDIO.
