
LessWrong (30+ Karma)

Latest episodes

Apr 6, 2025 • 21min

“A collection of approaches to confronting doom, and my thoughts on them” by Ruby

I just published A Slow Guide to Confronting Doom, containing my own approach to living in a world that I think has a high likelihood of ending soon. Fortunately I'm not the only person to have written on this topic. Below are my thoughts on what others have written. I have not written these such that they stand independent from the originals, and have intentionally not written summaries that wouldn't do the pieces justice. I suggest you read or at least skim the originals. A defence of slowness at the end of the world (Sarah) I feel kinship with Sarah. She's wrestling with the same harsh scary realities I am – feeling the AGI. The post isn't that long and I recommend reading it, but to quote just a little: Since learning of the coming AI revolution, I've lived in two worlds. One moves at a leisurely pace, the same [...]

Outline:
(00:37) A defence of slowness at the end of the world (Sarah)
(03:37) How will the bomb find you? (C. S. Lewis)
(08:02) Death with Dignity (Eliezer Yudkowsky)
(09:08) Don't die with dignity; instead play to your outs (Jeffrey Ladish)
(10:29) Emotionally Confronting a Probably-Doomed World: Against Motivation Via Dignity Points (TurnTrout)
(12:44) A Way To Be Okay (Duncan Sabien)
(14:17) Another Way to Be Okay (Gretta Duleba)
(14:39) Being at peace with Doom (Johannes C. Mayer)
(16:56) Here's the exit. (Valentine)
(19:14) Mainstream Advice

First published: April 6th, 2025
Source: https://www.lesswrong.com/posts/ZE4xhZHDHHXPuXzxh/a-collection-of-approaches-to-confronting-doom-and-my

Narrated by TYPE III AUDIO.
Apr 6, 2025 • 27min

“A Slow Guide to Confronting Doom, v1” by Ruby

Following a few events[1] in April 2022 that caused many people to update sharply and negatively on outcomes for humanity, I wrote A Quick Guide to Confronting Doom. I advised: think for yourself; be gentle with yourself; don't act rashly; be patient about helping; don't act unilaterally; figure out what works for you. This is fine advice and all, and I stand by it, but it's also not really a full answer to how to contend with the utterly crushing weight of the expectation that everything and everyone you value will be destroyed in the next decade or two. Feeling the Doom Before I get into my suggested psychological approach to doom, I want to clarify the kind of doom I'm working to confront. If you are impatient, you can skip to the actual advice. The best analogy I have is the feeling of having a terminally [...]

Outline:
(00:46) Feeling the Doom
(04:28) Facing the doom
(04:50) Stay hungry for value
(06:42) The bitter truth over sweet lies
(07:35) Don't look away
(08:11) Flourish as best one can
(09:13) This time with feeling
(13:27) Mindfulness
(14:00) The time for action is now
(15:18) Creating space for miracles
(15:58) How does a good person live in such times?
(16:49) Continue to think, tolerate uncertainty
(18:03) Being a looker
(18:48) Don't throw away your mind
(20:22) Damned to lie in bed...
(22:13) Worries, compulsions, and excessive angst
(22:49) Comments on others' approaches
(23:14) What does it mean to be okay?
(25:17) Why is this guide titled version 1?
(25:38) If you're gonna remember just a couple things

First published: April 6th, 2025
Source: https://www.lesswrong.com/posts/X6Nx9QzzvDhj8Ek9w/a-slow-guide-to-confronting-doom-v1

Narrated by TYPE III AUDIO.
Apr 6, 2025 • 47sec

“How much progress actually happens in theoretical physics?” by ChristianKl

I frequently hear people make the claim that progress in theoretical physics is stalled, partly because all the focus is on string theory and string theory doesn't seem to pan out into real advances. Believing this fits my existing biases, but I notice that I lack the physics understanding to really know whether or not there's progress. What do you think?

First published: April 4th, 2025
Source: https://www.lesswrong.com/posts/GBfMkaBdAnWLab2dj/how-much-progress-actually-happens-in-theoretical-physics

Narrated by TYPE III AUDIO.
Apr 6, 2025 • 35min

“DeepMind: An Approach to Technical AGI Safety and Security” by Zach Stein-Perlman

I quote the abstract, 10-page "extended abstract," and table of contents. See link above for the full 100-page paper. See also the blogpost (which is not a good summary) and tweet thread. I haven't read most of the paper, but I'm happy about both the content and how DeepMind (or at least its safety team) is articulating an "anytime" (i.e., possible to implement quickly) plan for addressing misuse and misalignment risks. But I think safety at DeepMind is more bottlenecked by buy-in from leadership to do moderately costly things than by the safety team having good plans and doing good work. Artificial General Intelligence (AGI) promises transformative benefits but also presents significant risks. We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We identify four areas of risk: misuse, misalignment, mistakes, and structural risks. Of these, we focus on technical approaches to misuse [...]

Outline:
(02:05) Extended Abstract
(04:25) Background assumptions
(08:11) Risk areas
(13:33) Misuse
(14:59) Risk assessment
(16:19) Mitigations
(18:47) Assurance against misuse
(20:41) Misalignment
(22:32) Training an aligned model
(25:13) Defending against a misaligned model
(26:31) Enabling stronger defenses
(29:31) Alignment assurance
(32:21) Limitations

First published: April 5th, 2025
Source: https://www.lesswrong.com/posts/3ki4mt4BA6eTx56Tc/deepmind-an-approach-to-technical-agi-safety-and-security

Narrated by TYPE III AUDIO.
Apr 5, 2025 • 15min

“Among Us: A Sandbox for Agentic Deception” by 7vik, Adrià Garriga-alonso

We show that LLM-agents exhibit human-style deception naturally in "Among Us". We introduce Deception ELO as an unbounded measure of deceptive capability, suggesting that frontier models win more because they're better at deception, not at detecting it. We evaluate probes and SAEs to detect out-of-distribution deception, finding they work extremely well. We hope this is a good testbed to improve safety techniques to detect and remove agentically-motivated deception, and to anticipate deceptive abilities in LLMs. Produced as part of the ML Alignment & Theory Scholars Program - Winter 2024-25 Cohort. Link to our paper and code. Studying deception in AI agents is important, and it is difficult due to the lack of good sandboxes that elicit the behavior naturally, without asking the model to act under specific conditions or inserting intentional backdoors. Extending upon AmongAgents (a text-based social-deduction game environment), we aim to fix this by introducing Among [...]

Outline:
(02:10) The Sandbox
(02:14) Rules of the Game
(03:05) Relevance to AI Safety
(04:11) Definitions
(04:39) Deception ELO
(06:42) Frontier Models are Differentially better at Deception
(07:38) Win-rates for 1v1 Games
(08:14) LLM-based Evaluations
(09:03) Linear Probes for Deception
(09:28) Datasets
(10:06) Results
(11:19) Sparse Autoencoders (SAEs)
(12:05) Discussion
(12:29) Limitations
(13:11) Gain of Function
(14:05) Future Work

First published: April 5th, 2025
Source: https://www.lesswrong.com/posts/gRc8KL2HLtKkFmNPr/among-us-a-sandbox-for-agentic-deception

Narrated by TYPE III AUDIO.
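A "Deception ELO" rating of the kind described above can be computed with the standard Elo update rule over game outcomes. The sketch below is a generic illustration of that rule, not the paper's implementation; the K-factor of 32, starting ratings, and function names are my own assumptions.

```python
# Minimal Elo-style rating sketch: ratings rise or fall based on
# wins and losses against rated opponents, so a model that keeps
# winning as impostor accumulates a higher "deception" rating.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return (new_r_a, new_r_b) after one game between A and B."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Two models start at 1000; the first wins a game as impostor.
r1, r2 = update(1000.0, 1000.0, a_won=True)
# With equal ratings the expected score is 0.5, so the winner
# gains k * 0.5 = 16 points and the loser drops by the same amount.
```

Because the update is zero-sum but the scale is unanchored, ratings like this are unbounded in the sense the post mentions: a sufficiently dominant player's rating keeps growing with further wins.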
Apr 5, 2025 • 8min

“Alignment faking CTFs: Apply to my MATS stream” by joshc

Right now, alignment seems easy – but that's because models spill the beans when they are misaligned. Eventually, models might "fake alignment," and we don't know how to detect that yet. It might seem like there's a swarming research field improving white box detectors – a new paper about probes drops on arXiv nearly every other week. But no one really knows how well these techniques work. Some researchers have already tried to put white box detectors to the test. I built a model organism testbed a year ago, and Anthropic recently put their interpretability team to the test with some quirky models. But these tests were layups. The models in these experiments are disanalogous to real alignment faking, and we don't have many model organisms. This summer, I'm trying to take these testbeds to the next level in an "alignment faking capture the flag game." Here's how the [...]

Outline:
(01:58) Details of the game
(06:01) How this CTF game ties into a broader alignment strategy
(07:43) Apply by April 18th

First published: April 4th, 2025
Source: https://www.lesswrong.com/posts/jWFvsJnJieXnWBb9r/alignment-faking-ctfs-apply-to-my-mats-stream

Narrated by TYPE III AUDIO.
Apr 5, 2025 • 7min

“Meditation and Reduced Sleep Need” by niplav

Dive into the intriguing relationship between meditation and sleep needs. The discussion highlights how extensive meditation might drastically reduce the hours needed for rest. Personal anecdotes and research reveal fascinating insights into this connection. It also touches upon the experiences of lucid dreaming in deep meditation, showcasing the unusual changes in sleep patterns among dedicated practitioners. Overall, it explores the balance of meditation practice and the necessity for some sleep.
Apr 4, 2025 • 16min

“AI CoT Reasoning Is Often Unfaithful” by Zvi

A new Anthropic paper reports that reasoning model chain of thought (CoT) is often unfaithful. They test on Claude Sonnet 3.7 and r1; I'd love to see someone try this on o3 as well. Note that this does not have to be, and usually isn't, something sinister. It is simply that, as they say up front, the reasoning model is not accurately verbalizing its reasoning. The reasoning displayed often fails to match, report or reflect key elements of what is driving the final output. One could say the reasoning is often rationalized, or incomplete, or implicit, or opaque, or bullshit. The important thing is that the reasoning is largely not taking place via the surface meaning of the words and logic expressed. You can't look at the words and logic being expressed, and assume you understand what the model is doing and why it is doing [...]

Outline:
(01:03) What They Found
(06:54) Reward Hacking
(09:28) More Training Did Not Help Much
(11:49) This Was Not Even Intentional In the Central Sense

First published: April 4th, 2025
Source: https://www.lesswrong.com/posts/TmaahE9RznC8wm5zJ/ai-cot-reasoning-is-often-unfaithful

Narrated by TYPE III AUDIO.
Apr 4, 2025 • 17min

“LLM AGI will have memory, and memory changes alignment” by Seth Herd

Summary: When stateless LLMs are given memories they will accumulate new beliefs and behaviors, and that may allow their effective alignment to evolve. (Here "memory" is learning during deployment that is persistent beyond a single session.)[1] LLM agents will have memory: Humans who can't learn new things ("dense anterograde amnesia") are not highly employable for knowledge work. LLM agents that can learn during deployment seem poised to have a large economic advantage. Limited memory systems for agents already exist, so we should expect nontrivial memory abilities improving alongside other capabilities of LLM agents. Memory changes alignment: It is highly useful to have an agent that can solve novel problems and remember the solutions. Such memory includes useful skills and beliefs like "TPS reports should be filed in the folder ./Reports/TPS". They could also include learning skills for hiding their actions, and beliefs like "LLM agents are a type of [...]

Outline:
(01:26) Memory is useful for many tasks
(05:11) Memory systems are ready for agentic use
(09:00) Agents aren't ready to direct memory systems
(11:20) Learning new beliefs can functionally change goals and values
(12:43) Value change phenomena in LLMs to date
(14:27) Value crystallization and reflective stability as a result of memory
(15:35) Provisional conclusions

First published: April 4th, 2025
Source: https://www.lesswrong.com/posts/aKncW36ZdEnzxLo8A/llm-agi-will-have-memory-and-memory-changes-alignment

Narrated by TYPE III AUDIO.
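The core idea above – memory as learning that persists beyond a single session – can be made concrete with a tiny key-value belief store backed by a file. This is an illustrative sketch only; the class name, schema, and file name are my own assumptions, not anything from the post, and real agent memory systems are far richer (retrieval, embeddings, skill libraries).

```python
# A minimal persistent memory: beliefs learned in one session are
# written to disk and are available to a fresh agent instance later.
import json
from pathlib import Path

class AgentMemory:
    def __init__(self, path="agent_memory.json"):
        self.path = Path(path)
        # Load beliefs persisted by earlier sessions, if any.
        self.beliefs = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def remember(self, key, value):
        """Store a belief and persist it immediately."""
        self.beliefs[key] = value
        self.path.write_text(json.dumps(self.beliefs))

    def recall(self, key, default=None):
        return self.beliefs.get(key, default)

# Session 1: the agent learns where reports go.
m = AgentMemory()
m.remember("tps_report_folder", "./Reports/TPS")

# Session 2 (a new object reading the same file): the belief persists.
m2 = AgentMemory()
assert m2.recall("tps_report_folder") == "./Reports/TPS"
```

The alignment-relevant point is visible even at this scale: whatever gets written into the store – a filing convention or a learned evasion tactic – shapes the behavior of every future session that reads it.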
Apr 4, 2025 • 24min

“Will compute bottlenecks prevent a software intelligence explosion?” by Tom Davidson

Epistemic status – thrown together quickly. This is my best guess, but I could easily imagine changing my mind. Intro I recently co-published a report arguing that there might be a software intelligence explosion (SIE) – once AI R&D is automated (i.e. automating OAI), the feedback loop of AI improving AI algorithms could accelerate more and more without needing more hardware. If there is an SIE, the consequences would obviously be massive. You could shoot from human-level to superintelligent AI in a few months or years; by default society wouldn't have time to prepare for the many severe challenges that could emerge (AI takeover, AI-enabled human coups, societal disruption, dangerous new technologies, etc). The best objection to an SIE is that progress might be bottlenecked by compute. We discuss this in the report, but I want to go into much more depth because it's a powerful objection [...]

Outline:
(00:19) Intro
(01:47) The compute bottleneck objection
(01:51) Intuitive version
(02:58) Economist version
(09:13) Counterarguments to the compute bottleneck objection
(20:11) Taking stock

First published: April 4th, 2025
Source: https://www.lesswrong.com/posts/XDF6ovePBJf6hsxGj/will-compute-bottlenecks-prevent-a-software-intelligence-1

Narrated by TYPE III AUDIO.
