LessWrong (30+ Karma)

LessWrong
Dec 9, 2025 • 14min

“Towards Categorization of Adlerian Excuses” by romeostevensit

[Author's note: LLMs were used to generate and sort examples into their requisite categories, to find and summarize relevant papers, and to provide extensive assistance with editing.]

Context: Alfred Adler (1870–1937) split from Freud by asserting that human psychology is teleological (goal-oriented) rather than causal (drive-based). He argued that neuroses and "excuses" are not passive symptoms of past trauma, but active, creative tools used by the psyche to safeguard self-esteem. This post attempts to formalize Adler's concept of "Safeguarding Tendencies" into categories, not by the semantic content of excuses, but by their mechanical function in managing the distance between the Ego and Reality.

Abstract: When a life task threatens to reveal inadequacy, people initiate a strategic maneuver to invalidate the test. We propose four "Strategies of Immunity" (Incapacity, Entanglement, Elevation, and Scorched Earth) to explain how agents rig the game so they cannot lose.

The Teleological Flip: In the standard model of behavior, an excuse is a result. You are anxious; therefore, you cannot socialize. The cause (anxiety) produces the effect (avoidance). Adler inverted this vector. He argued that the goal (avoiding the risk of rejection) recruits the means (anxiety). You generate the anxiety in order to avoid [...]

Outline:
(01:15) The Teleological Flip
(02:17) 1. Immunity via Incapacity (the broken wing)
(03:45) 2. Immunity via Entanglement (the human shield)
(05:14) 3. Immunity via Elevation (the ivory tower)
(06:39) 4. Immunity via Scorched Earth (the table flip)
(08:07) Conclusion: The Courage to be Imperfect
(12:52) Relevant works

First published: December 8th, 2025
Source: https://www.lesswrong.com/posts/kdG4T9jtETYe8Hkkg/towards-categorization-of-adlerian-excuses
Narrated by TYPE III AUDIO.
Dec 9, 2025 • 15min

“Every point of intervention” by TsviBT

Crosspost from my blog.

"Events are already set for catastrophe, they must be steered along some course they would not naturally go. [...] Are you confident in the success of this plan? No, that is the wrong question, we are not limited to a single plan. Are you certain that this plan will be enough, that we need essay no others? Asked in such fashion, the question answers itself. The path leading to disaster must be averted along every possible point of intervention." — Professor Quirrell (competent, despite other issues), HPMOR chapter 92

This post is a quickly written service post, an attempt to lay out a basic point of strategy regarding decreasing existential risk from AGI.

Keeping intervention points in mind: By default, AGI will kill everyone. The group of people trying to stop that from happening should seriously attend to all plausible points of intervention. In this context, a point of intervention is some element of the world—such as an event, a research group, or an ideology—which could substantively contribute to leading humanity to extinction through AGI. A point of intervention isn't an action; it doesn't say what to do. It just [...]

Outline:
(00:57) Keeping intervention points in mind
(01:37) The vague elephant
(02:19) Example: France
(03:03) Full-court press
(04:26) Multi-stage fal... opportunity!
(05:15) Brief tangent about a conjunction of disjunctions
(06:43) Varied interventions help
(07:17) Sources of correlation indicate deeper intervention points
(08:13) Some takeaways
(10:42) Some biases potentially affecting strategy portfolio balancing
(12:44) A terse opinionated partial list of maybe-underattended points of intervention

First published: December 8th, 2025
Source: https://www.lesswrong.com/posts/CseMTXtQHynGR8S5k/every-point-of-intervention
Narrated by TYPE III AUDIO.
Dec 9, 2025 • 7min

“How Stealth Works” by Linch

Stealth technology is cool. It's what gave the US domination over the skies during the latter half of the Cold War, and it was the biggest component of the US's information dominance in both war and peace, at least prior to the rise of global internet connectivity and cybersecurity. Yet the core idea is almost embarrassingly simple. So how does stealth work?

[Photo by Steve Harvey on Unsplash]

When we talk about stealth, we're usually talking about evading radar. How does radar work? Radar antennas emit radio waves into the sky. The waves bounce off objects like aircraft. When the echoes return to the antenna, the radar system can identify the object's approximate speed, position, and size.

[Picture courtesy of Katelynn Bennett over at bifocal bunny]

So how would you evade radar? You can try to:

Blast a bunch of radio waves in all directions ("jamming"). This works if you're close to the radar antenna, but kind of defeats the point of stealth.

Build your plane out of materials that are invisible to radio waves (like glass and some plastics) and just let the waves pass through. This is possible, but very difficult in practice. Besides, by the 1970s [...]

The original text contained 2 footnotes which were omitted from this narration.

First published: December 8th, 2025
Source: https://www.lesswrong.com/posts/MxivaKjaAX9mkJzAK/how-stealth-works
Narrated by TYPE III AUDIO.
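For readers who want to make the radar mechanics in the episode above concrete, here is a minimal numeric sketch (mine, not from the post) of the two basic relationships it describes: target range from the echo's round-trip delay, and radial speed from the Doppler shift of the returning echo. The function names and example numbers are illustrative only.

```python
# Minimal sketch of the two basic radar relationships described above:
# range from echo delay, and radial speed from Doppler shift.

C = 3.0e8  # speed of light, m/s

def range_from_echo(round_trip_time_s: float) -> float:
    """Target range: the pulse travels out and back, so divide by two."""
    return C * round_trip_time_s / 2

def radial_speed_from_doppler(freq_shift_hz: float, carrier_hz: float) -> float:
    """Approximate radial speed of the target from the Doppler shift of its echo."""
    return C * freq_shift_hz / (2 * carrier_hz)

if __name__ == "__main__":
    # An echo returning after 1 millisecond puts the target about 150 km away.
    print(f"range: {range_from_echo(1e-3) / 1000:.0f} km")
    # A 2 kHz shift on a 10 GHz carrier corresponds to about 30 m/s of closing speed.
    print(f"radial speed: {radial_speed_from_doppler(2e3, 10e9):.0f} m/s")
```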
Dec 8, 2025 • 7min

“Reward Function Design: a starter pack” by Steven Byrnes

In the companion post We need a field of Reward Function Design, I implore researchers to think about what RL reward functions (if any) will lead to RL agents that are not ruthless power-seeking consequentialists. I further suggest that human social instincts constitute an intriguing example we should study, since they seem to be an existence proof that such reward functions exist.

So what is the general principle of Reward Function Design that underlies the non-ruthless ("ruthful"??) properties of human social instincts? And whatever that general principle is, can we apply it to future RL agent AGIs? I don't have all the answers, but I think I've made some progress, and the goal of this post is to make it easier for others to get up to speed with my current thinking.

What I do have, thanks mostly to work from the past 12 months, is five frames / terms / mental images for thinking about this aspect of reward function design. These frames are not widely used in the RL reward function literature, but I now find them indispensable thinking tools. These five frames are complementary but related—I think kinda poking at different parts of the same [...]

Outline:
(02:22) Frame 1: behaviorist vs non-behaviorist (interpretability-based) reward functions
(02:40) Frame 2: Inner / outer misalignment, specification gaming, goal misgeneralization
(03:19) Frame 3: Consequentialist vs non-consequentialist desires
(03:35) Pride as a special case of non-consequentialist desires
(03:52) Frame 4: Generalization upstream of the reward signals
(04:10) Frame 5: Under-sculpting desires
(04:24) Some comments on how these relate

First published: December 8th, 2025
Source: https://www.lesswrong.com/posts/xw8P8H4TRaTQHJnoP/reward-function-design-a-starter-pack
Narrated by TYPE III AUDIO.
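As a concrete illustration of Frame 1 from the outline above (behaviorist vs non-behaviorist, interpretability-based reward functions), here is a toy Python sketch. It is my own example, not code from the post, and the Observation/InternalState fields are hypothetical stand-ins for whatever interpretability tooling might actually report.

```python
# Toy contrast: a behaviorist reward scores only externally observable behavior,
# while a non-behaviorist (interpretability-based) reward also inspects internals.
# All fields below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Observation:
    task_completed: bool  # what happened in the world

@dataclass
class InternalState:
    # Stand-in for an interpretability readout, e.g. whether the agent's plan
    # involved deceiving its overseer.
    planned_deception: bool

def behaviorist_reward(obs: Observation) -> float:
    # Looks only at observed outcomes.
    return 1.0 if obs.task_completed else 0.0

def non_behaviorist_reward(obs: Observation, state: InternalState) -> float:
    # Also penalizes *how* the agent got there, based on its internal state.
    reward = 1.0 if obs.task_completed else 0.0
    if state.planned_deception:
        reward -= 10.0
    return reward
```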
Dec 8, 2025 • 10min

“We need a field of Reward Function Design” by Steven Byrnes

(Brief pitch for a general audience, based on a 5-minute talk I gave.)

Let's talk about Reinforcement Learning (RL) agents as a possible path to Artificial General Intelligence (AGI). My research focuses on "RL agents", broadly construed. These were big in the 2010s—they made the news for learning to play Atari games, and Go, at superhuman level. Then LLMs came along in the 2020s, and everyone kinda forgot that RL agents existed. But I'm part of a small group of researchers who still think that the field will pivot back to RL agents one of these days. (Others in this category include Yann LeCun, and Rich Sutton & David Silver.)

Why do I think that? Well, LLMs are very impressive, but we don't have AGI (artificial general intelligence) yet—not as I use the term. Humans can found and run companies; LLMs can't. If you want a human to drive a car, you take an off-the-shelf human brain, the same human brain that was designed 100,000 years before cars existed, give it minimal instructions and a week to mess around, and now they're driving the car. If you want an AI to drive a car, it's … not that. [...]

Outline:
(00:15) Let's talk about Reinforcement Learning (RL) agents as a possible path to Artificial General Intelligence (AGI)
(02:17) Reward functions in RL
(04:23) Reward functions in neuroscience
(05:25) We need a (far more robust) field of reward function design
(06:06) Oh man, are we dropping this ball
(07:30) Reward Function Design: Neuroscience research directions
(08:14) Reward Function Design: AI research directions
(08:46) Bigger picture

First published: December 8th, 2025
Source: https://www.lesswrong.com/posts/oxvnREntu82tffkYW/we-need-a-field-of-reward-function-design
Narrated by TYPE III AUDIO.
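To make "reward functions in RL" concrete for a general audience, here is a toy sketch (my illustration, not code from the post or talk) of where the reward function sits in a standard RL loop: it is a separate, swappable component, which is why its design is a research problem in its own right. The environment, policy, and numbers are all made up.

```python
# Toy RL loop showing the reward function as a pluggable component.
import random

def reward_fn(state: int, action: int, next_state: int) -> float:
    # Made-up reward: +1 for reaching state 10, a tiny step penalty otherwise.
    return 1.0 if next_state == 10 else -0.01

def environment_step(state: int, action: int) -> int:
    # Made-up environment: the action nudges the state up or down, clamped to [0, 10].
    return max(0, min(10, state + action))

def run_episode(policy, max_steps: int = 50) -> float:
    state, total_reward = 0, 0.0
    for _ in range(max_steps):
        action = policy(state)
        next_state = environment_step(state, action)
        total_reward += reward_fn(state, action, next_state)
        state = next_state
        if state == 10:
            break
    return total_reward

if __name__ == "__main__":
    random_policy = lambda s: random.choice([-1, +1])
    print(run_episode(random_policy))
```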
Dec 8, 2025 • 2min

“2025 Unofficial LessWrong Census/Survey” by Screwtape

The Less Wrong General Census is unofficially here! You can take it at this link. The kinda-sorta-annual-if-you-really-squint tradition of the Less Wrong Census is once more upon us! I want to pitch you on taking the survey.

First, the demographics data is our best view into who's using the site and who is in the community. I use it when trying to figure out how to help meetups, and I add a bunch of questions the LessWrong team asks, which I assume they use to improve the site. Second, because I am madly, ravenously curious what works and what doesn't for raising the sanity waterline and making people more rational. I'm even curious what folks think that would mean! Third, if you're reading this and you're thinking "eh, I'm not really a rationalist, more just rationalist adjacent", then you are also of interest; I include questions from several adjacent communities as well. Also, you make a fantastic active control group.

If you want to just spend five minutes answering the basic demographics questions before leaving the rest blank and hitting submit, that's totally appreciated. The survey is structured so the fastest and most generally applicable questions are generally speaking towards [...]

First published: December 7th, 2025
Source: https://www.lesswrong.com/posts/ktaxnnqEeeWAohzCK/2025-unofficial-lesswrong-census-survey
Narrated by TYPE III AUDIO.
Dec 8, 2025 • 4min

“Little Echo” by Zvi

I believe that we will win.

An echo of an old ad for the 2014 US men's World Cup team. It did not win.

I was in Berkeley for the 2025 Secular Solstice. We gather to sing and to reflect. The night's theme was the opposite: 'I don't think we're going to make it.' As in: Sufficiently advanced AI is coming. We don't know exactly when, or what form it will take, but it is probably coming. When it does, we, humanity, probably won't make it.

It's a live question. Could easily go either way. We are not resigned to it. There's so much to be done that can tilt the odds. But we're not the favorite. Raymond Arnold, who ran the event, believes that. I believe that. Yet in the middle of the event, the echo was there. Defiant. I believe that we will win.

There is a recording of the event. I highly encourage you to set aside three hours at some point in December, to listen, and to participate and sing along. Be earnest. If you don't believe it, I encourage this all the more. If you [...]

First published: December 8th, 2025
Source: https://www.lesswrong.com/posts/YPLmHhNtjJ6ybFHXT/little-echo
Narrated by TYPE III AUDIO.
Dec 8, 2025 • 4min

“I said hello and greeted 1,000 people at 5am this morning” by Mr. Keating

At the ass crack of dawn, in the dark and foggy mist, thousands of people converged on my location, some wearing short shorts, others wearing an elf costume and green tights. I was volunteering at a marathon.

The race director told me the day before, "These people have trained for the last 6-12 months for this moment. They'll be waking up at 3am. For many of them, this is the first marathon they've ever run. When they get off the bus at 5am, in the freezing cold, you'll be the first face they see. Smile, welcome them, make them feel excited, and help them forget the next 26.2 miles of pain they're about to endure."

Even though I normally have RBF and consider it a chore to acknowledge people, I slapped a big fat smile on my face and excitedly greeted runners like I was a golden retriever who hasn't seen his military dad in over a year.

"HELLO!" "GOOD MORNING!" "YOU'RE HERE!"

That, on repeat for two hours straight. It was actually pretty fun.

I calculated that the optimal distance to stand from the bus was eight feet away. Stand too close, and the runners were still descending [...]

First published: December 7th, 2025
Source: https://www.lesswrong.com/posts/2aiJphdjJaaBs6SJy/i-said-hello-and-greeted-1-000-people-at-5am-this-morning
Narrated by TYPE III AUDIO.
Dec 7, 2025 • 47min

“AI in 2025: gestalt” by technicalities

This is the editorial for this year's "Shallow Review of AI Safety". (It got long enough to stand alone.)

Epistemic status: subjective impressions plus one new graph plus 300 links. Huge thanks to Jaeho Lee, Jaime Sevilla, and Lexin Zhou for running lots of tests pro bono and so greatly improving the main analysis.

tl;dr: Informed people disagree about the prospects for LLM AGI – or even just what exactly was achieved this year. But the famous ones with a book to talk at least agree that we're 2-20 years off (allowing for other paradigms arising). In this piece I stick to arguments rather than reporting who thinks what.

My view: compared to last year, AI is much more impressive but not proportionally more useful. Models improved on some things they were explicitly optimised for (coding, vision, OCR, benchmarks), and did not hugely improve on everything else. Progress is thus (still!) consistent with current frontier training bringing more things in-distribution rather than generalising very far. Pretraining (GPT-4.5, Grok 3/4, but also the counterfactual large runs which weren't done) disappointed people this year. It's probably not because it didn't or wouldn't work; it was just too hard to serve [...]

Outline:
(00:43) tl;dr
(04:02) Capabilities in 2025
(04:21) Arguments against 2025 capabilities growth being above-trend
(10:24) Arguments for 2025 capabilities growth being above-trend
(19:46) Evals crawling towards ecological validity
(22:55) Safety in 2025
(23:00) Are reasoning models safer than the old kind?
(26:05) The looming end of evals
(28:07) Prosaic misalignment
(30:34) What is the plan?
(33:10) Things which might fundamentally change the nature of LLMs
(34:59) Emergent misalignment and model personas
(36:42) Monitorability
(38:27) New people
(39:01) Overall
(39:37) Discourse in 2025

The original text contained 9 footnotes which were omitted from this narration.

First published: December 7th, 2025
Source: https://www.lesswrong.com/posts/Q9ewXs8pQSAX5vL7H/ai-in-2025-gestalt
Narrated by TYPE III AUDIO.
Dec 7, 2025 • 16min

“Eliezer’s Unteachable Methods of Sanity” by Eliezer Yudkowsky

"How are you coping with the end of the world?" journalists sometimes ask me, and the true answer is something they have no hope of understanding and I have no hope of explaining in 30 seconds, so I usually answer something like, "By having a great distaste for drama, and remembering that it's not about me." The journalists don't understand that either, but at least I haven't wasted much time along the way. Actual LessWrong readers sometimes ask me how I deal emotionally with the end of the world. I don't actually think my answer is going to help. But Raymond Arnold thinks I should say it. So I will say it. I don't actually think my answer is going to help. Wisely did Ozy write, "Other People Might Just Not Have Your Problems." Also I don't have a bunch of other people's problems, and other people can't make internal function calls that I've practiced to the point of hardly noticing them. I don't expect that my methods of sanity will be reproducible by nearly anyone. I feel pessimistic that they will help to hear about. Raymond Arnold asked me to speak them anyways, so I will. Stay genre-savvy [...] ---Outline:(01:15) Stay genre-savvy / be an intelligent character.(03:41) Dont make the end of the world be about you.(07:33) Just decide to be sane, and write your internal scripts that way. --- First published: December 6th, 2025 Source: https://www.lesswrong.com/posts/isSBwfgRY6zD6mycc/eliezer-s-unteachable-methods-of-sanity --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
