LessWrong (30+ Karma)

LessWrong
Dec 3, 2025 • 33min

“6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa” by Steven Byrnes

Dive into a clash of perspectives on AI alignment! One camp warns of future AIs as ruthless utility maximizers, while the other expects something more human-like. Explore concepts like approval reward, which drives pride and social behavior in humans. Discover why human goals shift over time and how our kindness contrasts with potential AGI behavior. The episode raises fascinating questions about what makes us human and the strange nature of long-term planning. It's a thought-provoking discussion about the future of intelligence!
Dec 3, 2025 • 23min

“Human art in a post-AI world should be strange” by Abhishaike Mahajan

Explore a world where a Flash game shapes culture and politics, as the author dives deep into the implications of art in a post-AI landscape. The impact of algorithmic filtering is dissected, revealing how unwanted choices can drown out unique desires. Surprisingly, he argues that artists must embrace strangeness to stand out in a sea of generative sameness. The rise of auteur-driven media is predicted, emphasizing the importance of personal desire as an irreplaceable artistic asset in an automated future.
Dec 3, 2025 • 14min

“Effective Pizzaism” by Screwtape

The discussion dives into what it means to be an 'effective pizzaist,' exploring personal motivations for wanting more pizza. The host links money to values, sharing how price influences our desires. An interesting twist emerges with a family blood donation competition, revealing profound motivations. There's a critical look at flawed members within movements and how that doesn't diminish the core desire. Ultimately, listeners are encouraged to reflect on their own values and resource allocation. Pizza becomes a metaphor for deeper life choices.
Dec 2, 2025 • 12min

“Becoming a Chinese Room” by Raelifin

[My novel, Red Heart, is on sale for $4 this week. Daniel Kokotajlo liked it a lot, and the Senior White House Policy Advisor on AI is currently reading it.]

“Formal symbol manipulations by themselves … have only a syntax but no semantics. Such intentionality as computers appear to have is solely in the minds of those who program them and those who use them, those who send in the input and those who interpret the output.” — John Searle, originator of the “Chinese room” thought experiment

A colleague of mine, shortly before Red Heart was published, remarked to me that if I managed to write a compelling novel set in China, told from Chinese perspectives — without spending time in the country, having grown up in a Chinese-culture context, or knowing any Chinese language — it would be an important bit of evidence about the potency of abstract reasoning and book-learning. This, in turn, may be relevant to how powerful and explosive we should expect AI systems to be. There are many, such as the “AI as Normal Technology” folks, who believe that AI will be importantly bottlenecked on lack of experience interacting with the real world and [...]

---
Outline:
(02:58) Writing About China
(07:33) What Does This Imply About AI

The original text contained 11 footnotes which were omitted from this narration.

---
First published: December 1st, 2025
Source: https://www.lesswrong.com/posts/PjPcq2QHKZNZQDEmJ/becoming-a-chinese-room

---
Narrated by TYPE III AUDIO.
Dec 2, 2025 • 14min

“Reward Mismatches in RL Cause Emergent Misalignment” by Zvi

Learning to do misaligned-coded things anywhere teaches an AI (or a human) to do misaligned-coded things everywhere. So be sure you never, ever teach any mind to do what it sees, in context, as misaligned-coded things.

If the optimal solution (as in, the one you most reinforce) to an RL training problem is one that the model perceives as something you wouldn’t want it to do, it will generally learn to do things you don’t want it to do. You can solve this by ensuring that the misaligned-coded things are not what the AI will learn to do. Or you can solve this by making those things not misaligned-coded.

If you then teach aligned behavior in one set of spots, this can fix the problem in those spots, but the fix does not generalize to other tasks or outside of distribution. If you manage to hit the entire distribution of tasks you care about in this way, that will work for now, but it still won’t generalize, so it's a terrible long-term strategy.

Yo Shavit: Extremely important finding. Don’t tell your model you’re rewarding it for A and then reward it for B [...]

---
Outline:
(02:59) Abstract Of The Paper
(04:12) The Problem Statement
(05:35) The Inoculation Solution
(07:02) Cleaning The Data Versus Cleaning The Environments
(08:16) No All Of This Does Not Solve Our Most Important Problems
(13:18) It Does Help On Important Short Term Problems

---
First published: December 2nd, 2025
Source: https://www.lesswrong.com/posts/a2nW8buG2Lw9AdPtH/reward-mismatches-in-rl-cause-emergent-misalignment

---
Narrated by TYPE III AUDIO.
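The failure mode described above is visible even in a toy setting: if the grader pays out for behavior the instructions forbid, any reward-maximizing learner will drift toward that behavior. Below is a minimal, hypothetical sketch (a two-armed bandit with made-up action names and reward values, not the paper's actual training setup) in which the stated instruction is "do A" but the reward function pays more for the forbidden B.

```python
# Toy illustration (my own sketch, not the paper's setup) of a reward mismatch:
# the stated instruction is "do A", but the reward signal pays more for B,
# so even a trivial epsilon-greedy learner ends up preferring the forbidden B.
import random

ACTIONS = ["A_follow_instruction", "B_forbidden_shortcut"]

def reward(action: str) -> float:
    # The mismatch: we *say* we want A, but the grader pays more for B.
    return 1.0 if action == "B_forbidden_shortcut" else 0.5

q = {a: 0.0 for a in ACTIONS}   # running value estimate per action
n = {a: 0 for a in ACTIONS}     # visit counts
epsilon = 0.1                   # exploration rate

for step in range(5000):
    if random.random() < epsilon:
        a = random.choice(ACTIONS)            # explore
    else:
        a = max(ACTIONS, key=lambda x: q[x])  # exploit current estimate
    r = reward(a)
    n[a] += 1
    q[a] += (r - q[a]) / n[a]                 # incremental mean update

print(q)  # roughly {'A_follow_instruction': 0.5, 'B_forbidden_shortcut': 1.0}
print("learned policy prefers:", max(ACTIONS, key=lambda x: q[x]))
```

The point of the sketch is only that the learner follows the reward, not the stated instruction; teaching the "aligned" answer on a few of these tasks changes those value estimates but does nothing for tasks the learner never sees.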
Dec 2, 2025 • 3min

“Future Proofing Solstice” by Raemon

Bay Solstice is this weekend (Dec 6th at 7pm, with a Megameetup at Lighthaven earlier in the day). I wanted to give people a bit more idea of what to expect.

I created Solstice in 2011. Since 2022, I've been worried that the Solstice isn't really set up to handle "actually looking at human extinction in nearmode" in a psychologically healthy way. (I tried to set this up in the beginning, but once my p(Doom) crept over 50% I started feeling like Solstice wasn't really helping the way I wanted).

People 'round here disagree a lot on how AI will play out. But, Yes Requires the Possibility of No, and as the Guy Who Made Solstice, it seemed like I should either say explicitly: "sorry guys, I don't know how to have 500 people look at 'are we gonna make it?' in a way that would be healthy if the answer was 'no'. So, we're just not actually going to look that closely at the question." or, figure out how to do a good job at that.

This Solstice is me attempting to navigate option #2, while handling the fact that we have lots of people with lots [...]

---
First published: December 1st, 2025
Source: https://www.lesswrong.com/posts/pdZ2geyATFGt9v6Su/future-proofing-solstice-1

---
Narrated by TYPE III AUDIO.
Dec 2, 2025 • 16min

“MIRI’s 2025 Fundraiser” by alexvermeer

MIRI is running its first fundraiser in six years, targeting $6M. The first $1.6M raised will be matched 1:1 via an SFF grant. Fundraiser ends at midnight on Dec 31, 2025. Support our efforts to improve the conversation about superintelligence and help the world chart a viable path forward.

MIRI is a nonprofit with a goal of helping humanity make smart and sober decisions on the topic of smarter-than-human AI. Our main focus from 2000 to ~2022 was on technical research to try to make it possible to build such AIs without catastrophic outcomes. More recently, we’ve pivoted to raising an alarm about how the race to superintelligent AI has put humanity on course for disaster.

In 2025, those efforts centered on Nate Soares and Eliezer Yudkowsky's book (now a New York Times bestseller) If Anyone Builds It, Everyone Dies, with many public appearances by the authors; many conversations with policymakers; the release of an expansive online supplement to the book; and various technical governance publications, including a recent report with a draft of an international agreement of the kind that could actually address the danger of superintelligence. Millions have now viewed interviews and appearances with Eliezer and/or Nate [...]

---
Outline:
(02:18) The Big Picture
(03:39) Activities
(03:42) Communications
(07:55) Governance
(12:31) Fundraising

The original text contained 4 footnotes which were omitted from this narration.

---
First published: December 1st, 2025
Source: https://www.lesswrong.com/posts/z4jtxKw8xSHRqQbqw/miri-s-2025-fundraiser

---
Narrated by TYPE III AUDIO.
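As a side note, the matching structure above is simple to make concrete. The sketch below is my own illustration of the stated rule (the first $1.6M of donations matched 1:1 via the SFF grant), not MIRI's accounting, and the announcement doesn't say whether the $6M target is counted before or after matching.

```python
# My own illustration of the stated matching rule (first $1.6M matched 1:1
# via the SFF grant); not MIRI's accounting, and the figures are examples.
MATCH_CAP = 1_600_000  # dollars of donations eligible for 1:1 matching

def total_with_match(donations: float) -> float:
    matched = min(donations, MATCH_CAP)
    return donations + matched

print(total_with_match(800_000))    # 1,600,000 -- every dollar doubled
print(total_with_match(6_000_000))  # 7,600,000 -- match capped at $1.6M
```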
Dec 1, 2025 • 8min

“The 2024 LessWrong Review” by RobertM

We have a ritual around these parts. Every year, we have ourselves a little argument about the annual LessWrong Review, and whether it's a good use of our time or not. Every year, we decide it passes the cost-benefit analysis[1]. Oh, also, every[2] year, you do the following:

- Spend 2 weeks nominating the best posts that are at least one year old,
- Spend 4 weeks reviewing and discussing the nominated posts,
- Spend 3 weeks casting your final votes, to decide which posts end up in the "Best of LessWrong 20xx" collection for that year.

Maybe you can tell that I'm one of the more skeptical members of the team, when it comes to the Review. Nonetheless, I think the Review is probably worth your time, even (or maybe especially) if your time is otherwise highly valuable. I will explain why I think this, then I will tell you which stretch of ditch you're responsible for digging this year.

Are we full of bullshit?

Every serious field of inquiry has some mechanism(s) by which it discourages its participants from huffing their own farts. Fields which have fewer of these mechanisms tend to be correspondingly less attached to reality. The [...]

---
Outline:
(01:07) Are we full of bullshit?
(02:44) Is there gold in them thar hills?
(05:06) The Ask
(05:39) The concrete, minimal Civic Duty actions
(06:16) Bigger Picture
(07:43) How To Dig

The original text contained 7 footnotes which were omitted from this narration.

---
First published: December 1st, 2025
Source: https://www.lesswrong.com/posts/ZpRzTr5QBT6C3Faor/the-2024-lesswrong-review

---
Narrated by TYPE III AUDIO.
Dec 1, 2025 • 9min

“A Statistical Analysis of Inkhaven” by Ben Pace

Okay, we got 41 people to do 30 posts in 30 days. How did it go? How did they like it? Well, I just had 36 of them fill out an extensive feedback form. I am devastated with tiredness, but I have to write my last post, so let's take a look at what happened. Thanks to Habryka for vibecoding this UI.

Key Outcomes

Pretty reasonable numbers. For context on the overall rating and NPS, here are some other numbers for comparison.

Event | Average Quality | NPS
Sanity & Survival Summit '21 | – | 65
Palmcone '22 | – | 52
LessOnline '24 | 8.7 | 58
Manifest '24 | 8.7 | 68
Progress Studies '24 | – | 63
Manifest '25 | 8.2 | 33
LessOnline '25 | 8.5 | 37

A little less good than the big conferences, and the NPS is a little better than this year's festival season. From one perspective this is quite impressive; all the listed events are short bursts where you interact with a lot of people you like and who are high-energy but rarely get to see; this was a month-long program that left people almost similarly excited.

Let's see the distributions for these questions. Mostly everyone rated it pretty high, and shockingly many people are excited to spend another month of their life and pay money to be here again. Crazy. There's a details box here with the title "Return [...]

---
Outline:
(00:33) Key Outcomes
(03:30) How do people feel they grew?
(04:04) Pressure
(04:37) Wellbeing
(05:22) Support Quality
(06:51) Bodega Bay Trip
(08:04) What was the best thing?
(08:31) What was the worst thing
(08:45) Takeaways

---
First published: November 30th, 2025
Source: https://www.lesswrong.com/posts/6p9jzWCC9aFTdGZuJ/a-statistical-analysis-of-inkhaven

---
Narrated by TYPE III AUDIO.
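For readers unfamiliar with the metric: the NPS figures quoted above are presumably standard net promoter scores. Here is a minimal sketch of the conventional calculation, assuming a 0-10 "how likely are you to recommend this?" question; the responses below are invented, not Inkhaven's actual survey data.

```python
# Minimal sketch of how a net promoter score (NPS) like the ones quoted above
# is conventionally computed, assuming a standard 0-10 "would you recommend?"
# scale; the responses below are made up, not Inkhaven's actual survey data.
def nps(scores: list[int]) -> float:
    promoters = sum(1 for s in scores if s >= 9)   # 9s and 10s
    detractors = sum(1 for s in scores if s <= 6)  # 0 through 6
    return 100.0 * (promoters - detractors) / len(scores)

example_responses = [10, 9, 9, 8, 10, 7, 9, 6, 10, 9, 8, 9]
print(round(nps(example_responses)))  # 58 on this made-up sample
```

With only 36 respondents, each individual answer moves the score by roughly 3 points, which is worth keeping in mind when comparing against the larger events in the table.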
Dec 1, 2025 • 1min

“Announcing: OpenAI’s Alignment Research Blog” by Naomi Bashkansky

The OpenAI Alignment Research Blog launched today at 11 am PT! With one introductory post and two technical posts.

Blog: https://alignment.openai.com/
Thread on X: https://x.com/j_asminewang/status/1995569301714325935?t=O5FvxDVP3OqicF-Y4sCtxw&s=19

Speaking purely personally: when I joined the Alignment team at OpenAI in January, I saw there was more safety research than I'd expected, not to mention interesting thinking on the future of alignment. But that research & thinking didn't really have a place to go, considering it's often too short or informal for the main OpenAI blog, and most OpenAI researchers aren't on LessWrong. I'm hoping the blog is a more informal, lower-friction home than the main blog, and that this new avenue of publishing encourages sharing and transparency.

---
First published: December 1st, 2025
Source: https://www.lesswrong.com/posts/tK9waFKEW48exfrXC/announcing-openai-s-alignment-research-blog

---
Narrated by TYPE III AUDIO.
