LessWrong (30+ Karma)

Jul 17, 2025 • 5min

“On being sort of back and sort of new here” by Loki zen

So I'm "back" on Less Wrong, which is to say that I was surprised to find that I already had an account and had even, apparently, commented on some things. 11 years ago. Which feels like half a lifetime ago. More than half a my-adult-lifetime-so-far. A career change and a whole lot of changes in the world ago. I've got a funny relationship to this whole community I guess. I've been 'adj' since forever but I've never been a rat and never, until really quite recently, had much of an interest in the core rat subjects. I'm not even a STEM person, before or after the career change. (I was in creative arts - now it's evidence-based medicine.) I just reached the one-year anniversery of the person I met on rat-adj tumblr moving accross the ocean for me, for love. So there was always going to be an [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: July 16th, 2025 Source: https://www.lesswrong.com/posts/4fnRkztaoRiQhrehh/on-being-sort-of-back-and-sort-of-new-here --- Narrated by TYPE III AUDIO.
Jul 17, 2025 • 10min

“Comment on ‘Four Layers of Intellectual Conversation’” by Zack_M_Davis

One of the most underrated essays in the post-Sequences era of Eliezer Yudkowsky's corpus is "Four Layers of Intellectual Conversation". The degree to which this piece of wisdom has fallen into tragic neglect in these dark ages of the 2020s may be related to its ephemeral form of publication: it was originally posted as a status update on Yudkowsky's Facebook account on 20 December 2016 and subsequently mirrored on Alyssa Vance's The Rationalist Conspiracy blog, which has since gone offline. (The first link in this paragraph is to an archive of the Rationalist Conspiracy post.) In the post, Yudkowsky argues that a structure of intellectual value necessarily requires four layers of conversation: thesis, critique, response, and counter-response (which Yudkowsky indexes from zero as layers 0, 1, 2, and 3). The importance of critique is already widespread common wisdom: if a thesis is advanced and promulgated without any [...]

First published: July 17th, 2025
Source: https://www.lesswrong.com/posts/yr4pSJweTnF6QDHHC/comment-on-four-layers-of-intellectual-conversation
Narrated by TYPE III AUDIO.
Jul 17, 2025 • 18min

“Selective Generalization: Improving Capabilities While Maintaining Alignment” by ariana_azarbal, Matthew A. Clarke, jorio, Cailley Factor, cloud

Audio note: this article contains 53 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

Ariana Azarbal*, Matthew A. Clarke*, Jorio Cocola*, Cailley Factor*, and Alex Cloud. *Equal contribution. This work was produced as part of the SPAR Spring 2025 cohort.

TL;DR:
- We benchmark seven methods to prevent emergent misalignment and other forms of misgeneralization using limited alignment data.
- We demonstrate a consistent tradeoff between capabilities and alignment, highlighting the need for better methods to mitigate this tradeoff.
- Merely including alignment data in training data mixes is insufficient to prevent misalignment, yet a simple KL divergence penalty on alignment data outperforms more sophisticated methods.

Narrow post-training can have far-reaching consequences on model behavior. Some are desirable, whereas others may be harmful. We explore methods enabling selective generalization.

Introduction: Training to improve capabilities [...]

Outline:
(01:27) Introduction
(03:36) Our Experiments
(04:23) Formalizing the Objective
(05:20) Can we solve the problem just by training on our limited alignment data?
(05:55) Seven methods for selective generalization
(07:06) Plotting the capability-alignment tradeoff
(07:26) Preventing Emergent Misalignment
(10:33) Preventing Sycophantic Generalization from an Underspecified Math Dataset
(14:02) Limitations
(14:40) Takeaways
(15:49) Related Work
(17:00) Acknowledgements
(17:22) Appendix

The original text contained 3 footnotes which were omitted from this narration.

First published: July 16th, 2025
Source: https://www.lesswrong.com/posts/ZXxY2tccLapdjLbKm/selective-generalization-improving-capabilities-while
Narrated by TYPE III AUDIO.
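For readers curious what the TL;DR's "simple KL divergence penalty on alignment data" could look like in practice, here is a minimal sketch of a fine-tuning loss, not the authors' code: the HuggingFace-style model interface, the frozen ref_model, the batch names, and the beta weight are all assumptions, and the direction of the KL term is a choice the paper may make differently.

```python
import torch
import torch.nn.functional as F

def selective_loss(model, ref_model, task_batch, alignment_batch, beta=0.1):
    """Task loss on narrow capability data plus a KL penalty that keeps the
    fine-tuned model close to a frozen reference on alignment data (sketch)."""
    # Standard next-token loss on the capability dataset
    # (HuggingFace-style interface with labels in the batch is assumed).
    task_loss = model(**task_batch).loss

    # KL(fine-tuned || reference) on alignment prompts: penalize drift away
    # from the reference model's behavior where alignment matters.
    logits = model(**alignment_batch).logits
    with torch.no_grad():
        ref_logits = ref_model(**alignment_batch).logits
    kl = F.kl_div(
        F.log_softmax(ref_logits, dim=-1),  # input: reference log-probs
        F.log_softmax(logits, dim=-1),      # target: fine-tuned log-probs
        log_target=True,
        reduction="batchmean",
    )
    return task_loss + beta * kl
```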
Jul 17, 2025 • 4min

“Bodydouble / Thinking Assistant matchmaking” by Raemon

I keep meaning to write up a more substantive followup to the Hire (or Become) a Thinking Assistant post. But I think this is still basically the biggest productivity effect size I know of, more people should be doing it, and it seemed worth writing the simple version of the post. I tried out 4 assistants who messaged me after the previous post, and ultimately settled into a rhythm of using the one whose availability best matched my needs. I think all of them were net helpful. So far they've all been remote contractors that I share my screen with on Zoom. Previously I had worked with one in person, which also went well, although it was a bit harder to schedule. Most of what I do is just have them check in on me every 10 minutes and make sure I haven't gotten off track. Sometimes, I [...]

First published: July 16th, 2025
Source: https://www.lesswrong.com/posts/FtxrC2xtTF7wu5k6E/bodydouble-thinking-assistant-matchmaking
Narrated by TYPE III AUDIO.
Jul 16, 2025 • 27min

“Kimi K2” by Zvi

While most people focused on Grok, there was another model release that got uniformly high praise: Kimi K2 from Moonshot.ai. It's definitely a good model, sir, especially for a cheap-to-run open model. It is plausibly the best model for creative writing, outright. It is refreshingly different, and opens up various doors through which one can play. And it proves the value of its new architecture. It is not an overall SoTA frontier model, but it is not trying to be one. The reasoning model version is coming. Price that in now.

Introducing Kimi K2: Introducing the latest model that matters, Kimi K2. Hello, Kimi K2! Open-Source Agentic Model!
- 1T total / 32B active MoE model
- SOTA on SWE Bench Verified, Tau2 & AceBench among open models
- Strong in coding and agentic tasks
- Multimodal & thought-mode not supported for [...]

Outline:
(00:45) Introducing Kimi K2
(02:24) Having a Moment
(03:29) Another Nimble Effort
(05:37) On Your Marks
(07:48) Everybody Loves Kimi, Baby
(13:09) Okay, Not Quite Everyone
(14:06) Everyone Uses Kimi, Baby
(15:42) Write Like A Human
(25:32) What Happens Next

First published: July 16th, 2025
Source: https://www.lesswrong.com/posts/qsyj37hwh9N8kcopJ/kimi-k2
Narrated by TYPE III AUDIO.
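To unpack the "1T total / 32B active MoE model" line above: in a mixture-of-experts layer, a router activates only a few experts per token, so the parameters that actually run are a small fraction of the total. The sketch below is a generic top-k MoE layer for illustration only; the sizes, names, and routing details are assumptions and do not reflect Kimi K2's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=64, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only k of n_experts run per token, so "active" params << total params.
        for slot in range(self.k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(16, 512)).shape)          # torch.Size([16, 512])
```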
Jul 16, 2025 • 1h 14min

“Grok 4 Various Things” by Zvi

Yesterday I covered a few rather important Grok incidents. Today is all about Grok 4's capabilities and features. Is it a good model, sir? It's not a great model. It's not the smartest or best model. But it's at least an okay model. Probably a ‘good’ model.

Talking a Big Game: xAI was given a goal. They were to release something that could, ideally with a straight face, be called ‘the world's smartest artificial intelligence.’ On that level, well, congratulations to Elon Musk and xAI. You have successfully found benchmarks that enable you to make that claim. xAI: We just unveiled Grok 4, the world's smartest artificial intelligence. Grok 4 outperforms all other models on the ARC-AGI benchmark, scoring 15.9% – nearly double that of the next best model – and establishing itself as the most intelligent AI to date. [...]

Outline:
(00:30) Talking a Big Game
(03:57) Gotta Go Fast
(04:38) On Your Marks
(07:21) Some Key Facts About Grok 4
(09:44) SuperGrok Heavy, Man
(11:43) Blunt Instrument
(13:40) Easiest Jailbreak Ever
(15:49) ARC-AGI-2
(17:08) Gaming the Benchmarks
(21:23) Why Care About Benchmarks?
(23:29) Other People's Benchmarks
(32:14) Impressed Reactions to Grok
(37:47) Coding Specific Feedback
(40:00) Unimpressed Reactions to Grok
(46:39) Tyler Cowen Is Not Impressed
(48:26) You Had One Job
(49:38) Reactions to Reactions Overall
(51:36) The MechaHitler Lives On
(53:03) But Wait, There's More
(01:00:12) Sixth Law Of Human Stupidity Strikes Again
(01:05:09) There I Fixed It
(01:06:57) What Is Grok 4 And What Should We Make Of It?

First published: July 15th, 2025
Source: https://www.lesswrong.com/posts/ciuKn9aktXxJ2K6Rc/grok-4-various-things
Narrated by TYPE III AUDIO.
Jul 15, 2025 • 2min

“Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” by Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah

Twitter | Paper PDF

Seven years ago, OpenAI Five had just been released, and many people in the AI safety community expected AIs to be opaque RL agents. Luckily, we ended up with reasoning models that speak their thoughts clearly enough for us to follow along (most of the time). In a new multi-org position paper, we argue that we should try to preserve this level of reasoning transparency and turn chain of thought monitorability into a systematic AI safety agenda. This is a measure that improves safety in the medium term, and it might not scale to superintelligence even if a superintelligent AI somehow still does its reasoning in English. We hope that extending the time when chains of thought are monitorable will help us do more science on capable models, practice more safety techniques "at an easier difficulty", and allow us to extract more useful work from [...]

First published: July 15th, 2025
Source: https://www.lesswrong.com/posts/7xneDbsgj6yJDJMjK/chain-of-thought-monitorability-a-new-and-fragile
Narrated by TYPE III AUDIO.
Jul 15, 2025 • 2h 11min

“Do confident short timelines make sense?” by TsviBT, abramdemski

TsviBT, Tsvi's context: My personal context is that I care about decreasing existential risk, and I think that the broad distribution of efforts put forward by X-deriskers fairly strongly overemphasizes plans that help if AGI is coming in <10 years, at the expense of plans that help if AGI takes longer. So I want to argue that AGI isn't extremely likely to come in <10 years. I've argued against some intuitions behind AGI-soon in Views on when AGI comes and on strategy to reduce existential risk. Abram, IIUC, largely agrees with the picture painted in AI 2027: https://ai-2027.com/ Abram and I have discussed this occasionally, and recently recorded a video call. I messed up my recording, sorry--so the last third of the conversation is cut off, and the beginning is cut off. Here's a link to the first point at which [...]

Outline:
(00:17) Tsvi's context
(06:52) Background Context
(08:13) A Naive Argument
(08:33) Argument 1
(10:43) Why continued progress seems probable to me anyway
(13:37) The Deductive Closure
(14:32) The Inductive Closure
(15:43) Fundamental Limits of LLMs?
(19:25) The Whack-A-Mole Argument
(23:15) Generalization, Size, & Training
(26:42) Creativity & Originariness
(32:07) Some responses
(33:15) Automating AGI research
(35:03) Whence confidence?
(36:35) Other points
(48:29) Timeline Split?
(52:48) Line Go Up?
(01:15:16) Some Responses
(01:15:27) Memers gonna meme
(01:15:44) Right paradigm? Wrong question.
(01:18:14) The timescale characters of bioevolutionary design vs. DL research
(01:20:33) AGI LP25
(01:21:31) come on people, it's [Current Paradigm] and we still don't have AGI??
(01:23:19) Rapid disemhorsepowerment
(01:25:41) Miscellaneous responses
(01:28:55) Big and hard
(01:31:03) Intermission
(01:31:19) Remarks on gippity thinkity
(01:40:24) Assorted replies as I read
(01:40:28) Paradigm
(01:41:33) Bio-evo vs DL
(01:42:18) AGI LP25
(01:46:30) Rapid disemhorsepowerment
(01:47:08) Miscellaneous
(01:48:42) Magenta Frontier
(01:54:16) Considered Reply
(01:54:38) Point of Departure
(02:00:25) Tsvi's closing remarks
(02:04:16) Abram's Closing Thoughts

First published: July 15th, 2025
Source: https://www.lesswrong.com/posts/5tqFT3bcTekvico4d/do-confident-short-timelines-make-sense
Narrated by TYPE III AUDIO.
Jul 15, 2025 • 4min

[Linkpost] “LLM-induced craziness and base rates” by Kaj_Sotala

This is a link post.

One billion people use chatbots on a weekly basis. That's 1 in every 8 people on Earth. How many people have mental health issues that cause them to develop religious delusions of grandeur? We don’t have much to go on here, so let's do a very, very rough guess with very flimsy data. This study says “approximately 25%-39% of patients with schizophrenia and 15%-22% of those with mania / bipolar have religious delusions.” 40 million people have bipolar disorder and 24 million have schizophrenia, so anywhere from 12-18 million people are especially susceptible to religious delusions. There are probably other disorders that cause religious delusions I’m missing, so I’ll stick to 18 million people. 8 billion people divided by 18 million equals 444, so 1 in every 444 people are highly prone to religious delusions. [...] If one billion people are using chatbots weekly, and [...]

First published: July 14th, 2025
Source: https://www.lesswrong.com/posts/WSMvEMuiZoK5Evggq/llm-induced-craziness-and-base-rates
Linkpost URL: https://andymasley.substack.com/p/stories-of-ai-turning-users-delusional
Narrated by TYPE III AUDIO.
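As a quick sanity check of the arithmetic quoted above (all figures are taken directly from the post; nothing new is added):

```python
# Reproduce the post's back-of-the-envelope estimate.
schizophrenia = 24e6   # people with schizophrenia
bipolar = 40e6         # people with bipolar disorder

low = 0.25 * schizophrenia + 0.15 * bipolar    # 12.0 million
high = 0.39 * schizophrenia + 0.22 * bipolar   # ~18.2 million
print(f"especially susceptible: {low/1e6:.0f}-{high/1e6:.0f} million")

world_pop = 8e9
susceptible = 18e6     # the post rounds to 18 million
print(f"base rate: about 1 in {world_pop / susceptible:.0f}")  # ~1 in 444
```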
Jul 14, 2025 • 3min

[Linkpost] “Bernie Sanders (I-VT) mentions AI loss of control risk in Gizmodo interview” by Matrice Jacobine

This is a link post.

Most of the interview is about technological unemployment and AI-driven inequality, but here is the part where Sanders talks about loss of control risk (which Gizmodo put as the title of their article):

Gizmodo: Have you spoken to tech CEOs or experts about this?
Sen. Sanders: I have talked to CEOs. Funny that you mention it. I won’t mention his name, but I’ve just gotten off the phone with one of the leading experts in the world on artificial intelligence, two hours ago.
Gizmodo: Did they share your concern? Did they offer any solutions?
Sen. Sanders: They’re all different. He did. Somebody I talked to yesterday, not so much. There are differences of opinion. On several major issues. Number one: What impact will AI have on the economy? Some people say there will be massive job losses. I tend to agree with them. Other people say [...]

First published: July 14th, 2025
Source: https://www.lesswrong.com/posts/qPvkCdcdnTiJe9xkn/bernie-sanders-i-vt-mentions-ai-loss-of-control-risk-in
Linkpost URL: https://gizmodo.com/bernie-sanders-reveals-the-ai-doomsday-scenario-that-worries-top-experts-2000628611
Narrated by TYPE III AUDIO.
