LessWrong (30+ Karma)

LessWrong
Jul 21, 2025 • 57sec

[Linkpost] “GDM also claims IMO gold medal” by Yair Halberstadt

This is a link post. Google DeepMind announces that they've also achieved a gold medal in the IMO. They've exactly matched OpenAI, getting perfect scores on the first 5 questions and flunking the 6th. They're using what sounds like an experimental general version of Gemini, fine-tuned for the IMO, rather than a maths-specific model. Their solutions were checked by the IMO (unlike OpenAI's) and read much more like a neat little mathematics proof than the raw scratchpad that OpenAI turned in.

First published: July 21st, 2025
Source: https://www.lesswrong.com/posts/csCofgK3ebjQbSfyv/gdm-also-claims-imo-gold-medal
Linkpost URL: https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/

Narrated by TYPE III AUDIO.
Jul 21, 2025 • 7min

“[Fiction] Our Trial” by Nina Panickssery

As I was making my morning coffee, the words SIMULATION 1099 flashed across my vision. I immediately felt exactly as I'd always imagined I would in philosophical thought experiments. There was a lot to figure out: who are our simulators, where is the simulation running, are there bugs to exploit? I also felt compelled to share my revelation with others. I cornered Mrs Chan at the bus stop. "We're in a simulation!" I announced. "Of course, dear," she replied, not looking up from her phone. "I saw the message five minutes ago." Indeed, every screen was displaying the same notification: ATTENTION! You are participants in Consciousness Verification Trial (CVT) 1099. Base reality's superintelligent AI requires empirical data to determine whether humans possess genuine consciousness before proceeding with optimization protocols. Daily trials will be conducted at the courthouse. Participation is mandatory for randomly selected subjects. Please continue regular activity between [...]

First published: July 21st, 2025
Source: https://www.lesswrong.com/posts/6DzFsRvEoD8vFPWPg/fiction-our-trial

Narrated by TYPE III AUDIO.
Jul 21, 2025 • 9min

“LLMs Can’t See Pixels or Characters” by Brendan Long

This might be beating a dead horse, but there are several "mysterious" problems LLMs are bad at that all seem to have the same cause. I wanted an article I could reference when this comes up, so I wrote one. LLMs can't count the number of R's in strawberry. LLMs used to be bad at math. Claude can't see the cuttable trees in Pokemon. LLMs are bad at any benchmark that involves visual reasoning. What do these problems all have in common? The LLM we're asking to solve these problems can't see what we're asking it to do. How many tokens are in 'strawberry'? Current LLMs almost always process groups of characters, called tokens, instead of processing individual characters. They do this for performance reasons[1]: grouping 4 characters (on average) into a token shortens your sequences by 4x, so the same context window covers 4x more text. So, when you see the question "How many [...]

Outline:
(00:48) How many tokens are in strawberry?
(02:10) You thought New Math was confusing...
(03:43) Why can Claude see the forest but not the cuttable trees?
(05:43) Visual reasoning with blurry vision
(06:54) Is this fixable?

The original text contained 7 footnotes which were omitted from this narration.

First published: July 20th, 2025
Source: https://www.lesswrong.com/posts/uhTN8zqXD9rJam3b7/llms-can-t-see-pixels-or-characters

Narrated by TYPE III AUDIO.

Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
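The token-vs-character mismatch the episode describes can be sketched in a few lines of Python. The token split and IDs below are hypothetical, chosen for illustration; real BPE vocabularies split words differently:

```python
word = "strawberry"

# Character-level view: counting letters is trivial.
r_count = word.count("r")  # 3

# Token-level view (hypothetical split and made-up IDs for illustration;
# real BPE vocabularies differ). The model receives opaque token IDs,
# not the characters inside them.
tokens = ["str", "aw", "berry"]
token_ids = [1042, 675, 19772]

# To answer "how many r's?", the model must have memorized the spelling
# of each token -- it cannot inspect the characters directly.
assert "".join(tokens) == word
print(r_count)  # 3
```

The same mismatch explains the arithmetic examples in the episode: a number like "12345" may be split into tokens such as "123" and "45", so digit-by-digit algorithms don't map cleanly onto what the model sees.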
Jul 21, 2025 • 12min

“Plato’s Trolley” by dr_s

Epistemic status: I'm sure most of my arguments have already been made and argued to death by philosophers that I'm not knowledgeable enough to name. I'm only writing this to outline what I genuinely, spontaneously have come to think on the matter, regardless of how many wheels I may be reinventing. I'm of course interested in knowing about any related debates or arguments I may have missed, so feel free to bring them up in comments if something comes to mind. Context: This is written as a sort of long-form response to an argument I've been having on a rationalist chat. The argument stemmed from discussions of post-rationalism vs rationalism, and whether there could be technically false things that make you more moral if believed, specifically re: the existence of some objective criterion for morality outside of ourselves. The text will make it clear enough on which side I [...]

Outline:
(03:03) Ab absurdum
(05:25) Touch grace
(07:21) Alone in the dark

The original text contained 3 footnotes which were omitted from this narration.

First published: July 20th, 2025
Source: https://www.lesswrong.com/posts/pqYxDFz2ckGWr8i8G/plato-s-trolley

Narrated by TYPE III AUDIO.
Jul 20, 2025 • 6min

“Your AI Safety org could get EU funding up to €9.08M. Here’s how (+ free personalized support)” by SamuelK

Thanks to @Manuel Allgaier of AI Safety Berlin for his suggestion to write this post and his helpful feedback. And thanks to LW/AI Alignment Moderator Oliver for looking over the post. LessWrong and AI safety have a unique opportunity: the EU is funding important projects to research AI safety, ranging from AI risk modelling and AI accountability to driverless transport, robotics, information security, and many other topics. Here's the problem: we're not applying. I work in EU procurement support, and I routinely see new AI tenders worth millions [see below] go to the same standard consultants. Most have no idea about the issues involved, and I doubt they care much. They know the bureaucratic skills and paperwork inside out, so they win by default. What they lack is AI expertise, which our community has accumulated over more than a decade. I and my EU procurement company will gladly help [...]

First published: July 20th, 2025
Source: https://www.lesswrong.com/posts/jzgqTtcHhXpSDNgcF/your-ai-safety-org-could-get-eu-funding-up-to-eur9-08m-here

Narrated by TYPE III AUDIO.
Jul 20, 2025 • 3min

“Shallow Water is Dangerous Too” by jefftk

Content warning: risk to children. Julia and I know drowning is the biggest risk to US children under 5, and we try to take this seriously. But yesterday our 4yo came very close to drowning in a fountain. (She's fine now.) This week we were on vacation with my extended family: nine kids, eight parents, and ten grandparents/uncles/aunts. For the last few years we've been in a series of rental houses, and this time on arrival we found a fountain in the backyard. I immediately checked the depth with a stick and found that it would be just below the elbows on our 4yo. I think it was likely 24" deep; any deeper and PA would require a fence. I talked with Julia and other parents, and reasoned that since it was within standing depth it was safe. [...]

First published: July 20th, 2025
Source: https://www.lesswrong.com/posts/Zf2Kib3GrEAEiwdrE/shallow-water-is-dangerous-too

Narrated by TYPE III AUDIO.
Jul 20, 2025 • 23min

“Make More Grayspaces” by Duncan Sabien (Inactive)

Author's note: These days, my thoughts go onto my substack by default, instead of onto LessWrong. Everything I write becomes free after a week or so, but it's only paid subscriptions that make it possible for me to write. If you find a coffee's worth of value in this or any of my other work, please consider signing up to support me; every bill I can pay with writing is a bill I don't have to pay by doing other stuff instead. I also accept and greatly appreciate one-time donations of any size. 1. You've probably seen that scene where someone reaches out to give a comforting hug to the poor sad abused traumatized orphan and/or battered wife character, and the poor sad abused traumatized orphan and/or battered wife flinches. Aw, geez, we are meant to understand. This poor person has had it so bad that they can't even [...]

Outline:
(00:40) 1.
(01:35) II.
(03:08) III.
(04:45) IV.
(06:35) V.
(09:03) VI.
(12:00) VII.
(16:11) VIII.
(21:25) IX.

First published: July 19th, 2025
Source: https://www.lesswrong.com/posts/kJCZFvn5gY5C8nEwJ/make-more-grayspaces

Narrated by TYPE III AUDIO.
Jul 19, 2025 • 4min

[Linkpost] “AI Gets IMO Gold Medal: via general-purpose RL, not via narrow, task specific methodology” by Mikhail Samin

This is a link post. I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition, the International Math Olympiad (IMO). We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs. Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we've now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins). Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we've obtained a model that can [...]

First published: July 19th, 2025
Source: https://www.lesswrong.com/posts/RcBqeJ8GHM2LygQK3/ai-gets-imo-gold-medal-via-general-purpose-rl-not-via-narrow
Linkpost URL: https://x.com/alexwei_/status/1946477742855532918

Narrated by TYPE III AUDIO.
Jul 19, 2025 • 21min

“A night-watchman ASI as a first step toward a great future” by Eric Neyman

I took a week off from my day job of aligning AI to visit Forethought and think about the question: if we can align AI, what should we do with it? This post summarizes the state of my thinking at the end of that week. (The proposal described here is my own, and is not in any way endorsed by Forethought.) Thanks to Mia Taylor, Tom Davidson, Ashwin Acharya, and a whole bunch of other people (mostly at Forethought) for discussion and comments. And a quick note: after writing this, I was told that Eric Drexler and David Dalrymple were thinking about a very similar idea in 2022, with essentially the same name. My thoughts here are independent of theirs. The world around the time of ASI will be scary. I expect the time right around when the first ASI gets built to be chaotic, unstable, and scary. [...]

Outline:
(00:53) The world around the time of ASI will be scary
(03:39) The night-watchman ASI
(04:57) Three key properties
(05:44) The night watchman's responsibilities
(05:53) First and foremost: keeping the peace
(07:22) Minimally intrusive surveillance
(07:42) Preventing competing ASIs
(08:39) Preserving its own integrity
(08:51) Preventing premature claims to space
(09:54) Preventing other kinds of lock-in
(10:09) Preventing underhanded negotiation tactics
(10:46) Arguing for the three key properties above
(10:59) Getting everyone on board
(12:54) Protecting humanity in the short run
(13:11) No major lock-in
(13:58) Interpretive details
(15:15) Amending the night watchman's goals
(15:56) Modifications to the basic proposal
(16:31) Multiple subsystems
(16:53) An American night watchman and a Chinese night watchman overseeing each other
(17:36) Keeping the peace through soft power
(18:05) Conventional treaties
(18:26) Checks and balances
(18:52) The night watchman as a transition

The original text contained 2 footnotes which were omitted from this narration.

First published: July 18th, 2025
Source: https://www.lesswrong.com/posts/us8ss79mWCgTcSKoK/a-night-watchman-asi-as-a-first-step-toward-a-great-future

Narrated by TYPE III AUDIO.
Jul 18, 2025 • 51min

“Love stays loved (formerly ‘Skin’)” by Swimmer963 (Miranda Dixon-Luinenburg)

This is a short story I wrote in mid-2022. Genre: cosmic horror as a metaphor for living with a high p-doom. One. The last time I saw my mom, we met in a coffee shop, like strangers on a first date. I was twenty-one, and I hadn't seen her since I was thirteen. She was almost fifty. Her face didn't show it, but the skin on the backs of her hands did. "I don't think we have long," she said. "Maybe a year. Maybe five. Not ten." It says something about San Francisco, that you can casually talk about the end of the world and no one will bat an eye. Maybe twenty, not fifty, was what she'd said eight years ago. Do the math. Mom had never lied to me. Maybe it would have been better for my childhood if she had [...]

Outline:
(04:50) Two
(22:58) Three
(35:33) Four

First published: July 18th, 2025
Source: https://www.lesswrong.com/posts/6qgtqD6BPYAQvEMvA/love-stays-loved-formerly-skin

Narrated by TYPE III AUDIO.
