LessWrong (30+ Karma)

Latest episodes

Apr 2, 2025 • 4min

“Introducing BenchBench: An Industry Standard Benchmark for AI Strength” by Jozdien

Recent progress in AI has led to rapid saturation of most capability benchmarks - MMLU, RE-Bench, etc. Even much more sophisticated benchmarks such as ARC-AGI or FrontierMath see incredibly fast improvement, and all that while severe under-elicitation is still very salient. As has been pointed out by many, general capability involves more than simple tasks such as these, which have a long history in the field of ML and are therefore easily saturated. Claude Plays Pokemon is a good example of something somewhat novel in terms of measuring progress, and thereby benefited from being an actually good proxy of model capability. Taking inspiration from examples such as this, we considered domains of general capacity that are even further decoupled from existing exhaustive generators. We introduce BenchBench, the first standardized benchmark designed specifically to measure an AI model's bench-pressing capability.

Why Bench Press?

Bench pressing uniquely combines fundamental components of [...]

Outline:
(01:07) Why Bench Press?
(01:29) Benchmark Methodology
(02:33) Preliminary Results
(03:38) Future Directions

First published: April 2nd, 2025
Source: https://www.lesswrong.com/posts/vyvsKNFS64WGZbBMb/introducing-benchbench-an-industry-standard-benchmark-for-ai-1

Narrated by TYPE III AUDIO.
Apr 2, 2025 • 4min

“PauseAI and E/Acc Should Switch Sides” by WillPetillo

In the debate over AI development, two movements stand as opposites: PauseAI calls for slowing down AI progress, and e/acc (effective accelerationism) calls for rapid advancement. But what if both sides are working against their own stated interests? What if the most rational strategy for each would be to adopt the other's tactics, if not their ultimate goals?

AI development speed ultimately comes down to policy decisions, which are themselves downstream of public opinion. No matter how compelling technical arguments might be on either side, widespread sentiment will determine what regulations are politically viable.

Public opinion is most powerfully mobilized against technologies following visible disasters. Consider nuclear power: despite being statistically safer than fossil fuels, its development has been stagnant for decades. Why? Not because of environmental activists, but because of Chernobyl, Three Mile Island, and Fukushima. These disasters produce visceral public reactions that statistics cannot overcome. Just as people [...]

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/fZebqiuZcDfLCgizz/pauseai-and-e-acc-should-switch-sides

Narrated by TYPE III AUDIO.
Apr 2, 2025 • 11min

“My ‘infohazards small working group’ Signal Chat may have encountered minor leaks” by Linch

Remember: There is no such thing as a pink elephant.

Recently, I was made aware that my “infohazards small working group” Signal chat, an informal coordination venue where we have frank discussions about infohazards and why it would be bad if specific hazards were leaked to the press or public, was accidentally shared with a deceitful and discredited so-called “journalist,” Kelsey Piper. She is not the first person to have been accidentally sent sensitive material from our group chat; however, she is the first to have threatened to go public about the leak. Needless to say, mistakes were made.

We’re still trying to figure out the source of this compromise to our secure chat group; however, we thought we should give the public a live update to get ahead of the story.

For some context, the “infohazards small working group” is a casual discussion venue for the [...]

Outline:
(04:46) Top 10 PR Issues With the EA Movement (major)
(05:34) Accidental Filtration of Simple Sabotage Manual for Rebellious AIs (medium)
(08:25) Hidden Capabilities Evals Leaked In Advance to Bioterrorism Researchers and Leaders (minor)
(09:34) Conclusion

First published: April 2nd, 2025
Source: https://www.lesswrong.com/posts/xPEfrtK2jfQdbpq97/my-infohazards-small-working-group-signal-chat-may-have

Narrated by TYPE III AUDIO.

Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Apr 2, 2025 • 2min

“Consider showering” by Bohaska

I think that rationalists should consider taking more showers.

As Eliezer Yudkowsky once said, boredom makes us human. The childhoods of exceptional people often include excessive boredom as a trait that helped cultivate their genius:

A common theme in the biographies is that the area of study which would eventually give them fame came to them almost like a wild hallucination induced by overdosing on boredom. They would be overcome by an obsession arising from within.

Unfortunately, most people don't like boredom, and we now have little metal boxes and big metal boxes filled with bright displays that help distract us all the time, but there is still an effective way to induce boredom in a modern population: showering.

When you shower (or bathe), you usually are cut off from most digital distractions and don't have much to do, giving you tons of free time to think about what [...]

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/Trv577PEcNste9Mgz/consider-showering

Narrated by TYPE III AUDIO.
Apr 2, 2025 • 7min

“I’m resigning as Meetup Czar. What’s next?” by Screwtape

After ~3 years as the ACX Meetup Czar, I've decided to resign from my position, and I intend to scale back my work with the LessWrong community as well. While this transition is not without some sadness, I'm excited for my next project.

I'm the Meetup Czar of the new Fewerstupidmistakesity community.

We're calling it Fewerstupidmistakesity because people get confused about what "Rationality" means, and this would create less confusion. It would be a stupid mistake to name your philosophical movement something very similar to an existing movement that's somewhat related but not quite the same thing. You'd spend years with people confusing the two.

What's Fewerstupidmistakesity about? It's about making fewer stupid mistakes, ideally down to zero such stupid mistakes. Turns out, human brains have lots of scientifically proven stupid mistakes they commonly make. By pointing out the kinds of stupid mistakes people make, you can select [...]

First published: April 2nd, 2025
Source: https://www.lesswrong.com/posts/dgcqyZb29wAbWyEtC/i-m-resigning-as-meetup-czar-what-s-next

Narrated by TYPE III AUDIO.
Apr 2, 2025 • 7min

“Introducing WAIT to Save Humanity” by carterallen

The EA/rationality community has struggled to identify robust interventions for mitigating existential risks from advanced artificial intelligence. In this post, I identify a new strategy for delaying the development of advanced AI while saving lives roughly 2.2 million times [-5 times, 180 billion times] as cost-effectively as leading global health interventions, known as the WAIT (Wasting AI researchers' Time) Initiative. This post will discuss the advantages of WAITing, highlight early efforts to WAIT, and address several common questions.

[Image: early logo draft, courtesy of Claude]

Theory of Change

Our high-level goal is to systematically divert AI researchers' attention away from advancing capabilities towards more mundane and time-consuming activities. This approach simultaneously (a) buys AI safety researchers time to develop more comprehensive alignment plans while also (b) directly saving millions of life-years in expectation (see our cost-effectiveness analysis below).

Some examples of early interventions we're piloting include:

Bureaucratic Enhancement: Increasing administrative [...]

Outline:
(00:50) Theory of Change
(03:43) Cost-Effectiveness Analysis
(04:52) Answers to Common Questions

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/9jd5enh9uCnbtfKwd/introducing-wait-to-save-humanity

Narrated by TYPE III AUDIO.
Apr 1, 2025 • 5min

“FLAKE-Bench: Outsourcing Awkwardness in the Age of AI” by annas, Twm Stone

Introduction

A key part of modern social dynamics is flaking at short notice. However, anxiety in coming up with believable and socially acceptable reasons to do so can instead lead to ‘ghosting’, awkwardness, or implausible excuses, risking emotional harm and resentment in the other party. The ability to delegate this task to a Large Language Model (LLM) could substantially reduce friction and enhance the flexibility of a user's social life while greatly minimising the aforementioned creative burden and moral qualms.

We introduce FLAKE-Bench, an evaluation of models’ capacity to effectively, kindly, and humanely extract themselves from a diverse set of social, professional and romantic scenarios. We report the efficacy of 10 frontier or recently-frontier LLMs in bailing on prior commitments, because nothing says “I value our friendship” like having AI generate your cancellation texts. We open-source FLAKE-Bench on GitHub to support future research, and the full paper is available [...]

Outline:
(01:33) Methodology
(02:15) Key Results
(03:07) The Grandmother Mortality Singularity
(03:35) Conclusions

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/niJCS6sSAF2i4sDCY/flake-bench-outsourcing-awkwardness-in-the-age-of-ai

Narrated by TYPE III AUDIO.
Apr 1, 2025 • 44min

“Housing Roundup #11” by Zvi

The book of March 2025 was Abundance. Ezra Klein and Derek Thompson are making a noble attempt to highlight the importance of solving America's housing crisis the only way it can be solved: Building houses in places people want to live, via repealing the rules that make this impossible. They also talk about green energy abundance, and other places besides. There may be a review coming.

Until then, it seems high time for the latest housing roundup, which, as a reminder, all take place in the possible timeline where AI fails to be transformative any time soon.

Federal YIMBY

The incoming administration issued an executive order calling for ‘emergency price relief’ including ‘pursuing appropriate actions to: Lower the cost of housing and expand housing supply’ and then a grab bag of everything else. It's great to see mention of expanding housing supply, but I don’t [...]

Outline:
(00:44) Federal YIMBY
(02:37) Rent Control
(02:50) Rent Passthrough
(03:36) Yes, Construction Lowers Rents on Existing Buildings
(07:11) If We Wanted To, We Would
(08:51) Aesthetics Are a Public Good
(10:44) Urban Planners Are Wrong About Everything
(13:02) Blackstone
(15:49) Rent Pricing Software
(18:26) Immigration Versus NIMBY
(19:37) Preferences
(20:28) The True Costs
(23:11) Minimum Viable Product
(26:01) Like a Good Neighbor
(27:12) Flight to Quality
(28:40) Housing for Families
(31:09) Group House
(32:41) No One Will Have the Endurance To Collect on His Insurance
(34:49) Cambridge
(35:36) Denver
(35:44) Minneapolis
(36:23) New York City
(39:26) San Francisco
(41:45) Silicon Valley
(42:41) Texas
(43:31) Other Considerations

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/7qGdgNKndPk3XuzJq/housing-roundup-11

Narrated by TYPE III AUDIO.
Apr 1, 2025 • 2min

“You will crash your car in front of my house within the next week” by Richard Korzekwa

I'm not writing this to alarm anyone, but it would be irresponsible not to report on something this important. On current trends, every car will be crashed in front of my house within the next week. Here's the data:

Until today, only two cars had crashed in front of my house, several months apart, during the 15 months I have lived here. But a few hours ago it happened again, mere weeks from the previous crash.

This graph may look harmless enough, but now consider the frequency of crashes this implies over time:

The car crash singularity will occur in the early morning hours of Monday, April 7. As crash frequency approaches infinity, every car will be involved. You might be thinking that the same car could be involved in multiple crashes. This is true! But the same car can only withstand a finite number of crashes before it [...]

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/FjPWbLdoP4PLDivYT/you-will-crash-your-car-in-front-of-my-house-within-the-next

Narrated by TYPE III AUDIO.
Apr 1, 2025 • 9min

“VDT: a solution to decision theory” by L Rudolf L

Introduction

Decision theory is about how to behave rationally under conditions of uncertainty, especially if this uncertainty involves being acausally blackmailed and/or gaslit by alien superintelligent basilisks.

Decision theory has found numerous practical applications, including proving the existence of God and generating endless LessWrong comments since the beginning of time.

However, despite the apparent simplicity of "just choose the best action", no comprehensive decision theory that resolves all decision theory dilemmas has yet been formalized. This paper at long last resolves this dilemma, by introducing a new decision theory: VDT.

Decision theory problems and existing theories

Some common existing decision theories are:

Causal Decision Theory (CDT): select the action that *causes* the best outcome.
Evidential Decision Theory (EDT): select the action that you would be happiest to learn that you had taken.
Functional Decision Theory (FDT): select the action output by the function such that if you take [...]

Outline:
(00:53) Decision theory problems and existing theories
(05:37) Defining VDT
(06:34) Experimental results
(07:48) Conclusion

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/LcjuHNxubQqCry9tT/vdt-a-solution-to-decision-theory

Narrated by TYPE III AUDIO.
