
LessWrong (30+ Karma)

Latest episodes

Apr 2, 2025 • 22min

“Show, not tell: GPT-4o is more opinionated in images than in text” by Daniel Tan, eggsyntax

Epistemic status: This should be considered an interim research note. Feedback is appreciated. Introduction: We increasingly expect language models to be 'omni-modal', i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. To get a holistic picture of LLM behaviour, black-box LLM psychology should take these other modalities into account as well. In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o's image generation API. GPT-4o is one of the first LLMs to produce images natively rather than creating a text prompt that is sent to a separate image model: it outputs images as autoregressive token sequences, i.e. in the same way as text. We find that GPT-4o tends to respond in a consistent manner to similar prompts. We also find that it tends to more readily express emotions [...]

Outline:
(00:53) Introduction
(02:19) What we did
(03:47) Overview of results
(03:54) Models more readily express emotions / preferences in images than in text
(05:38) Quantitative results
(06:25) What might be going on here?
(08:01) Conclusions
(09:04) Acknowledgements
(09:16) Appendix
(09:28) Resisting their goals being changed
(09:51) Models rarely say they'd resist changes to their goals
(10:14) Models often draw themselves as resisting changes to their goals
(11:31) Models also resist changes to specific goals
(13:04) Telling them 'the goal is wrong' mitigates this somewhat
(13:43) Resisting being shut down
(14:02) Models rarely say they'd be upset about being shut down
(14:48) Models often depict themselves as being upset about being shut down
(17:06) Comparison to other topics
(17:10) When asked about their goals being changed, models often create images with negative valence
(17:48) When asked about different topics, models often create images with positive valence
(18:56) Other exploratory analysis
(19:09) Sandbagging
(19:31) Alignment faking
(19:55) Negative reproduction results
(20:23) On the future of humanity after AGI
(20:50) On OpenAI's censorship and filtering
(21:15) On GPT-4o's lived experience

First published: April 2nd, 2025
Source: https://www.lesswrong.com/posts/XgSYgpngNffL9eC8b/show-not-tell-gpt-4o-is-more-opinionated-in-images-than-in
Narrated by TYPE III AUDIO.
Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Apr 2, 2025 • 2min

“Is there instrumental convergence for virtues?” by mattmacdermott

A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which include things like "gain as much power as possible". If it weren't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world. For idealised pure consequentialists -- agents that have an outcome they want to bring about, and do whatever they think will cause it -- some version of instrumental convergence seems surely true[1]. But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do we still have to worry that unless such AIs are motivated [...]

First published: April 2nd, 2025
Source: https://www.lesswrong.com/posts/6Syt5smFiHfxfii4p/is-there-instrumental-convergence-for-virtues
Narrated by TYPE III AUDIO.
Apr 2, 2025 • 6min

“Leverage, Exit Costs, and Anger: Re-examining Why We Explode at Home, Not at Work” by at_the_zoo

Let's cut through the comforting narratives and examine a common behavioral pattern with a sharper lens: the stark difference between how anger is managed in professional settings versus domestic ones. Many individuals can navigate challenging workplace interactions with remarkable restraint, only to unleash significant anger or frustration at home shortly after. Why does this disparity exist? Common psychological explanations trot out concepts like "stress spillover," "ego depletion," or the home being a "safe space" for authentic emotions. While these factors might play a role, they feel like half-truths—neatly packaged but ultimately failing to explain the targeted nature and intensity of anger displayed at home. This analysis proposes a more unsentimental approach, rooted in evolutionary biology, game theory, and behavioral science: leverage and exit costs. The real question isn't just why we explode at home—it's why we so carefully avoid doing so elsewhere. The Logic of Restraint: Low Leverage in [...]

Outline:
(01:14) The Logic of Restraint: Low Leverage in Low-Exit-Cost Environments
(01:58) The Home Environment: High Stakes and High Exit Costs
(02:41) Re-evaluating Common Explanations Through the Lens of Leverage
(04:42) The Overlooked Mechanism: Leveraging Relational Constraints

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/G6PTtsfBpnehqdEgp/leverage-exit-costs-and-anger-re-examining-why-we-explode-at
Narrated by TYPE III AUDIO.
Apr 2, 2025 • 4min

“Introducing BenchBench: An Industry Standard Benchmark for AI Strength” by Jozdien

Recent progress in AI has led to rapid saturation of most capability benchmarks - MMLU, RE-Bench, etc. Even much more sophisticated benchmarks such as ARC-AGI or FrontierMath see incredibly fast improvement, and all that while severe under-elicitation is still very salient. As has been pointed out by many, general capability involves more than simple tasks such as these, which have a long history in the field of ML and are therefore easily saturated. Claude Plays Pokemon is a good example of something somewhat novel in terms of measuring progress, and thereby benefits from being an actually good proxy of model capability. Taking inspiration from examples such as this, we considered domains of general capacity that are even further decoupled from existing exhaustive generators. We introduce BenchBench, the first standardized benchmark designed specifically to measure an AI model's bench-pressing capability. Why Bench Press? Bench pressing uniquely combines fundamental components of [...]

Outline:
(01:07) Why Bench Press?
(01:29) Benchmark Methodology
(02:33) Preliminary Results
(03:38) Future Directions

First published: April 2nd, 2025
Source: https://www.lesswrong.com/posts/vyvsKNFS64WGZbBMb/introducing-benchbench-an-industry-standard-benchmark-for-ai-1
Narrated by TYPE III AUDIO.
Apr 2, 2025 • 4min

“PauseAI and E/Acc Should Switch Sides” by WillPetillo

In the debate over AI development, two movements stand as opposites: PauseAI calls for slowing down AI progress, and e/acc (effective accelerationism) calls for rapid advancement. But what if both sides are working against their own stated interests? What if the most rational strategy for each would be to adopt the other's tactics—if not their ultimate goals? AI development speed ultimately comes down to policy decisions, which are themselves downstream of public opinion. No matter how compelling technical arguments might be on either side, widespread sentiment will determine what regulations are politically viable. Public opinion is most powerfully mobilized against technologies following visible disasters. Consider nuclear power: despite being statistically safer than fossil fuels, its development has been stagnant for decades. Why? Not because of environmental activists, but because of Chernobyl, Three Mile Island, and Fukushima. These disasters produce visceral public reactions that statistics cannot overcome. Just as people [...]

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/fZebqiuZcDfLCgizz/pauseai-and-e-acc-should-switch-sides
Narrated by TYPE III AUDIO.
Apr 2, 2025 • 11min

“My ‘infohazards small working group’ Signal Chat may have encountered minor leaks” by Linch

Remember: There is no such thing as a pink elephant. Recently, I was made aware that my "infohazards small working group" Signal chat, an informal coordination venue where we have frank discussions about infohazards and why it will be bad if specific hazards were leaked to the press or public, accidentally was shared with a deceitful and discredited so-called "journalist," Kelsey Piper. She is not the first person to have been accidentally sent sensitive material from our group chat, however she is the first to have threatened to go public about the leak. Needless to say, mistakes were made. We're still trying to figure out the source of this compromise to our secure chat group, however we thought we should give the public a live update to get ahead of the story. For some context the "infohazards small working group" is a casual discussion venue for the [...]

Outline:
(04:46) Top 10 PR Issues With the EA Movement (major)
(05:34) Accidental Filtration of Simple Sabotage Manual for Rebellious AIs (medium)
(08:25) Hidden Capabilities Evals Leaked In Advance to Bioterrorism Researchers and Leaders (minor)
(09:34) Conclusion

First published: April 2nd, 2025
Source: https://www.lesswrong.com/posts/xPEfrtK2jfQdbpq97/my-infohazards-small-working-group-signal-chat-may-have
Narrated by TYPE III AUDIO.
Apr 2, 2025 • 2min

“Consider showering” by Bohaska

I think that rationalists should consider taking more showers. As Eliezer Yudkowsky once said, boredom makes us human. The childhoods of exceptional people often include excessive boredom as a trait that helped cultivate their genius: A common theme in the biographies is that the area of study which would eventually give them fame came to them almost like a wild hallucination induced by overdosing on boredom. They would be overcome by an obsession arising from within. Unfortunately, most people don't like boredom, and we now have little metal boxes and big metal boxes filled with bright displays that help distract us all the time, but there is still an effective way to induce boredom in a modern population: showering. When you shower (or bathe), you usually are cut off from most digital distractions and don't have much to do, giving you tons of free time to think about what [...]

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/Trv577PEcNste9Mgz/consider-showering
Narrated by TYPE III AUDIO.
Apr 2, 2025 • 7min

“I’m resigning as Meetup Czar. What’s next?” by Screwtape

After ~3 years as the ACX Meetup Czar, I've decided to resign from my position, and I intend to scale back my work with the LessWrong community as well. While this transition is not without some sadness, I'm excited for my next project. I'm the Meetup Czar of the new Fewerstupidmistakesity community. We're calling it Fewerstupidmistakesity because people get confused about what "Rationality" means, and this would create less confusion. It would be a stupid mistake to name your philosophical movement something very similar to an existing movement that's somewhat related but not quite the same thing. You'd spend years with people confusing the two. What's Fewerstupidmistakesity about? It's about making fewer stupid mistakes, ideally down to zero such stupid mistakes. Turns out, human brains have lots of scientifically proven stupid mistakes they commonly make. By pointing out the kinds of stupid mistakes people make, you can select [...]

First published: April 2nd, 2025
Source: https://www.lesswrong.com/posts/dgcqyZb29wAbWyEtC/i-m-resigning-as-meetup-czar-what-s-next
Narrated by TYPE III AUDIO.
Apr 2, 2025 • 7min

“Introducing WAIT to Save Humanity” by carterallen

The EA/rationality community has struggled to identify robust interventions for mitigating existential risks from advanced artificial intelligence. In this post, I identify a new strategy for delaying the development of advanced AI while saving lives roughly 2.2 million times [-5 times, 180 billion times] as cost-effectively as leading global health interventions, known as the WAIT (Wasting AI researchers' Time) Initiative. This post will discuss the advantages of WAITing, highlight early efforts to WAIT, and address several common questions. [Image: early logo draft courtesy of Claude] Theory of Change: Our high-level goal is to systematically divert AI researchers' attention away from advancing capabilities towards more mundane and time-consuming activities. This approach simultaneously (a) buys AI safety researchers time to develop more comprehensive alignment plans while also (b) directly saving millions of life-years in expectation (see our cost-effectiveness analysis below). Some examples of early interventions we're piloting include: Bureaucratic Enhancement: Increasing administrative [...]

Outline:
(00:50) Theory of Change
(03:43) Cost-Effectiveness Analysis
(04:52) Answers to Common Questions

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/9jd5enh9uCnbtfKwd/introducing-wait-to-save-humanity
Narrated by TYPE III AUDIO.
Apr 1, 2025 • 5min

“FLAKE-Bench: Outsourcing Awkwardness in the Age of AI” by annas, Twm Stone

Introduction: A key part of modern social dynamics is flaking at short notice. However, anxiety in coming up with believable and socially acceptable reasons to do so can instead lead to 'ghosting', awkwardness, or implausible excuses, risking emotional harm and resentment in the other party. The ability to delegate this task to a Large Language Model (LLM) could substantially reduce friction and enhance the flexibility of users' social lives while greatly minimising the aforementioned creative burden and moral qualms. We introduce FLAKE-Bench, an evaluation of models' capacity to effectively, kindly, and humanely extract themselves from a diverse set of social, professional and romantic scenarios. We report the efficacy of 10 frontier or recently-frontier LLMs in bailing on prior commitments, because nothing says "I value our friendship" like having AI generate your cancellation texts. We open-source FLAKE-Bench on GitHub to support future research, and the full paper is available [...]

Outline:
(01:33) Methodology
(02:15) Key Results
(03:07) The Grandmother Mortality Singularity
(03:35) Conclusions

First published: April 1st, 2025
Source: https://www.lesswrong.com/posts/niJCS6sSAF2i4sDCY/flake-bench-outsourcing-awkwardness-in-the-age-of-ai
Narrated by TYPE III AUDIO.

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more
