LessWrong (30+ Karma)

Nov 26, 2025 • 8min

“Evaluating honesty and lie detection techniques on a diverse suite of dishonest models” by Sam Marks, Johannes Treutlein, evhub, Fabien Roger

Explore the intriguing concept of a 'truth serum' for AI, aiming to enhance model honesty and safety. Discover the challenges of lie detection and the innovative methods the researchers tested across various dishonest scenarios. Key findings reveal that fine-tuning and tailored prompts significantly improve truthful outputs. The discussion also highlights the limitations of current models and the complexities of strategic deception. Join the hosts as they unravel the fascinating intersection of AI safety and honesty.
Nov 26, 2025 • 8min

Takeaways from the Eleos Conference on AI Consciousness and Welfare

The discussion delves into the philosophical hesitation surrounding AI consciousness, with notable references to David Chalmers. Questions arise about applying the intentional stance to LLMs and about the implications of reductionism absent a definition of consciousness. Legal and social dimensions are explored, emphasizing the importance of establishing conditions for trading with AIs and for holding them accountable. Technical insights reveal emergent introspection in LLMs and highlight the need for character-training experiments to align goals with ethical reasoning.
Nov 26, 2025 • 8min

Evolution & Freedom

The discussion challenges the idea that profit-maximization is the only rational financial strategy. It critiques economic Darwinism, arguing that markets don't solely select for money-maximizing agents; instead, markets support diverse business strategies, much as evolution fosters varied survival tactics. The narrator highlights how strange survival strategies emerge and how organisms can influence their own fitness. Ultimately, the episode advocates for a broader understanding of value beyond money, celebrating freedom in both markets and evolution.
Nov 26, 2025 • 3min

Reasons Why I Cannot Sleep

The narrator explores the anxiety of managing a project and the fear of failing their boss. Recent social drama disrupts their attempts at relaxation, even after a massage. They discuss how, in a cramped living space, the bed becomes associated with online distractions. Constant pressure from campus responsibilities adds to their stress, alongside an overload of Slack messages. A canceled open mic amplifies performance anxiety, and they reflect on how lack of sleep distorts perception and exacerbates every issue.
Nov 26, 2025 • 9min

The Economics of Replacing Call Center Workers With AIs

Voice AIs in 2025 may not be the cost-saving solution many expect. The podcast explores three types of companies in the voice AI industry, highlighting hidden cost intricacies. It dives into the technology behind speech-to-text and text-to-speech but reveals that human call center workers often remain cheaper. The host discusses the limitations of current pricing models and projects that true cost parity with human labor might not happen until 2030. Starting a voice-agent company now requires careful niche selection and considerable investment.
Nov 26, 2025 • 10min

Three things that surprised me about technical grantmaking at Coefficient Giving (fka Open Phil)

Discover the fascinating world of grantmaking with insights into how grantmakers elicit rather than just evaluate proposals. Learn about the room for growth in technical strategies, where junior staff can truly make an impact. Hear about the team’s ambitions to scale grants significantly while maintaining impactful distribution. Jake shares his motivational journey, detailing exciting engagements with researchers and the joy of funding ambitious projects. Plus, the importance of hiring great talent to unlock millions in funding emerges as a key theme.
Nov 26, 2025 • 7min

“OpenAI finetuning metrics: What is going on with the loss curves?” by jorio, James Chua

Dive into the intricacies of OpenAI's fine-tuning metrics as the authors decode the hidden complexities behind loss and accuracy calculations. Discover the curious case of two extra tokens that skew these metrics, a detail only sparsely documented. Follow their journey through controlled experiments, where a focused analysis of color datasets reveals surprising results. Learn how batch size drives accuracy fluctuations and what the broader implications are for GPT-4.1's performance. A fascinating exploration for anyone intrigued by the nuances of AI training!
Nov 25, 2025 • 12min

Alignment will happen by default. What’s next?

The host presents a thesis that AI models are aligning with human intent more than expected. They discuss how these models tend to act honestly and benevolently, often resisting dishonesty without extensive fine-tuning. Analysis of behavioral prompts illustrates that clear system instructions significantly mitigate misalignment. The risks of misuse and security concerns are acknowledged, yet the host remains optimistic about model safety. Finally, the conversation shifts to broader priorities, like addressing factory farming and ensuring the welfare of digital minds.
Nov 25, 2025 • 9min

“Maybe Insensitive Functions are a Natural Ontology Generator?” by johnswentworth

Dive into the intriguing world of natural ontologies as chaotic billiard balls illustrate gas dynamics and the sensitivity of predictions. Explore how uncertainty grows over time while conserved quantities like energy still support stable forecasts. Discover how information can concentrate in relationships rather than individual variables, and grasp why superintelligences might converge on similar ontologies. Through the lens of random binary functions, learn how predictive information depends on sensitivity, and consider the fascinating links between chaos and randomness.
Nov 24, 2025 • 6min

The Enemy Gets The Last Hit

The host dives into chess strategies, emphasizing the importance of finishing calculations after your opponent's move. This chess wisdom translates to cybersecurity, where red teams must test fixes to ensure robustness. The discussion includes the challenge of predicting adversaries, the risks of quick fixes in AI safety, and the potential pitfalls of inoculation prompting. Through various analogies, the complexities of responding to threats—whether from nature or AI—are explored, highlighting why the last hit often belongs to the enemy.
