
LessWrong (30+ Karma)

Latest episodes

May 9, 2025 • 40min

“Cheaters Gonna Cheat Cheat Cheat Cheat Cheat” by Zvi

Cheaters. Kids these days, everyone says, are all a bunch of blatant cheaters via AI. Then again, look at the game we are forcing them to play, and how we grade it. If you earn your degree largely via AI, that changes two distinct things. You might learn different things. You might signal different things. Both learning and signaling are under threat if there is too much blatant cheating. There is too much cheating going on, too blatantly. Why is that happening? Because the students are choosing to do it. Ultimately, this is a preview of what will happen everywhere else as well. It is not a coincidence that AI starts its replacement of work in the places where the work is the most repetitive, useless and fake, but its ubiquitousness will not stay confined there. These are problems and also opportunities we will face everywhere. [...]

Outline:
(01:22) You Could Take The White Pill, But You Probably Won't
(06:16) Is Our Children Learning
(09:00) Cheaters Never Stop Cheating
(13:27) If You Know You Know
(15:31) The Real Victims Here
(22:14) Taking Note
(23:56) What You Going To Do About It, Punk?
(29:01) How Bad Are Things?
(32:30) The Road to Recovery
(34:17) The Whispering Earring

First published: May 9th, 2025
Source: https://www.lesswrong.com/posts/ZDXK6nDrvuFypAgoG/cheaters-gonna-cheat-cheat-cheat-cheat-cheat
Narrated by TYPE III AUDIO.
May 9, 2025 • 16min

“Slow corporations as an intuition pump for AI R&D automation” by ryan_greenblatt, elifland

How much should we expect AI progress to speed up after fully automating AI R&D? This post presents an intuition pump for reasoning about the level of acceleration by talking about different hypothetical companies with different labor forces, amounts of serial time, and compute. Essentially, if you'd expect an AI research lab with substantially less serial time and fewer researchers than current labs (but the same cumulative compute) to make substantially less algorithmic progress, you should also expect a research lab with an army of automated researchers running at much higher serial speed to get correspondingly more done. (And if you'd expect the company with less serial time to make similar amounts of progress, the same reasoning would also imply limited acceleration.) We also discuss potential sources of asymmetry which could break this correspondence and implications of this intuition pump. The intuition pump: Imagine theoretical AI companies [...]

Outline:
(01:03) The intuition pump
(03:32) Clarifications
(05:49) Asymmetries
(13:03) Implications

The original text contained 8 footnotes which were omitted from this narration.

First published: May 9th, 2025
Source: https://www.lesswrong.com/posts/hMSuXTsEHvk4NG6pm/slow-corporations-as-an-intuition-pump-for-ai-r-and-d
Narrated by TYPE III AUDIO.
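To make the intuition pump concrete, here is a minimal toy model in Python. The functional form (progress proportional to serial time times a diminishing-returns power of parallel labor) and all the numbers are illustrative assumptions for this sketch, not figures from the post; compute is held fixed across scenarios, as in the post's setup.

```python
# Toy model of the "slow corporation" intuition pump. The functional form and
# constants below are illustrative assumptions, not numbers from the post.
# Compute is held fixed across all scenarios.

def relative_progress(researchers: float, serial_speed: float,
                      calendar_years: float, alpha: float = 0.4) -> float:
    """Algorithmic progress in arbitrary units: effective serial time times a
    diminishing-returns power of parallel labor (alpha < 1)."""
    effective_serial_time = calendar_years * serial_speed
    return effective_serial_time * researchers ** alpha

baseline = relative_progress(researchers=1_000, serial_speed=1.0, calendar_years=1.0)

scenarios = {
    "current lab (1,000 researchers, 1 year)":
        relative_progress(1_000, 1.0, 1.0),
    "slow corporation (100 researchers, 6 months)":
        relative_progress(100, 1.0, 0.5),
    "automated researchers (100,000 copies at 10x serial speed, 1 year)":
        relative_progress(100_000, 10.0, 1.0),
}

for name, progress in scenarios.items():
    print(f"{name}: {progress / baseline:.2f}x the current lab")
```

The symmetry the post points at is that whatever discount you would apply to the slow corporation under a model like this becomes, under the same model, a corresponding multiplier for the lab staffed by fast automated researchers.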
May 9, 2025 • 55min

“An alignment safety case sketch based on debate” by Benjamin Hilton, Marie_DB, Jacob Pfau, Geoffrey Irving

Audio note: this article contains 32 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

This post presents a mildly edited form of a new paper by UK AISI's alignment team (the abstract, introduction and related work section are replaced with an executive summary). Read the full paper here.

Executive summary: AI safety via debate is a promising method for solving part of the alignment problem for ASI (artificial superintelligence). TL;DR: Debate + exploration guarantees + solution to obfuscated arguments + good human input solves outer alignment. Outer alignment + online training solves inner alignment to a sufficient extent in low-stakes contexts. This post sets out what debate can be used to achieve, what gaps remain, and what research is needed to solve them. These gaps form the basis for [...]

Outline:
(00:37) Executive summary
(06:19) The alignment strategy
(06:53) Step 1: Specify a low-stakes deployment context
(10:34) Step 2: Train via a debate game
(12:25) Step 3: Secure exploration guarantees
(15:05) Step 4: Continue online training during deployment
(16:33) The safety case sketch
(18:49) Notation
(19:18) Preliminaries
(21:06) Key claim 1: The training process has taught the systems to play the game well
(22:19) Subclaim 1: Existence of efficient models
(23:30) Subclaim 2: Training convergence
(23:48) Subclaim 3: Convergence = equilibrium
(26:44) Key claim 2: The game incentivises correctness
(29:30) Subclaim 1: M-approximation
(30:53) Subclaim 2: Truth of M
(36:00) Key claim 3: The system's behaviour will stay similar during deployment
(37:21) Key claim 4: It is sufficient for safety purposes for the system to provide correct answers most of the time
(40:21) Subclaim 1: # of bad actions required
(41:57) Subclaim 2: Low likelihood of enough bad actions
(43:09) Extending the safety case to high-stakes contexts
(46:03) Open problems
(50:21) Conclusion
(51:12) Appendices
(51:15) Appendix 1: The full safety case diagram
(51:39) Appendix 2: The Claims Arguments Evidence notation

The original text contained 12 footnotes which were omitted from this narration.

First published: May 8th, 2025
Source: https://www.lesswrong.com/posts/iELyAqizJkizBQbfr/an-alignment-safety-case-sketch-based-on-debate
Narrated by TYPE III AUDIO.
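For readers who have not seen debate before, the sketch below is a generic illustration of what "training via a debate game" (step 2 in the outline) looks like at the interface level: two copies of a model argue, a judge picks a winner, and the verdict becomes a zero-sum training signal. The class and function names are hypothetical, and this is not the paper's protocol; the paper specifies the actual game and its exploration guarantees.

```python
# Generic sketch of a debate game loop; names and interfaces are hypothetical,
# not the protocol from the paper.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Transcript:
    question: str
    arguments: List[str] = field(default_factory=list)  # alternating A/B arguments

def run_debate(question: str,
               debater_a: Callable[[Transcript], str],
               debater_b: Callable[[Transcript], str],
               judge: Callable[[Transcript], int],  # 0 if A wins, 1 if B wins
               rounds: int = 3) -> int:
    """Play a fixed number of debate rounds and return the judge's verdict."""
    transcript = Transcript(question=question)
    for _ in range(rounds):
        transcript.arguments.append(debater_a(transcript))
        transcript.arguments.append(debater_b(transcript))
    return judge(transcript)

# Training then assigns zero-sum rewards from the verdict (e.g. +1 to the
# winner, -1 to the loser) and updates both debaters, typically copies of the
# same model, with RL against that reward.
```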
May 9, 2025 • 17min

“Interest In Conflict Is Instrumentally Convergent” by Screwtape

Why is conflict resolution hard? I talk to a lot of event organizers and community managers. Handling conflict is consistently one of the things they find the most stressful, difficult, or time consuming. Why is that? Or, to ask a related question: Why is the ACX Meetups Czar, tasked with collecting and dispersing best practices for meetups, spending so much time writing about social conflict? This essay is not the whole answer, but it is one part of why this is a hard problem. Short answer: Because interest in conflict resolution is instrumentally convergent. Both helpful and unhelpful people have reason to express strong opinions on how conflict is handled. Please take as a given that there exists (as a minimum) one bad actor with an interest in showing up to (as a minimum) one in-person group.

1. See, a funny thing about risk teams: the list of [...]

Outline:
(01:04) 1.
(05:05) 2.
(09:59) 3.
(12:52) 4.

The original text contained 2 footnotes which were omitted from this narration.

First published: May 9th, 2025
Source: https://www.lesswrong.com/posts/qyykn5G9EKKLbQg22/interest-in-conflict-is-instrumentally-convergent
Narrated by TYPE III AUDIO.
May 8, 2025 • 29min

“Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking” by Buck, Julian Stastny

In the future, we will want to use powerful AIs on critical tasks such as doing AI safety and security R&D, dangerous capability evaluations, red-teaming safety protocols, or monitoring other powerful models. Since we care about models performing well on these tasks, we are worried about sandbagging: that if our models are misaligned [1], they will intentionally underperform. Sandbagging is crucially different from many other situations with misalignment risk, because it involves models purposefully doing poorly on a task, rather than purposefully doing well. When people talk about risks from overoptimizing reward functions (e.g. as described in What failure looks like), the concern is that the model gets better performance (according to some metric) than an aligned model would have. And when they talk about scheming, the concern is mostly that the model gets performance that is as good as an aligned model despite not being aligned. When [...]

Outline:
(03:39) Sandbagging can cause a variety of problems
(07:50) Training makes sandbagging significantly harder for the AIs
(09:30) Training on high-quality data can remove sandbagging
(12:13) If off-policy data is low-quality, on-policy data might help
(14:22) Models might subvert training via exploration hacking
(18:31) Off-policy data could mitigate exploration hacking
(23:35) Quick takes on other countermeasures
(25:20) Conclusion and prognosis

The original text contained 3 footnotes which were omitted from this narration.

First published: May 8th, 2025
Source: https://www.lesswrong.com/posts/TeTegzR8X5CuKgMc3/misalignment-and-strategic-underperformance-an-analysis-of
Narrated by TYPE III AUDIO.
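As a toy illustration of why exploration hacking is a distinct problem, the sketch below trains a two-armed bandit policy two ways. The setup, numbers, and learning rule are assumptions made up for this illustration, not anything from the post: a "sandbagging" policy that almost never samples the rewarded action receives essentially no on-policy gradient toward it, while off-policy updates on forced demonstrations of the good action fix the behaviour quickly.

```python
# Toy two-armed bandit showing why on-policy RL struggles to fix a policy
# that never explores the rewarded action (exploration hacking), while
# off-policy/supervised updates on good demonstrations do not have this
# problem. All details here are illustrative assumptions, not from the post.
import math
import random

random.seed(0)

def p_good(logit: float) -> float:
    """Probability of taking the rewarded action under a sigmoid policy."""
    return 1.0 / (1.0 + math.exp(-logit))

def reinforce_step(logit: float, lr: float = 0.5) -> float:
    """One on-policy REINFORCE update; only the good action pays reward 1."""
    p = p_good(logit)
    action = 1 if random.random() < p else 0
    reward = 1.0 if action == 1 else 0.0
    grad = (action - p) * reward  # reward-weighted d log pi(action) / d logit
    return logit + lr * grad

def supervised_step(logit: float, lr: float = 0.5) -> float:
    """Off-policy update on a demonstration that always takes the good action."""
    return logit + lr * (1.0 - p_good(logit))  # gradient of log pi(action = 1)

sandbagger = -6.0  # starts out taking the good action ~0.25% of the time
on_policy, off_policy = sandbagger, sandbagger
for _ in range(200):
    on_policy = reinforce_step(on_policy)
    off_policy = supervised_step(off_policy)

print(f"on-policy RL:   P(good action) = {p_good(on_policy):.3f}")   # stays near 0
print(f"off-policy SFT: P(good action) = {p_good(off_policy):.3f}")  # climbs toward 1
```

This is only an analogy: the post's concern is a capable model strategically avoiding exploration, not an untrained logit, but the gradient structure that makes the problem hard is the same.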
May 8, 2025 • 8min

“Orienting Toward Wizard Power” by johnswentworth

For months, I had the feeling: something is wrong. Some core part of myself had gone missing. I had words and ideas cached, which pointed back to the missing part. There was the story of Benjamin Jesty, a dairy farmer who vaccinated his family against smallpox in 1774 - 20 years before the vaccination technique was popularized, and the same year King Louis XV of France died of the disease. There was another old post which declared “I don’t care that much about giant yachts. I want a cure for aging. I want weekend trips to the moon. I want flying cars and an indestructible body and tiny genetically-engineered dragons.”. There was a cached instinct to look at certain kinds of social incentive gradient, toward managing more people or growing an organization or playing social-political games, and say “no, it's a trap”. To go… in a different direction, orthogonal [...]

Outline:
(01:19) In Search of a Name
(04:23) Near Mode

First published: May 8th, 2025
Source: https://www.lesswrong.com/posts/Wg6ptgi2DupFuAnXG/orienting-toward-wizard-power
Narrated by TYPE III AUDIO.
May 7, 2025 • 21min

“OpenAI Claims Nonprofit Will Retain Nominal Control” by Zvi

Your voice has been heard. OpenAI has ‘heard from the Attorney Generals’ of Delaware and California, and as a result the OpenAI nonprofit will retain control of OpenAI under their new plan, and both companies will retain the original mission. Technically they are not admitting that their original plan was illegal and one of the biggest thefts in human history, but that is how you should in practice interpret the line ‘we made the decision for the nonprofit to retain control of OpenAI after hearing from civic leaders and engaging in constructive dialogue with the offices of the Attorney General of Delaware and the Attorney General of California.’ Another possibility is that the nonprofit board finally woke up and looked at what was being proposed and how people were reacting, and realized what was going on. The letter ‘not for private gain’ that was recently sent [...]

Outline:
(01:08) The Mask Stays On?
(04:20) Your Offer is (In Principle) Acceptable
(08:32) The Skeptical Take
(15:14) Tragedy in the Bay
(17:04) The Spirit of the Rules

First published: May 7th, 2025
Source: https://www.lesswrong.com/posts/spAL6iywhDiPWm4HR/openai-claims-nonprofit-will-retain-nominal-control
Narrated by TYPE III AUDIO.
May 7, 2025 • 3min

[Linkpost] “It’s ‘Well, actually...’ all the way down” by benwr

This is a link post. Some people (the “Boubas”) don’t like “chemicals” in their food. But other people (the “Kikis”) are like, “uh, everything is chemicals, what do you even mean?” The Boubas are using the word “chemical” differently than the Kikis, and the way they’re using it is simultaneously more specific and less precise than the way the Kikis use it. I think most Kikis implicitly know this, but their identities are typically tied up in being the kind of person who “knows what ‘chemical’ means”, and… you’ve gotta use that kind of thing whenever you can, I guess? There is no single privileged universally-correct answer to the question “what does ‘chemical’ mean?”, because the Boubas exist and are using the word differently than Kikis, and in an internally-consistent (though vague) way. The Kikis are, generally speaking, much more knowledgeable about the details of chemistry. They might hang out [...]

First published: May 6th, 2025
Source: https://www.lesswrong.com/posts/6qdBkd3GS4Qd6YJ3s/it-s-well-actually-all-the-way-down
Linkpost URL: https://www.benwr.net/2025/05/05/well-actually-all-the-way-down.html
Narrated by TYPE III AUDIO.
May 7, 2025 • 1h

“Please Donate to CAIP (Post 1 of 3 on AI Governance)” by Mass_Driver

I am Jason Green-Lowe, the executive director of the Center for AI Policy (CAIP). Our mission is to directly convince Congress to pass strong AI safety legislation. As I explain in some detail in this post, I think our organization has been doing extremely important work, and that we’ve been doing well at it. Unfortunately, we have been unable to get funding from traditional donors to continue our operations. If we don’t get more funding in the next 30 days, we will have to shut down, which will damage our relationships with Congress and make it harder for future advocates to get traction on AI governance. In this post, I explain what we’ve been doing, why I think it's valuable, and how your donations could help. This is the first post in what I expect will be a 3-part series. The first post focuses on CAIP's particular need [...]

Outline:
(01:33) OUR MISSION AND STRATEGY
(02:59) Our Model Legislation
(04:17) Direct Meetings with Congressional Staffers
(05:20) Expert Panel Briefings
(06:16) AI Policy Happy Hours
(06:43) Op-Eds & Policy Papers
(07:21) Grassroots & Grasstops Organizing
(09:13) What's Unique About CAIP?
(10:26) OUR ACCOMPLISHMENTS
(10:29) Quantifiable Outputs
(11:20) Changing the Media Narrative
(12:23) Proof of Concept
(13:44) Outcomes -- Congressional Engagement
(18:29) Context
(19:54) OUR PROPOSED POLICIES
(19:57) Mandatory Audits for Frontier AI
(21:23) Liability Reform
(22:32) Hardware Monitoring
(24:10) Emergency Powers
(25:31) Further Details
(25:41) RESPONSES TO COMMON POLICY OBJECTIONS
(25:46) 1. Why not push for a ban or pause on superintelligence research?
(30:16) 2. Why not support bills that have a better chance of passing this year, like funding for NIST or NAIRR?
(32:29) 3. If Congress is so slow to act, why should anyone be working with Congress at all? Why not focus on promoting state laws or voluntary standards?
(35:09) 4. Why would you push the US to unilaterally disarm? Don't we instead need a global treaty regulating AI (or subsidies for US developers) to avoid handing control of the future to China?
(37:24) 5. Why haven't you accomplished your mission yet? If your organization is effective, shouldn't you have passed some of your legislation by now, or at least found some powerful Congressional sponsors for it?
(40:56) OUR TEAM
(41:53) Executive Director
(44:03) Government Relations Team
(45:12) Policy Team
(46:08) Communications Team
(47:29) Operations Team
(48:11) Personnel Changes
(48:48) OUR PLAN IF FUNDED
(51:58) OUR FUNDING SITUATION
(52:02) Our Expenses & Runway
(53:01) No Good Way to Cut Costs
(55:22) Our Revenue
(57:01) Surprise Budget Deficit
(59:00) The Bottom Line

First published: May 7th, 2025
Source: https://www.lesswrong.com/posts/J7Ju6t6QCpgbnYx4D/please-donate-to-caip-post-1-of-3-on-ai-governance
Narrated by TYPE III AUDIO.
May 7, 2025 • 22min

“UK AISI’s Alignment Team: Research Agenda” by Benjamin Hilton, Jacob Pfau, Marie_DB, Geoffrey Irving

The UK's AI Security Institute published its research agenda yesterday. This post gives more details about how the Alignment Team is thinking about our agenda. Summary: The AISI Alignment Team focuses on research relevant to reducing risks to safety and security from AI systems which are autonomously pursuing a course of action which could lead to egregious harm, and which are not under human control. No known technical mitigations are reliable past AGI. Our plan is to break down promising alignment agendas by developing safety case sketches. We'll use these sketches to identify specific holes and gaps in current approaches. We expect that many of these gaps can be formulated as well-defined subproblems within existing fields (e.g., theoretical computer science). By identifying researchers with relevant expertise who aren't currently working on alignment and funding their efforts on these subproblems, we hope to substantially increase parallel progress on alignment. [...]

Outline:
(01:41) 1. Why safety case-oriented alignment research?
(03:33) 2. Our initial focus: honesty and asymptotic guarantees
(07:07) Example: Debate safety case sketch
(08:58) 3. Future work
(09:02) Concrete open problems in honesty
(12:13) More details on our empirical approach
(14:23) Moving beyond honesty: automated alignment
(15:36) 4. List of open problems we'd like to see solved
(15:53) 4.1 Empirical problems
(17:57) 4.2 Theoretical problems
(21:23) Collaborate with us

First published: May 7th, 2025
Source: https://www.lesswrong.com/posts/tbnw7LbNApvxNLAg8/uk-aisi-s-alignment-team-research-agenda
Narrated by TYPE III AUDIO.
