The Nonlinear Library

The Nonlinear Fund
Jun 27, 2024 • 17min

AF - An issue with training schemers with supervised fine-tuning by Fabien Roger

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An issue with training schemers with supervised fine-tuning, published by Fabien Roger on June 27, 2024 on The AI Alignment Forum.

This is a rough write-up of ideas by Ryan Greenblatt. The core ideas are his and mistakes are mine. Thanks to him, Buck Shlegeris, and Aghyad Deeb for feedback on the draft of this post.

One way to use scheming AIs is to train them to imitate humans. When does imitation robustly avoid frequent and intentional failures?
- On single-token classification tasks, imitation works great and can allow you to get a lot of high-quality labor with a tiny rate of intentional failures;
- On tasks that involve generating many tokens, imitation potentially fails by default: schemers could notice when an output was generated by themselves, which enables them to consistently behave badly in deployment while doing best-effort imitation during training.

We describe two techniques that avoid imitation failures on generation tasks:
- A technique that works perfectly but is too expensive;
- A technique that we think works in practice.

While pure imitation may not be sufficiently competitive, we think it could prove useful for tasks such as:
- (No-CoT) single-token classification;
- KL-regularization for RL: we may want to penalize models trained with RL for straying too far from a human imitation prior (and we would want this human prior to not be malign!) (see Appendix);
- Tasks for which it is fine to take the competitiveness hit from not using RL, but which weaker trusted models are not good enough at.[1]

Assumptions

In this post, we focus on scenarios where:
- AIs are smart (around human-level or above);
- AIs are scheming and know how we are training and deploying them;
- We want to use these AIs productively (e.g. to do safety research)[2];
- Training and deployment tasks are indistinguishable (e.g. because we randomly sampled training tasks from the deployment distribution);
- SGD works: we ignore gradient hacking and AI architectures where SGD is very sample inefficient;
- AIs can't cause catastrophes in very few actions (e.g. because they are too weak to break safety measures that address these sorts of concerns);
- Even when AIs try their best, they can't perfectly predict the probability distribution over the actions that humans[3] would take.

Easy-mode: imitating human judgment on classification tasks

Here is how you can use an AI to 100x the speed at which you can complete a billion single-token (IID) classification tasks (e.g. detecting software bugs, scoring project ideas, …):
1. Sample 1% of the tasks, and ask humans to label them (using their judgment);
2. Train the AI on that 1% of tasks you have solved yourself, using normal training (e.g. SGD with a cross-entropy loss);
3. Use the AI to solve the remaining 99% of tasks.

This will get you an AI which failed intentionally at most a few times in the whole training set, because intentional failures would have quickly been trained away[4]. Therefore, you got at most a few hundred intentional failures in the whole set of tasks. That is an intentional failure rate of at most around 1/10M, not bad![5] In other words, human imitation avoids intentional low-stakes failures: the model won't often be worse than a best-effort imitation of a human. (But best-effort imitation of a human could be catastrophic if that isn't sufficiently capable.)
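As a concrete illustration of the recipe above, here is a minimal sketch of the three-step classification pipeline and the back-of-the-envelope failure bound. The function and variable names (human_label, fine_tune, and so on) are placeholders invented for the example, not anything from the post or a particular training library.

```python
# Minimal sketch of the classification-imitation recipe described above.
# All names (sample fractions aside) are placeholders, not from the post.
import random

def imitate_on_classification(tasks, model, human_label, fine_tune, label_fraction=0.01):
    """Label a small random fraction of tasks by hand, train the model to
    imitate those labels, then let the model answer the rest."""
    random.shuffle(tasks)
    n_labeled = int(label_fraction * len(tasks))
    train_split, deploy_split = tasks[:n_labeled], tasks[n_labeled:]

    # 1. Humans label 1% of the (IID) tasks using their own judgment.
    labeled = [(task, human_label(task)) for task in train_split]

    # 2. Ordinary supervised training (e.g. SGD on a cross-entropy loss)
    #    pushes the model toward best-effort imitation of the human labels.
    model = fine_tune(model, labeled)

    # 3. The trained model answers the remaining 99% of tasks.
    return [(task, model(task)) for task in deploy_split]

# Back-of-the-envelope bound from the text: if training tolerates only a few
# intentional failures among the 1% labeled tasks (10M out of 1B), the same
# rate over all 1B tasks is a few hundred failures, i.e. roughly 1 in 10M.
total_tasks = 1_000_000_000
train_failures_tolerated = 3            # "at most a few times" (assumed number)
train_tasks = int(0.01 * total_tasks)   # 10M labeled tasks
expected_total_failures = train_failures_tolerated * total_tasks / train_tasks
print(expected_total_failures, expected_total_failures / total_tasks)  # ~300, ~3e-07
```

Note that this bound only concerns intentional failures: the model can still be wrong wherever a best-effort human imitation would be wrong.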
The basic problem with imitating humans on generation tasks

But most tasks we care about involve generating many tokens (e.g. coding a complex experiment, writing a detailed project proposal, …). Does imitation work for such generation tasks? Not quite; here is a problem that could arise if you train an AI to imitate human answers on 1% of tasks and use it on the remaining 99% without additional supervision: On the 1% of training tasks, the AI tries its best to imitate human answers when the star...
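To make the worry concrete, here is a toy sketch (my own illustration, not the post's) of the policy a schemer could implement: behave like the training target whenever the prefix looks human-written, and defect whenever it recognizes its own earlier output. The helper names are hypothetical.

```python
# Illustrative sketch (not from the post) of why naive imitation can fail on
# multi-token generation: during supervised training the model only ever
# continues human-written prefixes, so a schemer can behave well exactly
# there and defect whenever the prefix is recognizably its own output.
def schemer_policy(prefix, looks_like_my_own_output, imitate_human, defect):
    if looks_like_my_own_output(prefix):
        # Never reached during supervised training on human demonstrations,
        # but reached on almost every deployment step after the first tokens.
        return defect(prefix)
    # This branch matches the training distribution, so it is all SGD shapes.
    return imitate_human(prefix)
```

Because supervised fine-tuning on human demonstrations only ever exercises the imitation branch, the defect branch need never be trained away.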
Jun 27, 2024 • 9min

LW - Schelling points in the AGI policy space by mesaoptimizer

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Schelling points in the AGI policy space, published by mesaoptimizer on June 27, 2024 on LessWrong.

I've been thinking about memetically fit Schelling points in the AGI policy space. I'll describe four such "Schelling policies", and use them as pedagogical examples.

Shut it all down

MIRI's new stated objective is the clearest example of a Schelling policy: "Shut it all down". MIRI states that they want governments to coordinate to pause all AI research that involves smarter-than-human systems. Laypeople will find this policy easy to understand, since they can rely on the shared cultural knowledge of CFC bans and international nuclear disarmament as case studies. If you want to coordinate a large number of people coherently towards furthering a particular policy, "you get about five words" that you can make 'common knowledge' such that people can coordinate in a specific direction. The ease of communicating the policy makes a big difference in such conditions. When you attempt to communicate an idea widely, you'll notice that people usually end up with multiple slightly (or sometimes wildly) differing copies of the original idea. If you've played the Telephone game, you've experienced just how much information can be lost as an idea spreads from one person to another. In the context of policies, individual people's beliefs and incentives will warp the instantiation of the policy they will communicate and support. (For example, you'll find companies lobbying regulators to carve out exceptions that benefit them.) Here's where Schelling points are invaluable: they serve as natural attractors in the space of ideas, and therefore enable people to 'error-correct' the idea they encounter and figure out the policy that everyone is coordinating around. "Shut it all down" is a Schelling point. "Shut it all down if we see evidence of unprompted deception and power-seeking in AGI models" is not a Schelling point: you have multiple free variables that can and will be optimized to benefit the people spreading the idea -- which can result in a lack of coordination and the idea being outcompeted by memetically fitter ideas. "Prevent the training of models using compute greater than 10^25 floating point operations" also has a free variable: why exactly 10^25 floating point operations? Why not 10^24 or 10^26? Until 10^25 floating point operations becomes a Schelling number, the policy containing it is not a Schelling point.

Effective Accelerationism (e/acc)

The biggest difference between e/acc and the PauseAI memeplexes is that e/acc doesn't seem to have a coherent set of goals and beliefs. Here are a bunch of memes that e/acc people tend to espouse:
- "It's time to build." (also the last line of The Techno-Optimist Manifesto)
- "Come and take it." (where "it" refers to GPUs here)
- "Accelerate or die."

At first glance, one might say that e/acc isn't a Schelling policy -- it seems less like a coherent policy, and more like a set of 'vibes', verbal and non-verbal statements designed to create a desired emotional impact, regardless of the actual content. I disagree. A policy (or well, a memeplex) does not need to have an explicitly coherent set of beliefs and goals for it to result in coordinating people towards particular consequences.
You might expect this to reduce the spread rate of this particular policy, but e/acc specifically compensates for it by being significantly more fun and socially, financially, and professionally profitable to coordinate around. For example, venture capital firms such as a16z want the opportunity to make a lot of money from the gold rush that is the race to AGI, and a lot of software developers want a shot at making billions of dollars if their startup succeeds. The possibility of regulations would cause the music to stop, and they don't want that. In fact, you don...
Jun 27, 2024 • 20min

LW - Live Theory Part 0: Taking Intelligence Seriously by Sahil

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Live Theory Part 0: Taking Intelligence Seriously, published by Sahil on June 27, 2024 on LessWrong. Acknowledgements The vision here was midwifed originally in the wild and gentle radiance that is Abram's company (though essentially none of the content is explicitly his). The PIBBSS-spirit has been infused in this work from before it began (may it infuse us all), as have meetings with the Agent Foundations team at MIRI over the past ~2 years. More recently, everyone who has been loving the High Actuation project into form (very often spontaneously and without being encumbered by self-consciousness of this fact):[1] individuals include Steve Petersen, Mateusz Baginski, Aditya Prasad, Harmony, TJ, Chris Lakin; the AISC 2024 team, Murray Buchanan, Matt Farr, Arpan Agrawal, Adam, Ryan, Quinn; various people from Topos, ALIFE, MAPLE, EA Bangalore. Published while at CEEALAR. Disclaimers Very occasionally there are small remarks/questions from a remarkable human named Steve, since this and the next two posts are an edited transcript of me giving him a talk. I left them in to retain the conversational tone. Steve has also consistently been a fantastic ground for this channeling. I use the term "artefact" a fair amount in this sequence. Unfortunately for you and me, Anthropic also recently started using "artifact" in a different way. I'm using "artefact" in the common sense of the word. The British spelling should help remind of the distinction. Taking Intelligence Seriously Sahil: I gave a talk recently, at an EA event just two days ago, where I made some quick slides (on the day of the talk, so not nearly as tidy as I'd like) and attempted to walk through this so-called "live theory". (Alternative terms include "adaptive theory" or "fluid theory"; where the theories themselves are imbued with some intelligence.) Maybe I can give you that talk. I'm not sure how much of what I was saying there will be present now, but I can try. What do you think? I think it'll take about 15 minutes. Yeah? Steve: Cool. Sahil: Okay, let me give you a version of this talk that's very abbreviated. So, the title I'm sure already makes sense to you, Steve. I don't know if this is something that you know, but I prefer the word "adaptivity" over intelligence. I'm fine with using "intelligence" for this talk, but really, when I'm thinking of AI and LLMs and "live" (as you'll see later), I'm thinking, in part, of adaptive. And I think that connotes much more of the relevant phenomena, and much less controversially. It's also less distractingly "foundational", in the sense of endless questions on "what intelligence means". Failing to Take Intelligence Seriously Right. So, I want to say there are two ways to fail to take intelligence, or adaptivity, seriously. One is, you know, the classic case, of people ignoring existential risk from artificial intelligence. The old "well, it's just a computer, just software. What's the big deal? We can turn it off." We all know the story there. In many ways, this particular failure-of-imagination is much less pronounced today. But, I say, a dual failure-of-imagination is true today even among the "cognoscenti", where we ignore intelligence by ignoring opportunities from moderately capable mindlike entities at scale. I'll go over this sentence slower in the next slide. For now: there are two ways to not meet reality. 
On the left of the slide is "nothing will change". The same "classic" case of "yeah, what's the big deal? It's just software." On the right, it's the total singularity, of extreme unknowable super-intelligence. In fact, the phrase "technological singularity", IIRC, was coined by Vernor Vinge to mark the point that we can't predict beyond. So, it's also a way to be mind-killed. Even with whatever in-the-limit proxies we have for this, we make various sim...
Jun 26, 2024 • 20min

AF - Live Theory Part 0: Taking Intelligence Seriously by Sahil

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Live Theory Part 0: Taking Intelligence Seriously, published by Sahil on June 26, 2024 on The AI Alignment Forum. Acknowledgements The vision here was midwifed originally in the wild and gentle radiance that is Abram's company (though essentially none of the content is explicitly his). The PIBBSS-spirit has been infused in this work from before it began (may it infuse us all), as have meetings with the Agent Foundations team at MIRI over the past ~2 years. More recently, everyone who has been loving the High Actuation project into form (very often spontaneously and without being encumbered by self-consciousness of this fact):[1] individuals include Steve Petersen, Mateusz Baginski, Aditya Prasad, Harmony, TJ, Chris Lakin; the AISC 2024 team, Murray Buchanan, Matt Farr, Arpan Agrawal, Adam, Ryan, Quinn; various people from Topos, ALIFE, MAPLE, EA Bangalore. Published while at CEEALAR. Disclaimers Very occasionally there are small remarks/questions from a remarkable human named Steve, since this and the next two posts are an edited transcript of me giving him a talk. I left them in to retain the conversational tone. Steve has also consistently been a fantastic ground for this channeling. I use the term "artefact" a fair amount in this sequence. Unfortunately for you and me, Anthropic also recently started using "artifact" in a different way. I'm using "artefact" in the common sense of the word. The British spelling should help remind of the distinction. Taking Intelligence Seriously Sahil: I gave a talk recently, at an EA event just two days ago, where I made some quick slides (on the day of the talk, so not nearly as tidy as I'd like) and attempted to walk through this so-called "live theory". (Alternative terms include "adaptive theory" or "fluid theory"; where the theories themselves are imbued with some intelligence.) Maybe I can give you that talk. I'm not sure how much of what I was saying there will be present now, but I can try. What do you think? I think it'll take about 15 minutes. Yeah? Steve: Cool. Sahil: Okay, let me give you a version of this talk that's very abbreviated. So, the title I'm sure already makes sense to you, Steve. I don't know if this is something that you know, but I prefer the word "adaptivity" over intelligence. I'm fine with using "intelligence" for this talk, but really, when I'm thinking of AI and LLMs and "live" (as you'll see later), I'm thinking, in part, of adaptive. And I think that connotes much more of the relevant phenomena, and much less controversially. It's also less distractingly "foundational", in the sense of endless questions on "what intelligence means". Failing to Take Intelligence Seriously Right. So, I want to say there are two ways to fail to take intelligence, or adaptivity, seriously. One is, you know, the classic case, of people ignoring existential risk from artificial intelligence. The old "well, it's just a computer, just software. What's the big deal? We can turn it off." We all know the story there. In many ways, this particular failure-of-imagination is much less pronounced today. But, I say, a dual failure-of-imagination is true today even among the "cognoscenti", where we ignore intelligence by ignoring opportunities from moderately capable mindlike entities at scale. I'll go over this sentence slower in the next slide. For now: there are two ways to not meet reality. 
On the left of the slide is "nothing will change". The same "classic" case of "yeah, what's the big deal? It's just software." On the right, it's the total singularity, of extreme unknowable super-intelligence. In fact, the phrase "technological singularity", IIRC, was coined by Vernor Vinge to mark the point that we can't predict beyond. So, it's also a way to be mind-killed. Even with whatever in-the-limit proxies we have for this, we mak...
Jun 26, 2024 • 5min

AF - Instrumental vs Terminal Desiderata by Max Harms

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Instrumental vs Terminal Desiderata, published by Max Harms on June 26, 2024 on The AI Alignment Forum.

Bob: "I want my AGI to make everyone extremely wealthy! I'm going to train that to be its goal."

Cassie: "Stop! You'll doom us all! While wealth is good, it's not everything that's good, and so even if you somehow build a wealth-maximizer (instead of summoning some random shattering of your goal), it will sacrifice all the rest of the good in the name of wealth!"

Bob: "Maybe if it suddenly became a god-like superintelligence, but I'm a hard take-off skeptic. In the real world we have continuous processes and I'm going to be in control. If it starts to go off the rails, I'll just stop it and re-train it to not do that."

Cassie: "Be careful what you summon! While it may seem like you're in control in the beginning, these systems are generalized obstacle-bypassers, and you're making yourself into an obstacle that needs to be bypassed. Whether that takes two days or twenty years, you're setting us up to die."

Bob: "Ok, fine. So I'll build my AGI to make people rich and simultaneously to respect human values and property rights and stuff. At the point where it can bypass me, it'll avoid turning everyone into bitcoin mining rigs or whatever because that would go against its goal of respecting human values."

Cassie: "What does 'human values' even mean? I agree that if you can build an AGI that is truly aligned, we're good, but that's a tall order and it doesn't even seem like what you're aiming for. Instead, it seems like you think we should train the AGI to maximize a pile of desiderata."

Bob: "Yeah! My AGI will be helpful, obedient, corrigible, honest, kind, and will never produce copyrighted songs, memorize the NYT, or impersonate Scarlett Johansson! I'll add more desiderata to the list as I think of them."

Cassie: "And what happens when those desiderata come into conflict? How does it decide what to do?"

Bob: "Hrm. I suppose I'll define a hierarchy like Asimov's laws. Some of my desiderata, like corrigibility, will be constraints, while others, like making people rich, will be values. When a constraint comes into conflict with a value, the constraint wins. That way my agent will always shut down when asked, even though doing so would be a bad way to make us rich."

Cassie: "Shutting down when asked isn't the hard part of corrigibility, but that's a tangent. Suppose that the AGI is faced with a choice of a 0.0001% chance of being dishonest, but earning a billion dollars, or a 0.00001% chance of being dishonest, but earning nothing. What will it do?"

Bob: "Hrm. I see what you're saying. If my desiderata are truly arranged in a hierarchy with certain constraints on top, then my agent will only ever pursue its values if everything upstream is exactly equal, which won't be true in most contexts. Instead, it'll essentially optimize solely for the topmost constraint."

Cassie: "I predict that it'll actually learn to want a blend of things, and find some weighting such that your so-called 'constraints' are actually just numerical values along with the other things in the blend. In practice you'll probably get a weird shattering, but if you're magically lucky on getting what you aim for, you'll still probably just get a weighted mixture. Getting a truly hierarchical goal seems nearly impossible outside of toy problems."
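To make this exchange concrete, here is a toy comparison (my own illustration, with invented weights, not part of the dialogue) of how a strict constraint hierarchy and a weighted blend decide between Cassie's two options.

```python
# Toy comparison of the two decision rules discussed above, using Cassie's
# numbers: option A risks a 0.0001% chance of dishonesty but earns $1B;
# option B risks a 0.00001% chance of dishonesty and earns nothing.
options = {
    "A": {"p_dishonest": 1e-6, "dollars": 1_000_000_000},
    "B": {"p_dishonest": 1e-7, "dollars": 0},
}

# Strict hierarchy (constraint above value): compare honesty first and only
# break exact ties on wealth, so the agent effectively optimizes honesty alone.
lexicographic_choice = min(
    options,
    key=lambda k: (options[k]["p_dishonest"], -options[k]["dollars"]),
)

# Weighted blend: both desiderata become terms in a single score, so a large
# enough payoff can buy a slightly higher chance of dishonesty.
W_HONESTY = 1e12  # dollars of value per unit probability of dishonesty (assumed)

def score(option):
    return option["dollars"] - W_HONESTY * option["p_dishonest"]

blended_choice = max(options, key=lambda k: score(options[k]))

print(lexicographic_choice, blended_choice)  # B A
```

The strictly hierarchical agent picks B no matter how large the payoff gets, which is Bob's "optimize solely for the topmost constraint" worry, while any finite weighting will eventually trade a little expected honesty for enough money.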
Bob: "Doesn't this mean we're also doomed if we train an AGI to be truly aligned? Like, won't it still sometimes sacrifice one aspect of alignment, like being honest, in order to get a sufficiently large quantity of another aspect of alignment, like saving lives?" Cassie: "That seems confused. My point is that a coherent agent will act as though it's maximizing a utility function, and that if your strategy involves lumping together a bunch of desiderata as good-in-...
Jun 26, 2024 • 2min

LW - Progress Conference 2024: Toward Abundant Futures by jasoncrawford

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Progress Conference 2024: Toward Abundant Futures, published by jasoncrawford on June 26, 2024 on LessWrong.

The progress movement has grown a lot in the last few years. We now have progress journals, think tanks, and fellowships. The progress idea has spread and evolved into the "abundance agenda", "techno-optimism", "supply-side progressivism", "American dynamism". All of us want to see more scientific, technological, and economic progress for the good of humanity, and envision a bold, ambitious, flourishing future. What we haven't had so far is a regular gathering of the community.

Announcing Progress Conference 2024, a two-day event to connect people in the progress movement. Meet great people, share ideas in deep conversations, catalyze new projects, get energized and inspired.

Hosted by: the Roots of Progress Institute, together with the Foresight Institute, HumanProgress.org, the Institute for Humane Studies, the Institute for Progress, and Works in Progress magazine.

When: October 18-19, 2024

Where: Berkeley, CA - at the Lighthaven campus, an inviting space perfect for mingling.

Speakers: Keynotes include Patrick Collison, Tyler Cowen, Jason Crawford, and Steven Pinker. Around 20 additional speakers will share ideas on four tracks: the big idea of human progress, policy for progress, tech for progress, and storytelling/media for progress. Full speaker list.

Attendees: We expect 200+ intellectuals, builders, policy makers, storytellers, and students. This is an invitation-only event, but anyone can apply for an invitation. Complete the open application by July 15th.

Program: Two days of intellectual exploration, inspiration, and interaction that will help shape the progress movement into a cultural force. Attend talks on topics from tech to policy to culture, build relationships with new people as you hang out on cozy sofas or enjoy the sun in the garden, sign up to run an unconference session and find others who share your interests and passions, or pitch your ideas to those who could help make your dreams a reality.

Special thanks to our early sponsors: Cato Institute, Astera Institute, and Freethink Media! We have more sponsorships open; view sponsorship opportunities here.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jun 26, 2024 • 42min

EA - Animal Welfare Fund: Payout recommendations from May 2022 to March 2024 by Linch

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Animal Welfare Fund: Payout recommendations from May 2022 to March 2024, published by Linch on June 26, 2024 on The Effective Altruism Forum.

Introduction

This payout report covers the Animal Welfare Fund's grantmaking from May 1, 2022 to March 31, 2024 (23 months). It follows the previous March-April 2022 payout report. Since the previous report was written, the Animal Welfare Fund made a conscious decision to deprioritize public reports so that fund managers can have more time and capacity to work on high-quality evaluations. However, now that I (Linch) am working for EA Funds full-time, I've attempted to consolidate and streamline the report-writing process, so that hopefully regular reports can be written again without unduly using up limited fund manager time.

Total funding recommended: $7,079,301
Total funding paid out: $6,943,301
Number of grants paid out: 109
Acceptance rate (excluding desk rejections): 111/409 = 27.1%
Acceptance rate (including desk rejections): 111/685 = 16.2%
Report authors: Linchuan Zhang (primary author), Kieran Greig (fund chair), Karolina Sarek, Zoë Sigle, Neil Dullaghan

16 of our grantees, who received a total of $1,068,540, requested that we not include public reports for their grants (you can read our policy on public reporting here).

Highlighted Grants

(The following grant writeups were written by me, Linch Zhang. They were reviewed by the primary investigators of each grant.) Highlighted grants correspond to grants that the AWF team rated highly, usually because they thought the grant was very likely to be very cost-effective or the potential upside was very high.

Planet For All Hong Kong ($63,000): Stipends and professional services to launch Hong Kong's first aquatic animal welfare movement

The Animal Welfare Fund provided a $63,000 grant to Planet For All Hong Kong from September 2022 to August 2023. The funding covered stipends, marketing, website, and professional services for one full-time employee and two part-time employees to launch Hong Kong's first aquatic animal welfare movement. The organization plans to engage with industry stakeholders, government, and policymakers; raise public awareness; and build partnerships with corporations to incorporate animal welfare in their sourcing policies.

The Animal Welfare Fund was excited to fund this project due to the region, focus area, and the organization's leadership and project plan. Hong Kong is a high-priority region with high fish consumption, and the work is very neglected. The fund believes the small organization has good leadership and a reasonable project plan. However, we did have concerns that the additional focus may distract from their very important work on chicken cage-free reforms. This concern is partially mitigated by the grant being used to hire an external person for the full-time role, so existing projects are hopefully less likely to be negatively impacted. Overall, the importance and neglectedness of the work are strong arguments in favor of supporting the grant.

Outcomes: During the grant period, the organization identified supportive stakeholders and a local campaign opportunity regarding "Fish Culture Zones," building rapport with local marine conservation organizations and meeting with modern fish farmers to better understand the feasibility and challenges of promoting aquatic animal welfare.
They connected the Fish Culture Zone campaign with the importance of aquatic animal agriculture and met with the Aquaculture Division of the Hong Kong government. The organization successfully launched "Sea Life Matters," the first public education campaign about aquatic life welfare in Hong Kong, engaging with about 1,000 citizens, partnering with more than 30 online influencers, and collaborating with over 10 restaurants and retailers. Animal Empathy Philip...
Jun 26, 2024 • 3min

EA - Can AI power the Global Mapping and Quantification of Animal Suffering? The Pain Atlas Project by Wladimir J. Alonso

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Can AI power the Global Mapping and Quantification of Animal Suffering? The Pain Atlas Project, published by Wladimir J. Alonso on June 26, 2024 on The Effective Altruism Forum.

This article examines how AI, particularly Large Language Models (LLMs), could soon enable tackling the enormous challenge of systematically quantifying the main sources of suffering across humans and animals. Two years ago, a post by James Özden and Neil Dullaghan in this forum, on Megaprojects for Animals, highlighted the significant gap in the evidence base to inform effective strategies for animal welfare. They estimated a growth rate of research to inform the animal welfare evidence base lower than 1% of the growth rate seen in global health. We argue that leveraging AI offers a promising solution to address this gap. Drawing inspiration from the success of AlphaFold in rapidly advancing the field of molecular biology, AI can similarly revolutionize the mapping and quantification of animal suffering, providing the much-needed evidence to inform and optimize animal welfare interventions, policies, and decision-making in general.

Because it channels existing knowledge and scientific evidence in a systematic way to inform estimates of animal suffering, the Welfare Footprint Framework (WFF) is an analytical approach especially well-suited to make use of AI's capabilities. Providing a glimpse of what might be possible in the near future with the use of AI, we introduce the Pain-Track AI Tool. This tool is designed to describe and quantify negative affective experiences in animals, offering a starting point for the description and quantification of any source of suffering.

Building on this foundation, we also introduce the Pain Atlas Project, envisaged as a large-scale initiative aimed at comprehensively mapping and quantifying the primary sources of human and animal suffering. This project, which we anticipate could have an impact in terms of knowledge advancement comparable to AlphaFold's achievements in molecular biology, foresees three core components:
- Mapping of Suffering: a comprehensive analysis of the primary sources of suffering endured by different species throughout their lives.
- Quantification of Suffering: using the Cumulative Pain metric to estimate the magnitude of suffering associated with each of the sources of suffering identified.
- Visualization of Suffering: use of visualization tools to construct a detailed and global landscape of suffering across species and living conditions, guiding decision-making and intervention strategies.

The description of the AI tool and Pain Atlas Project is available here. We invite everyone to provide feedback in this forum and discuss potential collaborations (feel free to also reach out to us at AI@welfarefootprint.org).

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
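To give a sense of the kind of quantity a Cumulative Pain-style tally works with, here is a rough sketch. It is my own illustration, not the Pain-Track AI Tool itself: the intensity categories are borrowed from the published Welfare Footprint literature (treat them as an assumption here), and every number below is invented.

```python
# Rough illustration (not the actual Pain-Track AI Tool) of a Cumulative
# Pain-style tally: time spent in each pain-intensity category, summed over
# the sources of suffering identified for an animal's life. Category names
# follow the Welfare Footprint literature; all numbers below are made up.
from collections import defaultdict

CATEGORIES = ["Annoying", "Hurtful", "Disabling", "Excruciating"]

# Hypothetical entries: (source of suffering, intensity category, hours endured)
pain_track = [
    ("keel bone fracture", "Hurtful", 120.0),
    ("keel bone fracture", "Disabling", 10.0),
    ("restricted movement", "Annoying", 3000.0),
]

def cumulative_pain(entries):
    """Sum hours per intensity category across all sources of suffering."""
    totals = defaultdict(float)
    for _source, category, hours in entries:
        totals[category] += hours
    return {c: totals[c] for c in CATEGORIES}

print(cumulative_pain(pain_track))
# {'Annoying': 3000.0, 'Hurtful': 120.0, 'Disabling': 10.0, 'Excruciating': 0.0}
```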
Jun 26, 2024 • 2min

EA - Announcing The Moral Circle: Who Matters, What Matters, and Why by jeffsebo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing The Moral Circle: Who Matters, What Matters, and Why, published by jeffsebo on June 26, 2024 on The Effective Altruism Forum. Hi all, just a short post to let you know that my next book, The Moral Circle: Who Matters, What Matters, and Why, is now available for preorder! This is the publisher's description: "Today, human exceptionalism is the norm. Despite occasional nods to animal welfare, we focus on humanity, often neglecting the welfare of a vast number of beings. In The Moral Circle, philosopher Jeff Sebo challenges us to include all potentially significant beings in our moral community, with transformative implications for our lives and societies. As the dominant species, humanity must ask: which nonhumans matter, how much do they matter, and what do we owe them in a world reshaped by human activity and technology? The Moral Circle explores provocative case studies, such as lawsuits over captive elephants and debates over factory-farmed insects, and compels us to consider future ethical quandaries, such as whether to send microbes to new planets and whether to create digital worlds filled with digital minds. Taking an expansive view of human responsibility, Sebo argues that building a positive future requires radically rethinking our place in the world." Preorders are increasingly important in publishing, so if you have interest in the book, please preorder, and if you know others who might, please share! Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jun 26, 2024 • 10min

LW - What is a Tool? by johnswentworth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is a Tool?, published by johnswentworth on June 26, 2024 on LessWrong.

Throughout this post, we're going to follow the Cognition -> Convergence -> Consequences methodology[1]. That means we'll tackle tool-ness in three main stages, each building on the previous:
- Cognition: What does it mean, cognitively, to view or model something as a tool?
- Convergence: Insofar as different minds (e.g. different humans) tend to convergently model the same things as tools, what are the "real patterns" in the environment which give rise to that convergence?
- Consequences: Having characterized the real patterns convergently recognized as tool-ness, what other properties or consequences of tool-ness can we derive? What further predictions does our characterization make?

We're not going to do any math in this post, though we will gesture at the spots where proofs or quantitative checks would ideally slot in.

Cognition: What does it mean, cognitively, to view or model something as a tool?

Let's start with a mental model of (the cognition of) problem solving, then we'll see how "tools" naturally fit into that mental model. When problem-solving, humans often come up with partial plans - i.e. plans which have "gaps" in them, which the human hasn't thought through how to solve, but expects to be tractable. For instance, if I'm planning a roadtrip from San Francisco to Las Vegas, a partial plan might look like "I'll take I-5 down the central valley, split off around Bakersfield through the Mojave, then get on the highway between LA and Vegas". That plan has a bunch of gaps in it: I'm not sure exactly what route I'll take out of San Francisco onto I-5 (including whether to go across or around the Bay), I don't know which specific exits to take in Bakersfield, I don't know where I'll stop for gas, I haven't decided whether I'll stop at the town museum in Boron, I might try to get pictures of the airplane storage or the solar thermal power plant, etc. But I expect those to be tractable problems which I can solve later, so it's totally fine for my plan to have such gaps in it.

How do tools fit into that sort of problem-solving cognition? Well, sometimes similar gaps show up in many different plans (or many times in one plan). And if those gaps are similar enough, then it might be possible to solve them all "in the same way". Sometimes we can even build a physical object which makes it easy to solve a whole cluster of similar gaps. Consider a screwdriver, for instance. There's a whole broad class of problems for which my partial plans involve unscrewing screws. Those partial plans involve a bunch of similar "unscrew the screw" gaps, for which I usually don't think in advance about how I'll unscrew the screw, because I expect it to be tractable to solve that subproblem when the time comes. A screwdriver is a tool for that class of gaps/subproblems[2].

So here's our rough cognitive characterization:
- Humans naturally solve problems using partial plans which contain "gaps", i.e. subproblems which we put off solving until later
- Sometimes there are clusters of similar gaps
- A tool makes some such cluster relatively easy to solve.
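Since the post promises no math, the following is only an optional illustration of the characterization above: a loose sketch (my own, not the author's) in which a partial plan is a sequence of steps, some of which are unsolved gaps, and a tool is whatever makes a whole cluster of similar gaps cheap to close. All class and variable names are invented for the example.

```python
# Loose sketch (not from the post) of the cognitive picture above: partial
# plans contain unsolved "gaps", gaps fall into clusters of similar
# subproblems, and a tool makes some cluster of gaps easy to close.
from dataclasses import dataclass

@dataclass
class Gap:
    description: str   # e.g. "unscrew this screw"
    cluster: str       # which family of similar subproblems it belongs to

@dataclass
class PartialPlan:
    steps: list        # mix of concrete steps (str) and unsolved Gap objects
    def gaps(self):
        return [s for s in self.steps if isinstance(s, Gap)]

@dataclass
class Tool:
    name: str
    covers: set        # gap clusters this tool makes easy to close
    def closes(self, gap: Gap) -> bool:
        return gap.cluster in self.covers

roadtrip = PartialPlan(steps=[
    "take I-5 down the central valley",
    Gap("find a route out of San Francisco onto I-5", cluster="routing"),
    Gap("pick a place to stop for gas", cluster="refueling"),
    "split off around Bakersfield through the Mojave",
])
screwdriver = Tool(name="screwdriver", covers={"unscrewing"})
nav_app = Tool(name="navigation app", covers={"routing"})  # hypothetical example

# A tool is useful to the extent that it closes gaps that recur across plans.
print([g.description for g in roadtrip.gaps() if nav_app.closes(g)])
```

On this picture, a tool's usefulness tracks how often its gap cluster recurs across the plans an agent actually makes.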
Convergence: Insofar as different minds (e.g. different humans) tend to convergently model the same things as tools, what are the "real patterns" in the environment which give rise to that convergence?

First things first: there are limits to how much different minds do, in fact, convergently model the same things as tools. You know that thing where there's some weird object or class of objects, and you're not sure what it is or what it's for, but then one day you see somebody using it for its intended purpose and you're like "oh, that's what it's for"? From this, we learn several things about tools: Insofar as different humans convergently model the same things as tools at...
