The Nonlinear Library

The Nonlinear Fund
Jun 9, 2024 • 6min

EA - Is Nigerian nurse emigration really a "win-win"? Critique of a CGD article by NickLaing

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Is Nigerian nurse emigration really a "win-win"? Critique of a CGD article, published by NickLaing on June 9, 2024 on The Effective Altruism Forum.

I write to raise what I think is a fundamental flaw in a Center for Global Development (CGD) article about immigration.[1] I posted here only after receiving what I considered inadequate feedback from the authors over the last 3 months.[2] The major argument of the article "UK recruitment of nurses can be a win-win" is that the current situation, where Nigeria exports an ever-increasing number of nurses to the UK, could be good for both countries. For Nigeria, the benefits come both through the classic economic benefits of remittances and through stimulating increased nurse training in Nigeria to compensate for the loss of nurses.

A Misleading Datapoint

The central datapoint on which their argument rests seems misleading. The authors cite the number of nurses who leave Nigeria each year for the UK alone and claim that increased nurse training in Nigeria is enough to replace these nurses. "Between late 2021 and 2022, the number of successful national nursing exam candidates increased by 2,982 - that is, more than enough to replace those who had left for the UK." Technically, they are correct that the number trained replaces those who leave for the UK alone, but they don't consider the majority of nurses who left for other countries. Emigration to the UK constitutes under 25% of the total nurse emigration from Nigeria. A more meaningful data point would have been the total number of nurses that leave Nigeria for all countries. Based on this Guardian article (and others), about 29,000 new nurses were registered in Nigeria over the last 3 years, while 42,000 left. The total number of nurses in Nigeria is reducing, not increasing as they claim.

To express this situation graphically, the best graph to illustrate whether or not nurse migration is a "Win-Win" for both Nigeria and England might have looked more like this (forgive the poor formatting!). Over the last 3 years Nigeria has lost a net 13,500 nurses. This is a loss of about 1% of their nurse workforce a year, while Nigeria needs an increase of around 2.5% in nurses yearly just to keep up with population growth. This assumes that no nurses left or joined the Nigerian workforce for other reasons. Nurses may leave the Nigerian workforce due to retirement or for other work, while nurses could also be entering Nigeria from other countries to work - I doubt these adjustments would make a big difference to the overall analysis. Based on this data, it looks like England will win and Nigeria will lose.

My main claim is that it is incorrect to claim a win-win scenario for two countries when emigration from Nigeria includes a majority of nurses leaving for many other countries - not just the UK. I'm very happy to be shown where I've gone wrong here and welcome any comments!

An Author's response

Privately, one of the authors briefly responded that their argument is based on the change in trainees and migrants over time. This still does not address my concern: you cannot look at migration outflows and inflows between two countries in a vacuum. The author also noted WHO data that shows a flat trend in the number of nurses per 1,000 people.
I agree that tracking "nurses per capita" over time would be the best way to measure whether the nurse situation in Nigeria is improving or deteriorating. However, the World Bank data appears grossly inaccurate. Their "nurses per 1,000 population" number fluctuates implausibly, from 1.75 per 1,000 in 2016 to almost half that (0.9 per 1,000) in 2018, then back up to 1.5 per 1,000 the next year. Unless 100,000 nurses left Nigeria over 2 years and then flooded back in the next year (not the case), the data is absurd and not to be trusted. The most proximate data we have to underst...
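To make the arithmetic in the excerpt above concrete, here is a minimal sketch that reproduces the net-loss figures from the numbers the post quotes (29,000 new registrations and 42,000 departures over three years). The workforce total is my own assumption, chosen only so that the post's "about 1% a year" figure roughly holds; it is not taken from the post.

```python
# Back-of-the-envelope check of the post's net-loss arithmetic, using only the
# figures quoted above. The workforce size is an ASSUMPTION (not from the post),
# picked so that the "about 1% of the workforce per year" claim roughly holds.

new_nurses_3yr = 29_000   # new registrations over the last 3 years (Guardian figures cited above)
left_3yr = 42_000         # nurses who emigrated over the same period
workforce = 450_000       # assumed total nurse workforce in Nigeria

net_loss_3yr = left_3yr - new_nurses_3yr            # roughly 13,000 nurses
annual_loss_share = net_loss_3yr / 3 / workforce    # roughly 1% per year
needed_growth_share = 0.025                         # ~2.5%/year needed to keep up with population growth

print(f"Net loss over 3 years: {net_loss_3yr:,} nurses")
print(f"Annual net loss: {annual_loss_share:.1%} of the assumed workforce")
print(f"Shortfall vs. needed growth: {annual_loss_share + needed_growth_share:.1%} per year")
```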
Jun 9, 2024 • 4min

EA - Being swallowed live is a common way for wild aquatic animals to die by MichaelStJules

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Being swallowed live is a common way for wild aquatic animals to die, published by MichaelStJules on June 9, 2024 on The Effective Altruism Forum. and it probably involves suffocation and chemical burns in digestive juices over minutes. This came up for this post, but seemed worth pointing out separately in its own post, so this post is mostly copied from there. I don't consider the referenced sources here highly reliable for how exactly prey die when swallowed live, but I wasn't able to find anything better when I checked. Predation is one of the most common ways for wild aquatic animals to die, perhaps the most common way (Dall et al., 1991 for penaeid shrimp, Hurst, 2007 for fish during the winter, Pauly & Palomares, 1989 for Peruvian anchoveta, Vollset et al., 2023). (Predation is also the largest cause of death among terrestrial vertebrates (Hill et al., 2019)). Predatory/carnivorous fish typically swallow their prey whole (Lindsay, 1984, Meekan et al., 2018, Luiz, 2019, Yasuda, 1960, Amundsen & Sánchez-Hernández, 2019, St John, 1999, Gill, 2003, Lundstedt et al., 2004), and so without tearing or chewing. This is despite having teeth. This is not true of all predatory fish, as Lindsay (1984) wrote: Fish that break up food by means of pharyngeal teeth (cyprinids) or other modifications to the buccal cavity (wrasse, rays) had low chitinase activity in spite of consuming chitin in the diet while those fish that tend to gulp prey whole (salmonids, gadoids, perch, eel, red mullet, gurnards, mackerel) had high activity. This suggests that the primary function of gastric chitinase is to disrupt the chitinous envelope of prey allowing access to the soft inner tissues by the digestive juices. This role would also be functional in those fish destined to be piscivorous as adults but which gulp down relatively large invertebrate prey when young. The cause of death in live swallowing by fish seems most likely to be suffocation/asphyxiation, i.e. too little oxygen in the prey's blood, due to too little dissolved oxygen in the predator fish's stomach or due to damage to the prey's gills from digestive juices (Waterfield, 2021, Reddit AskScience thread, Poe GPT-4o, Poe Web-Search). Other possibilities include digestive processes (stomach acid, enzymes) and mechanical injury, e.g. crushing (own guesses, Poe GPT-4o, Poe Web-Search). Fish can survive minutes outside of water or without oxygen in water, and another fish's stomach may have some swallowed oxygenated water. The fisheries scientist Gerald Waterfield (2021) wrote: My best estimate of the time that the consumed fish stays alive is from about 15 to 25 minutes, after which the fish dies from lack of oxygen. This process starts as soon as the fish enters the predator's throat. It happens a little slower at lower temperatures. Even if the prey fish were regurgitated a few minutes fewer than this time, it probably would still expire due to brain damage from the restricted oxygen intake and it would be blinded by its eyes having been greatly damaged from stomach acid. On the other hand, Poe GPT-4o responded that asphyxiation "can cause death within a minute or two" and death "typically occurs within a matter of seconds to a few minutes, primarily due to asphyxiation and physical trauma", but could not provide direct sources for its claims when prompted. 
My best guess is that it takes at least minutes, in line with the survival time for fish out of water, and because they probably won't have been substantially injured until reaching the stomach. During this time, besides potential suffering from suffocation and fear, they probably suffer from chemical burns and tissue damage from digestive juices. They might lose consciousness and so stop suffering some time before they die, but I don't know how long before. Thanks for listenin...
Jun 8, 2024 • 30min

EA - Summary of Situational Awareness - The Decade Ahead by OscarD

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Summary of Situational Awareness - The Decade Ahead, published by OscarD on June 8, 2024 on The Effective Altruism Forum.

The original is by Leopold Aschenbrenner; this summary is not commissioned or endorsed by him.

Short Summary

Extrapolating existing trends in compute, spending, algorithmic progress, and energy needs implies AGI (remote jobs being completely automatable) by ~2027. AGI will greatly accelerate AI research itself, leading to vastly superhuman intelligences being created ~1 year after AGI. Superintelligence will confer a decisive strategic advantage militarily by massively accelerating all spheres of science and technology. Electricity use will be a bigger bottleneck on scaling datacentres than investment, but is still doable domestically in the US by using natural gas. AI safety efforts in the US will be mostly irrelevant if other actors steal the model weights of an AGI. US AGI research must employ vastly better cybersecurity, to protect both model weights and algorithmic secrets. Aligning superhuman AI systems is a difficult technical challenge, but probably doable, and we must devote lots of resources towards this. China is still competitive in the AGI race, and China being first to superintelligence would be very bad because it may enable a stable totalitarian world regime. So the US must win to preserve a liberal world order. Within a few years both the CCP and USG will likely 'wake up' to the enormous potential and nearness of superintelligence, and devote massive resources to 'winning'. USG will nationalise AGI R&D to improve security and avoid secrets being stolen, and to prevent unconstrained private actors from becoming the most powerful players in the world. This means much of existing AI governance work focused on AI company regulations is missing the point, as AGI will soon be nationalised. This is just one story of how things could play out, but a very plausible, scarily soon, and dangerous one.

I. From GPT-4 to AGI: Counting the OOMs

Past AI progress

Increases in 'effective compute' have led to consistent increases in model performance over several years and many orders of magnitude (OOMs). GPT-2 was akin to roughly a preschooler level of intelligence (able to piece together basic sentences sometimes), GPT-3 at the level of an elementary schooler (able to do some simple tasks with clear instructions), and GPT-4 similar to a smart high-schooler (able to write complicated functional code, long coherent essays, and answer somewhat challenging maths questions). Superforecasters and experts have consistently underestimated future improvements in model performance, for instance: The creators of the MATH benchmark expected that "to have more traction on mathematical problem solving we will likely need new algorithmic advancements from the broader research community". But within a year of the benchmark's release, state-of-the-art (SOTA) models went from 5% to 50% accuracy, and are now above 90%. Professional forecasts made in August 2021 expected the MATH benchmark score of SOTA models to be 12.7% in June 2022, but the actual score was 50%. Experts like Yann LeCun and Gary Marcus have wrongly predicted that deep learning would plateau. Bryan Caplan is on track to lose a public bet for the first time ever after GPT-4 got an A on his economics exam just two months after he bet no AI could do this by 2029.
We can decompose recent progress into three main categories: Compute: GPT-2 was trained in 2019 with an estimated 4e21 FLOP, and GPT-4 was trained in 2023 with an estimated 8e24 to 4e25 FLOP.[1] This is both because of hardware improvements (Moore's Law) and increases in compute budgets for training runs. Adding more compute at test-time, e.g. by running many copies of an AI, to allow for debate and delegation between each instance, could further boos...
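For concreteness, the FLOP estimates quoted above imply roughly three to four orders of magnitude of growth in training compute from GPT-2 to GPT-4; a minimal sketch of that arithmetic:

```python
# Rough OOM (orders of magnitude) arithmetic using only the FLOP estimates quoted
# above: GPT-2 at ~4e21 FLOP (2019), GPT-4 at ~8e24 to 4e25 FLOP (2023).
import math

gpt2_flop = 4e21
gpt4_flop_low, gpt4_flop_high = 8e24, 4e25

ooms_low = math.log10(gpt4_flop_low / gpt2_flop)    # ~3.3 OOMs
ooms_high = math.log10(gpt4_flop_high / gpt2_flop)  # ~4.0 OOMs
years = 2023 - 2019

print(f"GPT-2 -> GPT-4: {ooms_low:.1f} to {ooms_high:.1f} OOMs of training compute")
print(f"That is roughly {ooms_low/years:.1f} to {ooms_high/years:.1f} OOMs per year")
```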
Jun 8, 2024 • 53min

AF - 2. Corrigibility Intuition by Max Harms

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 2. Corrigibility Intuition, published by Max Harms on June 8, 2024 on The AI Alignment Forum. (Part 2 of the CAST sequence) As a reminder, here's how I've been defining "corrigible" when introducing the concept: an agent is corrigible when it robustly acts opposite of the trope of "be careful what you wish for" by cautiously reflecting on itself as a flawed tool and focusing on empowering the principal to fix its flaws and mistakes. This definition is vague, imprecise, and hides a lot of nuance. What do we mean by "flaws," for example? Even the parts that may seem most solid, such as the notion of there being a principal and an agent, may seem philosophically confused to a sufficiently advanced mind. We'll get into trying to precisely formalize corrigibility later on, but part of the point of corrigibility is to work even when it's only loosely understood. I'm more interested in looking for something robust (i.e. simple and gravitational) that can be easily gestured at, rather than trying to find something that has a precise, unimpeachable construction.[1] Towards this end, I think it's valuable to try and get a rich, intuitive feeling for what I'm trying to talk about, and only attempt technical details once there's a shared sense of the outline. So in this document I'll attempt to build up details around what I mean by "corrigibility" through small stories about a purely corrigible agent whom I'll call Cora, and her principal, who I'll name Prince. These stories will attempt to demonstrate how some desiderata (such as obedience) emerge naturally from corrigibility, while others (like kindness) do not, as well as provide some texture on the ways in which the plain-English definition above is incomplete. Please keep in mind that these stories are meant to illustrate what we want, rather than how to get what we want; actually producing an agent that actually has all the corrigibility desiderata will take a deeper, better training set than just feeding these stories to a language model or whatever. In the end, corrigibility is not the definition given above, nor is it the collection of these desiderata, but rather corrigibility is the simple concept which generates the desiderata and which might be loosely described by my attempt at a definition. I'm going to be vague about the nature of Cora in these stories, with an implication that she's a somewhat humanoid entity with some powers, a bit like a genie. It probably works best if you imagine that Cora is actually an egoless, tool-like AGI, to dodge questions of personhood and slavery.[2] The relationship between a purely corrigible agent and a principal is not a healthy way for humans to relate to each other, and if you imagine Cora is a human some of these examples may come across as psychopathic or abusive. While corrigibility is a property we look for in employees, I think the best employees bring human values to their work, and the best employers treat their employees as more than purely corrigible servants. On the same theme, while I describe Prince as a single person, I expect it's useful to sometimes think of him more like a group of operators who Cora doesn't distinguish. 
To engage our intuitions, the setting resembles something like Cora being a day-to-day household servant doing mundane tasks, despite that being an extremely reckless use for a general intelligence capable of unconstrained self-improvement and problem-solving. The point of these stories is not to describe an ideal setup for a real-world AGI. In fact, I spent no effort on describing the sort of world that we might see in the future, and many of these scenarios depict a wildly irresponsible and unwise use of Cora. The point of these stories is to get a better handle on what it means for an agent to be corrigible, not to serve as a role-model for...
Jun 8, 2024 • 3min

LW - Two easy things that maybe Just Work to improve AI discourse by jacobjacob

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Two easy things that maybe Just Work to improve AI discourse, published by jacobjacob on June 8, 2024 on LessWrong.

So, it seems AI discourse on X / Twitter is getting polarised. This is bad. Especially bad is how some engage in deliberate weaponization of discourse, for political ends. At the same time, I observe: AI Twitter is still a small space. There are often important posts that have only ~100 likes, ~10-100 comments, and maybe ~10-30 likes on top comments. Moreover, it seems to me that the few sane comments, when they do appear, do get upvoted. This is... crazy! Consider this thread: a piece of legislation is being discussed, with major ramifications for regulation of frontier models, and... the quality of discourse hinges on whether 5-10 random folks show up and say some sensible stuff on Twitter!?

It took me a while to see these things. I think I had a cached view of "political discourse is hopeless, the masses of trolls are too big for anything to matter, unless you've got some specialised lever or run one of these platforms". I now think I was wrong. Just like I was wrong for many years about the feasibility of public and regulatory support for taking AI risk seriously. This begets the following hypothesis: AI discourse might currently be small enough that we could basically just brute-force raise the sanity waterline. No galaxy-brained stuff. Just a flood of folks making... reasonable arguments. It's the dumbest possible plan: let's improve AI discourse by going to places with bad discourse and making good arguments. I recognise this is a pretty strange view, and it runs counter to a lot of priors I've built up hanging around LessWrong for the last couple of years. If it works, it's because of a surprising, contingent state of affairs. In a few months or years the numbers might shake out differently. But for the time being, plausibly the arbitrage is real.

Furthermore, there's of course already a built-in feature, with beautiful mechanism design and strong buy-in from leadership, for increasing the sanity waterline: Community Notes. It's a feature that allows users to add "notes" to tweets providing context, and then only shows those notes if (roughly) they get upvoted by people who usually disagree. Yet... outside of massive news like the OpenAI NDA scandal, Community Notes is barely being used for AI discourse. I'd guess the reason is nothing more interesting than that few people use Community Notes overall, multiplied by few of those people engaging in AI discourse. Again, plausibly, the arbitrage is real.

If you think this sounds compelling, here are two easy ways that might just work to improve AI discourse: 1. Make an account on X. When you see invalid or bad-faith arguments on AI, reply with valid arguments. Upvote other such replies. 2. Join Community Notes at this link. Start writing and rating posts. (You'll need to rate some posts before you're allowed to write your own.) And, above all: it doesn't matter what conclusion you argue for, as long as you make valid arguments. Pursue asymmetric strategies, the sword that only cuts if your intention is true.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
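To make the Community Notes mechanism described above concrete, here is a toy sketch of the bridging criterion - a note is shown only if raters who usually disagree both find it helpful. This is not the actual Community Notes algorithm (which uses a matrix-factorization model over full rating histories); the data structures and thresholds here are purely illustrative.

```python
# Toy illustration of the "bridging" idea behind Community Notes: a note is shown
# only if raters from clusters that usually disagree with each other both rate it
# helpful. NOT the real algorithm; data structures and threshold are made up.

from dataclasses import dataclass

@dataclass
class Rating:
    rater_id: str
    helpful: bool

def note_is_shown(ratings, rater_viewpoint, min_per_side=2):
    """Show the note only if raters from *both* viewpoint clusters rate it helpful.

    rater_viewpoint maps rater_id -> "A" or "B" (clusters of raters who usually
    disagree, e.g. inferred from past rating patterns).
    """
    helpful_by_side = {"A": 0, "B": 0}
    for r in ratings:
        if r.helpful:
            helpful_by_side[rater_viewpoint[r.rater_id]] += 1
    return all(count >= min_per_side for count in helpful_by_side.values())

# Example: a note rated helpful by two raters from each cluster gets shown.
ratings = [Rating("u1", True), Rating("u2", True), Rating("u3", True), Rating("u4", True)]
viewpoints = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
print(note_is_shown(ratings, viewpoints))  # True
```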
Jun 8, 2024 • 3min

EA - The Market Failure that Will Force Doctors to Keep Writing Opioid Prescriptions by mincho

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Market Failure that Will Force Doctors to Keep Writing Opioid Prescriptions, published by mincho on June 8, 2024 on The Effective Altruism Forum.

This is an article of mine from Recursive Adaptation. We are always looking for new collaborators as we expand our research and policy development!

Excerpts

The Problem

Suzetrigine is a novel non-opioid, sodium-channel painkiller with no addictive potential, from mid-size pharma company Vertex. It has shown positive results in Phase 3 and is being submitted to the FDA for approval by the end of 2024 or early 2025. Suzetrigine is roughly equivalent in pain-reduction efficacy to Vicodin (hydrocodone / APAP), which is one of the most commonly prescribed opioids. We recently reviewed the efficacy and promise of suzetrigine in this article. There are other broad non-opioid painkillers in the pipelines of other companies that may arrive over the next few years. This is great news - can we start removing opioids from medical treatment to prevent patients from developing addictions?

Unfortunately, there's a huge obstacle: as with other new medications, insurance providers will not pay for suzetrigine unless a doctor can show that a patient has tried a cheaper generic drug first and it was ineffective. And the cheaper generics for pain are opioids. Next year, as tens of millions of Americans are getting teeth pulled, having surgeries, and suffering from chronic nerve pain, they will needlessly receive opioid prescriptions while a much safer alternative sits on the shelf waiting for its patent to expire a decade from now. About 3-6% of those patients who are prescribed opioids will become dependent, generating millions of completely avoidable new opioid addictions every year (1, 2, 3). Non-patients will become addicted too, as many opioid addictions begin with prescriptions that were diverted from the original patient to someone else - leftover pills given to friends or discovered in a medicine cabinet.

Proposed Solution

Let's make a wild guess and say that Vertex expects to earn $100 million from suzetrigine in the US next year as a second-line treatment. That $100 million cost is already being borne by the public - through insurance premiums, taxes for Medicare and Medicaid, etc. Instead, if the federal government can step in and offer Vertex $120 million in exchange for making suzetrigine available at generic prices, that would be a win for everyone - Vertex would make more money and remain incentivized to develop safer pain treatments, and the public would pay only an extra 20% to get the drug to 5X or 10X or 20X as many people. The result is that those patients are safer, addictions are avoided, and huge downstream costs from addiction are saved. Meanwhile, over the next few years, more non-opioid painkillers are expected to become available from other companies, and once this arrangement is established, similar deals can be negotiated and renegotiated depending on market conditions.

This should be an obvious public health win. We should not be willing to accept a single day where suzetrigine is FDA approved but is being denied by insurance companies, forcing doctors to prescribe unnecessary opioids. (More details of how this could be implemented are in the article.)

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
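For a rough sense of the magnitudes in the excerpt above, here is a minimal sketch of the buyout arithmetic. The $100 million and $120 million figures and the 3-6% dependence rate come from the article; the annual prescription count is a placeholder for the article's "tens of millions" and should be read as an assumption.

```python
# Illustrative arithmetic for the buyout proposal, using the article's own
# hypothetical figures ($100M expected second-line revenue, $120M government
# offer) plus an ASSUMED prescription volume, since the article only says
# "tens of millions" of patients and a 3-6% rate of dependence.

expected_revenue = 100e6   # article's hypothetical second-line revenue
government_offer = 120e6   # proposed buyout
premium = government_offer / expected_revenue - 1   # 20% extra public cost

annual_opioid_prescriptions = 40e6   # ASSUMPTION standing in for "tens of millions"
dependence_rate_low, dependence_rate_high = 0.03, 0.06   # 3-6% per the article

avoidable_low = annual_opioid_prescriptions * dependence_rate_low
avoidable_high = annual_opioid_prescriptions * dependence_rate_high

print(f"Public premium for generic-price access: {premium:.0%}")
print(f"Potentially avoidable new dependencies per year: "
      f"{avoidable_low/1e6:.1f}M to {avoidable_high/1e6:.1f}M")
```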
Jun 8, 2024 • 42min

EA - 170 to 700 billion fish raised and fed live to mandarin fish in China per year by MichaelStJules

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 170 to 700 billion fish raised and fed live to mandarin fish in China per year, published by MichaelStJules on June 8, 2024 on The Effective Altruism Forum.

Summary

1. Around 1.2 to 1.9 trillion fish fry have been produced by artificial propagation (breeding) annually in China, and it seems those farmed for direct human consumption and pre-harvest mortality can only account for at most around 460 billion of them (more).
2. I estimate that 170 billion to 700 billion animals (my 80% credible interval) - probably almost all artificially propagated fish - are fed live to mandarin fish (or Chinese perch), Siniperca chuatsi, in China annually, with 9 to 55 billion of them (my 80% credible interval) alive at any time (more, Guesstimate model).
3. By contrast, the number of farmed fish produced globally for direct human consumption is around 111 billion per year, and 103 billion alive at a time, with another 35 to 150 billion raised and stocked per year (Šimčikas, 2020).
4. It's unclear how bad their deaths are as live feed to mandarin fish, but I'd guess they die by suffocation, digestion (stomach acid, enzymes), or mechanical injury, e.g. crushing, after being swallowed live; this is probably also a common way for aquatic animals to die from predation by fish in the wild (more).
5. It's unclear if there's much we can do to help these feed fish. There's been some progress in substituting artificial diets (including dead animal protein) for live feed for mandarin fish, but this has been a topic of research for over 20 years. Human diet interventions would need to be fairly targeted to be effective. I give a shallow overview of some possible interventions and encourage further investigation (more).

Acknowledgements

Thanks to Vasco Grilo, Saulius Šimčikas and Max Carpendale for feedback. All errors are my own.

Fish fry production in China

One of the early developmental stages of fish is the fry stage (Juvenile fish - Wikipedia). Šimčikas (2019, EA Forum), in his appendix section, raised the question of why hundreds of billions of fish fry were produced artificially (via artificial breeding, i.e. artificial propagation) in China in each of multiple years, yet only "28-92 billion" farmed fish were produced in China in 2015, "according to an estimate from Fishcount". He found that if the apparent discrepancy were due to pre-slaughter mortality, then this would indicate unusually low survival rates. He left open the reason for the apparent discrepancy and recommended further investigation.

Before going into potential explanations for the discrepancy, I share some more recent numbers for the artificial propagation of fish: 1.9143 trillion fish fry in China in 2013 (Li & Xia, 2018), and 1.252 trillion freshwater fry and 167 million marine fish fry in China in 2019 (Hu et al., 2021). The 2019 numbers seem substantially lower than in 2013, so the trend may have reversed, one of these numbers may be inaccurate, there may be high variance across years, or one of the years was unusual. Li and Xia (2018) also plot the trend over time up to 2013, along with total freshwater aquaculture. 28 to 92 billion farmed fish produced in China in 2015 (Fishcount) from 1 to 2 trillion artificially propagated fish fry would suggest a pre-slaughter/pre-harvest survival rate of 1.4% to 9.2% (from the fry stage on).
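That implied survival rate follows directly from the figures just quoted; a minimal sketch of the arithmetic:

```python
# Implied pre-harvest survival rate from the figures just quoted:
# 28-92 billion farmed fish produced in China in 2015, from roughly
# 1-2 trillion artificially propagated fry per year.

fry_low, fry_high = 1e12, 2e12            # artificially propagated fry per year
produced_low, produced_high = 28e9, 92e9  # farmed fish produced in China, 2015 (Fishcount)

survival_low = produced_low / fry_high    # ~1.4%
survival_high = produced_high / fry_low   # ~9.2%

print(f"Implied fry-to-harvest survival: {survival_low:.1%} to {survival_high:.1%}")
# The post goes on to compare this with typical survival rates of at least 20%
# for the most commonly farmed species.
```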
Survival rates are typically at least 20% for the most commonly farmed species, including carps and tilapias (Animal Charity Evaluators, 2020, table 4), for which China accounts for most production. And Šimčikas (2019, EA Forum) notes: Since hatchery-produced juveniles are already past the first stage of their lives in which pre-slaughter mortality is the highest, mortality during the grow-out period shouldn't be that high. Under fairly generous assumptions for an explanation based on pre-harvest mortality, using ...
Jun 8, 2024 • 9min

AF - Access to powerful AI might make computer security radically easier by Buck Shlegeris

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Access to powerful AI might make computer security radically easier, published by Buck Shlegeris on June 8, 2024 on The AI Alignment Forum.

People talk about model weight security being really hard and crucial around the advent of AGI. (E.g. RAND report, Leopold; see here for some distinctions in these threat models that I think are important.) But I think that the thinking on this has not been sufficiently attentive to the fact that during that crucial time period, by assumption we'll have access to powerful AIs. I think that such access might make security wildly easier, by a bunch of different mechanisms, some of which I'll describe in this post. The story I'm telling here is pretty different from the main story I've heard people talk about in the past about AIs helping with computer security, which is that AIs can help with hardening software and other infrastructure. Though I agree that that seems really useful and important, here I'm not talking about AIs automating the process of security research; I'm talking about AI making it affordable to adopt policies that improve security in the organization but that wouldn't work (because they'd be too labor-intensive or lead to who-guards-the-guards problems) if you just had humans.

In this post I'm going to be ignoring problems related to AIs themselves trying to cause security failures. There are two ways that such AIs affect this situation: (1) The techniques I describe in this post could also be used, AI-control style, to use trusted models to reduce the risk of security failures caused by untrusted models; you can basically just replace "humans" by "untrusted models" everywhere in the text below. (2) When applying these techniques to reduce risk from humans, you have to worry that these techniques will be sabotaged by scheming models. I'll ignore that possibility in this post, and talk about ways that you can improve security using AIs that are assumed not to be intentionally trying to subvert these security measures (which we call "trusted"). If you're worried about your models scheming, you'll have to do some other stuff too (mostly various sting operations), but I think the arguments in this post still mostly hold up. I'll talk about this more in the future.

I'm going to focus on insider threats here; most threats from outside humans involve first gaining access to, or the ability to impersonate, some insider. Much of this post came out of thinking about the question "how could AI control techniques be applied to humans, instead of AIs?" I don't think these arguments are an airtight case for "SL5 security will be trivial". But they do seem like a plausible hypothesis under which infosec will end up being quite simple to resolve, if labs decide it's an important priority.

Four strategies for using powerful AI to improve security

Monitoring

As your humans take actions, you configure your system to have an AI that looks at these actions and then flags them for further review if they seem suspicious. If the actions seem potentially catastrophic, then the AI might even flag them before they can run. Concretely, I'm imagining that instead of being allowed to directly run any command you write in the bash console of the computer you're working on, whenever you write a command, it and the surrounding context are shown to an AI, which is charged with deciding how suspicious the action is.
If it is very suspicious, the AI blocks the command and sends an alert to someone else at the company who is supposed to look it over. If the action is just somewhat suspicious, then it flags it for further review but does not block the action or alert the user. This is somewhat similar to using multi-party authorization. But with abundant AI labor, we could be more comprehensive, with fewer false positives and fewer false negatives, and evaluating a much grea...
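Here is a minimal sketch of the monitoring setup described above, under assumed interfaces: suspicion_score stands in for a call to a trusted model that rates a command in context, alert_security_team for whatever review channel the organization uses, and the thresholds are illustrative placeholders rather than anything from the post.

```python
# Sketch of AI-gated command execution, as described above. The function and
# threshold names are assumptions for illustration, not a real API: a trusted
# model scores each command in context; very suspicious commands are blocked
# and escalated, somewhat suspicious ones run but are flagged for later review.

def suspicion_score(command: str, context: str) -> float:
    """Placeholder: query a trusted model for a 0-1 suspicion rating."""
    raise NotImplementedError("call your trusted model here")

def alert_security_team(command: str, context: str, score: float) -> None:
    """Placeholder: send the command to a human reviewer at the company."""
    ...

BLOCK_THRESHOLD = 0.9   # very suspicious: block and alert (illustrative value)
REVIEW_THRESHOLD = 0.5  # somewhat suspicious: run, but flag for later review

def gated_run(command: str, context: str, run) -> bool:
    """Run `command` via `run` unless the monitor blocks it; return True if executed."""
    score = suspicion_score(command, context)
    if score >= BLOCK_THRESHOLD:
        alert_security_team(command, context, score)   # escalate; do not execute
        return False
    if score >= REVIEW_THRESHOLD:
        alert_security_team(command, context, score)   # flag, but let it run
    run(command)
    return True
```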
Jun 8, 2024 • 1h 8min

AF - 1. The CAST Strategy by Max Harms

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 1. The CAST Strategy, published by Max Harms on June 7, 2024 on The AI Alignment Forum. (Part 1 of the CAST sequence)

AI Risk Introduction

(TLDR for this section, since it's 101 stuff that many readers will have already grokked: Misuse vs Mistake; Principal-Agent problem; Omohundro Drives; we need deep safety measures in addition to mundane methods. Jump to "Sleepy-Bot" if all that seems familiar.)

Earth is in peril. Humanity is on the verge of building machines capable of intelligent action that outstrips our collective wisdom. These superintelligent artificial general intelligences ("AGIs") are almost certain to radically transform the world, perhaps very quickly, and likely in ways that we consider catastrophic, such as driving humanity to extinction. During this pivotal period, our peril manifests in two forms.

The most obvious peril is that of misuse. An AGI which is built to serve the interests of one person or party, such as jihadists or tyrants, may harm humanity as a whole (e.g. by producing bioweapons or mind-control technology). Power tends to corrupt, and if a small number of people have power over armies of machines we should expect horrible outcomes. The only solution to misuse is to ensure that the keys to the machine (once/if they exist) stay in the hands of wise, benevolent representatives, who use it only for the benefit of civilization. Finding such representatives, forming a consensus around trusting them, and ensuring they are the only ones with the power to do transformative things is a colossal task. But it is, in my view, a well-understood problem that we can, as a species, solve with sufficient motivation.

The far greater peril, in my view, is that of a mistake. The construction of superintelligent AI is a form of the principal-agent problem. We have a set of values and goals that are important to us, and we need to somehow impart those into the machine. If we were able to convey the richness of human values to an AI, we would have a "friendly AI" which acted in our true interests and helped us thrive. However, this task is subtly hard, philosophically confused, technically fraught, and (at the very least) vulnerable to serious errors in execution. We should expect the first AGIs to have only a crude approximation of the goal they were trained to accomplish (which is, itself, likely only a subset of what we find valuable), with the severity of the difference growing exponentially with the complexity of the target. If an agent has a goal that doesn't perfectly match that of their principal, then, as it grows in power and intelligence, it will increasingly shape the world towards its own ends, even at the expense of what the principal actually cares about. The chance of a catastrophe happening essentially by accident (from the perspective of the humans) only grows as AGIs proliferate and we consider superhuman economies and a world shaped increasingly by, and for, machines. The history of human invention is one of trial and error. Mistakes are a natural part of discovery. Building a superintelligent agent with subtly wrong goals, however, is almost certainly a mistake worse than developing a new, hyper-lethal virus.
An unaligned AGI will strategically act to accomplish its goals, and thus naturally be pulled to instrumentally convergent subgoals ("Omohundro Drives") such as survival, accumulation of resources, and becoming the dominant force on Earth. To maximize its chance of success, it will likely try to pursue these things in secret, defending itself from modification and attack by pretending to be aligned until it has the opportunity to decisively win. (All of this should be familiar background; unfamiliar readers are encouraged to read other, more complete descriptions of the problem.) To avoid the danger of mistakes, we need a way to exp...
Jun 8, 2024 • 16min

AF - 0. CAST: Corrigibility as Singular Target by Max Harms

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 0. CAST: Corrigibility as Singular Target, published by Max Harms on June 7, 2024 on The AI Alignment Forum. What the heck is up with "corrigibility"? For most of my career, I had a sense that it was a grab-bag of properties that seemed nice in theory but hard to get in practice, perhaps due to being incompatible with agency. Then, last year, I spent some time revisiting my perspective, and I concluded that I had been deeply confused by what corrigibility even was. I now think that corrigibility is a single, intuitive property, which people can learn to emulate without too much work and which is deeply compatible with agency. Furthermore, I expect that even with prosaic training methods, there's some chance of winding up with an AI agent that's inclined to become more corrigible over time, rather than less (as long as the people who built it understand corrigibility and want that agent to become more corrigible). Through a slow, gradual, and careful process of refinement, I see a path forward where this sort of agent could ultimately wind up as a (mostly) safe superintelligence. And, if that AGI is in the hands of responsible governance, this could end the acute risk period, and get us to a good future. This is not the path we are currently on. As far as I can tell, frontier labs do not understand corrigibility deeply, and are not training their models with corrigibility as the goal. Instead, they are racing ahead with a vague notion of "ethical assistance" or "helpful+harmless+honest" and a hope that "we'll muddle through like we always do" or "use AGI to align AGI" or something with similar levels of wishful thinking. Worse, I suspect that many researchers encountering the concept of corrigibility will mistakenly believe that they understand it and are working to build it into their systems. Building corrigible agents is hard and fraught with challenges. Even in an ideal world where the developers of AGI aren't racing ahead, but are free to go as slowly as they wish and take all the precautions I indicate, there are good reasons to think doom is still likely. I think that the most prudent course of action is for the world to shut down capabilities research until our science and familiarity with AI catches up and we have better safety guarantees. But if people are going to try and build AGI despite the danger, they should at least have a good grasp on corrigibility and be aiming for it as the singular target, rather than as part of a mixture of goals (as is the current norm). My goal with these documents is primarily to do three things: 1. Advance our understanding of corrigibility, especially on an intuitive level. 2. Explain why designing AGI with corrigibility as the sole target (CAST) is more attractive than other potential goals, such as full alignment, or local preference satisfaction. 3. Propose a novel formalism for measuring corrigibility as a trailhead to future work. Alas, my writing is not currently very distilled. Most of these documents are structured in the format that I originally chose for my private notes. I've decided to publish them in this style and get them in front of more eyes rather than spend time editing them down. Nevertheless, here is my attempt to briefly state the key ideas in my work: 1. 
Corrigibility is the simple, underlying generator behind obedience, conservatism, willingness to be shut down and modified, transparency, and low-impact. (a) It is a fairly simple, universal concept that is not too hard to get a rich understanding of, at least on the intuitive level. (b) Because of its simplicity, we should expect AIs to be able to emulate corrigible behavior fairly well with existing tech/methods, at least within familiar settings.
2. Aiming for CAST is a better plan than aiming for human values (i.e. CEV), helpfulness+harmlessness+hon...
