The Nonlinear Library

The Nonlinear Fund
Apr 2, 2024 • 21min

EA - The Rationale-Shaped Hole At The Heart Of Forecasting by dschwarz

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Rationale-Shaped Hole At The Heart Of Forecasting, published by dschwarz on April 2, 2024 on The Effective Altruism Forum. Thanks to Eli Lifland, Molly Hickman, Değer Turan, and Evan Miyazono for reviewing drafts of this post. The opinions expressed here are my own.

Summary:
Forecasters produce reasons and models that are often more valuable than the final forecasts.
Most of this value is being lost due to the historical practice & incentives of forecasting, and how hard it is for crowds to "adversarially collaborate".
FutureSearch is a forecasting system with legible reasons and models at its core (examples at the end).

The Curious Case of the Missing Reasoning

Ben Landau-Taylor of Bismarck Analysis wrote a piece on March 6 called "Probability Is Not A Substitute For Reasoning", citing a piece where he writes: There has been a great deal of research on what criteria must be met for forecasting aggregations to be useful, and as Karger, Atanasov, and Tetlock argue, predictions of events such as the arrival of AGI are a very long way from fulfilling them.

Last summer, Tyler Cowen wrote on AGI ruin forecasts: Publish, publish, not on blogs, not long stacked arguments or six hour podcasts or tweet storms, no, rather peer review, peer review, peer review, and yes with models too... if you wish to convince your audience of one of the most radical conclusions of all time…well, more is needed than just a lot of vertically stacked arguments.

Widely divergent views and forecasts on AGI persist, leading to FRI's excellent adversarial collaboration on forecasting AI risk this month. Reading it, I saw… a lot of vertically stacked arguments. There have been other big advances in judgmental forecasting recently, on non-AGI AI, Covid-19 origins, and scientific progress. How well justified are the forecasts?

Feb 28: Steinhardt's lab's impressive paper on "Approaching Human-Level Forecasting with Language Models" (press). The pipeline rephrases the question, lists arguments, ranks them, adjusts for biases, and then guesses the forecast. They note "The model can potentially generate weak arguments", and the appendix shows some good ones (decision trees) and some bad ones.

March 11: Good Judgment's 50-superforecaster analysis of Covid-19 origins (substack). Reports that the forecasters used base rates, scientific evidence, geopolitical context, and views from intelligence communities, but not what these were. (Conversely, the RootClaim debate gives so much info that even Scott Alexander's summary is a dozen pages.) 10 of the 50 superforecasters ended with a dissenting belief.

March 18: Metaculus and the Federation of American Scientists' pilot of forecasting the expected value of scientific projects. "[T]he research proposals lacked details about their research plans, what methods and experimental protocols would be used, and what preliminary research the author(s) had done so far. This hindered their ability to properly assess the technical feasibility of the proposals and their probability of success."

March 20: DeepMind's "Evaluating Frontier Models for Dangerous Capabilities", featuring Swift Centre forecasts (X). Reports forecaster themes: "Across all hypotheticals, there was substantial disagreement between individual forecasters." Lists a few cruxes but doesn't provide any complete arguments or models.
In these cases and the FRI collaboration, the forecasts are from top practitioners with great track records of accuracy (or "approaching" this, in the case of AI crowds). The questions are of the utmost importance. Yet what can we learn from these? Dylan Matthews wrote last month in Vox about "the tight connection between forecasting and building a model of the world." Where is this model of the world? FRI's adversarial collaboration did the best here. They list several "cruxes", and measu...
Apr 2, 2024 • 26min

LW - Notes on Dwarkesh Patel's Podcast with Sholto Douglas and Trenton Bricken by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Notes on Dwarkesh Patel's Podcast with Sholto Douglas and Trenton Bricken, published by Zvi on April 2, 2024 on LessWrong. Dwarkesh Patel continues to be on fire, and the podcast notes format seems like a success, so we are back once again. This time the topic is how LLMs are trained, work and will work in the future. Timestamps are for YouTube. Where I inject my own opinions or takes, I do my best to make that explicit and clear. This was highly technical compared to the average podcast I listen to, or that Dwarkesh does. This podcast definitely threatened to technically go over my head at times, and some details definitely did go over my head outright. I still learned a ton, and expect you will too if you pay attention. This is an attempt to distill what I found valuable, and what questions I found most interesting. I did my best to make it intuitive to follow even if you are not technical, but in this case one can only go so far. Enjoy.

(1:30) Capabilities-only podcast; Trenton has 'solved alignment.' April fools!

(2:15) Huge context tokens is underhyped, a huge deal. It occurs to me that the issue is about the trivial inconvenience of providing the context. Right now I mostly do not bother providing context on my queries. If that happened automatically, it would be a whole different ballgame.

(2:50) Could the models be sample efficient if you can fit it all in the context window? Speculation is it might work out of the box.

(3:45) Does this mean models are already in some sense superhuman, with this much context and memory? Well, yeah, of course. Computers have been superhuman at math and chess and so on for a while. Now LLMs have quickly gone from having worse short-term working memory than humans to vastly superior short-term working memory. Which will make a big difference. The pattern will continue.

(4:30) In-context learning is similar to gradient descent. It gets problematic for adversarial attacks, but of course you can ignore that because, as Trenton reiterates, alignment is solved, and certainly it is solved for such mundane practical concerns. But it does seem like he's saying if you do this then 'you're fine-tuning but in a way where you cannot control what is going on'?

(6:00) Models need to learn how to learn from examples in order to take advantage of long context. So does that mean the task of intelligence requires long context? That this is what causes the intelligence, in some sense, they ask? I don't think you can reverse it that way, but it is possible that this will orient work in directions that are more effective?

(7:00) Dwarkesh asks about how long contexts link to agent reliability. Douglas says this is more about lack of nines of reliability, and GPT-4-level models won't cut it there. And if you need to get multiple things right, the reliability numbers have to multiply together, which does not go well in bulk. If that is indeed the issue, then it is not obvious to me the extent to which scaffolding and tricks (e.g. Devin, probably) render this fixable.

(8:45) Performance on complex tasks follows log scores. It gets it right one time in a thousand, then one in a hundred, then one in ten. So there is a clear window where the thing is in practice useless, but you know it soon won't be. And we are in that window on many tasks. This goes double if you have complex multi-step tasks.
If you have a three-step task and are getting each step right one time in a thousand, the full task is one in a billion, but you are not so far from being able to do the task in practice.

(9:15) The model being presented here is predicting scary capabilities jumps in the future. LLMs can actually (unreliably) do all the subtasks, including identifying what the subtasks are, for a wide variety of complex tasks, but they fall over on subtasks too often and we do not know how to...
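As a rough illustration of the reliability arithmetic in the notes above (my own sketch, not from the podcast or the notes; the numbers are the illustrative ones quoted above):

# Per-step success rates multiply, so reliability on multi-step tasks collapses fast.
step_success = 1 / 1000              # getting each step right one time in a thousand
steps = 3                            # a three-step task
task_success = step_success ** steps
print(task_success)                  # ~1e-09, i.e. one in a billion

# The flip side: modest per-step gains compound just as quickly.
print((1 / 10) ** steps)             # ~0.001 once each step works one time in ten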
Apr 2, 2024 • 3min

EA - AIM's new guide to launching a high-impact non-profit policy organization by CE

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AIM's new guide to launching a high-impact non-profit policy organization, published by CE on April 2, 2024 on The Effective Altruism Forum. Author: Sam Hilton, AIM Director of Research

In March 2021 I received an odd letter. It was from a guy I didn't know, David Quarrey, the UK's National Security Advisor. The letter thanked me for providing external expertise to the UK government's Integrated Review, which had been published that morning. It turns out that the Integrated Review had made a public commitment to "review our approach to risk assessment" ... "including how we account for interdependencies, cascading and compound risks". This is something I'd been advocating for over the previous few months by writing a policy paper and engaging with politicians and civil servants. It's hard to know how much my input changed government policy, but I couldn't find much evidence of others advocating for this. I had set myself a 10-year goal to "have played a role in making the UK a leader in long-term resilience to extreme risks, championing the issue of extreme risks on the global stage", and I seemed to be making steps in that direction.

After a few years working on, and, I believe, successfully changing, UK policy a number of times, I came away with the view that policy change is really just not that hard. You think carefully about what you can change, tell policy people what they need to do, network a lot to make sure that they hear you, and then sometimes they listen and sometimes they don't. But when they do, you have pushed on a big lever and the world moves.

It has surprised me a bit being at CE (now AIM) and finding that our incubatees are not that keen on this indirect approach to changing the world. Policy work has slow feedback loops, can be hard to measure, and what are you even doing in a policy role anyway?! And I get that. But it is a damn big lever to just ignore.

So, firstly, I would like to share AIM's guide to launching a policy NGO. This is a document I and others have been working on internally for AIM to help founders understand what policy roles are like, how to drive change, what works and what does not, how to measure impact, and so on. This is not the full program content but should give you a decent taste of the kind of support we can provide.

Secondly, I would like to note that AIM wants more founders who would be excited to start a policy organisation. If you think you could be plausibly excited about founding a policy organisation (or any of our upcoming recommended ideas!), I encourage you to apply for the Incubation Program here.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Apr 2, 2024 • 29min

EA - My favourite arguments against person-affecting views by EJT

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My favourite arguments against person-affecting views, published by EJT on April 2, 2024 on The Effective Altruism Forum.

1. Introduction

According to person-affecting views (PAVs) in population ethics, adding happy people to the world is morally neutral. It's neither good nor bad. Are PAVs true? The question is important. If PAVs are true, then the EA community is likely spending way too much time and money on reducing x-risk. After all, a supposed major benefit of reducing x-risk is that it increases the chance that lots of happy people come into existence. If PAVs are true, this 'benefit' is no benefit at all. By contrast, if PAVs are false, then the EA community (and the world at large) is likely spending way too little time and money on reducing x-risk. After all, the future could contain a lot of happy people. So if adding happy people to the world is good, reducing x-risk is plausibly very good. And if PAVs are false, it's plausibly very important to ensure that people believe that PAVs are false. In spreading this belief, we reduce the risk of the following non-extinction failure-mode: humanity successfully navigates the transition to advanced AI but then creates way too few happy people.

So it's important to figure out whether PAVs are true or false. The EA community has made efforts on this front, but the best-known arguments leave something to be desired. In particular, the arguments against PAVs mostly only apply to specific versions of these views.[1] Many other PAVs remain untouched. Nevertheless, I think there are strong arguments against PAVs in general. In this post, I sketch out some of my favourites.

2. The simple argument

Before we begin, a quick terminological note. In this post, I use 'happy people' as shorthand for 'people whose lives are good overall' and 'miserable people' as shorthand for 'people whose lives are bad overall.' With that out of the way, let's start with a simple argument:

The simple argument
1. Some things are good (for example: happiness, love, friendship, beauty, achievement, knowledge, and virtue).
2. By creating happy people, we can bring more of these good things into the world.
3. And the more good things, the better.
C1. Therefore, creating happy people can be good.
C2. Therefore, PAVs are false.

2.1. The classic PAV response

Advocates of PAVs reject this simple argument. The classic PAV response begins with the following two claims:[2]

The Person-Affecting Restriction: One outcome can't be better than another unless it's better for some person.
Existence Anticomparativism: Existing can't be better or worse for a person than not-existing.

Each of these two claims seems tough to deny. Consider first the Person-Affecting Restriction. How could one outcome be better than another if it's not better for anyone? Now consider Existence Anticomparativism. If existing could be better for a person than not-existing, then it seemingly must be that not-existing would be worse for that person than existing. But how can anything be better or worse for a person that doesn't exist?[3]

So each of the two claims seems plausible, and they together imply that premise 3 of the simple argument is false: sometimes, bringing more good things into the world doesn't make the world better. Here's why. By creating a happy person, we bring more good things into the world.
But our action isn't better for this happy person (by Existence Anticomparativism), nor is it better for anyone else (by stipulation), and so it isn't better for the world (by the Person-Affecting Restriction). By reasoning in this way, advocates of PAVs can defuse the simple argument and defend their claim that creating happy people isn't good. 2.2. The problem with the classic PAV response Now for the problem. The Person-Affecting Restriction and Existence Anticomparativism don't just...
Apr 2, 2024 • 1min

LW - LessWrong: After Dark, a new side of LessWrong by So8res

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LessWrong: After Dark, a new side of LessWrong, published by So8res on April 2, 2024 on LessWrong. The LessWrong team has obviously been hard at work putting out their debut album. But another LessWrong feature also seems to have been released today, to less fanfare: LessWrong: After Dark, a branch of the site devoted to explicit discussion of sex and sexuality, where the LessWrong team finally gets to let loose their long-suppressed sexual instincts. As someone who's close friends with Aella, I'm thrilled to see this new branch of the site. Sex workers are heavily discriminated against in modern society, with limited access to banking, a heightened risk of physical injury, and an inability to rely on police. The topic of sex is overstigmatized in modern culture, and I'm glad to see that the LessWrong team has decided to accept the sexual aspect of the human experience, and that they now have a place to hornypost to their hearts' content. I'm looking forward to seeing what comes of rationalists applying rationality techniques to sex with the same dogged vigor and dubiously-directed determination that we apply to everything else. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Apr 2, 2024 • 17min

EA - Reasons for optimism about measuring malevolence to tackle x- and s-risks by Jamie Harris

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reasons for optimism about measuring malevolence to tackle x- and s-risks, published by Jamie Harris on April 2, 2024 on The Effective Altruism Forum.

Reducing the influence of malevolent actors seems useful for reducing existential risks (x-risks) and risks of astronomical suffering (s-risks). One promising strategy for doing this is to develop manipulation-proof measures of malevolence.

I think better measures would be useful because:
We could use them with various high-leverage groups, like politicians or AGI lab staff.
We could use them flexibly (for information-only purposes) or with hard cutoffs.
We could use them in initial selection stages, before promotions, or during reviews.
We could spread them more widely via HR companies or personal genomics companies.
We could use small improvements in measurements to secure early adopters.

I think we can make progress on developing and using them because:
It's neglected, so there will be low-hanging fruit.
There's historical precedent for tests and screening.
We can test on EA orgs.
Progress might be profitable.
The cause area has mainstream potential.

So let's get started on some concrete research!

Context

~4 years ago, David Althaus and Tobias Baumann posted about the impact potential of "Reducing long-term risks from malevolent actors". They argued that:
Dictators who exhibited highly narcissistic, psychopathic, or sadistic traits were involved in some of the greatest catastrophes in human history.
Malevolent individuals in positions of power could negatively affect humanity's long-term trajectory by, for example, exacerbating international conflict or other broad risk factors.
Malevolent humans with access to advanced technology - such as whole brain emulation or other forms of transformative AI - could cause serious existential risks and suffering risks…
Further work on reducing malevolence would be valuable from many moral perspectives and constitutes a promising focus area for longtermist EAs.

I and many others were impressed with the post. It got lots of upvotes on the EA Forum and 80,000 Hours listed it as an area that they'd be "equally excited to see some of our readers… pursue" as their list of the most pressing world problems. But I haven't seen much progress on the topic since.

One of the main categories of interventions that Althaus and Baumann proposed was "The development of manipulation-proof measures of malevolence… [which] could be used to screen for malevolent humans in high-impact settings, such as heads of government or CEOs." Anecdotally, I've encountered scepticism that this would be either tractable or particularly useful, which surprised me. I seem to be more optimistic than anyone I've spoken to about it, so I'm writing up some thoughts explaining my intuitions.

My research has historically been of the form: "assuming we think X is good, how do we make X happen?" This post is in a similar vein, except it's more 'initial braindump' than 'research'. It's more focused on steelmanning the case for than coming to a balanced, overall assessment.

I think better measures would be useful

We could use difficult-to-game measures of malevolence with various high-leverage groups:
Political candidates
Civil servants and others involved in the policy process
Staff at A(G)I labs
Staff at organisations inspired by effective altruism.
Some of these groups might be more tractable to focus on first, e.g. EA orgs. And we could test in less risky environments first, e.g. smaller AI companies before frontier labs, or bureaucratic policy positions before public-facing political roles. The measures could be binding or used flexibly, for information-only purposes. For example, in a hiring process, there could either be some malevolence threshold above which a candidate is rejected without question, or test(s) for malevol...
Apr 2, 2024 • 14min

LW - Gradient Descent on the Human Brain by Jozdien

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Gradient Descent on the Human Brain, published by Jozdien on April 2, 2024 on LessWrong.

TL;DR: Many alignment research proposals share a common motif: figure out how to enter a basin of alignment / corrigibility for human-level models, and then amplify to more powerful regimes while generalizing gracefully. In this post we lay out a research agenda that comes at this problem from a different direction: if we already have ~human-level systems with extremely robust generalization properties, we should just amplify those directly. We'll call this strategy "Gradient Descent on the Human Brain".

Introduction

Put one way, the hard part of the alignment problem is figuring out how to solve ontology identification: mapping between an AI's model of the world and a human's model, in order to translate and specify human goals in an alien ontology. In generality, in the worst case, this is a pretty difficult problem. But is solving this problem necessary to create safe superintelligences? The assumption that you need to solve for arbitrary ontologies is true if you assume that the way to get to superintelligence necessarily routes through systems with different ontologies. We don't need to solve ontology translation for high-bandwidth communication with other humans[1].

Thus far, we haven't said anything really novel. The central problem to this approach, as any alignment researcher would know, is that we don't really have a good way to bootstrap the human brain to superintelligent levels. There have been a few attempts to approach this recently, though focusing on very prosaic methods that, at best, buy points on the margin. Scaling to superintelligence requires much stronger and more robust methods of optimization.

The Setup

The basic setup is pretty simple, though there are a few nuances and extensions that are hopefully self-explanatory. The simple version: Take a hundred human brains, put them in a large vat, and run gradient descent on the entire thing. The human brain is a remarkably powerful artifact for its size, so finding a way to combine the capabilities of a hundred human brains with gradient descent should result in something significantly more powerful. As an intuition pump, think of how powerful human organizations are with significantly shallower communication bandwidth. At the very lowest bound we can surpass this; more impressive versions of this could look like an integrated single mind that combines the capabilities of all hundred brains.

The specifics of what the training signal should be are, I think, a rather straightforward engineering problem. Some pretty off-the-cuff ideas, in increasing order of endorsement:
Train them for specific tasks, such as Pong or Doom. This risks loss of generality, however.
Train them to predict arbitrary input signals from the environment. The brain is pretty good at picking up on patterns in input streams, which this leverages to amplify latent capabilities. This accounts for the problem with lack of generality, but may not incentivize cross-brain synergy strongly.
Train them to predict each other. Human brains being the most general-purpose objects in existence, this should be a very richly general training channel, and incentivizes brain-to-brain (B2B) interaction. This is similar in spirit to HCH.

A slightly more sophisticated setup:

Aside: Whose brains should we use for this?
The comparative advantage of this agenda is the strong generalization properties inherent to the human brain[2]. However, to further push the frontier of safety and allow for a broad basin of graceful failure, we think that the brains used should have a strong understanding of alignment literature. We're planning on running a prototype with a few volunteer researchers - if you want to help, please reach out! Potential Directions More sophisticate...
Apr 2, 2024 • 18min

LW - Coherence of Caches and Agents by johnswentworth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Coherence of Caches and Agents, published by johnswentworth on April 2, 2024 on LessWrong.

There's a lot of confusion about what coherence means for agents, and what "coherence theorems" do and don't say about agents. In this post, I'll talk about some particularly simple notions of coherence in a particularly simple setting. We'll see what nontrivial things coherence has to say, at least in a simple kind of environment, starting with an analogous notion of coherence for caches.

What Kind Of "Coherence" We're Talking About Here

Let's start with a standard CS-101-style example. We write a recursive python function to compute fibonacci numbers. We pass in n = 0, then n = 1, then 2, then 3, etc. It spits out 1, 1, 2, 3, 5, 8, .... Great. Buuuuut it gets very slow very quickly as n increases; the runtime is exponential in n. So, standard simple improvement: memoize. The first time fib(n) is computed for each value of n, cache it (i.e. "make a memo" of the result). Now the recursive calculation will only happen once for each value of n, so runtime is linear in n.

Ok, that's the CS 101 part. Now on to coherence. Imagine that the cache in our fibonacci program gets corrupted somehow. Maybe I mess around in the debugger and stick a few wrong numbers into it, maybe some other thread writes into it, whatever. Somehow, incorrect values end up in that cache. Key point: we can notice the cache corruption "locally", i.e. by only looking at a small subset of the cache. Say, for instance, that cache[6] is corrupted - it should be 8 (the sixth fibonacci number), but instead let's say it's 11, and let's assume for now that the rest of the cache is fine. So we're looking in the cache, and we see:

cache[4] = 3
cache[5] = 5
cache[6] = 11

Well, just from those three entries we can tell that something's wrong, because 3 + 5 is not 11. It's supposed to be the case that cache[n] = cache[n-1] + cache[n-2] for any n bigger than 1, but that equation is not satisfied by these three cache entries. Our cache must be corrupt. And notice that we did not need to look at the rest of the cache in order to tell; we just needed to look at these three entries. That's what I mean when I say we can notice the cache corruption "locally".

We'll want a word for when that sort of thing isn't happening, i.e. a word which says that cache[n] is equal to cache[n-1] + cache[n-2] (in this particular example). For that, we'll use the word "coherence". More generally: we'll say that a cache is coherent when small parts of the cache (like cache[n], cache[n-1], and cache[n-2] in this case) all locally satisfy some relationship (like cache[n] = cache[n-1] + cache[n-2]) which they're supposed to satisfy if everything is working correctly.

(Note that our usage here is a lot more general than the most common usage of "coherence" in CS; it's most similar to the use of "coherence" in formal logic. "Coherence" in CS is usually about the more specific case where different threads/processes/servers each have their own caches of the same information which might not match. That's a special case of the more general notion of "coherence" we'll use in this post.)

In the fibonacci example, if the whole cache is coherent, i.e. cache[n] = cache[n-1] + cache[n-2] for every n greater than 1, and cache[0] = cache[1] = 1, then the whole cache contains the values it's supposed to.
In that case, the final cache entry, say e.g. cache[100], contains the result of fib(100). More generally, we're typically interested in "coherence" in cases where all the local constraints together yield some useful property "at the large scale". In logic, that might be a property like truth-preservation: put true assumptions in, get true conclusions out. In our fibonacci example, the useful "large scale" property is that the cache in fact contains the fibonacci se...
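For concreteness, here is a minimal sketch of the memoized fibonacci cache and the local coherence check described above (my own illustration, not the post's original code; it uses the indexing from the corruption example, where cache[4] = 3 and cache[5] = 5):

# Memoized fibonacci with an explicit cache dict.
cache = {}

def fib(n):
    # Compute fib(n) once, then reuse the cached value on later calls.
    if n not in cache:
        cache[n] = 1 if n <= 2 else fib(n - 1) + fib(n - 2)
    return cache[n]

def locally_coherent(cache, n):
    # The local constraint: cache[n] == cache[n-1] + cache[n-2].
    return cache[n] == cache[n - 1] + cache[n - 2]

fib(10)                               # fills cache[1] through cache[10]
print(locally_coherent(cache, 6))     # True: 3 + 5 == 8

cache[6] = 11                         # simulate cache corruption
print(locally_coherent(cache, 6))     # False: 3 + 5 != 11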
Apr 1, 2024 • 2min

LW - Announcing Suffering For Good by Garrett Baker

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Suffering For Good, published by Garrett Baker on April 1, 2024 on LessWrong.

TL;DR: We are excited to announce the new animal welfare organization Suffering For Good, a new factory farming charity aimed at vegans, where we use our excess profits to buy suffering offsets - in particular, an enormous number of rats on heroin.

For decades, even centuries, we vegans have been trying but failing to get the world to stop eating & torturing sentient minds that can definitely feel pain & suffer. But the global number of such minds tortured & killed just keeps on increasing. We at Suffering for Good think it's time we just gave up, and ask ourselves "how can we use this to our advantage?"

We realized something when we asked that question. After decades of fighting this fight, we know far more about the factory farming industry than virtually anyone inside that industry. In that period of learning, and attempted dismantling, we had learned basically all the industry secrets, strategically releasing only the most gruesome, and least cost-effective practices, so as to maximize the public's awareness of the pain, and minimize the spread of good ideas. But it seems the public does not care about the suffering. Only we care about the suffering, and at the end of this long road we find our strength is in doing exactly what we hate most, but more effectively than anyone else.

After months of debate and math, calculating our expected profit margins, the logistics of the heroin suppliers, of keeping our rats alive & fed, and the legality of this operation, we found that no matter what our assumptions were, as long as they were reasonable, our numbers came out the same: Suffering for Good is not only a viable charity, but we feel morally compelled to work on it, no matter how personally disgusted we feel by the conclusion. We unfortunately can't share the exact numbers publicly at this moment, however we will be sharing them with select funders.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Apr 1, 2024 • 5min

LW - OMMC Announces RIP by Adam Scholl

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OMMC Announces RIP, published by Adam Scholl on April 1, 2024 on LessWrong.

At the Omnicide Machine Manufacturing Corporation, we work tirelessly to ensure an omnicide-free future. That's why we're excited to announce our Responsible Increase Policy (RIP) - our internal protocol for managing any risks that arise as we create increasingly omnicidal machines. Inspired by the risk-management framework used in gain-of-function virology research, our RIP defines a framework of Omnicidal Ability Levels (OAL), reflecting the precautions we plan to take as we release increasingly dangerous features over time.

The basic idea of the RIP is simple: each time we ship an update which makes our product more lethal, we will pause our efforts for some amount of time, and then revise our policies to be in some sense more "cautious." For example, our RIP contains the following firm commitments:
We aspire to take actions which are broadly good, rather than broadly bad;
We hope to refrain from releasing any fairly omnicidal systems, until first implementing "certain safeguards";
And we intend to refrain from creating any systems which we're quite sure would kill everyone.

That said, we want to acknowledge that even this cautious approach has drawbacks. For example, if our prevention measures are too weak, we risk catastrophe - potentially leading to extreme, knee-jerk regulatory responses, like banning omnicide machines altogether. On the other hand, if our precautions are too conservative, we risk ending up in a situation where someone who isn't us builds one first. This is a tricky needle to thread. History is rife with examples of countries deciding to heavily restrict, or even outright ban, technologies which they perceive as incredibly dangerous. So we have designed our RIP to tread lightly, and to exemplify a "minimum viable" safety policy - a well-scoped, small set of tests, that labs can feasibly comply with, and that places the least possible restrictions on frontier existential risks.

The Sweet Lesson: Reasoning is Futile

As an omnicide creation and prevention research company, we think it's important to seriously prepare for worlds in which our product ends up getting built. But the central insight of the modern era of gigantic machines - the so-called "Sweet Lesson" - is that it's possible to build incredibly powerful machines without first developing a deep theoretical understanding of how they work. Indeed, we currently see ourselves as operating under conditions of near-maximal uncertainty. Time and time again, it has proven futile to try to predict the effects of our actions in advance - new capabilities and failure modes often emerge suddenly and unexpectedly, and we understand little about why. As such, we endeavor to maintain an attitude of radical epistemic humility. In particular, we assume a uniform prior over the difficulty of survival.

For now, this degree of wholesale, fundamental uncertainty seems inescapable. But in the long run, we do hope to add information to our world-model - and thanks to our Gain of Omnicide research team, we may soon have it.

Gain of Omnicide

Our Gain of Omnicide research effort aims to generate this information by directly developing omnicidal capacity, in order to then learn how we could have done that safely.
Moreover, our core research bet at OMMC is that doing this sort of empirical safety research effectively requires access to frontier omnicide machines. In our view, the space of possible threats from gigantic omnicide machines is simply too vast to be traversed from the armchair alone. That's why our motto is "Show Don't Tell" - we believe that to prevent the danger associated with these machines, we must first create that danger, since only then can we develop techniques to mitigate it. But this plan only works if our prototype...
