The Nonlinear Library

The Nonlinear Fund
Jul 22, 2024 • 3min

EA - Peter Singer AMA (July 30th) by Toby Tremlett

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Peter Singer AMA (July 30th), published by Toby Tremlett on July 22, 2024 on The Effective Altruism Forum.
On July 30th, Peter Singer will be answering your questions in a Forum AMA. He has agreed to answer questions for an hour in the evening (Melbourne time), so if your question hasn't been answered by the 31st, it likely won't be. Singer needs little introduction for many people on the Forum. In fact, it is fairly likely that his work was the reason we first heard about effective altruism. However, I've included some information here to orient your questions, if you'd benefit from it.
What Singer has been up to recently
Singer retired from his Princeton professorship recently, ending with a conference celebrating his work (written about by Richard Chappell here - I also recommend this post as a place to start looking for questions to ask Singer). Since then, he has:
Started a podcast, Lives Well Lived, along with his frequent collaborator Kasia de Lazari-Radek, available on Spotify and Apple Podcasts. They've released episodes with Jane Goodall, Yuval Harari, Ingrid Newkirk, Daniel Kahneman, Kate Grant, and more.
Published a dialogue with the female Buddhist monastic and ethicist Shih Chao-Hwei, called The Buddhist and the Ethicist.
Continued his work on the Journal of Controversial Ideas.
Started a substack, and written on various topics for Project Syndicate.
EA-relevant moments in Singer's career
For those who don't know, here are some top EA-relevant moments in Singer's career, which you might want to ask about:
1971: Singer wrote Famine, Affluence and Morality in response to the starvation of Bangladesh Liberation War refugees, a moral philosophy paper which argued that we all have an obligation to help the people we can, whether they live near us or far away. This paper is the origin of the drowning child argument.
1975: Singer published Animal Liberation, the book which arguably started the modern animal rights movement. Singer published a substantially updated version, Animal Liberation Now, in 2023.
Singer has been an engaged supporter and critic of Effective Altruism since its inception, notably delivering a very popular TED talk about EA in 2013.
NB: I'm adding Peter Singer as a co-author for this post, but it was written by me, Toby. Errors are my own.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jul 22, 2024 • 26min

AF - Auto-Enhance: Developing a meta-benchmark to measure LLM agents' ability to improve other agents by Sam Brown

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Auto-Enhance: Developing a meta-benchmark to measure LLM agents' ability to improve other agents, published by Sam Brown on July 22, 2024 on The AI Alignment Forum.
Summary
Scaffolded LLM agents are, in principle, able to execute arbitrary code to achieve the goals they have been set. One such goal could be self-improvement. This post outlines our plans to build a benchmark to measure the ability of LLM agents to modify and improve other LLM agents. This 'Auto-Enhancement benchmark' measures the ability of 'top-level' agents to improve the performance of 'reference' agents on 'component' benchmarks, such as CyberSecEval 2, MLAgentBench, SWE-bench, and WMDP. Results are mostly left for a future post in the coming weeks.
Scaffolds such as AutoGPT, ReAct, and SWE-agent can be wrapped around LLMs to build LLM agents, with abilities such as long-term planning and context-window management that enable them to carry out complex general-purpose tasks autonomously. LLM agents can fix issues in large, complex code bases (see SWE-bench), and interact in a general way using web browsers, Linux shells, and Python interpreters. In this post, we outline our plans for a project to measure these LLM agents' ability to modify other LLM agents, undertaken as part of Axiom Futures' Alignment Research Fellowship.
Our proposed benchmark consists of "enhancement tasks," which measure the ability of an LLM agent to improve the performance of another LLM agent (which may be a clone of the first agent) on various tasks. Our benchmark uses existing benchmarks as components to measure LLM agent capabilities in various domains, such as software engineering, cybersecurity exploitation, and others. We believe these benchmarks are consequential in the sense that good performance by agents on these tasks should be concerning for us. We plan to write an update post with our results at the end of the Fellowship, and we will link this post to that update.
Motivation
Agents are capable of complex SWE tasks (see, e.g., Yang et al.). One such task could be the improvement of other scaffolded agents. This capability would be a key component of autonomous replication and adaptation (ARA), and we believe it would be generally recognised as an important step towards extreme capabilities. This post outlines our initial plans for developing a novel benchmark that aims to measure the ability of LLM-based agents to improve other LLM-based agents, including those that are as capable as themselves.
Threat model
We present two threat models that aim to capture how AI systems may develop super-intelligent capabilities.
Expediting AI research: Recent trends show how researchers are leveraging LLMs to expedite academic paper reviews (see Du et al.). ML researchers are beginning to use LLMs to design and train more advanced models (see Cotra's AIs accelerating AI research and Anthropic's work on Constitutional AI). Such LLM-assisted research may expedite progress toward super-intelligent systems.
Autonomy: Another way such capabilities could develop is through LLM agents themselves becoming competent enough to self-modify and further ML research without human assistance (see the Hard Takeoff section in this note), leading to an autonomously replicating and adapting system.
Our proposed benchmark aims to quantify the ability of LLM agents to bring about such recursive self-improvement, either with or without detailed human supervision.
Categories of bottlenecks and overhang risks
We posit that there are three distinct categories of bottlenecks to LLM agent capabilities:
1. Architectures-of-thought, such as structured planning, progress-summarisation, hierarchy of agents, self-critique, chain-of-thought, self-consistency, prompt engineering/elicitation, and so on. Broadly speaking, this encompasses everything between the LL...
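To make the scoring setup concrete, here is a minimal, hypothetical sketch of how a single enhancement task could be scored - a top-level agent edits a reference agent's scaffold, and the enhancement score is the improvement on a component benchmark. The names and structure are illustrative assumptions, not the authors' actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EnhancementTask:
    """Hypothetical container for one enhancement task (illustrative only)."""
    component_benchmark: str               # e.g. "SWE-bench" or "MLAgentBench"
    run_benchmark: Callable[[str], float]  # scores a given agent scaffold's source
    reference_scaffold: str                # source code of the reference agent

    def score(self, improved_scaffold: str) -> float:
        # Enhancement = benchmark score of the modified reference agent
        # minus the score of the unmodified reference agent.
        baseline = self.run_benchmark(self.reference_scaffold)
        improved = self.run_benchmark(improved_scaffold)
        return improved - baseline
```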
Jul 22, 2024 • 11min

AF - Coalitional agency by Richard Ngo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Coalitional agency, published by Richard Ngo on July 22, 2024 on The AI Alignment Forum.
The coalitional frame
Earlier in this sequence I laid out an argument that the goals of increasingly intelligent AIs will become increasingly systematized, until they converge to squiggle-maximization. In my last post, though, I touched on two reasons why this convergence might not happen: humans trying to prevent it, and AIs themselves trying to prevent it. I don't have too much more to say about the former, but it's worth elaborating on the latter.
The best way to understand the deliberate protection of existing goals is in terms of Bostrom's notion of instrumental convergence. Bostrom argues that goal preservation will be a convergent instrumental strategy for a wide range of agents. Perhaps it's occasionally instrumentally useful to change your goals - but once you've done so, you'll never want to course-correct back towards your old goals. So this is a strong reason to be conservative about your goals, and avoid changes where possible.
One immediate problem with preserving goals, though: it requires that agents continue thinking in terms of the same concepts. But in general, an agent's concepts will change significantly as they learn more about the world. For example, consider a medieval theist whose highest-priority goal is ensuring that their soul goes to heaven, not hell. Upon becoming smarter, they realize that none of souls, heaven, or hell exist. The sensible thing to do here would be to either discard the goal, or else identify a more reasonable adaptation of it (e.g. the goal of avoiding torture while alive). But if their goals were totally fixed, then their actions would be determined by a series of increasingly convoluted hypotheticals where god did exist after all. (Or to put it another way: continuing to represent their old goal would require recreating a lot of their old ontology.) This would incur a strong systematicity penalty. So while we should expect agents to have some degree of conservatism, they'll likely also have some degree of systematization.
How can we reason about the tradeoff between conservatism and systematization? The approach which seems most natural to me makes three assumptions:
1. We can treat agents' goals as subagents optimizing for their own interests in a situationally-aware way. (E.g. goals have a sense of self-preservation.)
2. Agents have a meta-level goal of systematizing their goals; bargaining between this goal and object-level goals shapes the evolution of the object-level goals.
3. External incentives are not a dominant factor governing the agent's behavior, but do affect the bargaining positions of different goals, and how easily they're able to make binding agreements.
I call the combination of these three assumptions the coalitional frame. The coalitional frame gives a picture of agents whose goals do evolve over time, but in a way which is highly sensitive to initial conditions - unlike squiggle-maximizers, who always converge to similar (from our perspective) goals. For coalitional agents, even "dumb" subagents might maintain significant influence as other subagents become highly intelligent, because they were able to lock in that influence earlier (just as my childhood goals exert a nontrivial influence over my current behavior). The assumptions I've laid out above are by no means obvious.
I won't defend them fully here, since the coalitional frame is still fairly nascent in my mind, but I'll quickly go over some of the most obvious objections to each of them: Premise 1 assumes that subagents will have situational awareness. The idea of AIs themselves having situational awareness is under debate, so it's even more speculative to think about subagents having situational awareness. But it's hard to describe the dynamics of in...
Jul 21, 2024 • 14min

LW - A simple model of math skill by Alex Altair

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A simple model of math skill, published by Alex Altair on July 21, 2024 on LessWrong.
I've noticed that when trying to understand a math paper, there are a few different ways my skill level can be the blocker. Some of these ways line up with some typical levels of organization in math papers:
Definitions: a formalization of the kind of objects we're even talking about.
Theorems: propositions on what properties are true of these objects.
Proofs: demonstrations that the theorems are true of the objects, using known and accepted previous theorems and methods of inference.
Understanding a piece of math will require understanding each of these things in order. It can be very useful to identify which type of thing I'm stuck on, because the different types can require totally different strategies. Beyond reading papers, I'm also trying to produce new and useful mathematics. Each of these three levels has another associated skill of generating them. But it seems to me that the generating skills go in the opposite order. This feels like an elegant mnemonic to me, although of course it's a very simplified model. Treat every statement below as a description of the model, and not a claim about the totality of doing mathematics.
Understanding
Understanding these more or less has to go in the above order, because proofs are of theorems, and theorems are about defined objects. Let's look at each level.
Definitions
You might think that definitions are relatively easy to understand. That's usually true in natural languages; you often already have the concept, and you just don't happen to know that there's already a word for that. Math definitions are sometimes immediately understandable. Everyone knows what a natural number is, and even the concept of a prime number isn't very hard to understand. I get the impression that in number theory, the proofs are often the hard part, where you have to come up with some very clever techniques to prove theorems that high schoolers can understand (Fermat's last theorem, the Collatz conjecture, the twin primes conjecture). In contrast, in category theory, the definitions are often hard to understand. (Not because they're complicated per se, but because they're abstract.) Once you understand the definitions, then understanding proofs and theorems can be relatively immediate in category theory.
Sometimes the definitions have an immediate intuitive understanding, and the hard part is understanding exactly how the formal definition is a formalization of your intuition. In a calculus class, you'll spend quite a long time understanding the derivative and integral, even though they're just the slope of the tangent and the area under the curve, respectively.
You also might think that definitions are mostly in textbooks, laid down by Euclid or Euler or something. At least in the fields that I'm reading papers from, it seems like most papers have definitions (usually multiple). This is probably especially true for papers that are trying to help form a paradigm. In those cases, the essential purpose of the paper is to propose the definitions as the new paradigm, and the theorems are set forth as arguments that those definitions are useful.
Theorems
Theorems are in some sense the meat of mathematics. They tell you what you can do with the objects you've formalized.
If you can't do anything meaty with an object, then you're probably holding the wrong object. Once you understand the objects of discussion, you have to understand what the theorem statement is even saying. I think this tends to be more immediate, especially because often, all the content has been pushed into the definitions, and the theorem will be a simpler linking statement, like "all As are Bs" or "All As can be decomposed into a B and a C". For example, the fundamental theorem of calculus...
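As a concrete instance of that pattern - the content sits in the definitions, while the theorem is a short linking statement - the fundamental theorem of calculus can be stated in one line once the derivative and the integral have been defined (a standard statement, added here purely for illustration):

```latex
% If f is continuous on [a,b] and F(x) = \int_a^x f(t)\,dt, then
\[
  F'(x) = f(x)
  \qquad\text{and}\qquad
  \int_a^b f(x)\,dx = F(b) - F(a).
\]
```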
Jul 21, 2024 • 2min

AF - aimless ace analyzes active amateur: a micro-aaaaalignment proposal by Luke H Miles

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: aimless ace analyzes active amateur: a micro-aaaaalignment proposal, published by Luke H Miles on July 21, 2024 on The AI Alignment Forum. This idea is so simple that I'm sure it's been had by someone somewhere. Suppose we have some method to make really smart honest AIs that do not have goals. Let's say it's a yes/no oracle. Our aimless ace. But we want to accomplish stuff! AIcorp wants the printmoneynow.py. I'm worried that if I train the ace to fold fitted sheets and unkink the garden hose then it will go overboard and fold my femur bone and unkink the Yellowstone super volcano. So let's use a bit-less-smart model to do tasks. The active amateur. We'll ask the ace if the amateur is plotting something. If yes, we shut down the amateur. Every day we also ask "is today the last day I can shut down the amateur?" (We do not generate code with the ace!) So we get super-smart supervision of a decent task AI. The temptation (ie competitive pressure) to agentify the oracle is there still, but no longer overwhelming. Maybe we can even have the amateur write ace_v2, have ace_v1 check ace_v2, switch to ace_v2, have amateur_v1 write amateur_v2, check it with ace_v2, then switch to amateur_v2. If this looks like it's avoiding the hard parts of the alignment problem, that's because it is! But does something actually go wrong? Was it too big an assumption that we could build an aimless honest ace? (Or if you think the idea is good, then a bit of positive encouragement would be wonderful.) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Jul 21, 2024 • 8min

EA - animal welfare outside of EA - numbers and thoughts on UFAW 2024 by ARM

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: animal welfare outside of EA - numbers and thoughts on UFAW 2024, published by ARM on July 21, 2024 on The Effective Altruism Forum.
Last week I attended the UFAW International Animal Welfare Conference 2024 in Porto. Here are some facts about the conference and its content, as well as some personal impressions. I hope this gives some insight into animal welfare work outside of EA, though of course this is just one conference. I am not affiliated with UFAW and did not present any work at the conference.
TLDR
- The UFAW International Animal Welfare Conference 2024 was an academic two-day event with talks and posters.
- Most content was on farmed animals and lab animals.
- There was some representation of wild animals, especially among the talks, including the keynote.
- Companion animals, working animals and animals in zoos/sanctuaries/rehabilitation centres together also made up a large part of the programme, mostly in the form of posters.
- My personal impressions include: most people attending were not vegetarian or vegan; general enthusiasm about helping animals seemed low; wild animal welfare is fringe, especially when considering non-anthropogenic harms.
about UFAW and the conference
The Universities Federation for Animal Welfare (UFAW) is an animal welfare charity that focuses on supporting the science of animal welfare. They organise academic events, run the journal "Animal Welfare" and give grants and awards to researchers in the field. The UFAW International Animal Welfare Conference 2024 was a two-day conference with programme scheduled from 9 am until 6 pm both days (10/11 July 2024). There were two types of content: talks and posters.
representation of different animal groups in the conference content
overview
I made pie charts to see how farmed, lab, wild, companion and other animals were represented in the programme. Here is the pie chart for talks, and here is the same for posters. For the data behind these plots, including notes on potentially controversial classifications, see this GitHub repo.
wild animals
Firstly, I was pleasantly surprised that there were a number of talks on wild animals in the programme. They made up 15% of talks. Wild animal welfare was even the topic of the keynote at the very start of the conference ("What is wild animal welfare like and why should we care about it?" by Clare Palmer).
There were two talks on fertility control. One was a theoretical one on explicitly optimising total welfare by reducing population size, presented by Simon Eckerström Liedholm from the Wild Animal Initiative. The other one was by researchers from the Animal and Plant Health Agency in York and focused on using fertility control to replace rodenticide.
There was one talk on feral animals, presented by Peter Sandøe. In the talk Peter explicitly considers feral animals to be separate from wild animals, but I included them in this statistic.
Another talk was about "striking a balance between Iberian Lynx conservation efforts and animal welfare". The talk considers the welfare of the lynxes during captivity and reintroduction and the welfare of the rabbits used for food in captivity. However, it did not touch on the welfare of released lynxes or the welfare of any prey that the lynxes may hunt after release into the wild, although the latter was pointed out by someone in the audience during questions after the talk.
Finally, Michael Beaulieu from the Wild Animal Initiative gave a talk on using biologgers to study wild animal welfare.
farmed animals
30% of talks and 37% of posters were on farmed animals. Topics included:
- measuring welfare (corticosterone, vocalisations...)
- improving welfare (enrichments, outdoor access...)
- attitudes of producers
- attitudes of the public
- policy/law
There was a small number of talks and posters that mentioned shrimp.
lab animal...
Jul 21, 2024 • 1h 8min

EA - New 80k problem profile: Nuclear weapons by Benjamin Hilton

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New 80k problem profile: Nuclear weapons, published by Benjamin Hilton on July 21, 2024 on The Effective Altruism Forum.
Note: this post is a (minorly) edited version of a new 80,000 Hours problem profile.
As the searingly bright white light of the first nuclear blast faded away, the world entered a new age. Since July 16, 1945, humanity has had access to technology capable of destroying civilisation. Amidst rising tensions and the return of war to Europe, we're potentially seeing the start of a new nuclear arms race. Meanwhile, the community of brilliant minds who worked throughout the Cold War to prevent nuclear catastrophe has all but disappeared.
And that's a problem. It's a problem because the threat of nuclear destruction is still with us. But that also means that by addressing that threat, we could make humanity more likely to endure.
Summary
It's very plausible that there will be a nuclear war this century. If that does happen, there's a reasonable chance that war would cause some kind of nuclear winter, potentially killing billions, and possibly causing an existential catastrophe. Nuclear security is already a major topic of interest for governments, but it receives little attention from philanthropists and NGOs, so we think there are likely some neglected opportunities to reduce the risk. Most opportunities to influence the risk from nuclear weapons seem to be through working in government, researching key questions, working in communications to advocate for changes, or attempting to build the field (for example, by earning to give).
Our overall view
Recommended. Working on this issue seems to be among the best ways of improving the long-term future we know of, but all else equal, we think it's less pressing than our highest priority areas.
Scale
We believe work to reduce the probability of nuclear war has the potential for a large positive impact, as nuclear war would have devastating effects, both directly and through secondary effects such as nuclear winter. We think the chance of a nuclear war in the next 100 years is something like 20-50%. Estimates of the existential risk from nuclear war within the next 100 years range from 0.005% to 1%. We think the existential risk from nuclear war is around 0.01%.
Neglectedness
Current spending is around $1 billion per year (quality-adjusted). However, philanthropic spending is only around $30 million per year, and so we'd guess there are high-impact neglected opportunities to reduce the risk.
Solvability
Making progress on nuclear security seems somewhat tractable, and there are a variety of intermediate goals we might aim to achieve which could help. However, there's a real risk of causing harm.
How likely is a nuclear war?
From 1945 until 1991, much of the world lived in full knowledge that a constant threat of nuclear war loomed. Schoolkids did regular drills to prepare for nuclear attack; hundreds of thousands of fallout shelters were built across Europe and North America. After the end of the Cold War, fear of nuclear war gradually fell out of the public consciousness. But while the landscape of the threat has changed, the risk remains surprisingly - and scarily - high.
Surveys of experts and superforecasters are one way we can try to put a number on the chances of some kind of nuclear war.
Here's a table of every estimate we could find since 2000:
Definition | Annualised probability | Probability by 2100 | Source
Nuclear weapon kills > 1,000 people | 0.54% | 33.34% | Existential Persuasion Tournament (2022), superforecasters
Nuclear weapon kills > 1,000 people | 0.61% | 37.09% | Existential Persuasion Tournament (2022), domain experts
Nuclear attack | 2.21% | 81.50% | Lugar expert survey (2005)
Nuclear war | 1.00% | 53.18% | Applied Physics Laboratory (2021), Chapter 4
Nuclear detonation by a state actor causing at least 1 fatality | 0.40% | 26.11% | Goo...
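As a rough guide to how the two probability columns relate: an annualised probability compounds into a cumulative one if the risk is treated as independent and identical each year. The sketch below is my own illustration, assuming a horizon of roughly 75 years to 2100, and it approximately reproduces the quoted figures.

```python
def cumulative_probability(annual_p: float, years: int = 75) -> float:
    """Compound an annualised probability into a cumulative probability,
    assuming the same independent risk in each year."""
    return 1 - (1 - annual_p) ** years

# A 0.54% annual chance compounds to roughly 33% by 2100, close to the
# 33.34% quoted above for the superforecaster estimate.
print(f"{cumulative_probability(0.0054):.1%}")
```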
Jul 21, 2024 • 2min

LW - Why Georgism Lost Its Popularity by Zero Contradictions

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why Georgism Lost Its Popularity, published by Zero Contradictions on July 21, 2024 on LessWrong.
Henry George's 1879 book Progress & Poverty was the second best-selling book in the entire world during the 1880s and 1890s, outsold only by the Bible. Nobody knows exactly how many copies it sold during those two decades since nobody was keeping track, but it certainly sold at least several million copies. The Progressive Era is literally named after the book itself. Georgism used to have millions of followers, and many of them were very famous people. When Henry George died in 1897 (just a few days before the election for New York City mayor), an estimated 100,000 people attended his funeral. The mid-20th century labor economist and journalist George Soule wrote that George was "By far the most famous American economic writer," and "author of a book which probably had a larger world-wide circulation than any other work on economics ever written."
Few people know it, but the board game Monopoly and its predecessor The Landlord's Game were actually created to promote the economic theories of Henry George, as noted in the second introduction paragraph of the Wikipedia article on the board game Monopoly. The board games were intended to show that economies that eliminate rent-seeking are better than ones that don't.
So if Georgism used to have millions of supporters and solid economic reasoning, why did it never catch on, and how did it lose its popularity over the past century? (see the rest of the post in the link)
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jul 21, 2024 • 9min

LW - (Approximately) Deterministic Natural Latents by johnswentworth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: (Approximately) Deterministic Natural Latents, published by johnswentworth on July 21, 2024 on LessWrong.
Background: Natural Latents: The Math, Natural Latents: The Concepts, Why Care About Natural Latents?, the prototypical semantics use-case. This post does not assume that you've read all of those, or even any of them.
Suppose I roll a biased die 1000 times, and then roll the same biased die another 1000 times. Then...
Mediation: The first 1000 rolls are approximately independent of the second 1000 given the bias (to reasonable precision).
Redundancy: I can estimate the die's bias (to reasonable precision) with high confidence from either the first or second 1000 rolls.
The die's bias is therefore a natural latent, which means it has various nice properties.
Minimality: The bias is the smallest summary of all the information about the first 1000 rolls relevant to the second 1000 (and vice-versa).
Maximality: The bias is the largest piece of information which can be calculated from the first 1000 rolls and also can separately be calculated from the second 1000 rolls. Any other variable which satisfies the above properties must tell us (approximately) the same information about the die rolls as the bias.
Furthermore, the bias is a(n approximate) deterministic natural latent: the die's bias (to reasonable precision) is approximately determined by[1] the first 1000 die rolls, and also approximately determined by the second 1000 die rolls. That implies one more nice property:
Uniqueness: The bias is the unique-up-to(-approximate)-isomorphism latent which has the above properties, making it a natural Schelling point for communication between agents.
We've proven all that before, mostly in Natural Latents: The Math (including the addendum added six months after the rest of the post). But it turns out that the math is a lot shorter and simpler, and easily yields better bounds, if we're willing to assume (approximate) determinism up-front. That does lose us some theoretical tools (notably the resampling construction), but it gives a cleaner foundation for our expected typical use cases (like e.g. semantics). The goal of this post is to walk through that math.
Background Tool: Determinism in Diagrams
We're going to use diagrammatic proofs, specifically using Bayes nets. But it's non-obvious how to express (approximate) determinism using Bayes nets, or what rules diagrams follow when determinism is involved, so we'll walk through that first.
This diagram says that Y is (approximately) determined by X: Intuitively, the literal interpretation of the diagram is: X mediates between Y and Y, i.e. Y itself tells me nothing more about Y once I already know X. That only makes sense if X tells me everything there is to know about Y, i.e. Y is determined by X.
In the approximate case, we express the approximation error of the diagram as a KL-divergence, same as usual:
ϵ ≜ D_KL( P[X=x, Y=y, Y=y'] || P[X=x] P[Y=y|X=x] P[Y=y'|X=x] )
If you get confused later about what it means to have two copies of the same variable in a diagram, go back to that line; that's the definition of the approximation error of the diagram. (One way to view that definition: there's actually two variables Y and Y', but P says that Y and Y' always have the same value.)
That approximation error simplifies:
D_KL( P[X=x, Y=y, Y=y'] || P[X=x] P[Y=y|X=x] P[Y=y'|X=x] )
= D_KL( P[X=x, Y=y] I[y=y'] || P[X=x] P[Y=y|X=x] P[Y=y'|X=x] )
= ∑_{x,y,y'} P[X=x, Y=y] I[y=y'] ( log(P[X=x, Y=y] I[y=y']) - log(P[X=x] P[Y=y|X=x] P[Y=y'|X=x]) )
= ∑_{x,y} P[X=x, Y=y] ( log(P[X=x, Y=y]) - log(P[X=x] P[Y=y|X=x] P[Y=y|X=x]) )
= -∑_{x,y} P[X=x, Y=y] log(P[Y=y|X=x])
= H(Y|X)
So the diagram says Y is determined by X, and the approximation error of the diagram is the entropy H of Y given X - i.e. the number of bits required on average to specify Y once one already knows X. Very intuitive!
The Dangly Bit Lemma
Intuitiv...
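As a quick numerical illustration of the redundancy condition from the dice example above - a minimal sketch of my own, using a made-up bias and a simple frequency estimator, not code from the post:

```python
import numpy as np

rng = np.random.default_rng(0)
bias = np.array([0.3, 0.2, 0.2, 0.1, 0.1, 0.1])  # hypothetical die bias

rolls = rng.choice(6, size=2000, p=bias)
first, second = rolls[:1000], rolls[1000:]

# Estimate the bias separately from each batch of 1000 rolls.
est_first = np.bincount(first, minlength=6) / 1000
est_second = np.bincount(second, minlength=6) / 1000

# Redundancy: either batch recovers approximately the same bias estimate.
print(np.abs(est_first - est_second).max())  # small, and shrinks as batches grow
```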
Jul 20, 2024 • 9min

AF - A more systematic case for inner misalignment by Richard Ngo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A more systematic case for inner misalignment, published by Richard Ngo on July 20, 2024 on The AI Alignment Forum.
This post builds on my previous post making the case that squiggle-maximizers are plausible. The argument I presented was a deliberately simplified one, though, and glossed over several possible issues. In this post I'll raise and explore three broad objections. (Before looking at mine, I encourage you to think of your own biggest objections to the argument, and jot them down in the comments.)
Intelligence requires easily-usable representations
"Intelligence as compression" is an interesting frame, but it ignores the tradeoff between simplicity and speed. Compressing knowledge too heavily makes it difficult to use. For example, it's very hard to identify most macroscopic implications of the Standard Model of physics, even though in theory all of chemistry could be deduced from it. That's why both humans and LLMs store a huge number of facts and memories in ways that our minds can access immediately, using up more space in exchange for rapid recall. Even superintelligences which are much better than humans at deriving low-level facts from high-level facts would still save time by storing the low-level facts as well.
So we need to draw a distinction between having compressed representations, and having only compressed representations. The latter is what would compress a mind overall; the former could actually increase the space requirements, since the new compressed representations would need to be stored alongside non-compressed representations. This consideration makes premise 1 from my previous post much less plausible. In order to salvage it, we need some characterization of the relationship between compressed and non-compressed representations. I'll loosely define systematicity to mean the extent to which an agent's representations are stored in a hierarchical structure where representations at the bottom could be rederived from simple representations at the top. Intuitively speaking, this measures the simplicity of representations weighted by how "fundamental" they are to the agent's ontology.
Let me characterize systematicity with an example. Suppose you're a park ranger, and you know a huge number of facts about the animals that live in your park. One day you learn evolutionary theory for the first time, which helps explain a lot of the different observations you'd made. In theory, this could allow you to compress your knowledge: you could forget some facts about animals, and still be able to rederive them later by reasoning backwards from evolutionary theory if you wanted to. But in practice, it's very helpful for you to have those facts readily available. So learning about evolution doesn't actually reduce the amount of knowledge you need to store.
What it does do, though, is help structure that knowledge. Now you have a range of new categories (like "costly signaling" or "kin altruism") into which you can fit examples of animal behavior. You'll be able to identify when existing concepts are approximations to more principled concepts, and figure out when you should be using each one. You'll also be able to generalize far better to predict novel phenomena - e.g. the properties of new animals that move into your park.
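As a loose programming analogy for the store-versus-rederive tradeoff described above - my own illustration, not anything from the post - memoisation keeps a compact generator while caching the derived facts for rapid recall:

```python
from functools import lru_cache

# Compact "generator": slow to evaluate, but the facts below follow from it.
def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Cached "derived facts": rederivable in principle, stored for instant reuse.
@lru_cache(maxsize=None)
def primes_up_to(n: int) -> tuple:
    return tuple(k for k in range(2, n + 1) if is_prime(k))

primes_up_to(10_000)  # slow the first time
primes_up_to(10_000)  # fast thereafter: extra space traded for speed
```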
So let's replace premise 1 in my previous post with the claim that increasing intelligence puts pressure on representations to become more systematic. I don't think we're in a position where we can justify this in any rigorous way. But are there at least good intuitions for why this is plausible? One suggestive analogy: intelligent minds are like high-functioning organizations, and many of the properties you want in minds correspond to properties of such organizations: 1. You want disagreements between different people to be resolved...
