
The Nonlinear Library

Latest episodes

Aug 29, 2024 • 13min

AF - Solving adversarial attacks in computer vision as a baby version of general AI alignment by stanislavfort

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Solving adversarial attacks in computer vision as a baby version of general AI alignment, published by stanislavfort on August 29, 2024 on The AI Alignment Forum.

I spent the last few months trying to tackle the problem of adversarial attacks in computer vision from the ground up. The results of this effort are written up in our new paper Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness (explainer on X/Twitter). Taking inspiration from biology, we reached state-of-the-art or above state-of-the-art robustness at 100x - 1000x less compute, got human-understandable interpretability for free, turned classifiers into generators, and designed transferable adversarial attacks on closed-source (v)LLMs such as GPT-4 or Claude 3. I strongly believe that there is a compelling case for devoting serious attention to solving the problem of adversarial robustness in computer vision, and I try to draw an analogy to the alignment of general AI systems here.

1. Introduction

In this post, I argue that the problem of adversarial attacks in computer vision is in many ways analogous to the larger task of general AI alignment. In both cases, we are trying to faithfully convey an implicit function locked within the human brain to a machine, and we do so extremely successfully on average. Under static evaluations, the human and machine functions match up exceptionally well. However, as is typical in high-dimensional spaces, some phenomena can be relatively rare and basically impossible to find by chance, yet ubiquitous in their absolute count. This is the case for adversarial attacks - imperceptible modifications to images that completely fool computer vision systems and yet have virtually no effect on humans. Their existence highlights a crucial and catastrophic mismatch between the implicit human vision function and the function learned by machines - a mismatch that can be exploited in a dynamic evaluation by an active, malicious agent. Such failure modes will likely be present in more general AI systems, and our inability to remedy them even in the more restricted vision context (yet) does not bode well for the broader alignment project. This is a call to action to solve the problem of adversarial vision attacks - a stepping stone on the path to aligning general AI systems.

2. Communicating implicit human functions to machines

The basic goal of computer vision can be viewed as trying to endow a machine with the same vision capabilities a human has. A human carries, locked inside their skull, an implicit vision function mapping visual inputs into semantically meaningful symbols, e.g. a picture of a tortoise into a semantic label tortoise. This function is represented implicitly and while we are extremely good at using it, we do not have direct, conscious access to its inner workings and therefore cannot communicate it to others easily. To convey this function to a machine, we usually form a dataset of fixed images and their associated labels. We then use a general enough class of functions, typically deep neural networks, and a gradient-based learning algorithm together with backpropagation to teach the machine how to correlate images with their semantic content, e.g. how to assign a label parrot to a picture of a parrot.
This process is extremely successful in communicating the implicit human vision function to the computer, and the implicit human and explicit, learned machine functions agree to a large extent. The agreement between the two is striking. Given how different the architectures are (a simulated graph-like function doing a single forward pass vs the wet protein brain of a mammal running continuous inference), how different the learning algorithms are (gradient descent with backpropagation vs something completely different but still unknown), a...
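To make the notion of an imperceptible adversarial modification concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM) in PyTorch. This is not the multi-scale ensemble method from the paper; the model, tensors, and epsilon value are illustrative placeholders for any differentiable image classifier.

```python
# Minimal FGSM sketch: perturb an image in the direction that increases the loss.
# `model` is any differentiable PyTorch classifier; all names here are illustrative.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=8 / 255):
    """Return an adversarially perturbed copy of `image`.

    image: tensor of shape (1, C, H, W) with values in [0, 1]
    label: tensor of shape (1,) holding the true class index
    epsilon: maximum per-pixel perturbation (small values are imperceptible)
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step by the sign of the gradient, then clamp back to valid pixel range.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```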
Aug 29, 2024 • 4min

EA - Announcing the Strategic Animal Funding Circle! by JamesÖz

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing the Strategic Animal Funding Circle!, published by JamesÖz on August 29, 2024 on The Effective Altruism Forum.

(Yes, yet another funding circle. Thank Ambitious Impact.) I'm excited to announce the launch of the Strategic Animal Funding Circle, a new group of donors looking to support cost-effective and high-impact farmed animal welfare nonprofits during a critical growth phase where targeted funding can have an immense impact. We've just launched our first Request For Proposals which you can see in this document, as well as being outlined below. Apply here by September 20th to be considered for our first funding round. We aim for applicants to hear back on a decision by early November. If there are any donors (either currently giving or willing to give upwards of $100-250k to farmed animals per year) interested in joining, feel free to email me at jozden[at]mobius.life. We expect the funding circle will expose donors to more high-impact opportunities, create the space for important discussions and reduce vetting inefficiencies.

Request for Proposals

About this Funding Opportunity

We are a group of donors looking to support cost-effective and high-impact farmed animal welfare and protection nonprofits during a critical growth phase where targeted funding can have an immense impact. Grants will initially be one-off support, though renewals may be possible in some cases at the discretion of individual donors. In this round, we expect up to $1,000,000 will be available for funding across all grants - though this depends on the funder-applicant fit.

Eligibility Criteria:
- The intervention is focused on improving outcomes in farmed animal welfare or protection, which could include reducing the suffering of farmed animals, promoting the development of alternative proteins or reducing the consumption of animal products.
- The intervention can be service delivery, policy work, or any other intervention type that delivers measurable impact in improving animal welfare outcomes.
- You are a registered not-for-profit entity or are partnering with one as a fiscal agent.
- Has the potential and willingness to scale; and
- Preference is given towards organizations with less than 4 years of operating history, or less than 500k USD in annual budget.

Selection Criteria: What Makes a Competitive Application?

Competitive applications will have the following qualities:
- A Great Idea: The applicant is proposing an innovative program where funding can catalyze progress.
- Demonstrated Funding Gap: Support will enable work which wouldn't otherwise happen. We are most excited about areas that are neglected by other major farmed animal funders.
- Indications of Impact: The applicant can demonstrate results through metrics, internal data, or models backed by external data. Especially strong applicants state a "north star" metric by which they want to be held accountable in ~5 years.
- Scalability: The applicant has concrete expansion plans to scale and achieve significantly more impact.
- Continuity/Sustainability Plan: The applicant explains clearly how they will build on this grant.
- Cost Effectiveness: We are looking to support interventions that are highly cost-effective in achieving positive outcomes for animals.
- Quality of Evidence: What studies, data, or information exists to indicate that the program is or will be successful?
How to Apply: Apply here by September 20th to be considered for our first funding round. We aim for applicants to hear back on a decision by the end of October 30th. Future Rounds: We plan to invite and review applications twice per year, one round in Fall and another in Spring. About the Strategic Animal Funding Circle: We are a group of donors who support promising farmed animal welfare and animal protection nonprofits during the critical growth phase where targeted fund...
Aug 29, 2024 • 8min

LW - How to hire somebody better than yourself by lukehmiles

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How to hire somebody better than yourself, published by lukehmiles on August 29, 2024 on LessWrong.

TLDR: Select candidates heterogeneously, then give them all a very hard test, only continue with candidates that do very well (accept that you lose some good ones), and only then judge on interviews/whatever.

I'm no expert but I've made some recommendations that turned out pretty well -- maybe like 5 ever. This post would probably be better if I waited 10 years to write it. Nonetheless, I think my method is far better than what most orgs/corps do. If you have had mad hiring success (judging by what your org accomplished) then please comment! Half-remembered versions of Paul Graham's taste thing and Yudkowsky's Vinge's Law have led some folks to think that judging talent above your own is extremely difficult. I do not think so.

Prereqs:
- It's the kind of position where someone super good at it can generate a ton of value - eg sales/outreach, coding, actual engineering, research, management, ops, ...
- Lots of candidates are available and you expect at least some of them are super good at the job.
- You have at least a month to look.
- It's possible for someone to demonstrate extreme competence at this type of job in a day or two.
- Your org is trying to do a thing - rather than be a thing.
- You want to succeed at that thing - ie you don't have some other secret goal.
- Your goal with hiring people is to do that thing better/faster - ie you don't need more friends or a prestige bump.
- Your work situation does not demand that you look stand-out competent - ie you don't unemploy yourself if you succeed in hiring well.

You probably don't meet the prereqs. You are probably in it for the journey more than the destination; your life doesn't improve if org goals are achieved; your raises depend on you not out-hiring yourself; etc. Don't feel bad - it is totally ok to be an ordinary social creature! Being a goal psycho often sucks in every way except all the accomplished goals. If you do meet the prereqs, then good news, hiring is almost easy. You just need to find people who are good at doing exactly what you need done.

Here's the method:
- Do look at performance (measure it yourself)
- Accept noise
- Don't look at anything else (yet)
- Except that they work hard

Do look at performance

Measure it yourself. Make up a test task. You need something that people can take without quitting their jobs or much feedback from you; you and the candidate should not become friends during the test; a timed 8-hour task is a reasonable starting point. Most importantly, you must be able to quickly and easily distinguish good results from very good results. The harder the task, the easier it is to judge the success of top attempts. If you yourself cannot complete the task at all, then congratulations, you now have a method to judge talent far above your own. Take that, folk Vinge's law.

Important! Make the task something where success really does tell you they'll do the job well. Not a proxy IQ test or leetcode. The correlation is simply not high enough. Many people think they just need to hire someone generally smart and capable. I disagree, unless your org is very large or nebulous. This task must also not be incredibly lame or humiliating, or you will only end up hiring people lacking a spine. (Common problem.) Don't filter out the spines.
It can be hard to think of a good test task but it is well worth all the signal you will get. Say you are hiring someone to arrange all your offices. Have applicants come arrange a couple offices and see if people like it. Pretty simple. Say you are hiring someone to build a house. Have contractors build a shed in one day. Ten sheds only cost like 5% of what a house costs, but bad builders will double your costs and timeline. Pay people as much as you can for their time and the...
Aug 28, 2024 • 2min

LW - "Deception Genre" What Books are like Project Lawful? by Double

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Deception Genre" What Books are like Project Lawful?, published by Double on August 28, 2024 on LessWrong.

This post is spoiler-free.

I just finished Project Lawful, a really long, really weird book by Eliezer Yudkowsky. The book's protagonist is a knowledgeable and perceptive target. A conspiracy forms around the target to learn from him while keeping him from finding out that helping them is not in the target's best interests. The book is written from the perspective of both the target and the conspiracists. The target notices inconsistencies and performs experiments to test his false reality while also acting in the fabricated reality according to his interests. The conspiracists frantically try to keep the target from catching them or building enough evidence against them that he concludes they have been lying.

This is a description of (part of) the plot of Project Lawful. But this could be the description of an entire genre! If the genre doesn't already have a name, it could be the "Deception Genre." Another work in this category would be The Truman Show, which fits the deception and the target's escape within a <2hr movie runtime.

Other stories with lying don't really have the same structure. Walter White in Breaking Bad is trying to keep his crimes hidden but isn't constructing a false reality around the cops or his family. Death Note comes close, though Light tries to mislead L about specifically who Kira is and how the Death Note works rather than constructing an entire false reality around L. Many stories about dystopias have the protagonists discover that their realities are false, but fewer of those feature the perspectives of the conspiracists frantically trying to keep the deception running.

Do you know any other stories in the Deception Genre? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Aug 28, 2024 • 4min

LW - things that confuse me about the current AI market. by DMMF

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: things that confuse me about the current AI market., published by DMMF on August 28, 2024 on LessWrong.

Paging Gwern or anyone else who can shed light on the current state of the AI market - I have several questions.

Since the release of ChatGPT, at least 17 companies, according to the LMSYS Chatbot Arena Leaderboard, have developed AI models that outperform it. These companies include Anthropic, NexusFlow, Microsoft, Mistral, Alibaba, Hugging Face, Google, Reka AI, Cohere, Meta, 01 AI, AI21 Labs, Zhipu AI, Nvidia, DeepSeek, and xAI. Since GPT-4's launch, 15 different companies have reportedly created AI models that are smarter than GPT-4. Among them are Reka AI, Meta, AI21 Labs, DeepSeek AI, Anthropic, Alibaba, Zhipu, Google, Cohere, Nvidia, 01 AI, NexusFlow, Mistral, and xAI.

Twitter AI (xAI), which seemingly had no prior history of strong AI engineering, with a small team and limited resources, has somehow built the third smartest AI in the world, apparently on par with the very best from OpenAI. The top AI image generator, Flux AI, which is considered superior to the offerings from OpenAI and Google, has no Wikipedia page, barely any information available online, and seemingly almost no employees. The next best in class, Midjourney and Stable Diffusion, also operate with surprisingly small teams and limited resources.

I have to admit, I find this all quite confusing. I expected companies with significant experience and investment in AI to be miles ahead of the competition. I also assumed that any new competitors would be well-funded and dedicated to catching up with the established leaders. Understanding these dynamics seems important because they influence the merits of things like a potential pause in AI development or the ability of China to outcompete the USA in AI. Moreover, as someone with general market interests, the valuations of some of these companies seem potentially quite off.

So here are my questions:
1. Are the historically leading AI organizations - OpenAI, Anthropic, and Google - holding back their best models, making it appear as though there's more parity in the market than there actually is?
2. Is this apparent parity due to a mass exodus of employees from OpenAI, Anthropic, and Google to other companies, resulting in the diffusion of "secret sauce" ideas across the industry?
3. Does this parity exist because other companies are simply piggybacking on Meta's open-source AI model, which was made possible by Meta's massive compute resources? Now, by fine-tuning this model, can other companies quickly create models comparable to the best?
4. Is it plausible that once LLMs were validated and the core idea spread, it became surprisingly simple to build, allowing any company to quickly reach the frontier?
5. Are AI image generators just really simple to develop but lack substantial economic reward, leading large companies to invest minimal resources into them?
6. Could it be that legal challenges in building AI are so significant that big companies are hesitant to fully invest, making it appear as if smaller companies are outperforming them?
7. And finally, why is OpenAI so valuable if it's apparently so easy for other companies to build comparable tech? Conversely, why aren't these no-name companies making leading LLMs valued higher?
Of course, the answer is likely a mix of the factors mentioned above, but it would be very helpful if someone could clearly explain the structures affecting the dynamics highlighted here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Aug 28, 2024 • 21min

EA - Statistical foundations for worldview diversification by Karthik Tadepalli

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Statistical foundations for worldview diversification, published by Karthik Tadepalli on August 28, 2024 on The Effective Altruism Forum.

Note: this has been in my drafts for a long time, and I just decided to let it go without getting too hung up on details, so this is much rougher than it should be.

Summary: Worldview diversification seems hard to justify philosophically, because it results in lower expected value than going with a single worldview that has the highest EV. I show that you can justify worldview diversification as the solution to a decision problem under uncertainty. The first way is to interpret worldview diversification as a minimax strategy, in which you maximize the worst-case utility of your allocation. The second way is as an approximate solution to the problem of maximizing expected utility for a risk-averse decision maker.

Overview

Alexander Berger: ...the central idea of worldview diversification is that the internal logic of a lot of these causes might be really compelling and a little bit totalizing, and you might want to step back and say, "Okay, I'm not ready to go all in on that internal logic." So one example would be just comparing farm animal welfare to human causes within the remit of global health and wellbeing. One perspective on farm animal welfare would say, "Okay, we're going to get chickens out of cages. I'm not a speciesist and I think that a chicken-day suffering in the cage is somehow very similar to a human-day suffering in a cage, and I should care similarly about these things." I think another perspective would say, "I would trade an infinite number of chicken-days for any human experience. I don't care at all." If you just try to put probabilities on those views and multiply them together, you end up with this really chaotic process where you're likely to either be 100% focused on chickens or 0% focused on chickens. Our view is that that seems misguided. It does seem like animals could suffer. It seems like there's a lot at stake here morally, and that there's a lot of cost-effective opportunities that we have to improve the world this way. But we don't think that the correct answer is to either go 100% all in where we only work on farm animal welfare, or to say, "Well, I'm not ready to go all in, so I'm going to go to zero and not do anything on farm animal welfare." ...

Rob Wiblin: Yeah. It feels so intuitively clear that when you're to some degree picking these numbers out of a hat, you should never go 100% or 0% based on stuff that's basically just guesswork. I guess, the challenge here seems to have been trying to make that philosophically rigorous, and it does seem like coming up with a truly philosophically grounded justification for that has proved quite hard. But nonetheless, we've decided to go with something that's a bit more cluster thinking, a bit more embracing common sense and refusing to do something that obviously seems mad.

Alexander Berger: And I think part of the perspective is to say look, I just trust philosophy a little bit less. So the fact that something might not be philosophically rigorous... I'm just not ready to accept that as a devastating argument against it.

(80,000 Hours)

This note explains how you might arrive at worldview diversification from a formal framework.
I don't claim it is the only way you might arrive at it, and I don't claim that it captures everyone's intuitions for why worldview diversification is a good idea. It only captures my intuitions, and formalizes them in a way that might be helpful for others. Suppose a decisionmaker wants to allocate money across different cause areas. But the marginal social value of money to each cause area is unknown/known with error (e.g. moral weights, future forecasts), so they don't actually know how to maximize social value ex ante. What sh...
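To illustrate the minimax framing from the summary above, here is a small numerical sketch. The numbers are my own toy assumptions, not from the post: two worldviews disagree completely about which of two cause areas has any value, and the expected-value maximizer goes all in while the minimax rule diversifies.

```python
# Toy comparison of EV-maximizing vs. minimax allocation across two cause areas.
# All numbers are illustrative assumptions, not taken from the post.
import numpy as np

# value_per_dollar[worldview][cause]: each worldview thinks only one cause matters.
value_per_dollar = np.array([
    [1.0, 0.0],  # worldview A: only cause 0 has value
    [0.0, 1.0],  # worldview B: only cause 1 has value
])
credence = np.array([0.6, 0.4])  # credence in each worldview

shares = np.linspace(0, 1, 101)  # fraction of the budget sent to cause 1

def portfolio_value(share, weights):
    """Value of spending (1 - share) on cause 0 and share on cause 1."""
    return (1 - share) * weights[0] + share * weights[1]

expected_value = np.array([
    credence @ np.array([portfolio_value(s, w) for w in value_per_dollar])
    for s in shares
])
worst_case = np.array([
    min(portfolio_value(s, w) for w in value_per_dollar)
    for s in shares
])

# The EV maximizer goes all in on the favored worldview's cause;
# the minimax rule splits the budget to protect the worst case.
print("EV-maximizing share to cause 1:", shares[expected_value.argmax()])  # 0.0
print("Minimax share to cause 1:      ", shares[worst_case.argmax()])      # 0.5
```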
Aug 28, 2024 • 4min

EA - Legal Impact for Chickens is Hiring an Attorney by KathrynLIC

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Legal Impact for Chickens is Hiring an Attorney, published by KathrynLIC on August 28, 2024 on The Effective Altruism Forum.

Legal Impact for Chickens is hiring a Staff Attorney or Managing Attorney. We are prioritizing applications submitted by October 7.

About us: Legal Impact for Chickens (LIC) is a 501(c)(3) litigation nonprofit. We work to protect farmed animals. You may have seen our Costco shareholder derivative suit in The Washington Post, Fox Business, or CNN Business - or even on TikTok. Or perhaps you saw LIC recommended by Animal Charity Evaluators. Now, we're looking for our next hire - an entrepreneurial litigator to help fight for animals!

About you:
• 2+ years of litigation experience (for staff attorney)
• 6+ years of litigation experience (for managing attorney)
• Licensed and in good standing with the state bar where you live
• Excellent analytical, writing, and verbal-communication skills
• Zealous, creative, enthusiastic litigator
• Passion for helping farmed animals
• Interest in entering a startup nonprofit on the ground floor, and helping to build something
• Willing to do all types of nonprofit startup work, beyond just litigation
• Strong work ethic and initiative
• Kind to our fellow humans, and excited about creating a welcoming, inclusive team
• Experience supervising staff, interns, contractors, or volunteers (for managing attorney)

We encourage candidates with most of the above to apply; we do not expect all candidates to fit this job description 100%.

About the role: You will be an integral part of LIC. You'll help shape our organization's future. Your role will be a combination of (1) designing and pursuing creative impact litigation for animals, and (2) helping with everything else we need to do, to run this new nonprofit! Since this is such a small organization, you'll wear many hats: Sometimes you may wear a law-firm partner's hat, making litigation strategy decisions or covering a hearing on your own. Sometimes you'll wear an associate's hat, analyzing complex and novel legal issues. Sometimes you'll pitch in on administrative tasks, making sure a brief gets filed properly or formatting a table of authorities. Sometimes you'll wear a start-up founder's hat, helping plan the number of employees we need, or representing LIC at conferences. We can only promise it won't be dull! This job offers tremendous opportunity for advancement, in the form of helping to lead LIC as we grow. The hope is for you to become an indispensable, long-time member of our new team.

Commitment: Full time
Location and travel: This is a remote, U.S.-based position. You must be available to travel for work as needed, since we will litigate all over the country.
Reports to: Alene Anello, LIC's president
Salary: $80,000-$130,000 depending on experience and role. (E.g. from $80,000 for someone with two years of litigation experience, up to $130,000 for someone with 15 years or more of litigation experience.)

One more thing! LIC is an equal opportunity employer. Women and people of color are strongly encouraged to apply. Applicants will receive consideration for employment without regard to race, religion, gender, sexual orientation, national origin, disability, age, or veteran status.

To Apply: To apply, please fill out this form by October 7, 2024.
If the link doesn't work, please copy-and-paste this into your browser: https://forms.monday.com/forms/d0bd6cda313e3aac650fd92b86697f61?r=use1 Thank you for your time and your compassion! Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Aug 28, 2024 • 3min

LW - Unit economics of LLM APIs by dschwarz

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Unit economics of LLM APIs, published by dschwarz on August 28, 2024 on LessWrong.

Disclaimer 1: Our calculations are rough in places; information is sparse, guesstimates abound.
Disclaimer 2: This post draws from public info on FutureSearch as well as a paywalled report. If you want the paywalled numbers, email dan@futuresearch.ai with your LW account name and we'll send you the report for free.

Here's our view of the unit economics of OpenAI's API. Note: this considers GPT-4-class models only, not audio or image APIs, and only direct API traffic, not usage in ChatGPT products.

As of June 2024, OpenAI's API was very likely profitable, with surprisingly high margins. Our median estimate for gross margin (not including model training costs or employee salaries) was 75%. Once all traffic switches over to the new August GPT-4o model and pricing, OpenAI plausibly still will have a healthy profit margin. Our median estimate for the profit margin is 55%. The Information implied that OpenAI rents ~60k A100-equivalents from Microsoft for non-ChatGPT inference. If this is true, OpenAI is massively overprovisioned for the API, even when we account for the need to rent many extra GPUs to account for traffic spikes and future growth (arguably creating something of a mystery). We provide an explicit, simplified first-principles calculation of inference costs for the original GPT-4, and find significantly lower throughput & higher costs than Benjamin Todd's result (which drew from Semianalysis).

Summary chart:

What does this imply? With any numbers, we see two major scenarios:

Scenario one: competition intensifies. With llama, Gemini, and Claude all comparable and cheap, OpenAI will be forced to again drop their prices in half. (With their margins FutureSearch calculates, they can do this without running at a loss.) LLM APIs become like cloud computing: huge revenue, but not very profitable.

Scenario two: one LLM pulls away in quality. GPT-5 and Claude-3.5-opus might come out soon at huge quality improvements. If only one LLM is good enough for important workflows (like agents), it may be able to sustain a high price and huge margins. Profits will flow to this one winner.

Our numbers update us, in either scenario, towards:
- An increased likelihood of more significant price drops for GPT-4-class models.
- A (weak) update that frontier labs are facing less pressure today to race to more capable models. If you thought that GPT-4o (and Claude, Gemini, and hosted versions of llama-405b) were already running at cost in the API, or even at a loss, you would predict that the providers are strongly motivated to release new models to find profit. If our numbers are approximately correct, these businesses may instead feel there is plenty of margin left, and profit to be had, even if GPT-5 and Claude-3.5-opus etc. do not come out for many months.

More info at https://futuresearch.ai/openai-api-profit. Feedback welcome and appreciated - we'll update our estimates accordingly. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
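As a rough illustration of the kind of unit-economics arithmetic the post describes, here is a back-of-the-envelope gross-margin calculation. Every number below is an assumption of mine chosen for illustration, not one of FutureSearch's estimates.

```python
# Back-of-the-envelope API gross margin per GPU-hour.
# All inputs are illustrative assumptions, not FutureSearch's figures.

price_per_million_tokens = 10.00      # USD, assumed blended input/output price
tokens_served_per_gpu_hour = 500_000  # assumed effective serving throughput
gpu_hour_cost = 2.50                  # USD, assumed rented A100-equivalent cost

revenue_per_gpu_hour = (tokens_served_per_gpu_hour / 1_000_000) * price_per_million_tokens
gross_margin = (revenue_per_gpu_hour - gpu_hour_cost) / revenue_per_gpu_hour

print(f"revenue per GPU-hour: ${revenue_per_gpu_hour:.2f}")  # $5.00
print(f"cost per GPU-hour:    ${gpu_hour_cost:.2f}")         # $2.50
print(f"gross margin:         {gross_margin:.0%}")           # 50%
```

Plugging in different throughput and price assumptions is what moves the margin toward or away from the post's 75% and 55% medians.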
Aug 28, 2024 • 3min

LW - In defense of technological unemployment as the main AI concern by tailcalled

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: In defense of technological unemployment as the main AI concern, published by tailcalled on August 28, 2024 on LessWrong. It seems to me that when normal people are concerned about AI destroying their life, they are mostly worried about technological unemployment, whereas rationalists think that it is a bigger risk that the AI might murder us all, and that automation gives humans more wealth and free time and is therefore good. I'm not entirely unsympathetic to the rationalist position here. If we had a plan for how to use AI to create a utopia where humanity could thrive, I'd be all for it. We have problems (like death) that we are quite far from solving, and which it seems like a superintelligence could in principle quickly solve. But this requires value alignment: we need to be quite careful what we mean by concepts like "humanity", "thrive", etc., so the AI can explicitly maintain good conditions. What kinds of humans do we want, and what kinds of thriving should they have? This needs to be explicitly planned by any agent which solves this task. Our current society doesn't say "humans should thrive", it says "professional humans should thrive"; certain alternative types of humans like thieves are explicitly suppressed, and other types of humans like beggars are not exactly encouraged. This is of course not an accident: professionals produce value, which is what allows society to exist in the first place. But with technological unemployment, we decouple professional humans from value production, undermining the current society's priority of human welfare. This loss is what causes existential risk. If humanity was indefinitely competitive in most tasks, the AIs would want to trade with us or enslave us instead of murdering us or letting us starve to death. Even if we manage to figure out how to value-align AIs, this loss leads to major questions about what to value-align the AIs to, since e.g. if we value human capabilities, the fact that those capabilities become uncompetitive likely means that they will diminish to the point of being vestigial. It's unclear how to solve this problem. Eliezer's original suggestion was to keep humans more capable than AIs by increasing the capabilities of humans. Yet even increasing the capabilities of humanity is difficult, let alone keeping up with technological development. Robin Hanson suggests that humanity should just sit back and live off our wealth as we got replaced. I guess that's the path we're currently on, but it is really dubious to me whether we'll be able to keep that wealth, and whether the society that replaces us will have any moral worth. Either way, these questions are nearly impossible to separate from the question of, what kinds of production will be performed in the future? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Aug 28, 2024 • 13min

LW - Am I confused about the "malign universal prior" argument? by nostalgebraist

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Am I confused about the "malign universal prior" argument?, published by nostalgebraist on August 28, 2024 on LessWrong.

In a 2016 blog post, Paul Christiano argued that the universal prior (hereafter "UP") may be "malign." His argument has received a lot of follow-up discussion, e.g. in Mark Xu's "The Solomonoff Prior is Malign" and Charlie Steiner's "The Solomonoff prior is malign. It's not a big deal.", among other posts.

This argument never made sense to me. The reason it doesn't make sense to me is pretty simple, but I haven't seen it mentioned explicitly in any of the ensuing discussion. This leaves me feeling like either I am misunderstanding the argument in a pretty fundamental way, or that there is a problem with the argument that has gotten little attention from the argument's critics (in which case I don't understand why). I would like to know which of these is the case, and correct my misunderstanding if it exists, hence this post. (Note: In 2018 I wrote a comment on the original post where I tried to state one of my objections to the argument, though I don't feel I expressed myself especially well there.)

UP-using "universes" and simulatable "universes"

The argument for malignity involves reasoning beings, instantiated in Turing machines (TMs), which try to influence the content of the UP in order to affect other beings who are making decisions using the UP. Famously, the UP is uncomputable. This means the TMs (and reasoning beings inside the TMs) will not be able to use[1] the UP themselves, or simulate anyone else using the UP. At least not if we take "using the UP" in a strict and literal sense. Thus, I am unsure how to interpret claims (which are common in presentations of the argument) about TMs "searching for universes where the UP is used" or the like.

For example, from Mark Xu's "The Solomonoff Prior is Malign": In particular, this suggests a good strategy for consequentialists: find a universe that is using a version of the Solomonoff prior that has a very short description of the particular universe the consequentialists find themselves in.

Or, from Christiano's original post: So the first step is getting our foot in the door - having control over the parts of the universal prior that are being used to make important decisions. This means looking across the universes we care about, and searching for spots within those universe where someone is using the universal prior to make important decisions. In particular, we want to find places where someone is using a version of the universal prior that puts a lot of mass on the particular universe that we are living in, because those are the places where we have the most leverage. Then the strategy is to implement a distribution over all of those spots, weighted by something like their importance to us (times the fraction of mass they give to the particular universe we are in and the particular channel we are using). That is, we pick one of those spots at random and then read off our subjective distribution over the sequence of bits that will be observed at that spot (which is likely to involve running actual simulations).

What exactly are these "universes" that are being searched over? We have two options:
1. They are not computable universes. They permit hypercomputation that can leverage the "actual" UP, in its full uncomputable glory, without approximation.
2. They are computable universes. Thus the UP cannot be used in them. But maybe there is some computable thing that resembles or approximates the UP, and gets used in these universes.

Option 1 seems hard to square with the talk about TMs "searching for" universes or "simulating" universes. A TM can't do such things to the universes of option 1. Hence, the argument is presumably about option 2. That is, although we are trying to reason about the content of...
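For reference, the universal prior discussed throughout is the standard Solomonoff semimeasure over bit strings; a textbook form of the definition (not spelled out in the excerpt above) is:

```latex
% Universal (Solomonoff) prior of a finite bit string x.
% U is a fixed universal prefix Turing machine; the sum runs over all
% programs p whose output on U begins with x. Because halting is
% undecidable, M is uncomputable, which is the property the post leans on.
M(x) \;=\; \sum_{p \,:\, U(p)\ \text{begins with}\ x} 2^{-|p|}
```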
