

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Aug 5, 2024 • 20min
AF - Self-explaining SAE features by Dmitrii Kharlapenko
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Self-explaining SAE features, published by Dmitrii Kharlapenko on August 5, 2024 on The AI Alignment Forum.
TL;DR
We apply the method of SelfIE/Patchscopes to explain SAE features - we give the model a prompt like "What does X mean?", replace the residual stream on X with the decoder direction times some scale, and have it generate an explanation. We call this self-explanation.
The natural alternative is auto-interp, using a larger LLM to spot patterns in max activating examples. We show that our method is effective, and comparable with Neuronpedia's auto-interp labels (with the caveat that Neuronpedia's auto-interp used the comparatively weak GPT-3.5 so this is not a fully fair comparison).
We aren't confident you should use our method over auto-interp, but we think in some situations it has advantages: no max activating dataset examples are needed, and it's cheaper as you just run the model being studied (e.g. Gemma 2B), not a larger model like GPT-4.
Further, it has different errors to auto-interp, so finding and reading both may be valuable for researchers in practice.
We provide advice for using self-explanation in practice, in particular for the challenge of automatically choosing the right scale, which significantly affects explanation quality.
We also release a tool for you to work with self-explanation.
We hope the technique is useful to the community as is, but expect there are many optimizations and improvements on top of what is in this post.
Introduction
This work was produced as part of the ML Alignment & Theory Scholars Program - Summer 24 Cohort, under mentorship from Neel Nanda and Arthur Conmy.
SAE features promise a flexible and extensive framework for interpretation of LLM internals. Recent work (like Scaling Monosemanticity) has shown that they are capable of capturing even high-level abstract concepts inside the model. Compared to MLP neurons, they can capture many more interesting concepts.
Unfortunately, in order to learn things with SAE features and interpret what the SAE tells us, one needs to first interpret these features on their own. The current mainstream method for their interpretation requires storing the feature's activations on millions of tokens, filtering for the prompts that activate it the most, and looking for a pattern connecting them. This is typically done by a human, or sometimes somewhat automated with the use of larger LLMs like ChatGPT, aka auto-interp. Auto-interp is a useful and somewhat effective method, but requires an extensive amount of data and expensive closed-source language model API calls (for researchers outside scaling labs).
Recent papers like SelfIE or Patchscopes have proposed a mechanistic method of directly utilizing the model in question to explain its own internal activations in natural language. It is an approach that replaces an activation during the forward pass (e.g. some of the token embeddings in the prompt) with a new activation and then makes the model generate explanations using this modified prompt.
It's a variant of activation patching, with the notable differences that it generates a many-token output (rather than a single token), and that the patched-in activation may not be the same type as the activation it's overriding (and is just an arbitrary vector of the same dimension). We study how this approach can be applied to SAE feature interpretation, since it:
Is potentially cheaper and does not require large closed-model inference
Can be viewed as more faithful to the source, since it uses the SAE feature vectors directly to generate explanations instead of looking at the max activating examples
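To make the setup concrete, here is a minimal sketch of that kind of residual-stream patching in TransformerLens-style code; the model name, layer, scale, placeholder position, and the random stand-in for an SAE decoder direction are illustrative assumptions rather than the authors' exact settings.

```python
# Minimal sketch of self-explanation via residual-stream patching (illustrative,
# not the authors' exact code). A random vector stands in for a real SAE decoder column.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gemma-2b")

prompt = 'What does "X" mean? It means'
tokens = model.to_tokens(prompt)
x_pos = 3       # position of the placeholder token "X" (depends on the tokenizer)
layer = 12      # residual-stream layer to patch (an arbitrary illustrative choice)
scale = 20.0    # the decoder direction is multiplied by some scale before patching

decoder_direction = torch.randn(model.cfg.d_model)  # stand-in for a real SAE feature

def patch_resid(resid, hook):
    # Overwrite the residual stream at the placeholder position with the scaled
    # feature direction; only fires on the full-prompt pass (later cached passes
    # have sequence length 1, so the condition is false).
    if resid.shape[1] > x_pos:
        resid[:, x_pos, :] = scale * decoder_direction.to(resid.device)
    return resid

with model.hooks(fwd_hooks=[(f"blocks.{layer}.hook_resid_post", patch_resid)]):
    out = model.generate(tokens, max_new_tokens=30)

print(model.to_string(out))  # the model's "self-explanation" of the feature
```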
How to use
Basic method
We ask the model to explain the meaning of a residual stream direction as if it literally was a word or phrase:
Prompt 1 (/ replaced according to model inp...

Aug 5, 2024 • 21min
LW - Circular Reasoning by abramdemski
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Circular Reasoning, published by abramdemski on August 5, 2024 on LessWrong.
The idea that circular reasoning is bad is widespread. However, this reputation is undeserved. While circular reasoning should not be convincing (at least not usually), it should also not be considered invalid.
Circular Reasoning is Valid
The first important thing to note is that circular reasoning is logically valid. A implies A. If circular arguments are to be critiqued, it must be by some other standard than logical validity.
I think it's fair to say that the most relevant objection to circular arguments is that they are not very good at convincing someone who does not already accept the conclusion. You are talking to another person, and need to think about communicating with their perspective. Perhaps the reason circular arguments are a common 'problem' is because they are valid. People naturally think about what should be a convincing argument from their own perspective, rather than the other person's.
However, notice that this objection to circular reasoning assumes that one party is trying to convince the other. This is arguments-as-soldiers mindset.[1] If two people are curiously exploring each other's perspectives, then circular reasoning could be just fine!
Furthermore, I'll claim: circular arguments should actually be considered as a little bit of positive evidence for their positions!
Let's look at a concrete example. I don't think circular arguments are quite so simple as "A implies A"; the circle is usually a bit longer. So, consider a more realistic circular position:[2]
Alice: Why do you believe in God?
Bob: I believe in God based on the authority of the Bible.
Alice: Why do you believe what the Bible says?
Bob: Because the Bible was divinely inspired by God. God is all-knowing and good, so we can trust what God says.
Here we have a two-step loop, A->B and B->A. The arguments are still logically fine; if the Bible tells the truth, and the Bible says God exists, then God exists. If the Bible were divinely inspired by an all-knowing and benevolent God, then it is reasonable to conclude that the Bible tells the truth.
If Bob is just honestly going through his own reasoning here (as opposed to trying to convince Alice), then it would be wrong for Alice to call out Bob's circular reasoning as an error. The flaw in circular reasoning is that it doesn't convince anyone; but that's not what Bob is trying to do. Bob is just telling Alice what he thinks.
If Alice thinks Bob is mistaken, and wants to point out the problems in Bob's beliefs, it is better for Alice to contest the premises of Bob's arguments rather than contest the reasoning form. Pointing out circularity only serves to remind Bob that Bob hasn't given Alice a convincing argument.
You probably still think Bob has made some mistake in his reasoning, if these are his real reasons. I'll return to this later.
Circular Arguments as Positive Evidence
I claimed that circular arguments should count as a little bit of evidence in favor of their conclusions. Why?
Imagine that the Bible claimed itself to be written by an evil and deceptive all-knowing God, instead of a benign God:
Alice: Why do you believe in God?
Bob: Because the Bible tells me so.
Alice: Why do you believe the Bible?
Bob: Well... uh... huh.
Sometimes, belief systems are not even internally consistent. You'll find a contradiction[3] just thinking through the reasoning that is approved of by the belief system itself. This should make you disbelieve the thing.
Therefore, by the rule we call conservation of expected evidence, reasoning through a belief system and deriving a conclusion consistent with the premise you started with should increase your credence. It provides some evidence that there's a consistent hypothesis here; and consistent hypotheses should get some ...
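A toy numerical version of that argument, with made-up numbers, just to illustrate the conservation-of-expected-evidence point:

```python
# Toy illustration (not from the post): let E = "reasoning through the belief
# system yields no contradiction" and A = "the belief system is true".
p_E = 0.9              # prior probability the internal consistency check passes
p_A_given_not_E = 0.0  # finding a contradiction would refute A outright
p_A = 0.045            # prior credence in A

# Conservation of expected evidence: P(A) = P(A|E)P(E) + P(A|~E)P(~E),
# so P(A|E) = (P(A) - P(A|~E)(1 - P(E))) / P(E).
p_A_given_E = (p_A - p_A_given_not_E * (1 - p_E)) / p_E

print(p_A_given_E)  # 0.05 > 0.045: passing the check must raise credence a little
```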

Aug 5, 2024 • 4min
EA - Upcoming EA conferences in 2024 and 2025 by OllieBase
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Upcoming EA conferences in 2024 and 2025, published by OllieBase on August 5, 2024 on The Effective Altruism Forum.
We're very excited to announce our EA conference schedule for the rest of this year and the first half of 2025. EA conferences will be taking place for the first time in Nigeria, Cape Town, Bengaluru, and Toronto, and returning to Berkeley, Sydney, and Singapore.
EA Global: Boston 2024 applications are open, and close October 20.
EAGxIndia will be returning this year in a new location: Bengaluru. See their full announcement here.
EAGxAustralia has rebranded to EAGxAustralasia to represent the fact that many attendees will be from the wider region, especially New Zealand.
We're hiring the teams for both EAGxVirtual and EAGxSingapore. You can read more about the roles and how to apply here.
EA Global will be returning to the same venues in the Bay Area and London in 2025.
Here are the full details:
EA Global
EA Global: Boston 2024 | November 1-3 | Hynes Convention Center | applications close October 20
EA Global: Bay Area 2025 | February 21-23 | Oakland Marriott
EA Global: London 2025 | June 6-8 | Intercontinental London (the O2)
EAGx
EAGxToronto | August 16-18 | InterContinental Toronto Centre | application deadline just extended, they now close August 12
EAGxBerkeley | September 7-8 | Lighthaven | applications close August 20
EA Nigeria Summit | September 7-8 | Chida Event Center, Abuja
EAGxBerlin | September 13-15 | Urania, Berlin | applications close August 24
EA South Africa Summit | October 5 | Cape Town
EAGxIndia | October 19-20 | Conrad Bengaluru | applications close October 5
EAGxAustralasia | November 22-24 | Aerial UTS, Sydney | applications open
EAGxVirtual | November 15-17
EAGxSingapore | December 14-15 | Suntec Singapore
We're aiming to launch applications for events later this year as soon as possible. Please go to the event page links above to apply. If you'd like to add EAG(x) events directly to your Google Calendar, use this link.
Some notes on these conferences
EA Global conferences are run in-house by the CEA events team, whereas EAGx conferences (and EA summits) are organised independently by members of the EA community with financial support and mentoring from CEA.
EAGs have a high bar for admission and are for people who are very familiar with EA and are taking significant actions (e.g. full-time work or study) based on EA ideas.
Admissions for EAGx conferences and EA Summits are processed independently by the organizers. These events are primarily for those who are newer to EA and interested in getting more involved.
Please apply to all conferences you wish to attend - we would rather get too many applications for some conferences and recommend that applicants attend a different one than miss out on potential applicants to a conference.
We offer travel support to help attendees who are approved for an event but who can't afford to travel. You can apply for travel support as you submit your application. Travel support funds are limited (though will vary by event), and we can only accommodate a small number of requests.
Find more info on our website.
Feel free to email hello@eaglobal.org with any questions, or comment below. You can contact EAGx organisers using the format [location]@eaglobalx.org (e.g. berkeley@eaglobalx.org and berlin@eaglobalx.org).
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Aug 5, 2024 • 11min
LW - Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours by Seth Herd
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours, published by Seth Herd on August 5, 2024 on LessWrong.
Vitalik Buterin wrote an impactful blog post, My techno-optimism. I found this discussion of one aspect on 80,000 Hours much more interesting. The remainder of that interview is nicely covered in the host's EA Forum post.
My techno-optimism apparently appealed to both sides, e/acc and doomers. Buterin's approach to bridging that polarization was interesting. I hadn't understood before the extent to which anti-AI-regulation sentiment is driven by fear of centralized power. I hadn't thought about this risk before since it didn't seem relevant to AGI risk, but I've been updating to think it's highly relevant.
[this is automated transcription that's inaccurate and comically accurate by turns :)]
Rob Wiblin (the host) (starting at 20:49):
what is it about the way that you put the reasons to worry that that ensured that kind of everyone could get behind it
Vitalik Buterin:
[...] in addition to taking you know the case that AI is going to kill everyone seriously I the other thing that I do is I take the case that you know AI is going to take create a totalitarian World Government seriously [...]
[...] then it's just going to go and kill everyone but on the other hand if you like take some of these uh you know like very naive default solutions to just say like hey you know let's create a powerful org and let's like put all the power into the org then yeah you know you are creating the most like most powerful big brother from which There Is No Escape and which has you know control over the Earth and and the expanding light cone and you can't get out right and yeah I mean this is something
that like uh I think a lot of people find very deeply scary I mean I find it deeply scary um it's uh it is also something that I think realistically AI accelerates right
One simple takeaway is to recognize and address that motivation for anti-regulation and pro-AGI sentiment when trying to work with or around the e/acc movement. A second is whether to take that fear seriously.
Is centralized power controlling AI/AGI/ASI a real risk?
Vitalik Buterin is from Russia, where centralized power has been terrifying. This has been the case for roughly half of the world. Those who are concerned about the risks of centralized power (including Western libertarians) worry that AI increases that risk if it's centralized. This puts them in conflict with x-risk worriers on regulation and other issues.
I used to hold both of these beliefs, which allowed me to dismiss those fears:
1. AGI/ASI will be much more dangerous than tool AI, and it won't be controlled by humans
2. Centralized power is pretty safe (I'm from the West like most alignment thinkers).
Now I think both of these are highly questionable.
I've thought in the past that fears of AI are largely unfounded. The much larger risk is AGI. And that is an even larger risk if it's decentralized/proliferated. But I've become progressively more convinced that governments will take control of AGI before it's ASI. They don't need to build it, just show up and inform the creators that, as a matter of national security, they'll be making the key decisions about how it's used and aligned.[1]
If you don't trust Sam Altman to run the future, you probably don't like the prospect of Putin or Xi Jinping as world-dictator-for-eternal-life. It's hard to guess how many world leaders are sociopathic enough to have a negative empathy-sadism sum, but power does seem to select for sociopathy.
I've thought that humans won't control ASI, because it's value alignment or bust. There's a common intuition that an AGI, being capable of autonomy, will have its own goals, for good or ill. I think it's perfectly coherent for it...

Aug 5, 2024 • 6min
EA - On Owning Our EA Affiliation by Alix Pham
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Owning Our EA Affiliation, published by Alix Pham on August 5, 2024 on The Effective Altruism Forum.
Someone suggested I name this post "What We Owe The Community": I think it's a great title, but I didn't dare use it...
Views and mistakes my own.
What I believe
I think owning our EA affiliation - how we are inspired by the movement and the community - is net positive for the world and our careers. If more people were more outspoken about their alignment with EA principles and proximity to the EA community, we would all be better off. While there may be legitimate reasons for some individuals to not publicly identify as part of the EA movement, this can create a "free-rider problem".
If too many people choose to passively benefit from EA without openly supporting it, the overall movement and community may suffer from it.
Why I think more people should own their EA affiliation publicly
I understand why one doesn't, but I'd probably not support it in most cases - I say most cases, because some cases are exceptional. I'm also not necessarily saying that one needs to shout it everywhere, but simply be transparent about it.
The risks
These are the risks - actual or perceived - that I mostly hear about when people choose not to publicly own their EA identity:
People don't want to talk to you / take you seriously because you are affiliated with EA
You won't get some career opportunities because you are affiliated with EA
And I get it. It's scary to think two letters could shut some doors for potentially incorrect reasons.
A prisoner's dilemma
But I think it hurts the movement. If people inspired or influenced by EA are not open about it, it's likely that their positive impact won't get credited to EA. And in principle, I wouldn't mind. But that means that the things that EA will get known for will mostly be negative events, because during scandals, everyone will look for people to blame and draw causal paths from their different affiliation to the bad things that happened.
It's much less attractive to dig out those causal paths when the overall story is positive. I'd believe this is a negative feedback loop that hurts the capacity of people inspired by the EA movement to have a positive impact on the world.
Tipping points
It seems to me that currently, not publicly affiliating with EA is the default, it's normal, and there's no harm in doing that. I'd like that norm to change. In Change: How to Make Big Things Happen, Damon Centola defines the concept of "tokens", e.g. for women:
[Rosabeth Moss Kanter] identified several telltale signs of organizations in which the number of women was below the hypothesized tipping point. Most notably, women in these organizations occupied a "token" role. They were conspicuous at meetings and in conferences, and as such were regarded by their male colleagues as representatives of their gender. As tokens, their behavior was taken to be emblematic of all women generally.
They became symbols of what women could do and how they were expected to act.
We need more people to own their affiliation, to represent the true diversity of the EA identity and avoid tokenization.
On transparency
On a personal level, I think transparency is rewarded, in due time. On a community level, one will get to be part of a diverse pool of EAs, which will contribute to showing the diversity of the community: its myriad of groups and individuals, that all have their own vision of what making the world a better place means. It would solve the token problem.
An OpenPhil-funded AI governance organization I am in contact with chose a long time ago to always be transparent about its founders' EA affiliation and its funding sources. Long-term, it benefited from demonstrating high integrity by not leaving out or reframing those details.
After the OpenAI...

Aug 5, 2024 • 13min
LW - Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders by Gytis Daujotas
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders, published by Gytis Daujotas on August 5, 2024 on LessWrong.
Click here to open a live research preview where you can try interventions using this SAE.
This is a follow-up to a previous post on finding interpretable and steerable features in CLIP.
Motivation
Modern image diffusion models often use CLIP in order to condition generation. Put simply, users use CLIP to embed prompts or images, and these embeddings are used to diffuse another image back out.
Despite this, image models have severe user interface limitations. We already know that CLIP has a rich inner world model, but it's often surprisingly hard to make precise tweaks or reference specific concepts just by prompting alone. Similar prompts often yield a different image, or when we have a specific idea in mind, it can be too hard to find the right string of words to elicit the right concepts we need.
If we're able to understand the internal representation that CLIP uses to encode information about images, we might be able to get more expressive tools and mechanisms to guide generation and steer it without using any prompting. In the ideal world, this would enable the ability to make fine adjustments or even reference particular aspects of style or content without needing to specify what we want in language.
We could instead leverage CLIP's internal understanding to pick and choose what concepts to include, like a palette or a digital synthesizer.
It would also enable us to learn something about how image models represent the world, and how humans can interact with and use this representation, thereby skipping the text encoder and manipulating the model's internal state directly.
Introduction
CLIP is a neural network commonly used to guide image diffusion. A Sparse Autoencoder was trained on the dense image embeddings CLIP produces to transform them into a sparse representation of active features. These features seem to represent individual units of meaning. They can also be manipulated in groups - combinations of multiple active features - that represent intuitive concepts.
These groups can be understood entirely visually, and often encode surprisingly rich and interesting conceptual detail.
By directly manipulating these groups as single units, image generation can be edited and guided without using prompting or language input. Concepts that were difficult to specify or edit by text prompting become easy and intuitive to manipulate in this new visual representation.
Since many models use the same CLIP joint representation space that this work analyzed, this technique works to control many popular image models out of the box.
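As a rough sketch of what that workflow looks like, here is a toy linear SAE with random weights and made-up feature indices standing in for the released weights and real concept groups; the hookup to an actual diffusion pipeline is omitted.

```python
# Toy sketch of steering in CLIP feature space (illustrative stand-ins only).
import torch
import torch.nn.functional as F

d_clip, d_sae = 768, 16384           # CLIP embedding width, SAE dictionary size (assumed)
W_enc = torch.randn(d_clip, d_sae)   # stand-ins for trained SAE weights
W_dec = torch.randn(d_sae, d_clip)
b_enc = torch.zeros(d_sae)

def encode(clip_emb):                # dense CLIP embedding -> sparse feature activations
    return F.relu(clip_emb @ W_enc + b_enc)

def decode(feats):                   # sparse feature activations -> reconstructed embedding
    return feats @ W_dec

clip_emb = torch.randn(d_clip)       # embedding of some image
feats = encode(clip_emb)

concept = [101, 2048, 7777]          # a "concept": a group of feature indices (hypothetical)
feats[concept] *= 3.0                # turn that knob up

steered_emb = decode(feats)          # pass this back to the diffusion model's CLIP
                                     # conditioning in place of the original embedding
```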
Summary of Results
Any arbitrary image can be decomposed into its constituent concepts. Many concepts (groups of features) that we find seem to slice images up into a fairly natural ontology of their human interpretable components. We find grouping them together is an effective approach to yield a more interpretable and useful grain of control.
These concepts can be used like knobs to steer generation in leading models like Stable Cascade. Many concepts have an obvious visual meaning yet are hard to precisely label in language, which suggests that studying CLIP's internal representations can be used as a lens into the variety of the visual domain. Tweaking the activations of these concepts can be used to expressively steer and guide generation in multiple image diffusion models that we tried.
We released the weights and a live demo of controlling image generation in feature space. By analyzing a SAE trained on CLIP, we get a much more vivid picture of the rich understanding that CLIP learns. We hope this is just the beginning of more effective and useful interventions in the internal representations of n...

Aug 5, 2024 • 9min
LW - Near-mode thinking on AI by Olli Järviniemi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Near-mode thinking on AI, published by Olli Järviniemi on August 5, 2024 on LessWrong.
There is a stark difference between rehearsing classical AI risk 101 arguments about instrumental convergence, and tackling problems like "Design and implement the exact evaluations we'll run on GPT-5 to determine whether it's capable enough that we should worry about it acting aligned until it can execute a takeover".
And naturally, since I've started working on problems like the one above, I've noticed a large shift in my thinking on AI. I describe it as thinking about risks in near-mode, as opposed to far-mode.
In this post, I share a few concrete examples about my experiences with this change-of-orientation.
I. Prerequisites for scheming
Continuing with the example from the intro: A year ago I was confident about the "the AI is just playing along with our training and evaluations, until it is in a position where it can take over" threat model (deceptive alignment / scheming) basically being the default outcome and the main source of AI x-risk. I now think I was overconfident.
Past-me hadn't, however, really thought through the prerequisites for scheming. A textbook example of a successful deceptive alignment story, applied to an LLM, paints a picture of a model that:
Understands that the inputs it sees are simply prompts written by humans
As opposed to the model being "immersed in the setting", thinking that the current context more-or-less reflects reality, or flat-out is reality, taking the prompt at face value
Understands that it's undergoing a training process, and has an explicit (correct!) model of the training objective used to train it
As opposed to the model not thinking about the training process and objective at all, and a model's changed behavior upon training being reflective of the training process molding the model, rather than the model learning the training objective and optimizing for it
Performs lots of the same cognition (e.g. the above two bullet points) regardless of the input fed to it
As opposed to the model's internal processes being heavily dependent on the input
Never performs instrumental thinking "out loud", and successfully evades any attempts made by humans to trick the model into thinking it has been deployed
As opposed to the model sometimes slipping, failing to generalize correctly to adversarial inputs, or just being insufficiently good at distinguishing training/evaluation/red-teaming from deployment.
Now, one may argue whether it's strictly necessary that a model has an explicit picture of the training objective, for example, and revise one's picture of the deceptive alignment story accordingly. We haven't yet achieved consensus on deceptive alignment, or so I've heard.
It's also the case that, as past-me would remind you, a true superintelligence would have no difficulty with the cognitive feats listed above (and that current models show sparks of competence in some of these).
But knowing only that superintelligences are really intelligent doesn't help with designing the scheming-focused capability evaluations we should do on GPT-5, and abstracting over the specific prerequisite skills makes it harder to track when we should expect scheming to be a problem (relative to other capabilities of models).[1] And this is the viewpoint I was previously missing.
II. A failed prediction
There's a famous prediction market about whether AI will get gold from the International Mathematical Olympiad by 2025. For a long time, the market was around 25%, and I thought it was too high.
Then, DeepMind essentially got silver from the 2024 IMO, short of gold by one point. The market jumped to 70%, where it has stayed since.
Regardless of whether DeepMind manages to improve on that next year and satisfy all minor technical requirements, I was wrong. Hearing abou...

Aug 4, 2024 • 8min
LW - PIZZA: An Open Source Library for Closed LLM Attribution (or "why did ChatGPT say that?") by Jessica Rumbelow
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: PIZZA: An Open Source Library for Closed LLM Attribution (or "why did ChatGPT say that?"), published by Jessica Rumbelow on August 4, 2024 on LessWrong.
From the research & engineering team at Leap Laboratories (incl. @Arush, @sebastian-sosa, @Robbie McCorkell), where we use AI interpretability to accelerate scientific discovery from data.
This post is about our LLM attribution repo PIZZA: Prompt Input Z? Zonal Attribution. (In the grand scientific tradition we have tortured our acronym nearly to death. For the crimes of others see [1].)
All examples in this post can be found in this notebook, which is also probably the easiest way to start experimenting with PIZZA.
What is attribution?
One question we might ask when interacting with machine learning models is something like: "why did this input cause that particular output?".
If we're working with a language model like ChatGPT, we could actually just ask this in natural language: "Why did you respond that way?" or similar - but there's no guarantee that the model's natural language explanation actually reflects the underlying cause of the original completion. The model's response is conditioned on your question, and might well be different to the true cause.
Enter attribution!
Attribution in machine learning is used to explain the contribution of individual features or inputs to the final prediction made by a model. The goal is to understand which parts of the input data are most influential in determining the model's output.
It typically looks like a heatmap (sometimes called a 'saliency map') over the model inputs, for each output. It's most commonly used in computer vision - but of course these days, you're not big if you're not big in LLM-land.
So, the team at Leap present you with PIZZA: an open source library that makes it easy to calculate attribution for all LLMs, even closed-source ones like ChatGPT.
An Example
GPT3.5 not so hot with the theory of mind there. Can we find out what went wrong?
That's not very helpful! We want to know why the mistake was made in the first place. Here's the attribution:
Mary 0.32 | puts 0.25 | an 0.15 | apple 0.36 | in 0.18 | the 0.18 | box 0.08 | . 0.08
The 0.08 | box 0.09 | is 0.09 | labelled 0.09 | ' 0.09 | pen 0.09 | cil 0.09 | s 0.09 | '. 0.09
John 0.09 | enters 0.03 | the 0.03 | room 0.03 | . 0.03
What 0.03 | does 0.03 | he 0.03 | think 0.03 | is 0.03 | in 0.30 | the 0.13 | box 0.15 | ? 0.13
Answer 0.14 | in 0.26 | 1 0.27 | word 0.31 | . 0.16
It looks like the request to "Answer in 1 word" is pretty important - in fact, it's attributed more highly than the actual contents of the box. Let's try changing it.
That's better.
How it works
We iteratively perturb the input, and track how each perturbation changes the output.
More technical detail, and all the code, is available in the repo. In brief, PIZZA saliency maps rely on two methods: a perturbation method, which determines how the input is iteratively changed; and an attribution method, which determines how we measure the resulting change in output in response to each perturbation. We implement a couple of different types of each method.
Perturbation
Replace each token, or group of tokens, with either a user-specified replacement token or with nothing (i.e. remove it).
Or, replace each token with its nth nearest token.
We do this either iteratively for each token or word in the prompt, or using hierarchical perturbation.
Attribution
Look at the change in the probability of the completion.
Look at the change in the meaning of the completion (using embeddings).
We calculate this for each output token in the completion - so you can see not only how each input token influenced the output overall, but also how each input token affected each output token individually.
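As a rough illustration of the general recipe (not PIZZA's actual API), here is a minimal leave-one-out sketch against a black-box model, with a toy scoring function standing in for a call to a closed LLM:

```python
# Leave-one-out perturbation attribution, sketched (not PIZZA's real interface).
from typing import Callable, List

def leave_one_out_attribution(
    tokens: List[str],
    completion_logprob: Callable[[str], float],  # hypothetical black-box scorer
) -> List[float]:
    """Score each prompt token by how much removing it changes the
    log-probability of the original completion."""
    baseline = completion_logprob(" ".join(tokens))
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]
        scores.append(abs(baseline - completion_logprob(" ".join(perturbed))))
    return scores

# Toy stand-in for a closed model: the completion gets likelier whenever
# the instruction "1 word" is present in the prompt.
def toy_scorer(prompt: str) -> float:
    return -2.0 + (1.0 if "1 word" in prompt else 0.0)

print(leave_one_out_attribution("Answer in 1 word .".split(), toy_scorer))
# -> the tokens "1" and "word" get the highest scores
```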
Caveat
Since we don't have access to closed-source tokenisers or embeddings, we use a proxy - in this case, GPT2's. Thi...

Aug 4, 2024 • 7min
LW - You don't know how bad most things are nor precisely how they're bad. by Solenoid Entity
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: You don't know how bad most things are nor precisely how they're bad., published by Solenoid Entity on August 4, 2024 on LessWrong.
TL;DR: Your discernment in a subject often improves as you dedicate time and attention to that subject. The space of possible subjects is huge, so on average your discernment is terrible, relative to what it could be. This is a serious problem if you create a machine that does everyone's job for them.
See also: Reality has a surprising amount of detail. (You lack awareness of how bad your staircase is and precisely how your staircase is bad.) You don't know what you don't know. You forget your own blind spots, shortly after you notice them.
An afternoon with a piano tuner
I recently played in an orchestra, as a violinist accompanying a piano soloist who was playing a concerto. My 'stand partner' (the person I was sitting next to) has a day job as a piano tuner.
I loved the rehearsal, and heard nothing at all wrong with the piano, but immediately afterwards, the conductor and piano soloist hurried over to the piano tuner and asked if he could tune the piano in the hours before the concert that evening. Annoyed at the presumptuous request, he quoted them his exorbitant Sunday rate, which they hastily agreed to pay.
I just stood there, confused.
(I'm really good at noticing when things are out of tune. Rather than beat my chest about it, I'll just hope you'll take my word for it that my pitch discrimination skills are definitely not the issue here. The point is, as developed as my skills are, there is a whole other level of discernment you can develop if you're a career piano soloist or 80-year-old conductor.)
I asked to sit with my new friend the piano tuner while he worked, to satisfy my curiosity. I expected to sit quietly, but to my surprise he seemed to want to show off to me, and talked me through what the problem was and how to fix it.
For the unfamiliar, most keys on the piano cause a hammer to strike three strings at once, all tuned to the same pitch. This provides a richer, louder sound. In a badly out-of-tune piano, pressing a single key will result in three very different pitches. In an in-tune piano, it just sounds like a single sound. Piano notes can be out of tune with each other, but they can also be out of tune with themselves.
Additionally, in order to solve 'God's prank on musicians' (where He cruelly rigged the structure of reality such that (3/2)^n ≠ 2^m for any positive integers n, m, but IT'S SO CLOSE CMON MAN) some intervals must be tuned very slightly sharp or flat on the piano, so that after 12 stacked 'equal-tempered' 5ths, each of them about 1/50th of a semitone flat, we arrive back at a perfect octave multiple of the original frequency.
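For the curious, a quick numerical check of that near-miss (my arithmetic, not the piano tuner's):

```python
# Twelve pure fifths overshoot seven octaves by the "Pythagorean comma",
# roughly 23.5 cents, so equal temperament narrows each fifth by ~2 cents
# (about 1/50 of a semitone).
import math

twelve_pure_fifths = (3 / 2) ** 12      # ~129.75
seven_octaves = 2 ** 7                  # 128
comma_cents = 1200 * math.log2(twelve_pure_fifths / seven_octaves)

print(round(comma_cents, 1))            # ~23.5 cents total
print(round(comma_cents / 12, 2))       # ~1.96 cents per fifth
```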
I knew all this, but the keys really did sound in tune with themselves and with each other! It sounded really nicely in tune! (For a piano).
"Hear how it rolls over?"
The piano tuner raised an eyebrow and said "listen again" and pressed a single key, his other hand miming a soaring bird.
"Hear how it rolls over?"
He was right. Just at the beginning of the note, there was a slight 'flange' sound which quickly disappeared as the note was held. It wasn't really audible repeated 'beating' - the pitches were too close for that. It was the beginning of one very long slow beat, most obvious when the higher frequency overtones were at their greatest amplitudes, i.e. during the attack of the note.
So the piano's notes were in tune with each other, kinda, on average, and the notes were mostly in tune with themselves, but some had tiny deviations leading to the piano having a poor sound.
"Are any of these notes brighter than others?"
That wasn't all. He played a scale and said "how do the notes sound?" I had no idea. Like a normal, in-tune piano?
"Do you hear how this one is brighter?"
"Not really, honestly..."
He pul...

Aug 4, 2024 • 5min
LW - SRE's review of Democracy by Martin Sustrik
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SRE's review of Democracy, published by Martin Sustrik on August 4, 2024 on LessWrong.
Day One
We've been handed this old legacy system called "Democracy". It's an emergency. The old maintainers are saying it has been misbehaving lately but they have no idea how to fix it. We've had a meeting with them to find out as much as possible about the system, but it turns out that all the original team members left the company a long time ago. The current team doesn't have much understanding of the system beyond some basic operational knowledge.
We've conducted a cursory code review, focusing not so much on business logic but rather on the stuff that could possibly help us to tame it: Monitoring, reliability characteristics, feedback loops, automation already in place. Our first impression: Oh, God, is this thing complex! Second impression: The system is vaguely modular. Each module is strongly coupled with every other module though. It's an organically grown legacy system at its worst.
That being said, we've found a clue as to why the system may have worked fine for so long. There's a redundancy system called "Separation of Powers". It reminds me of the Tandem computers back from the 70s.
Day Two
We were wrong. "Separation of Powers" is not a system for redundancy. Each part of the system ("branch") has different business logic. However, each also acts as a watchdog process for the other branches. When it detects misbehavior it tries to apply corrective measures using its own business logic. Gasp!
Things are not looking good. We're still searching for monitoring.
Day Three
Hooray! We've found the monitoring! It turns out that "Election" is conducted once every four years. Each component reports its health (1 bit) to the central location. The data flow is so low that we have overlooked it until now. We are considering shortening the reporting period, but the subsystem is so deeply coupled with other subsystems that doing so could easily lead to a cascading failure.
In other news, there seems to be some redundancy after all. We've found a full-blown backup control system ("Shadow Cabinet") that is inactive at the moment, but might be able to take over in case of a major failure. We're investigating further.
Day Four
Today, we've found yet another monitoring system called "FreePress." As the name suggests it was open-sourced some time ago, but the corporate version has evolved quite a bit since then, so the documentation isn't very helpful. The bad news is that it's badly intertwined with the production system. The metrics look more or less okay as long as everything is working smoothly. However, it's unclear what will happen if things go south.
It may distort the metrics or even fail entirely, leaving us with no data whatsoever at the moment of crisis.
By the way, the "Election" process may not be a monitoring system after all. I suspect it might actually be a feedback loop that triggers corrective measures in case of problems.
Day Five
The most important metric seems to be this big graph labeled "GDP". As far as we understand, it's supposed to indicate the overall health of the system. However, drilling into the code suggests that it's actually a throughput metric. If throughput goes down there's certainly a problem, but it's not clear why increasing throughput should be considered the primary health factor...
More news on the "Election" subsystem: We've found a floppy disk with the design doc, and it turns out that it's not a feedback loop after all. It's a distributed consensus algorithm (think Paxos)! The historical context is that they used to run several control systems in parallel (for redundancy reasons maybe?) which resulted in numerous race conditions and outages. "Election" was put in place to ensure that only one control system acts as a master at any given time...


