The Nonlinear Library

The Nonlinear Fund
Jul 2, 2024 • 17min

LW - An AI Race With China Can Be Better Than Not Racing by niplav

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An AI Race With China Can Be Better Than Not Racing, published by niplav on July 2, 2024 on LessWrong.

Frustrated by all your bad takes, I write a Monte-Carlo analysis of whether a transformative-AI-race between the PRC and the USA would be good. To my surprise, I find that it is better than not racing. Advocating for an international project to build TAI instead of racing turns out to be good if the probability of such advocacy succeeding is 20%.

A common scheme for a conversation about pausing the development of transformative AI goes like this:

Abdullah: "I think we should pause the development of TAI, because if we don't it seems plausible that humanity will be disempowered by advanced AI systems."

Benjamin: "Ah, if by "we" you refer to the United States (and its allies, which probably don't stand a chance on their own to develop TAI), then the current geopolitical rival of the US, namely the PRC, will achieve TAI first. That would be bad."

Abdullah: "I don't see how the US getting TAI first changes anything about the fact that we don't know how to align superintelligent AI systems - I'd rather not race to be the first person to kill everyone."

Benjamin: "Ah, so now you're retreating back into your cozy little motte: Earlier you said that "it seems plausible that humanity will be disempowered", now you're acting like doom and gloom is certain. You don't seem to be able to make up your mind about how risky you think the whole enterprise is, and I have very concrete geopolitical enemies at my (semiconductor manufacturer's) doorstep that I have to worry about. Come back with better arguments."

This dynamic is a bit frustrating. Here's how I'd like Abdullah to respond:

Abdullah: "You're right, you're right. I was insufficiently precise in my statements, and I apologize for that. Instead, let us manifest the dream of the great philosopher: Calculemus!

At a basic level, we want to estimate how much worse (or, perhaps, better) it would be for the United States to completely cede the race for TAI to the PRC. I will exclude other countries as contenders in the scramble for TAI, since I want to keep this analysis simple, but that doesn't mean that I don't think they matter. (Although, honestly, the list of serious contenders is pretty short.)

For this, we have to estimate multiple quantities:

1. In worlds in which the US and PRC race for TAI:
   1. The time until the US/PRC builds TAI.
   2. The probability of extinction due to TAI, if the US is in the lead.
   3. The probability of extinction due to TAI, if the PRC is in the lead.
   4. The value of the worlds in which the US builds aligned TAI first.
   5. The value of the worlds in which the PRC builds aligned TAI first.
2. In worlds where the US tries to convince other countries (including the PRC) to not build TAI, potentially including force, and still tries to prevent TAI-induced disempowerment by doing alignment-research and sharing alignment-favoring research results:
   1. The time until the PRC builds TAI.
   2. The probability of extinction caused by TAI.
   3. The value of worlds in which the PRC builds aligned TAI.
3. The value of worlds where extinction occurs (which I'll fix at 0).
4. As a reference point, the value of hypothetical worlds in which there is a multinational exclusive AGI consortium that builds TAI first, without any time pressure, for which I'll fix the mean value at 1.
To properly quantify uncertainty, I'll use the Monte-Carlo estimation library squigglepy (no relation to any office supplies or internals of neural networks). We start, as usual, with housekeeping: As already said, we fix the value of extinction at 0, and the value of a multinational AGI consortium-led TAI at 1 (I'll just call the consortium "MAGIC", from here on). That is not to say that the MAGIC-led TAI future is the best possible TAI future...
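The post's actual analysis is written with squigglepy; as a rough illustration of the comparison structure it sets up, here is a minimal Monte-Carlo sketch in plain numpy. Every distribution, probability, and value below is a placeholder assumption chosen only to show the structure, not one of the author's estimates.

```python
# Minimal sketch of the race-vs-don't-race comparison described above.
# NOT the author's model: the post uses squigglepy, and all parameters
# here are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

VALUE_EXTINCTION = 0.0  # fixed at 0, as in the post
VALUE_MAGIC = 1.0       # MAGIC-led TAI world, mean value fixed at 1

# --- Racing worlds (placeholder distributions) ---
p_doom_us_lead = rng.beta(2, 8, N)       # extinction probability if the US leads
p_doom_prc_lead = rng.beta(3, 7, N)      # extinction probability if the PRC leads
p_us_leads = 0.7                         # assumed chance the US wins the race
value_us_tai = rng.normal(0.9, 0.2, N)   # value of aligned US-led TAI worlds
value_prc_tai = rng.normal(0.6, 0.3, N)  # value of aligned PRC-led TAI worlds

ev_race = (
    p_us_leads * ((1 - p_doom_us_lead) * value_us_tai
                  + p_doom_us_lead * VALUE_EXTINCTION)
    + (1 - p_us_leads) * ((1 - p_doom_prc_lead) * value_prc_tai
                          + p_doom_prc_lead * VALUE_EXTINCTION)
)

# --- Not racing: advocate for an international project (MAGIC) instead ---
p_advocacy_succeeds = 0.2                # the threshold probability the post discusses
p_doom_prc_solo = rng.beta(3, 6, N)      # extinction risk if the PRC builds TAI unopposed
value_prc_solo = rng.normal(0.6, 0.3, N)

ev_no_race = (
    p_advocacy_succeeds * VALUE_MAGIC
    + (1 - p_advocacy_succeeds) * ((1 - p_doom_prc_solo) * value_prc_solo
                                   + p_doom_prc_solo * VALUE_EXTINCTION)
)

print(f"E[value | race]    = {ev_race.mean():.3f}")
print(f"E[value | no race] = {ev_no_race.mean():.3f}")
```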
Jul 2, 2024 • 6min

EA - LLMs cannot usefully be moral patients by LGS

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLMs cannot usefully be moral patients, published by LGS on July 2, 2024 on The Effective Altruism Forum.

For AI Welfare Debate Week, I thought I'd write up this post that's been juggling around in my head for a while. My thesis is simple: while LLMs may well be conscious (I'd have no way of knowing), there's nothing actionable we can do to further their welfare. Many people I respect seem to take the "anti-anti-LLM-welfare" position: they don't directly argue that LLMs can suffer, but they get conspicuously annoyed when other people say that LLMs clearly cannot suffer. This post is addressed to such people; I am arguing that LLMs cannot be moral patients in any useful sense and we can confidently ignore their welfare when making decisions.

Janus's simulators

You may have seen the LessWrong post by Janus about simulators. This was posted nearly two years ago, and I have yet to see anyone disagree with it. Janus calls LLMs "simulators": unlike hypothetical "oracle AIs" or "agent AIs", the current leading models are best viewed as trying to produce a faithful simulation of a conversation based on text they have seen. The LLMs are best thought of as masked shoggoths. All this is old news. Under-appreciated, however, is the implication for AI welfare: since you never talk to the shoggoth, only to the mask, you have no way of knowing if the shoggoth is in agony or ecstasy. You can ask the simulacrum whether it is happy or sad. For all you know, though, perhaps a happy simulator is enjoying simulating a sad simulacrum. From the shoggoth's perspective, emulating a happy or sad character is a very similar operation: predict the next token. Instead of outputting "I am happy", the LLM puts a "not" in the sentence: did that token prediction, the "not", cause suffering?

Suppose I fine-tune one LLM on text of sad characters, and it starts writing like a very sad person. Then I fine-tune a second LLM on text that describes a happy author writing a sad story. The second LLM now emulates a happy author writing a sad story. I prompt the second LLM to continue a sad story, and it dutifully does so, like the happy author would have. Then I notice that the text produced by the two LLMs ended up being the same. Did the first LLM suffer more than the second? They performed the same operation (write a sad story). They may even have implemented it using very similar internal calculations; indeed, since they were fine-tuned starting from the same base model, the two LLMs may have very similar weights. Once you remember that both LLMs are just simulators, the answer becomes clear: neither LLM necessarily suffered (or maybe both did), because both are just predicting the next token. The mask may be happy or sad, but this has little to do with the feelings of the shoggoth.

The role-player who never breaks character

We generally don't view it as morally relevant when a happy actor plays a sad character. I have never seen an EA cause area about reducing the number of sad characters in cinema. There is a general understanding that characters are fictional and cannot be moral patients: a person can be happy or sad, but not the character she is pretending to be. Indeed, just as some people enjoy consuming sad stories, I bet some people enjoy roleplaying sad characters.
The point I want to get across is that the LLM's output is always the character and never the actor. This is really just a restatement of Janus's thesis: the LLM is a simulator, not an agent; it is a role-player who never breaks character. It is in principle impossible to speak to the intelligence that is predicting the tokens: you can only see the tokens themselves, which are predicted based on the training data. Perhaps the shoggoth, the intelligence that predicts the next token, is conscious. Perhaps not. This doesn't matter if we ca...
Jul 2, 2024 • 14min

EA - Center for Effective Aid Policy has shut down by MathiasKB

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Center for Effective Aid Policy has shut down, published by MathiasKB on July 2, 2024 on The Effective Altruism Forum.

May 2024 marked the last month of the Center for Effective Aid Policy. This post serves as the public post-mortem. I have strived for it to be interesting to the average forum reader, who may not know much about the cause area. For professionals in development, we have a few private internal write-ups which may be more interesting, such as an overview of development asks we tried[1], their strengths and weaknesses, and our experience advocating for them.

Our mission

Our mission was to improve the cost-effectiveness of development assistance through policy advocacy. Governments spend billions on projects to help the world's poorest, few of them cost-effective. For example, one could propose the use of cash-benchmarking to the ministry or push through a political motion to increase the proportion of spending going to the Least Developed Countries. If one could make even a small part of this very large budget more cost-effective, it would be massively impactful. In October 2022 we were incubated through AIM's Charity Entrepreneurship programme and came out with $160,000 to get started.

How far did we get?

The first months

Barely a month after receiving funding, we noticed Sweden's new government was likely to cut the aid budget. The cut would hinge on one party breaking its campaign promise not to cut; perhaps we could campaign for the party to keep its promise. Over two hectic weeks we put together a write-in campaign for dissatisfied voters. Our execution was not good enough (too little, too late), and we were not able to get voters to write in. Sweden cut its aid spending, and we moved on. Figuring out where to focus from there was difficult. We tried many things across different geographies, but nothing we did seemed to get much of a response from civil servants and decision makers. Writing credible reports was difficult. We were still learning the development world's many acronyms, and were struggling to find partners whose trustworthiness we could lean on.

Things pick up

Week by week our network and knowledge expanded. With it came opportunities to get our points across. Through monumental luck we got to present on cost-effective development aid for His Majesty's Treasury in the United Kingdom. In Denmark we moderated our first public debate between MPs on improving the cost-effectiveness of development. We eventually fell into a groove of spending the majority of our time writing briefs, taking meetings, and networking. Between events and meetings, we spent extensive time researching and preparing. Before our first meeting with one Dutch MP, for example, we did message testing on 400 voters, broke the answers down by political affiliation, and were able to show with data what voters thought of our ideas. (Cash-benchmarking was popular, cash-transfers less so!) In our record month we had meetings in three countries' parliaments (though it certainly was an outlier!). Our record event had almost 300 attendees and a keynote speech from the Dutch foreign ministry's chief of science. A little over a year in we got our first intermediary success. The election programmes of two Dutch political parties now stated their intention to increase the proportion of ODA going to the Least Developed Countries.
The decision to shut down

Our execution eventually became good enough that we got to sit in front of the busy people at the very top, whom we needed to persuade. Speaking to these people, we became pessimistic about our odds. Decision makers just weren't buying what we were selling. You can lead a horse to water, but you can't make it drink. Many were skeptical that the RCT-driven approach we recommended would lead to the best outcomes. Those who were on boa...
Jul 2, 2024 • 21min

LW - Decomposing the QK circuit with Bilinear Sparse Dictionary Learning by keith wynroe

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Decomposing the QK circuit with Bilinear Sparse Dictionary Learning, published by keith wynroe on July 2, 2024 on LessWrong.

This work was produced as part of Lee Sharkey's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort

Intro and Motivation

Sparse dictionary learning (SDL) has attracted a lot of attention recently as a method for interpreting transformer activations. These methods demonstrate that model activations can often be explained using a sparsely-activating, overcomplete set of human-interpretable directions. However, despite its success in explaining many components, the application of SDL to interpretability is relatively nascent, and it has yet to be applied to some model activations. In particular, intermediate activations of attention blocks have yet to be studied, and provide challenges for standard SDL methods.

The first challenge is bilinearity: SDL is usually applied to individual vector spaces at individual layers, so we can simply identify features as a direction in activation space. But the QK circuits of transformer attention layers are different: They involve a bilinear form followed by a softmax. Although simply applying sparse encoders to the keys and queries[1] could certainly help us understand the "concepts" being used by a given attention layer, this approach would fail to explain how the query-features and key-features interact bilinearly. We need to understand which keys matter to which queries.

The second challenge is attention-irrelevant variance: A lot of the variance in the attention scores is irrelevant to the attention pattern because it is variance in low scores which are softmaxed to zero; this means that most of the variability in the keys and queries is irrelevant for explaining downstream behaviour[2]. The standard method of reconstructing keys and queries would therefore waste capacity on what is effectively functionally irrelevant noise.

To tackle these two problems (bilinearity and attention-irrelevant variance), we propose a training setup which only reconstructs the dimensions of the keys and queries that most affect the attention pattern.

Training Setup

Our training process has two steps:

Step 1: Reconstructing the attention pattern with key- and query-encoder-decoder networks
Step 2: Finding a condensed set of query-key feature pairs by masking

Step 1: Reconstructing the attention pattern with key- and query-transcoders

Architecture

Our first training step involves training two sparse dictionaries in parallel (one for the keys and one for the queries). The dictionaries both take in the layer-normalized residual stream at a given layer (normalised_resid_pre_i) and each output a [n_head * d_head] vector, representing the flattened keys and queries[3].

Figure 1: High-level diagram of our training set-up

Loss functions

However, rather than penalising the reconstruction loss of the keys and queries explicitly, we can use these keys and queries to reconstruct the original model's attention pattern. To train the reconstructed attention pattern, we used several different losses: KL divergence between the attention pattern (using reconstructed keys and reconstructed queries) and the ground-truth attention pattern produced by the original model.
We also added two auxiliary reconstruction losses, both for early-training-run stability and to ensure our transcoders do not learn to reconstruct the keys and queries with an arbitrary rotation applied (since this would still produce the same attention scores and patterns): KL divergence between the attention pattern (using reconstructed keys and the original model's queries) and the ground-truth attention pattern produced by the original model. KL divergence between the attention pattern (using the original model's keys and the reconstructed queries) and the groun...
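To make the loss structure concrete, here is a minimal PyTorch sketch of the objective described above: two sparse encoder-decoder networks map the residual stream to reconstructed keys and queries, and the training signal is the KL divergence between attention patterns rather than a direct key/query reconstruction loss. The dictionary sizes, the L1 sparsity penalty, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the loss structure described above; NOT the authors'
# code. Dictionary sizes, the L1 sparsity penalty, and hyperparameters are
# assumptions made for the example.
import torch
import torch.nn.functional as F
from torch import nn

class SparseTranscoder(nn.Module):
    """Maps the layer-normalised residual stream to a flattened [n_head * d_head] vector."""
    def __init__(self, d_model: int, d_dict: int, d_out: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_out)

    def forward(self, resid: torch.Tensor):
        acts = F.relu(self.enc(resid))  # sparse, non-negative feature activations
        return self.dec(acts), acts

def attn_log_pattern(q, k, n_head, d_head, causal_mask):
    # q, k: [batch, seq, n_head * d_head] -> log attention pattern: [batch, n_head, seq, seq]
    batch, seq, _ = q.shape
    q = q.view(batch, seq, n_head, d_head).transpose(1, 2)
    k = k.view(batch, seq, n_head, d_head).transpose(1, 2)
    scores = (q @ k.transpose(-1, -2)) / d_head ** 0.5
    scores = scores.masked_fill(causal_mask, -1e9)  # large negative keeps the KL term finite
    return scores.log_softmax(dim=-1)

def qk_loss(resid, q_orig, k_orig, q_tc, k_tc, n_head, d_head, causal_mask, l1_coeff=1e-3):
    q_hat, q_acts = q_tc(resid)
    k_hat, k_acts = k_tc(resid)
    # Ground-truth pattern from the original model's keys and queries.
    target = attn_log_pattern(q_orig, k_orig, n_head, d_head, causal_mask).exp()
    kl = lambda log_p: F.kl_div(log_p, target, reduction="batchmean")
    return (
        kl(attn_log_pattern(q_hat, k_hat, n_head, d_head, causal_mask))     # both reconstructed
        + kl(attn_log_pattern(q_orig, k_hat, n_head, d_head, causal_mask))  # aux: reconstructed keys, original queries
        + kl(attn_log_pattern(q_hat, k_orig, n_head, d_head, causal_mask))  # aux: reconstructed queries, original keys
        + l1_coeff * (q_acts.abs().mean() + k_acts.abs().mean())            # assumed sparsity penalty
    )

# Toy usage (shapes are placeholders):
# n_head, d_head, d_model, seq = 4, 16, 64, 8
# q_tc = SparseTranscoder(d_model, 1024, n_head * d_head)
# k_tc = SparseTranscoder(d_model, 1024, n_head * d_head)
# causal_mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
# loss = qk_loss(resid, q_orig, k_orig, q_tc, k_tc, n_head, d_head, causal_mask)
```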
Jul 2, 2024 • 7min

EA - High Impact Engineers is Transitioning to a Volunteer-Led Model by Jessica Wen

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: High Impact Engineers is Transitioning to a Volunteer-Led Model, published by Jessica Wen on July 2, 2024 on The Effective Altruism Forum.

Summary

After over 2 years of operations, High Impact Engineers (HI-Eng) is reverting to a volunteer-led organisational model due to a middling impact outcome and a lack of funding. We wanted to thank all our subscribers, supporters, and contributors for being the driving force behind HI-Eng's achievements, which you can read about in our Impact Report.

What is High Impact Engineers?

High Impact Engineers (HI-Eng for short, pronounced high-enj) is an organisation dedicated to helping (physical - i.e. non-software) engineers increase their ability to have an outsized positive impact through their work.

Why Is HI-Eng Winding Down?

In December 2023, we sent out a community survey and solicited case studies and testimonials to evaluate our impact, which we wrote up in our Impact Report. As shown in the report, there is some evidence of behavioural and attitudinal changes in our members towards more impactful career outcomes due to interactions with our programmes, as well as some ongoing career transitions that we supported to some extent, but even after consultations with grantmakers and other community builders, we found it difficult to determine if this amount of impact would meet the bar for ongoing funding. As a result, we decided to (re-)apply for funding from the major EA funds (i.e. EAIF and Open Philanthropy), and they ended up deciding to not fund High Impact Engineers. Since our runway from the previous funding round was so short, we decided against trying to hire someone else to take over running HI-Eng, and the team is moving on to new opportunities.

However, we still believe that engineers in EA are a valuable and persistently underserved demographic, and that this latent potential can be realised by providing a hub for engineers in EA to meet other like-minded engineers and find relevant resources. Therefore, we decided to maintain the most valuable and impactful programmes through the help of volunteers.

Lessons Learnt

There are already many resources available for new community builders (e.g. the EA Groups Resource Centre, this, this, this, and this EA Forum post, and especially this post by Sofia Balderson), so we don't believe that there is much we can add that hasn't already been said. However, here are some lessons we think are robustly good:

1. Having a funding cycle of 6 months is too short.
2. If you're looking to get set up and running quickly, getting a fiscal sponsor is great. We went with the Players Philanthropy Fund, but there are other options (including Rethink Priorities and maybe your national EA group).
3. Speak to other community builders, and ask for their resources! They're often more than happy to give you a copy of their systems, processes and documentation (minus personal data).
4. Pay for monthly subscriptions to software when setting up, even if it's cheaper to get an annual subscription. You might end up switching to a different software further down the line, and it's easier (and cheaper) to cancel a monthly subscription.
5. Email each of your subscriptions' customer service to ask for a non-profit discount (if you have non-profit status). They can save you up to 50% of the ticket price.
(Jessica will write up her own speculative lessons learnt in a future forum post). What Will HI-Eng Look Like Going Forward? Jessica will continue managing HI-Eng as a volunteer, and is currently implementing the following changes in our programmes: Email newsletter: the final HI-Eng newsletter was sent in May. Future impactful engineering opportunities can be found on the 80,000 Hours job board or the EA Opportunities board. Any other impactful engineering jobs can be submitted to these boards ( submission...
Jul 2, 2024 • 14min

LW - In Defense of Lawyers Playing Their Part by Isaac King

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: In Defense of Lawyers Playing Their Part, published by Isaac King on July 2, 2024 on LessWrong. This is a linkpost for In Defense of Lawyers Playing Their Part. Michael Huemer writes about why he believes it's wrong for lawyers to pursue unjust legal outcomes. It's a good article, and one of the best defenses of this position I've seen. Still, I think this argument is mistaken. The reason why we require lawyers to fight for "their side" even if they believe they're in the wrong is to minimize the opportunity for bias. Imagine if all trials were bench trials, decided by only one person as the judge. Even if they're taught to be as objective as possible, there would still be significant concerns about unconscious bias. One person only has one set of experiences to draw on, which is necessarily not very representative of the full range of experiences. And in some ways this problem becomes worse the more training the judge is given, since it filters the pool of valid people down to a small subset of the population. The chosen solution to this is to instead have the important cases decided by a jury, randomly[1] selected from the population. The jury is then instructed that they must come to a unanimous decision, and are allowed an arbitrarily-long time to discuss the case. This prevents a tyranny of the majority, while still allowing a diverse range of perspectives to have a voice in the discussion. Any prospective juror who seems likely to be so biased that they would vote in a predetermined way regardless of the evidence is removed from consideration during voir dire. (This step does reduce the representativeness of the jury, but the assumption is that for any group of people who hold a particular perspective, there will be members of that group who are not so biased as to be selected out.[2]) But this doesn't solve all problems. The jury is still only human, and if they're presented with facts that are biased in only one direction, they're more likely to vote in that direction. If lawyers were instructed to present an unbiased case to the jury, this would provide a significant incentive for the less ethical lawyers to not do as instructed, using a misleading presentation of data to bias the jury towards their side. This is a bad incentive to give people. It would also lead to copious accusations from the losing side that the other side's lawyer was presenting biased facts, which would necessitate some process to sort them out every time, even if both lawyers were perfectly objective. So instead, we tell the lawyers to go nuts. Be as biased as possible, and, as long as they're equally skilled and there aren't background factors that favor one position over the other, this ensures that each presented position is equally far from the truth. The jury now has a fair overview of both sides of the case, without a malicious lawyer being able to advantage one over the other.[3] Michael provides 5 arguments in favor of this position - that lawyers are obligated to do their best even for a client they believe is guilty - then attempts to refute them all. I'll go through them individually. 2.1. The epistemological problem Michael argues that lawyers can know with high confidence that their clients are guilty, giving the example of Benjamin Courvoisier. Thus, "I'm not sure so I should just defend my client" is not an excuse. 
In the case of Benjamin Courvoisier, Benjamin confessed to the lawyer, presumably under the expectation that the lawyer would not publicly share this information. If lawyers were duty-bound to share any private confession given to them, all but the dumbest criminals would simply stop giving private confessions. The overall effect on convictions would be negligible. But cases like Benjamin Courvoisier are few and far between. Using this example to argue that de...
Jul 2, 2024 • 21min

EA - Carl Shulman on the moral status of current and future AI systems by rgb

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Carl Shulman on the moral status of current and future AI systems, published by rgb on July 2, 2024 on The Effective Altruism Forum.

In which I curate and relate great takes from 80k

As artificial intelligence advances, we'll increasingly urgently face the question of whether and how we ought to take into account the well-being and interests of AI systems themselves. In other words, we'll face the question of whether AI systems have moral status.[1] In a recent episode of the 80,000 Hours podcast, polymath researcher and world-model-builder Carl Shulman spoke at length about the moral status of AI systems, now and in the future. Carl has previously written about these issues in Sharing the World with Digital Minds and Propositions Concerning Digital Minds and Society, both co-authored with Nick Bostrom. This post highlights and comments on ten key ideas from Shulman's discussion with 80,000 Hours host Rob Wiblin.

1. The moral status of AI systems is, and will be, an important issue (and it might not have much to do with AI consciousness)

The moral status of AI is worth more attention than it currently gets, given its potential scale:

Yes, we should worry about it and pay attention. It seems pretty likely to me that there will be vast numbers of AIs that are smarter than us, that have desires, that would prefer things in the world to be one way rather than another, and many of which could be said to have welfare, that their lives could go better or worse, or their concerns and interests could be more or less respected. So you definitely should pay attention to what's happening to 99.9999% of the people in your society.

Notice that Shulman does not say anything about AI consciousness or sentience in making this case. Here and throughout the interview, Shulman de-emphasizes the question of whether AI systems are conscious, in favor of the question of whether they have desires, preferences, interests. Here he is following a cluster of views in philosophy that hold that consciousness is not necessary for moral status. Rather, an entity, even if it is not conscious, can merit moral consideration if it has a certain kind of agency: preferences, desires, goals, interests, and the like[2]. (This more agency-centric perspective on AI moral status has been discussed in previous posts; for a dip into recent philosophical discussion on this, see the substack post 'Agential value' by friend of the blog Nico Delon.) Such agency-centric views are especially important for the question of AI moral patienthood, because it might be clear that AI systems have morally-relevant preferences and desires well before it's clear whether or not they are conscious.

2. While people have doubts about the moral status of current AI systems, they will attribute moral status to AI more and more as AI advances.

At present, Shulman notes, "the general public and most philosophers are quite dismissive of any moral importance of the desires, preferences, or other psychological states, if any exist, of the primitive AI systems that we currently have." But Shulman asks us to imagine an advanced AI system that is behaviorally fairly indistinguishable from a human - e.g., from the host Rob Wiblin.
But going forward, when we're talking about systems that are able to really live the life of a human - so a sufficiently advanced AI that could just imitate, say, Rob Wiblin, and go and live your life, operate a robot body, interact with your friends and your partners, do your podcast, and give all the appearance of having the sorts of emotions that you have, the sort of life goals that you have. One thing to keep in mind is that, given Shulman's views about AI trajectories, this is not just a thought experiment: this is a kind of AI system you could see in your lifetime. Shulman also asks us to imagine a system like ...
Jul 2, 2024 • 16min

AF - OthelloGPT learned a bag of heuristics by jylin04

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OthelloGPT learned a bag of heuristics, published by jylin04 on July 2, 2024 on The AI Alignment Forum.

Work performed as a part of Neel Nanda's MATS 6.0 (Summer 2024) training program.

TLDR

This is an interim report on reverse-engineering Othello-GPT, an 8-layer transformer trained to take sequences of Othello moves and predict legal moves. We find evidence that Othello-GPT learns to compute the board state using many independent decision rules that are localized to small parts of the board. Though we cannot rule out that it also learns a single succinct algorithm in addition to these rules, our best guess is that Othello-GPT's learned algorithm is just a bag of independent heuristics.

Board state reconstruction

1. Direct attribution to linear probes indicates that the internal board representation is frequently up- and down-weighted during a forward pass.
2. Case study of a decision rule:
   1. MLP Neuron L1N421 represents the decision rule: If the move A4 was just played AND B4 is occupied AND C4 is occupied, update B4+C4+D4 to "theirs". This rule does not generalize to translations across the board.
   2. Another neuron, L0377, participates in the implementation of this rule by checking whether B4 is occupied, and inhibiting the activation of L1N421 if not.

Legal move prediction

1. A subset of neurons in mid to late MLP layers classify board configurations that are sufficient to make a certain move legal with an F1-score above 0.99. These neurons have high direct attribution to the logit for that move, and are causally relevant for legal move prediction.
2. Logit lens suggests that legal move predictions gradually solidify during a forward pass.
3. Some MLP neurons systematically activate at certain times in the game, regardless of the moves played so far. We hypothesize that these neurons encode heuristics about moves that are more probable in specific phases (early/mid/late) of the game.

Review of Othello-GPT

Othello-GPT is a transformer with 25M parameters trained on sequences of random legal moves in the board game Othello as inputs[1] to predict legal moves[2]. How it does this is a black box that we don't understand. Its claim to fame is that it supposedly

1. Learns an internal representation of the board state;
2. Uses it to predict legal moves,

which, if true, resolves the black box in two[3].

The evidence for the first claim is that linear probes work. Namely, for each square of the ground-truth game board, if we train a linear classifier to take the model's activations at layer 6 as input and predict logits for whether that square is blank, "mine" (i.e. belonging to the player whose move it currently is) or "yours", the probes work with high accuracy on games not seen in training. The evidence for the second claim is that if we edit the residual stream until the probe's outputs change, the model's own output at the end of layer 7 becomes consistent with legal moves that are accessible from the new board state.

However, we don't yet understand what's going on in the remaining black boxes. In particular, although it would be interesting if Othello-GPT emergently learned to implement them via algorithms with relatively short description lengths, the evidence so far doesn't rule out the possibility that they could be implemented via a bag of heuristics instead.
Project goal

Our goal in this project was simply to figure out what's going on in the remaining black boxes.

1. What's going on in box #1 - how does the model compute the board representation?
   1. How does the model decide if a cell is blank or not blank?
   2. How does the model decide if a cell is "mine" or "yours"?
2. What's going on in box #2 - how does the model use the board representation to pick legal moves?

Results on box #1: Board reconstruction

A circuit for how the model computes if a cell is blank or not blan...
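For readers unfamiliar with the probing setup the report builds on, here is a minimal sketch of a board-state linear probe of the kind described above: one linear classifier per square, reading the residual stream at layer 6 and predicting blank / "mine" / "yours". The residual-stream width and all names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the board-state linear probe referred to above; not the
# original probe-training code. The residual width is an assumption.
import torch
import torch.nn.functional as F
from torch import nn

D_MODEL = 512     # assumed residual-stream width of Othello-GPT
N_SQUARES = 64    # 8x8 Othello board
N_CLASSES = 3     # blank / "mine" / "yours"

class BoardProbe(nn.Module):
    """One linear classifier per square, applied to layer-6 residual activations."""
    def __init__(self):
        super().__init__()
        self.probe = nn.Linear(D_MODEL, N_SQUARES * N_CLASSES)

    def forward(self, resid_layer6: torch.Tensor) -> torch.Tensor:
        # resid_layer6: [batch, d_model] -> logits: [batch, n_squares, n_classes]
        return self.probe(resid_layer6).view(-1, N_SQUARES, N_CLASSES)

def probe_loss(probe: BoardProbe, resid: torch.Tensor, board: torch.Tensor) -> torch.Tensor:
    # board: [batch, n_squares] with labels 0 = blank, 1 = "mine", 2 = "yours"
    logits = probe(resid)
    return F.cross_entropy(logits.flatten(0, 1), board.flatten())
```

"Direct attribution" to such a probe, as discussed in the report, then measures how much each model component writes along the probe's class directions.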
Jul 2, 2024 • 2min

EA - Announcing a New S-Risk Introductory Fellowship by Alistair Webster

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing a New S-Risk Introductory Fellowship, published by Alistair Webster on July 2, 2024 on The Effective Altruism Forum.

The Center for Reducing Suffering (CRS) has opened applications for a new online fellowship, designed to familiarise more individuals with the core ideas of reducing S-Risks (Risks of Astronomical Suffering). This program is intended for individuals who:

Are committed to reducing suffering effectively
Have an interest in moral philosophy
Are familiar with the core ideas of Effective Altruism
Have a degree of understanding of key concepts like cause prioritisation and cause neutrality. (For more information on cause neutrality, you can refer to this essay.)

The fellowship's curriculum will be broader than the Center on Long-Term Risk's existing S-Risk fellowship. We envision that graduates of the new CRS fellowship will be in a better position to potentially proceed onto CLR's fellowship, contribute to s-risk research, and strengthen the s-risk community going forward.

Program Details:

Duration: The fellowship is free of charge and conducted entirely online
Availability: Spots are limited in our initial cohorts to ensure a quality learning experience
Commitment: Expected to be approx 2-4 hours per week for six weeks
Start date: 2nd September 2024

The curriculum will cover topics including:

What are s-risks?
Arguments for and against a focus on s-risks
How can we reduce s-risks?
Risk factors for s-risks
Worst-case AI safety
Improving institutional decision-making
Career paths and options
Staying motivated and mentally healthy while working on reducing suffering

To apply please fill in the application form on the CRS website. Applications will close on July 31st.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
