The Nonlinear Library

The Nonlinear Fund
May 20, 2024 • 5min

AF - The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks by Marius Hobbhahn

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks, published by Marius Hobbhahn on May 20, 2024 on The AI Alignment Forum.

This is a linkpost for our two recent papers, produced at Apollo Research in collaboration with Kaarel Hanni (Cadenza Labs), Avery Griffin, Joern Stoehler, Magdalena Wache and Cindy Wu:
1. An exploration of using degeneracy in the loss landscape for interpretability: https://arxiv.org/abs/2405.10927
2. An empirical test of an interpretability technique based on the loss landscape: https://arxiv.org/abs/2405.10928
Not to be confused with Apollo's recent Sparse Dictionary Learning paper.

A key obstacle to mechanistic interpretability is finding the right representation of neural network internals. Ideally, we would derive our features from some high-level principle that holds across different architectures and use cases. At a minimum, we know two things:
1. We know that the training loss goes down during training. Thus, the features learned during training must be determined by the loss landscape, and we want to use the structure of the loss landscape to identify what the features are and how they are represented.
2. We know that models generalize, i.e. that they learn features from the training data that allow them to predict accurately on the test set. Thus, we want our interpretation to explain this generalization behavior. Generalization has been linked to basin broadness in the loss landscape in several ways, most notably by singular learning theory, which introduces the learning coefficient as a measure of basin broadness that doubles as a measure of generalization error, replacing the parameter count in Occam's razor.

Inspired by both of these ideas, the first paper explores using the structure of the loss landscape to find the most computationally natural representation of a network. We focus on identifying parts of the network that are not responsible for low loss (i.e. degeneracy), inspired by singular learning theory. These degeneracies are an obstacle for interpretability because they mean there exist parameters which do not affect the network's input-output behavior (similar to the parameters of a Transformer's W_V and W_O matrices that do not affect the product W_OV). We explore three different ways neural network parameterisations can be degenerate:
1. when activations are linearly dependent,
2. when gradient vectors are linearly dependent,
3. when ReLU neurons fire on the same inputs.

This investigation leads to the interaction basis, and eventually the local interaction basis (LIB), which we test in the second paper. This basis removes computationally irrelevant features and interactions, and sparsifies the remaining interactions between layers. Finally, we analyse how modularity is connected to degeneracy in the loss landscape, and we suggest a preliminary metric for finding the sorts of modules that the neural network prior is biased towards.

The second paper tests how useful the LIB is in toy and language models. In this new basis we calculate integrated-gradient-based interactions between features and analyse the graph of all features in a network. We interpret strongly interacting features, and identify modules in this graph using the modularity metric from the first paper.
To derive the LIB, we coordinate-transform the activations of the neural network in two steps: Step 1 is a transformation into the PCA basis, removing activation-space directions which don't explain any variance. Step 2 is a transformation of the activations to align the basis with the right singular vectors of the gradient vector dataset. The second step is the key new ingredient: it aims to make interactions between adjacent layers sparse, and removes directions which do not affect downstre...
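As a concrete, if schematic, reading of those two steps, here is a minimal numpy sketch for a single layer. The function name, the variance tolerance, and the assumption that we are handed matrices of activations and of loss gradients with respect to those activations are illustrative choices, not the papers' implementation.

```python
import numpy as np

def local_interaction_basis(acts, grads, var_tol=1e-6):
    """Schematic two-step LIB-style transform for one layer.

    acts:  (n_samples, d) activations at this layer
    grads: (n_samples, d) gradients of the loss w.r.t. these activations
    """
    # Step 1: rotate into the PCA basis and drop directions that explain
    # (essentially) no variance.
    centered = acts - acts.mean(axis=0)
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    keep = S**2 > var_tol * (S**2).sum()
    pca_basis = Vt[keep]                    # (k, d)
    acts_pca = centered @ pca_basis.T       # activations in the PCA basis
    grads_pca = grads @ pca_basis.T         # gradients in the same basis

    # Step 2: align the basis with the right singular vectors of the
    # gradient dataset; this is the ingredient meant to sparsify
    # interactions with the adjacent layer.
    _, _, Wt = np.linalg.svd(grads_pca, full_matrices=False)
    return acts_pca @ Wt.T                  # activations in the (sketched) LIB
```

A real implementation would also have to choose how to normalise and truncate, and how to stitch the per-layer bases into the interaction graph described above; see the papers for those details.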
May 20, 2024 • 1h 9min

LW - OpenAI: Exodus by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI: Exodus, published by Zvi on May 20, 2024 on LessWrong.

Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands.

Ilya Sutskever and Jan Leike have left OpenAI. This is almost exactly six months after Altman's temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon after a number of other recent safety-related OpenAI departures. Many others working on safety have also left recently. This is part of a longstanding pattern at OpenAI.

Jan Leike later offered an explanation for his decision on Twitter. Leike asserts that OpenAI has lost the mission on safety and has culturally become increasingly hostile to it. He says the superalignment team was starved for resources, with its public explicit compute commitments dishonored, and that safety has been neglected on a widespread basis, not only superalignment but also including addressing the safety needs of the GPT-5 generation of models.

Altman acknowledged there was much work to do on the safety front. Altman and Brockman then offered a longer response that seemed to say exactly nothing new.

Then we learned that OpenAI has systematically misled and then threatened its departing employees, forcing them to sign draconian lifetime non-disparagement agreements, which they are forbidden to reveal due to their NDA. Altman has to some extent acknowledged this and promised to fix it once the allegations became well known, but so far there has been no fix implemented beyond an offer to contact him privately for relief.

These events all seem highly related. Also these events seem quite bad. What is going on?

This post walks through recent events and informed reactions to them. The first ten sections address departures from OpenAI, especially Sutskever and Leike. The next five sections address the NDAs and non-disparagement agreements. Then at the end I offer my perspective, highlight another, and look to paths forward.

Table of Contents
1. The Two Departure Announcements
2. Who Else Has Left Recently?
3. Who Else Has Left Overall?
4. Early Reactions to the Departures
5. The Obvious Explanation: Altman
6. Jan Leike Speaks
7. Reactions After Leike's Statement
8. Greg Brockman and Sam Altman Respond to Leike
9. Reactions from Some Folks Unworried About Highly Capable AI
10. Don't Worry, Be Happy?
11. The Non-Disparagement and NDA Clauses
12. Legality in Practice
13. Implications and Reference Classes
14. Altman Responds on Non-Disparagement Clauses
15. So, About That Response
16. How Bad Is All This?
17. Those Who Are Against These Efforts to Prevent AI From Killing Everyone
18. What Will Happen Now?
19. What Else Might Happen or Needs to Happen Now?

The Two Departure Announcements

Here are the full announcements and top-level internal statements made on Twitter around the departures of Ilya Sutskever and Jan Leike.

Ilya Sutskever: After almost a decade, I have made the decision to leave OpenAI. The company's trajectory has been nothing short of miraculous, and I'm confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the excellent research leadership of Jakub Pachocki. It was an honor and a privilege to have worked together, and I will miss everyone dearly.
So long, and thanks for everything. I am excited for what comes next - a project that is very personally meaningful to me, about which I will share details in due time. [Ilya then shared the photo below]

Jakub Pachocki: Ilya introduced me to the world of deep learning research, and has been a mentor to me, and a great collaborator for many years. His incredible vision for what deep learning could become was foundational to what OpenAI, and the field of AI, is today. I...
May 20, 2024 • 16min

EA - Policy advocacy for eradicating screwworm looks cost-effective by MathiasKB

This episode discusses policy advocacy for eradicating screwworm in South America using gene drives, highlighting the potential impact and economic benefits. It explores the importance of political coordination, the challenges of calculating welfare improvements, and the precautions needed when using gene drive technology.
May 20, 2024 • 46sec

LW - Jaan Tallinn's 2023 Philanthropy Overview by jaan

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jaan Tallinn's 2023 Philanthropy Overview, published by jaan on May 20, 2024 on LessWrong. to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with 2023 results. in 2023 my donations funded $44M worth of endpoint grants ($43.2M excluding software development and admin costs) - exceeding my commitment of $23.8M (20k times $1190.03 - the minimum price of ETH in 2023). Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
May 20, 2024 • 5min

EA - I'm attempting a world record to raise money for AMF by Vincent van der Holst

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I'm attempting a world record to raise money for AMF, published by Vincent van der Holst on May 20, 2024 on The Effective Altruism Forum.

TL;DR
It's time for an absurd challenge. On June 7th around 11:00, I'm going to (try to) break the world record for cycling without hands, riding more than 100 km. I am raising money for The Against Malaria Foundation (100% donated, costs covered by my company and myself) with the help of The Life You Can Save. Pledge your donation per kilometer or a fixed amount here (tax deductibility is possible in most countries; email me at vin@boas.co).

The full story
I'm Vin from Amsterdam, and I'm doing a world record attempt for cycling without hands for charity on the 7th of June. I am donating 100% to The Against Malaria Foundation, with the goal of saving at least one life (5,000 USD). You can participate and push me to go further by joining here.

It's going too far to say that my bike saved my life, but at least it made me want to live. I had a pretty bad anxiety disorder about 7 years ago and also became depressed as a result. My father then gave me his old road bike, and that was a golden combination for me. Exercising burned the adrenaline from my anxiety disorder, made me healthier, and made me sleep better, and because I slept better and was healthier I started cycling harder and farther, and I regained goals in my life. Often I cycled with my father and best friend, which allowed me to vent my thoughts, and the bike took me to beautiful places all over the world. Soon I was no longer depressed, and my anxiety disorder also almost completely disappeared after a few years.

I'm good at cycling without hands because as a kid I used to bike to soccer without hands, where it was always a challenge to get through the turns without touching my handlebars. Eventually I found out I was better at cycling than playing soccer, and cycling became my hobby. So it started with cycling to get mentally healthy, and it has gotten way out of hand over the past 7 years. In my first year of cycling my longest ride was 60 km, the year after that I rode 100, the year after that 200, the year after that 300, and last year I rode the 535 kilometers from Amsterdam to Paris with 3 other idiots in one day (on my birthday, no less).

I have less time now because of the startup BOAS I run (which donates 90% of profit to save lives, by donating to the most effective global health charities like AMF), and 600 kilometers is really too far for me, but I always like to have a cycling goal. And then I saw a list of cycling records that included one I think I could break: the world record for cycling without hands. And it's not entirely impractical: I can train at my desk at home without hands, or do my calls on my bike and combine work and training that way. As a child, I spent hours in the library browsing through the Guinness Book of World Records, so it is a dream for me to be in it someday. So I had a new goal!

Since my anxiety disorder and depression, I have found that a simple life is a good life, and that I'm happier when I help others. A simple life does not have to be expensive, so why should I keep more money than I need to be healthy and happy? Especially if that money also allows me to help others, something that makes me even happier.
So the company I started donates 90% of its profits to save lives, and if I can also raise money to save lives with my record attempt, why wouldn't I? It's important to me that if I have money to donate, that I try to do as much good with it as possible. I found out that 600,000 young people die every year from Malaria, even though it is preventable and curable. It bothers me that we haven't solved that in a world where a fraction of the wealth we have can save almost all those lives. Children didn't choose a world where we don't sh...
May 20, 2024 • 31sec

EA - Why prediction markets aren't popular by Nick Whitaker

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why prediction markets aren't popular, published by Nick Whitaker on May 20, 2024 on The Effective Altruism Forum. Together with J. Zachary Mazlish, I argue that, among other things, regulation is not the primary obstacle to prediction markets, so philanthropic dollars should not be used to lobby for their deregulation. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
May 20, 2024 • 39min

AF - Infra-Bayesian haggling by hannagabor

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Infra-Bayesian haggling, published by hannagabor on May 20, 2024 on The AI Alignment Forum.

Preface
I wrote this post during my scholarship at MATS. My goal is to describe a research direction of the learning theoretic agenda (LTA): namely, a natural infra-Bayesian learning algorithm proposal that arguably leads to Pareto-optimal solutions in repeated games. The idea originates from Vanessa; I'm expanding a draft of hers into a more accessible description. The expected audience is people who are interested in ongoing work on LTA. It is especially suitable for people who are looking for a research direction to pursue in this area.

Introduction
There has been much work on the theory of agents who cooperate in the Prisoner's dilemma and other situations. Some call this behavior superrationality. For example, functional decision theory (FDT) is a decision theory that prescribes such behavior. The strongest results so far are those coming from "modal combat", e.g. Critch. However, these results are of very limited scope: among other issues, they describe agents crafted for specific games, rather than general reasoners that produce superrational behavior in a "naturalized" manner (i.e. as a special case of the general rules of reasoning).

At the same time, understanding multi-agent learning theory is another major open problem. Attempts to prove convergence to game-theoretic solution concepts (a much weaker goal than superrationality) in a learning-theoretic setting are impeded by the so-called grain-of-truth problem (originally observed by Kalai and Lehrer; its importance was emphasized by Hutter). An agent can learn to predict the environment via Bayesian learning only if it assigns non-zero prior probability to that environment, i.e. its prior contains a grain of truth.

What's the grain-of-truth problem? Suppose Alice's environment contains another agent, Bob. If Alice is a Bayesian agent, she can learn to predict Bob's behavior only if her prior assigns a positive probability to Bob's behavior. (That is, Alice's prior contains a grain of truth.) If Bob has the same complexity level as Alice, then Alice is not able to represent all possible environments. Thus, in general, Alice's prior doesn't contain a grain of truth. A potential solution based on "reflective oracles" was proposed by Leike, Taylor and Fallenstein. However, it involves arbitrarily choosing a fixed point out of an enormous space of possibilities, and requires that all agents involved choose the same fixed point. Approaches to multi-agent learning in the mainstream literature (see e.g. Cesa-Bianchi and Lugosi) also suffer from restrictive assumptions and are not naturalized.

Infra-Bayesianism (IB) was originally motivated by the problem of non-realizability, of which the multi-agent grain-of-truth problem is a special case. Moreover, it converges to FDT-optimal behavior in most Newcombian problems. Therefore, it seems natural to expect IB agents to have strong multi-agent guarantees as well, hopefully even superrationality. In this article, we will argue that an infra-Bayesian agent playing a repeated game displays a behavior dubbed "infra-Bayesian haggling". For two-player games, this typically (but, strictly speaking, not always) leads to a Pareto-efficient outcome. The latter can be viewed as a form of superrationality.
Currently, we only have an informal sketch, and even that in a toy model with no stochastic hypotheses. However, it seems plausible that it can be extended to a fairly general setting. Certainly it allows for asymmetric agents with different priors and doesn't have any strong mutual compatibility condition. The biggest impediment to naturality is the requirement that the learning algorithm is of a particular type (namely, Upper Confidence Bound). However, we believe that it should be po...
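As background for readers less familiar with the Upper Confidence Bound (UCB) style of learner mentioned above, here is a minimal, self-contained sketch of two generic UCB1 agents playing a repeated Prisoner's Dilemma. This is not the infra-Bayesian haggling algorithm, and the payoff numbers are invented for illustration; notably, plain UCB learners here tend to settle into mutual defection, which is exactly the kind of non-Pareto-efficient outcome that infra-Bayesian haggling is meant to improve on.

```python
import numpy as np

# Row player's payoffs in a Prisoner's Dilemma; actions: 0 = cooperate, 1 = defect.
# The numbers are purely illustrative.
PAYOFF = np.array([[3.0, 0.0],
                   [4.0, 1.0]])

class UCB1Player:
    """A generic UCB1 learner over its own two actions (not an infra-Bayesian agent)."""

    def __init__(self):
        self.counts = np.zeros(2)
        self.totals = np.zeros(2)

    def act(self, t):
        # Try each action once before using the UCB index.
        for a in range(2):
            if self.counts[a] == 0:
                return a
        means = self.totals / self.counts
        bonus = np.sqrt(2.0 * np.log(t + 1) / self.counts)
        return int(np.argmax(means + bonus))

    def update(self, action, reward):
        self.counts[action] += 1
        self.totals[action] += reward

p1, p2 = UCB1Player(), UCB1Player()
for t in range(10_000):
    a1, a2 = p1.act(t), p2.act(t)
    p1.update(a1, PAYOFF[a1, a2])  # p1's payoff
    p2.update(a2, PAYOFF[a2, a1])  # symmetric game, so swap the indices

print("p1 action frequencies (cooperate, defect):", p1.counts / p1.counts.sum())
```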
May 20, 2024 • 6min

EA - Project idea: AI for epistemics by Benjamin Todd

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Project idea: AI for epistemics, published by Benjamin Todd on May 20, 2024 on The Effective Altruism Forum.

If transformative AI might come soon and you want to help that go well, one strategy you might adopt is building something that will improve as AI gets more capable. That way, if AI accelerates, your ability to help accelerates too. Here's an example: organisations that use AI to improve epistemics - our ability to know what's true - and make better decisions on that basis. This was the most interesting impact-oriented entrepreneurial idea I came across when I visited the Bay Area in February. (Thank you to Carl Shulman, who first suggested it.)

Navigating the deployment of AI is going to involve successfully making many crazy hard judgement calls, such as "what's the probability this system isn't aligned?" and "what might the economic effects of deployment be?" Some of these judgement calls will need to be made under a lot of time pressure - especially if we're seeing 100 years of technological progress in under 5. Being able to make these kinds of decisions a little bit better could therefore be worth a huge amount. And that's true given almost any future scenario. Better decision-making can also potentially help with all other cause areas, which is why 80,000 Hours recommends it as a cause area independent from AI.

So the idea is to set up organisations that use AI to improve forecasting and decision-making in ways that can eventually be applied to these kinds of questions. In the short term, you can apply these systems to conventional problems, potentially in the for-profit sector, like finance. We seem to be just approaching the point where AI systems might be able to help (e.g. a recent paper found GPT-4 was pretty good at forecasting if fine-tuned). Starting here allows you to gain scale, credibility and resources. But unlike what a purely profit-motivated entrepreneur would do, you can also try to design your tools such that in an AI crunch moment they're able to help. For example, you could develop a free-to-use version for political leaders, so that if a huge decision about AI regulation suddenly needs to be made, they're already using the tool for other questions.

There are already a handful of projects in this space, but it could eventually be a huge area, so it still seems like very early days. These projects could take many forms: One concrete proposal is using AI to make forecasts, or otherwise get better at truth-finding, in important domains. On the more qualitative side, we could imagine an AI "decision coach" or consultant that aims to augment human decision-making. Any techniques to make it easier to extract the truth from AI systems could also count, such as relevant kinds of interpretability research and the AI debate or weak-to-strong generalisation approaches to AI alignment.

I could imagine projects in this area starting in many ways, including a research service within a hedge fund, a research group within an AI company (e.g. focused on optimising systems for truth telling and accuracy), an AI-enabled consultancy (trying to undercut the Big 3), or as a non-profit focused on policy makers. Most likely you'd try to fine-tune and build scaffolding around existing leading LLMs, though there are also proposals to build LLMs from the bottom up for forecasting.
For example, you could create an LLM that only has data up to 2023, and then train it to predict what happens in 2024. There's a trade-off to be managed between maintaining independence and trustworthiness vs. having access to leading models and decision-makers in AI companies and making money. Some ideas could advance frontier capabilities, so you'd want to think carefully about how to either avoid that, stick to ideas that differentially boost more safety-enhancing aspects of ...
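As a small illustration of the evaluation loop such a forecasting project would need, here is a minimal sketch that scores probabilistic forecasts with the Brier score. The questions, probabilities, and outcomes are invented for illustration; a real system would pull them from a dataset of resolved questions.

```python
# Hypothetical example: scoring a forecaster against resolved yes/no questions.
forecasts = [
    # (question, model's P(yes), resolved outcome: 1 = yes, 0 = no)
    ("Question A resolves yes by end of 2024?", 0.85, 1),
    ("Question B resolves yes by end of 2024?", 0.70, 1),
    ("Question C resolves yes by end of 2024?", 0.60, 0),
]

def brier_score(prob: float, outcome: int) -> float:
    """Squared error between the forecast probability and the outcome (lower is better)."""
    return (prob - outcome) ** 2

scores = [brier_score(p, o) for _, p, o in forecasts]
print(f"mean Brier score: {sum(scores) / len(scores):.3f}")
# A forecaster that always answers 0.5 scores 0.25; perfect foresight scores 0.0.
```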
May 20, 2024 • 7min

AF - The consistent guessing problem is easier than the halting problem by Jessica Taylor

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The consistent guessing problem is easier than the halting problem, published by Jessica Taylor on May 20, 2024 on The AI Alignment Forum.

The halting problem is the problem of taking as input a Turing machine M and returning true if it halts, false if it doesn't halt. This is known to be uncomputable. The consistent guessing problem (named by Scott Aaronson) is the problem of taking as input a Turing machine M (which either returns a Boolean or never halts) and returning true or false; if M ever returns true, the oracle's answer must be true, and likewise for false. This is also known to be uncomputable.

Scott Aaronson inquires as to whether the consistent guessing problem is strictly easier than the halting problem. This would mean there is no Turing machine that, when given access to a consistent guessing oracle, solves the halting problem, no matter which consistent guessing oracle (of which there are many) it has access to. As prior work, Andrew Drucker has written a paper describing a proof of this, although I find the proof hard to understand and have not checked it independently. In this post, I will prove this fact in a way that I at least find easier to understand. (Note that the other direction, that a Turing machine with access to a halting oracle can be a consistent guessing oracle, is trivial.)

First I will show that a Turing machine with access to a halting oracle cannot in general determine whether another machine with access to a halting oracle will halt. Suppose M(O, N) is a Turing machine that returns true if N(O) halts, false otherwise, when O is a halting oracle. Let T(O) be a machine that runs M(O, T), halting if it returns false, running forever if it returns true. Now M(O, T) must be its own negation, a contradiction. In particular, this implies that the problem of deciding whether a Turing machine with access to a halting oracle halts cannot be a Σ^0_1 statement in the arithmetic hierarchy, since these statements can be decided by a machine with access to a halting oracle.

Now consider the problem of deciding whether a Turing machine with access to a consistent guessing oracle halts for all possible consistent guessing oracles. If this is a Σ^0_1 statement, then consistent guessing oracles must be strictly weaker than halting oracles. For if there were a reliable way to derive a halting oracle from a consistent guessing oracle, then any machine with access to a halting oracle could be translated into one making use of a consistent guessing oracle, one which halts for all consistent guessing oracles if and only if the original halts when given access to a halting oracle. That would make the problem of deciding whether a Turing machine with access to a halting oracle halts a Σ^0_1 statement, which we have shown to be impossible.

What remains to be shown is that the problem of deciding whether a Turing machine with access to a consistent guessing oracle halts for all consistent guessing oracles is a Σ^0_1 statement. To do this, I will construct a recursively enumerable propositional theory T that depends on the Turing machine. Let M be a Turing machine that takes an oracle as input (where an oracle maps encodings of Turing machines to Booleans). Add to T the following propositional variables:
- O_N for each Turing machine encoding N, representing the oracle's answer about this machine.
- H, representing that M(O) halts.
- R_s for each possible state s of the Turing machine (where the state includes the head state and the state of the tape), representing that s is reached by the machine's execution.

Clearly, these variables are recursively enumerable and can be computably mapped to the natural numbers. We introduce the following axiom schemas:
(a) For any machine N that halts and returns true, O_N.
(b) For any machine N that halts and returns false, ¬O_N.
(c) For any ...
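For readers who prefer symbols, here is the diagonalization step from earlier in the post written out compactly in LaTeX (same M, T, and O as above); it restates the prose argument and adds nothing new.

```latex
% Diagonalization: no machine M with a halting oracle O can decide whether
% other oracle machines N halt when given O.
\[
  M(O, N) =
  \begin{cases}
    \text{true}  & \text{if } N(O) \text{ halts},\\
    \text{false} & \text{otherwise},
  \end{cases}
  \qquad
  T(O) =
  \begin{cases}
    \text{halt}        & \text{if } M(O, T) = \text{false},\\
    \text{run forever} & \text{if } M(O, T) = \text{true}.
  \end{cases}
\]
\[
  M(O, T) = \text{true} \iff T(O) \text{ halts} \iff M(O, T) = \text{false},
\]
% a contradiction, so no such M exists.
```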
May 19, 2024 • 25min

LW - International Scientific Report on the Safety of Advanced AI: Key Information by Aryeh Englander

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: International Scientific Report on the Safety of Advanced AI: Key Information, published by Aryeh Englander on May 19, 2024 on LessWrong.

I thought that the recently released International Scientific Report on the Safety of Advanced AI seemed like a pretty good summary of the state of the field on AI risks, in addition to being about as close to a statement of expert consensus as we're likely to get at this point. I noticed that each section of the report has a useful "Key Information" bit with a bunch of bullet points summarizing that section. So for my own use as well as perhaps the use of others, and because I like bullet-point summaries, I've copy-pasted all the "Key Information" lists here.

1 Introduction
[Bullet points taken from the "About this report" part of the Executive Summary]
- This is the interim publication of the first 'International Scientific Report on the Safety of Advanced AI'. A diverse group of 75 artificial intelligence (AI) experts contributed to this report, including an international Expert Advisory Panel nominated by 30 countries, the European Union (EU), and the United Nations (UN).
- Led by the Chair of this report, the independent experts writing this report collectively had full discretion over its content.
- At a time of unprecedented progress in AI development, this first publication restricts its focus to a type of AI that has advanced particularly rapidly in recent years: General-purpose AI, or AI that can perform a wide variety of tasks. Amid rapid advancements, research on general-purpose AI is currently in a time of scientific discovery and is not yet settled science.
- People around the world will only be able to enjoy general-purpose AI's many potential benefits safely if its risks are appropriately managed. This report focuses on identifying these risks and evaluating technical methods for assessing and mitigating them. It does not aim to comprehensively assess all possible societal impacts of general-purpose AI, including its many potential benefits.
- For the first time in history, this interim report brought together experts nominated by 30 countries, the EU, and the UN, and other world-leading experts, to provide a shared scientific, evidence-based foundation for discussions and decisions about general-purpose AI safety.
- We continue to disagree on several questions, minor and major, around general-purpose AI capabilities, risks, and risk mitigations. But we consider this project essential for improving our collective understanding of this technology and its potential risks, and for moving closer towards consensus and effective risk mitigation to ensure people can experience the potential benefits of general-purpose AI safely. The stakes are high. We look forward to continuing this effort.

2 Capabilities
2.1 How does General-Purpose AI gain its capabilities?
- General-purpose AI models and systems can produce text, images, video, labels for unlabelled data, and initiate actions.
- The lifecycle of general-purpose AI models and systems typically involves computationally intensive 'pre-training', labour-intensive 'fine-tuning', and continual post-deployment monitoring and updates.
- There are various types of general-purpose AI. Examples of general-purpose AI models include:
  - Chatbot-style language models, such as GPT-4, Gemini-1.5, Claude-3, Qwen1.5, Llama-3, and Mistral Large.
  - Image generators, such as DALLE-3, Midjourney-5, and Stable Diffusion-3.
  - Video generators, such as SORA.
  - Robotics and navigation systems, such as PaLM-E.
  - Predictors of various structures in molecular biology, such as AlphaFold 3.

2.2 What current general-purpose AI systems are capable of
- General-purpose AI capabilities are difficult to estimate reliably, but most experts agree that current general-purpose AI capabilities include: Assisting programmers and writing short ...
