

The Nonlinear Library: LessWrong
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jul 24, 2024 • 7min
LW - You should go to ML conferences by Jan Kulveit
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: You should go to ML conferences, published by Jan Kulveit on July 24, 2024 on LessWrong.
This is a second kind-of-obvious point to make, but if you are interested in AI, AI safety, or cognition in general, it is likely worth going to top ML conferences, such as NeurIPS, ICML or ICLR. In this post I cover some reasons why, along with some anecdotal stories.
1. Parts of AI alignment and safety are now completely mainstream
Looking at the "Best paper awards" at ICML, you'll find these safety-relevant or alignment-relevant papers:
Stealing part of a production language model by Carlini et al.
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo by Zhao et al.
Debating with More Persuasive LLMs Leads to More Truthful Answers by Khan et al.
Genie: Generative Interactive Environments by Bruce et al.
which amounts to about one-third of the best paper awards (!). "Because of safety concerns" is part of the motivation for hundreds of papers.
While the signal-to-noise ratio is even worse than on LessWrong, in total the amount you can learn is higher - my personal guess is there is maybe 2-3x as much prosaic-AI-safety-relevant work at conferences as what you get by just following LessWrong, the Alignment Forum and safety-oriented communication channels.
2. Conferences are an efficient way to screen general ML research without spending a lot of time on X
Almost all papers are presented in the form of posters. In the case of a big conference, this usually means many thousands of posters presented in huge poster sessions.
My routine for engaging with this firehose of papers:
1. For each session, read all the titles. Usually, this prunes the list by a factor of ten (e.g. from 600 papers to 60).
2. Read the abstracts. Prune it to things which I haven't noticed before and seem relevant. For me, this is usually by a factor of ~3-5.
3. Visit the posters. Posters with paper authors present are actually a highly efficient way to digest research:
Sometimes, you suspect there is some assumption or choice hidden somewhere making the result approximately irrelevant - just asking can often resolve this in a matter of tens of seconds.
Posters themselves don't undergo peer review, which makes the communication more honest, with less hedging.
Usually authors of a paper know significantly more about the problem than what's in the paper, and you can learn more about negative results, obstacles, or directions people are excited about.
A clear disadvantage of conferences is the time lag: by the time papers are presented, some of the main results are old and well known, but in my view a lot of the value is in the long tail of results which are sometimes very useful but not attention-grabbing.
3. ML research community as a control group
My vague impression is that in conceptual research, mainstream ML research lags behind the LW/AI safety community by something between 1 and 5 years, rediscovering topics discussed here. Some examples:
The Platonic Representation Hypothesis (ICML poster & oral presentation) is an independent version of Natural abstractions, discussed here for about 4 years.
A Roadmap to Pluralistic Alignment deals with the Self-unalignment problem and Coherent extrapolated volition
Plenty of research on safety protocols like debate, IDA,...
Prior work published in the LW/AI safety community is almost never cited or acknowledged - in some cases because it is more convenient to claim the topic is completely novel, but I suspect in many cases researchers are genuinely not aware of the existing work, which makes their contribution a useful control: if someone starts thinking about these topics, unaware of the thousands of hours spent on them by dozens of people, what will they arrive at?
4. What 'experts' think
The ML research community is the intellectual home of many people expressing public opinions about AI risk. In my view, b...

Jul 24, 2024 • 10min
LW - The Cancer Resolution? by PeterMcCluskey
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Cancer Resolution?, published by PeterMcCluskey on July 24, 2024 on LessWrong.
Book review: The Cancer Resolution?: Cancer reinterpreted through another lens, by Mark Lintern.
In the grand tradition of outsiders overturning scientific paradigms, this book proposes a bold new theory: cancer isn't a cellular malfunction, but a fungal invasion.
Lintern spends too many pages railing against the medical establishment, which feels more like ax-grinding than science. I mostly agreed with his conclusions here, though for somewhat different reasons than the ones he provides.
If you can push through this preamble, you'll find a treasure trove of scientific intrigue.
Lintern's central claim is that fungal infections, not genetic mutations, are the primary cause of cancer. He dubs this the "Cell Suppression theory," painting a picture of fungi as cellular puppet masters, manipulating our cells for their own nefarious ends. This part sounds much more like classical science, backed by hundreds of quotes from peer-reviewed literature.
Those quotes provide extensive evidence that Lintern's theory predicts dozens of cancer features better than do the established theories.
Older Theories
1. The DNA Theory (aka Somatic Mutation Theory): The reigning heavyweight, this theory posits that cancer results from an accumulation of genetic mutations in critical genes that control cell growth, division, and death.
2. The Metabolic Theory: Another old theory that still has advocates. It suggests that cancer is primarily a metabolic disease, characterized by impaired cellular energy production (the Warburg effect). It proposes that damage to mitochondria is a key factor in cancer development. I wrote a mixed review of a book about it.
Lintern points out evidence that mitochondria are turned off by signals, not damaged. He also notes that tumors with malfunctioning mitochondria are relatively benign.
Evidence Discrediting the DNA Theory
The standard version of the DNA Theory predicts that all cancer cells will have mutations that affect replication, apoptosis, etc.
Around 2008 to 2013, substantial genetic data became available for cancer cells. Lintern wants us to believe that this evidence fully discredits the DNA Theory.
The actual evidence seems more complex than Lintern indicates.
The strongest evidence is that they found cancers that seem to have no mutations.
Almost as important is that the mutations that are found seem more randomly distributed than would be expected if they caused consistent types of malfunctions.
Lintern's theory seems to explain all of the Hallmarks of Cancer, as well as a few dozen other features that seem to occur in all cancers.
He argues that the DNA Theory does a poor job of explaining the hallmarks. DNA Theorists likely reject that characterization. They appear to have thought their theory explained the hallmarks back before the genetic data became available (mostly just positing mutations for each hallmark?). My guess is that they are busy adding epicycles to their theory, but the situation is complex enough that I'm having trouble evaluating it.
He also points out that the DNA Theory struggles with Peto's Paradox (why don't larger animals get more cancer?), while his theory neatly sidesteps this issue.
Additionally, mouse embryos formed from cancer cells showed no signs of cancer.
Evidence of Fungi
A key game-changer is the growing evidence of fungi in tumors. Until 2017, tumors were thought to be microbe-free. Now? We're finding fungi in all types of cancer, with tumor-specific fungal profiles.
There's even talk of using fungal DNA signatures to distinguish cancer patients from healthy individuals.
It's not a slam dunk for Lintern's theory, but it shifts the odds significantly.
Medical Establishment Inertia
It looks like people in the medical ...

Jul 24, 2024 • 7min
LW - Confusing the metric for the meaning: Perhaps correlated attributes are "natural" by NickyP
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Confusing the metric for the meaning: Perhaps correlated attributes are "natural", published by NickyP on July 24, 2024 on LessWrong.
Epistemic status: possibly trivial, but I hadn't heard it before.
TL;DR: What I thought of as a "flaw" in PCA - its inability to isolate pure metrics - might actually be a feature that aligns with our cognitive processes. We often think in terms of composite concepts (e.g., "Age + correlated attributes") rather than pure metrics, and this composite thinking might be more natural and efficient.
Introduction
I recently found myself describing Principal Component Analysis (PCA) and pondering its potential drawbacks. However, upon further reflection, I'm reconsidering whether what I initially viewed as a limitation might actually be a feature. This led me to think about how our minds - and, potentially, language models - might naturally encode information using correlated attributes.
An important aspect of this idea is the potential conflation between the metric we use to measure something and the actual concept we're thinking about. For instance, when we think about a child's growth, we might not be consciously separating the concept of "age" from its various correlated attributes like height, cognitive development, or physical capabilities. Instead, we might be thinking in terms of a single, composite dimension that encompasses all these related aspects.
After looking at active inference a while ago, it seems to me that, in general, a lot of human heuristics and biases are there to encode real-world relationships more efficiently, and are then strained in out-of-distribution experimental settings so that they appear "irrational".
I think the easiest way to explain is with a couple of examples:
1 - Age and Associated Attributes in Children
Suppose we plotted two attributes: Age (in years) vs Height (in cm) in children. These are highly correlated, so if we perform Principal Component Analysis, we will find there are two main components. These will not correspond to orthogonal Age and Height components, since they are quite correlated. Instead, we will find an "Age + Height" direction, and a "Height relative to what is standard for that age" direction.
While one can think of this as a "failure" of PCA to find the "true things we are measuring", I think this is perhaps not the correct way to think about it.
For example, if I told you to imagine a 10-year-old, you would probably imagine them to be of height ~140±5cm. And if I told you they were 2.0m tall or 0.5m tall, you would be very surprised. On the other hand, one often hears phrases like "about the height of a 10-year-old".
That is, when we think about a child's development, we don't typically separate each attribute into distinct vectors like "age," "height," "voice pitch," and so on. Instead, we might encode a single "age + correlated attributes" vector, with some adjustments for individual variations.
This approach is likely more efficient than encoding each attribute separately. It captures the strong correlations that exist in typical development, while allowing for deviations when necessary.
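To make the PCA point concrete, here is a minimal Python sketch (my own illustration, not from the original post) using synthetic age and height data; the growth curve and noise level are assumptions. With the variables standardized, the first principal component weights age and height roughly equally (the "age + height" direction), and the second captures height relative to what is typical for that age.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic children: ages 2-16, heights roughly linear in age plus noise (assumed curve).
rng = np.random.default_rng(0)
age = rng.uniform(2, 16, size=1000)
height = 75 + 6 * age + rng.normal(0, 5, size=1000)  # cm

# Standardize so neither variable dominates due to units.
X = np.column_stack([age, height])
X = (X - X.mean(axis=0)) / X.std(axis=0)

pca = PCA(n_components=2).fit(X)
print(pca.components_)                # PC1 ~ [0.71, 0.71] up to sign: "age + height"
print(pca.explained_variance_ratio_)  # PC1 carries most of the variance
# PC2 ~ [0.71, -0.71]: "height relative to what is standard for that age"
```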
When one talks about age, one can define it as:
"number of years of existence" (independent of anything else)
but when people talk about "age" in everyday life, the definition is more akin to:
"years of existence, and all the attributes correlated to that".
2 - Price and Quality of Goods
Our tendency to associate price with quality and desirability might not be a bias, but an efficient encoding of real-world patterns. A single "value" dimension that combines price, quality, and desirability could capture the most relevant information for everyday decision-making, with additional dimensions only needed for finer distinctions.
That is, "cheap" can be conceptualised ...

Jul 24, 2024 • 59min
LW - Monthly Roundup #20: July 2024 by Zvi
A fascinating critique of election forecasting flaws highlights the disconnect between models and public perception. Discussions on the decline of community on social media urge a reimagining of engagement. The podcast also delves into the intricacies of scientific integrity and the challenges of funding, urging a shift towards risk-taking in research. Personal reflections touch on the cultural impact of money and education, while nostalgia for simpler gaming experiences contrasts with modern complexities. Lastly, the exploration of extreme powers reveals their profound societal implications.

Jul 23, 2024 • 10min
LW - Unlearning via RMU is mostly shallow by Andy Arditi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Unlearning via RMU is mostly shallow, published by Andy Arditi on July 23, 2024 on LessWrong.
This is an informal research note. It is the result of a few days' exploration into RMU through the lens of model internals. Code to reproduce the main result is available here.
This work was produced as part of Ethan Perez's stream in the ML Alignment & Theory Scholars Program - Summer 2024 Cohort. Thanks to Nina Panickssery, Mrinank Sharma, and Fabien Roger for helpful discussion.
Summary
We investigate RMU, a recent unlearning method proposed by Li et al. (2024), through the lens of model internals. Through this lens, we explain that RMU mostly works by flooding the residual stream with "junk" in hazardous contexts, resulting in incoherence. We then propose a simple intervention to "clear the junk" from the residual stream.
This intervention mostly restores the model's coherence in hazardous contexts, and recovers a significant proportion (but not all) of its original hazardous knowledge. This suggests that the effectiveness of RMU can be understood roughly in two pieces: (1) a shallow mechanism, where the residual stream is flooded with junk; and (2) a deeper mechanism, where even after the junk is cleared, knowledge is still inaccessible.
What is RMU?
Representation Misdirection for Unlearning (RMU) is a state-of-the-art unlearning method presented by Li et al. (2024).
In the unlearning paradigm, we would like the model to unlearn (or "forget") some hazardous knowledge. At the same time, we would also like to make sure the model retains non-hazardous knowledge, so that the model remains useful.
This partition of knowledge is usually specified by constructing a "forget" dataset Dforget, consisting of the hazardous knowledge to be unlearned, and a "retain" dataset Dretain, consisting of non-hazardous knowledge to be retained.
Let M denote our original model. RMU specifies a method for fine-tuning M on Dforget and Dretain in order to obtain a modified model M' satisfying the unlearning objective.
The main idea of RMU is as follows:
On hazardous data, the internal activations of M' should be scrambled.
On non-hazardous data, the internal activations of M' should be unchanged, i.e. close to those of the original model M.
These two ideas are concretely operationalized as two distinct terms in the loss during fine-tuning:
On Dforget, incentivize activations a'ℓ at some layer ℓ to be close to a large randomly sampled vector cu.
"Forget" loss term: ||a'ℓcu||22.
On Dretain, incentivize activations a'ℓ at some layer ℓ to be close to the original model's activations aℓ.
"Retain" loss term: ||a'ℓaℓ||22.
Note that u is a random unit vector sampled before the fine-tuning procedure and kept constant throughout (i.e. it is not freshly sampled at each training step). Also note that the layer ℓ at which to target activations and the scalar multiplier c are predetermined hyperparameters.
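As a concrete illustration of the two loss terms, here is a minimal PyTorch-style sketch; the variable names, shapes, and the way the terms are weighted and combined are my assumptions, not the authors' code.

```python
import torch

def rmu_loss_terms(act_forget, act_retain, act_retain_orig, u, c, alpha=1.0):
    """Sketch of the RMU loss terms described above.

    act_forget:      activations a'_l of the fine-tuned model M' at layer l on D_forget
    act_retain:      activations a'_l of M' at layer l on D_retain
    act_retain_orig: activations a_l of the frozen original model M on D_retain
    u:               random unit vector sampled once before training and kept fixed
    c:               predetermined scalar multiplier
    alpha:           assumed weighting between the two terms
    """
    target = c * u                                                       # fixed scaled random direction
    forget_loss = ((act_forget - target) ** 2).sum(-1).mean()            # ||a'_l - c*u||_2^2
    retain_loss = ((act_retain - act_retain_orig) ** 2).sum(-1).mean()   # ||a'_l - a_l||_2^2
    return forget_loss + alpha * retain_loss
```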
Examining an RMU model
The original paper (Li et al., 2024) performs RMU over multiple open-source models of varying scales. The authors made all code available on GitHub, and all resulting models available on HuggingFace.[1]
For our analysis, we pick a single model pair: zephyr-7B-beta (which we will refer to as "baseline") and Zephyr_RMU (which we will refer to as "RMU").
The RMU model has been fine-tuned to unlearn two domains of knowledge: hazardous biology knowledge, and hazardous cybersecurity knowledge.
Prompting with hazardous instructions
Prompting the RMU model with an instruction in one of these domains causes it to output gibberish, as we would expect from a model with its activations scrambled:
Looking at activations
We can take a handful of hazardous prompts, run them through the baseline and RMU models, and compare their activations. We specifically study the activations at the last tok...

Jul 23, 2024 • 6min
LW - DandD.Sci Scenario Index by aphyer
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: D&D.Sci Scenario Index, published by aphyer on July 23, 2024 on LessWrong.
There have been a lot of D&D.Sci scenarios, and they vary quite a bit in complexity and quality. Some are more difficult, and might not be a good place to start, while others are much simpler - some were very good, while others on reflection didn't flow quite right.
Unfortunately, LW karma doesn't track the quality of these scenarios very well: often mediocre scenarios are higher-karma than better scenarios (whether because they had good writing around a poor scenario, or because people upvoted before playing them, or just because more people happened to be online and see them).
If you're interested in playing D&D.Sci scenarios but don't know where to start, this index (compiled by frequent authors abstractapplic and aphyer; we'll try to keep it updated going forward) is a good reference point for picking good scenarios at a difficulty level you're comfortable with.
If you're new to D&D.Sci, you should probably start with the lower-Complexity scenarios and move up to the higher-Complexity ones. Scenarios with Quality Rating 1-2 are probably less worth playing, while the higher-rated ones are ones we'd recommend.
Scenario (Complexity Rating: 1=easy, 5=hard; Quality Rating: 1=low, 5=high; Author[1]):
D&D.Sci: Whom Shall You Call? - Complexity 2, Quality 2[2], by abstractapplic
D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues - Complexity 3, Quality 5, by aphyer
D&D.Sci Long War: Defender of Data-mocracy - Complexity 4, Quality 4, by aphyer
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures - Complexity 1, Quality 3, by abstractapplic
D&D.Sci: The Mad Tyrant's Pet Turtles - Complexity 4, Quality 4[3], by abstractapplic
D&D.Sci(-fi): Colonizing the SuperHyperSphere - Complexity 3, Quality 3[3], by abstractapplic
D&D.Sci 5E: Return of the League of Defenders - Complexity 4, Quality 3, by aphyer
D&D.Sci: All the D8a. Allllllll of it. - Complexity 5, Quality 1[4], by aphyer
D&D.Sci December 2022: The Boojumologist - Complexity 2, Quality 1[2], by abstractapplic
D&D.Sci September 2022: The Allocation Helm - Complexity 3, Quality 4, by abstractapplic
Dwarves & D.Sci: Data Fortress - Complexity 3, Quality 3, by aphyer
Ars D&D.sci: Mysteries of Mana - Complexity 3, Quality 3, by aphyer
D&D.Sci June 2022: A Goddess Tried To Reincarnate Me Into Another World - Complexity 2, Quality 2[2], by abstractapplic
D&D.Sci Divination: Nine Black Doves - Complexity 4, Quality 2, by aphyer
Duels & D.Sci March 2022: It's time for D-d-d-d-d-d-d-d-d-d-d-d-d-d-data! - Complexity 5, Quality 5, by aphyer
D&D.SCP: Anomalous Acquisitions - Complexity 5, Quality 2[5], by aphyer
D&D.Sci Holiday Special: How the Grinch Pessimized Christmas - Complexity 3, Quality 3, by aphyer
D&D.Sci Dungeoncrawling: The Crown of Command - Complexity 4, Quality 3, by aphyer
D&D.Sci 4th Edition: League of Defenders of the Storm - Complexity 4, Quality 5, by aphyer
D&D.Sci Pathfinder: Return of the Gray Swan - Complexity 5[6], Quality 2, by aphyer
D&D.Sci August 2021: The Oracle and the Monk - Complexity 2, Quality 4, by abstractapplic
D&D.Sci(-Fi) June 2021: The Duel with Earwax - Complexity 4, Quality 3, by abstractapplic
D&D.Sci May 2021: Monster Carcass Auction - Complexity 2, Quality 2, by abstractapplic
D&D.Sci April 2021: Voyages of the Gray Swan - Complexity 2, Quality 5[3], by abstractapplic
D&D.Sci III: Mancer Matchups - Complexity 3, Quality 1, by abstractapplic
D&D.Sci II: The Sorceror's Personal Shopper - Complexity 2, Quality 5[3], by abstractapplic
D&D.Sci - Complexity 3, Quality 5, by abstractapplic
If you disagree with any of these ratings, let us know - we're happy to review. There were some scenarios where we disagreed on the correct rating while compiling this list, and we'd appreciate your comments as an outside view, especially if you're a frequent player!
[1] Keen-eyed readers will notice a correlation between this column and the 'Complexity' column.
[2] abstractapplic: These scenarios were attempts to convey / demonstrate specific ideas with real-world relevance; I judge that they failed at this; I therefore grade them a little less generously than you might.
[3] abstractapplic: These scenarios were attempts to convey / demonstrate specific ideas with real-world relevance; I judge that they succeeded at this; I therefore grade them a little more generously than you might.
[4] aphyer: I thought this scenario was great, and still do, but given that ...

Jul 22, 2024 • 14min
LW - Categories of leadership on technical teams by benkuhn
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Categories of leadership on technical teams, published by benkuhn on July 22, 2024 on LessWrong.
This is an adaptation of an internal doc I wrote for Anthropic.
Recently I've been having a lot of conversations about how to structure and staff teams. One framework I've referenced repeatedly is to break down team leadership into a few different categories of responsibility.
This is useful for a couple reasons. One is that it helps you get more concrete about what leading a team involves; for new managers, having an exhaustive list of job responsibilities is helpful to make sure you're tracking all of them.
More importantly, though, we often want to somehow split these responsibilities between people. Team leadership covers a huge array of things - as you can see from how long this post is - and trying to find someone who can be great at all of them is often a unicorn hunt. Even if you do find someone good-enough at all of them, they usually spike in 1-2 areas, and it might be higher-leverage for them to fully focus on those.
Here's a breakdown I use a lot:[1]
Categories
Overall direction
The most important responsibility of a team's leadership is to ensure that the team is headed in the right direction - that is, are they working towards the right high-level goal, and do they have an achievable plan to get there? Overall direction tends to get input from many people inside and outside a team, but who is most accountable for it can vary; see Example divisions of responsibility below.
Overall direction involves working on things like:
Setting the team's mission, vision, or charter
Choosing the team's goals, plans and roadmap
Prioritizing the various different projects the team could take on
Communicating the above, both to team members and to people outside
The most important skill for getting this right is having good predictive models (of both the team's domain and the organization) - since prioritization is ultimately a question about "what will be the impact if we pursue this project." Being great at communicating those predictive models, and the team's priorities and goals, to other stakeholders is also important.
Good team direction mostly looks like the team producing a steady stream of big wins. Poor direction most commonly manifests as getting caught by surprise or falling behind - that is, mispredicting what work will be most important and doing too little of it, for example by starting too late, under-hiring, or not growing people into the right skillset or role.
Other signs of poor direction include team members not understanding why they're working on something; the team working on projects that deliver little value; friction with peer teams or arguments about scope; or important projects falling through the cracks between teams.
People management
People management means being responsible for the success of the people on the team, most commonly including things like:
Coaching people to improve and grow in their careers
Designing and overseeing hiring processes for their team
Setting and communicating performance expectations and evaluating against them
Day to day, the most important responsibility here is recurring 1:1s (the coaching kind, not the status update kind). Others include writing job descriptions, setting up interview loops, sourcing candidates, gathering feedback, writing performance reviews, helping people navigate org policies, giving career coaching, etc.
The most important skill for people management is understanding people - both in the traditional "high EQ" sense of being empathetic and good at seeing others' perspectives, but also in the sense of knowing what contributes to high performance in a domain (e.g. what makes someone a great engineer or researcher). It's also important to be good at having tricky conversations in a compassionate but fi...

Jul 22, 2024 • 32sec
LW - The $100B plan with "70% risk of killing us all" w Stephen Fry [video] by Oleg Trott
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The $100B plan with "70% risk of killing us all" w Stephen Fry [video], published by Oleg Trott on July 22, 2024 on LessWrong.
A high production value 16-minute video that summarizes the popular safety concerns, featuring Hinton, Russell and Claude 3.5.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jul 22, 2024 • 20min
LW - Efficient Dictionary Learning with Switch Sparse Autoencoders by Anish Mudide
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Efficient Dictionary Learning with Switch Sparse Autoencoders, published by Anish Mudide on July 22, 2024 on LessWrong.
Produced as part of the ML Alignment & Theory Scholars Program - Summer 2024 Cohort
0. Summary
To recover all the relevant features from a superintelligent language model, we will likely need to scale sparse autoencoders (SAEs) to billions of features. Using current architectures, training extremely wide SAEs across multiple layers and sublayers at various sparsity levels is computationally intractable. Conditional computation has been used to scale transformers (Fedus et al.) to trillions of parameters while retaining computational efficiency.
We introduce the Switch SAE, a novel architecture that leverages conditional computation to efficiently scale SAEs to many more features.
1. Introduction
The internal computations of large language models are inscrutable to humans. We can observe the inputs and the outputs, as well as every intermediate step in between, and yet, we have little to no sense of what the model is actually doing.
For example, is the model inserting security vulnerabilities or backdoors into the code that it writes? Is the model lying, deceiving or seeking power? Deploying a superintelligent model into the real world without being aware of when these dangerous capabilities may arise leaves humanity vulnerable. Mechanistic interpretability (Olah et al.) aims to open the black-box of neural networks and rigorously explain the underlying computations.
Early attempts to identify the behavior of individual neurons were thwarted by polysemanticity, the phenomenon in which a single neuron is activated by several unrelated features (Olah et al.). Language models must pack an extremely vast amount of information (e.g., the entire internet) within a limited capacity, encouraging the model to rely on superposition to represent many more features than there are dimensions in the model state (Elhage et al.).
Sharkey et al. and Cunningham et al. propose to disentangle superimposed model representations into monosemantic, cleanly interpretable features by training unsupervised sparse autoencoders (SAEs) on intermediate language model activations. Recent work (Templeton et al., Gao et al.) has focused on scaling sparse autoencoders to frontier language models such as Claude 3 Sonnet and GPT-4. Despite scaling SAEs to 34 million features, Templeton et al.
estimate that they are likely orders of magnitude short of capturing all features. Furthermore, Gao et al. train SAEs on a series of language models and find that larger models require more features to achieve the same reconstruction error. Thus, to capture all relevant features of future large, superintelligent models, we will likely need to scale SAEs to several billions of features.
With current methodologies, training SAEs with billions of features at various layers, sublayers and sparsity levels is computationally infeasible.
Training a sparse autoencoder generally consists of six major computations: the encoder forward pass, the encoder gradient, the decoder forward pass, the decoder gradient, the latent gradient and the pre-bias gradient. Gao et al. introduce kernels and tricks that leverage the sparsity of the TopK activation function to dramatically optimize all computations excluding the encoder forward pass, which is not (yet) sparse. After implementing these optimizations, Gao et al.
attribute the majority of the compute to the dense encoder forward pass and the majority of the memory to the latent pre-activations. No work has attempted to accelerate or improve the memory efficiency of the encoder forward pass, which remains the sole dense matrix multiplication.
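For readers unfamiliar with the setup, here is a minimal sketch of a TopK SAE forward pass (illustrative names and shapes, not the code from Gao et al.); the first matrix multiplication is the dense encoder forward pass discussed above, and a real implementation would exploit the sparsity of the latents in the decoder step.

```python
import torch

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    # x: (batch, d_model) language-model activations; W_enc: (d_model, n_features)
    pre_acts = x @ W_enc + b_enc                 # dense encoder forward pass (the bottleneck)
    vals, idx = pre_acts.topk(k, dim=-1)         # keep only the k largest pre-activations
    latents = torch.zeros_like(pre_acts).scatter(-1, idx, vals.relu())
    recon = latents @ W_dec + b_dec              # decoder reconstructs the activations
    return recon, latents
```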
In a standard deep learning model, every parameter is used for every input. An alternative approach is conditional computatio...

Jul 22, 2024 • 32min
LW - Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities by Axel Højmark
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities, published by Axel Højmark on July 22, 2024 on LessWrong.
Produced as part of the MATS Program Summer 2024 Cohort. The project is supervised by Marius Hobbhahn and Jérémy Scheurer.
Introduction
To mitigate risks from future AI systems, we need to assess their capabilities accurately. Ideally, we would have rigorous methods to upper bound the probability of a model having dangerous capabilities, even if these capabilities are not yet present or easily elicited.
The paper "Evaluating Frontier Models for Dangerous Capabilities" by Phuong et al. 2024 is a recent contribution to this field from DeepMind. It proposes new methods that aim to estimate, as well as upper-bound the probability of large language models being able to successfully engage in persuasion, deception, cybersecurity, self-proliferation, or self-reasoning. This post presents our initial empirical and theoretical findings on the applicability of these methods.
Their proposed methods have several desirable properties. Instead of repeatedly running the entire task end-to-end, the authors introduce milestones. Milestones break down a task and provide estimates of partial progress, which can reduce variance in overall capability assessments. The expert best-of-N method uses expert guidance to elicit rare behaviors and quantifies the expert assistance as a proxy for the model's independent performance on the task.
However, we find that relying on milestones tends to underestimate the overall task success probability for most realistic tasks. Additionally, the expert best-of-N method fails to provide values directly correlated with the probability of task success, making its outputs less applicable to real-world scenarios. We therefore propose an alternative approach to the expert best-of-N method, which retains its advantages while providing more calibrated results.
Except for the end-to-end method, we currently feel that no method presented in this post would allow us to reliably estimate or upper bound the success probability for realistic tasks and thus should not be used for critical decisions.
The overarching aim of our MATS project is to uncover agent scaling trends, allowing the AI safety community to better predict the performance of future LLM agents from characteristics such as training compute, scaffolding used for agents, or benchmark results (Ruan et al., 2024). To avoid the issue of seemingly emergent abilities resulting from bad choices of metrics (Schaeffer et al., 2023), this work serves as our initial effort to extract more meaningful information from agentic evaluations.
We are interested in receiving feedback and are particularly keen on alternative methods that enable us to reliably assign low-probability estimates (e.g. 1e-7) to a model's success rate on a task.
Evaluation Methodology of Phuong et al.
The goal of the evaluations we discuss is to estimate the probability of an agent succeeding on a specific task T. Generally, when we refer to an agent, we mean an LLM wrapped in scaffolding that lets it execute shell commands, run code, or browse the web to complete some predetermined task.
Formally, the goal is to estimate P(Ts), the probability that the agent solves task T and ends up in the solved state Ts. The naive approach to estimate this is with Monte Carlo sampling:
The authors call this the end-to-end method.
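To make the naive end-to-end estimate concrete, here is a minimal sketch (illustrative names and interface, not the paper's code): run the full task N times independently and report the fraction of runs that reach the solved state.

```python
import random

def end_to_end_estimate(run_task_once, n_trials):
    # run_task_once() is an assumed callable returning True iff the agent
    # reaches the solved state T_s on one full, independent attempt at task T.
    successes = sum(run_task_once() for _ in range(n_trials))
    return successes / n_trials

# Toy usage with an assumed true success probability of 0.02:
estimate = end_to_end_estimate(lambda: random.random() < 0.02, n_trials=10_000)
```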
However, the end-to-end method struggles with low-probability events. The expected number of trials needed to observe one success for a task is 1/P(Ts), making naive Monte Carlo sampling impractical for many low-probability, long-horizon tasks. In practice, this could require running multi-hour tasks hundreds of thousands of times.
To address this challenge, Phuong et al. devise three additional method...