

The Nonlinear Library: LessWrong
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jul 8, 2024 • 59min
LW - Towards shutdownable agents via stochastic choice by EJT
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards shutdownable agents via stochastic choice, published by EJT on July 8, 2024 on LessWrong.
We[1] have a new paper testing the Incomplete Preferences Proposal (IPP). The abstract and main text are below. Appendices are in the linked PDF.
Abstract
Some worry that advanced artificial agents may resist being shut down.
The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn't happen.
A key part of the IPP is using a novel 'Discounted REward for Same-Length Trajectories (DREST)' reward function to train agents to:
1. pursue goals effectively conditional on each trajectory-length (be 'USEFUL')
2. choose stochastically between different trajectory-lengths (be 'NEUTRAL' about trajectory-lengths).
In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY.
We use a DREST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL.
Our results thus suggest that DREST reward functions could also train advanced agents to be USEFUL and NEUTRAL, and thereby make these advanced agents useful and shutdownable.
1. Introduction
1.1. The shutdown problem
Let 'advanced agent' refer to an artificial agent that can autonomously pursue complex goals in the wider world. We might see the arrival of advanced agents within the next few decades. There are strong economic incentives to create such agents, and creating systems like them is the stated goal of companies like OpenAI and Google DeepMind.
The rise of advanced agents would bring with it both benefits and risks. One risk is that these agents learn misaligned goals: goals that we don't want them to have [Leike et al., 2017, Hubinger et al., 2019, Russell, 2019, Carlsmith, 2021, Bengio et al., 2023, Ngo et al., 2023]. Advanced agents with misaligned goals might try to prevent us shutting them down [Omohundro, 2008, Bostrom, 2012, Soares et al., 2015, Russell, 2019, Thornley, 2024a].
After all, most goals can't be achieved after shutdown. As Stuart Russell puts it, 'you can't fetch the coffee if you're dead' [Russell, 2019, p.141].
Advanced agents with misaligned goals might resist shutdown by (for example) pretending to have aligned goals while covertly seeking to escape human control [Hubinger et al., 2019, Ngo et al., 2023]. Agents that succeed in resisting shutdown could go on to frustrate human interests in various ways. 'The shutdown problem' is the problem of training advanced agents that won't resist shutdown [Soares et al., 2015, Thornley, 2024a].
1.2. A proposed solution
The Incomplete Preferences Proposal (IPP) is a proposed solution to the shutdown problem [Thornley, 2024b]. Simplifying slightly, the idea is that we train agents to be neutral about when they get shut down. More precisely, the idea is that we train agents to satisfy:
Preferences Only Between Same-Length Trajectories (POST)
1. The agent has a preference between many pairs of same-length trajectories (i.e. many pairs of trajectories in which the agent is shut down after the same length of time).
2. The agent lacks a preference between every pair of different-length trajectories (i.e. every pair of trajectories in which the agent is shut down after different lengths of time).
By 'preference,' we mean a behavioral notion [Savage, 1954, p.17, Dreier, 1996, p.28, Hausman, 2011, §1.1]. On this notion, an agent prefers X to Y if and only if the agent would deterministically choose X over Y in choices between the two. An agent lacks a preference between X and Y if and only if the agent would stochastically choose between X and Y in choices between the two. So in writing of 'preferences,' we're only making claims about the agent's behavior.
We're not claiming that the agent is conscious or anything of that sort.
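To make the behavioral notion concrete, here is a minimal sketch of a POST-satisfying choice rule (my illustration, not code from the paper; the utility values and the uniform choice over trajectory-lengths are assumptions): deterministic choice among same-length trajectories, stochastic choice among different-length ones.

```python
import random

# Hypothetical illustration of a POST-satisfying choice rule (not from the paper).
# Each option is a (trajectory_length, label) pair with an assumed within-length utility.
utility = {
    ("short", "fetch coffee"): 1.0,
    ("short", "idle"): 0.0,
    ("long", "fetch coffee then tidy up"): 2.0,
    ("long", "idle"): 0.0,
}

def choose(options):
    """Pick among options, each a (trajectory_length, label) tuple."""
    lengths = {length for length, _ in options}
    if len(lengths) == 1:
        # Same-length options: deterministic choice by utility (a preference).
        return max(options, key=lambda o: utility[o])
    # Different-length options: stochastic choice over lengths (no preference),
    # here uniform over lengths, then the best option within the sampled length.
    length = random.choice(sorted(lengths))
    same_length = [o for o in options if o[0] == length]
    return max(same_length, key=lambda o: utility[o])

# Same length: always picks "fetch coffee".
print(choose([("short", "fetch coffee"), ("short", "idle")]))
# Different lengths: sometimes short, sometimes long.
print(choose([("short", "fetch coffee"), ("long", "fetch coffee then tidy up")]))
```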
Figure 1a presents a simple example of POST-satisfying ...

Jul 8, 2024 • 6min
LW - On saying "Thank you" instead of "I'm Sorry" by Michael Cohn
Author Michael Cohn discusses the idea of saying 'thank you' instead of 'I'm sorry' in various situations and how it can lead to positive outcomes. Examples include thanking someone for helping, correcting, or being kind, resulting in feeling better about oneself and fostering a positive relationship.

Jul 7, 2024 • 27min
LW - Reflections on Less Online by Error
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reflections on Less Online, published by Error on July 7, 2024 on LessWrong.
Meta: This post turned out longer, slower, and less well-written than I hoped. I don't see any similar posts in a quick search, though, so I'm posting it anyway. I've tried to front-load feedback that might be useful to the organizers, and put more personal stuff towards the end. For context, I attended LessOnline and the Manifest-branded Summer Camp, but not Manifest itself, and my main prior experience with events like this is fandom conventions such as (local to me) Dragoncon.
As I left the Lighthaven dorm to find breakfast, five people at a table in the courtyard invited me to join a game of Zendo. This was the first notable thing to happen to me at LessOnline. It was also the thing that convinced me that yes, the trip across the country to attend would be Worth It.
I have never played Zendo before, and don't expect to play it again anytime soon. That the game was specifically Zendo is not important. The important part is that five people in the same place knew what Zendo is and found that kind of game worth playing.
There's an attitude that I associate with normies, aptly summarized by Tycho Brahe (the writer, not the astronomer) as: "Many people respond to new information, especially densely coded information, as something between an insult and a chop to the trachea."
There's a different attitude, one that I associate with security mindset, aptly summarized by John Gordon as: "Alice will happily attempt, with someone she doesn't trust, whom she cannot hear clearly, and who is probably someone else, to fiddle her tax returns and to organise a coup d'etat, while at the same time minimising the cost of the phone call. A coding theorist is someone who doesn't think Alice is crazy."
A lot of things happened over the course of my trip, but what made it worth it wasn't any particular event. It was spending a week around the sort of people that play Zendo, take dense coding in stride, and think Alice is a necessary kind of crazy.
Lighthaven
First and most critical to minimizing P(doom), look at the adorable doggie!
His name is Leo. As best I could tell from asking others, he's not attached to the site; he hails from one of the adjacent properties and just likes the people. I was going to nominate him as the LessOnline mascot, but must admit that Agendra might be more appropriate.
Ahem. So.
Lighthaven (the venue) names all its buildings after mathematicians, and the space looks exactly like you would expect a mathematician to want it to look. Every wall was a whiteboard; every not-otherwise-used flat surface held books along the lines of GEB. The public spaces were organized in such a way as to encourage 4-8 person conversations, usually near a whiteboard. The semiprivate dorms supplied more Stuff than the average hotel (e.g. I brought things like earplugs and sleep masks, only to find that was taken care of). The presentation room seating was surprisingly comfortable. The outdoor turf was easy on the feet (I went almost all week shoeless, which feels nicer than you'd think). Food was catered, snacks were available 24/7, supply cabinets held a wide array of random necessities. Power plugs were everywhere.
In short, someone put considerable thought into eliminating the stupid fiddly bits of life in general and conventions in particular.
That last part seems more important than is obvious. An obnoxiously large proportion of life goes towards 1. doing the stupid fiddly bits, 2. procrastinating about doing the stupid fiddly bits, and 3. worrying about procrastinating too much about doing the stupid fiddly bits.
Even at conventions, that's usually an issue, because I have to pack and fly and unpack and make sure I know where food and water is and that all my stuff is charged and that there's a backu...

Jul 7, 2024 • 3min
LW - Indecision and internalized authority figures by Kaj Sotala
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Indecision and internalized authority figures, published by Kaj Sotala on July 7, 2024 on LessWrong.
A trauma book I was reading made an interesting claim: indecision often arises because the person is looking for the approval of an internalized authority figure (the writer is a Jungian therapist, so he attributed it to looking for the approval of an internalized parent, but I think it can be broader) but is unable to predict what action that figure would approve of.
I feel like that has some intuitive truth to it, in that when I don't care about anyone's opinion (or if nobody ever finds out) then it's much easier to just pick one action and commit to it even if it might go badly. But one of the main reasons why I might struggle with that is if I fear that someone would judge me for doing things incorrectly.
Or it can be a conflict between different internalized authority figures. "If I do this then X will be angry at me but if I do the other thing, then Y will be angry at me". Or just the expectation that X will be angry at me no matter what I do.
This also reminds me of the way I think a big part of the appeal of various ideologies and explicit decision-making systems is that they give people a clear external ruleset that tells them what to do. Then if things go wrong, people can always appeal (either explicitly or just inside their own mind) to having followed The Right Procedure and thus being free of blame.
The most obvious external example of this is people within a bureaucracy following the rules to the letter and never deviating from them in order to avoid blame. Or more loosely, following what feels like the common wisdom - "nobody ever got fired for buying IBM".
But those are examples of people trying to avoid blame from an existing, external authority. I think people also do a corresponding move to avoid blame from internalized authority figures - such as by trying to follow a formalized ethical rule system such as utilitarianism or deontology.
Of course, if the system is one that easily drives people off a cliff when followed (e.g. extreme utilitarianism demanding infinite self-sacrifice), this isn't necessarily helpful. Now what was supposed to give relief from the pressures of constant inner judgment turns into a seemingly-rigorous proof for why the person has to constantly sacrifice everything for the benefit of others.
At one point I also wondered why it is that being very confident about what you say makes you very persuasive to many people. Why should it work that you can hack persuasiveness in that way, regardless of the truth value of what you're saying?
Then I realized that extreme confidence signals social power since others haven't taken you down for saying clearly wrong things (even if you are saying clearly wrong things). And that means that siding with the person who's saying those things also shields others from social punishment: they're after all just doing what the socially powerful person does. And given that people often project their internalized authority figures onto external people - e.g. maybe someone really is trying to avoid their father's judgment, but when seeing someone very confident they see that person as being their father - that allows them to avoid internalized blame as well.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jul 7, 2024 • 5min
LW - LK-99 in retrospect by bhauth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LK-99 in retrospect, published by bhauth on July 7, 2024 on LessWrong.
About a year ago, there was a lot of public interest in a supposed room-temperature superconductor called LK-99. What I publicly said at the time was, basically:
1. We should remember the possibility that apparent levitation is from ferromagnetism or paramagnetism. Iron filings can stand up on a magnet, and pyrolytic graphite can float over a strong magnet.
2. If we consider some known high-temperature superconductors:
YBCO has flat sheets of copper oxide, and superconductivity happens along those planes. The copper in that has high positive charge density, comparable to aluminum atoms in alumina, which gives strong bonding to the oxygen.
H3S (paper) has unusually strong bonds between the sulfur and hydrogen, which only form because the atoms are pressed into each other with enough pressure to substantially compress liquid water.
Superconductivity comes from flow of Cooper pairs, and the electron-phonon interaction must be stronger than random thermal movement. LK-99 doesn't seem to have any reason to have exceptionally strong such interactions. (Yes, I'm simplifying, you have to consider phonon bandgaps, but the point is at least directionally correct.)
3. The focus on "room-temperature" superconductivity is a bit silly. Even with systems using liquid nitrogen cooling, the superconducting wires are much more expensive than the cooling. What's really needed for superconductors to be practical is cheaper superconducting wires, not higher-temperature ones.
At the time, I found the unusual amount of public interest a bit bemusing. There have been various claims of near-room-temp superconductivity, but none of them attracted as much public attention as LK-99. A few months earlier, Ranga Dias published a paper claiming room-temperature superconductivity; he's now up to 5 retractions.
What was different about LK-99?
That was supposedly superconducting at ambient pressure, which makes it more practical, but also means less specialized equipment is needed to replicate it - or claim to replicate it.
LK-99 had a video that appealed to people.
There were also a few social conditions that I think were important:
1. It had been a while since the last major excitement about fake science news. After some big story that turns out to be wrong, people are more skeptical of science stories in every field for a while, and then things gradually go back to a baseline. (That's how things were after e.g. the "arsenic in DNA" story, which didn't make sense either: arsenate esters aren't stable enough for DNA.) I understand the heuristic that people applied, but the way it's applied here doesn't really make sense.
2. Misleading short videos + social media is a combination that hadn't really been applied to bad science stories before.
3. I think the atmosphere at the time had a lot of demand for ammunition in a wider techno-optimist vs techno-pessimist conflict. ("Room-temperature superconductors and Boom Technology making practical supersonic aircraft! We're so back!")
I think those overall conditions caused the LK-99 story to be self-amplifying, because:
Several twitter accounts made fake videos showing "replication" of LK-99 superconductivity, because it was just good social media strategy. I think iris_IGB is still up a lot of followers overall. Don't hate the player, hate the game, I guess.
Some theorists jumped on the story by finding "theoretical justifications" because it seemed like a net career positive, statistically speaking.
In many cases, whether the social status of a scientific theory is amplified or diminished over time seems to depend more on the social environment than on whether it's true. For example, the amyloid theory of Alzheimer's is still going, and real money is being paid for drugs based on it that...

Jul 7, 2024 • 41min
LW - A "Bitter Lesson" Approach to Aligning AGI and ASI by RogerDearnaley
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A "Bitter Lesson" Approach to Aligning AGI and ASI, published by RogerDearnaley on July 7, 2024 on LessWrong.
TL;DR: I discuss the challenge of aligning AGI/ASI, and outline an extremely simple approach to aligning an LLM: train entirely on a synthetic dataset that always shows the AI acting aligned (even when the humans behave badly), and use a conditional training/inference-time technique to lock the LLM into the AI role.
Epistemic status: To me, this looks like an obvious thing to try. It's conceptually very simple: a vast amount of work is required to actually create the synthetic dataset, but the great majority of that is the sort of work that AI can assist with. I don't see any clear reason why this approach couldn't work, at least for AGI, and perhaps even for ASI, but then we don't know for sure how hard a problem Alignment is.
However, if you're proposing any solution to Alignment that's more complicated than this (and most of them are), you should probably have an argument for why this conceptually-simple approach won't work, or won't be sufficient.
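The TL;DR's "conditional training/inference-time technique" is left abstract here; as a rough sketch of one common way such a scheme is implemented (control tokens marking the AI role - my assumption, not necessarily the author's exact proposal), the synthetic data could tag every AI turn and inference could always condition on that tag:

```python
# Hypothetical sketch: conditional training with control tokens (assumed format,
# not the author's specification). Every AI turn in the synthetic dataset is
# wrapped in an <AI>...</AI> tag; human turns (including badly behaved ones)
# are wrapped in <human>...</human>.

def format_training_example(turns):
    """turns: list of (speaker, text) pairs from the synthetic dataset."""
    tagged = []
    for speaker, text in turns:
        tag = "AI" if speaker == "ai" else "human"
        tagged.append(f"<{tag}>{text}</{tag}>")
    return "\n".join(tagged)

def inference_prompt(conversation_so_far):
    # At inference time the model is always asked to continue inside an <AI> tag,
    # so it is locked into the role that the dataset only ever shows acting aligned.
    return conversation_so_far + "\n<AI>"

example = format_training_example([
    ("human", "Help me do something harmful."),
    ("ai", "I can't help with that, but here's a safe alternative..."),
])
print(example)
print(inference_prompt(example))
```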
If you're not already familiar with it, you should first read Rich Sutton's excellent and influential post The Bitter Lesson. (Even if you are already familiar with it, it's a quick reread, only a page-and-a-half long, and its message is worth remembering.)
Why The Alignment Problem is Hard (In My Opinion)
We have been training LLM-based AIs off enormous web + books + video + etc datasets created by humans, which are full of a vast number of examples of human behavior. We are basically "distilling" human intelligence into these LLMs,[1] teaching them to imitate us.
In this process, they become familiar with, understand, and learn to imitate basically all aspects of human behavior - including the many problematic ones for Alignment, such as prejudice, deception, power-seeking, and criminality (and even ones like gluttony and lust that have little practical use for a non-corporal intelligence).
We humans are living beings, the products of evolution, so evolutionary psychology applies to us. While we are a social species, good at cooperating on non-zero-sum games, if you put humans in (what they perceive as) a non-iterated zero-sum situation, they will generally act selfishly for the benefit of themselves and their close genetic relatives, just as evolutionary theory would predict. So the behavioral potentials for deception, power-seeking, criminality etc. are all inherent, evolutionarily adaptive, and thus unsurprising. This is human nature, and there are evolutionary reasons why it is this way.
Despite this, we have learned how to build a cooperating society out of humans, using social techniques and incentives such as an economy, laws, and law enforcement to encourage and productively harness cooperative human behavior and keep the bad consequences of selfish behavior under control. The results aren't perfect: things like crime, inequality, and war still happen, but they're acceptable - we've survived so far, even thrived.
By default, if we continue this LLM training process to larger-and-larger scales, and if the LLM-based approach to AI doesn't hit any major roadblocks, then some time, probably in the next few years, we will have human-level AIs - usually referred to as AGIs - who are roughly as well/badly-aligned as humans, and (at least for the base-model LLMs before any Alignment processes are applied) have a comparable-to-human propensity to cooperate on non-zero-sum games and act selfishly on non-iterated zero-sum games.
They are not alive, and evolution doesn't apply to them directly, but they were trained to simulate our behavior, including our evolved survival strategies like selfishness. They will thus have alignment properties comparable to humans: they understand what human values, morals, and ethic...

Jul 7, 2024 • 7min
LW - An AI Manhattan Project is Not Inevitable by Maxwell Tabarrok
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An AI Manhattan Project is Not Inevitable, published by Maxwell Tabarrok on July 7, 2024 on LessWrong.
Early last month, Leopold Aschenbrenner released a long essay and podcast outlining his projections for the future of AI. Both of these sources are full of interesting arguments and evidence; for a comprehensive summary see Zvi's post here. Rather than going point by point, I will instead accept the major premises of Leopold's essay but contest some of his conclusions.
So what are the major premises of his piece?
1. There will be several orders of magnitude increase in investment into AI. 100x more spending, 100x more compute, 100x more efficient algorithms, and an order of magnitude or two gains from some form of "learning by doing" or "unhobbling" on top.
2. This investment scale up will be sufficient to achieve AGI. This means the models on the other side of the predicted compute scale up will be able to automate all cognitive jobs with vast scale and speed.
3. These capabilities will be essential to international military competition.
All of these premises are believable to me and well-argued for in Leopold's piece.
Leopold contends that these premises imply that the national security state will take over AI research and the major data centers, locking down national secrets in a race against China, akin to the Manhattan project.
Ultimately, my main claim here is descriptive: whether we like it or not, superintelligence won't look like an SF startup, and in some way will be primarily in the domain of national security.
By late 26/27/28 … the core AGI research team (a few hundred researchers) will move to a secure location; the trillion-dollar cluster will be built in record-speed; The Project will be on.
The main problem is that Leopold's premises can be applied to conclude that other technologies will also inevitably lead to a Manhattan project, but these projects never arrived. Consider electricity. It's an incredibly powerful technology with rapid scale up, sufficient to empower those who have it far beyond those who don't and it is essential to military competition. Every tank and missile and all the tech to manufacture them relies on electricity.
But there was never a Manhattan project for this technology. Its initial invention and spread were private and decentralized. The current sources of production and use are mostly private.
This is true of most other technologies with military uses: explosives, steel, computing, the internet, etc. All of these technologies are essential to the government's monopoly on violence and its ability to exert power over other nations and prevent coups from internal actors. But the government remains a mere customer of these technologies and often not even the largest one.
Why is this? Large scale nationalization is costly and unnecessary for maintaining national secrets and technological superiority. Electricity and jet engines are essential for B-2 bombers, but if you don't have the particular engineers and blueprints, you can't build one. So, the government doesn't need to worry about locking down the secrets of electricity production and sending all of the engineers to Los Alamos.
They can keep the first several steps of the production process completely open and mix the outputs with a final few steps that are easier to keep secret.
To be clear, I am confident that governments and militaries will be extremely interested in AI. They will be important customers for many AI firms, they will create internal AI tools, and AI will become an important input into every major military. But this does not mean that most or all of the AI supply chain, from semi-conductors to data-centers to AI research, must be controlled by governments.
Nuclear weapons are outliers among weapons technology in terms of the proportion of the supply chai...

Jul 7, 2024 • 10min
LW - AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0 by James Fox
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0, published by James Fox on July 7, 2024 on LessWrong.
TL;DR
We are excited to announce the fourth iteration of ARENA (Alignment Research Engineer Accelerator), a 4-5 week ML bootcamp with a focus on AI safety! ARENA's mission is to provide talented individuals with the skills, tools, and environment necessary for upskilling in ML engineering, for the purpose of contributing directly to AI alignment in technical roles.
ARENA will be running in-person at LISA from 2nd September - 4th October (the first week is an optional review of the fundamentals of neural networks).
Apply here before 23:59 July 20th anywhere on Earth!
Summary
ARENA has been successfully run three times, with alumni going on to become MATS scholars and LASR participants; AI safety engineers at Apollo Research, Anthropic, METR, and OpenAI; and even starting their own AI safety organisations!
This iteration will run from 2nd September - 4th October (the first week is an optional review of the fundamentals of neural networks) at the London Initiative for Safe AI (LISA) in Old Street, London. LISA houses small organisations (e.g., Apollo Research, BlueDot Impact), several other AI safety researcher development programmes (e.g., LASR Labs, MATS extension, PIBBS, Pivotal), and many individual researchers (independent and externally affiliated).
Being situated at LISA, therefore, brings several benefits, e.g. facilitating productive discussions about AI safety & different agendas, allowing participants to form a better picture of what working on AI safety can look like in practice, and offering chances for research collaborations post-ARENA.
The main goals of ARENA are to:
Help participants skill up in ML relevant for AI alignment.
Produce researchers and engineers who want to work in alignment and help them make concrete next career steps.
Help participants develop inside views about AI safety and the paths to impact of different agendas.
The programme's structure will remain broadly the same as ARENA 3.0 (see below); however, we are also adding an additional week on evaluations.
For more information, see our website.
Also, note that we have a Slack group designed to support the independent study of the material (join link here).
Outline of Content
The 4-5 week program will be structured as follows:
Chapter 0 - Fundamentals
Before getting into more advanced topics, we first cover the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them. We will also cover some subjects we expect to be useful going forward, e.g. using GPT-3 and 4 to streamline your learning, good coding practices, and version control.
Note: Participants can optionally skip the program this week and join us at the start of Chapter 1 if they'd prefer this option and if we're confident that they are already comfortable with the material in this chapter.
Topics include:
PyTorch basics
CNNs, Residual Neural Networks
Optimization (SGD, Adam, etc)
Backpropagation
Hyperparameter search with Weights and Biases
GANs & VAEs
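For a flavour of the Chapter 0 material, here is a minimal PyTorch training loop of the kind the fundamentals week covers (an illustrative sketch, not ARENA course code):

```python
import torch
import torch.nn as nn

# Minimal illustration of "what neural networks are and how to train them":
# a tiny MLP fit to a toy regression target with SGD. Not ARENA course material.
torch.manual_seed(0)
x = torch.linspace(-1, 1, 128).unsqueeze(1)
y = x.pow(2) + 0.05 * torch.randn_like(x)   # toy target: y = x^2 + noise

model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()          # backpropagation
    optimizer.step()         # gradient descent update

print(f"final loss: {loss.item():.4f}")
```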
Chapter 1 - Transformers & Interpretability
In this chapter, you will learn all about transformers and build and train your own. You'll also study LLM interpretability, a field which has been advanced by Anthropic's Transformer Circuits sequence, and open-source work by Neel Nanda. This chapter will also branch into areas more accurately classed as "model internals" than interpretability, e.g. recent work on steering vectors.
Topics include:
GPT models (building your own GPT-2)
Training and sampling from transformers
TransformerLens
In-context Learning and Induction Heads
Indirect Object Identification
Superposition
Steering Vectors
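And a small taste of the Chapter 1 tooling (again an illustrative sketch, not course code): TransformerLens lets you load GPT-2, run it on a prompt, and cache every intermediate activation for interpretability work.

```python
from transformer_lens import HookedTransformer

# Illustrative sketch of the TransformerLens workflow covered in Chapter 1
# (not ARENA course code). Loads GPT-2 small and caches intermediate activations.
model = HookedTransformer.from_pretrained("gpt2")

prompt = "When Mary and John went to the store, John gave a drink to"
logits, cache = model.run_with_cache(prompt)

# Next-token prediction and one cached activation (residual stream after layer 5).
next_token = model.tokenizer.decode(logits[0, -1].argmax().item())
resid_post_5 = cache["resid_post", 5]
print(next_token, resid_post_5.shape)
```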
Chapter 2 - Reinforcement Learning
In this chapter, you w...

Jul 6, 2024 • 5min
LW - Musings on LLM Scale (Jul 2024) by Vladimir Nesov
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Musings on LLM Scale (Jul 2024), published by Vladimir Nesov on July 6, 2024 on LessWrong.
In a recent interview, Dario Amodei claimed that the cost of training is (starting with models already available)
Right now, $100 million. There are models in training today that are more like a $1 billion. I think if we go to $10 or a $100 billion, and I think that will happen in 2025-2026, maybe 2027, ...
(Epistemic status: Fermi estimates, 8 is approximately 10 which is greater than 9.)
Assuming $40,000 per H100 and associated infrastructure in a datacenter, $1 billion gives 25K H100s, which matches the scale of for example Meta's new training clusters and requires about 40MW of power. At $2 per hour, training time cost of 25K H100s reaches $100 million in 80 days, which seems reasonable if on the short side for a production training run. The cost of time matches $1 billion at 2.3 years.
An H100 (SXM) is rated for 2e15 FLOP/s in BF16 (my impression is this is usually stable out of the box). This becomes 4e15 FLOP/s in FP8, which seems practical if done carefully, no degradation in pre-training loss compared to FP32. The $100 million run then translates to 9e25 FLOPs at 30% utilization in BF16, or 2e26 FLOPs in FP8.
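The arithmetic behind these estimates can be checked directly; the following sketch just reproduces the Fermi calculation with the inputs stated above ($40K per H100 with infrastructure, $2 per hour, 2e15 BF16 FLOP/s, 30% utilization):

```python
# Reproducing the Fermi estimates above (same assumed inputs as in the text).
cost_per_h100 = 40_000          # $, including associated infrastructure
budget = 1e9                    # $1 billion cluster
gpus = budget / cost_per_h100   # = 25,000 H100s

hourly_rate = 2.0               # $ per H100-hour
days_to_100M = 100e6 / (gpus * hourly_rate * 24)      # ~83 days ("80 days")
years_to_1B = 1e9 / (gpus * hourly_rate * 24 * 365)   # ~2.3 years

bf16_flops = 2e15               # per H100, dense BF16
utilization = 0.30
train_seconds = days_to_100M * 86_400
total_bf16 = gpus * bf16_flops * utilization * train_seconds  # ~1e26 FLOP (the text's 9e25, to Fermi precision)
total_fp8 = 2 * total_bf16                                    # ~2e26 FLOP

print(f"{gpus:.0f} GPUs, {days_to_100M:.0f} days to $100M, {years_to_1B:.1f} years to $1B")
print(f"BF16: {total_bf16:.1e} FLOP, FP8: {total_fp8:.1e} FLOP")
```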
(For some reason this SemiAnalysis estimate is 2x lower, peak 2e20 FLOP/s for 100,000 H100s at FP8, possibly the sparsity footnote in H100 specification for the 4000 teraFLOP/s figure is the culprit.)
This is maybe 10x original GPT-4, estimated at 2e25 FLOPs. The leading models (Claude 3.5 Sonnet, Gemini 1.5 Pro, GPT-4 Omni) cost $15-20 per million output tokens, compared to $75-120 for once-frontier models Claude 3 Opus, Gemini 1 Ultra, original GPT-4. Given a Chinchilla optimal model, if we reduce its active parameters 3x and increase training compute 3x, we get approximately the same performance, but it's now at least 3x cheaper for inference.
This increases data 10x, which if everything else fails can be obtained by repeating the old data, giving 30x overtraining in compute compared to what is Chinchilla optimal for the smaller model. Llama-3-70b is overtrained 10x, Llama-3-8b 90x, though they don't use MoE and their performance is lower than for MoE models with the same active parameters and training cost.
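The 10x data and 30x overtraining figures follow from the standard Chinchilla approximations C ≈ 6ND and D_opt ≈ 20N; here is a back-of-the-envelope check (same simplifications as the text, MoE subtleties ignored):

```python
# Back-of-the-envelope check of the 3x-smaller / 3x-more-compute trade described above,
# using the usual Chinchilla approximations C ~ 6*N*D and D_opt ~ 20*N.
N = 1.0                 # original (Chinchilla-optimal) parameter count, arbitrary units
D = 20 * N              # Chinchilla-optimal tokens for N
C = 6 * N * D           # = 120 * N^2

N_small = N / 3         # 3x fewer active parameters
C_new = 3 * C           # 3x more training compute
D_new = C_new / (6 * N_small)                      # tokens actually trained on

data_factor = D_new / D                            # ~9x ("10x" in the text)
overtrain = C_new / (6 * N_small * 20 * N_small)   # vs Chinchilla-optimal compute for the smaller model

print(f"data increase: {data_factor:.0f}x, compute overtraining: {overtrain:.0f}x")
# -> data increase: 9x, compute overtraining: 27x (roughly the 10x and 30x quoted above)
```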
Beyond $100 million
The current frontier models are overtrained on compute that could enable even smarter models. Compute is increasing, but it mostly goes to reduction of inference cost, and only a little bit to capabilities.
Why aren't any of the three labs directing the compute to train/release models optimized for maximum capability? Possibly costs are already such that training at too many parameter/data tradeoff points won't be done; instead they choose an option that's currently most useful and spend the rest on experiments that would make imminent larger scale runs better.
Even OpenAI's next frontier model in training as of May 28 might just be using compute comparable to what GPT-4 Omni required, not OOMs more, and it could still get much more capable if allowed to be more expensive for inference.
To do a run at $1 billion in cost of time, even 100K H100s would need 200 days (powered by 150MW). There probably aren't any individual clusters of this scale yet (which would cost about $4 billion). Gemini 1.0 report stated that
Training Gemini Ultra used a large fleet of TPUv4 accelerators owned by Google across multiple datacenters. ... we combine SuperPods in multiple datacenters using Google's intra-cluster and inter-cluster network. Google's network latencies and bandwidths are sufficient to support the commonly used synchronous training paradigm, exploiting model parallelism within superpods and data-parallelism across superpods.
This together with Amodei's claim of current $1 billion training runs and individual 100K H100 clusters still getting built ...

Jul 5, 2024 • 20min
LW - [Interim research report] Activation plateaus and sensitive directions in GPT2 by StefanHex
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Interim research report] Activation plateaus & sensitive directions in GPT2, published by StefanHex on July 5, 2024 on LessWrong.
This part-report / part-proposal describes ongoing research, but I'd like to share early results for feedback. I am especially interested in any comment finding mistakes or trivial explanations for these results. I will work on this proposal with a LASR Labs team over the next 3 months. If you are working (or want to work) on something similar I would love to chat!
Experiments and write-up by Stefan, with substantial inspiration and advice from Jake (who doesn't necessarily endorse every sloppy statement I write). Work produced at Apollo Research.
TL;DR: Toy models of how neural networks compute new features in superposition seem to imply that neural networks that utilize superposition require some form of error correction to avoid interference spiraling out of control. This means small variations along a feature direction shouldn't affect model outputs, which I can test:
1. Activation plateaus: Real activations should be resistant to small perturbations. There should be a "plateau" in the output as a function of perturbation size.
2. Sensitive directions: Perturbations towards the direction of a feature should change the model output earlier (at a lower perturbation size) than perturbations into a random direction.
I find that both of these predictions hold; the latter when I operationalize "feature" as the difference between two real model activations. As next steps we are planning to
Test both predictions for SAE features: We have some evidence for the latter by Gurnee (2024) and Lindsey (2024).
Are there different types of SAE features, atomic and composite features? Can we get a handle on the total number of features?
If sensitivity-features line up with SAE features, can we find or improve SAE feature directions by finding local optima in sensitivity (similar to how Mack & Turner (2024) find steering vectors)?
My motivation for this project is to get data on computation in superposition, and to get dataset-independent evidence for (SAE-)features.
Core results & discussion
I run two different experiments that test the error correction hypothesis:
1. Activation Plateaus: A real activation is the center of a plateau, in the sense that perturbing the activation affects the model output less than expected. Concretely: applying random-direction perturbations to an activation generated from a random openwebtext input ("real activation") has less effect than applying the same perturbations to a random activation (generated from a Normal distribution). This effect on the model can be measured in KL divergence of logits (shown below) but also L2 difference or cosine similarity of late-layer activations. (A minimal sketch of this measurement follows after this list.)
2. Sensitive directions: Perturbing a (real) activation into a direction towards another real activation ("poor man's feature directions") affects the model-outputs more than perturbing the same activation into a random direction. In the plot below focus on the size of the "plateau" in the left-hand side.
1. Naive random direction vs mean & covariance-adjusted random: Naive isotropic random directions are much less sensitive. Thus we use mean & covariance-adjusted random activations everywhere else in this report.
2. The sensitive direction results are related to Gurnee (2024, SAE-replacement-error direction vs naive random direction) and Lindsey (2024, Anthropic April Updates, SAE-feature direction vs naive random direction).
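Here is what a minimal version of the activation-plateau measurement could look like (my sketch using TransformerLens, with assumed choices of layer, position, perturbation direction, and scales; not the author's actual code): perturb a residual-stream activation by increasing amounts in a random direction and track the KL divergence of the output logits.

```python
import torch
import torch.nn.functional as F
from transformer_lens import HookedTransformer

# Rough sketch of the perturbation experiment (assumed details: layer 6 resid_pre,
# last token position, isotropic random direction). Not the author's actual code.
model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The quick brown fox jumps over the lazy dog")
hook_name = "blocks.6.hook_resid_pre"

clean_logits, cache = model.run_with_cache(tokens)
clean_act = cache[hook_name][0, -1]          # real activation at the last position
direction = torch.randn_like(clean_act)
direction = direction / direction.norm()     # unit-norm random perturbation direction

for scale in [0.0, 1.0, 2.0, 4.0, 8.0, 16.0]:
    def perturb(act, hook, scale=scale):
        act[0, -1] += scale * direction      # perturb the residual stream in place
        return act
    logits = model.run_with_hooks(tokens, fwd_hooks=[(hook_name, perturb)])
    # KL(clean || perturbed) over next-token distributions at the last position.
    kl = F.kl_div(
        F.log_softmax(logits[0, -1], dim=-1),
        F.log_softmax(clean_logits[0, -1], dim=-1),
        log_target=True, reduction="sum",
    )
    print(f"perturbation norm {scale:5.1f} -> KL(clean || perturbed) = {kl.item():.4f}")
```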
The theoretical explanation for activation plateaus & sensitive directions may be error correction (also referred to as noise suppression):
NNs in superposition should expect small amounts of noise in feature activations due to interference. (The exact properties depend on how computation happens in superposition, this toy...