The Nonlinear Library

The Nonlinear Fund
Jun 25, 2024 • 5min

LW - Higher-effort summer solstice: What if we used AI (i.e., Angel Island)? by Rachel Shu

Rachel Shu discusses the idea of using AI to enhance the Summer Solstice experience on Angel Island, focusing on creating a temporary alternate world, deeper conversations, immersive experiences, and thematic resonance. The concept involves 'Smolstice' and 'Solestus' events with activities like orienteering and rituals to foster community bonding and a memorable celebration.
Jun 25, 2024 • 9min

LW - The Minority Faction by Richard Ngo

In this episode, Richard Ngo, author of 'The Minority Faction', delves into his experience with red-teaming advanced LLMs, interacting with AI prototypes Kurzweil, Clark, and Nostradamus. They debate stock picks, superintelligence, and self-fulfilling prophecies, highlighting the importance of collaboration and preparedness for potential threats.
Jun 24, 2024 • 20min

LW - So you want to work on technical AI safety by gw

The podcast covers advice on working in technical AI safety, emphasizing the importance of research skills, burnout management, programming abilities, and career advancement. It provides practical tips on navigating the field, choosing research areas, and engaging in truth-seeking discussions for growth.
Jun 24, 2024 • 8min

LW - Sci-Fi books micro-reviews by Yair Halberstadt

Yair Halberstadt provides micro-reviews of sci-fi books, with ratings and insights on 'A Deepness in the Sky', 'A Fire Upon the Deep', 'Across Realtime', and 'Children of Time', covering the themes, technologies, and alien civilizations of each book along with recommendations.
Jun 24, 2024 • 13min

AF - Compact Proofs of Model Performance via Mechanistic Interpretability by Lawrence Chan

Lawrence Chan discusses using mechanistic interpretability to create compact proofs of model performance. Topics include exploring proof strategies for small transformers, the importance of mechanistic understanding for tighter bounds, challenges in scaling proofs, and handling the parts of model behavior treated as structureless noise.
Jun 24, 2024 • 8min

EA - Farmed animals are neglected by Vasco Grilo

Vasco Grilo discusses the neglect of farmed animals, comparing the annual disability burden of and philanthropic spending on farmed animals versus humans. He highlights the cost-effectiveness of interventions for chicken welfare and the neglectedness of farmed animals, shedding light on areas for further research.
Jun 24, 2024 • 18min

LW - SAE feature geometry is outside the superposition hypothesis by jake mendel

Jake Mendel, author on LessWrong, discusses how SAE feature geometry goes beyond the superposition hypothesis, highlighting the importance of feature vectors' specific locations and rich structures. Understanding this geometry could lead to new theories or supplement existing ones. The podcast explores the limitations of superposition-based interpretations and proposes alternative theories for neural network activation spaces.
Jun 24, 2024 • 14min

LW - LLM Generality is a Timeline Crux by eggsyntax

The podcast explores the limitations of Large Language Models in fully general reasoning, discusses ML research on their failures, potential solutions, and the impact on timelines for achieving AGI.
Jun 24, 2024 • 18min

AF - SAE feature geometry is outside the superposition hypothesis by Jake Mendel

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SAE feature geometry is outside the superposition hypothesis, published by Jake Mendel on June 24, 2024 on The AI Alignment Forum.

Written at Apollo Research

Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain crucial structural information beyond superposition, as seen in circular arrangements of day-of-the-week features and in the rich structures of feature UMAPs. We don't currently have good concepts for talking about this structure in feature geometry, but it is likely very important for model computation. An eventual understanding of feature geometry might look like a hodgepodge of case-specific explanations, or supplementing superposition with additional concepts, or plausibly an entirely new theory that supersedes superposition. To develop this understanding, it may be valuable to study toy models in depth and do theoretical or conceptual work in addition to studying frontier models.

Epistemic status: Decently confident that the ideas here are directionally correct. I've been thinking these thoughts for a while, and recently got round to writing them up at a high level. Lots of people (including both SAE stans and SAE skeptics) have thought very similar things before, and some of them have written about it in various places too. Some of my views, especially the merit of certain research approaches to tackle the problems I highlight, have been presented here without my best attempt to argue for them.

What would it mean if we could fully understand an activation space through the lens of superposition?

If you fully understand something, you can explain everything about it that matters to someone else in terms of concepts you (and hopefully they) understand. So we can think about how well I understand an activation space by how well I can communicate to you what the activation space is doing, and we can test if my explanation is good by seeing if you can construct a functionally equivalent activation space (which need not be completely identical of course) solely from the information I have given you. In the case of SAEs, here's what I might say:

1. The activation space contains this list of 100 million features, which I can describe concisely in words because they are monosemantic.
2. The features are embedded as vectors, and the activation vector on any input is a linear combination of the feature vectors that are related to the input.
3. As for where in the activation space each feature vector is placed, oh that doesn't really matter and any nearly orthogonal overcomplete basis will do. Or maybe if I'm being more sophisticated, I can specify the correlations between features and that's enough to pin down all the structure that matters - all the other details of the overcomplete basis are random.

Every part of this explanation is in terms of things I understand precisely. My features are described in natural language, and I know what a random overcomplete basis is (although I'm on the fence about whether a large correlation matrix counts as something that I understand).

The placement of each feature vector in the activation space matters

Why might this description be insufficient? First, there is the pesky problem of SAE reconstruction errors, which are parts of activation vectors that are missed when we give this description. Second, not all features seem monosemantic, and it is hard to find semantic descriptions of even the most monosemantic features that have both high sensitivity and specificity, let alone descriptions which allow us to predict the quantitative values that activating features take on a particular input. But let's suppose that these issues have been solved: SAE improvements lead to perfect reconstruction and extremely monosemantic features, and new ...
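As a concrete anchor for points 1-3 above, here is a minimal NumPy sketch (not from the post; the dimensions, threshold, and tied-weight encoder are illustrative assumptions): feature directions drawn as a nearly orthogonal overcomplete basis, an activation vector formed as a sparse linear combination of a few of them, and a toy SAE-style encode/decode. The interference terms in the recovered coefficients and the nonzero reconstruction error correspond loosely to the reconstruction-error problem mentioned above; the specific placement of the feature directions is exactly the information this picture discards.

```python
# Toy illustration only: a nearly orthogonal overcomplete basis of feature
# directions, a sparse activation, and a tied-weight SAE-style reconstruction.
# All sizes and the threshold are made up for this sketch.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 256, 1024        # overcomplete: many more features than dimensions

# Random unit vectors in high dimension are nearly orthogonal on average;
# under the superposition story, any such basis is supposed to "do".
W = rng.normal(size=(n_features, d_model))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Activation vector = sparse linear combination of a few feature directions.
active = rng.choice(n_features, size=5, replace=False)
coeffs = rng.uniform(1.0, 2.0, size=5)
activation = coeffs @ W[active]

def sae(x, W, threshold=0.5):
    # Encoder: dot product with each feature direction, shifted ReLU.
    # The recovered coefficients include interference from the other active features.
    f = np.maximum(W @ x - threshold, 0.0)
    # Decoder: weighted sum of the same feature directions (tied weights).
    return f, f @ W

f, x_hat = sae(activation, W)
others = np.setdiff1d(np.arange(n_features), active)
print("SAE activations on the true features:", np.round(f[active], 2))
print("largest activation on any other feature: %.2f" % f[others].max())
print("reconstruction error: %.3f" % np.linalg.norm(activation - x_hat))
```

In this sketch the basis really is arbitrary: resampling W gives a functionally similar space, which is the sense in which the superposition description claims feature placement "doesn't really matter". The post's point is that in real models the placement carries additional structure (circular day-of-the-week arrangements, UMAP geometry) that a description like this never records.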
Jun 24, 2024 • 21min

LW - On Claude 3.5 Sonnet by Zvi

The podcast discusses Zvi's post on the release of Claude 3.5 Sonnet, its impressive speed, and its improvements compared to previous models. It also touches on the impact it may have on the market and its performance in outperforming other AI models like GPT-4o.
