

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Apr 26, 2024 • 21min
EA - Bringing Monitoring, Evaluation, and Learning to animal advocates: 6 months of lessons learned by Nicoll Peracha
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bringing Monitoring, Evaluation, and Learning to animal advocates: 6 months of lessons learned, published by Nicoll Peracha on April 26, 2024 on The Effective Altruism Forum.
Introduction and Summary
Why focus on Monitoring, Evaluation, and Learning in the animal cause area?
When we started our international interventions in September 2023, we were quite certain that MEL could increase the cost-effectiveness and impact of interventions in the animal cause area, and avoid doing harm. See e.g. this post about why Anima International suspended the campaign to end live fish sales in Poland (Anima International 2022).
Tools and insights from MEL can help organizations design potentially more (cost)-effective interventions from the start, know if their interventions are on track, and adapt their implementation when necessary.
We also believe MEL can contribute to increasing the evidence base for interventions in the animal cause area. Neil Buddy Shah, a co-founder of IDinsight, observed, "The animal welfare research infrastructure and ecosystem is incredibly immature compared to what has developed over decades in social policy, medicine, and public health." (EAG San Francisco 2019).
Since 2019, research and the number of animal-cause area-specific research databases have increased (Navigating Research Databases for Animal Advocacy, 2024). However, the amount of research available still pales compared to other cause areas.
Uncertainties and findings
We were less certain about the willingness and ability of Animal and Vegan Advocacy organizations to engage with MEL. We also didn't know if MEL tools used in other cause areas such as Global Health and Development would be applicable and useful in the animal cause area.
Overall, MEL is still a neglected topic in the animal community. EA-aligned organizations generally use MEL tools but many others don't, and so far we have only verified a handful of organizations that have complete MEL systems in place that do not require additional support.
Specialized support for MEL is still very limited. If you are interested in supporting animal organizations with MEL, please consider working with us as an MEL Associate or communications volunteer.
Below you will find 11 key lessons learned from our pilot intervention to train and support animal and vegan advocacy organizations in Monitoring, Evaluation, and Learning.
We hope this post will be particularly relevant for charities and funders in the animal cause area and will lead to more organizations engaging with MEL and sharing best practices.
In this post, we will share what we've done so far, 11 key lessons we've learned, how they have influenced our strategy, and what you can do to help advance MEL and the overall evidence base in the animal cause area.
I. What have we done so far?
The Mission Motor's current interventions were shaped during AIM's (Charity Entrepreneurship) Incubation Program 2023. Between September 2023 and April 2024, we
trained and supported eight animal and vegan advocacy charities to develop and implement MEL systems
provided ad hoc support to another seven animal organizations
began building a community of MEL peers and practitioners through Slack and monthly meetings
Our interventions are in the ideation phase with a focus on learning if, and why or why not, they work as intended. Although we are still in the first half of our pilot program, we see some early successes. The training and coaching sessions helped staff members
increase their knowledge of MEL
design theories of change
identify key assumptions and risks
unify the collective understanding of their programs.
As a result, several organizations identified key activities they wish to add to their program, decided to focus their efforts on specific target groups, or otherwise changed their program design.
To ...

Apr 26, 2024 • 20min
AF - An Introduction to AI Sandbagging by Teun van der Weij
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An Introduction to AI Sandbagging, published by Teun van der Weij on April 26, 2024 on The AI Alignment Forum.
Summary: Evaluations provide crucial information to determine the safety of AI systems which might be deployed or (further) developed. These development and deployment decisions have important safety consequences, and therefore they require trustworthy information. One reason why evaluation results might be untrustworthy is sandbagging, which we define as strategic underperformance on an evaluation.
The strategic nature can originate from the developer (developer sandbagging) and the AI system itself (AI system sandbagging). This post is an introduction to the problem of sandbagging.
The Volkswagen emissions scandal
There are environmental regulations which require the reduction of harmful emissions from diesel vehicles, with the goal of protecting public health and the environment. Volkswagen struggled to meet these emissions standards while maintaining the desired performance and fuel efficiency of their diesel engines (Wikipedia).
Consequently, Volkswagen intentionally installed defeat devices in millions of vehicles to lower the amount of emissions only during tests, allowing the cars to emit up to 40 times the permitted levels of nitrogen oxides during normal driving conditions. The scandal showed corporate misconduct and a lack of effective oversight, and is a form of sandbagging. But clearly, cars and their manufacturers are different to AI systems and their developers. So, how does sandbagging relate to AI systems?
Sandbagging in the context of AI
Evaluations provide crucial information to determine the safety of AI systems which might be deployed or (further) developed. Frontier labs rely on evaluations to reduce catastrophic risks, as stated in e.g. OpenAI's preparedness framework and Anthropic's responsible scaling policy. Governmental institutions like the UK AI Safety Institute and the EU AI Office also utilize evaluations to manage risks. However, sandbagging can reduce the reliability of these evaluations.
Both the developers of an AI system and the AI system itself have incentives to sandbag. For example, the developer might want to appease stakeholders, and the AI system might want to acquire more resources after an evaluation to better achieve its goals. Importantly, sandbagging is technically feasible[1] and neglected. We hope this post leads to a better understanding of the problem of sandbagging.
Definition
We define sandbagging as strategic underperformance on an evaluation.
This has three main concepts: strategic, underperformance, and evaluation (or: eval), which we discuss below in turn. Notably, we use the term 'AI system' to describe various types of AIs, such as standalone models and models augmented with tools/scaffolding (Shavit et al., 2023).
Strategic
Strategic means that the developer of the AI system and/or the AI system itself underperforms on an evaluation because it believes it will benefit.[2] If the strategy comes from the developer, we call it developer sandbagging (relating to misuse). If the strategy comes from the AI system, we call it AI system sandbagging (relating to misalignment). We use the term developer to mean any entity responsible for the AI system being evaluated.
These entities have been called developers, deployers, or providers elsewhere (1, 2, 3).
By including the term 'strategic' in the definition of sandbagging, we exclude accidental underperformance. In other words, this condition captures the fact that underperformance was intentional, which is a central condition for legal responsibility, including in the context of AI (Ashton, 2022; Halpern, 2018; Ward, 2024). Non-strategic, or accidental, sandbagging brings about other less important safety problems.
Underperformance
Underperformance occurs whe...

Apr 26, 2024 • 8min
LW - Losing Faith In Contrarianism by omnizoid
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Losing Faith In Contrarianism, published by omnizoid on April 26, 2024 on LessWrong.
Crosspost from my blog.
If you spend a lot of time in the blogosphere, you'll find a great many people expressing contrarian views. If you hang out in the circles that I do, you'll probably have heard Yudkowsky say that dieting doesn't really work, Guzey say that sleep is overrated, Hanson argue that medicine doesn't improve health, various people argue for the lab leak, others argue for hereditarianism, Caplan argue that mental illness is mostly just aberrant preferences and education doesn't work, and various other people express contrarian views.
Often, very smart people - like Robin Hanson - will write long posts defending these views, other people will have criticisms, and it will all be such a tangled mess that you don't really know what to think about them.
For a while, I took a lot of these contrarian views pretty seriously. If I'd had to bet 6 months ago, I'd have bet on the lab leak, at maybe 2 to 1 odds. I'd have had significant credence in Hanson's view that healthcare doesn't improve health until pretty recently, when Scott released his post explaining why it is wrong.
Over time, though, I've become much less sympathetic to these contrarian views. It's become increasingly obvious that the things that make them catch on are unrelated to their truth. People like being provocative and tearing down sacred cows - as a result, when a smart articulate person comes along defending some contrarian view - perhaps one claiming that something we think is valuable is really worthless - the view spreads like wildfire, even if it's pretty implausible.
Sam Atis has an article titled The Case Against Public Intellectuals. He starts it by noting a surprising fact: lots of his friends think education has no benefits. This isn't because they've done a thorough investigation of the literature - it's because they've read Bryan Caplan's book arguing for that thesis. Atis notes that there's a literature review finding that education has significant benefits, yet it's written by boring academics, so no one has read it.
Everyone wants to read the contrarians who criticize education - no one wants to read the boring lit reviews that say what we believed about education all along is right.
Sam is right, yet I think he understates the problem. There are various topics where arguing for one side of them is inherently interesting, yet arguing for the other side is boring. There are a lot of people who read Austrian economics blogs, yet no one reads (or writes) anti-Austrian economics blogs. That's because there are a lot of fans of Austrian economics - people who are willing to read blogs on the subject - but almost no one who is really invested in Austrian economics being wrong.
So as a result, in general, the structural incentives of the blogosphere favor being a contrarian.
Thus, you should expect the sense of the debate you get, unless you peruse the academic literature in depth surrounding some topic, to be wildly skewed towards contrarian views. And I think this is exactly what we observe.
I've seen the contrarians be wrong over and over again - and this is what really made me lose faith in them. Whenever I looked more into a topic, whenever I got to the bottom of the full debate, it always seemed like the contrarian case fell apart.
It's easy for contrarians to portray their opponents as the kind of milquetoast bureaucrats who aren't very smart and follow the consensus just because it is the consensus. If Bryan Caplan has a disagreement with a random administrator, I trust that Bryan Caplan's probably right, because he's smarter and cares more about ideas.
But what I've come to realize is that the mainstream view that's supported by most of the academics tends to be supported by some r...

Apr 26, 2024 • 13min
EA - EA Meta Funding Landscape Report by Joel Tan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EA Meta Funding Landscape Report, published by Joel Tan on April 26, 2024 on The Effective Altruism Forum.
The Centre for Exploratory Altruism Research (CEARCH) is an EA organization working on cause prioritization research as well as grantmaking and donor advisory. This project was commissioned by the leadership of the Meta Charity Funders (MCF) - also known as the Meta Charity Funding Circle (MCFC) - with the objective of identifying what is underfunded vs overfunded in EA meta. The views expressed in this report are CEARCH's and do not necessarily reflect the position of the MCF.
Generally, by meta we refer to projects whose theory of change is indirect and involves improving the EA movement's ability to do good - for example, via cause/intervention/charity prioritization (i.e. improving our knowledge of what is cost-effective); effective giving (i.e. increasing the pool of money donated in an impact-oriented way); or talent development (i.e. increasing the pool and ability of people willing and able to work in impactful careers).
The full report may be found here (link). Note that the public version of the report is partially redacted, to respect the confidentiality of certain grants, as well as the anonymity of the people whom we interviewed or surveyed.
Quantitative Findings
Detailed Findings
To access our detailed findings, refer to our spreadsheet (link).
Overall Meta Funding
Aggregate EA meta funding saw rapid growth and equally rapid contraction over 2021 to 2023 - growing from 109 million in 2021 to 193 million in 2022, before shrinking back to 117 million in 2023. The analysis excludes FTX, as ongoing clawbacks mean that their funding has functioned less as grants and more as loans.
Open Philanthropy is by far the biggest funder in the space, and changes in the meta funding landscape are largely driven by changes in OP's spending. And indeed, OP's global catastrophic risks (GCR) capacity building grants tripled from 2021 to 2022, before falling to twice the 2021 baseline in 2023.
This finding is in line with Tyler Maule's previous analysis.
Meta Funding by Cause Area
The funding allocation by cause was, in descending order: (most funding) longtermism (i.e. AI, biosecurity, nuclear etc) (274 million) >> global health and development (GHD) (67 million) > cross-cause (53 million) > animal welfare (25 million) (least funding).
Meta Funding by Intervention
The funding allocation by intervention was, in descending order: (most funding) other/miscellaneous (e.g. general community building, including by national/local EA organizations; events; community infrastructure; co-working spaces; fellowships for community builders; production of EA-adjacent media content; translation projects; student outreach; and book purchases etc) (193 million) > talent (121 million) > prioritization (92 million) >> effective giving (13 million) (least funding).
One note of caution - we believe our results overstate how well funded prioritization is, relative to the other three intervention types.
We take into account what grantmakers spend internally on prioritization research, but for lack of time, we do not perform an equivalent analysis for non-grantmakers (i.e. imputing their budget to effective giving, talent, and other/miscellaneous).
For simplicity, we classified all grantmakers (except GWWC) as engaging in prioritization, though some grantmakers (e.g. Founders Pledge, Longview) also do effective giving work.
Meta Funding by Cause Area & Intervention
The funding allocation by cause/intervention subgroup was as follows:
The areas with the most funding were: longtermist other/miscellaneous (153 million) > longtermist talent (106 million) > GHD prioritization (51 million).
The areas with the least funding were GHD other/miscellaneous (3 million) > animal welfare effective givi...

Apr 26, 2024 • 11min
LW - LLMs seem (relatively) safe by JustisMills
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLMs seem (relatively) safe, published by JustisMills on April 26, 2024 on LessWrong.
Post for a somewhat more general audience than the modal LessWrong reader, but gets at my actual thoughts on the topic.
In 2018 OpenAI defeated the world champions of Dota 2, a major esports game. This was hot on the heels of DeepMind's AlphaGo performance against Lee Sedol in 2016, achieving superhuman Go performance way before anyone thought that might happen. AI benchmarks were being cleared at a pace which felt breathtaking at the time, papers were proudly published, and ML tools like Tensorflow (released in 2015) were coming online. To people already interested in AI, it was an exciting era. To everyone else, the world was unchanged.
Now Saturday Night Live sketches use sober discussions of AI risk as the backdrop for their actual jokes, there are hundreds of AI bills moving through the world's legislatures, and Eliezer Yudkowsky is featured in Time Magazine.
For people who have been predicting, since well before AI was cool (and now passe), that it could spell doom for humanity, this explosion of mainstream attention is a dark portent. Billion dollar AI companies keep springing up and allying with the largest tech companies in the world, and bottlenecks like money, energy, and talent are widening considerably. If current approaches can get us to superhuman AI in principle, it seems like they will in practice, and soon.
But what if large language models, the vanguard of the AI movement, are actually safer than what came before? What if the path we're on is less perilous than what we might have hoped for, back in 2017? It seems that way to me.
LLMs are self limiting
To train a large language model, you need an absolutely massive amount of data. The core thing these models are doing is predicting the next few letters of text, over and over again, and they need to be trained on billions and billions of words of human-generated text to get good at it.
Compare this process to AlphaZero, DeepMind's algorithm that superhumanly masters Chess, Go, and Shogi. AlphaZero trains by playing against itself. While older chess engines bootstrap themselves by observing the records of countless human games, AlphaZero simply learns by doing. Which means that the only bottleneck for training it is computation - given enough energy, it can just play itself forever, and keep getting new data.
Not so with LLMs: their source of data is human-produced text, and human-produced text is a finite resource.
The precise datasets used to train cutting-edge LLMs are secret, but let's suppose that they include a fair bit of the low hanging fruit: maybe 5% of publicly available text that is in principle available and not garbage. You can schlep your way to a 20x bigger dataset in that case, though you'll hit diminishing returns as you have to, for example, generate transcripts of random videos and filter old mailing list threads for metadata and spam.
But nothing you do is going to get you 1,000x the training data, at least not in the short run.
Scaling laws are among the watershed discoveries of ML research in the last decade; basically, these are equations that project how much oomph you get out of increasing the size, training time, and dataset that go into a model. And as it turns out, the amount of high quality data is extremely important, and often becomes the bottleneck.
It's easy to take this fact for granted now, but it wasn't always obvious! If computational power or model size was usually the bottleneck, we could just make bigger and bigger computers and reliably get smarter and smarter AIs. But that only works to a point, because it turns out we need high quality data too, and high quality data is finite (and, as the political apparatus wakes up to what's going on, legally fraught).
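To make the scaling-law point concrete, here is a minimal sketch (my illustration, not from the post) using the Chinchilla-style form L(N, D) = E + A/N^alpha + B/D^beta from Hoffmann et al. (2022); the constants are the commonly cited fitted values, and the token supply is a made-up figure.

```python
# Minimal sketch of a Chinchilla-style scaling law, L(N, D) = E + A/N**alpha + B/D**beta.
# Constants are the commonly cited fitted values from Hoffmann et al. (2022);
# the token supply below is an assumption for illustration.

def predicted_loss(params: float, tokens: float) -> float:
    """Predicted pretraining loss for a model with `params` parameters
    trained on `tokens` tokens of high-quality text."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / params**alpha + B / tokens**beta

if __name__ == "__main__":
    tokens = 10e12  # suppose roughly 10 trillion usable tokens exist (assumption)
    for params in (7e10, 7e11, 7e12):
        print(f"{params:.0e} params -> predicted loss {predicted_loss(params, tokens):.3f}")
    # Scaling parameters 100x only closes part of the gap: with tokens fixed,
    # the loss can never drop below E + B / tokens**beta (about 1.78 here).
```

With the data term fixed, adding parameters runs into exactly the wall described above.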
There are rumbling...

Apr 25, 2024 • 2min
EA - Animals in Cost-Benefit Analysis by Vasco Grilo
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Animals in Cost-Benefit Analysis, published by Vasco Grilo on April 25, 2024 on The Effective Altruism Forum.
This is a linkpost for Animals in Cost-Benefit Analysis by Andrew Stawasz. The article is forthcoming in the University of Michigan Journal of Law Reform.
Abstract
Federal agencies' cost-benefit analyses do not capture nonhuman animals' ("animals'") interests. This omission matters. Cost-benefit analysis drives many regulatory decisions that substantially affect many billions of animals. That omission creates a regulatory blind spot that is untenable as a matter of morality and of policy.
This Article advances two claims related to valuing animals in cost-benefit analyses. The Weak Claim argues that agencies typically may do so. No legal prohibitions usually exist, and such valuation is within agencies' legitimate discretion. The Strong Claim argues that agencies often must do so if a policy would substantially affect animals. Cost-benefit analysis is concerned with improving welfare, and no argument for entirely omitting animals' welfare holds water.
Agencies have several options to implement this vision. These options include, most preferably, human-derived valuations (albeit in limited circumstances), interspecies comparisons, direct estimates of animals' preferences, and, at a minimum, breakeven analysis. Agencies could deal with uncertainty by conducting sensitivity analyses or combining methods.
For any method, agencies should consider what happens when a policy would save animals from some bad outcomes and what form a mandate to value animals should take.
Valuing animals could have mattered for many cost-benefit analyses, including those for pet-food safety regulations and a rear backup camera mandate. As a sort of "proof of concept," this Article shows that even a simple breakeven analysis from affected animals' perspective paints even the thoroughly investigated policy decision at issue in Entergy Corp. v. Riverkeeper, Inc. in an informative new light.
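As a concrete illustration of the breakeven idea mentioned above, here is a minimal sketch; every number is invented for illustration and is not from the Article.

```python
# Hypothetical breakeven analysis: find the per-animal welfare value at which
# a policy's benefits to affected animals would just offset its net cost to
# humans. All numbers below are invented placeholders.

def breakeven_value_per_animal(net_human_cost: float, animals_affected: float) -> float:
    """Dollar value each affected animal would need to be assigned for the
    policy to break even in a cost-benefit analysis."""
    return net_human_cost / animals_affected

if __name__ == "__main__":
    net_cost = 50_000_000    # placeholder: $50M annual net cost to humans
    animals = 200_000_000    # placeholder: 200M animals spared a bad outcome annually
    threshold = breakeven_value_per_animal(net_cost, animals)
    print(f"Policy breaks even at ${threshold:.2f} per animal affected")
    # If one judges that sparing an animal the harm is worth more than $0.25,
    # the policy clears the bar without needing a precise valuation.
```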
Table of contents
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 25, 2024 • 8min
LW - WSJ: Inside Amazon's Secret Operation to Gather Intel on Rivals by trevor
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: WSJ: Inside Amazon's Secret Operation to Gather Intel on Rivals, published by trevor on April 25, 2024 on LessWrong.
The operation, called Big River Services International, sells around $1 million a year of goods through e-commerce marketplaces including eBay, Shopify, Walmart and Amazon.com under brand names such as Rapid Cascade and Svea Bliss. "We are entrepreneurs, thinkers, marketers and creators," Big River says on its website. "We have a passion for customers and aren't afraid to experiment."
What the website doesn't say is that Big River is an arm of Amazon that surreptitiously gathers intelligence on the tech giant's competitors.
Born out of a 2015 plan code named "Project Curiosity," Big River uses its sales across multiple countries to obtain pricing data, logistics information and other details about rival e-commerce marketplaces, logistics operations and payments services, according to people familiar with Big River and corporate documents viewed by The Wall Street Journal. The team then shared that information with Amazon to incorporate into decisions about its own business.
...
The story of Big River offers new insight into Amazon's elaborate efforts to stay ahead of rivals. Team members attended their rivals' seller conferences and met with competitors identifying themselves only as employees of Big River Services, instead of disclosing that they worked for Amazon.
They were given non-Amazon email addresses to use externally - in emails with people at Amazon, they used Amazon email addresses - and took other extraordinary measures to keep the project secret. They disseminated their reports to Amazon executives using printed, numbered copies rather than email. Those who worked on the project weren't even supposed to discuss the relationship internally with most teams at Amazon.
An internal crisis-management paper gave advice on what to say if discovered. The response to questions should be: "We make a variety of products available to customers through a number of subsidiaries and online channels." In conversations, in the event of a leak they were told to focus on the group being formed to improve the seller experience on Amazon, and say that such research is normal, according to people familiar with the discussions.
Senior Amazon executives, including Doug Herrington, Amazon's current CEO of Worldwide Amazon Stores, were regularly briefed on the Project Curiosity team's work, according to one of the people familiar with Big River.
...
Virtually all companies research their competitors, reading public documents for information, buying their products or shopping their stores. Lawyers say there is a difference between such corporate intelligence gathering of publicly available information, and what is known as corporate or industrial espionage.
Companies can get into legal trouble for actions such as hiring a rival's former employee to obtain trade secrets or hacking a rival. Misrepresenting themselves to competitors to gain proprietary information can lead to suits on trade secret misappropriation, said Elizabeth Rowe, a professor at the University of Virginia School of Law who specializes in trade secret law.
...
The benchmarking team pitched "Project Curiosity" to senior management and got the approval to buy inventory, use a shell company and find warehouses in the U.S., Germany, England, India and Japan so they could pose as sellers on competitors' websites.
...
Once launched, the focus of the project quickly started shifting to gathering information about rivals, the people said.
...
The team presented its findings from being part of the FedEx program to senior Amazon logistics leaders. They used the code name "OnTime Inc." to refer to FedEx. Amazon made changes to its Fulfillment by Amazon service to ...

Apr 25, 2024 • 14min
LW - "Why I Write" by George Orwell (1946) by Arjun Panickssery
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Why I Write" by George Orwell (1946), published by Arjun Panickssery on April 25, 2024 on LessWrong.
People have been posting great essays so that they're "fed through the standard LessWrong algorithm." This essay is in the public domain in the UK but not the US.
From a very early age, perhaps the age of five or six, I knew that when I grew up I should be a writer. Between the ages of about seventeen and twenty-four I tried to abandon this idea, but I did so with the consciousness that I was outraging my true nature and that sooner or later I should have to settle down and write books.
I was the middle child of three, but there was a gap of five years on either side, and I barely saw my father before I was eight. For this and other reasons I was somewhat lonely, and I soon developed disagreeable mannerisms which made me unpopular throughout my schooldays. I had the lonely child's habit of making up stories and holding conversations with imaginary persons, and I think from the very start my literary ambitions were mixed up with the feeling of being isolated and undervalued.
I knew that I had a facility with words and a power of facing unpleasant facts, and I felt that this created a sort of private world in which I could get my own back for my failure in everyday life. Nevertheless the volume of serious - i.e. seriously intended - writing which I produced all through my childhood and boyhood would not amount to half a dozen pages. I wrote my first poem at the age of four or five, my mother taking it down to dictation.
I cannot remember anything about it except that it was about a tiger and the tiger had 'chair-like teeth' - a good enough phrase, but I fancy the poem was a plagiarism of Blake's 'Tiger, Tiger'. At eleven, when the war of 1914-18 broke out, I wrote a patriotic poem which was printed in the local newspaper, as was another, two years later, on the death of Kitchener. From time to time, when I was a bit older, I wrote bad and usually unfinished 'nature poems' in the Georgian style.
I also, about twice, attempted a short story which was a ghastly failure. That was the total of the would-be serious work that I actually set down on paper during all those years.
However, throughout this time I did in a sense engage in literary activities. To begin with there was the made-to-order stuff which I produced quickly, easily and without much pleasure to myself. Apart from school work, I wrote vers d'occasion, semi-comic poems which I could turn out at what now seems to me astonishing speed - at fourteen I wrote a whole rhyming play, in imitation of Aristophanes, in about a week - and helped to edit school magazines, both printed and in manuscript.
These magazines were the most pitiful burlesque stuff that you could imagine, and I took far less trouble with them than I now would with the cheapest journalism. But side by side with all this, for fifteen years or more, I was carrying out a literary exercise of a quite different kind: this was the making up of a continuous "story" about myself, a sort of diary existing only in the mind. I believe this is a common habit of children and adolescents.
As a very small child I used to imagine that I was, say, Robin Hood, and picture myself as the hero of thrilling adventures, but quite soon my "story" ceased to be narcissistic in a crude way and became more and more a mere description of what I was doing and the things I saw. For minutes at a time this kind of thing would be running through my head: 'He pushed the door open and entered the room.
A yellow beam of sunlight, filtering through the muslin curtains, slanted on to the table, where a matchbox, half-open, lay beside the inkpot. With his right hand in his pocket he moved across to the window. Down in the street a tortoiseshell cat was chasing a dead leaf,' etc., etc. Thi...

Apr 25, 2024 • 1h 32min
AF - AXRP Episode 29 - Science of Deep Learning with Vikrant Varma by DanielFilan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AXRP Episode 29 - Science of Deep Learning with Vikrant Varma, published by DanielFilan on April 25, 2024 on The AI Alignment Forum.
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand.
Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that generalizes.
What's going on with these discoveries? Are they all they're cracked up to be, and if so, how are they working? In this episode, I talk to Vikrant Varma about his research getting to the bottom of these questions.
Topics we discuss:
Challenges with unsupervised LLM knowledge discovery, aka contra CCS
What is CCS?
Consistent and contrastive features other than model beliefs
Understanding the banana/shed mystery
Future CCS-like approaches
CCS as principal component analysis
Explaining grokking through circuit efficiency
Why research science of deep learning?
Summary of the paper's hypothesis
What are 'circuits'?
The role of complexity
Many kinds of circuits
How circuits are learned
Semi-grokking and ungrokking
Generalizing the results
Vikrant's research approach
The DeepMind alignment team
Follow-up work
Daniel Filan: Hello, everybody. In this episode I'll be speaking with Vikrant Varma, a research engineer at Google DeepMind, and the technical lead of their sparse autoencoders effort. Today, we'll be talking about his research on problems with contrast-consistent search, and also explaining grokking through circuit efficiency. For links to what we're discussing, you can check the description of this episode and you can read the transcript at axrp.net.
All right, well, welcome to the podcast.
Vikrant Varma: Thanks, Daniel. Thanks for having me.
Challenges with unsupervised LLM knowledge discovery, aka contra CCS
What is CCS?
Daniel Filan: Yeah. So first, I'd like to talk about this paper. It is called Challenges with Unsupervised LLM Knowledge Discovery, and the authors are Sebastian Farquhar, you, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, and Rohin Shah. This is basically about this thing called CCS. Can you tell us: what does CCS stand for and what is it?
Vikrant Varma: Yeah, CCS stands for contrastive-consistent search. I think to explain what it's about, let me start from a more fundamental problem that we have with advanced AI systems. One of the problems is that when we train AI systems, we're training them to produce outputs that look good to us, and so this is the supervision that we're able to give to the system. We currently don't really have a good idea of how an AI system or how a neural network is computing those outputs.
And in particular, we're worried about the situation in the future when the amount of supervision we're able to give it causes it to achieve a superhuman level of performance at that task. By looking at the network, we can't know how this is going to behave in a new situation.
And so the Alignment Research Center put out a report recently about this problem. They named a potential part of this problem as "eliciting latent knowledge". What this means is if your model is, for example, really, really good at figuring out what's going to happen next in a video, as in it's able to predict the next frame of a video really well given a prefix of the video, this must mean that it has some sort of model of what's going on in the world.
Instead of using the outputs of the model, if you could directly look at what it understands about the world, then potentially, you could use that information in a much safer ...
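For background on the method under discussion, here is a minimal sketch of the CCS objective from Burns et al. (2022), the paper that introduced contrast-consistent search; the activation arrays and probe below are placeholders of my own, not code from either paper.

```python
import numpy as np

# Minimal sketch of the CCS objective (Burns et al., 2022). `acts_pos` and
# `acts_neg` are placeholder activations for contrast pairs: the same
# statements phrased as true and as false. A real run would extract these
# from a language model's hidden states.

rng = np.random.default_rng(0)
acts_pos = rng.normal(size=(256, 64))   # shape: (n_contrast_pairs, d_model)
acts_neg = rng.normal(size=(256, 64))

def probe(acts, w, b):
    return 1.0 / (1.0 + np.exp(-(acts @ w + b)))   # sigmoid linear probe

def ccs_loss(w, b):
    p_pos, p_neg = probe(acts_pos, w, b), probe(acts_neg, w, b)
    consistency = np.mean((p_pos - (1.0 - p_neg)) ** 2)   # P(true) + P(false) should equal 1
    confidence = np.mean(np.minimum(p_pos, p_neg) ** 2)   # rule out the degenerate 0.5/0.5 answer
    return consistency + confidence

print(ccs_loss(np.zeros(64), 0.0))  # untrained probe; CCS minimizes this loss by gradient descent
```

The paper discussed in this episode argues that the minimizer of this unsupervised loss can track other prominent features of the prompts rather than the model's knowledge.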

Apr 25, 2024 • 1min
AF - Improving Dictionary Learning with Gated Sparse Autoencoders by Neel Nanda
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Improving Dictionary Learning with Gated Sparse Autoencoders, published by Neel Nanda on April 25, 2024 on The AI Alignment Forum.
Authors: Senthooran Rajamanoharan*, Arthur Conmy*, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda
A new paper from the Google DeepMind mech interp team: Improving Dictionary Learning with Gated Sparse Autoencoders!
Gated SAEs are a new Sparse Autoencoder architecture that seems to be a significant Pareto-improvement over normal SAEs, verified on models up to Gemma 7B. They are now our team's preferred way to train sparse autoencoders, and we'd love to see them adopted by the community! (Or to be convinced that it would be a bad idea for them to be adopted by the community!)
They achieve similar reconstruction with about half as many firing features, while being comparably or more interpretable (confidence interval for the increase is 0%-13%).
See Sen's Twitter summary, my Twitter summary, and the paper!
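For readers who want a concrete picture, here is a rough sketch of the gated encoder as I understand it from the paper; variable names are mine, the training losses (a sparsity penalty on the gate pre-activations plus an auxiliary reconstruction term) are omitted, and this is an approximation rather than the authors' implementation.

```python
import numpy as np

# Rough sketch of a Gated SAE forward pass (my approximation, not the authors'
# code). One path decides WHICH features fire (a binary gate); a weight-tied
# path estimates HOW STRONGLY they fire.

rng = np.random.default_rng(0)
d_model, d_sae = 64, 512
W_gate = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
r_mag = np.zeros(d_sae)                       # per-feature rescaling tying the magnitude weights to W_gate
b_gate, b_mag = np.zeros(d_sae), np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)

def gated_sae(x):
    x_centered = x - b_dec
    pi_gate = x_centered @ W_gate + b_gate                  # which features are active?
    f_gate = (pi_gate > 0).astype(x.dtype)                  # binary gate (Heaviside)
    pi_mag = x_centered @ (W_gate * np.exp(r_mag)) + b_mag  # magnitude path, weights tied to the gate
    f = f_gate * np.maximum(pi_mag, 0.0)                    # sparse feature activations
    return f @ W_dec + b_dec, f                             # reconstruction and features

x = rng.normal(size=(8, d_model))   # stand-in for a batch of residual-stream activations
x_hat, features = gated_sae(x)
print(x_hat.shape, (features > 0).mean())
```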
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.


