
The Nonlinear Library
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Latest episodes

Aug 23, 2024 • 57sec
EA - How do you deal with the "meat eater" problem by sammyboiz
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How do you deal with the "meat eater" problem, published by sammyboiz on August 23, 2024 on The Effective Altruism Forum.
The meat eater problem
I've heard people argue that it's possible global health and development work could actually be net-negative in utility, because of the increased meat consumption and factory farming it leads to. It feels like EAs who value animal suffering at all must provide very clear reasons why it's okay to support saving the lives of meat-eaters, and developing the third world, given the "meat eater problem."
Even stepping away from utilitarianism, it seems more wrong to actively save someone who is very likely going to commit a moral atrocity (if you believe eating meat is a moral atrocity).
How do you deal with the meat-eater problem? I find the problem very compelling and I do not know myself how to deal with it.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Aug 23, 2024 • 53min
LW - AI #78: Some Welcome Calm by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #78: Some Welcome Calm, published by Zvi on August 23, 2024 on LessWrong.
SB 1047 has been amended once more, with both strict improvements and big compromises. I cover the changes, and answer objections to the bill, in my extensive Guide to SB 1047. I follow that up here with reactions to the changes and some thoughts on where the debate goes from here. Ultimately, it is going to come down to one person: California Governor Gavin Newsom.
All of the debates we're having matter to the extent they influence this one person. If he wants the bill to become law, it almost certainly will become law. If he does not want that, then it won't become law; they never override a veto, and if he makes that intention known then it likely wouldn't even get to his desk. For now, he's not telling.
Table of Contents
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. AI sort of runs for mayor.
4. Language Models Don't Offer Mundane Utility. A go or no go decision.
5. Deepfaketown and Botpocalypse Soon. How hard is finding the desert of the real?
6. The Art of the Jailbreak. There is always a jailbreak. Should you prove it?
7. Get Involved. Also when not to get involved.
8. Introducing. New benchmark, longer PDFs, the hot new RealFakeGame.
9. In Other AI News. METR shares its conclusions on GPT-4o.
10. Quiet Speculations. Are we stuck at 4-level models due to Nvidia?
11. SB 1047: Nancy Pelosi. Local Nvidia investor expresses opinion.
12. SB 1047: Anthropic. You got most of what you wanted. Your move.
13. SB 1047: Reactions to the Changes. Reasonable people acted reasonably.
14. SB 1047: Big Picture. Things tend to ultimately be rather simple.
15. The Week in Audio. Joe Rogan talks to Peter Thiel.
16. Rhetorical Innovation. Matthew Yglesias offers improved taxonomy.
17. Aligning a Smarter Than Human Intelligence is Difficult. Proving things is hard.
18. The Lighter Side. The future, while coming, could be delayed a bit.
Language Models Offer Mundane Utility
Sully thinks the big models (Opus, 405B, GPT-4-0314) have that special something the medium-sized models don't have, no matter what the evals say.
A source for Llama-3.1-405B-base, at $2 per million tokens (both input and output).
Accelerate development of fusion energy, perhaps? Steven Cowley makes the case that this may be AI's 'killer app.' This would be great, but if AI can accelerate fusion by decades as Cowley claims, then what else can it also do? So few people generalize.
Show the troll that AIs can understand what they're misinterpreting. I am not as optimistic about this strategy as Paul Graham, and look forward to his experiments.
Mayoral candidate in Cheyenne, Wyoming promises to let ChatGPT be mayor. You can tell that everyone involved is thinking well and taking it seriously, asking the hard questions:
"Is the computer system in city hall sufficient to handle AI?" one attendee, holding a wireless microphone at his seat, asked VIC.
"If elected, would you take a pay cut?" another wanted to know.
"How would you make your decisions according to human factor, involving humans, and having to make a decision that affects so many people?" a third chimed in.
After each question, a pause followed.
"Making decisions that affect many people requires a careful balance of data-driven insights and human empathy," VIC said in a male-sounding voice. "Here's how I would approach it," it added, before ticking off a six-part plan that included using AI to gather data on public opinion and responding to constituents at town halls.
OpenAI shut off his account, saying this was campaigning and thus against terms of service, but he quickly made another one. You can't actually stop anyone from using ChatGPT. And I think there Ain't No Rule against using it for actual governing.
I still don't know how this 'AI Mayor' w...

Aug 22, 2024 • 12min
AF - Interoperable High Level Structures: Early Thoughts on Adjectives by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Interoperable High Level Structures: Early Thoughts on Adjectives, published by johnswentworth on August 22, 2024 on The AI Alignment Forum.
Meta: This post is a relatively rough dump of some recent research thoughts; it's not one of our more polished posts, in terms of either clarity or rigor. You've been warned.
The Interoperable Semantics post and the Solomonoff Inductor Walks Into A Bar post each tackled the question of how different agents in the same world can coordinate on an ontology, so that language can work at all given only a handful of example usages of each word (similar to e.g. children learning new words). Both use natural latents as a central mathematical tool - one in a Bayesian probabilistic framework, the other in a minimum description length framework. Both focus mainly on nouns, i.e. interoperable-across-minds clusters of "objects" in the environment.
… and the two propose totally different models. In one, the interoperability of cluster labels (i.e. nouns) follows from natural latent conditions over different features of each object. In the other, interoperability follows from natural latent conditions across objects, with no mention of features. The two models are not, in general, equivalent; they can't both be both correct and complete.
In this post, we'll propose that while the natural latent conditions over objects still seem to intuitively capture the rough notion of nouns, the natural latent conditions over features seem much better suited to adjectives. We'll briefly lay out two different potential ways to use natural latents over features as semantic values for adjectives. Then we'll talk a bit about implications, open threads and how this fits into a broader research gameplan.
The Problem
When children learn language, the cognitive process seems to go:
Observe the world a bunch
… organize knowledge of the world according to some categories, concepts, ontology, etc
… those categories, concepts, ontology, etc match other humans' categories, concepts, ontology, etc reasonably well
… so it only takes a handful of examples (1-3, say) of the use of a given word in order for the child to learn what the word refers to.
The crucial point here is that the categories/concepts/ontology are mostly learned before a word is attached; children do not brute-force learn categories/concepts/ontology from "labeled data". We can tell this is true mainly because it typically takes so few examples to learn the meaning of a new word.
The big puzzle, then, is that different humans learn mostly approximately the same categories/concepts/ontology - i.e. the same "candidates" to which words might point - as required for language to work at all with so few examples. How does that work? Mathematically, what are those "interoperable" categories/concepts/ontology, which different humans mostly convergently learn? How can we characterize them?
Or, somewhat earlier on the tech tree: can we find even a single model capable of accounting for the phenomenon of different minds in the same environment robustly converging on approximately the same categories/concepts/ontology? Forget whether we can find a model which correctly captures the ontology converged upon by humans, can we even find any model capable of accounting for any sort of robust ontological convergence? Can we find such a model for which the convergent ontology even vaguely resembles the sorts of things in human language (nouns, verbs, adjectives, etc)? What would such a model even look like?
That's roughly the stage we're at in this post.
Two Previous Models: Naturality Over Objects vs Features
Our main tool is (deterministic) natural latents. The usage looks like:
Suppose the different minds each look for (and find) a latent variable which satisfies the natural latent conditions over some lower-level variab...

Aug 22, 2024 • 25min
LW - A primer on the current state of longevity research by Abhishaike Mahajan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A primer on the current state of longevity research, published by Abhishaike Mahajan on August 22, 2024 on LessWrong.
Note: This post is co-authored with Stacy Li, a PhD student at Berkeley studying aging biology! Highly appreciate all her help in writing, editing, and fact-checking my understanding!
Introduction
The last time I read deeply about aging research was around 2021. The general impression I got was that aging research was receiving more and more funding (good!). Unfortunately, none of the money led to actionable or useful insights (bad).
Over time, you get slightly burnt out by all the negative news.
After getting a job in biotech, I kept a hazy eye on the subject but mostly tuned out of it entirely. But, especially today, I am curious: how has the aging field progressed in the last few years? Since 2021, what has changed?
In this post, I'll share a list of immediate questions about the state of affairs in aging research, and the answers I've found for them. For each question, I'll offer some basic background knowledge required to understand the question. Feel free to skip that part if you already understand the question!
Did the therapeutic focus on sirtuins amount to much?
Background
Sirtuins are a family of signaling proteins, commonly referred to by their corresponding gene name, SIRT1, SIRT2, all the way up to SIRT7. Their primary role is deacetylation, which is just the removal of a chemical marker (acetyl) on proteins.
It was noticed in the 1980s that some sirtuin classes were especially involved in three key activities: modifying histones, which are proteins that tune the accessibility of DNA in the nucleus, transcriptional modification, which determines how DNA is interpreted by the body, and DNA repair, which speaks for itself. And anything involved in modifying and maintaining DNA is something worth paying attention to!
Studies in the 2000s showed that the activity of specific sirtuin classes strongly correlated with age; the young had more sirtuin activity, and the old had less. This seemed to be causative in aging: overexpressing certain sirtuin genes led to lifespan increases, and downregulating them led to lifespan decreases. The results were a bit mixed, and they were in yeast cells - always a red flag - but there was some promise in viewing sirtuins as an aging target.
It turns out that editing humans to safely overexpress sirtuin genes is somewhat hard to do (as is expressing any gene in humans). But there was an easy way around that: focus on molecules that are required for sirtuin to do its job. A class of therapeutics grew from this: sirtuin-activating compounds.
How do you activate sirtuins?
Well, sirtuins are dependent on NAD+, or nicotinamide adenine dinucleotide, to perform their primary function. Increasing cellular NAD+ levels could also be a way to indirectly push for more sirtuin activity. Practically speaking, NAD+ bioavailability is poor, so supplementation with precursors to NAD+, such as nicotinamide mononucleotide (NMN) and nicotinamide riboside (NR), was instead used.
There are plenty of other compounds in this category too: resveratrol, fisetin, and quercetin are all names you may hear mentioned.
How has this fared?
Answer
TLDR: The whole sirtuin theory was losing steam by the time I started reading about it a few years ago. It's only gotten worse. Nothing particularly useful has come from sirtuin-focused therapies, and likely nothing ever will.
A Cell paper from 2018 found that NAD+ precursor supplementation didn't improve longevity in mice. To be fair, they did show that supplementation improves some aspects of health-span, specifically improved glucose metabolism and reduced oxidative stress to the liver in aged mice, so still potentially useful. But nothing revolutionary.
Still, human clinical trials ...

Aug 22, 2024 • 23min
LW - The economics of space tethers by harsimony
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The economics of space tethers, published by harsimony on August 22, 2024 on LessWrong.
Some code for this post can be found here.
Space tethers take the old, defunct space elevator concept and shorten it. Rockets can fly up to a dangling hook in the sky and then climb to a higher orbit. If the tether rotates, it can act like a catapult, providing a significant boost in a location where providing thrust is expensive. Kurzgesagt has a nice explainer and ToughSF has a great piece explaining the mechanics and some applications.
Tethers make it cheaper to explore space, but how much cheaper? Let's look at the benefits.
Tether materials and characteristic velocity
The key performance metric for the tether material is the characteristic velocity:
$V_c = \sqrt{\frac{2T}{\rho}}$
where $T$ is the tensile strength and $\rho$ is the density.
The stronger and lighter the material is, the faster the tether can spin, boosting payloads to higher speeds and saving more fuel. This quickly leads to thinking about exotic materials. Hexagonal boron nitride! Carbon nanotubes! I'm not immune to this kind of speculation, so I've added an appendix on the topic. But as I argue in another part of the appendix, we already have good enough materials to make a space tether. The potential gain from studying exotic materials is actually pretty small.
For what it's worth, I like glass fibers. They're pretty easy to make, the material can be sourced in space, they can handle large temperature ranges, and they're resistant to atomic oxygen environments and UV [1]. They can also get pretty good performance: S-2 glass fibers have a characteristic velocity close to 2 km/s, while the best currently available material clocks in at 2.7 km/s.
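As a quick sanity check on those figures, here is a minimal Python sketch of the characteristic-velocity formula above. This is my own illustration, not code from the post, and the tensile-strength and density values are rough, assumed ballpark numbers:

```python
import math

def characteristic_velocity(tensile_strength_pa: float, density_kg_m3: float) -> float:
    """Characteristic velocity V_c = sqrt(2 * T / rho), returned in m/s."""
    return math.sqrt(2 * tensile_strength_pa / density_kg_m3)

# Assumed ballpark material properties (not taken from the post):
materials = {
    "S-2 glass fiber":   (4.9e9, 2460),  # ~4.9 GPa, ~2460 kg/m^3
    "Zylon-class fiber": (5.8e9, 1560),  # illustrative high-end values
}

for name, (T, rho) in materials.items():
    print(f"{name}: V_c = {characteristic_velocity(T, rho) / 1000:.2f} km/s")
# S-2 glass comes out near 2 km/s, consistent with the figure quoted above.
```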
Now let's look at why the speed of the tether matters.
Delta V and fuel savings
Rockets have to reach a certain speed in order to orbit any object. For low earth orbit, that's roughly 7.9 km/s; that's over Mach 20 here on Earth. The change in velocity, or delta V (dV), required to reach orbit is the currency of spaceflight. You can essentially map out the solar system based on the delta V needed to reach different places:
[Image in the original post: a delta-V map of the solar system.]
It takes a lot of fuel and engineering to get a payload up to these speeds, making launches expensive [2][3]. Tethers are exciting because they can wait in orbit and offer a rocket some extra delta V. A tether spinning at 1.5 km/s in LEO can grab a rocket moving at 5.8 km/s and release it at 8.8 km/s:
[Image in the original post: diagram of a spinning tether catching a rocket and releasing it at higher velocity.]
It takes a while to visualize how these work. Staring at this gif helps:
[Animated gif in the original post showing the tether's rotation during catch and release.]
Even a small delta V boost saves a lot of fuel. That's because the total fuel required for a mission increases exponentially with delta V requirements, as we can see from the Tsiolkovsky rocket equation:
$\Delta V = I_{sp}\, g_0 \ln\!\left(\frac{m_i}{m_p}\right)$
I_sp is the specific impulse of the rocket, g_0 is the gravitational acceleration (often just called g in Earth's gravity), m_i is the total initial mass of the rocket including fuel, and m_p is the payload mass of the rocket after the fuel has been expended. Note that m_p includes both the literal payload and the mass of the rocket itself.
Rearranging to see the exponential:
$m_i = m_p \exp\!\left(\frac{\Delta V}{I_{sp}\, g_0}\right)$
m_i is the sum of the payload mass m_p and the fuel mass m_x. We can rewrite the above in terms of fuel mass:
$m_x = m_p\left(\exp\!\left(\frac{\Delta V}{I_{sp}\, g_0}\right) - 1\right)$
By offering a free delta V boost, tethers can save literal tons of fuel. If the tether is spinning at a certain velocity V_t, the tether provides a boost twice that size. You can subtract that boost from the dV requirements for the rocket:
$\Delta V' = \Delta V - 2V_t$
The new initial mass is:
$m_i' = m_p \exp\!\left(\frac{\Delta V - 2V_t}{I_{sp}\, g_0}\right)$
The new fuel requirement is:
$m_x' = m_i' - m_p = m_p\left(\exp\!\left(\frac{\Delta V - 2V_t}{I_{sp}\, g_0}\right) - 1\right)$
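To make the fuel savings concrete, here is a minimal Python sketch of the formulas above. It is my own illustrative script, not the code linked at the top of the post, and the specific impulse, payload mass, delta-V budget, and tether tip speed are all assumed numbers:

```python
import math

G0 = 9.81  # m/s^2, standard gravitational acceleration

def fuel_mass(delta_v: float, isp: float, payload_mass: float) -> float:
    """Fuel mass m_x = m_p * (exp(dV / (Isp * g0)) - 1), from the rocket equation."""
    return payload_mass * (math.exp(delta_v / (isp * G0)) - 1)

# Assumed illustrative numbers (not from the post):
isp = 350.0                 # s, roughly a modern upper-stage engine
payload = 10_000.0          # kg, includes the dry mass of the rocket itself
dv_required = 9_000.0       # m/s, a typical budget to LEO including losses
tether_tip_speed = 1_500.0  # m/s; the catch-and-throw boost is twice this

without_tether = fuel_mass(dv_required, isp, payload)
with_tether = fuel_mass(dv_required - 2 * tether_tip_speed, isp, payload)

print(f"Fuel without tether: {without_tether / 1000:.1f} t")
print(f"Fuel with tether:    {with_tether / 1000:.1f} t")
print(f"Fuel saved:          {(1 - with_tether / without_tether) * 100:.0f}%")
```

With these assumed numbers the tether cuts the fuel load by roughly 60%, which is the exponential effect being described here.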
As an example, let's imagine a tether orbiting in LEO [4] at an orbital velocity of 7.5 km/s and spinning at 2 km/s. Our rocket only needs to reach 5.5 km/s in order to be boosted to 9.5 km/s. A Starsh...

Aug 22, 2024 • 9min
AF - A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed, published by johnswentworth on August 22, 2024 on The AI Alignment Forum.
This post walks through the math for a theorem. It's intended to be a reference post, which we'll link back to as-needed from future posts.
The question which first motivated this theorem for us was: "Redness of a marker seems like maybe a natural latent over a bunch of parts of the marker, and redness of a car seems like maybe a natural latent over a bunch of parts of the car, but what makes redness of the marker 'the same as' redness of the car? How are they both instances of one natural thing, i.e. redness? (or 'color'?)".
But we're not going to explain in this post how the math might connect to that use-case; this post is just the math.
Suppose we have multiple distributions $P_1, \dots, P_k$ over the same random variables $X_1, \dots, X_n$. (Speaking somewhat more precisely: the distributions are over the same set, and an element of that set is represented by values $(x_1, \dots, x_n)$.) We take a mixture of the distributions: $P[X] := \sum_j \alpha_j P_j[X]$, where $\sum_j \alpha_j = 1$ and $\alpha$ is nonnegative.
Then our theorem says: if an approximate natural latent exists over $P[X]$, and that latent is robustly natural under changing the mixture weights $\alpha$, then the same latent is approximately natural over $P_j[X]$ for all $j$.
Mathematically: the natural latent over $P[X]$ is defined by $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x]$, and naturality means that the distribution $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x]\, P[X = x]$ satisfies the naturality conditions (mediation and redundancy). The theorem says that, if the joint distribution $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x] \sum_j \alpha_j P_j[X = x]$ satisfies the naturality conditions robustly with respect to changes in $\alpha$, then $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x]\, P_j[X = x]$ satisfies the naturality conditions for all $j$.
"Robustness" here can be interpreted in multiple ways - we'll cover two here, one for which the theorem is trivial and another more substantive, but we expect there are probably more notions of "robustness" which also make the theorem work.
Trivial Version
First notion of robustness: the joint distribution $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x] \sum_j \alpha_j P_j[X = x]$ satisfies the naturality conditions to within $\epsilon$ for all values of $\alpha$ (subject to $\sum_j \alpha_j = 1$ and $\alpha$ nonnegative).
Then: the joint distribution $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x] \sum_j \alpha_j P_j[X = x]$ satisfies the naturality conditions to within $\epsilon$ specifically for $\alpha_j = \delta_{jk}$, i.e. $\alpha$ which is 0 in all entries except a 1 in entry $k$. In that case, the joint distribution is $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x]\, P_k[X = x]$, therefore $\Lambda$ is natural over $P_k$. Invoke for each $k$, and the theorem is proven.
... but that's just abusing an overly-strong notion of robustness. Let's do a more interesting one.
Nontrivial Version
Second notion of robustness: the joint distribution $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x] \sum_j \alpha_j P_j[X = x]$ satisfies the naturality conditions to within $\epsilon$, and the gradient of the approximation error with respect to (allowed) changes in $\alpha$ is (locally) zero.
We need to prove that the joint distributions $(x, \lambda) \mapsto P[\Lambda = \lambda | X = x]\, P_j[X = x]$ satisfy both the mediation and redundancy conditions for each $j$. We'll start with redundancy, because it's simpler.
Redundancy
We can express the approximation error of the redundancy condition with respect to $X_i$ under the mixed distribution as
$D_{KL}(P[\Lambda, X] \,\|\, P[X]\, P[\Lambda | X_i]) = \mathbb{E}_X\!\left[D_{KL}(P[\Lambda | X] \,\|\, P[\Lambda | X_i])\right]$
where, recall, $P[\Lambda, X] := P[\Lambda | X] \sum_j \alpha_j P_j[X]$.
We can rewrite that approximation error as:
$\mathbb{E}_X\!\left[D_{KL}(P[\Lambda | X] \,\|\, P[\Lambda | X_i])\right]$
$= \sum_j \alpha_j \sum_X P_j[X]\, D_{KL}(P[\Lambda | X] \,\|\, P[\Lambda | X_i])$
$= \sum_j \alpha_j\, \mathbb{E}^j_X\!\left[D_{KL}(P[\Lambda | X] \,\|\, P[\Lambda | X_i])\right]$
Note that $P_j[\Lambda | X] = P[\Lambda | X]$ is the same under all the distributions (by definition), so:
$= \sum_j \alpha_j\, D_{KL}(P_j[\Lambda, X] \,\|\, P_j[X]\, P[\Lambda | X_i])$
and by factorization transfer:
$\geq \sum_j \alpha_j\, D_{KL}(P_j[\Lambda, X] \,\|\, P_j[X]\, P_j[\Lambda | X_i])$
In other words: if $\epsilon^j_i$ is the redundancy error with respect to $X_i$ under distribution $j$, and $\epsilon_i$ is the redundancy error with respect to $X_i$ under the mixed distribution $P$, then
$\epsilon_i \geq \sum_j \alpha_j\, \epsilon^j_i$
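As a numerical sanity check on that inequality, here is a small, self-contained Python script. It is my own toy illustration, not from the post: it builds two component distributions over a pair of variables $X = (X_1, X_2)$, uses a shared conditional $P[\Lambda | X]$, and confirms that the redundancy error of the mixture upper-bounds the $\alpha$-weighted average of the per-component errors:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, nl = 3, 3, 2  # sizes of X1, X2, and Lambda

def random_dist(shape):
    p = rng.random(shape)
    return p / p.sum()

# Two component distributions over X = (X1, X2), plus a *shared* conditional P[Lambda | X].
P1 = random_dist((n1, n2))
P2 = random_dist((n1, n2))
P_lam_given_x = rng.random((n1, n2, nl))
P_lam_given_x /= P_lam_given_x.sum(axis=-1, keepdims=True)

alpha = np.array([0.3, 0.7])
P_mix = alpha[0] * P1 + alpha[1] * P2

def redundancy_error(P_x):
    """E_X[ D_KL( P[Lambda|X] || P[Lambda|X_1] ) ], where the expectation and
    P[Lambda|X_1] are computed under P_x, and P[Lambda|X] is the shared conditional."""
    joint = P_x[..., None] * P_lam_given_x               # P[x1, x2, lambda]
    p_x1 = joint.sum(axis=(1, 2))                        # P[x1]
    p_lam_given_x1 = joint.sum(axis=1) / p_x1[:, None]   # P[lambda | x1]
    kl = (P_lam_given_x * np.log(P_lam_given_x / p_lam_given_x1[:, None, :])).sum(axis=-1)
    return (P_x * kl).sum()

eps_mix = redundancy_error(P_mix)
eps_avg = alpha[0] * redundancy_error(P1) + alpha[1] * redundancy_error(P2)
print(f"mixture redundancy error:  {eps_mix:.4f}")
print(f"weighted component errors: {eps_avg:.4f}")
assert eps_mix >= eps_avg - 1e-12  # epsilon_i >= sum_j alpha_j * epsilon_i^j
```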
The redundancy error of the mixed distribution is a...

Aug 22, 2024 • 18min
LW - Measuring Structure Development in Algorithmic Transformers by Micurie
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Measuring Structure Development in Algorithmic Transformers, published by Micurie on August 22, 2024 on LessWrong.
tl;dr: We compute the evolution of the local learning coefficient (LLC), a proxy for model complexity, for an algorithmic transformer. The LLC decreases as the model learns more structured solutions, such as head specialization.
This post is structured in three main parts: (1) a summary, giving an overview of the main results; (2) the Fine Print, which delves into various cross-checks and details; and (3) Discussion and Conclusions.
Structure Formation in Algorithmic Transformers
In this work we study the development of simple algorithmic transformers, which are transformers that learn to perform algorithmic tasks. A major advantage of this setup is that we can control several (hyper)parameters, such as the complexity of the training data and network architecture. This allows us to do targeted experiments studying the impacts of these parameters on the learning dynamics.
The main tool we use to study the development is the Local Learning Coefficient (LLC) and we choose cases where we have a reverse-engineered solution.
Why use the LLC for this purpose? It is a theoretically well-motivated measure of model complexity defined by Lau et al. For an overview of Singular Learning Theory (which serves as the theoretical foundation for the LLC) see Liam Carroll's Distilling SLT sequence. For a brief overview of the LLC see e.g. this post.
We use the same setup as CallumMcDougall's October Monthly Algorithmic Mech-Interp Challenge. The model is an attention-only transformer, trained to sort numbers, with layer norm and weight decay, using a cross-entropy loss and the Adam optimizer. The residual stream size is 96 and the head dimension is 48. It is trained on sequences of the form [unsorted list, SEP, sorted list], and to predict the next token starting at the separation token. The numbers in the list are sampled uniformly from 0 to 50, which together with the separation token produce a vocabulary of 52 tokens. Numbers do not repeat in the list.
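For concreteness, here is a minimal sketch of how training sequences for this sorting task could be generated. This is my own illustration rather than the challenge's actual code, and the list length and SEP token id are assumptions:

```python
import numpy as np

NUM_TOKENS = 51   # number tokens 0..50
SEP = 51          # separation token id (assumed), giving a 52-token vocabulary
LIST_LEN = 10     # assumed list length; not stated in the post

def make_batch(batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Sequences of the form [unsorted list, SEP, sorted list]. The model is
    trained to predict the tokens after SEP, i.e. the sorted list."""
    rows = []
    for _ in range(batch_size):
        xs = rng.choice(NUM_TOKENS, size=LIST_LEN, replace=False)  # no repeats in the list
        rows.append(np.concatenate([xs, [SEP], np.sort(xs)]))
    return np.stack(rows)  # shape (batch_size, 2 * LIST_LEN + 1)

print(make_batch(4, np.random.default_rng(0)))
```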
1-Head Model
Let's first look at the case of a 1-head transformer:
The model reaches 100% accuracy around training step 100, confirming that a single attention head is sufficient for sorting, as noted in previous work. Once maximum accuracy is reached, the full QK and OV circuits[2] behave as described by Callum for the 2-head model:
In the QK circuit, source tokens attend more to the smallest token in the list larger than themselves. This results in a higher value band above the diagonal and a lower value band below the diagonal.
The OV circuit copies tokens, as seen by the clear positive diagonal pattern.
In addition, we observe a transition around training step 1000, where the LLC decreases while the accuracy stays unchanged. This is supported by a drop in the sum of the ranks[3] of the matrices in the heat maps.
It also coincides with the formation of the off-diagonal stripes in the OV-circuit. We speculate that these are simpler than the noisier off-diagonal OV pattern observed at peak LLC, and correspond to the translational symmetry of the problem. We define a Translational Symmetry measure[1] (see purple line in the plot) to capture the degree to which the circuits obey this symmetry. It increases throughout most of the training, even after the other measures stabilize.
2-Head Model
Let's now turn our attention to the 2-head transformer in Callum's original setup.
We see a lot of qualitative similarities to the evolution of the full QK and OV circuits for the 1-head model. As the LLC begins to drop (around training step 1000), we note the following:
QK circuit: Slight changes[5] to the attention pattern, which crystallize into triangular regions late in the training, long after the LLC has stabilized.
OV circuit: The heads specialize, splittin...

Aug 22, 2024 • 18min
AF - Measuring Structure Development in Algorithmic Transformers by Jasmina Nasufi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Measuring Structure Development in Algorithmic Transformers, published by Jasmina Nasufi on August 22, 2024 on The AI Alignment Forum.
tl;dr: We compute the evolution of the local learning coefficient (LLC), a proxy for model complexity, for an algorithmic transformer. The LLC decreases as the model learns more structured solutions, such as head specialization.
This post is structured in three main parts: (1) a summary, giving an overview of the main results; (2) the Fine Print, which delves into various cross-checks and details; and (3) Discussion and Conclusions.
Structure Formation in Algorithmic Transformers
In this work we study the development of simple algorithmic transformers, which are transformers that learn to perform algorithmic tasks. A major advantage of this setup is that we can control several (hyper)parameters, such as the complexity of the training data and network architecture. This allows us to do targeted experiments studying the impacts of these parameters on the learning dynamics.
The main tool we use to study the development is the Local Learning Coefficient (LLC) and we choose cases where we have a reverse-engineered solution.
Why use the LLC for this purpose? It is a theoretically well-motivated measure of model complexity defined by Lau et al. For an overview of Singular Learning Theory (which serves as the theoretical foundation for the LLC) see Liam Carroll's Distilling SLT sequence. For a brief overview of the LLC see e.g. this post.
We use the same setup as CallumMcDougall's October Monthly Algorithmic Mech-Interp Challenge. The model is an attention-only transformer, trained to sort numbers, with layer norm and weight decay, using a cross-entropy loss and the Adam optimizer. The residual stream size is 96 and the head dimension is 48. It is trained on sequences of the form [unsorted list, SEP, sorted list], and to predict the next token starting at the separation token. The numbers in the list are sampled uniformly from 0 to 50, which together with the separation token produce a vocabulary of 52 tokens. Numbers do not repeat in the list. The images making up the gifs can be found here.
1-Head Model
Let's first look at the case of a 1-head transformer:
The model reaches 100% accuracy around training step 100, confirming that a single attention head is sufficient for sorting, as noted in previous work. Once maximum accuracy is reached, the full QK and OV circuits[2] behave as described by Callum for the 2-head model:
In the QK circuit, source tokens attend more to the smallest token in the list larger than themselves. This results in a higher value band above the diagonal and a lower value band below the diagonal.
The OV circuit copies tokens, as seen by the clear positive diagonal pattern.
In addition, we observe a transition around training step 1000, where the LLC decreases while the accuracy stays unchanged. This is supported by a drop in the sum of the ranks[3] of the matrices in the heat maps.
It also coincides with the formation of the off-diagonal stripes in the OV-circuit. We speculate that these are simpler than the noisier off-diagonal OV pattern observed at peak LLC, and correspond to the translational symmetry of the problem. We define a Translational Symmetry measure[1] (see purple line in the plot) to capture the degree to which the circuits obey this symmetry. It increases throughout most of the training, even after the other measures stabilize.
2-Head Model
Let's now turn our attention to the 2-head transformer in Callum's original setup.
We see a lot of qualitative similarities to the evolution of the full QK and OV circuits for the 1-head model. As the LLC begins to drop (around training step 1000), we note the following:
QK circuit: Slight changes[5] to the attention pattern, which crystallize into triangular regions late in the training, long aft...

Aug 22, 2024 • 3min
EA - Crowdsourcing The Best Ideas for Structural Democratic Reforms by Izzy Gainsburg
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Crowdsourcing The Best Ideas for Structural Democratic Reforms, published by Izzy Gainsburg on August 22, 2024 on The Effective Altruism Forum.
Hello EA community! My lab recently launched the Structural Democratic Reforms project, which is using crowdsourcing, expert evaluation, and messaging testing to identify the best democratic reforms to safeguard and strengthen American democracy. We're in the crowdsourcing phase, and we'd love to get folks from the EA community to submit their ideas for democratic reforms!
Here's why I'm especially keen on getting ideas from those in the EA community: Right now, our submitters are disproportionately academic political scientists. By tapping into the EA community, we'll hopefully have a more diverse set of ideas from a more diverse group of thinkers.
And because EA folks typically push for the most effective solutions--often with creative or unconventional ideas that others might miss--I thought it would be especially interesting and important to crowdsource from this group.
Submissions can be made here and can take under 5 minutes. The deadline for submissions is September 15, 2024. Multiple submissions are welcome!
Why This Matters
A healthy American democracy underlies many EA cause areas--it is potentially important for promoting world peace, AI safety, economic prosperity, technological development, human rights, and more. For a more in-depth discussion of why democracy is relevant to EA, you can read 80,000 Hours' brief cause area overview.
What We're Looking For
We're seeking ideas for structural democratic reforms that could be implemented via federal legislation, state legislation, executive order, or ballot initiative. These reforms should promote one or more of the following democratic principles:
Increasing citizens' influence on election and policy outcomes
Facilitating voter participation and ballot access
Ensuring integrity, transparency, and fairness of election systems
We welcome both established ideas and novel, creative solutions.
Spread the Word
Feel free to share this call for ideas widely! The more diverse perspectives we gather, the better. You can share this EA forum post or retweet our announcement on Twitter.
What Happens Next
After the submission period, an expert panel will evaluate these reforms on several dimensions (e.g., normative desirability, political viability). The most promising ideas will be further researched and potentially tested for public support and effective messaging strategies. See the project plan below.
We're Grateful for Your Input!
Your participation will make this project stronger! Every submission matters--conventional ideas point towards potential consensus, and unconventional ideas may illuminate hidden gems. More ideas (and more diverse ideas) will also give us confidence that we did our best at exhausting the ideas space. We're truly thankful for anyone who participates and/or spreads the word!
Finally, if you have feedback for the project, feel free to comment on the post or send us an email at democratic-reforms@stanford.edu.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Aug 22, 2024 • 7min
LW - Please do not use AI to write for you by Richard Kennaway
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Please do not use AI to write for you, published by Richard Kennaway on August 22, 2024 on LessWrong.
I've recently seen several articles here that were clearly generated or heavily assisted by AI. They are all dreadful. They are verbose, they are full of "on the one hand" and "on the other", they never make any assertion without also making room for the opposite, and end with "conclusions" that say nothing. Please do not do this.
I wrote the above, unassisted, and it states my own views. To illustrate the point I then had GPT-4o rewrite it. Here was the prompt:
I have recently seen a substantial increase in the proportion of postings to a certain online forum that were clearly generated by AI, although not marked as such. The writing is verbose and meandering, full of "on the one hand" and "on the other", never making an assertion without also suggesting its opposite, sounding oh-so-measured and even-handed that it ends up saying nothing at all.
Please write an article for me, written in the first person, arguing that AI tools in the current state of the art should never be used for this purpose.
Below the line is the chatbot's extruded meat product. This is the whole of its first output, unedited, and I did not press it to continue.
The first paragraph starts out ok, being merely verbose and repetitive but setting out the basic claim. And then the second paragraph backpedals it past where it began. The third paragraph floats off into the cloud layer, and so on for a total of nine paragraphs and 875 words that would be a waste of anyone's time to read, unless they want to see if their judgement agrees with mine.
It did not notice that I was using it in exactly the way I had asked it to argue against, and that the article displays all the faults that it purports to diagnose.
The splash image is a photograph from Wikipedia, CC BY-SA 3.0. I'll spare you the image that GPT-4o came up with, but y'know, I do find the AI illustrations that everyone heads their blog posts with these days pretty annoying. (ETA: Well, there was supposed to be an image, I filled out the "link preview" part of the entry form, but no image. But the link above will show you what it was.)
Why AI-Generated Content Shouldn't Flood Online Forums
Lately, I've noticed something troubling in the online forums I frequent: a distinct uptick in posts that seem like they've been generated by AI, though they're not labeled as such. The language is eerily similar across many of these posts - verbose, wandering, and conspicuously non-committal. Every argument is hedged with a counterpoint, every statement cushioned by a qualifier, making the writing sound balanced on the surface but ultimately empty.
The posts stretch on endlessly, piling one vague point onto another, yet they fail to really say anything. It's frustrating, to say the least, and I believe this trend reveals a serious problem with the way AI is being misused in online spaces.
I want to be clear: I'm not arguing that AI lacks merit as a tool. It has immense potential in fields like data analysis, automation, and even creativity when it complements human intelligence. However, when it comes to contributing to online discourse - spaces where real people go to express ideas, ask questions, and engage in genuine conversation - I believe AI-generated content, at least in its current form, does more harm than good.
First, let's talk about the nature of conversation itself. Forums, at their best, are places where people share perspectives, debate ideas, and solve problems together. What makes these interactions meaningful is that they are infused with human experience, emotion, and authenticity. Even when disagreements arise, they are rooted in real personal stakes - whether it's a passion for a hobby, a struggle to understand a concept, or a desire to ...