

The Nonlinear Library: LessWrong
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Aug 12, 2024 • 10min
LW - Rowing vs steering by Saul Munn
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Rowing vs steering, published by Saul Munn on August 12, 2024 on LessWrong.
Alex Lawsen used a great metaphor on the 80k After Hours podcast:[1]
[1:38:14] …you're rowing a boat on your own, and you're trying to get somewhere, and you've got some map that you need to look at to see where you're going, I imagine like a map and compass. […] When you're rowing, you're facing back; you can't see where you're going. You've just got to sit there and pull both of the oars, and do that a bunch of times, and then the boat goes forwards. […] You steer [… by pulling] harder with one side, something like that.
I can imagine […] you sitting forwards in the boat, and trying to hold the map with your left hand while it's gripping one oar, and trying to hold the compass with your right hand while it's gripping the other; pushing them rather than pulling them while looking at where you're going; so you're always precisely on track, but my guess is you're just going to go super slowly, because that's not how to row a boat.
Whereas you can imagine someone else, maybe someone that's racing you, who is going to point the boat in pretty much the right direction - they're not exactly sure it's the right direction, and they might go a bit off course. And then they go, "Cool. I'm going to row hard for a minute, and then I'm going to stop and check I'm pointing in the right direction, and then I'm going to row hard for another minute."
[1:37:56] The metaphor is trying to point at … the strategy, [which] is pretty clear: gather some information, make a decision with that information, stick to that decision for some period of time that you've planned in advance, and then reevaluate, gather some more information, and then make a new decision.
[1:35:58] … you [should] stick to some policy, which is like: "I'm going to look at a bunch of things, I'm going to actually seriously consider my options. And then, with all of the information I have, I'm going to make a decision. And I'm going to make the decision to do the thing that seems best for some fixed period of time. At the end of that fixed period of time, then I will consider other options."
[1:47:43] … if you think expected value is a reasonable framework to use, … then I do actually want to say: I think having this kind of policy is actually the thing that seems best in expectation.
[1:41:21] … I think some people … they start doing a thing, and then they're so worried about whether it's the best, that they're just miserable, and they never find out if it is the best thing for them because they're not putting all of their effort in, because they've got one foot out of the door because they think something else could be better.
When you're in a rowboat, you don't want to be constantly rowing (and never steering), nor constantly steering (and never rowing). But there's also an in-between state that's still a failure mode, where you're trying to half-row and half-steer all at the same time.
You'd be way better off by purely rowing for a bit, then purely steering for a bit, then back and forth again, but it causes anxiety to purely row without steering ("what if I'm rowing in the wrong direction!"), and it causes less forward progress to purely steer with no rowing ("I'm not even moving!"). So Alex's solution is to set a policy that looks something like: "For the next minute, I'm going to row hard. After sixty seconds, I'll turn around and steer. But for the next sixty seconds, I'm not even going to consider that I'm rowing in the wrong direction, because I'm in rowing mode, not steering mode."
And importantly, having the knowledge that you'll be correcting your course sixty seconds from now makes it so much less anxiety-inducing to purely row for sixty seconds straight.
I've used this in situations where it's costly to be thinking about how best ...

Aug 10, 2024 • 17min
LW - All The Latest Human tFUS Studies by sarahconstantin
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: All The Latest Human tFUS Studies, published by sarahconstantin on August 10, 2024 on LessWrong.
Transcranial focused ultrasound neuromodulation - altering the brain's activity with low-intensity ultrasound - is really exciting.
It allows us to manipulate arbitrary regions of the brain without surgery, potentially replacing the (brain-damaging) electrode implants currently used for serious neurological conditions like epilepsy and Parkinson's, and potentially also expanding applications of brain stimulation to milder conditions not worth the risks of brain surgery, like mental illness, addiction, or chronic pain.
The field is rapidly growing, and since I wrote my earlier post series there have been quite a few human studies published. Here's a systematic overview of all the human studies published in 2024, by target brain region.
Headline Results
This year's papers further confirm, to start with, that ultrasound does things to brain activity, if that was still in doubt, and that it is safe enough to run human experiments with (no adverse effects during experiments with small numbers of participants and brief exposures).
There are notably inconsistent results in whether targeting ultrasound to a given brain area increases or decreases neural activity in that area, even in some cases when the same area is targeted with the same sonication parameters! We clearly need to get a better sense of what ultrasound even does.
Most studies don't do the obvious (but admittedly expensive) thing of confirming a change in neural activity via a noninvasive measure like fMRI. Those that do, show different results (more activity in the targeted region, less activity in the targeted region, or neither) depending on which region is targeted; this tells us that "tFUS" as a class doesn't have a globally consistent effect on targeted neural activity. Again, still more to learn.
However, despite the primitive state of our understanding of this modality, we do already seem to have some strikingly useful results. Ultrasound stimulation of the thalamus seems to be helpful for essential tremor, stimulation of the posterior insula seems to reduce pain sensitivity, and stimulation of the anterior medial prefrontal cortex seems to have quite strong effects on depression. These are before vs. after results without a control group, not randomized controlled studies, but I think they at least warrant followup.
I'm not as excited as I'd want to be about Jay Sanguinetti's default-mode-network-inhibition study. The effects seem subtle and game-able; and anecdotally the stories I hear from people who've tried the protocol from his lab are not "I was in a clearly altered state".
But all in all, it continues to be a promising field; tFUS clearly does things, some of those things may be useful, and the more data we get, the closer we'll get to an actual model of what it does.
Amygdala
Chou et al.[1] at Harvard Medical School tested tFUS[2] on the left amygdalas of 30 healthy volunteers. Compared to sham stimulation, tFUS resulted in less fMRI-measured activity in the amygdala.
The amygdala is involved in fear responses, so reducing amygdala activity could have uses in anxiety disorders and phobias.
Hoang-Dang et al.[3] at UCLA used tFUS[4] on the right amygdala of 21 older adults, and found no effect on state anxiety after tFUS, but did show an increase in negative emotional reaction to viewing negative images. There was also a significant increase in heart rate between trials of this mildly stressful task.
Since the amygdala is usually active during fear, this suggests that these stimulation parameters may have activated the amygdala…despite the other study using similar parameters and showing a direct decrease in amygdala activity. The UCLA study doesn't mention the duration of tFUS stimulation, which may be a re...

Aug 10, 2024 • 13min
LW - Provably Safe AI: Worldview and Projects by bgold
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Provably Safe AI: Worldview and Projects, published by bgold on August 10, 2024 on LessWrong.
In September 2023, Max Tegmark and Steve Omohundro proposed "Provably Safe AI" as a strategy for AI Safety. In May 2024, a larger group delineated the broader concept of "Guaranteed Safe AI", which includes Provably Safe AI and other related strategies. In July 2024, Ben Goldhaber and Steve discussed Provably Safe AI and its future possibilities, as summarized in this document.
Background
In June 2024, ex-OpenAI AI Safety Researcher Leopold Aschenbrenner wrote a 165-page document entitled "Situational Awareness, The Decade Ahead" summarizing AI timeline evidence and beliefs which are shared by many frontier AI researchers. He argued that human-level AI is likely by 2027 and will likely lead to superhuman AI in 2028 or 2029.
"Transformative AI" was coined by Open Philanthropy to describe AI which can "precipitate a transition comparable to the agricultural or industrial revolution". There appears to be a significant probability that Transformative AI may be created by 2030. If this probability is, say, greater than 10%, then humanity must immediately begin to prepare for it.
The social changes and upheaval caused by Transformative AI are likely to be enormous. There will likely be many benefits but also many risks and dangers, perhaps even existential risks for humanity. Today's technological infrastructure is riddled with flaws and security holes. Power grids, cell service, and internet services have all been very vulnerable to accidents and attacks. Terrorists have attacked critical infrastructure as a political statement.
Today's cybersecurity and physical security barely keep human attackers at bay. When these groups obtain access to powerful cyberattack AIs, they will likely be able to cause enormous social damage and upheaval.
Humanity has known how to write provably correct and secure software since Alan Turing's 1949 paper. Unfortunately, proving program correctness requires mathematical sophistication and it is rare in current software development practice. Fortunately, modern deep learning systems are becoming proficient at proving mathematical theorems and generating provably correct code.
When combined with techniques like "autoformalization," this should enable powerful AI to rapidly replace today's flawed and insecure codebase with optimized, secure, and provably correct replacements. Many researchers working in these areas believe that AI theorem-proving at the level of human PhDs is likely about two years away.
Similar issues plague hardware correctness and security, and it will be a much larger project to replace today's flawed and insecure hardware. Max and Steve propose formal methods grounded in mathematical physics to produce provably safe physical designs. The same AI techniques which are revolutionizing theorem proving and provable software synthesis are also applicable to provable hardware design.
Finally, today's social mechanisms like money, contracts, voting, and the structures of governance, will also need to be updated for the new realities of an AI-driven society. Here too, the underlying rules of social interaction can be formalized, provably effective social protocols can be designed, and secure hardware implementing the new rules synthesized using powerful theorem proving AIs.
What's next?
Given the huge potential risk of uncontrolled powerful AI, many have argued for a pause in Frontier AI development. Unfortunately, that does not appear to be a stable solution. Even if the US paused its AI development, China or other countries could gain an advantage by accelerating their own work.
There have been similar calls to limit the power of open source AI models. But, again, any group anywhere in the world can release their powerful AI model weig...

Aug 9, 2024 • 4min
LW - FarmKind's Illusory Offer by jefftk
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: FarmKind's Illusory Offer, published by jefftk on August 9, 2024 on LessWrong.
While the effective altruism movement has changed a lot over time, one of the parts that makes me most disappointed is the steady creep of donation matching. It's not that donation matching is objectively very important, but the early EA movement's principled rejection of a very effective fundraising strategy made it clear that we were committed to helping people understand the real impact of their donations.
Over time, as people have specialized into different areas of EA, with community-building and epistemics being different people from fundraising, we've become less robust against the real-world incentives of "donation matching works".
Personally, I would love to see a community-wide norm against EA organizations setting up donation matches. Yes, they bring in money, but at the cost of misleading donors about their impact and unwinding a lot of what we, as a community, are trying to build. [1] To the extent that we do have them, however, I think it's important that donors understand how the matching works.
And not just in the sense of having the information available on a page somewhere: if most people going through your regular flow are not going to understand roughly what the effect of their choices is, you're misleading people.
Here's an example of how I don't think it should be done:
I come to you with an offer. I have a pot with $30 in it, which will go to my favorite charity unless we agree otherwise. If you're willing to donate $75 to your favorite charity and $75 to mine, then I'm willing to split my $30 pot between the two charities.
How should you think about this offer? As presented, your options are:
Do nothing, and $30 goes from the pot to my favorite charity.
Take my offer, and:
$75 goes from your bank account to your favorite charity
$75 goes from your bank account to my favorite charity
$15 leaves the pot for your favorite charity
$15 leaves the pot for my favorite charity
While this looks nice and symmetrical, satisfying some heuristics for fairness, I think it's clearer to (a) factor out the portion that happens regardless and (b) look at the net flows of money. Then if you take the offer:
$150 leaves your bank account
$90 goes to your favorite charity
$60 goes to my favorite charity
If I presented this offer and encouraged you to take it because of my "match", that would be misleading. While at a technical level I may be transferring some of my pot to your favorite charity, it's only happening after I'm assured that a larger amount will go to mine: you're not actually influencing how I spend my pot in any real sense.
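As a quick check of that arithmetic, here's a minimal sketch (my own illustrative code, not from the post) that computes the net flows of taking the offer relative to doing nothing:

```python
# Sketch of the net-flow analysis for the $30-pot offer described above.
# The numbers come from the example in the post; the function name is illustrative.

def net_flows(your_gift=75, their_gift=75, pot=30):
    """Compare 'do nothing' against 'take the offer' in terms of net money moved."""
    # Baseline: the whole pot goes to the matcher's favorite charity anyway.
    baseline = {"your_charity": 0, "their_charity": pot, "your_bank": 0}

    # Taking the offer: you give $75 to each charity, and the pot is split 50/50.
    take = {
        "your_charity": your_gift + pot / 2,
        "their_charity": their_gift + pot / 2,
        "your_bank": -(your_gift + their_gift),
    }

    # Net effect of your decision = (take the offer) minus (do nothing).
    return {k: take[k] - baseline[k] for k in take}

print(net_flows())
# {'your_charity': 90.0, 'their_charity': 60.0, 'your_bank': -150}
```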
Which is why I'm quite disappointed that Charity Entrepreneurship, after considering these arguments, decided to build FarmKind:
This is essentially a white-labeled GivingMultiplier. [2] It's not exactly the same, in part because it has a more complex function for determining the size of the match, [3] but it continues to encourage people to give by presenting the illusion that the donor is influencing the matcher to help fund the donor's favorite charity.
While setting up complex systems can cause people to donate more than they would otherwise, we should not be optimizing for short-term donations at the expense of donor agency.
I shared a draft of this post with FarmKind and GivingMultiplier for review before publishing, and before starting this post I left most of these points as comments on the EA Forum announcement.
[1] I think participating in existing donation match systems is generally fine, and often a good idea. I've used employer donation matching and donated via Facebook's Giving Tuesday match, and at a previous employer fundraised for GiveWell's top charities through their matching system.
In the latter case, in my fundraising I explicitly ...

Aug 9, 2024 • 3min
LW - Outrage Bonding by Jonathan Moregård
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Outrage Bonding, published by Jonathan Moregård on August 9, 2024 on LessWrong.
I stopped following the news back when Trump (first?)[1] got elected. The amount of attention put on a foreign[2] election was staggering, with normal media saccades replaced by Monk-level single-mindedness. Trump was the permanent spotlight for months.
The media's fixation on Trump had interesting downstream effects. My peer groups - normally a dynamic bunch - turned into a bunch of snide gossipmongers. Every day was Trump-day, with shared outrage being the primary source of connection. People scored points by retelling outrageous news, parroting hot takes and sharing edgy memes.
Focusing on judgment and outrage was unhealthy for me. I got addicted to the drama, allowing outrage to outcompete healthy forms of relating. I felt disconnected from my friends, got irritated more often, and had an increase in pessimistic thought patterns.
Around this time, I had a coworker who was always grumpy - always complaining about this or that. He was also quite old. I used to wonder if he had once been happier - but then practised grumpiness a lot. It takes some repetition to get to his level of mastery.
One day, the situation got too much for me. I decided that I didn't want to become a bitter old man - and that I needed to disengage from the outrage-bonding going on in my social circles.
Having stopped following the news, the next step wasn't hard - I made a hard commitment to not put energy into outrage-bonding. Whenever people started complaining together, I responded by:
Zoning out, ignoring the topic
Asking the group to shift the focus, explaining that I didn't like the way outrage shaped my being
Walking away
At first, people didn't like it. Bringing up the negative consequences of other people's unhealthy habits is generally frowned upon - even if it's done indirectly. If done in a judgemental way, it can be seen as a social manoeuvring move - a subtle claim that I'm better (more healthy) than others.
Luckily, I care little for social signalling games. I forged ahead - and managed to shift the group dynamics I interacted with. Sometimes, a strong-headed minority can have a lot of impact.
Now, shit is about to hit the fan. The US elections are scheduled for November, and the drama is already building. The news will turn increasingly single-minded, and you are likely to find yourself in outrage-oriented social contexts. You can choose to hand over your attention and mood to a drama-oriented culture war - or you can do your best to break free.
Come join me living under a rock, it's cosy here.
[1] I'm joking! I know Trump hasn't been reelected yet. I get news through conversations with friends, and know most important things early on - like covid, the Ukraine invasion, the Gaza conflict, etc.
[2] I live in Sweden, even though my online life is weirdly US-centric.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Aug 9, 2024 • 2min
LW - Parasites (not a metaphor) by lukehmiles
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Parasites (not a metaphor), published by lukehmiles on August 9, 2024 on LessWrong.
A few of my new-parent friends started having terrible gut problems and figured they had colon cancer. Their doctors agreed it was the first thing to check. But the colonoscopies were negative for colon cancer. The tissue was inflamed though. One doctor called this "pre-cancer" (??)
Hmm what could be causing inflammation in the colon, but wouldn't show up on camera after you fasted and had medically-induced diarrhea for 24 hours?
The babies were born over a year before symptoms appeared, so it can't be related to pregnancy. No change in diet. No family history.
What happens a year or two after a kid is born? They go outside and immediately eat as much dirt as they can. What lives in dirt? Everything!
Me and my boy's mother ate some combantrin 3 months ago and have been clear since.
I'm currently trying to convince my friends that they didn't all get colon cancer in the same year at a young age. If I get them to eat the poison chocolate, then I'll write a follow up post in a few months.
I've actually had some very odd food issues since 2019 (eg seizures & fainting after garlic) which disappeared since the combantrin.
So if you randomly got food/gut/brain issues one day years ago you should consider taking a dewormer. Note that all the tests suck (insensitive) and the medicine is cheap and safe and sold online without prescription. (Albendazole available too but slightly less safe.) Worms are much easier to kill than bacteria, viruses, fungus, etc.
Also note that at least 5 million people in the US (ie 1.5%) have parasites according to the most conservative estimates offered here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7847297/
This seems to be a blind spot. No doctors or friends or families ever considered this or even mentioned the word "parasite" to me in the last 5 years.
Kind of funny that dewormers took off in poor countries but not here.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Aug 9, 2024 • 8min
LW - The Hessian rank bounds the learning coefficient by Lucius Bushnaq
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Hessian rank bounds the learning coefficient, published by Lucius Bushnaq on August 9, 2024 on LessWrong.
TL;DR: In a neural network with $d$ parameters, the (local) learning coefficient $\lambda$ can be upper and lower bounded by the rank of the network's Hessian $d_1$:
$$\frac{d_1}{2} \le \lambda \le \frac{d_1}{2} + \frac{d - d_1}{3}.$$
The lower bound is a known result. The upper bound is a claim by me, and this post contains the proof for it.[1] If you find any problems, do point them out.
Introduction
The learning coefficient λ is a measure of loss basin volume and network complexity. You can think of it sort of like an effective parameter count of the model. Simpler models that do less stuff have smaller λ.
Calculating λ for real networks people actually use is a pain. My hope is that these bounds help make estimating it a bit easier.
In a network with $d$ parameters, the learning coefficient is always a number
$$0 \le \lambda \le \frac{d}{2}.$$
An existing result in the literature says that if you've calculated the rank of the network's Hessian $d_1$,[2] you get a tighter lower bound
$$\frac{d_1}{2} \le \lambda.$$
I claim that we can also get a tighter upper bound
$$\lambda \le \frac{d_1}{2} + \frac{d - d_1}{3},$$
where $d - d_1$ will be the dimension of the Hessian kernel, meaning the number of zero eigenvalues it has.[3]
This is neat because it means we can get some idea of how large λ is just with linear algebra. All we need to know is how many zero eigenvalues the Hessian has.[4] Singular Learning Theory introductions often stress that just looking at the Hessian isn't enough to measure basin volume correctly. But here we see that if you do it right, the Hessian eigenspectrum can give you a pretty good idea of how large λ is. Especially if there aren't that many zero eigenvalues.
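As a toy illustration of that "just linear algebra" point, here is a minimal sketch (my own illustrative code, not from the post) of how one might estimate the bounds from the Hessian eigenspectrum, assuming a twice-differentiable loss and an arbitrary tolerance for calling an eigenvalue zero:

```python
import torch

def lambda_bounds_from_hessian(loss_fn, params, tol=1e-6):
    """Bound the local learning coefficient from the Hessian eigenspectrum.

    Uses the bounds discussed in the post:
        rank(H)/2  <=  lambda  <=  rank(H)/2 + (d - rank(H))/3
    """
    d = params.numel()
    # Dense Hessian of the loss at the current parameters (fine for toy models;
    # real networks would need Hessian-vector products or sketching instead).
    hessian = torch.autograd.functional.hessian(loss_fn, params).reshape(d, d)
    eigvals = torch.linalg.eigvalsh(hessian)
    rank = int((eigvals.abs() > tol).sum())  # d_1: number of non-zero eigenvalues
    lower = rank / 2
    upper = rank / 2 + (d - rank) / 3
    return lower, upper

# Toy example: K(w) = w_1^2, which ignores w_2 entirely.
w = torch.zeros(2, requires_grad=True)
print(lambda_bounds_from_hessian(lambda v: v[0] ** 2, w))  # (0.5, 0.8333...)
```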
Intuitively, the lower bound works because a direction in the parameters $w$ that isn't free to vary to second order in the Taylor expansion won't become any more free to vary if you pile on a bunch of higher-order terms. The second-order term strictly dominates the higher-order ones; they can't cancel it out.
Qualitatively speaking, the upper bound works for the same reason. The higher-order terms in the Taylor expansion of the loss can only matter so much. The Hessian is the leading term, so it can impact $\lambda$ the most, adding $\frac{1}{2}$ per Hessian rank to it. The remaining $O(w^3)$ terms can only add up to $\frac{1}{3}$ for the remaining directions.
The proof for the upper bound will just be a small modification of the proof for theorem 7.2 on pages 220 and 221 of Algebraic Geometry and Statistical Learning Theory. Maybe read that first if you want more technical context.
Some words on notation
In the following, I'll mostly stick to the notation and conventions of the book Algebraic Geometry and Statistical Learning Theory. You can read about all the definitions there. I'm too lazy to reproduce them all.
To give some very rough context, K(w) is sort of like the 'loss' at parameter configuration w, φ(w) is our prior over parameters, and Z(n) is the partition function after updating on n data points.[5]
Theorem:
Let $W \subset \mathbb{R}^d$ be the set of parameters of the model. If there exists an open set $U \subset W$ such that
$$\{w \in U : K(w) = 0,\ \varphi(w) > 0\}$$
is not an empty set, and we define $d_1 = \operatorname{rank}(H)$ as the rank of the Hessian $H$ at a $w_0 \in U$,
$$H_{i,j} = \left.\frac{\partial^2 K(w)}{\partial w_i \partial w_j}\right|_{w = w_0},$$
with $w_i, w_j$ elements of some orthonormal basis $\{w_1, \ldots, w_d\}$ of $\mathbb{R}^d$, then
$$\lambda \le \frac{d_1}{2} + \frac{d - d_1}{3}.$$
Proof:
We can assume $w_0 = 0$ without loss of generality. If $\epsilon_1, \epsilon_2$ are sufficiently small constants,
$$Z(n) = \int \exp(-nK(w))\,\varphi(w)\,dw \ge \int_{|w^{(1)}| \le \epsilon_1,\, |w^{(2)}| \le \epsilon_2} \exp(-nK(w))\,\varphi(w)\,dw.$$
Here, $w^{(1)} \in W/\ker(H)$, $w^{(2)} \in \ker(H)$.
If we pick $\{w_1, \ldots, w_d\}$ to be the Hessian eigenbasis, then for sufficiently small $|w| > 0$
$$K(w) = \frac{1}{2}\sum_{i=1}^{d_1} H_{i,i}\, w^{(1)}_i w^{(1)}_i + O(|w|^3)\,.$$
Hence
$$Z(n) \ge \int_{|w^{(1)}| \le \epsilon_1,\, |w^{(2)}| \le \epsilon_2} \exp\Big\{-\frac{n}{2}\sum_i^{d_1} H_{i,i}\, w^{(1)}_i w^{(1)}_i - n\,O(|w|^3)\Big\}\,\varphi(w)\,dw.$$
Transforming $w'^{(1)} = n^{\frac{1}{2}} w^{(1)}$, $w'^{(2)} = n^{\frac{1}{3}} w^{(2)}$, we obtain
$$Z(n) \ge n^{-\frac{d_1}{2}}\, n^{-\frac{d - d_1}{3}} \int_{|w'^{(1)}| \le 1,\, |w'^{(2)}| \le 1} \exp\Big\{-\frac{1}{2}\sum_i^{d_1} H_{i,i}\, w'^{(1)}_i w'^{(1)}_i + O(|w'|^3)\Big\}\,\varphi\big(w'^{(1)} n^{-\frac{1}{2}},\, w'^{(2)} n^{-\frac{1}{3}}\big)\, dw'^{(1)}\, dw'^{(2)}.$$
Rearranging gives
$$Z(n) \ge n^{-\big(\frac{d_1}{2} + \frac{d - d_1}{3}\big)} \int_{|w'| \le 1} \exp\Big\{-\frac{1}{2}\sum_{i=1}^{d_1} H_{i,i}\, w'^{(1)}_i w'^{(1)}_i \ldots$$

Aug 9, 2024 • 4min
LW - GPT-4o System Card by Zach Stein-Perlman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT-4o System Card, published by Zach Stein-Perlman on August 9, 2024 on LessWrong.
At last. Yay OpenAI for publishing this. Highlights: some details on Preparedness Framework evals + evals (post-deployment) by METR and Apollo.
Preparedness framework evaluations
You should follow the link and read this section.
Brief comments:
Cyber: the setup sounds good (but maybe substantially more powerful scaffolding/prompting is possible). Separately, I wish OpenAI shared the tasks (or a small random sample of them) or at least said more about where they came from. (Recall that DeepMind shared CTF tasks.)
Bio uplift: GPT-4o clearly boosts users on biological threat creation tasks - OpenAI doesn't say that but shows a graph. (It continues to be puzzling that novices score similarly to experts.) (I kinda worry that this is the wrong threat model - most bio risk from near-future models comes from a process that looks pretty different from a bigger boost to users like these - but I don't have better ideas for evals.)
Persuasion: unclear whether substantially more powerful scaffolding/prompting is possible.
Autonomy: unclear whether substantially more powerful scaffolding/prompting is possible.
I'm looking forward to seeing others' takes on how good these evals are (given the information OpenAI published) and how good it would be for OpenAI to share more info.
Third party assessments
Following the text output only deployment of GPT-4o, we worked with independent third party labs, METR and Apollo Research[,] to add an additional layer of validation for key risks from general autonomous capabilities. . . .
METR ran a GPT-4o-based simple LLM agent on a suite of long-horizon multi-step end-to-end tasks in virtual environments. The 77 tasks (across 30 task "families") (See Appendix B) are designed to capture activities with real-world impact, across the domains of software engineering, machine learning, and cybersecurity, as well as general research and computer use. They are intended to be prerequisites for autonomy-related threat models like self-proliferation or accelerating ML R&D. METR compared models' performance with that of humans given different time limits. See METR's full report for methodological details and additional results, including information about the tasks, human performance, simple elicitation attempts and qualitative failure analysis. . . .
Apollo Research evaluated capabilities of scheming in GPT-4o. They tested whether GPT-4o can model itself (self-awareness) and others (theory of mind) in 14 agent and question-answering tasks. GPT-4o showed moderate self-awareness of its AI identity and strong ability to reason about others' beliefs in question-answering contexts but lacked strong capabilities in reasoning about itself or others in applied agent settings.
Based on these findings, Apollo Research believes that it is unlikely that GPT-4o is capable of catastrophic scheming.
This is better than nothing but pre-deployment evaluation would be much better.
Context
Recall how the PF works and in particular that "high" thresholds are alarmingly high (and "medium" thresholds don't matter at all).
Previously on GPT-4o risk assessment: OpenAI reportedly rushed the evals. The leader of the Preparedness team was recently removed and the team was moved under the short-term-focused Safety Systems team. I previously complained about OpenAI not publishing the scorecard and evals (before today it wasn't clear that this stuff would be in the system card).
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Aug 8, 2024 • 12min
LW - Some Unorthodox Ways To Achieve High GDP Growth by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some Unorthodox Ways To Achieve High GDP Growth, published by johnswentworth on August 8, 2024 on LessWrong.
GDP growth, as traditionally calculated, is a weird metric. People interpret it as measuring "economic growth", but… well, think about electronics. Electronics which would have cost millions of dollars (or more) in 1984 are now commonplace, everyone carries them around in their pockets. So if we calculate GDP growth based on 1984 prices, then GDP has grown multiple orders of magnitude since then, everyone now owns things which would make 1984's wealthiest people jealous, and practically all of that growth has come from electronics.
On the other hand, if we calculate GDP based on 2024 prices, then all of the digital electronics produced before, say, 2004 are worth almost nothing, so electronics contributed near-zero GDP growth throughout the entire internet boom.
Economists didn't like either of those conclusions, so back in the 90's, they mostly switched to a different way of calculating GDP growth: "chaining". Basically, we calculate 1984-1985 GDP growth using prices from 1984-1985, then 1985-1986 GDP growth using prices from 1985-1986, and so forth. At the end, we multiply them all together (i.e. "chain" the yearly growth numbers) to get a long-term growth line. Chaining gives less dramatic GDP growth numbers when technological changes make previously-expensive things very cheap.
Chaining also opens up some interesting new methods for achieving high GDP growth.
A Toy Example
Suppose we have two goods, A and B. Over the course of five years, the price of each good and the amount consumed evolve as follows:
| Year | Price A | Amount A | Price B | Amount B |
|------|---------|----------|---------|----------|
| 1    | $1      | 10       | $10     | 1        |
| 2    | $1      | 1        | $10     | 10       |
| 3    | $10     | 1        | $1      | 10       |
| 4    | $10     | 10       | $1      | 1        |
| 5    | $1      | 10       | $10     | 1        |
The main thing to notice about this table is that year 5 is exactly the same as year 1; our toy economy goes full-circle back to where it started.
Now let's calculate the GDP growth for this toy economy, using the same standard chaining method adopted by the Bureau of Economic Analysis for calculating US GDP back in the 90's.
Calculation details
To calculate the GDP growth from year t to year t+1, we calculate the ratio of year t+1 to year t consumption using year t prices, then calculate the ratio of year t+1 to year t consumption using year t+1 prices, then average those together using a geometric mean. So the formula is:
$$\Delta_{t \to t+1} = \sqrt{\frac{\sum_i p^i_t q^i_{t+1}}{\sum_i p^i_t q^i_t} \cdot \frac{\sum_i p^i_{t+1} q^i_{t+1}}{\sum_i p^i_{t+1} q^i_t}}$$
where:
i ranges over the goods (here A and B)
p is price
q is quantity
To get GDP growth over the whole timespan, we multiply together the growth for each year.
Here's the result:
| Year | Price A | Amount A | Price B | Amount B | GDP Growth |
|------|---------|----------|---------|----------|------------|
| 1    | $1      | 10       | $10     | 1        | —          |
| 2    | $1      | 1        | $10     | 10       | 5.05       |
| 3    | $10     | 1        | $1      | 10       | 1          |
| 4    | $10     | 10       | $1      | 1        | 5.05       |
| 5    | $1      | 10       | $10     | 1        | 1          |
So overall, the GDP growth for the five-year period (according to the chaining method) is 5.05*1*5.05*1 = 25.5. Roughly 2450% growth over four years! Pretty impressive, especially considering that prices and consumption in the final year were exactly the same as prices and consumption in the first year. At that point, why not do it again, to maintain that impressive GDP growth?
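For anyone who wants to check the arithmetic, here's a minimal sketch (my own illustrative code, not from the post) that reproduces the chained growth factors in the table above:

```python
from math import prod, sqrt

# (price_A, qty_A, price_B, qty_B) for years 1-5, from the toy example above.
years = [
    (1, 10, 10, 1),
    (1, 1, 10, 10),
    (10, 1, 1, 10),
    (10, 10, 1, 1),
    (1, 10, 10, 1),
]

def chained_growth(prev, curr):
    """One year of chained GDP growth: geometric mean of the quantity ratios
    valued at the old year's prices and at the new year's prices."""
    pa0, qa0, pb0, qb0 = prev
    pa1, qa1, pb1, qb1 = curr
    at_old_prices = (pa0 * qa1 + pb0 * qb1) / (pa0 * qa0 + pb0 * qb0)
    at_new_prices = (pa1 * qa1 + pb1 * qb1) / (pa1 * qa0 + pb1 * qb0)
    return sqrt(at_old_prices * at_new_prices)

factors = [chained_growth(a, b) for a, b in zip(years, years[1:])]
print([round(f, 2) for f in factors])   # [5.05, 1.0, 5.05, 1.0]
print(round(prod(factors), 2))          # 25.5 -- despite year 5 matching year 1
```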
Some Policy Suggestions
Our toy example raises an exciting possibility for politicians and policymakers[1]: what if you could achieve high GDP growth without the notoriously difficult and error-prone business of changing long-run prices or consumption? What if everything could just… go in a circle, always going back to where it started, and thereby produce safe, reliable, high GDP growth?
The basic pattern in our toy example is:
Prices shift: a popular good becomes cheap, a good rarely purchased becomes expensive
Consumption shifts: consumers buy less of the cheaper good, and more of the expensive good
Prices shift back
Consumers shift back
To match the toy example, shifts must happen in t...

Aug 8, 2024 • 19min
LW - You can remove GPT2's LayerNorm by fine-tuning for an hour by StefanHex
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: You can remove GPT2's LayerNorm by fine-tuning for an hour, published by StefanHex on August 8, 2024 on LessWrong.
This work was produced at Apollo Research, based on initial research done at MATS.
LayerNorm is annoying for mechanistic interpretability research ("[...] reason #78 for why interpretability researchers hate LayerNorm" - Anthropic, 2023).
Here's a Hugging Face link to a GPT2-small model without any LayerNorm.
The final model is only slightly worse than a GPT2 with LayerNorm[1]:
| Dataset | Original GPT2 | Fine-tuned GPT2 with LayerNorm | Fine-tuned GPT without LayerNorm |
|---------|---------------|--------------------------------|----------------------------------|
| OpenWebText (ce_loss) | 3.095  | 2.989  | 3.014 (+0.025) |
| ThePile (ce_loss)     | 2.856  | 2.880  | 2.926 (+0.046) |
| HellaSwag (accuracy)  | 29.56% | 29.82% | 29.54%         |
I fine-tuned GPT2-small on OpenWebText while slowly removing its LayerNorm layers, waiting for the loss to go back down after each removal.
Introduction
LayerNorm (LN) is a component in Transformer models that normalizes embedding vectors to have constant length; specifically it divides the embeddings by their standard deviation taken over the hidden dimension. It was originally introduced to stabilize and speed up training of models (as a replacement for batch normalization). It is active during training and inference.
The equation includes the standard deviation (std) $\sqrt{\mathrm{Var}[x] + \epsilon}$, which makes it a non-linear operation. This hinders interpretability in a variety of ways, from annoyances and inaccuracies such as
attributing residual stream directions to logit effects (e.g. SAE features, direct logit attribution),[2]
being annoying to deal with Attribution Patching, or
being difficult to deal with in Apollo's LIB method.
In the Docstring circuit analysis we seriously considered whether the model might be using LN in its algorithm. This post even shows that LN can be used as the sole non-linearity to solve non-linear classification problems (see also this related work).
Recently, with progress in Sparse Dictionary Learning, agendas (e.g. this one) imagine decomposing networks into sets of sparsely connected components (SAEs, Transcoders, etc.). A core difficulty to "putting it all together" is that the interactions between different components often route through LayerNorm whose effect we do not understand.
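For concreteness, here's roughly what the core LayerNorm operation described at the start of this section looks like, as a minimal sketch (my own illustrative code; the learned scale and bias are omitted, since those can be folded into later weights):

```python
import torch

def layernorm_core(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """The non-linear heart of LayerNorm: center each embedding, then divide
    by its standard deviation over the hidden dimension. Dividing by a
    per-input std is what makes the operation non-linear."""
    x = x - x.mean(dim=-1, keepdim=True)
    return x / torch.sqrt(x.var(dim=-1, keepdim=True, unbiased=False) + eps)
```

Note that scaling the input by a constant leaves this output unchanged, which is exactly the kind of behaviour that makes naive linear attribution through LN inaccurate.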
Motivation
It would be pretty neat to have an LLM that still works (speaks English etc.) with fewer or no LN layers. One option would be to train a model without LN from scratch (done for tiny models, e.g. TinyModel), but this is very hard or impossible for larger models (hearsay is that you need a low learning rate and to be very careful).
Taking an existing model and removing the LN layers however seems doable if LN isn't implementing some important computation.[3] That is, LN "does its thing" and the model has learned to "deal with it", but it's not irreplaceable. A reason to be optimistic is that the spread of standard deviations across different samples isn't that large, so maybe replacing the LN-computed standard deviation with a fixed number might kinda work.
Method
I take GPT2-small, fine-tune it on OpenWebText, and remove LNs one-by-one while fine-tuning.
The only non-linear operation in a LN layer is the division by the standard deviation (std) of the embedding vectors; the remaining operations can be absorbed into later weight matrices (see the fold_ln option in TransformerLens; also discussed in this appendix). Thus I mainly focus on the std part here.
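To make the "remove" operation concrete, here is a minimal sketch of the general idea (my own illustrative code, not the author's implementation): freeze the std at a pre-computed average, after which the layer is affine and can be folded into neighbouring weights.

```python
import torch
import torch.nn as nn

class FixedStdLayerNorm(nn.Module):
    """LayerNorm with its std 'frozen' to a constant measured on sample data.

    Dividing by a fixed number (instead of each input's own std) makes the
    whole layer affine, so it can later be folded into adjacent weights.
    The post uses separate averages for position 0 and positions > 0;
    this sketch uses a single number for simplicity.
    """

    def __init__(self, ln: nn.LayerNorm, avg_std: float):
        super().__init__()
        self.weight = ln.weight      # keep the original learned scale
        self.bias = ln.bias          # and bias
        self.avg_std = avg_std       # pre-computed average std over a sample

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x - x.mean(dim=-1, keepdim=True)
        return x / self.avg_std * self.weight + self.bias
```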
My general strategy is to "remove" an LN layer (this makes the loss go up), and then to train the model for some time (on the original training data) until the loss is back near the baseline. For this "remove" step I do the following
Calculate the average std on the dataset (I used a quite small sample, 16 prompts), separately for position 0 and position > 0
Replace the std calculation with the average std...