

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jun 11, 2024 • 19min
LW - AI takeoff and nuclear war by owencb
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI takeoff and nuclear war, published by owencb on June 11, 2024 on LessWrong.
Summary
As we approach and pass through an AI takeoff period, the risk of nuclear war (or other all-out global conflict) will increase.
An AI takeoff would involve the automation of scientific and technological research. This would lead to much faster technological progress, including military technologies. In such a rapidly changing world, some of the circumstances which underpin the current peaceful equilibrium will dissolve or change. There are then two risks[1]:
1. Fundamental instability. New circumstances could create a situation in which there is no peaceful equilibrium that it is in everyone's interests to maintain.
e.g.
If nuclear calculus changes to make second strike capabilities infeasible
If one party is racing ahead with technological progress and will soon trivially outmatch the rest of the world, with no way to credibly commit not to completely disempower everyone else once it has done so
2. Failure to navigate. Despite the existence of new peaceful equilibria, decision-makers might fail to reach one.
e.g.
If decision-makers misunderstand the strategic position, they may hold out for a more favourable outcome they (incorrectly) believe is fair
If the only peaceful equilibria are convoluted and unprecedented, leaders may not be able to identify or build trust in them in a timely fashion
Individual leaders might choose a path of war that would be good for them personally as they solidify power with AI; or nations might hold strongly to values like sovereignty that could make cooperation much harder
Of these two risks, it is likely simpler to work to reduce the risk of failure to navigate. The three straightforward strategies here are: research & dissemination, to ensure that the basic strategic situation is common knowledge among decision-makers; spreading positive-sum frames; and crafting, and getting buy-in to, meaningful commitments about sharing the power from AI, to reduce incentives for anyone to initiate war.
Additionally, powerful AI tools could change the landscape in ways that reduce either or both of these risks. A fourth strategy, therefore, is to differentially accelerate risk-reducing applications of AI. These could include:
Tools to help decision-makers make sense of the changing world and make wise choices;
Tools to facilitate otherwise impossible agreements via mutually trusted artificial judges;
Tools for better democratic accountability.
Why do(n't) people go to war?
To date, the world has been pretty good at avoiding thermonuclear war. The doctrine of mutually assured destruction means that it's in nobody's interest to start a war (although the short timescales involved mean that accidentally starting one is a concern).
The rapid development of powerful AI could disrupt the current equilibrium. From a very outside-view perspective, we might think that this is equally likely to result in, say, a 10x decrease in risk as a 10x increase. Even this would be alarming: since the annual probability seems fairly low right now, a big decrease in risk is merely nice-to-have, but a big increase could be catastrophic.
To get more clarity than that, we'll look at the theoretical reasons people might go to war, and then look at how an AI takeoff period might impact each of these.
Rational reasons to go to war
War is inefficient; for any war, there should be some possible world without that war in which everyone is better off. So why do we have war? Fearon's classic paper on Rationalist Explanations for War explains that there are essentially three mechanisms that can lead to war between states that are all acting rationally:
1. Commitment problems
If you're about to build a superweapon, I might want to attack now. We might both be better off if I didn't attack, and I paid y...

Jun 11, 2024 • 3min
EA - Safety-concerned EAs should prioritize AI governance over alignment by sammyboiz
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Safety-concerned EAs should prioritize AI governance over alignment, published by sammyboiz on June 11, 2024 on The Effective Altruism Forum.
Setting aside the fact that EAs tend to be more tech-savvy and that their comparative advantage lies in technical work such as alignment, the community as a whole is not prioritizing advocacy and governance enough.
Effective Altruists over-prioritize working on AI alignment over AI regulation advocacy. I disagree with prioritizing alignment because much of alignment research is simultaneously capabilities research (Connor Leahy even begged people to stop publishing interpretability research). Consequently, alignment research is accelerating the timelines toward AGI.
Another problem with alignment research is that cutting-edge models are only available at frontier AI labs, meaning there is comparatively less that someone on the outside can help with. Finally, even if an independent alignment researcher finds a safeguard against a particular AGI risk, the AI lab it is aimed at might not implement it, since doing so would cost time and effort. This is due to the "race to the bottom," a governance problem.
Even excluding X-risk, I can imagine a plethora of reasons why a US corporation, or the USA itself, reaching AGI first would be by far one of the worst paths. Corporations are profit-seeking and less concerned with the human-centric integration of technology that AGI necessitates. Having one country control the ultimate job-replacer also seems like a bad idea: economies all over the world would be subject to whatever the next GPT model can do, potentially replacing half their workforce.
Instead, I am led to believe that the far better best-case scenario is an international body that makes decisions globally, or at least has control over AGI development in each country. Therefore, I believe EA should prioritize lengthening the time horizon by advocating for a pause, a slowdown, or some sort of international treaty. This would help prevent the extremely dangerous race dynamics that we are currently in.
How you can help:
I recommend PauseAI. They are a great community of people (including many EAs) advocating for an international moratorium on frontier general-capability AI models. There is so much you can do to help, including putting up posters, writing letters, writing about the issue, and more. They are very friendly and will answer any questions about how you can fit in and maximize your power as a democratic citizen.
Even if you disagree with pausing as the solution to the governance problem, I believe that the direction of PauseAI is correct. On a governance political compass, I feel like pausing is 10 miles away from the current political talk but most EAs generally lie 9.5 miles in the same direction.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jun 11, 2024 • 9min
LW - "Metastrategic Brainstorming", a core building-block skill by Raemon
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Metastrategic Brainstorming", a core building-block skill, published by Raemon on June 11, 2024 on LessWrong.
I want to develop rationality training, which is aimed at solving confusing problems.
Two key problems with "confusing problems" are:
1. You might feel so confused and overwhelmed that you bounce off completely.
2. You might be confused about what counts as progress, or where the most progress is possible, and accidentally work on the wrong thing.
A skill that helps with both of these is "metastrategic brainstorming" - the art of generating lots of potential good approaches, and then choosing approaches that are likely to help.
Different situations call for different sorts of strategies. If a problem is confusing, you probably don't have a simple playbook for dealing with it. Different people also benefit from different sorts of strategies. So, while I can tell you a list of potential mental tools, what I most want you to practice is the art of identifying what would help you, in particular, with the particular situation in which you find yourself.
My triggers for switching to "metastrategic brainstorming mode" are:
I've just sat down to work on a problem I already know is hard.
I've started to feel stuck, annoyed, or frustrated.
I notice that I settled into the very first plan that occurred to me, and I have a sneaking suspicion it's not the best plan.
...and, I'm trying to solve a problem I expect to take at least 30 minutes (i.e. enough time it's worth spending at least a few minutes meta-brainstorming)...
...then I switch into "metastrategic brainstorming mode" (a toy script sketching this loop appears after the list), which entails:
1. Open up a writing doc.
2. Ask myself "what are my goals?" If there are multiple goals, write them all down.
3. Set a 5-10 minute timer, spend it brainstorming "meta-level strategies." Don't try to solve the object level problem. Just focus on generating strategies that might help you solve the problem.
4. Look at my list of meta-strategies, and see if there's one that I feel at least reasonably optimistic about.
5. If so, try that meta-strategy.
6. If not, brainstorm more. (But: note that "take a break", "nap", and "ask a friend for help" all totally count as valid meta-strategies to try. Taking a nap is often pretty important, actually!)
7. When/if I eventually solve my problem, take note of what strategies and meta-strategies I ended up using. Ideally, write them down somewhere I'm likely to remember them again.
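As an illustration only (not something from the original post), here is a minimal Python sketch of scaffolding this loop: it asks for your goals, runs a timed brainstorming window, and prints back the candidate meta-strategies you typed. The function name and prompts are hypothetical.

```python
# Toy scaffold for the metastrategic-brainstorming loop described above.
# Hypothetical helper, not from the original post.
import time

def metastrategy_session(minutes: float = 5) -> list[str]:
    """Prompt for goals, run a timed brainstorm, and return the strategies typed."""
    goals = input("What are my goals? ").strip()
    print(f"Goals noted: {goals}")
    print(f"Brainstorm meta-level strategies for {minutes} minutes. "
          "Enter one per line; an empty line stops early.")
    deadline = time.time() + minutes * 60
    strategies: list[str] = []
    while time.time() < deadline:
        entry = input("> ").strip()
        if not entry:
            break
        strategies.append(entry)
    print("Candidate meta-strategies:")
    for i, strategy in enumerate(strategies, 1):
        print(f"{i}. {strategy}")
    return strategies

if __name__ == "__main__":
    metastrategy_session(minutes=5)
```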
I want to re-emphasize "setting a real timer, for at least 5 and maybe up to 10 minutes, where you only allow yourself to generate meta-level strategies."
Exploring multiple plans before committing.
Partly, this is because it just takes a little while to shift out of "object level mode". But, more importantly: because your problem is confusing, your ways of thinking about it might be somewhat off track. And, even if you'd eventually solve your problem, you might be doing it using a way less efficient method.
In particular, many problems benefit from going "breadth first", where instead of barreling down the first plan you came up with, you try ~3 plans a little bit and see if one of them turns out to be way better than your initial plan.
Come up with multiple "types" of metastrategies.
When you're doing the 5-10 minutes of brainstorming, I recommend exploring a variety of strategies. For example, there are conceptual strategies like "break the problem down into smaller pieces." There are physical/biological strategies like "take a walk, or get a drink of water". There are social strategies like "ask a friend for help." (sometimes this isn't appropriate if you're training, but is a fine strategy to use on real world tasks)
Example: Writing this Blogpost
Right now I'm writing a blogpost on Metastrategic brainstorming. I actually found myself a bit stuck (a few p...

Jun 11, 2024 • 7min
EA - I doubled the world record cycling without hands for AMF by Vincent van der Holst
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I doubled the world record cycling without hands for AMF, published by Vincent van der Holst on June 11, 2024 on The Effective Altruism Forum.
A couple of weeks ago I announced I was going to try to break the world record for cycling without hands for AMF. That post also explains why I wanted to break that record. Last Friday we broke that record and raised nearly €10,000 for AMF. Here's what happened on Friday. You can still donate here.
What was the old record?
Canadian Robert John Murray set the old record of 130.29 kilometers, riding for 5 hours and 37 minutes in Calgary on June 12, 2023. His average speed was 23.2 kilometers per hour. See the Guinness World Records page here.
I managed to double the record and these were my stats.
How did the record attempt itself go?
On Friday, June 7, I started the record attempt on the closed cycling course of WV Amsterdam just after 6 am. I got up at half past four and immediately drank a large cup of coffee so that I could leave number 2 in the toilet. After all, that is not possible on a bicycle without using your hands, or at least that was not the record I was trying to break.
At 6 o'clock we did the last checks. Are the tires pumped? Is the bicycle in the right gear? After all, you can no longer switch gears during the attempt. Is the GoPro on my chest turned on? Stopwatches on? Guinness World Records forms ready and completed?
There was virtually no wind early in the morning, which was also the reason I started so early. Later in the day it would be windier and I knew from the training that with too much wind the balance becomes very difficult.
The course consists of laps of 2.5 kilometers, and after 52 laps the old record of 130 kilometers would be broken. The course is flat, but has one bridge, where you have to climb quite a bit. Because you can't shift gears, you have to go at a good speed to keep enough balance when you get to the top. The advantage is that when descending from the bridge I was able to stand on the pedals without hands, so that my butt could get off the saddle for a while during each lap.
And it gave me the chance to pee off the bike. The question is of course: how do you pee on a bicycle without hands? So when I wanted to pee, I picked up speed, stood on the pedals with my right arm resting on the saddle, and then peed straight over my bike with my left hand. Not super hygienic, but better than peeing in my pants, and I could always clean my frame with the water from my water bottles.
My first goal was 100 kilometers; anything below that would have been a complete disappointment. But at almost 90 kilometers I nearly touched my handlebars out of habit. If I had, I would never have started over, because cycling another 130 kilometers without hands after those 90 kilometers would never have been possible.
At least, that's what I thought, because eventually I would double the old record and ride another 170 kilometers after those 90. But in that moment, the record attempt was very nearly over.
In the end, apart from rabbits and angry goose mothers getting close to my wheels, I managed to get through the 100 kilometers smoothly. My next goal was the record: 130 kilometers. I started to get quite a bit of cramping, but I had a group of great volunteers who passed me food and water, and they gave me water bottles with lots of salt and minerals in them. I had also been riding relatively fast, at 27.5 kilometers per hour, so I decided to ride one kilometer per hour slower.
That helped, and I broke the record without any problems, and then the question was how far I could go.
A question that I expected my ass to answer. During training I often quickly developed serious saddle pain. However, I had found a sustainable clothing sponsor, and the bib shorts from Velor, which I had only had for a week, really made my butt hurt much less t...

Jun 11, 2024 • 4min
EA - [Linkpost] How to start an advance market commitment (AMC) by Stan Pinsent
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Linkpost] How to start an advance market commitment (AMC), published by Stan Pinsent on June 11, 2024 on The Effective Altruism Forum.
This is a linkpost for How to start an advance market commitment by Nan Ransohoff, published in Works in Progress[1].
Summary
Advance market commitments, or AMCs, are promises to buy or subsidize something in the future, if someone can invent and produce it. The purpose of an AMC is to provide a financial incentive for innovators to develop and scale products that address important societal needs but lack a natural commercial market.
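To make the mechanism concrete, here is a minimal Python sketch (my own illustration, not from the article) of an AMC-style capped pledge pool paying a fixed top-up subsidy per qualifying unit delivered. The pool size echoes the $1.5 billion Gavi pledge mentioned below; the per-dose subsidy and delivery volume are hypothetical numbers.

```python
# Toy model of an advance market commitment (AMC): a capped pledge pool pays a
# fixed top-up subsidy per qualifying unit delivered, on top of what low-income
# buyers can afford. All numbers below are illustrative, not from the article.

def amc_payout(units_delivered: int, subsidy_per_unit: float,
               pledge_pool: float) -> tuple[float, float]:
    """Return (subsidy paid to the supplier, pledge pool remaining)."""
    payout = min(units_delivered * subsidy_per_unit, pledge_pool)
    return payout, pledge_pool - payout

if __name__ == "__main__":
    pool = 1.5e9          # pledge pool, echoing the $1.5B Gavi AMC below
    top_up = 3.50         # hypothetical top-up subsidy per vaccine dose
    paid, pool = amc_payout(units_delivered=200_000_000,
                            subsidy_per_unit=top_up, pledge_pool=pool)
    print(f"Subsidy paid: ${paid:,.0f}; pool remaining: ${pool:,.0f}")
```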
Past AMCs
Pneumococcal Vaccines
In the 2000s, hundreds of thousands were dying of pneumococcal diseases every year. However, because the deaths were occurring in low-income countries unable to pay for new treatments, there was little investment in developing new vaccines.
In 2007, a group of governments and philanthropists pledged $1.5 billion through an AMC run by Gavi to accelerate pneumococcal vaccine development. The AMC subsidized vaccine doses beyond what poorer countries could afford, to incentivize pharmaceutical companies like Pfizer and GlaxoSmithKline. By 2011, these companies had developed qualifying vaccines and signed contracts for large-scale production.
The AMC accelerated vaccine development and distribution by around 5 years, saving an estimated 700,000 lives.
COVID-19 Vaccines
As part of Operation Warp Speed in 2020, the U.S. government issued $900 million in guaranteed purchase orders for COVID-19 vaccine doses from pharmaceutical companies. This provided an incentive for companies to invest in vaccine development despite uncertainty around which vaccines would succeed. The purchase guarantees ensured companies would still have a market even if other companies developed vaccines first.
Frontier: a carbon-removal AMC
Frontier is a $925 million advance market commitment launched in 2022 to accelerate the development of affordable technologies for removing carbon dioxide from the atmosphere at scale. It aims to bridge the gap until a long-term market for carbon removal emerges, by providing funding commitments for future carbon removal.
Frontier aims to catalyze the carbon removal industry by sending a strong demand signal to motivate entrepreneurs, investors, and researchers to prioritize carbon removal innovation, while also advocating for policies to create a durable, global market for permanent carbon removal after 2030.
Before 2022, only $30 million had ever been spent on carbon removal. Frontier's $925 million represents a new era for carbon removal tech, but billions more in investment will probably be required before the unit cost of carbon removal becomes low enough for a mass market to emerge.
Where else could AMCs be used?
AMCs have been proven as a way to accelerate vaccine development, but they may also be a powerful way to advance other critical technologies:
Low-/zero-carbon cement.
Low-/zero-carbon steel.
[Non-CO2] Greenhouse gas removal
Climate-resilient crops.
Health products where demand in high-income countries is uncertain, like strep A vaccines and hepatitis C vaccines
Health products where the need in lower- and middle-income countries is disproportionately high, like vaccines for tuberculosis, syphilis, and malaria
Learn More
Full article
Market Shaping Accelerator, UChicago
Frontier Climate
1. ^
This was linkposted on the Forum 2 weeks ago but didn't get much buzz, so I am linkposting again with a proper summary.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jun 11, 2024 • 6min
EA - On Estimates of how much Zakat there is by Kaleem
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Estimates of how much Zakat there is, published by Kaleem on June 11, 2024 on The Effective Altruism Forum.
The most infuriating academic urban legend I have come across whilst doing my research.
Context:
I am researching the cost effectiveness etc. of starting a new organization which redirects zakat to effective charities. One part of that scoping process is to figure out what the market cap/fundraising ceiling within the zakat sector is. I thought this would be really easy because someone else must have already done this, so I googled it…
All over the internet, people repeatedly claim that every year there is between $500Bn and $1Trn given as zakat. However, there is (basically) never any citation for this claim anywhere you read it. If you do a load of rabbit-holing (which I've done so that you don't have to - you're welcome), you'll discover that in the few instances when this claim has been cited, the citation leads to this paper[1].
After a couple of minutes and some control-F-ing, you'll realize that the $500bn-$1trn estimate isn't even mentioned in this paper.[2] The author, Stirk, claims that estimates of $200Bn-$1Trn have been cited, and then links to a web article. It seems like the estimate actually comes from this anonymous web article, which attributes the estimate to an unnamed financial expert in Dubai.
At this point, you (and I) are both thinking: "It cannot possibly be true that every estimate of annual global zakat on the entirety of the English internet is based on this random unsubstantiated anonymous quote?!" Well, it seems like this is true. And I've been looking for contrary evidence every week since December 2023, to no avail.
Sanity Checking the estimate:
I think it makes sense to sanity check the estimate people are using: can it even be possible that there is between $500Bn and $1Trn given as zakat every year? I think the answer (thankfully) is Yes! Here are some ways to reverse engineer the estimate that are all quite plausible (a short script reproducing these checks follows the list below). Worth noting for the following calculations: I'm holding the number of Muslims constant at 2 billion, and the average percentage of wealth due in zakat every year at 2.5%.
1. It doesn't seem insane to suggest that on average, Muslims give $250-$1000 as zakat every year
1. Especially since the bottom ~5% of the wealth distribution probably don't have to pay zakat.
2. Even if only the wealthiest 1% of Muslims were responsible for all zakat, they'd be giving $25,000-$50,000 a year.
1. This would put their mean net-wealth at $1M-$2M. Seems reasonable (e.g. pages 21-26 of the 2013 Credit Suisse wealth study suggested that 1.1% of the global population had net wealth of over $1M).
3. Could 2.5% of all Muslim wealth be $1Trn? That would mean that total Muslim wealth is $40Trn.
1. Global net-wealth is estimated at between $400-700Trn. If Muslims are ~20% of the total population, then they'd be worth ~20% of $400-700Trn, which is $80-140Trn.
2. Seems reasonable then that Muslims are worth at least $40Trn, which makes the $1Trn figure plausible.
3. Also seems plausible that the mean net-worth of Muslims is ~$20k.
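Since these are plain back-of-envelope multiplications, here is a short Python sketch (my own reproduction, using the assumptions stated above: 2 billion Muslims and a 2.5% zakat rate) that reruns the same checks.

```python
# Back-of-envelope reproduction of the sanity checks above, using the post's
# stated assumptions: ~2 billion Muslims, zakat at 2.5% of wealth per year.
MUSLIM_POP = 2e9
ZAKAT_RATE = 0.025
CLAIM_LOW, CLAIM_HIGH = 500e9, 1e12  # the $500Bn-$1Trn claim

# Check 1: average giving of $250-$1000 per Muslim implies $0.5-2Trn in total,
# which comfortably covers the claimed range.
print("Total at $250-$1000 per person:", 250 * MUSLIM_POP, "to", 1000 * MUSLIM_POP)

# Check 2: if only the wealthiest 1% paid all zakat.
top_1pct = 0.01 * MUSLIM_POP
per_donor = (CLAIM_LOW / top_1pct, CLAIM_HIGH / top_1pct)        # $25k-$50k each
mean_wealth_of_1pct = tuple(x / ZAKAT_RATE for x in per_donor)   # $1M-$2M each
print("Top-1% per-donor zakat:", per_donor)
print("Implied mean net wealth of that 1%:", mean_wealth_of_1pct)

# Check 3: total Muslim wealth needed for $1Trn of zakat per year.
wealth_needed = CLAIM_HIGH / ZAKAT_RATE                 # $40Trn
muslim_share = (0.2 * 400e12, 0.2 * 700e12)             # ~20% of $400-700Trn
print("Muslim wealth needed for $1Trn/yr:", wealth_needed)
print("~20% of global net wealth:", muslim_share)
print("Implied mean Muslim net worth:", wealth_needed / MUSLIM_POP)  # ~$20k
```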
Pessimistically, however:
1. Some estimates I've seen that make the $500Bn-$1Trn figure seem unlikely are that the US gives ~$2Bn, the UK ~$1Bn, and Saudi Arabia ~$18Bn in zakat every year. These are probably 3 of the top 10 countries where I'd expect the net worth of Muslims to be the highest. It seems really unlikely that the rest of the world would make up at least $480Bn.
1. On the other hand, most Zakat is not reported/informal, and the figures above are formally reported zakat donations.
2. Because Zakat is voluntary/unreported, we don't know how much Muslims are actually giving, or how many Muslims are fulfilling their zakat obligations.
1. This means it could actually only be a small fracti...

Jun 11, 2024 • 27min
LW - [Valence series] 4. Valence & Liking / Admiring by Steven Byrnes
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Valence series] 4. Valence & Liking / Admiring, published by Steven Byrnes on June 11, 2024 on LessWrong.
4.1 Post summary / Table of contents
Part of the Valence series.
(This is my second attempt to write the 4th post of my valence series. If you already read the previous attempt and are unsure whether to read this too, see footnote[1]. Also, note that this post has a bit of overlap with (and self-plagiarism from) my post Social status part 2/2: everything else, but the posts are generally quite different.)
The previous three posts built a foundation about what valence is, and how valence relates to thought in general. Now we're up to our first more specific application: the application of valence to the social world.
Here's an obvious question: "If my brain really assigns valence to any and every concept in my world-model, well, how about the valence that my brain assigns to the concept of some other person I know?" I think this question points to an important and interesting phenomenon that I call "liking / admiring" - I made up that term, because existing terms weren't quite right.
This post will talk about what "liking / admiring" is, and some of its important everyday consequences related to social status, mirroring, deference, self-esteem, self-concepts, and more.
Section 4.2 spells out a concept that I call "liking / admiring". For example, if Beth likes / admires Alice, then Beth probably is interested in Alice's opinions, and Beth probably cares what Alice thinks about her, and Beth probably is happy to be in the presence of Alice, and so on.
Section 4.3 suggests that liking / admiration is a special case of valence, where it's applied to a person: if "Beth likes / admires Alice", then the concept "Alice" evokes positive valence in Beth's brain.
Section 4.4 proposes that we have an innate "drive to feel liked / admired", particularly by people whom we ourselves like / admire in turn. I speculate on how such a drive might work in the brain.
Section 4.5 discusses our tendency to "mirror" people whom we like / admire, in their careers, clothes, beliefs, and so on.
Section 4.6 discusses our related tendency to defer to people whom we like / admire when we interact with them - i.e., to treat them like they have high social status.
Section 4.7 argues that feeling liked / admired is different from having high self-esteem, but that the former can have an outsized impact on the latter. I also relate this idea to the dynamics of self-concept formulation - for example, when we split motivations into externalized ego-dystonic "urges" versus internalized ego-syntonic "desires", we often tend to do so in a way that maximizes our self-esteem and (relatedly) maximizes the extent to which we implicitly feel liked / admired.
Section 4.8 is a brief conclusion.
4.2 Key concept: "liking / admiring"
I'm using the term "liking / admiring" to talk about a specific thing. I'll try to explain what it is. Note that it doesn't perfectly line up with how people commonly use the English words "liking" or "admiring".
4.2.1 Intuitive (extreme) example of "liking / admiring"
I'm Beth, a teenage fan-girl of famous pop singer Alice, whom I am finally meeting in person. Let's further assume that my demeanor right now is "confident enthusiasm": I am not particularly worried or afraid about the possibility that I will offend Alice, nor am I sucking up to Alice in expectation of favorable treatment (in fact, I'm never going to see her again after today). Rather, I just really like Alice! I am hanging on Alice's every word like it was straight from the mouth of God.
My side of the conversation includes things like "Oh wow!", "Huh, yeah, I never thought about it that way!", and "What a great idea!". And (let us suppose) I'm saying all those things sincerely, not to impress or suck up to Alice.
T...

Jun 11, 2024 • 9min
AF - Corrigibility could make things worse by ThomasCederborg
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Corrigibility could make things worse, published by ThomasCederborg on June 11, 2024 on The AI Alignment Forum.
Summary: A Corrigibility method that works for a Pivotal Act AI (PAAI) but fails for a CEV style AI could make things worse. Any implemented Corrigibility method will necessarily be built on top of a set of unexamined implicit assumptions. One of those assumptions could be true for a PAAI, but false for a CEV style AI. The present post outlines one specific scenario where this happens.
This scenario involves a Corrigibility method that only works for an AI design if that design does not imply an identifiable outcome. The method fails when it is applied to an AI design that does imply an identifiable outcome. When such an outcome does exist, the "corrigible" AI will "explain" this implied outcome in a way that makes the designers want to implement that outcome.
The example scenario:
Consider a scenario where a design team has access to a Corrigibility method that works for a PAAI design. A PAAI can have a large impact on the world, for example by helping a design team prevent other AI projects. But there exists no specific outcome that is implied by a PAAI design. Since there exists no implied outcome for a PAAI to "explain" to the designers, this Corrigibility method actually renders a PAAI genuinely corrigible.
For some AI designs, however, the set of assumptions that the design is built on top of does imply a specific outcome. Let's refer to this as the Implied Outcome (IO). This IO can alternatively be viewed as "the outcome that a Last Judge would either approve of or reject". In other words: consider the Last Judge proposal from the CEV arbital page. If it would make sense to add a Last Judge of this type to a given AI design, then that AI design has an IO.
The IO is the outcome that a Last Judge would either approve of or reject (for example, a successor AI that will get either a thumbs up or a thumbs down). In yet other words: the purpose of adding a Last Judge to an AI design is to allow someone to render a binary judgment on some outcome. For the rest of this post, that outcome will be referred to as the IO of the AI design in question.
In this scenario, the designers first implement a PAAI that buys time (for example by uploading the design team). For the next step, they have a favoured AI design that does have an IO. One of the reasons that they are trying to make this new AI corrigible is that they can't calculate this IO. And they are not certain that they actually want this IO to be implemented.
Their Corrigibility method always results in an AI that wants to refer back to the designers before implementing anything. The AI will help a group of designers implement a specific outcome iff they are all fully informed and they are all in complete agreement that this outcome should be implemented. The Corrigibility method has a definition of Unacceptable Influence (UI). And the Corrigibility method results in an AI that genuinely wants to avoid exerting any UI.
It is, however, important that the AI is able to communicate with the designers in some way. So the Corrigibility method also includes a definition of Acceptable Explanation (AE).
At some point the AI becomes clever enough to figure out the details of the IO. At that point, it is clever enough to convince the designers that this IO is the objectively correct thing to do, using only methods classified as AE. This "explanation" is very effective and results in a very robust conviction that the IO is the objectively correct thing to do. In particular, this value judgment does not change when the AI tells the designers what has happened.
So, when the AI explains what has happened, the designers do not change their mind about IO. They still consider themselves to have a duty...

Jun 10, 2024 • 3min
LW - Soviet comedy film recommendations by Nina Rimsky
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Soviet comedy film recommendations, published by Nina Rimsky on June 10, 2024 on LessWrong.
I'm a big fan of the Soviet comedy directors Eldar Ryazanov, Leonid Gaidai, and Georgiy Daneliya. Almost anything by them is worth watching, but here are my favorites (filtered for things that have a free YouTube version with good English subtitles, bold are the highest-recommended):
Ryazanov
1966
Beware of the Car (Берегись автомобиля)
[YouTube]
Comedy about a benevolent car thief who steals to donate to charity
1975
The Irony of Fate (Ирония судьбы или с легким паром!)
[YouTube]
A New Year's classic premised on the uniformity of Soviet apartment buildings - a guy gets drunk on NYE and ends up in a different city but finds an identical building that his key can access
1977
Office Romance (Служебный роман)
[YouTube]
Romantic comedy and satirical portrayal of Soviet office life
1979
The Garage (Гараж)
[YouTube]
Comedy set in a single room where people argue about who should lose their garage after the government decides to build a road through the plot they were collectively building garages on
1987
Forgotten Melody for a Flute (Забытая мелодия для флейты)
[YouTube]
Satirical romantic comedy about Soviet bureaucracy and its decline in power in the late 80s, great opening song (translate the lyrics)
1991
The Promised Heaven (Небеса обетованные)
Sadly couldn't find an English-subtitled YT link for this but I like it too much to miss off[1]
Tragic comedy about the lives of people made recently homeless during the Perestroika period, very sad and of its time
Gaidai
1966
Kidnapping, Caucasian Style (Кавказская пленница, или Новые приключения Шурика)
[YouTube]
One of the most famous Soviet comedies - a naive visitor to the Caucasus is convinced to assist in the "bride kidnapping" tradition
1969
The Diamond Arm (Бриллиантовая рука)
[YouTube]
Another one of the most famous Soviet comedies - diamonds end up being smuggled in the wrong guy's cast because he happens to injure himself and say the "codeword" in front of the smugglers' hideout
1971
The Twelve Chairs (12 стульев)
[YouTube]
Film adaptation of the satirical novel by Soviet authors Ilf and Petrov, set in post-revolutionary Russia
Daneliya
1977
Mimino (Мимино)
[YouTube]
Romantic comedy about a Georgian bush pilot
1986
Kin-dza-dza! (Кин-Дза-Дза!)
[YouTube]
Funny low-budget sci-fi
Bonus recommendations
1973
Seventeen Moments of Spring (Семнадцать мгновений весны)
[YouTube]
Extremely popular Soviet spy thriller set during WW2
Source of "Stierlitz jokes"
1975
Hedgehog in the Fog (Ёжик в тумане)
[YouTube]
Classic short (10mins) animated children's film, great atmosphere
1. ^
$10 bounty to anyone who finds a link to a free version of this with high-quality English subtitles
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jun 10, 2024 • 1h 35min
LW - On Dwarkesh's Podcast with Leopold Aschenbrenner by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Dwarkesh's Podcast with Leopold Aschenbrenner, published by Zvi on June 10, 2024 on LessWrong.
Previously: Quotes from Leopold Aschenbrenner's Situational Awareness Paper
Dwarkesh Patel talked to Leopold Aschenbrenner for about four and a half hours.
The central discussion was the theses of his paper, Situational Awareness, which I offered quotes from earlier, with a focus on the consequences of AGI rather than whether AGI will happen soon. There are also a variety of other topics.
Thus, for the relevant sections of the podcast I am approaching this via roughly accepting the technological premise on capabilities and timelines, since they don't discuss that. So the background is we presume straight lines on graphs will hold to get us to AGI and ASI (superintelligence), and this will allow us to generate a 'drop in AI researcher' that can then assist with further work. Then things go into 'slow' takeoff.
I am changing the order of the sections a bit. I put the pure AI stuff first, then afterwards are most of the rest of it.
The exception is the section on What Happened at OpenAI.
I am leaving that part out because I see it as distinct, and requiring a different approach. It is important and I will absolutely cover it. I want to do that in its proper context, together with other events at OpenAI, rather than together with the global questions raised here. Also, if you find OpenAI events relevant to your interests that section is worth listening to in full, because it is absolutely wild.
Long post is already long, so I will let this stand on its own and not combine it with people's reactions to Leopold or my more structured response to his paper.
While I have strong disagreements with Leopold, only some of which I detail here; while I especially believe he is dangerously wrong and overly optimistic about alignment, existential risks, and loss of control in ways that are highly load-bearing, causing potential sign errors in interventions; and while I worry that the new AGI fund may make our situation worse rather than better, I want most of all to say: Thank you.
Leopold has shown great courage. He stands up for what he believes in even at great personal cost. He has been willing to express views very different from those around him, when everything around him was trying to get him not to do that. He has thought long and hard about issues very hard to think long and hard about, and is obviously wicked smart. By writing down, in great detail, what he actually believes, he allows us to compare notes and arguments, and to move forward. This is The Way.
I have often said I need better critics. This is a better critic. A worthy opponent.
Also, on a great many things, he is right, including many highly important things where both the world at large and also those at the labs are deeply wrong, often where Leopold's position was not even being considered before. That is a huge deal.
The plan is to then do a third post, where I will respond holistically to Leopold's model, and cover the reactions of others.
Reminder on formatting for Podcast posts:
1. Unindented first-level items are descriptions of what was said and claimed on the podcast unless explicitly labeled otherwise.
2. Indented second-level items and beyond are my own commentary on that, unless labeled otherwise.
3. Time stamps are from YouTube.
The Trillion Dollar Cluster
1. (2:00) We start with the trillion-dollar cluster. It's coming. Straight lines on a graph at half an order of magnitude a year, a central theme throughout.
2. (4:30) Power. We'll need more. American power generation has not grown for decades. Who can build a 10 gigawatt center, let alone 100? Leopold thinks 10 was so six months ago and we're on to 100. Trillion-dollar cluster a bit farther out.
3. (6:15) Distinction between cost of cluster versus rental...


