

The Nonlinear Library: LessWrong
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jun 12, 2024 • 8min
LW - Anthropic's Certificate of Incorporation by Zach Stein-Perlman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Anthropic's Certificate of Incorporation, published by Zach Stein-Perlman on June 12, 2024 on LessWrong.
Yesterday I obtained Anthropic's[1] Certificate of Incorporation, and its past versions, from the State of Delaware. I don't recommend reading it.[2] This post is about what the CoI tells us about Anthropic's Long-Term Benefit Trust (context: Maybe Anthropic's Long-Term Benefit Trust is powerless).
Tl;dr: the only new information of moderate importance is the voting thresholds necessary to modify Trust stuff. My concerns all still stand in some form. Absence of badness is a small positive update.
Anthropic has vaguely described stockholders' power over the Trust:
a series of "failsafe" provisions . . . allow changes to the Trust and its powers without the consent of the Trustees if sufficiently large supermajorities of the stockholders agree. The required supermajorities increase as the Trust's power phases in
The CoI has details: amending the CoI to modify the Trust requires a vote reaching the "Transfer Approval Threshold," defined as:
(1) prior to the date that is the one-year anniversary of the Final Phase-In Date [note: "the Final Phase-In Date" is in November 2024], either (a)(i) a majority of the Voting Common Stock then-outstanding and held by the Founders (as defined in the Voting Agreement), (ii) a majority of the Series A Preferred Stock then-outstanding and (iii) a majority of the voting power of the outstanding Preferred Stock entitled to vote generally (which for the avoidance of doubt shall exclude the Non-Voting Preferred Stock), but excluding the Series A Preferred Stock or (b) at least seventy-five percent (75%) of the voting power of the then-outstanding shares of the Corporation's capital stock entitled to vote generally (which for the avoidance of doubt shall exclude the Non-Voting Preferred Stock and any voting power attributable to the Class T Common Stock) and
(2) on and following the date that is the one-year anniversary of the Final Phase-In Date, either (x)(i) at least seventy-five percent (75%) of the Voting Common Stock then outstanding and held by the Founders (as defined in the Voting Agreement), (ii) at least fifty percent (50%) of the Series A Preferred Stock then-outstanding and (iii) at least seventy-five percent (75%) of the voting power of the outstanding Preferred Stock entitled to vote generally (which for the avoidance of doubt shall exclude the Non-Voting Preferred Stock), but excluding the Series A Preferred Stock or (y) at least eighty-five [percent] (85%) of the voting power of the then-outstanding shares of the Corporation's capital stock entitled to vote generally (which for the avoidance of doubt shall exclude the Non-Voting Preferred Stock and any voting power attributable to the Class T Common Stock)
If Anthropic's description above is about this, it's odd and misleading. Perhaps Anthropic's description is about the Trust Agreement, not just the CoI.
Per Article IX,[3] amending the CoI to modify the Trust also requires at least 75% of the board. This will apparently give the Trust tons of independence after it elects 3/5 of the board! Or at least, it will give the Trust tons of protection from CoI amendments - but not necessarily from Trust Agreement shenanigans; see below.
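To make the quoted thresholds easier to parse, here is a rough sketch of the logic in Python. This is my illustrative paraphrase of the quoted language, not Anthropic's wording; the function and variable names are invented, and it ignores the separate Article IX requirement of 75% of the board.

```python
# Rough paraphrase (mine, not legal text) of the "Transfer Approval Threshold"
# quoted above. Thresholds tighten one year after the Final Phase-In Date.
def transfer_approval_threshold_met(
    after_anniversary: bool,             # one year past the Final Phase-In Date?
    founder_common_frac: float,          # fraction of Founders' Voting Common Stock in favor
    series_a_frac: float,                # fraction of Series A Preferred in favor
    other_voting_preferred_frac: float,  # voting Preferred excluding Series A (and non-voting)
    all_voting_stock_frac: float,        # all voting stock, excluding non-voting and Class T
) -> bool:
    if not after_anniversary:
        class_by_class = (founder_common_frac > 0.5
                          and series_a_frac > 0.5
                          and other_voting_preferred_frac > 0.5)
        everyone_together = all_voting_stock_frac >= 0.75
    else:
        class_by_class = (founder_common_frac >= 0.75
                          and series_a_frac >= 0.5
                          and other_voting_preferred_frac >= 0.75)
        everyone_together = all_voting_stock_frac >= 0.85
    return class_by_class or everyone_together
```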
Before reading the CoI, I had 4 main questions/concerns about the Trust:[4]
1. Morley et al.: "the Trust Agreement also authorizes the Trust to be enforced by the company and by groups of the company's stockholders who have held a sufficient percentage of the company's equity for a sufficient period of time," rather than the Trustees.
1. I don't really know what this means. And it's vague. It sounds like a straightforward way for Anthropic/stockholders to subvert the Trust.
2. Morley et al.: the Trust and its powers can be amended "by a ...

Jun 12, 2024 • 7min
LW - [New Feature] Your Subscribed Feed by Ruby
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [New Feature] Your Subscribed Feed, published by Ruby on June 12, 2024 on LessWrong.
tl;dr
LessWrong now has a Subscribed tab (next to the Latest tab and Enriched tab[1]). You can now "follow" users, which means their posts and comments will show up in your Subscribed tab[2].
We've put a lot of thought into how to display the right amount of recent content from people you follow, plus the right amount of surrounding context, to keep you up to date without it being overwhelming. See here for more detail.
How to follow people
You can follow users via multiple methods:
1. Use the widget on the Subscribed tab.
2. Follow people from their user profile.
3. Follow people using the user tooltip that comes up when you hover over their username.
Note!
Following people for your subscribed tab is different from subscribing to get notifications. Signing up for one does not cause the other!
Except, to help people start using the Subscribed tab, we did a one-time operation that makes you follow (for the purposes of the Subscribed tab) anyone you'd already subscribed to for post and comment notifications. We assume that if you want notifications, you'd also want to follow.
What's shown to me in my Subscribed feed?
Short description
We display the recent posts and comments of people you follow, plus comments from other users that people you follow are replying to.
Long description
(Subject to change, last updated 2024-06-10)
1. We load posts and comments from people you follow from the last 30 days
2. We group posts and comments to the post level
1. We might show a post because someone you followed published it.
2. We might show a post because someone you follow is commenting on it, even if you don't follow the author of the post. (This will probably be most of your feed, unless you follow people who write more posts than comments.)
3. We display the five most recent comments from people you follow, unless those comments were a week or more older than the most recent one (we found this necessary to avoid seeing lots of stale content).
4. We further display (with de-emphasized styling) the comments being replied to by people you follow.
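For readers who think in code, here is a minimal sketch of the logic described in the list above. It is illustrative only: the record shapes (`Post`, `Comment` with `author`, `post_id`, `parent`, `posted_at`) and the function `build_subscribed_feed` are my assumptions, not LessWrong's actual implementation.

```python
# Minimal sketch (assumed data shapes, not LessWrong's real code) of the
# Subscribed feed logic described above.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Comment:
    author: str
    post_id: str
    posted_at: datetime
    parent: Optional["Comment"] = None  # the comment being replied to, if any

@dataclass
class Post:
    author: str
    post_id: str
    posted_at: datetime

def build_subscribed_feed(posts, comments, followed, now):
    cutoff = now - timedelta(days=30)
    feed = {}

    # 1. Load recent posts and comments from people you follow.
    for p in posts:
        if p.author in followed and p.posted_at >= cutoff:
            feed.setdefault(p.post_id, {"post": None, "comments": []})["post"] = p
    followed_comments = [c for c in comments
                         if c.author in followed and c.posted_at >= cutoff]

    # 2. Group comments to the post level (even if you don't follow the post's author).
    for c in followed_comments:
        feed.setdefault(c.post_id, {"post": None, "comments": []})["comments"].append(c)

    for item in feed.values():
        cs = sorted(item["comments"], key=lambda c: c.posted_at, reverse=True)
        # 3. Keep the five most recent followed comments, dropping any that are a
        #    week or more older than the freshest one (avoids stale content).
        if cs:
            freshest = cs[0].posted_at
            cs = [c for c in cs if freshest - c.posted_at < timedelta(days=7)][:5]
        item["comments"] = cs
        # 4. Also surface (de-emphasized in the UI) the comments being replied to.
        item["context"] = [c.parent for c in cs if c.parent is not None]

    return feed
```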
Why we built this
A while back we introduced the ability to subscribe to all of a user's comments. At first, I thought this was great - "wow, look at all these comments I wasn't seeing previously that I want to read". However, it cluttered up my notifications tab, and reading comments via notifications isn't a great experience. I realized I wanted a feed, and that's what we've built.
The mainstay of LessWrong is the frontpage posts list, but I'm interested in supplementing with feeds since they have two main advantages:
1. You can easily start to read the content of a post before clicking. Especially on mobile, where there's no hover-preview, it's often nice to get to read a few sentences before deciding to commit to a post.
2. It puts comments on an even footing with posts. Often comments from some users are of greater interest than posts from others; a feed lets them be brought to your attention just as easily.
So far I've found the feed really great for (1) high signal-to-noise ratio content, since it's from people I've chosen to follow, and (2) reading through without having to spend as much up-front "decide what to read" energy. I like it for casual reading.
Future Directions
I think the Subscribed feed is good but has some drawbacks that mean it's not actually the feed I most want to see. First, it requires work to decide who to follow, and for users who aren't that familiar with the authors on the site, it'll be hard to decide who to follow. This means they might not get enough content. On the other hand, it's possible to subscribe to too many people, bringing down your average quality and driving you away from your feed.
Rather, I'm interested in a Subsc...

Jun 11, 2024 • 19min
LW - AI takeoff and nuclear war by owencb
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI takeoff and nuclear war, published by owencb on June 11, 2024 on LessWrong.
Summary
As we approach and pass through an AI takeoff period, the risk of nuclear war (or other all-out global conflict) will increase.
An AI takeoff would involve the automation of scientific and technological research. This would lead to much faster technological progress, including military technologies. In such a rapidly changing world, some of the circumstances which underpin the current peaceful equilibrium will dissolve or change. There are then two risks[1]:
1. Fundamental instability. New circumstances could give rise to a situation where there is no peaceful equilibrium that it is in everyone's interest to maintain.
e.g.
If the nuclear calculus changes to make second-strike capabilities infeasible
If one party is racing ahead with technological progress and will soon trivially outmatch the rest of the world, without any way to credibly commit not to completely disempower everyone else once it has done so
2. Failure to navigate. Despite the existence of new peaceful equilibria, decision-makers might fail to reach one.
e.g.
If decision-makers misunderstand the strategic position, they may hold out for a more favourable outcome they (incorrectly) believe is fair
If the only peaceful equilibria are convoluted and unprecedented, leaders may not be able to identify or build trust in them in a timely fashion
Individual leaders might choose a path of war that would be good for them personally as they solidify power with AI; or nations might hold strongly to values like sovereignty that could make cooperation much harder
Of these two risks, it is likely simpler to work to reduce the risk of failure to navigate. The three straightforward strategies here are: research & dissemination, to ensure that the basic strategic situation is common knowledge among decision-makers; spreading positive-sum frames; and crafting and getting buy-in to meaningful commitments about sharing the power from AI, to reduce incentives for anyone to initiate war.
Additionally, powerful AI tools could change the landscape in ways that reduce either or both of these risks. A fourth strategy, therefore, is to differentially accelerate risk-reducing applications of AI. These could include:
Tools to help decision-makers make sense of the changing world and make wise choices;
Tools to facilitate otherwise impossible agreements via mutually trusted artificial judges;
Tools for better democratic accountability.
Why do(n't) people go to war?
To date, the world has been pretty good at avoiding thermonuclear war. The doctrine of mutually assured destruction means that it's in nobody's interest to start a war (although the short timescales involved mean that accidentally starting one is a concern).
The rapid development of powerful AI could disrupt the current equilibrium. From a very outside-view perspective, we might think that this is equally likely to result in, say, a 10x decrease in risk as a 10x increase. Even this would be alarming: the annual probability seems fairly low right now, so a big decrease in risk is merely nice-to-have, but a big increase could be catastrophic.
To get more clarity than that, we'll look at the theoretical reasons people might go to war, and then look at how an AI takeoff period might impact each of these.
Rational reasons to go to war
War is inefficient; for any war, there should be some possible world which doesn't have that war in which everyone is better off. So why do we have war? Fearon's classic paper on Rationalist Explanations for War explains that there are essentially three mechanisms that can lead to war between states that are all acting rationally:
1. Commitment problems
If you're about to build a superweapon, I might want to attack now. We might both be better off if I didn't attack, and I paid y...

Jun 11, 2024 • 9min
LW - "Metastrategic Brainstorming", a core building-block skill by Raemon
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Metastrategic Brainstorming", a core building-block skill, published by Raemon on June 11, 2024 on LessWrong.
I want to develop rationality training, which is aimed at solving confusing problems.
Two key problems with "confusing problems" are:
1. You might feel so confused and overwhelmed that you bounce off completely.
2. You might be confused about what counts as progress, or where the most progress is possible, and accidentally work on the wrong thing.
A skill that helps with both of these is "metastrategic brainstorming" - the art of generating lots of potential good approaches, and then choosing approaches that are likely to help.
Different situations call for different sorts of strategies. If a problem is confusing, you probably don't have a simple playbook for dealing with it. Different people also benefit from different sorts of strategies. So, while I can tell you a list of potential mental tools, what I most want you to practice is the art of identifying what would help you, in particular, with the particular situation in which you find yourself.
My triggers for switching to "metastrategic brainstorming mode" are:
I've just sat down to work on a problem I already know is hard.
I've started to feel stuck, annoyed, or frustrated.
I notice that I settled into the very first plan that occurred to me, and I have a sneaking suspicion it's not the best plan.
...and, I'm trying to solve a problem I expect to take at least 30 minutes (i.e. enough time it's worth spending at least a few minutes meta-brainstorming)...
...then I switch into "metastrategic brainstorming mode", which entails:
1. Open up a writing doc.
2. Ask myself "what are my goals?". If there are multiple goals, write them all down.
3. Set a 5-10 minute timer, spend it brainstorming "meta-level strategies." Don't try to solve the object level problem. Just focus on generating strategies that might help you solve the problem.
4. Look at my list of meta-strategies, and see if there's one that I feel at least reasonably optimistic about.
5. If so, try that meta-strategy.
6. If not, brainstorm more. (But: note that "take a break", "nap", and "ask a friend for help" all totally count as valid meta-strategies to try. Taking a nap is often pretty important, actually!)
7. When/if I eventually solve my problem, take note of what strategies and meta-strategies I ended up using. Ideally, write them down somewhere I'm likely to remember them again.
I want to re-emphasize "setting a real timer, for at least 5 and maybe up to 10 minutes, where you only allow yourself to generate meta-level strategies."
Exploring multiple plans before committing.
Partly, this is because it just takes a little while to shift out of "object level mode". But, more importantly: because your problem is confusing, your ways of thinking about it might be somewhat off track. And, even if you'd eventually solve your problem, you might be doing it using a way less efficient method.
In particular, many problems benefit from going "breadth first", where instead of barreling down the first plan you came up with, you try ~3 plans a little bit and see if one of them turns out to be way better than your initial plan.
Come up with multiple "types" of metastrategies.
When you're doing the 5-10 minutes of brainstorming, I recommend exploring a variety of strategies. For example, there are conceptual strategies like "break the problem down into smaller pieces." There are physical/biological strategies like "take a walk, or get a drink of water." There are social strategies like "ask a friend for help." (Sometimes this isn't appropriate if you're training, but it's a fine strategy to use on real-world tasks.)
Example: Writing this Blogpost
Right now I'm writing a blogpost on Metastrategic brainstorming. I actually found myself a bit stuck (a few p...

Jun 11, 2024 • 27min
LW - [Valence series] 4. Valence and Liking / Admiring by Steven Byrnes
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Valence series] 4. Valence & Liking / Admiring, published by Steven Byrnes on June 11, 2024 on LessWrong.
4.1 Post summary / Table of contents
Part of the Valence series.
(This is my second attempt to write the 4th post of my valence series. If you already read the previous attempt and are unsure whether to read this too, see footnote[1]. Also, note that this post has a bit of overlap with (and self-plagiarism from) my post Social status part 2/2: everything else, but the posts are generally quite different.)
The previous three posts built a foundation about what valence is, and how valence relates to thought in general. Now we're up to our first more specific application: the application of valence to the social world.
Here's an obvious question: "If my brain really assigns valence to any and every concept in my world-model, well, how about the valence that my brain assigns to the concept of some other person I know?" I think this question points to an important and interesting phenomenon that I call "liking / admiring" - I made up that term, because existing terms weren't quite right.
This post will talk about what "liking / admiring" is, and some of its important everyday consequences related to social status, mirroring, deference, self-esteem, self-concepts, and more.
Section 4.2 spells out a concept that I call "liking / admiring". For example, if Beth likes / admires Alice, then Beth probably is interested in Alice's opinions, and Beth probably cares what Alice thinks about her, and Beth probably is happy to be in the presence of Alice, and so on.
Section 4.3 suggests that liking / admiration is a special case of valence, where it's applied to a person: if "Beth likes / admires Alice", then the concept "Alice" evokes positive valence in Beth's brain.
Section 4.4 proposes that we have an innate "drive to feel liked / admired", particularly by people whom we ourselves like / admire in turn. I speculate on how such a drive might work in the brain.
Section 4.5 discusses our tendency to "mirror" people whom we like / admire, in their careers, clothes, beliefs, and so on.
Section 4.6 discusses our related tendency to defer to people whom we like / admire when we interact with them - i.e., to treat them like they have high social status.
Section 4.7 argues that feeling liked / admired is different from having high self-esteem, but that the former can have an outsized impact on the latter. I also relate this idea to the dynamics of self-concept formulation - for example, when we split motivations into externalized ego-dystonic "urges" versus internalized ego-syntonic "desires", we often tend to do so in a way that maximizes our self-esteem and (relatedly) maximizes the extent to which we implicitly feel liked / admired.
Section 4.8 is a brief conclusion.
4.2 Key concept: "liking / admiring"
I'm using the term "liking / admiring" to talk about a specific thing. I'll try to explain what it is. Note that it doesn't perfectly line up with how people commonly use the English words "liking" or "admiring".
4.2.1 Intuitive (extreme) example of "liking / admiring"
I'm Beth, a teenage fan-girl of famous pop singer Alice, whom I am finally meeting in person. Let's further assume that my demeanor right now is "confident enthusiasm": I am not particularly worried or afraid about the possibility that I will offend Alice, nor am I sucking up to Alice in expectation of favorable treatment (in fact, I'm never going to see her again after today). Rather, I just really like Alice! I am hanging on Alice's every word like it was straight from the mouth of God.
My side of the conversation includes things like "Oh wow!", "Huh, yeah, I never thought about it that way!", and "What a great idea!". And (let us suppose) I'm saying all those things sincerely, not to impress or suck up to Alice.
T...

Jun 10, 2024 • 3min
LW - Soviet comedy film recommendations by Nina Rimsky
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Soviet comedy film recommendations, published by Nina Rimsky on June 10, 2024 on LessWrong.
I'm a big fan of the Soviet comedy directors Eldar Ryazanov, Leonid Gaidai, and Georgiy Daneliya. Almost anything by them is worth watching, but here are my favorites (filtered for things that have a free YouTube version with good English subtitles; bold are the highest-recommended):
Ryazanov
1966
Beware of the Car (Берегись автомобиля)
[YouTube]
Comedy about a benevolent car thief who steals to donate to charity
1975
The Irony of Fate (Ирония судьбы или с легким паром!)
[YouTube]
A New Year's classic premised on the uniformity of Soviet apartment buildings - a guy gets drunk on NYE and ends up in a different city but finds an identical building that his key can access
1977
Office Romance (Служебный роман)
[YouTube]
Romantic comedy and satirical portrayal of Soviet office life
1979
The Garage (Гараж)
[YouTube]
Comedy set in a single room where people argue about who should lose their garage after the government decides to build a road through the plot they were collectively building garages on
1987
Forgotten Melody for a Flute (Забытая мелодия для флейты)
[YouTube]
Satirical romantic comedy about Soviet bureaucracy and its decline in power in the late 80s, great opening song (translate the lyrics)
1991
The Promised Heaven (Небеса обетованные)
Sadly couldn't find an English-subtitled YT link for this but I like it too much to miss off[1]
Tragic comedy about the lives of people made recently homeless during the Perestroika period, very sad and of its time
Gaidai
1966
Kidnapping, Caucasian Style (Кавказская пленница, или Новые приключения Шурика)
[YouTube]
One of the most famous Soviet comedies - a naive visitor to the Caucasus is convinced to assist in the "bride kidnapping" tradition
1969
The Diamond Arm (Бриллиантовая рука)
[YouTube]
Another one of the most famous Soviet comedies - diamonds end up being smuggled in the wrong guy's cast because he happens to injure himself and say the "codeword" in front of the smugglers' hideout
1971
The Twelve Chairs (12 стульев)
[YouTube]
Film adaptation of the satirical novel by Soviet authors Ilf and Petrov, set in post-revolutionary Russia
Daneliya
1977
Mimino (Мимино)
[YouTube]
Romantic comedy about a Georgian bush pilot
1986
Kin-dza-dza! (Кин-Дза-Дза!)
[YouTube]
Funny low-budget sci-fi
Bonus recommendations
1973
Seventeen Moments of Spring (Семнадцать мгновений весны)
[YouTube]
Extremely popular Soviet spy thriller set during WW2
Source of "Stierlitz jokes"
1975
Hedgehog in the Fog (Ёжик в тумане)
[YouTube]
Classic short (10mins) animated children's film, great atmosphere
1. ^
$10 bounty to anyone who finds a link to a free version of this with high-quality English subtitles
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jun 10, 2024 • 1h 35min
LW - On Dwarkesh's Podcast with Leopold Aschenbrenner by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Dwarkesh's Podcast with Leopold Aschenbrenner, published by Zvi on June 10, 2024 on LessWrong.
Previously: Quotes from Leopold Aschenbrenner's Situational Awareness Paper
Dwarkesh Patel talked to Leopold Aschenbrenner for about four and a half hours.
The central discussion was the theses of his paper, Situational Awareness, which I offered quotes from earlier, with a focus on the consequences of AGI rather than whether AGI will happen soon. There are also a variety of other topics.
Thus, for the relevant sections of the podcast, I am approaching this by roughly accepting the technological premise on capabilities and timelines, since they don't discuss that. So the background is we presume straight lines on graphs will hold to get us to AGI and ASI (superintelligence), and this will allow us to generate a 'drop-in AI researcher' that can then assist with further work. Then things go into 'slow' takeoff.
I am changing the order of the sections a bit. I put the pure AI stuff first, then afterwards are most of the rest of it.
The exception is the section on What Happened at OpenAI.
I am leaving that part out because I see it as distinct, and requiring a different approach. It is important and I will absolutely cover it. I want to do that in its proper context, together with other events at OpenAI, rather than together with the global questions raised here. Also, if you find OpenAI events relevant to your interests that section is worth listening to in full, because it is absolutely wild.
Long post is already long, so I will let this stand on its own and not combine it with people's reactions to Leopold or my more structured response to his paper.
While I have strong disagreements with Leopold, only some of which I detail here, and I especially believe he is dangerously wrong and overly optimistic about alignment, existential risks and loss of control in ways that are highly load bearing, causing potential sign errors in interventions, and also I worry that the new AGI fund may make our situation worse rather than better, I want to most of all say: Thank you.
Leopold has shown great courage. He stands up for what he believes in even at great personal cost. He has been willing to express views very different from those around him, when everything around him was trying to get him not to do that. He has thought long and hard about issues very hard to think long and hard about, and is obviously wicked smart. By writing down, in great detail, what he actually believes, he allows us to compare notes and arguments, and to move forward. This is The Way.
I have often said I need better critics. This is a better critic. A worthy opponent.
Also, on a great many things, he is right, including many highly important things where both the world at large and also those at the labs are deeply wrong, often where Leopold's position was not even being considered before. That is a huge deal.
The plan is to then do a third post, where I will respond holistically to Leopold's model, and cover the reactions of others.
Reminder on formatting for Podcast posts:
1. Unindented first-level items are descriptions of what was said and claimed on the podcast unless explicitly labeled otherwise.
2. Indented second-level items and beyond are my own commentary on that, unless labeled otherwise.
3. Time stamps are from YouTube.
The Trillion Dollar Cluster
1. (2:00) We start with the trillion-dollar cluster. It's coming. Straight lines on a graph at half an order of magnitude a year, a central theme throughout.
2. (4:30) Power. We'll need more. American power generation has not grown for decades. Who can build a 10 gigawatt center, let alone 100? Leopold thinks 10 was so six months ago and we're on to 100. Trillion dollar cluster a bit farther out.
3. (6:15) Distinction between cost of cluster versus rental...

Jun 10, 2024 • 7min
LW - My AI Model Delta Compared To Yudkowsky by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My AI Model Delta Compared To Yudkowsky, published by johnswentworth on June 10, 2024 on LessWrong.
Preamble: Delta vs Crux
I don't natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I'll call a delta.
Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it's cloudy today, that means the "weather" variable in my program at a particular time[1] takes on the value "cloudy". Now, suppose your program and my program are exactly the same, except that somewhere in there I think a certain parameter has value 5 and you think it has value 0.3.
Even though our programs differ in only that one little spot, we might still expect very different values of lots of variables during execution - in other words, we might have very different beliefs about lots of stuff in the world.
If your model and my model differ in that way, and we're trying to discuss our different beliefs, then the obvious useful thing-to-do is figure out where that one-parameter difference is.
That's a delta: one or a few relatively "small"/local differences in belief, which when propagated through our models account for most of the differences in our beliefs.
For those familiar with Pearl-style causal models: think of a delta as one or a few do() operations which suffice to make my model basically match somebody else's model, or vice versa.
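As a toy illustration (mine, not from the post), here is what a one-parameter delta looks like in code: two otherwise identical programs that disagree about a single parameter value, and therefore about many downstream variables. The `world_model` function and its variable names are invented for this sketch.

```python
# Toy illustration of a "delta": two world-models that are identical programs
# except for one parameter (5 vs 0.3), whose downstream beliefs then diverge.
def world_model(param: float) -> dict:
    """A tiny 'program' whose beliefs all depend on one upstream parameter."""
    expected_storms_per_month = param                    # upstream belief
    cloudy_most_days = expected_storms_per_month > 1
    carry_umbrella = cloudy_most_days
    plan_outdoor_picnics = not cloudy_most_days
    return {
        "cloudy_most_days": cloudy_most_days,
        "carry_umbrella": carry_umbrella,
        "plan_outdoor_picnics": plan_outdoor_picnics,
    }

my_beliefs = world_model(param=5)      # I think the parameter is 5
your_beliefs = world_model(param=0.3)  # you think it's 0.3

# Every downstream disagreement traces back to that single-parameter delta.
delta_driven_disagreements = {
    key: (my_beliefs[key], your_beliefs[key])
    for key in my_beliefs
    if my_beliefs[key] != your_beliefs[key]
}
print(delta_driven_disagreements)
# {'cloudy_most_days': (True, False), 'carry_umbrella': (True, False),
#  'plan_outdoor_picnics': (False, True)}
```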
This post is about my current best guesses at the delta between my AI models and Yudkowsky's AI models. When I apply the delta outlined here to my models, and propagate the implications, my models basically look like Yudkowsky's as far as I can tell.
This post might turn into a sequence if there's interest; I already have another one written for Christiano, and people are welcome to suggest others they'd be interested in.
My AI Model Delta Compared To Yudkowsky
Best guess: Eliezer basically rejects the natural abstraction hypothesis. He mostly expects AI to use internal ontologies fundamentally alien to the ontologies of humans, at least in the places which matter.
Lethality #33 lays it out succinctly:
33. The AI does not think like you do, the AI doesn't have thoughts built up from the same concepts you use, it is utterly alien on a staggering scale. Nobody knows what the hell GPT-3 is thinking, not only because the matrices are opaque, but because the stuff within that opaque container is, very likely, incredibly alien - nothing that would translate well into comprehensible human thinking, even if we could see past the giant wall of floating-point numbers to what lay behind.
What do my models look like if I propagate that delta? In worlds where natural abstraction basically fails, we are thoroughly and utterly fucked, and a 99% probability of doom strikes me as entirely reasonable and justified.
Here's one oversimplified doom argument/story in a world where natural abstraction fails hard:
1. Humanity is going to build superhuman goal-optimizing agents. ('Cause, like, obviously somebody's going to do that, there's no shortage of capabilities researchers loudly advertising that they're aiming to do that exact thing.) These will be so vastly more powerful than humans that we have basically-zero bargaining power except insofar as AIs are aligned to our interests.
2. We're assuming natural abstraction basically fails, so those AI systems will have fundamentally alien internal ontologies. For purposes of this overcompressed version of the argument, we'll assume a very extreme failure of natural abstraction, such that human concepts cannot be faithfully and robustly translated into the system's internal ontology at all.
(For instance, maybe a faithful and robust translation would be so long in the system's "internal language" that the transla...

Jun 10, 2024 • 1min
LW - What if a tech company forced you to move to NYC? by KatjaGrace
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What if a tech company forced you to move to NYC?, published by KatjaGrace on June 10, 2024 on LessWrong.
It's interesting to me how chill people sometimes are about the non-extinction future AI scenarios. Like, there seem to be opinions around along the lines of "pshaw, it might ruin your little sources of 'meaning', Luddite, but we have always had change and as long as the machines are pretty near the mark on rewiring your brain it will make everything amazing".
Yet I would bet that even that person, if faced instead with a policy that was going to forcibly relocate them to New York City, would be quite indignant, and want a lot of guarantees about the preservation of various very specific things they care about in life, and not be just like "oh sure, NYC has higher GDP/capita than my current city, sounds good".
I read this as a lack of engaging with the situation as real. But possibly my sense that a non-negligible number of people have this flavor of position is wrong.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jun 10, 2024 • 4min
LW - The Data Wall is Important by JustisMills
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Data Wall is Important, published by JustisMills on June 10, 2024 on LessWrong.
Modern AI is trained on a huge fraction of the internet, especially at the cutting edge, with the best models trained on close to all the high quality data we've got.[1] And data is really important! You can scale up compute, you can make algorithms more efficient, or you can add infrastructure around a model to make it more useful, but on the margin, great datasets are king. And, naively, we're about to run out of fresh data to use.
It's rumored that the top firms are looking for ways to get around the data wall. One possible approach is having LLMs create their own data to train on, for which there is kinda-sorta a precedent from, e.g. modern chess AIs learning by playing games against themselves.[2] Or just finding ways to make AI dramatically more sample efficient with the data we've already got: the existence of human brains proves that this is, theoretically, possible.[3]
But all we have, right now, are rumors. I'm not even personally aware of rumors that any lab has cracked the problem: certainly, nobody has come out and said so in public! There's a lot of insinuation that the data wall is not so formidable, but no hard proof. And if the data wall is a hard blocker, it could be very hard to get AI systems much stronger than they are now.
If the data wall stands, what would we make of today's rumors? There's certainly an optimistic mood about progress coming from AI company CEOs, and a steady trickle of not-quite-leaks that exciting stuff is going on behind the scenes, and to stay tuned. But there are at least two competing explanations for all this:
Top companies are already using the world's smartest human minds to crack the data wall, and have all but succeeded.
Top companies need to keep releasing impressive stuff to keep the money flowing, so they declare, both internally and externally, that their current hurdles are surmountable.
There's lots of precedent for number two! You may have heard of startups hard coding a feature and then scrambling to actually implement it when there's interest.
And race dynamics make this even more likely: if OpenAI projects cool confidence that it's almost over the data wall, and Anthropic doesn't, then where will all the investors, customers, and high profile corporate deals go? There also could be an echo chamber effect, where one firm acting like the data wall's not a big deal makes other firms take their word for it.
I don't know what a world with a strong data wall looks like in five years. I bet it still looks pretty different than today! Just improving GPT-4 level models around the edges, giving them better tools and scaffolding, should be enough to spur massive economic activity and, in the absence of government intervention, job market changes. We can't unscramble the egg. But the "just trust the straight line on the graph" argument is ignoring that one of the determinants of that line is running out.
There's a world where the line is stronger than that particular constraint, and a new treasure trove of data appears in time. But there's also a world where it isn't, and we're near the inflection of an S-curve.
Rumors and projected confidence can't tell us which world we're in.
1. ^
For good analysis of this, search for the heading "The data wall" here.
2. ^
But don't take this parallel too far! Chess AI (or AI playing any other game) has a signal of "victory" that it can seek out - it can preferentially choose moves that systematically lead to the "my side won the game" outcome. But the core of a LLM is a text predictor: "winning" for it is correctly guessing what comes next in human-created text.
What does self-play look like there? Merely making up fake human-created text has the obvious issue of amplifying any weaknesses the AI has ...