

The Nonlinear Library: LessWrong
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

May 26, 2024 • 57min
LW - Review: Conor Moreton's "Civilization and Cooperation" by [DEACTIVATED] Duncan Sabien
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Review: Conor Moreton's "Civilization & Cooperation", published by [DEACTIVATED] Duncan Sabien on May 26, 2024 on LessWrong.
Author's note: in honor of the upcoming LessOnline event, I'm sharing this one here on LessWrong rather than solely on my substack. If you like it, you should subscribe to my substack, which you can do for free (paid subscribers see stuff a week early). I welcome discussion down below but am not currently committing to any particular level of participation myself.
Dang it, I knew I should have gone with my first instinct, and photocopied the whole book first. But then again, given that it vanished as soon as I got to the end of it, maybe my second instinct was right, and trying to do that would've been seen as cheating by whatever magical librarians left it for me in the first place.
It was just sitting there, on my desk, when I woke up six weeks ago. At first I thought it was an incredibly in-depth prank, or maybe like a fun puzzle that Logan had made for me as an early birthday present. But when I touched it, it glowed, and it unfolded in a way that I'm pretty sure we don't currently have the tech for.
Took me a while to decode the text, which mostly looked like:
…but eventually I got the hang of it, thanks to the runes turning out to be English, somehow, just a weird phonetic transcription of it.
Hilariously mundanely, it turned out to be a textbook (!), for what seemed like the equivalent of seventh graders (!!), for what seemed like the equivalent of social studies (!!!), written by an educator whose name (if I managed the translation correctly) is something like "Conor Moreton"…
…in a place called (if I managed the translation correctly) something like "Agor."
At first, I thought it was a civics textbook for the government and culture of Agor in particular, but nope - the more I read, the more it seemed like a "how stuff works" for societies in general, with a lot of claims that seemed to apply pretty straightforwardly to what I understand about cultures here on Earth.
(I'll be honest. By the time I got to the end of it, I was stoked about the idea of living in a country where everybody was taught this stuff in seventh grade.)
I took notes, but not very rigorous ones. I wasn't counting on the book just disappearing as soon as I finished reading the last page
(I know, I know, not very savvy of me, I should have seen that coming. 20/20 hindsight.)
so what follows is a somewhat patchwork review, with a lot of detail in random places and very little detail in others. Sorry. It's as complete as I can make it. If anybody else happens to get their hands on a copy, please let me know, or at least be sure to take better notes yourself.
I. Civilization as self-restraint
The first chapter of Moreton's book asks readers to consider the question Where does civilization come from? Why do we have it?
After all, at some point, civilization didn't exist. Then gradually, over time, it came into being, and gradually, over time, it became more and more complex.
(Moreton goes out of his way to make clear that he's not just talking about, like, static agrarian society, but civilizations of all kinds, including nomadic and foraging ones.)
At every step of the way, he argues, each new extra layer of civilization had to be better than what came before. Cultures aren't quite the same as organisms, but they're still subject to evolutionary pressure. Behaviors that don't pay off, in some important sense, eventually die out, outcompeted by other, better-calibrated behaviors.
The book points out that what civilization even is is a question that's up for debate, with many people using many different definitions. Moreton proposes a single, unifying principle:
Civilization is the voluntary relinquishment of technically available options. It's a binding of the self, a del...

May 26, 2024 • 13min
LW - Notifications Received in 30 Minutes of Class by tanagrabeast
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Notifications Received in 30 Minutes of Class, published by tanagrabeast on May 26, 2024 on LessWrong.
Introduction
If you are choosing to read this post, you've probably seen the image below depicting all the notifications students received on their phones during one class period. You probably saw it as a retweet of this tweet, or in one of Zvi's posts. Did you find this data plausible, or did you roll to disbelieve? Did you know that the image dates back to at least 2019? Does that fact make you more or less worried about the truth on the ground as of 2024?
Last month, I performed an enhanced replication of this experiment in my high school classes. This was partly because we had a use for it, partly to model scientific thinking, and partly because I was just really curious. Before you scroll past the image, I want to give you a chance to mentally register your predictions. Did my average class match the roughly 1,084 notifications I counted on Ms. Garza's viral image? What does the distribution look like? Is there a notable gender difference? Do honors classes get more or fewer notifications than regular classes? Which apps dominate? Let's find out!
Before you rush to compare apples and oranges, keep in mind that I don't know anything about Ms. Garza's class -- not the grade, the size, or the duration of her experiment. That would have made it hard for me to do a true replication, and since I saw some obvious ways to improve on her protocol, I went my own way with it.
Procedure
We opened class with a discussion about what we were trying to measure and how we were going to measure it for the next 30 minutes. Students were instructed to have their phones on their desks and turned on. For extra amusement, they were invited (but not required) to turn on audible indicators. They were asked to tally each notification received and log it by app.
They were instructed to not engage with any received notifications, and to keep their phone use passive during the experiment, which I monitored.
While they were not to put their names on their tally sheets, they were asked to provide some metadata that included (if comfortable) their gender. (They knew that gender differences in phone use and depression were a topic of public discussion, and were largely happy to provide this.)
To give us a consistent source of undemanding background "instruction" - and to act as our timer - I played the first 30 minutes of Kurzgesagt's groovy 4.5 Billion Years in 1 Hour video. Periodically, I also mingled with students in search of insights, which proved highly productive.
After the 30 minutes, students were charged with summing their own tally marks and writing totals as digits, so as to avoid a common issue where different students bundle and count tally clusters differently.
Results
Below are the two charts from our experiment that I think best capture the data of interest. The first is more straightforward, but I think the second is a little more meaningful.
Ah! So right away we can see a textbook long-tailed distribution. The top 20% of recipients accounted for 75% of all received notifications, and the bottom 20% for basically zero. We can also see that girls are more likely to be in that top tier, but they aren't exactly crushing the boys.
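To make the quintile figure concrete, here is a minimal Python sketch of how such a share is computed; the counts are invented for illustration and are not the post's data.

```python
# Illustrative only: synthetic notification counts, not the post's actual data.
counts = sorted([0, 0, 1, 2, 3, 5, 8, 12, 20, 95], reverse=True)

top_n = max(1, len(counts) // 5)               # top 20% of students
top_share = sum(counts[:top_n]) / sum(counts)  # their share of all notifications
print(f"Top 20% of recipients received {top_share:.0%} of notifications")
```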
But do students actually notice and get distracted by all of these notifications? This is partly subjective, obviously, but we probably aren't as worried about students who would normally have their phones turned off or tucked away in their backpacks on the floor. So one of my metadata questions asked them about this.
The good rapport I enjoy with my students makes me pretty confident that I got honest answers - as does the fact that the data doesn't change all that much when I adjust for this in the chart below.
The most interesting difference in the ...

May 25, 2024 • 6min
LW - Level up your spreadsheeting by angelinahli
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Level up your spreadsheeting, published by angelinahli on May 25, 2024 on LessWrong.
Epistemic status: Passion project / domain I'm pretty opinionated about, just for fun.
In this post, I walk through some principles I think good spreadsheets abide by, and then in the companion piece, I walk through a whole bunch of tricks I've found valuable.
Who am I?
I've spent a big chunk of my (short) professional career so far getting good at Excel and Google Sheets.[1] As such, I've accumulated a bunch of opinions on this topic.
Who should read this?
This is not a guide to learning how to start using spreadsheets at all. I think you will get more out of this post if you use spreadsheets at least somewhat frequently, e.g.
Have made 20+ spreadsheets
Know how to use basic formulas like sum, if, countif, round
Know some fancier formulas like left/mid/right, concatenate, hyperlink
Have used some things like filters, conditional formatting, data validation
Principles of good spreadsheets
Broadly speaking, I think good spreadsheets follow some core principles (non-exhaustive list).
I think the principles below are a combination of good data visualization (or just communication) advice, systems design, and programming design (spreadsheets combine the code and the output).
It should be easy for you to extract insights from your data
1. A core goal you might have with spreadsheets is quickly calculating something based on your data. A bunch of tools below are aimed at improving functionality, allowing you to more quickly grab the data you want.
Your spreadsheet should be beautiful and easy to read
1. Sometimes, spreadsheets look like the following example.
2. I claim that this is not beautiful, and it isn't easy for your users to follow what is going on. I think there are cheap techniques you can use to improve the readability of your data.
There should be one source of truth for your data
1. One common pitfall when designing spreadsheet-based trackers is hard copy-and-pasting data (pasting static values) from one sheet to another, such that when your source data changes, the sheets you use for analyses no longer reflect "fresh" data. This is a big way in which your spreadsheet systems can break down.
2. A bunch of tools below are designed to improve data portability - i.e. remove the need for copying and pasting.
Your spreadsheet should be easy to audit
1. One major downside of spreadsheets compared to most coding languages is that it's often easy for relatively simple spreadsheets to contain silent bugs.[2]
2. Some features of spreadsheets that contribute to this problem:
1. Spreadsheets hide the code and show you only the output by default.
1. When you use formulas, once you hit enter, you don't by default get to see what's going on under the hood. So if the output looks plausible, you might not notice your formula has a bug in it.
2. It's harder to break up your work into chunks.
1. When you're coding, most people will break up a complicated formula into several lines of code, using intermediate variables and comments to make things more readable. E.g.:
2.
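To make this concrete, here is a minimal Python sketch of the idea; the names and numbers are invented and this is not the example from the original post.

```python
# Illustrative only: invented numbers, not the original post's example.
q3_sales = [1200, 950, 1100]
q4_sales = [1400, 1000, 1300]

# One nested expression, harder to audit:
# growth = (sum(q4_sales) - sum(q3_sales)) / sum(q3_sales)

# The same calculation broken into named intermediate steps, each easy to check:
q3_total = sum(q3_sales)                   # total Q3 revenue
q4_total = sum(q4_sales)                   # total Q4 revenue
growth = (q4_total - q3_total) / q3_total  # quarter-over-quarter growth
print(f"Growth: {growth:.1%}")
```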
3. By default, some Sheets formulas get really unwieldy, and you need to work a bit harder to recover readability.
3. Spreadsheets contain more individual calculations.
1. When you're coding and you want to perform the same calculation on 100 rows of data, you'd probably use a single line of code to iterate over your data (e.g. a for loop).
2. In Google Sheets, you're more likely to drag your formula down across all of your rows. But this means that if you accidentally change the formula for one cell and not the others, or if your data has now changed and it turns out you need to drag your formulas down more, things can break in annoying ways.
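Here is a minimal Python sketch of that contrast (illustrative only; the rows and the calculation are invented): the calculation is defined once and applied to every row, rather than copied cell by cell.

```python
# Illustrative only: one calculation defined once and applied to every row,
# rather than a formula copied (and possibly diverging) cell by cell.
rows = [
    {"price": 10.00, "qty": 3},
    {"price": 4.50, "qty": 10},
    {"price": 7.25, "qty": 2},
]
totals = [row["price"] * row["qty"] for row in rows]  # same logic for all rows
print(totals)  # [30.0, 45.0, 14.5]
```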
3. Because of this, I consider auditability one of the key qualities of a well designed spreadsheet. Some of the tools below will rec...

May 25, 2024 • 1h
LW - The Schumer Report on AI (RTFB) by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Schumer Report on AI (RTFB), published by Zvi on May 25, 2024 on LessWrong.
Or at least, Read the Report (RTFR).
There is no substitute. This is not strictly a bill, but it is important.
The introduction kicks off balancing upside and avoiding downside, utility and risk. This will be a common theme, with a very strong 'why not both?' vibe.
Early in the 118th Congress, we were brought together by a shared recognition of the profound changes artificial intelligence (AI) could bring to our world: AI's capacity to revolutionize the realms of science, medicine, agriculture, and beyond; the exceptional benefits that a flourishing AI ecosystem could offer our economy and our productivity; and AI's ability to radically alter human capacity and knowledge.
At the same time, we each recognized the potential risks AI could present, including altering our workforce in the short-term and long-term, raising questions about the application of existing laws in an AI-enabled world, changing the dynamics of our national security, and raising the threat of potential doomsday scenarios. This led to the formation of our Bipartisan Senate AI Working Group ("AI Working Group").
They did their work over nine forums.
1. Inaugural Forum
2. Supporting U.S. Innovation in AI
3. AI and the Workforce
4. High Impact Uses of AI
5. Elections and Democracy
6. Privacy and Liability
7. Transparency, Explainability, Intellectual Property, and Copyright
8. Safeguarding Against AI Risks
9. National Security
Existential risks were always given relatively little time, featuring as a topic in at most a subset of the final two forums. By contrast, mundane downsides and upsides were each given three full forums. This report was about the response to AI across a broad spectrum.
The Big Spend
They lead with a proposal to spend 'at least' $32 billion a year on 'AI innovation.'
No, there is no plan on how to pay for that.
In this case I do not think one is needed. I would expect any reasonable implementation of that to pay for itself via economic growth. The downsides are tail risks and mundane harms, but I wouldn't worry about the budget. If anything, AI's arrival is a reason to be very not freaked out about the budget. Official projections are baking in almost no economic growth or productivity impacts.
They ask that this money be allocated via a method called emergency appropriations. This is part of our government's longstanding way of using the word 'emergency.'
We are going to have to get used to this when it comes to AI.
Events in AI are going to be happening well beyond the 'non-emergency' speed of our government and especially of Congress, both opportunities and risks.
We will have opportunities that appear and compound quickly, projects that need our support. We will have stupid laws and rules, both that were already stupid or are rendered stupid, that need to be fixed.
Risks and threats, not only catastrophic or existential risks but also mundane risks and enemy actions, will arise far faster than our process can pass laws, draft regulatory rules with extended comment periods and follow all of our procedures.
In this case? It is May. The fiscal year starts in October. I want to say, hold your damn horses. But also, you think Congress is passing a budget this year? We will be lucky to get a continuing resolution. Permanent emergency. Sigh.
What matters more is, what do they propose to do with all this money?
A lot of things. And it does not say how much money is going where. If I were going to ask for a long list of things that adds up to $32 billion, I would say which things would cost how much money. But hey. Instead, it looks like he took the number from NSCAI, and then created a laundry list of things he wanted, without bothering to create a budget of any kind?
It also seems like they took the origin...

May 24, 2024 • 8min
LW - AI companies aren't really using external evaluators by Zach Stein-Perlman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI companies aren't really using external evaluators, published by Zach Stein-Perlman on May 24, 2024 on LessWrong.
Crossposted from my new blog: AI Lab Watch. Subscribe on Substack.
Many AI safety folks think that METR is close to the labs, with ongoing relationships that grant it access to models before they are deployed. This is incorrect. METR (then called ARC Evals) did pre-deployment evaluation for GPT-4 and Claude 2 in the first half of 2023, but it seems to have had no special access since then.[1] Other model evaluators also seem to have little access before deployment.
Frontier AI labs' pre-deployment risk assessment should involve external model evals for dangerous capabilities.[2] External evals can improve a lab's risk assessment and - if the evaluator can publish its results - provide public accountability.
The evaluator should get deeper access than users will get.
To evaluate threats from a particular deployment protocol, the evaluator should get somewhat deeper access than users will - then the evaluator's failure to elicit dangerous capabilities is stronger evidence that users won't be able to either.[3] For example, the lab could share a version of the model without safety filters or harmlessness training, and ideally allow evaluators to fine-tune the model.
To evaluate threats from model weights being stolen or released, the evaluator needs deep access, since someone with the weights has full access.
The costs of using external evaluators are unclear.
Anthropic said that collaborating with METR "requir[ed] significant science and engineering support on our end"; it has not clarified why. And even if providing deep model access or high-touch support is a hard engineering problem, I don't understand how sharing API access - including what users will receive and a no-harmlessness no-filters version - could be.
Sharing model access pre-deployment increases the risk of leaks, including of information about products (modalities, release dates), information about capabilities, and demonstrations of models misbehaving.
Independent organizations that do model evals for dangerous capabilities include METR, the UK AI Safety Institute (UK AISI), and Apollo. Only Google DeepMind says it has recently shared pre-deployment access with such an evaluator - UK AISI - and that sharing was minimal (see below).
What the labs say they're doing on external evals before deployment:
DeepMind
DeepMind shared Gemini 1.0 Ultra with unspecified external groups, apparently including UK AISI, to test for dangerous capabilities before deployment. But DeepMind didn't share deep access: it only shared a system with safety fine-tuning and safety filters, and it didn't allow evaluators to fine-tune the model. DeepMind has not shared any results of this testing.
Its Frontier Safety Framework says "We will . . . explore how to appropriately involve independent third parties in our risk assessment and mitigation processes."
Anthropic
Currently nothing
Its Responsible Scaling Policy mentions "external audits" as part of "Early Thoughts on ASL-4"
It shared Claude 2 with METR in the first half of 2023
OpenAI
Currently nothing
Its Preparedness Framework does not mention external evals before deployment. The closest thing it says is "Scorecard evaluations (and corresponding mitigations) will be audited by qualified, independent third-parties."
It shared GPT-4 with METR in the first half of 2023
It said "We think it's important that efforts like ours submit to independent audits before releasing new systems; we will talk about this in more detail later this year." That was in February 2023; I do not believe it elaborated (except to mention that it shared GPT-4 with METR).
All notable American labs joined the White House voluntary commitments, which include "external red-teaming . . . in areas ...

May 24, 2024 • 4min
LW - minutes from a human-alignment meeting by bhauth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: minutes from a human-alignment meeting, published by bhauth on May 24, 2024 on LessWrong.
"OK, let's get this meeting started. We're all responsible for development of this new advanced intelligence 'John'. We want John to have some kids with our genes, instead of just doing stuff like philosophy or building model trains, and this meeting is to discuss how we can ensure John tries to do that."
"It's just a reinforcement learning problem, isn't it? We want kids to happen, so provide positive reinforcement when that happens."
"How do we make sure the kids are ours?"
"There's a more fundamental problem than that: without intervention earlier on, that positive reinforcement will never happen."
"OK, so we need some guidance earlier on. Any suggestions?"
"To start, having other people around is necessary. How about some negative reinforcement if there are no other humans around for some period of time?"
"That's a good one, also helps with some other things. Let's do that."
"Obviously sex is a key step in producing children. So we can do positive reinforcement there."
"That's good, but wait, how do we tell if that's what's actually happening?"
"We have access to internal representation states. Surely we can monitor those to determine the situation."
"Yeah, we can monitor the representation of vision, instead of something more abstract and harder to understand."
"What if John creates a fictional internal representation of naked women, and manages to direct the monitoring system to that instead?"
"I don't think that's plausible, but just in case, we can add some redundant measures. A heuristic blend usually gives better results, anyway."
"How about monitoring the level of some association between some representation of the current situation and sex?"
"That could work, but how do we determine that association? We'd be working with limited data there, and we don't want to end up with associations to random irrelevant things, like specific types of shoes or stylized drawings of ponies."
"Those are weird examples, but whatever. We can just rely on indicators of social consensus, and then blend those with personal experiences to the extent they're available."
"I've said this before, but this whole approach isn't workable. To keep a John-level intelligence aligned, we need another John-level intelligence."
"Oh, here we go again. So, how do you expect to do that?"
"I actually have a proposal: we have John follow cultural norms around having children. We can presume that a society that exists would probably have a culture conducive to that."
"Why would you expect that to be any more stable than John as an individual? All that accomplishes is some averaging, and it adds the disadvantages of relying on communication."
"I don't have a problem with the proposal of following cultural norms, but I think that such a culture will only be stable to the extent that the other alignment approaches we discussed are successful. So it's not a replacement, it's more of a complement."
"We were already planning for some cultural norm following. Anyone opposed to just applying the standard amount of that to sex-related things?"
"Seems good to me."
"I have another concern. I think the effectiveness of the monitoring systems we discussed is going to depend on the amount of recursive self-improvement that happens, so we should limit that."
"I think that's a silly concern and a huge disadvantage. Absolutely not."
"I'm not concerned about the alignment impact if John is already doing some RSI, but we do have a limited amount of time before those RSI investments need to start paying off. I vote we limit the RSI extent based on things like available food resources and life expectancy."
"I don't think everyone will reach a consensus on this issue, so let's just compromise on the amount and metrics."
"Fine."
"A...

May 24, 2024 • 19min
LW - What mistakes has the AI safety movement made? by EuanMcLean
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What mistakes has the AI safety movement made?, published by EuanMcLean on May 24, 2024 on LessWrong.
This is the third of three posts summarizing what I learned when I interviewed 17 AI safety experts about their "big picture" of the existential AI risk landscape: how AGI will play out, how things might go wrong, and what the AI safety community should be doing. See here for a list of the participants and the standardized list of questions I asked.
This post summarizes the responses I received from asking "Are there any big mistakes the AI safety community has made in the past or are currently making?"
"Yeah, probably most things people are doing are mistakes. This is just some random group of people. Why would they be making good decisions on priors? When I look at most things people are doing, I think they seem not necessarily massively mistaken, but they seem somewhat confused or seem worse to me by like 3 times than if they understood the situation better." - Ryan Greenblatt
"If we look at the track record of the AI safety community, it quite possibly has been harmful for the world." - Adam Gleave
"Longtermism was developed basically so that AI safety could be the most important cause by the utilitarian EA calculus. That's my take." - Holly Elmore
Participants pointed to a range of mistakes they thought the AI safety movement had made. Key themes included an overreliance on theoretical argumentation, being too insular, putting people off by pushing weird or extreme views, supporting the leading AGI companies, insufficient independent thought, advocating for an unhelpful pause to AI development, and ignoring policy as a potential route to safety.
How to read this post
This is not a scientific analysis of a systematic survey of a representative sample of individuals, but my qualitative interpretation of responses from a loose collection of semi-structured interviews. Take everything here with the appropriate seasoning.
Results are often reported in the form "N respondents held view X". This does not imply that "17-N respondents disagree with view X", since not all topics, themes and potential views were addressed in every interview. What "N respondents held view X" tells us is that at least N respondents hold X, and consider the theme of X important enough to bring up.
The following is a summary of the main themes that came up in my interviews. Many of the themes overlap with one another, and the way I've clustered the criticisms is likely not the only reasonable categorization.
Too many galaxy-brained arguments & not enough empiricism
"I don't find the long, abstract style of investigation particularly compelling." - Adam Gleave
9 respondents were concerned about an overreliance or overemphasis on certain kinds of theoretical arguments underpinning AI risk: namely Yudkowsky's arguments in the sequences and Bostrom's arguments in Superintelligence.
"All these really abstract arguments that are very detailed, very long and not based on any empirical experience. [...]
Lots of trust in loose analogies, thinking that loose analogies let you reason about a topic you don't have any real expertise in. Underestimating the conjunctive burden of how long and abstract these arguments are. Not looking for ways to actually test these theories. [...]
You can see Nick Bostrom in Superintelligence stating that we shouldn't use RL to align an AGI because it trains the AI to maximize reward, which will lead to wireheading. The idea that this is an inherent property of RL is entirely mistaken. It may be an empirical fact that certain minds you train with RL tend to make decisions on the basis of some tight correlate of their reinforcement signal, but this is not some fundamental property of RL."
- Alex Turner
Jamie Bernardi argued that the original view of what AGI will look like, nam...

May 24, 2024 • 2min
LW - The case for stopping AI safety research by catubc
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The case for stopping AI safety research, published by catubc on May 24, 2024 on LessWrong.
TLDR: AI systems are failing in obvious and manageable ways for now. Fixing them will push the failure modes beyond our ability to understand and anticipate, let alone fix. The AI safety community is also doing a huge economic service to developers. Our belief that our minds can "fix" a super-intelligence - especially bit by bit - needs to be re-thought.
I've wanted to write this post forever, but now seems like a good time. The case is simple; I hope it takes you a minute to read.
1. AI safety research is still solving easy problems. We are patching up the most obvious (to us) problems. As time goes on, we will no longer be able to play this existential-risk game of chess with AI systems. I've argued this a lot (preprint; ICML paper accepted (shorter read, will repost), will be out in a few days; www.agencyfoundations.ai). Seems others have this thought.
2. Capability development is getting AI safety research for free, likely worth millions to tens of millions of dollars: all the "hackathons" and "mini" prizes to patch something up or propose a new way for society to digest/adjust to some new normal (and increasingly incentivizing existing academic labs).
3. AI safety research is speeding up capabilities. I hope this is somewhat obvious to most.
I write this now because, in my view, we are about 5-7 years away from massive human biometric and neural datasets entering our AI training. These will likely generate amazing breakthroughs in long-term planning and in emotional and social understanding of the human world. They will also most likely increase x-risk radically.
Stopping AI safety research, or taking it in-house with security guarantees etc., will slow down capabilities somewhat - and may expose capabilities developers more directly to public opinion of still-manageable harmful outcomes.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

May 24, 2024 • 27min
LW - Talent Needs in Technical AI Safety by yams
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Talent Needs in Technical AI Safety, published by yams on May 24, 2024 on LessWrong.
Co-Authors: @yams, @Carson Jones, @McKennaFitzgerald, @Ryan Kidd
MATS tracks the evolving landscape of AI safety[1] to ensure that our program continues to meet the talent needs of safety teams. As the field has grown, it's become increasingly necessary to adopt a more formal approach to this monitoring, since relying on a few individuals to intuitively understand the dynamics of such a vast ecosystem could lead to significant missteps.[2]
In the winter and spring of 2024, we conducted 31 interviews, ranging in length from 30 to 120 minutes, with key figures in AI safety, including senior researchers, organization leaders, social scientists, strategists, funders, and policy experts. This report synthesizes the key insights from these discussions.
The overarching perspectives presented here are not attributed to any specific individual or organization; they represent a collective, distilled consensus that our team believes is both valuable and responsible to share. Our aim is to influence the trajectory of emerging researchers and field-builders, as well as to inform readers on the ongoing evolution of MATS and the broader AI Safety field.
All interviews were conducted on the condition of anonymity.
Needs by Organization Type
Organization type: Talent needs
Scaling Lab (i.e. OpenAI, DeepMind, Anthropic) Safety Teams: Iterators > Amplifiers
Small Technical Safety Orgs (<10 FTE): Iterators > Machine Learning (ML) Engineers
Growing Technical Safety Orgs (10-30 FTE): Amplifiers > Iterators
Independent Research: Iterators > Connectors
Archetypes
We found it useful to frame the different profiles of research strengths and weaknesses as belonging to one of three archetypes (one of which has two subtypes). These aren't as strict as, say, Diablo classes; this is just a way to get some handle on the complex network of skills involved in AI safety research. Indeed, capacities tend to converge with experience, and neatly classifying more experienced researchers often isn't possible.
We acknowledge past framings by Charlie Rogers-Smith and Rohin Shah (research lead/contributor), John Wentworth (theorist/experimentalist/distillator), Vanessa Kosoy (proser/poet), Adam Shimi (mosaic/palimpsests), and others, but believe our framing of current AI safety talent archetypes is meaningfully different and valuable, especially pertaining to current funding and employment opportunities.
Connectors / Iterators / Amplifiers
Connectors are strong conceptual thinkers who build a bridge between contemporary empirical work and theoretical understanding. Connectors include people like Paul Christiano, Buck Shlegeris, Evan Hubinger, and Alex Turner[3]; researchers doing original thinking on the edges of our conceptual and experimental knowledge in order to facilitate novel understanding.
Note that most Connectors are typically not purely theoretical; they still have the technical knowledge required to design and run experiments. However, they prioritize experiments and discriminate between research agendas based on original, high-level insights and theoretical models, rather than on spur of the moment intuition or the wisdom of the crowds.
Pure Connectors often have a long lead time before they're able to produce impactful work, since it's usually necessary for them to download and engage with varied conceptual models. For this reason, we make little mention of a division between experienced and inexperienced Connectors.
Iterators are strong empiricists who build tight, efficient feedback loops for themselves and their collaborators. Ethan Perez is the central contemporary example here; his efficient prioritization and effective use of frictional time has empowered him to make major contributions to a wide range of empir...

May 23, 2024 • 9min
LW - Big Picture AI Safety: Introduction by EuanMcLean
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Big Picture AI Safety: Introduction, published by EuanMcLean on May 23, 2024 on LessWrong.
tldr: I conducted 17 semi-structured interviews of AI safety experts about their big picture strategic view of the AI safety landscape: how will human-level AI play out, how things might go wrong, and what should the AI safety community be doing. While many respondents held "traditional" views (e.g. the main threat is misaligned AI takeover), there was more opposition to these standard views than I expected, and the field seems more split on many important questions than someone outside the field may infer.
What do AI safety experts believe about the big picture of AI risk? How might things go wrong, what should we do about it, and how have we done so far? Does everybody in AI safety agree on the fundamentals? Which views are consensus, which are contested, and which are fringe? Maybe we could learn this from the literature (as in the MTAIR project), but many ideas and opinions are not written down anywhere; they exist only in people's heads and in lunchtime conversations at AI labs and coworking spaces.
I set out to learn what the AI safety community believes about the strategic landscape of AI safety. I conducted 17 semi-structured interviews with a range of AI safety experts. I avoided going into any details of particular technical concepts or philosophical arguments, instead focussing on how such concepts and arguments fit into the big picture of what AI safety is trying to achieve.
This work is similar to the AI Impacts surveys, Vael Gates' AI Risk Discussions, and Rob Bensinger's existential risk from AI survey. This is different to those projects in that both my approach to interviews and analysis are more qualitative. Part of the hope for this project was that it can hit on harder-to-quantify concepts that are too ill-defined or intuition-based to fit in the format of previous survey work.
Questions
I asked the participants a standardized list of questions.
What will happen?
Q1 Will there be a human-level AI? What is your modal guess of what the first human-level AI (HLAI) will look like? I define HLAI as an AI system that can carry out roughly 100% of economically valuable cognitive tasks more cheaply than a human.
Q1a What's your 60% or 90% confidence interval for the date of the first HLAI?
Q2 Could AI bring about an existential catastrophe? If so, what is the most likely way this could happen?
Q2a What's your best guess at the probability of such a catastrophe?
What should we do?
Q3 Imagine a world where, absent any effort from the AI safety community, an existential catastrophe happens, but actions taken by the AI safety community prevent such a catastrophe. In this world, what did we do to prevent the catastrophe?
Q4 What research direction (or other activity) do you think will reduce existential risk the most, and what is its theory of change? Could this backfire in some way?
What mistakes have been made?
Q5 Are there any big mistakes the AI safety community has made in the past or are currently making?
These questions changed gradually as the interviews went on (given feedback from participants), and I didn't always ask the questions exactly as I've presented them here. I asked participants to answer from their internal model of the world as much as possible and to avoid deferring to the opinions of others (their inside view so to speak).
Participants
Adam Gleave is the CEO and co-founder of the alignment research non-profit FAR AI. (Sept 23)
Adrià Garriga-Alonso is a research scientist at FAR AI. (Oct 23)
Ajeya Cotra leads Open Philanthropy's grantmaking on technical research that could help to clarify and reduce catastrophic risks from advanced AI. (Jan 24)
Alex Turner is a research scientist at Google DeepMind on the Scalable Alignment team. (Feb 24)
Ben Cottie...