

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Apr 17, 2024 • 3min
LW - Mid-conditional love by KatjaGrace
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Mid-conditional love, published by KatjaGrace on April 17, 2024 on LessWrong.
People talk about unconditional love and conditional love. Maybe I'm out of the loop regarding the great loves going on around me, but my guess is that love is extremely rarely unconditional. Or at least if it is, then it is either very broadly applied or somewhat confused or strange: if you love me unconditionally, presumably you love everything else as well, since it is only conditions that separate me from the worms.
I do have sympathy for this resolution - loving someone so unconditionally that you're just crazy about all the worms as well - but since that's not a way I know of anyone acting for any extended period, the 'conditional vs. unconditional' dichotomy here seems a bit miscalibrated for being informative.
Even if we instead assume that by 'unconditional', people mean something like 'resilient to most conditions that might come up for a pair of humans', my impression is that this is still too rare to warrant being the main point on the love-conditionality scale that we recognize.
People really do have more and less conditional love, and I'd guess this does have important, labeling-worthy consequences. It's just that all the action seems to be in the mid-conditional range that we don't distinguish with names. A woman who leaves a man because he grew plump and a woman who leaves a man because he committed treason both possessed 'conditional love'.
So I wonder if we should distinguish these increments of mid-conditional love better.
What concepts are useful? What lines naturally mark it?
One measure I notice perhaps varying in the mid-conditional affection range is "when I notice this person erring, is my instinct to push them away from me or pull them toward me?" Like, if I see Bob give a bad public speech, do I feel a drive to encourage the narrative that we barely know each other, or an urge to pull him into my arms and talk to him about how to do better?
This presumably depends on things other than the person. For instance, the scale and nature of the error: if someone you casually like throws a frisbee wrong, helping them do better might be appealing. Whereas if that same acquaintance were to kick a cat, your instinct might be to back away fast.
This means perhaps you could construct a rough scale of mid-conditional love in terms of what people can do and still trigger the 'pull closer' feeling. For instance, perhaps there are:
People who you feel a pull toward when they misspell a word
People who you feel a pull toward when they believe something false
People who you feel a pull toward when they get cancelled
(You could also do this with what people can do and still be loved, but that's more expensive to measure than minute urges.)
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 17, 2024 • 9min
EA - Cooperative AI: Three things that confused me as a beginner (and my current understanding) by C Tilli
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Cooperative AI: Three things that confused me as a beginner (and my current understanding), published by C Tilli on April 17, 2024 on The Effective Altruism Forum.
I started working in cooperative AI almost a year ago, and since it is an emerging field, I found it quite confusing at times: there is very little introductory material aimed at beginners. My hope with this post is that, by summing up my own confusions and how I understand them now, I might help speed up the process for others who want to get a grasp on what cooperative AI is.
I work at Cooperative AI Foundation (CAIF) and there will be a lot more polished and official material coming from there, so this is just a quick personal write-up to get something out in the meantime. We're working on a cooperative AI curriculum that should be published within the next couple of months, and we're also organising a summer school in June for people new to the area (application deadline April 26th).
Contradicting definitions
When I started to learn about cooperative AI I came across a lot of different definitions of the concept. While drafting this post I dug up my old interview preparation doc for my current job, where I had listed different descriptions of cooperative AI that I had found while reading up:
"the objective of this research would be to study the many aspects of the problems of cooperation and to innovate in AI to contribute to solving these problems"
"AI research trying to help individuals, humans and machines, to find ways to improve their joint welfare"
"AI research which can help contribute to solving problems of cooperation"
"building machine agents with the capabilities needed for cooperation"
"building tools to foster cooperation in populations of (machine and/or human) agents"
"conducting AI research for insight relevant to problems of cooperation"
To me this did not paint a very clear picture and I was pretty frustrated to be unable to find a concise answer to the most basic question: What is cooperative AI and what is it not?
At this point I still don't have a clear, final definition, but I am less frustrated by that, because I no longer think it is just a failure of understanding or of communication. The situation is simply that the field is so new that there is no single definition that people working in it agree on, and where the boundaries should be drawn is still an ongoing discussion.
That said, my current favourite explanation of cooperative AI is this: while AI alignment deals with how to make one powerful AI system behave in a way that is aligned with (good) human values, cooperative AI is about making things go well with powerful AI systems in a messy world where there may be many different AI systems, many different humans and human groups, and different sets of (sometimes contradictory) values.
Another recurring framing is that cooperative AI is about improving the cooperative intelligence of advanced AI, which raises the question of what cooperative intelligence is. Here, too, many different versions are in circulation, but the following is the one I find most useful so far:
Cooperative intelligence is an agent's ability to achieve their goals in ways that also promote social welfare, in a wide range of environments and with a wide range of other agents.
Is this really different from alignment?
The second major issue I had was to figure out how cooperative AI really differed from AI alignment. The description of "cooperative intelligence" seemed like it could be understood as just a certain framing of alignment - "achieve the goals in a way that is also good for everyone".
As I have been learning more about cooperative AI, it seems to me like the term "cooperative intelligence" is best understood in the context of social dil...

Apr 17, 2024 • 9min
EA - How good it is to donate and how hard it is to get a job by Elijah Persson-Gordon
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How good it is to donate and how hard it is to get a job, published by Elijah Persson-Gordon on April 17, 2024 on The Effective Altruism Forum.
Summary
In this post, I hope to inspire other Effective Altruists to focus more on donation and commiserate with those who have been disappointed in their ability to get an altruistic job.
First, I argue that the impact of having a job that helps others is complicated. In this section, I discuss annual donation statistics for people in the Effective Altruism community, which I find quite low.
In the rest of the post, I describe my recent job search, my experience substituting at public schools, and my expenses.
Having a job that helps others might be overemphasized
Doing a job that helps others seems like a good thing to do. Weirdly, it's not as simple as that.
While some job vacancies go unfilled for years, other fields are very competitive, with many qualified applicants for most position listings.
In the latter case, if you take the job offer, you may think you are doing good in the world. But if you hadn't taken the job, there could be someone else in your position doing nearly as much good as you (or more, depending on whether you were overstating your qualifications).
In animal welfare in particular, jobs get many applicants.
Lauren Mee, from Animal Advocacy Careers, on the podcast How I Learned to Love Shrimp: "...there's an interesting irony in the movement where there is actually a lot of people who are interested in working in the movement and not enough roles for all of those people."
There is some social pressure within and outside of the Effective Altruism community to have a meaningful job where you help others.
Although there is a lot of focus on impactful careers, Rethink Priorities' 2020 Effective Altruism survey found that only around 10% of non-student respondents worked at an Effective Altruism organization.
Donations are an amazing opportunity, and I think they are underemphasized
I was confused to find that most people I talked to in Effective Altruism settings did not seem to be frugal or donate very much.
It seems that this impression is correct.
In the 2020 Effective Altruism survey, among respondents who opted to share their donation amounts, donating $10,000 annually would place you within the top 10% of donors. The median for these respondents was close to $500 per year. (Mostly, they donate to global poverty.)
A lot of people in rich countries have flexibility in where their money goes. This money could be put toward their best bets of doing good in the world.
Which is more likely to do good: going out to eat, or helping to fund an effective charity?
It seems to me that, to choose the former, you would have to think either that the most effective charities are not that effective, or that your contributions would be too small to make an impact.
To understand more about the effectiveness of charities, I would highly recommend talking to someone from the charity and asking about your specific doubts.
As for small contributions, I am not exactly sure how to think about them, and hope to write about this topic in the future. However, it seems to me that many charities make purchases in the thousands of dollars, which could be an achievable amount to donate over a year. For instance, Fish Welfare Initiative's 2024 budget includes numbers in the thousands.
I used to really want an animal welfare-related job. Then I wanted to donate more. Now I am a substitute at a public school
I graduated in May of 2023 and have since been interested in an animal welfare job.
I have applied to a handful of these positions, realizing over time that the applicant pools were larger than I thought; the researcher position at Animal Charity Evaluators had 375 applicants.
After moving back to a more rural area to be around friends and family, I looked into busin...

Apr 16, 2024 • 21min
LW - Transformers Represent Belief State Geometry in their Residual Stream by Adam Shai
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Transformers Represent Belief State Geometry in their Residual Stream, published by Adam Shai on April 16, 2024 on LessWrong.
Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, and @Guillaume Corlouer for suggestions on this writeup.
Introduction
What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because:
We have a formalism that relates training data to internal structures in LLMs.
Conceptually, our results mean that LLMs synchronize to their internal world model as they move through the context window.
The computation associated with synchronization can be formalized with a framework called Computational Mechanics. In the parlance of Computational Mechanics, we say that LLMs represent the Mixed-State Presentation of the data generating process.
The structure of synchronization is, in general, richer than the world model itself. In this sense, LLMs learn more than a world model.
We have increased hope that Computational Mechanics can be leveraged for interpretability and AI Safety more generally.
There's just something inherently cool about making a non-trivial prediction - in this case that the transformer will represent a specific fractal structure - and then verifying that the prediction is true. Concretely, we are able to use Computational Mechanics to make an a priori and specific theoretical prediction about the geometry of residual stream activations (below on the left), and then show that this prediction holds true empirically (below on the right).
Theoretical Framework
In this post we will operationalize training data as being generated by a Hidden Markov Model (HMM)[2]. An HMM has a set of hidden states and transitions between them. Each transition is labeled with a probability and the token it emits. Here are some example HMMs and data they generate.
Consider the relation a transformer has to an HMM that produced the data it was trained on. This is general - any dataset consisting of sequences of tokens can be represented as having been generated from an HMM. Through the discussion of the theoretical framework, let's assume a simple HMM with the following structure, which we will call the Z1R process[3] (for "zero one random").
The Z1R process has 3 hidden states: S0, S1, and SR. An arrow of the form S_x --a : p%--> S_y denotes P(S_y, a | S_x) = p%, i.e. that the probability of moving to state S_y and emitting the token a, given that the process is in state S_x, is p%. In this way, taking transitions between the states stochastically generates binary strings of the form ...01R01R..., where R is a random 50/50 sample from {0, 1}.
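To make the Z1R description above concrete, here is a minimal Python sketch (my own illustration, not code from the post) of the data-generating process; the state names and transition probabilities are read off from the description above, while the function name sample_z1r is hypothetical.

```python
import random

# Each transition is (probability, emitted token, next state), following the Z1R
# description above: S0 deterministically emits "0" and moves to S1, S1 deterministically
# emits "1" and moves to SR, and SR emits "0" or "1" with 50/50 odds and moves back to S0.
TRANSITIONS = {
    "S0": [(1.0, "0", "S1")],
    "S1": [(1.0, "1", "SR")],
    "SR": [(0.5, "0", "S0"), (0.5, "1", "S0")],
}

def sample_z1r(n_tokens, start_state="S0", seed=None):
    """Stochastically generate a binary string of the form ...01R01R... from the Z1R HMM."""
    rng = random.Random(seed)
    state, tokens = start_state, []
    for _ in range(n_tokens):
        options = TRANSITIONS[state]
        weights = [p for p, _, _ in options]
        _, token, state = rng.choices(options, weights=weights, k=1)[0]
        tokens.append(token)
    return "".join(tokens)

print(sample_z1r(12))  # a 12-token sample consistent with the ...01R01R... pattern
```

A transformer trained on strings like these only ever sees the emitted tokens, never the state labels, which is the setup the rest of the post relies on.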
The HMM structure is not directly given by the data it produces. Think of the difference between the list of strings this HMM emits (along with their probabilities) and the hidden structure itself[4]. Since the transformer only has access to the strings of emissions from this HMM, and not any information about the hidden states directly, if the transformer learns anything to do with the hidden structure, then it has to do the work of inferring it from the training data.
What we will show is that when they predict the next token well, transformers are doing even more computational work than inferring the hidden data generating process!
Do Transformers Learn a Model of the World...

Apr 16, 2024 • 21min
AF - Transformers Represent Belief State Geometry in their Residual Stream by Adam Shai
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Transformers Represent Belief State Geometry in their Residual Stream, published by Adam Shai on April 16, 2024 on The AI Alignment Forum.
Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, Sarah, and @Guillaume Corlouer for suggestions on this writeup.
Introduction
What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because:
We have a formalism that relates training data to internal structures in LLMs.
Conceptually, our results mean that LLMs synchronize to their internal world model as they move through the context window.
The computation associated with synchronization can be formalized with a framework called Computational Mechanics. In the parlance of Computational Mechanics, we say that LLMs represent the Mixed-State Presentation of the data generating process.
The structure of synchronization is, in general, richer than the world model itself. In this sense, LLMs learn more than a world model.
We have increased hope that Computational Mechanics can be leveraged for interpretability and AI Safety more generally.
There's just something inherently cool about making a non-trivial prediction - in this case that the transformer will represent a specific fractal structure - and then verifying that the prediction is true. Concretely, we are able to use Computational Mechanics to make an a priori and specific theoretical prediction about the geometry of residual stream activations (below on the left), and then show that this prediction holds true empirically (below on the right).
Theoretical Framework
In this post we will operationalize training data as being generated by a Hidden Markov Model (HMM)[2]. An HMM has a set of hidden states and transitions between them. Each transition is labeled with a probability and the token it emits. Here are some example HMMs and data they generate.
Consider the relation a transformer has to an HMM that produced the data it was trained on. This is general - any dataset consisting of sequences of tokens can be represented as having been generated from an HMM. Through the discussion of the theoretical framework, let's assume a simple HMM with the following structure, which we will call the Z1R process[3] (for "zero one random").
The Z1R process has 3 hidden states: S0, S1, and SR. An arrow of the form S_x --a : p%--> S_y denotes P(S_y, a | S_x) = p%, i.e. that the probability of moving to state S_y and emitting the token a, given that the process is in state S_x, is p%. In this way, taking transitions between the states stochastically generates binary strings of the form ...01R01R..., where R is a random 50/50 sample from {0, 1}.
The HMM structure is not directly given by the data it produces. Think of the difference between the list of strings this HMM emits (along with their probabilities) and the hidden structure itself[4]. Since the transformer only has access to the strings of emissions from this HMM, and not any information about the hidden states directly, if the transformer learns anything to do with the hidden structure, then it has to do the work of inferring it from the training data.
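As a concrete illustration of that inference work (my own sketch, with the transition matrices read off from the Z1R description above; it is not code from the post), here is the Bayesian belief update over hidden states that an optimal predictor would perform from the emissions alone, roughly the kind of computation the Mixed-State Presentation captures.

```python
import numpy as np

# T[token][i, j] = probability of moving from hidden state i to state j while
# emitting `token`, with states ordered (S0, S1, SR) as in the Z1R description above.
T = {
    "0": np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.0]]),
    "1": np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.5, 0.0, 0.0]]),
}

def update_belief(belief, token):
    """One Bayesian update of the distribution over hidden states after observing a token."""
    unnormalized = belief @ T[token]          # propagate the prior belief through the labeled transitions
    return unnormalized / unnormalized.sum()  # renormalize (assumes the observed token was possible)

# Starting from the uniform (stationary) distribution, the belief sharpens as tokens
# arrive -- the synchronization to the hidden structure discussed in the post.
belief = np.array([1/3, 1/3, 1/3])
for tok in "0110":
    belief = update_belief(belief, tok)
    print(tok, np.round(belief, 3))
```

It is the geometry of these belief states, rather than the hidden states themselves, that the post predicts (and finds) in the residual stream activations.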
What we will show is that when they predict the next token well, transformers are doing even more computational work than inferring the hidden data generating process!
Do Transformers Learn ...

Apr 16, 2024 • 2min
EA - U.S. Commerce Secretary Gina Raimondo Announces Expansion of U.S. AI Safety Institute Leadership Team [and Paul Christiano update] by Phib
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: U.S. Commerce Secretary Gina Raimondo Announces Expansion of U.S. AI Safety Institute Leadership Team [and Paul Christiano update], published by Phib on April 16, 2024 on The Effective Altruism Forum.
U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement.
They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President's landmark Executive Order.
...
Paul Christiano, Head of AI Safety, will design and conduct tests of frontier AI models, focusing on model evaluations for capabilities of national security concern. Christiano will also contribute guidance on conducting these evaluations, as well as on the implementation of risk mitigations to enhance frontier model safety and security.
Christiano founded the Alignment Research Center, a non-profit research organization that seeks to align future machine learning systems with human interests by furthering theoretical research. He also launched a leading initiative to conduct third-party evaluations of frontier models, now housed at Model Evaluation and Threat Research (METR).
He previously ran the language model alignment team at OpenAI, where he pioneered work on reinforcement learning from human feedback (RLHF), a foundational technical AI safety technique. He holds a PhD in computer science from the University of California, Berkeley, and a B.S. in mathematics from the Massachusetts Institute of Technology.
Following up on a previous news post:
https://forum.effectivealtruism.org/posts/9QLJgRMmnD6adzvAE/nist-staffers-revolt-against-expected-appointment-of
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 16, 2024 • 2min
LW - Paul Christiano named as US AI Safety Institute Head of AI Safety by Joel Burget
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Paul Christiano named as US AI Safety Institute Head of AI Safety, published by Joel Burget on April 16, 2024 on LessWrong.
U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement.
They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President's landmark Executive Order.
Paul Christiano, Head of AI Safety, will design and conduct tests of frontier AI models, focusing on model evaluations for capabilities of national security concern. Christiano will also contribute guidance on conducting these evaluations, as well as on the implementation of risk mitigations to enhance frontier model safety and security.
Christiano founded the Alignment Research Center, a non-profit research organization that seeks to align future machine learning systems with human interests by furthering theoretical research. He also launched a leading initiative to conduct third-party evaluations of frontier models, now housed at Model Evaluation and Threat Research (METR).
He previously ran the language model alignment team at OpenAI, where he pioneered work on reinforcement learning from human feedback (RLHF), a foundational technical AI safety technique. He holds a PhD in computer science from the University of California, Berkeley, and a B.S. in mathematics from the Massachusetts Institute of Technology.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 16, 2024 • 13min
EA - Essay competition on the Automation of Wisdom and Philosophy - $25k in prizes by Owen Cotton-Barratt
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Essay competition on the Automation of Wisdom and Philosophy - $25k in prizes, published by Owen Cotton-Barratt on April 16, 2024 on The Effective Altruism Forum.
With AI Impacts, we're pleased to announce an essay competition on the automation of wisdom and philosophy. Submissions are due by July 14th. The first prize is $10,000, and there is a total of $25,000 in prizes available.
The full announcement text is reproduced here:
Background
AI is likely to automate more and more categories of thinking with time.
By default, the direction the world goes in will be a result of the choices people make, and these choices will be informed by the best thinking available to them. People systematically make better, wiser choices when they understand more about issues, and when they are advised by deep and wise thinking.
Advanced AI will reshape the world, and create many new situations with potentially high-stakes decisions for people to make. To what degree people will understand these situations well enough to make wise choices remains to be seen. To some extent this will depend on how much good human thinking is devoted to these questions; but at some point it will probably depend crucially on how advanced, reliable, and widespread the automation of high-quality thinking about novel situations is.
We believe[1] that this area could be a crucial target for differential technological development, but is at present poorly understood and receives little attention. This competition aims to encourage and to highlight good thinking on the topics of what would be needed for such automation, and how it might (or might not) arise in the world.
For more information about what we have in mind, see some of the suggested essay prompts or the FAQ below.
Scope
To enter, please submit a link to a piece of writing, not published before 2024. It may be published or unpublished, although if it is selected for a prize we will require publication (at least in pre-print form, optionally on the AI Impacts website) in order to pay out the prize.
There are no constraints on the format - we will accept essays, blog posts, papers[2], websites, or other written artefacts[3] of any length. However, we primarily have in mind essays of 500-5,000 words. AI assistance is welcome but its nature and extent should be disclosed. As part of your submission you will be asked to provide a summary of 100-200 words.
Your writing should aim to make progress on a question related to the automation of wisdom and philosophy. A non-exhaustive set of questions of interest, in four broad categories:
Automation of wisdom
What is the nature of the sort of good thinking we want to be able to automate? How can we distinguish the type of thinking it's important to automate well and early from types of thinking where that's less important?
What are the key features or components of this good thinking?
How do we come to recognise new ones?
What are traps in thinking that is smart but not wise?
How can this be identified in automatable ways?
How could we build metrics for any of these things?
Automation of philosophy
What types of philosophy are language models well-equipped to produce, and what do they struggle with?
What would it look like to develop a "science of philosophy", testing models' abilities to think through new questions, with ground truth held back, and seeing empirically what is effective?
What have the trend lines for automating philosophy looked like, compared to other tasks performed by language models?
What types of training/finetuning/prompting/scaffolding help with the automation of wisdom/philosophy?
How much do they help, especially compared to how much they help other types of reasoning?
Thinking ahead
Considering the research agenda that will (presumably) eventually be needed to automate high quality wisdo...

Apr 16, 2024 • 55sec
EA - What should the EA community learn from the FTX / SBF disaster? An in-depth discussion with Will MacAskill on the Clearer Thinking podcast by spencerg
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What should the EA community learn from the FTX / SBF disaster? An in-depth discussion with Will MacAskill on the Clearer Thinking podcast, published by spencerg on April 16, 2024 on The Effective Altruism Forum.
In this new podcast episode, I discuss with Will MacAskill what the Effective Altruism community can learn from the FTX / SBF debacle, why Will has been limited in what he could say about this topic in the past, and what future directions for the Effective Altruism community and his own research Will is most enthusiastic about:
https://podcast.clearerthinking.org/episode/206/will-macaskill-what-should-the-effective-altruism-movement-learn-from-the-sbf-ftx-scandal
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 16, 2024 • 1min
EA - Spencer Greenberg and William MacAskill: What should the EA movement learn from the SBF/FTX scandal? by AnonymousTurtle
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Spencer Greenberg and William MacAskill: What should the EA movement learn from the SBF/FTX scandal?, published by AnonymousTurtle on April 16, 2024 on The Effective Altruism Forum.
What are the facts around Sam Bankman-Fried and FTX about which all parties agree? What was the nature of Will's relationship with SBF? What things, in retrospect, should've been red flags about Sam or FTX? Was Sam's personality problematic? Did he ever really believe in EA principles? Does he lack empathy? Or was he on the autism spectrum? Was he naive in his application of utilitarianism? Did EA intentionally install SBF as a spokesperson, or did he put himself in that position of his own accord? What lessons should EA leaders learn from this? What steps should be taken to prevent it from happening again? What should EA leadership look like moving forward? What are some of the dangers around AI that are not related to alignment? Should AI become the central (or even the sole) focus of the EA movement?
The Clearer Thinking podcast episode is aimed more at people in or related to EA, whereas Sam Harris's episode wasn't.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org


