

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Apr 9, 2024 • 30min
EA - Deontological Constraints on Animal Products by emre kaplan
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Deontological Constraints on Animal Products, published by emre kaplan on April 9, 2024 on The Effective Altruism Forum.
Introduction
There is a memetically powerful argument within animal advocacy circles which goes like the following: "We would never ask child abusers to commit less child abuse, so we can't ask other people to reduce their animal product consumption. We must ask them to end it."
In this post I try to construct and evaluate a part of this argument. First, I explain my motivation for evaluating the strength of this argument. Second, I note that it's morally permissible to ask for reductions in some kinds of wrongdoing and list different ways animal product consumption can be morally wrong. I create the category of "non-negotiably wrong" to refer to actions whose reduction one cannot permissibly ask for.
Third, I look into whether animal product use might be non-negotiably wrong by listing several deontological constraints that might be non-negotiable.
A Venn Diagram summarising results
I don't have any strong conclusions. I aim to reduce my own confusion and get more input from professional moral philosophers on this topic through this post. I'm also not sure if I should keep writing such posts, so if this post is helpful to you in any way, please let me know.
Many thanks to Michael St. Jules and Bob Fischer for their helpful feedback. All errors are my own.
Motivation
Some animal advocates argue for the following positions because they believe people have a non-negotiable duty to avoid consuming animal products:
Only vegans can speak at animal advocacy events
Only vegans can be members of animal advocacy organisations
Non-vegans shouldn't join animal advocacy protests
All animal advocacy organisations have a responsibility to prominently advocate for veganism because it's the main obligation to animals
It's morally forbidden to use the following sentences because they condone some animal product use or don't explicitly reject all animal product use:
Go vegetarian.
Meat should be taxed.
Our school should have Meatless Mondays.
Costco should go cage-free.
The default school meals at Grenoble should be vegetarian.
The public schools in New York City should serve exclusively plant-based food on Fridays.
Take the vegan-22 challenge, go vegan for 22 days.
Maybe you should try going plant-based except for cheese.
According to this line of argument, animal product use is not merely harmful (akin to carbon emissions) but also a violation of a very strong moral constraint (akin to direct physical violence or owning slaves). It is non-negotiably wrong. For that reason, including non-vegans in animal advocacy is similar to including slave-owners in anti-slavery advocacy. Asking for a reduction in animal product use is akin to asking for a reduction in physical violence ("don't beat your wife in January").
To clarify, as it is the case with many issues, there is a spectrum of opinions here. Some people will endorse some of the positions above while rejecting others.
I have been sympathetic to these arguments when it comes to my own consumption. I'm very sympathetic to the idea that since animals are not well-represented, we're likely to have a bias against their interests. When animal interests and my own interests get into conflict, it makes sense for me to be extra cautious to compensate for my own bias. So I'm happy with being strict in avoiding animal products in clothing and food.
On the other hand, I also suspect being too restrictive in animal advocacy might result in more animals being killed and tortured compared to alternatives. Some reasons offered are the following:
There might be a Laffer curve to the behaviour change created by your demands. Being too demanding might result in less change than being moderately demanding. (Example: The New York City officials won...

Apr 9, 2024 • 11min
LW - Math-to-English Cheat Sheet by nahoj
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Math-to-English Cheat Sheet, published by nahoj on April 9, 2024 on LessWrong.
Say you've learnt math in your native language, which is not English. Since then you've also read math in English and you appreciate the near universality of mathematical notation. Then one day you want to discuss a formula in real life and you realize you don't know how to pronounce "a_n".
Status: I had little prior knowledge of the topic. This was mostly generated by ChatGPT4 and kindly reviewed by @TheManxLoiner.
General
Distinguishing case
F, δ
"Big F" or "capital F", "little delta"
Subscripts
a_n
"a sub n" or, in most cases, just "a n"
Calculus
Pythagorean Theorem
a^2 + b^2 = c^2
"a squared plus b squared equals c squared."
Area of a Circle
A = πr^2
"Area equals pi r squared."
Slope of a Line
m = (y_2 - y_1) / (x_2 - x_1)
"m equals y 2 minus y 1 over x 2 minus x 1."
Quadratic Formula
x = (-b ± √(b^2 - 4ac)) / (2a)
"x equals minus b [or 'negative b'] plus or minus the square root of b squared minus four a c, all over two a."
Sum of an Arithmetic Series
S = (n/2)(a_1 + a_n)
"S equals n over two times a 1 plus a n."
Euler's Formula
e^(iθ) = cos(θ) + i sin(θ)
"e to the i theta equals cos [pronounced 'coz'] theta plus i sine theta."
Law of Sines
sin(A)/a = sin(B)/b = sin(C)/c
"Sine A over a equals sine B over b equals sine C over c."
Area of a Triangle (Heron's Formula)
A = √(s(s-a)(s-b)(s-c)), where s = (a + b + c)/2
"Area equals the square root of s times s minus a times s minus b times s minus c, where s equals a plus b plus c over two."
Compound Interest Formula
A = P(1 + r/n)^(nt)
"A equals P times one plus r over n to the power of n t."
Logarithm Properties
log_b(xy) = log_b(x) + log_b(y)
Don't state the base if clear from context: "Log of x y equals log of x plus log of y."
Otherwise "Log to the base b of x y equals log to the base b of x plus log to the base b of y."
More advanced operations
Derivative of a Function
df/dx or (d/dx)f(x) or f'(x)
"df by dx" or "d dx of f of x" or "f prime of x."
Second Derivative
(d^2/dx^2)f(x) or f''(x)
"d squared dx squared of f of x" or "f double prime of x."
Partial Derivative (unreviewed)
(∂/∂x)f(x, y)
"Partial with respect to x of f of x, y."
Definite Integral
∫_a^b f(x) dx
"Integral from a to b of f of x dx."
Indefinite Integral (Antiderivative)
∫ f(x) dx
"Integral of f of x dx."
Line Integral (unreviewed)
∫_C f(x, y) ds
"Line integral over C of f of x, y ds."
Double Integral
∫_a^b ∫_c^d f(x, y) dx dy
"Double integral from a to b and c to d of f of x, y dx dy."
Gradient of a Function
∇f
"Nabla f" or "gradient of f" to distinguish from other uses such as divergence or curl.
Divergence of a Vector Field
∇·F
"Nabla dot F."
Curl of a Vector Field
∇×F
"Nabla cross F."
Laplace Operator (unreviewed)
Δf or ∇^2f
"Delta f" or "Nabla squared f."
Limit of a Function
lim_(x→a) f(x)
"Limit as x approaches a of f of x."
Linear Algebra (vectors and matrices)
Vector Addition
v+w
"v plus w."
Scalar Multiplication
cv
"c times v."
Dot Product
v·w
"v dot w."
Cross Product
v×w
"v cross w."
Matrix Multiplication
AB
"A B."
Matrix Transpose
A^T
"A transpose."
Determinant of a Matrix
|A| or det(A)
"Determinant of A" or "det A".
Inverse of a Matrix
A^(-1)
"A inverse."
Eigenvalues and Eigenvectors
λ for eigenvalues, v for eigenvectors
"Lambda for eigenvalues; v for eigenvectors."
Rank of a Matrix
rank(A)
"Rank of A."
Trace of a Matrix
tr(A)
"Trace of A."
Vector Norm
‖v‖
"Norm of v" or "length of v".
Orthogonal Vectors
v·w = 0
"v dot w equals zero."
With numerical values
Matrix Multiplication with Numerical Values
Let A = (1, 2; 3, 4) and B = (5, 6; 7, 8), then AB = (19, 22; 43, 50).
"A B equals nineteen, twenty-two; forty-three, fifty."
Vector Dot Product
Let v = (1, 2, 3) and w = (4, 5, 6), then v·w = 32.
"v dot w equals thirty-two."
Determinant of a Matrix
For A = (1, 2; 3, 4), |A| = -2.
"Determinant of A equals minus two."
Eigenvalues and Eigenvectors with Numerical Values
Given A = (2, 1; 1, 2), it has eigenvalues λ_1 = 3 and λ_2 = 1, with corresponding eigenvectors v_1 = (1, 1) and v_2 = (1, -1).
"Lambda ...

Apr 9, 2024 • 31min
LW - Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition by cmathw
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition, published by cmathw on April 9, 2024 on LessWrong.
This work represents progress on removing attention head superposition. We are excited by this approach but acknowledge there are currently various limitations. In the short term, we will be working on adjacent problems and are excited to collaborate with anyone thinking about similar things!
Produced as part of the ML Alignment & Theory Scholars Program - Summer 2023 Cohort.
Summary: In transformer language models, attention head superposition makes it difficult to study the function of individual attention heads in isolation. We study a particular kind of attention head superposition that involves constructive and destructive interference between the outputs of different attention heads. We propose a novel architecture - a 'gated attention block' - which resolves this kind of attention head superposition in toy models.
In future, we hope this architecture may be useful for studying more natural forms of attention head superposition in large language models.
Our code can be found here.
Background
Mechanistic interpretability aims to reverse-engineer what neural networks have learned by decomposing a network's functions into human-interpretable algorithms. This involves isolating the individual components within the network that implement particular behaviours. This has proven difficult, however, because networks make use of polysemanticity and superposition to represent information.
Polysemanticity in a transformer's multi-layer perceptron (MLP) layers is when neurons appear to represent many unrelated concepts (Gurnee et al., 2023). We also see this phenomenon within the transformer's attention mechanism, when a given attention head performs qualitatively different functions based on its destination token and context (Janiak et al., 2023).
Superposition occurs when a layer in a network (an 'activation space') represents more features than it has dimensions. This means that features are assigned to an overcomplete set of directions as opposed to being aligned with e.g. the neuron basis.
The presence of polysemanticity means that the function of a single neuron or attention head cannot be defined by the features or behaviours it expresses on a subset of its training distribution because it may serve different purposes on different subsets of the training distribution. Relatedly, superposition makes it misleading to study the function of individual neurons or attention heads in isolation from other neurons or heads.
Both of these phenomena promote caution around assigning specific behaviours to individual network components (neurons or attention heads), due to there both being a diversity in behaviours across a training distribution and in their interaction with other components in the network.
Although polysemanticity and superposition make the isolated components of a network less immediately interpretable, understanding of the correct functional units of analysis has improved. Progress has been made on both understanding features as directions within an activation space (Elhage et al., 2023) and resolving feature superposition by applying sparse autoencoders to identify highly-interpretable features (Sharkey et al., 2022; Cunningham et al., 2023; Bricken et al., 2023).
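To make the sparse-autoencoder idea referenced above concrete, here is a minimal PyTorch sketch. This is my own illustrative code, not the implementation from the cited papers or from this post, and the layer sizes are arbitrary: the encoder maps activations into a wider, L1-penalised feature space and the decoder reconstructs them.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    # Minimal sparse autoencoder: d_model activations -> d_features sparse features -> reconstruction.
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # non-negative feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(acts, recon, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse, interpretable features.
    return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()

# Toy usage: 512-dim activations, 4096 candidate feature directions (numbers are made up).
sae = SparseAutoencoder(512, 4096)
acts = torch.randn(64, 512)
recon, features = sae(acts)
loss = sae_loss(acts, recon, features)
loss.backward()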
Attention head superposition for OV-Incoherent Skip Trigrams
Superposition in the context of attention heads is less understood. It is however conceivable that an attention block could make use of a similar compression scheme to implement more behaviours than the number of attention heads in the block.
Prior work introduced a task to study attention head superposition in the form of OV-Incoherent Skip Trigrams (Jermyn et al., 2023; Conerly et al., 2023). These are s...

Apr 9, 2024 • 48min
EA - Therapy without a therapist: Why unguided self-help might be a better intervention than guided by huw
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Therapy without a therapist: Why unguided self-help might be a better intervention than guided, published by huw on April 9, 2024 on The Effective Altruism Forum.
Summary
Guided self-help involves self-learning psychotherapy, and regular, short contact with an advisor (ex. Kaya Guides, AbleTo). Unguided self-help removes the advisor (ex. Headspace, Waking Up, stress relief apps).
It's probably 7x (2.5-12x) more cost-effective (see the rough sketch after this summary)
It's about 70% as effective against depression
Beneficiaries use it no less than half as much
It's at least 10-20x cheaper, and might scale sub-linearly
Behavioural activation is a great fit for it (That's where you think about things that make you happy and make structured plans to do them)
It's as effective as CBT and other evidence-based therapies
It's the strongest significant component of internet-based CBT
It might be easier to self-learn since it's simpler
It might be less stigmatising since it's less medicalised
It's less risky, mostly because it scales better
There's less evidence overall, but not much
It scales superbly, so it's highly funding-absorbent
It can fail faster and cheaper
Externalities are small, but displacement is concerning
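A back-of-the-envelope sketch of how multipliers like these can combine, in Python. This is my own illustration, not the report's actual cost-effectiveness model; the 2.5-12x range quoted above comes from a fuller analysis.

# Illustrative only: treat the three headline multipliers as independent and multiply them.
relative_effect = 0.7      # ~70% as effective against depression
relative_usage = 0.5       # beneficiaries use it no less than half as much
relative_cheapness = 20    # at least 10-20x cheaper; upper end shown here

print(relative_effect * relative_usage * relative_cheapness)  # 7.0 -> "probably ~7x more cost-effective"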
How much worse is therapy without a therapist?
A lot of work has already been done in EA to emphasise mental health as a cause area. It seems important, tractable, neglected relative to other interventions, and is at least in the conversation for cost-effectiveness[1]. And unlike many other health issues, we can only expect it to get worse over time[2].
Guided self-help is an intervention which incorporates self-directed learning of a psychotherapy, and brief, regular contact with a lightly-trained healthcare worker. It can be deployed in highly cost-sensitive environments, and flagship programmes have been developed with the WHO and deployed across Europe, Asia, and Africa[3][4][5][6].
Off the back of their own research (easily the most comprehensive cost-effectiveness analysis)[7], Charity Entrepreneurship recently incubated Kaya Guides, who are cost-effectively scaling the same programme in India[8].
But the same report also notes that removing the guided component might be even more cost-effective. This is called unguided (a.k.a. pure) self-help, and it's usually defined as any self-learned psychotherapy (regardless of whether that psychotherapy is evidence-based).
The early examples involved reading books, such as the Overcoming series, but modern interventions are usually apps, such as Headspace, Waking Up, Clarity CBT Journal, Thought Saver, UpLift or just versions of Step-By-Step and Kaya Guides without the guides. This report's definition is deliberately broad to keep in line with the cited literature, but when talking about potential interventions I'm generally thinking about apps based on evidence-based techniques.
This report is purely comparative; it's only valuable if you already believe that guided self-help might be a promising, cost-effective intervention. I'll discuss dollar-for-dollar cost-effectiveness, make some arguments for behavioural activation as a uniquely well-suited psychotherapy to apply, and finish up by arguing that since it scales so much better, it's much less risky to try. (Note: That last bit is a bit self-serving since I'm applying to AIM with this idea).
Finally, I'll limit the analysis to depression since it's the most burdensome mental health problem, but many of the included studies also find similar results for anxiety.
Unguided self-help is probably 7x (2.5-12x) more cost-effective than guided
Let's start with cost-effectiveness. Here's my chain of reasoning:
It's about 70% as effective against depression
We should compare against waitlist controls, despite possible bias
We should compare against studies which recruited depressed people
The best meta-analyses show...

Apr 9, 2024 • 5min
AF - PIBBSS is hiring in a variety of roles (alignment research and incubation program) by Nora Ammann
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: PIBBSS is hiring in a variety of roles (alignment research and incubation program), published by Nora Ammann on April 9, 2024 on The AI Alignment Forum.
PIBBSS is looking to expand its team and is running work trials for new team members (primarily) in April, May and early June. If you're interested in joining a nimble team focused on AI safety research, field-building and incubation of new agendas, consider letting us know by filling in this form.
The form is meant to be a low effort means for gauging interests. We don't guarantee getting back to everyone, but will reach out to you if we think you might be a good fit for the team. We would then aim to get to know you better (e.g. via call) before deciding whether it seems valuable (and worth our respective time) to do a trial. Work trials will look different depending on circumstances, including your interests and availability. We intend to reimburse people for the work they do for us.
About PIBBSS
PIBBSS (pibbss.ai) is a research initiative aimed at extracting insights from the parallels between natural and artificial intelligent systems, with the purpose of making progress on important questions about the safety and design of superintelligent artificial systems.
Since its inception in 2021, PIBBSS has supported ~50 researchers for 3-month full-time fellowships, is currently supporting 5 in-house, long-term research affiliates, and has organized 15+ AI safety research events/workshops on topics with participants from both academia and industry. We currently have three full-time staff: Nora Ammann (Co-Founder), Lucas Teixeira (Programs), Dušan D. Nešić (Operations).
Over the past number of months, and in particular with the launch of our affiliate program at the start of 2024, we have started focusing more of our resources towards identifying, testing and developing specific research bets we find promising on our inside-view.
This also means we have been directionally moving away from more generic field-building or talent interventions (though we still do some of this, and might continue doing so, where this appears sufficiently synergetic and counterfactually compelling). We expect to continue and potentially accelerate this trend over the course of 2024 and beyond, and will likely rebrand our efforts soon to better reflect the evolving scope and nature of our vision.
Our affiliate program selects scholars from disciplines which study intelligence from a naturalized lens, as well as independent alignment researchers with established track records, and provides them with the necessary support to quickly test, develop, and iterate on high upside research directions. The lacunae in the field that we are trying to address are:
(Field-building intervention) "Reverse-MATS": Getting established academics with deep knowledge in areas of relevant but as-of-yet neglected expertise into AI safety
(Research intervention) Creating high-quality research output which is theoretically-ambitious as well as empirically-grounded, ultimately leading to the counterfactual incubation of novel promising research agendas in AI safety
What we're looking for in a new team member
We don't have a specific singular job description that we're trying to hire for. Instead, there is a range of skill sets/profiles that we believe could valuably enhance our team. These range from research and engineering to organizational and management/leadership profiles. Importantly, we seek to hire someone who becomes part of the core team, which implies significant potential to co-create the vision and carve out your own niche based on your strengths and interests.
We expect to hire one or more people who fit an interesting subset of the below list of interests & aptitudes:
Ability to manage projects (people, timelines, milestones, deliverables, etc) a...

Apr 9, 2024 • 20min
EA - Some underrated reasons why the AI safety community should reconsider its embrace of strict liability by Cecil Abungu
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Some underrated reasons why the AI safety community should reconsider its embrace of strict liability, published by Cecil Abungu on April 9, 2024 on The Effective Altruism Forum.
Introduction
It is by now a well-known fact that existing AI systems are already causing harms like discrimination, and it's also widely expected that the advanced AI systems which the likes of Meta and OpenAI are building could also cause significant harms in the future. Knowingly or unknowingly, innocent people have to live with the dire impacts of these systems. Today that might be a lack of equal access to certain opportunities or the distortion of democracy, but in future it might escalate to more concerning security threats. In light of this, it should be uncontroversial for anyone to insist that we need to establish fair and practically sensible ways of figuring out who should be held liable for AI harms. The good news is that a number of AI safety experts have been making suggestions. The not-so-good news is that the idea of strict liability for highly capable advanced AI systems still has many devotees.
The most common anti-strict liability argument out there is that it discourages innovation. In this piece, we won't discuss that position much because it's already received outsize attention.
Instead, we argue that the pro-strict liability argument should be reconsidered for the following trifecta of reasons: (i) in the context of highly capable advanced AI, both strict criminal liability and strict civil liability have fatal gaps; (ii) the argument for strict liability often rests on faulty analogies; and (iii) given the interests at play, strict liability will struggle to gain traction.
Finally, we propose that AI safety-oriented researchers working on liability should instead focus on the most inescapably important task: figuring out how to transform good safety ideas into real legal duties.
AI safety researchers have been pushing for strict liability for certain AI harms
The few AI safety researchers who've tackled the question of liability in-depth seem to have taken a pro-strict liability stance for certain AI harms, especially harms that are a result of highly capable advanced AI. Let's consider some examples. In a statement to the US Senate, the Legal Priorities Project recommended that AI developers and deployers be held strictly liable if their technology is used in attacks on critical infrastructure or a range of high-risk weapons that result in harm. LPP also recommended strict liability for malicious use of exfiltrated systems and open-sourced weights. Consider as well the Future of Life Institute's feedback to the European Commission, where it calls for a strict liability regime for harms that result from high-risk and general purpose AI systems. Finally, consider Gabriel Weil's research on the promise that tort law has for regulating highly capable advanced AI (also summarized in his recent EA Forum piece), where he notes the difficulty of proving negligence in AI harm scenarios and then argues that strict liability can be a sufficient corrective for especially dangerous AI.
The pro-strict liability argument
In the realm of AI safety, arguments for strict liability generally rest on two broad lines of reasoning. The first is that historically, strict liability has been applied to other phenomena that are somewhat similar to highly capable advanced AI, which means that it would be appropriate to apply the same regime to highly capable advanced AI. Some common examples of these phenomena include new technologies like trains and motor vehicles, activities which may cause significant harm such as the use of nuclear power and the release of hazardous chemicals into the environment, and the so-called 'abnormally dangerous activities' such as blasting with dynamite.
The second line of r...

Apr 8, 2024 • 17min
LW - on the dollar-yen exchange rate by bhauth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: on the dollar-yen exchange rate, published by bhauth on April 8, 2024 on LessWrong.
Recently, the yen-dollar exchange rate hit a 34-year low. Why is that?
6-month US Treasuries are paying around 5.3% interest. Japanese government bonds are paying about 0%. That being the case, you can borrow yen, trade it for dollars, buy US bonds, and get more interest. That's called a "yen carry trade". The risk you take in exchange for that money is that the exchange rate will shift so that a dollar is worth fewer yen.
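To make the carry-trade arithmetic concrete, here is a small illustrative Python sketch. The 5.3% and ~0% rates and the 105-150 yen/$ range come from this post; the trade size, six-month horizon, and exit rates are my own assumptions, and the code ignores hedging and transaction costs.

# Illustrative yen carry trade held for 6 months; deliberately simplified.
borrowed_yen = 15_000_000
entry_rate = 150.0        # yen per dollar when the trade is opened
usd_yield_6m = 0.053 / 2  # ~5.3% annual on 6-month US Treasuries
jpy_yield_6m = 0.0        # Japanese government bonds pay about 0%

dollars_after = (borrowed_yen / entry_rate) * (1 + usd_yield_6m)

for exit_rate in (150.0, 130.0, 105.0):  # yen per dollar when the trade is closed
    profit_yen = dollars_after * exit_rate - borrowed_yen * (1 + jpy_yield_6m)
    print(f"exit at {exit_rate:.0f} yen/$: profit {profit_yen:,.0f} yen")
# A gain if the yen stays at 150, but large losses if the yen strengthens back
# toward 105 yen/$ - the exchange-rate risk described above.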
But of course, it's also possible that the exchange rate will shift in the other direction, and that's what's happened recently. From 2020 to now, $1 went from 105 to 150 yen.
That being the case, I'd normally expect inflation to be higher in Japan than the US - their currency became less valuable, which makes imports more expensive. Yet, that's not what happened; inflation has been higher in the US. In Japan, you can get a good bowl of ramen for $6. In an American city, today, including tax and tip you'd probably pay more like $20 for something likely worse.
The ratio of PPP to nominal GDP for Japan is now ~1.5x that of the US, and I'd argue that's actually an underestimate: PPP estimates don't account for quality of services, and a lot of Japanese services are higher-quality than their US equivalents. But that's not to say I envy how the economic situation of people in Japan has changed. While inflation was lower in Japan than in America, wages barely increased, and real incomes of most Japanese fell.
In some countries, you can argue that crime or lack of property rights or inadequate infrastructure keep labor values down, but that's not the case for Japan. So, we're left with some questions.
Question 1: Why would an hour of labor from an American be worth 2x as much as an hour from a Japanese employee?
I remember talking to an economist about this once, and he said, "that means Japanese labor is just not as good as American labor" - but he was just wrong.
(He didn't even consider the possibility that Japanese management culture was the problem, because obviously inefficient companies would just get outcompeted.) There's something about a lot of economists where, when they have some model and reality disagrees with them, they seem to think reality is wrong, and aren't even inclined to investigate.
I'll have to get back to this later.
Question 2: Why do Japanese automakers operate some factories in America instead of importing everything from Japan?
I can answer this one:
Direct labor is generally <20% of the cost of a car, and a lot of components can be imported from other countries.
Shipping a car to the US from Japan costs maybe $1000.
For US imports from Japan, there's a 2.5% tariff on cars and 25% on trucks. Trucks make up the majority of Ford's profits; they basically can't make a profit when competing with Japan with no tariff.
Most of the US factories were built decades ago, and new factories are being made in Mexico instead.
Question 3: Why can the Japanese government keep borrowing money with no interest?
That debt is funded largely by bank deposits from Japanese citizens. I asked a Japanese guy I know why people don't put their money in something that yields more interest, like US bonds, and he said:
Japanese people think of investments as having risk, and bank deposits as being safe. They don't really understand that their bank deposits aren't inherently safer than some other things.
Question 4: If dollars are overvalued, why does America have any exports?
A lot of US exports are currently oil and gas products, which are natural resources being used up. I personally think the US government should tax the extraction of natural resources, because they have some value that should be collectively owned by the population, but that's another topic.
How about food exports? S...

Apr 8, 2024 • 11min
LW - How We Picture Bayesian Agents by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How We Picture Bayesian Agents, published by johnswentworth on April 8, 2024 on LessWrong.
I think that when most people picture a Bayesian agent, they imagine a system which:
Enumerates every possible state/trajectory of "the world", and assigns a probability to each.
When new observations come in, loops over every state/trajectory, checks the probability of the observations conditional on each, and then updates via Bayes rule.
To select actions, computes the utility which each action will yield under each state/trajectory, then averages over state/trajectory weighted by probability, and picks the action with the largest weighted-average utility.
Typically, we define Bayesian agents as agents which behaviorally match that picture.
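A minimal sketch of that brute-force behavioural picture, in Python. This is my own toy code; the two-state weather world, observation model, and utilities are invented purely for illustration.

# Brute-force Bayesian agent: enumerate states, update by Bayes' rule, pick the max expected-utility action.
states = ("rain", "sun")
prior = {"rain": 0.3, "sun": 0.7}
p_obs = {("clouds", "rain"): 0.8, ("clouds", "sun"): 0.2,
         ("clear", "rain"): 0.2, ("clear", "sun"): 0.8}
utility = {("umbrella", "rain"): 1.0, ("umbrella", "sun"): 0.5,
           ("no umbrella", "rain"): 0.0, ("no umbrella", "sun"): 1.0}

def update(prior, obs):
    # Loop over every state, weight by the likelihood of the observation, renormalise.
    unnorm = {s: prior[s] * p_obs[(obs, s)] for s in states}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

def choose(posterior, actions=("umbrella", "no umbrella")):
    # Average utility over states weighted by probability; take the best action.
    return max(actions, key=lambda a: sum(posterior[s] * utility[(a, s)] for s in states))

posterior = update(prior, "clouds")
print(posterior, "->", choose(posterior))  # {'rain': ~0.63, 'sun': ~0.37} -> 'umbrella'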
But that's not really the picture David and I typically have in mind, when we picture Bayesian agents. Yes, behaviorally they act that way. But I think people get overly-anchored imagining the internals of the agent that way, and then mistakenly imagine that a Bayesian model of agency is incompatible with various features of real-world agents (e.g. humans) which a Bayesian framework can in fact handle quite well.
So this post is about our prototypical mental picture of a "Bayesian agent", and how it diverges from the basic behavioral picture.
Causal Models and Submodels
Probably you've heard of causal diagrams or Bayes nets by now.
If our Bayesian agent's world model is represented via a big causal diagram, then that already looks quite different from the original "enumerate all states/trajectories" picture. Assuming reasonable sparsity, the data structures representing the causal model (i.e. graph + conditional probabilities on each node) take up an amount of space which grows linearly with the size of the world, rather than exponentially. It's still too big for an agent embedded in the world to store in its head directly, but much smaller than the brute-force version.
(Also, a realistic agent would want to explicitly represent more than just one causal diagram, in order to have uncertainty over causal structure. But that will largely be subsumed by our next point anyway.)
Much more efficiency can be achieved by representing causal models like we represent programs. For instance, this little "program":
… is in fact a recursively-defined causal model. It compactly represents an infinite causal diagram, corresponding to the unrolled computation. (See the linked post for more details on how this works.)
Conceptually, this sort of representation involves lots of causal "submodels" which "call" each other - or, to put it differently, lots of little diagram-pieces which can be wired together and reused in the full world-model. Reuse means that such models can represent worlds which are "bigger than" the memory available to the agent itself, so long as those worlds have lots of compressible structure - e.g. the factorial example above, which represents an infinite causal diagram using a finite representation.
(Aside: those familiar with probabilistic programming could view this world-model representation as simply a probabilistic program.)
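As a rough illustration of the idea (my own toy stand-in in the spirit of the factorial example, not the post's actual "little program"): a short recursive generative function acts as a causal submodel that "calls" itself, so a finite description stands for the infinite causal diagram you would get by unrolling the computation.

import random

def noisy_factorial(n: int) -> int:
    # Each call is a causal submodel: one exogenous noise variable plus a
    # deterministic function of the parent node's value.
    if n == 0:
        return 1
    noise = random.choice((0, 1))
    return n * noisy_factorial(n - 1) + noise

# Sampling runs the generative process; the same finite program covers the
# causal structure for every n, however deep the unrolled computation gets.
print([noisy_factorial(5) for _ in range(3)])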
Updates
So we have a style of model which can compactly represent quite large worlds, so long as those worlds have lots of compressible structure. But there's still the problem of updates on that structure.
Here, we typically imagine some kind of message-passing, though it's an open problem exactly what such an algorithm looks like for big/complex models.
The key idea here is that most observations are not directly relevant to our submodels of most of the world. I see a bird flying by my office, and that tells me nothing at all about the price of gasoline[1]. So we expect that, the vast majority of the time, message-passing updates of a similar flavor to those used on Bayes nets (though not exactly the same) w...

Apr 8, 2024 • 11min
AF - How We Picture Bayesian Agents by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How We Picture Bayesian Agents, published by johnswentworth on April 8, 2024 on The AI Alignment Forum.
I think that when most people picture a Bayesian agent, they imagine a system which:
Enumerates every possible state/trajectory of "the world", and assigns a probability to each.
When new observations come in, loops over every state/trajectory, checks the probability of the observations conditional on each, and then updates via Bayes rule.
To select actions, computes the utility which each action will yield under each state/trajectory, then averages over state/trajectory weighted by probability, and picks the action with the largest weighted-average utility.
Typically, we define Bayesian agents as agents which behaviorally match that picture.
But that's not really the picture David and I typically have in mind, when we picture Bayesian agents. Yes, behaviorally they act that way. But I think people get overly-anchored imagining the internals of the agent that way, and then mistakenly imagine that a Bayesian model of agency is incompatible with various features of real-world agents (e.g. humans) which a Bayesian framework can in fact handle quite well.
So this post is about our prototypical mental picture of a "Bayesian agent", and how it diverges from the basic behavioral picture.
Causal Models and Submodels
Probably you've heard of causal diagrams or Bayes nets by now.
If our Bayesian agent's world model is represented via a big causal diagram, then that already looks quite different from the original "enumerate all states/trajectories" picture. Assuming reasonable sparsity, the data structures representing the causal model (i.e. graph + conditional probabilities on each node) take up an amount of space which grows linearly with the size of the world, rather than exponentially. It's still too big for an agent embedded in the world to store in its head directly, but much smaller than the brute-force version.
(Also, a realistic agent would want to explicitly represent more than just one causal diagram, in order to have uncertainty over causal structure. But that will largely be subsumed by our next point anyway.)
Much more efficiency can be achieved by representing causal models like we represent programs. For instance, this little "program":
… is in fact a recursively-defined causal model. It compactly represents an infinite causal diagram, corresponding to the unrolled computation. (See the linked post for more details on how this works.)
Conceptually, this sort of representation involves lots of causal "submodels" which "call" each other - or, to put it differently, lots of little diagram-pieces which can be wired together and reused in the full world-model. Reuse means that such models can represent worlds which are "bigger than" the memory available to the agent itself, so long as those worlds have lots of compressible structure - e.g. the factorial example above, which represents an infinite causal diagram using a finite representation.
(Aside: those familiar with probabilistic programming could view this world-model representation as simply a probabilistic program.)
Updates
So we have a style of model which can compactly represent quite large worlds, so long as those worlds have lots of compressible structure. But there's still the problem of updates on that structure.
Here, we typically imagine some kind of message-passing, though it's an open problem exactly what such an algorithm looks like for big/complex models.
The key idea here is that most observations are not directly relevant to our submodels of most of the world. I see a bird flying by my office, and that tells me nothing at all about the price of gasoline[1]. So we expect that, the vast majority of the time, message-passing updates of a similar flavor to those used on Bayes nets (though not exactl...

Apr 8, 2024 • 32min
EA - Analyzing the moral value of unaligned AIs by Matthew Barnett
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Analyzing the moral value of unaligned AIs, published by Matthew Barnett on April 8, 2024 on The Effective Altruism Forum.
A crucial consideration in assessing the risks of advanced AI is the moral value we place on "unaligned" AIs - systems that do not share human preferences - which could emerge if we fail to make enough progress on technical alignment.
In this post I'll consider three potential moral perspectives, and analyze what each of them has to say about the normative value of the so-called "default" unaligned AIs that humans might eventually create:
Standard total utilitarianism combined with longtermism: the view that what matters most is making sure the cosmos is eventually filled with numerous happy beings.
Human species preservationism: the view that what matters most is making sure the human species continues to exist into the future, independently from impartial utilitarian imperatives.
Near-termism or present-person affecting views: what matters most is improving the lives of those who currently exist, or will exist in the near future.
I argue that from the first perspective, unaligned AIs don't seem clearly bad in expectation relative to their alternatives, since total utilitarianism is impartial to whether AIs share human preferences or not. A key consideration here is whether unaligned AIs are less likely to be conscious, or less likely to bring about consciousness, compared to alternative aligned AIs. On this question, I argue that there are considerations both ways, and no clear answers.
Therefore, it tentatively appears that the normative value of alignment work is very uncertain, and plausibly neutral, from a total utilitarian perspective.
However, technical alignment work is much more clearly beneficial from the second and third perspectives. This is because AIs that share human preferences are likely to both preserve the human species and improve the lives of those who currently exist. However, in the third perspective, pausing or slowing down AI is far less valuable than in the second perspective, since it forces existing humans to forego benefits from advanced AI, which I argue will likely be very large.
I personally find moral perspectives (1) and (3) most compelling, and by contrast find view (2) to be uncompelling as a moral view. Yet it is only from perspective (2) that significantly delaying advanced AI for alignment reasons seems clearly beneficial, in my opinion. This is a big reason why I'm not very sympathetic to pausing or slowing down AI as a policy proposal.
While these perspectives do not exhaust the scope of potential moral views, I think this analysis can help to sharpen what goals we intend to pursue by promoting particular forms of AI safety work.
Unaligned AIs from a total utilitarian point of view
Let's first consider the normative value of unaligned AIs from the first perspective. From a standard total utilitarian perspective, entities matter morally if they are conscious (under hedonistic utilitarianism) or if they have preferences (under preference utilitarianism). From this perspective, it doesn't actually matter much intrinsically if AIs don't share human preferences, so long as they are moral patients and have their preferences satisfied.
The following is a prima facie argument that utilitarians shouldn't care much about technical AI alignment work. Utilitarianism is typically not seen as partial to human preferences in particular. Therefore, efforts to align AI systems with human preferences - the core aim of technical alignment work - may be considered morally neutral from a utilitarian perspective.
The reasoning here is that changing the preferences of AIs to better align them with the preferences of humans doesn't by itself clearly seem to advance the aims of utilitarianism, in the sense of filling the cosmos w...


