
The Nonlinear Library: LessWrong
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Latest episodes

Sep 7, 2024 • 21min
LW - Excerpts from "A Reader's Manifesto" by Arjun Panickssery
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Excerpts from "A Reader's Manifesto", published by Arjun Panickssery on September 7, 2024 on LessWrong.
"A Reader's Manifesto" is a July 2001 Atlantic piece by B.R. Myers that I've returned to many times. He complains about the inaccessible pretension of the highbrow literary fiction of his day. The article is mostly a long list of critiques of various quotes/passages from well-reviewed books by famous authors. It's hard to accuse him of cherry-picking since he only targets passages that reviewers singled out as unusually good.
Some of his complaints are dumb, but the general idea is useful: authors try to be "literary" by (1) avoiding a tightly paced plot that could evoke "genre fiction" and (2) shooting for individual standout sentences that reviewers can praise, using a shotgun approach in which many of the sentences are banal or just don't make sense.
Here are some excerpts of his complaints. Bolding is always mine.
The "Writerly" Style
He complains that critics now dismiss too much good literature as "genre" fiction.
More than half a century ago popular storytellers like Christopher Isherwood and Somerset Maugham were ranked among the finest novelists of their time, and were considered no less literary, in their own way, than Virginia Woolf and James Joyce. Today any accessible, fast-moving story written in unaffected prose is deemed to be "genre fiction" - at best an excellent "read" or a "page turner," but never literature with a capital L.
An author with a track record of blockbusters may find the publication of a new work treated like a pop-culture event, but most "genre" novels are lucky to get an inch in the back pages of The New York Times Book Review.
The dualism of literary versus genre has all but routed the old trinity of highbrow, middlebrow, and lowbrow, which was always invoked tongue-in-cheek anyway. Writers who would once have been called middlebrow are now assigned, depending solely on their degree of verbal affectation, to either the literary or the genre camp.
David Guterson is thus granted Serious Writer status for having buried a murder mystery under sonorous tautologies (Snow Falling on Cedars, 1994), while Stephen King, whose Bag of Bones (1998) is a more intellectual but less pretentious novel, is still considered to be just a very talented genre storyteller.
Further, he complains that fiction is regarded as "literary" the more slow-paced, self-conscious, obscure, and "writerly" its style.
The "literary" writer need not be an intellectual one. Jeering at status-conscious consumers, bandying about words like "ontological" and "nominalism," chanting Red River hokum as if it were from a lost book of the Old Testament: this is what passes for profundity in novels these days. Even the most obvious triteness is acceptable, provided it comes with a postmodern wink.
What is not tolerated is a strong element of action - unless, of course, the idiom is obtrusive enough to keep suspense to a minimum. Conversely, a natural prose style can be pardoned if a novel's pace is slow enough, as was the case with Ha Jin's aptly titled Waiting, which won the National Book Award (1999) and the PEN/Faulkner Award (2000).
If the new dispensation were to revive good "Mandarin" writing - to use the term coined by the British critic Cyril Connolly for the prose of writers like Virginia Woolf and James Joyce - then I would be the last to complain. But what we are getting today is a remarkably crude form of affectation: a prose so repetitive, so elementary in its syntax, and so numbing in its overuse of wordplay that it often demands less concentration than the average "genre" novel.
4 Types of Bad Prose
Then he has five sections complaining about four different types of prose he doesn't like (in addition to the generic "literary" prose): "evocative" prose, "muscular"...

Sep 7, 2024 • 1min
LW - Pay Risk Evaluators in Cash, Not Equity by Adam Scholl
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pay Risk Evaluators in Cash, Not Equity, published by Adam Scholl on September 7, 2024 on LessWrong.
Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute basics right; currently, I think we're mostly failing even at that.
Early discussion of AI risk often focused on debating the viability of various elaborate safety schemes humanity might someday devise - designing AI systems to be more like "tools" than "agents," for example, or as purely question-answering oracles locked within some kryptonite-style box. These debates feel a bit quaint now, as AI companies race to release agentic models they barely understand directly onto the internet.
But a far more basic failure, from my perspective, is that at present nearly all AI company staff - including those tasked with deciding whether new models are safe to build and release - are paid substantially in equity, the value of which seems likely to decline if their employers stop building and releasing new models.
As a result, it is currently the case that roughly everyone within these companies charged with sounding the alarm risks personally losing huge sums of money if they do. This extreme conflict of interest could be avoided simply by compensating risk evaluators in cash instead.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Sep 6, 2024 • 7min
LW - Adam Optimizer Causes Privileged Basis in Transformer Language Models by Diego Caples
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Adam Optimizer Causes Privileged Basis in Transformer Language Models, published by Diego Caples on September 6, 2024 on LessWrong.
Diego Caples (diego@activated-ai.com)
Rob Neuhaus (rob@activated-ai.com)
Introduction
In principle, neuron activations in a transformer-based language model's residual stream should all be about the same scale. In practice, however, the dimensions vary unexpectedly widely in scale. Mathematical theories of the transformer architecture do not predict this: they expect rotational equivariance within a model, where no dimension is more important than any other.
Is there something wrong with our reasonably informed intuitions of how transformers work? What explains these outlier channels?
Previously, Anthropic researched the existence of these privileged basis dimensions (dimensions more important / larger than expected) and ruled out several causes. By elimination, they reached the hypothesis that per-channel normalization in the Adam optimizer was the cause of privileged basis. However, they did not prove this was the case.
We conclusively show that Adam causes outlier channels / privileged basis within the transformer residual stream. When the Adam optimizer is replaced with SGD, the trained models do not have a privileged basis.
As a whole, this work improves mechanistic understanding of transformer LM training dynamics and confirms that our mathematical models of transformers are not flawed. Rather, they simply do not take into account the training process.
Our code is open source at the LLM outlier channel exploration GitHub.
Key Results
Training an LM with SGD does not result in a privileged basis, indicating that Adam is the cause of privileged basis in transformer LMs.
Training a 12M parameter model on TinyStories allows us to replicate outlier channel behavior on a small LM, training in less than 15 minutes on an H100.
Background
Recommended Reading
Privileged Bases in the Transformer Residual Stream
Toy Models of Superposition (Privileged Basis Section)
More About Anthropic's Work
We consider Anthropic's research on privileged basis the primary motivator for this work. In Anthropic's Privileged Bases in the Transformer Residual Stream, they demonstrate privileged basis in a 200M parameter LLM and perform some experiments to rule out possible causes, but do not find a definitive cause. They hypothesize that outlier channels are caused by Adam's lack of rotational equivariance, and suggest that training using SGD could isolate Adam as the cause.
Adam vs SGD, and Rotational Equivariance
Consider an experiment where we rotate the parameter space of a neural network, train it, and then invert the rotation. With Stochastic Gradient Descent (SGD), this process yields the same model as if we hadn't rotated at all. However, with the Adam optimizer, we end up with a different model.
This difference can be explained by the presence or absence of a property called rotational equivariance. SGD is rotationally equivariant: optimizer steps are always directly proportional to the gradient of the loss function, regardless of the chosen coordinate system. In contrast, Adam is not rotationally equivariant because it takes steps that are not proportional to the gradient: updates depend on coordinate-wise gradient statistics.
As we later show, this difference is what leads to privileged basis within LMs.
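As a rough illustration of this rotation experiment (a minimal sketch, not the authors' code: the tiny linear model, data, optimizer settings, and step counts below are made-up placeholders), one can train in a rotated coordinate system, rotate the result back, and compare it with training in the original coordinates:

```python
import torch

torch.manual_seed(0)
d = 8
X = torch.randn(64, d)                       # toy regression data
y = torch.randn(64)
Q, _ = torch.linalg.qr(torch.randn(d, d))    # a fixed random rotation of parameter space

def train(opt_name, rotate):
    # Parameterize the weights as w = Q @ v when rotate=True, else w = v.
    v = torch.zeros(d, requires_grad=True)
    opt = (torch.optim.SGD([v], lr=1e-2) if opt_name == "sgd"
           else torch.optim.Adam([v], lr=1e-2))
    for _ in range(200):
        w = Q @ v if rotate else v
        loss = ((X @ w - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Map the trained parameters back to the original coordinates.
    return (Q @ v if rotate else v).detach()

for name in ["sgd", "adam"]:
    gap = (train(name, rotate=False) - train(name, rotate=True)).norm().item()
    print(name, gap)
# Expected: the SGD gap is ~0 (rotation-equivariant, up to float error), while the
# Adam gap is clearly nonzero, because Adam's per-coordinate second-moment
# normalization depends on the chosen basis.
```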
Kurtosis
Motivated by Anthropic, we use excess kurtosis as a metric for measuring basis privilege.
We encourage the reader to read Anthropic's reasoning for why this is a good metric, but here we aim to demonstrate graphically that excess kurtosis is a reasonable choice for measuring basis privilege.
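As a rough illustration of the metric (a sketch under assumed shapes; the array sizes and the outlier channel index are invented for the example, and this is not the authors' code), excess kurtosis can be computed across the channel dimension of residual-stream activations:

```python
import numpy as np
from scipy.stats import kurtosis

def basis_privilege(resid_acts: np.ndarray) -> float:
    """Mean excess kurtosis over channels; resid_acts has shape (n_tokens, d_model).

    Excess kurtosis is near 0 when channels look roughly Gaussian and grows
    large when a few outlier channels dominate (a privileged basis).
    """
    # scipy's kurtosis returns *excess* kurtosis (Fisher definition) by default.
    return kurtosis(resid_acts, axis=-1, fisher=True).mean()

rng = np.random.default_rng(0)
iso = rng.normal(size=(4, 768))      # isotropic activations: no privileged basis
outlier = iso.copy()
outlier[:, 42] *= 30.0               # hypothetical single outlier channel
print(basis_privilege(iso))          # near 0
print(basis_privilege(outlier))      # much larger
```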
We plot the middle-layer residual stream activations for the last token of the string
"Lilly saw a big red apple!"
as an Adam-optimized LM training run progresses....

Sep 5, 2024 • 9min
LW - instruction tuning and autoregressive distribution shift by nostalgebraist
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: instruction tuning and autoregressive distribution shift, published by nostalgebraist on September 5, 2024 on LessWrong.
[Note: this began life as a "Quick Takes" comment, but it got pretty long, so I figured I might as well convert it to a regular post.]
In LM training, every token provides new information about "the world beyond the LM" that can be used/"learned" in-context to better predict future tokens in the same window.
But when text is produced by autoregressive sampling from the same LM, it is not informative in the same way, at least not to the same extent[1]. Thus, sampling inevitably produces a distribution shift.
I think this is one of the reasons why it's (apparently) difficult to get instruction-tuned / HH-tuned models to report their uncertainty and level of competence accurately, rather than being overconfident.
(I doubt this is a novel point, I just haven't seen it spelled out explicitly before, and felt like doing so.)
Imagine that you read the following (as the beginning of some longer document), and you trust that the author is accurately describing themselves:
I'm a Princeton physics professor, with a track record of highly cited and impactful research in the emerging field of Ultra-High-Density Quasiclassical Exotic Pseudoplasmas (UHD-QC-EPPs).
The state of the art in numerical simulation of UHD-QC-EPPs is the so-called Neural Pseudospectral Method (NPsM).
I made up all those buzzwords, but imagine that this is a real field, albeit one you know virtually nothing about. So you've never heard of "NPsM" or any other competing method.
Nonetheless, you can confidently draw some conclusions just from reading this snippet and trusting the author's self-description:
Later in this document, the author will continue to write as though they believe that NPsM is "the gold standard" in this area.
They're not going to suddenly turn around and say something like "wait, whoops, I just checked Wikipedia and it turns out NPsM has been superseded by [some other thing]." They're a leading expert in the field! If that had happened, they'd already know by the time they sat down to write any of this.
Also, apart from this particular writer's beliefs, it's probably actually true that NPsM is the gold standard in this area.
Again, they're an expert in the field -- and this is the sort of claim that would be fairly easy to check even if you're not an expert yourself, just by Googling around and skimming recent papers. It's also not the sort of claim where there's any obvious incentive for deception. It's hard to think of a plausible scenario in which this person writes this sentence, and yet the sentence is false or even controversial.
During training, LLMs are constantly presented with experiences resembling this one.
The LLM is shown texts about topics of which it has incomplete knowledge. It has to predict each token from the preceding ones.
Whatever new information the text conveys about the topic may make it into the LLM's weights, through gradient updates on this example. But even before that happens, the LLM can also use the kind of reasoning shown in the bulleted list above to improve its predictions on the text right now (before any gradient updates).
That is, the LLM can do in-context learning, under the assumption that the text was produced by an entity outside itself -- so that each part of the text (potentially) provides new information about the real world, not yet present in the LLM's weights, that has useful implications for the later parts of the same text.
So, all else being equal, LLMs will learn to apply this kind of reasoning to all text, always, ubiquitously.
But autoregressive sampling produces text that is not informative about "the world outside" in the same way that all the training texts were.
During training, when an LLM sees information it d...

Sep 5, 2024 • 14min
LW - Conflating value alignment and intent alignment is causing confusion by Seth Herd
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Conflating value alignment and intent alignment is causing confusion, published by Seth Herd on September 5, 2024 on LessWrong.
Submitted to the Alignment Forum. Contains more technical jargon than usual.
Epistemic status: I think something like this confusion is happening often. I'm not saying these are the only differences in what people mean by "AGI alignment".
Summary:
Value alignment is better but probably harder to achieve than personal intent alignment to the short-term wants of some person(s). Different groups and people tend to primarily address one of these alignment targets when they discuss alignment. Confusion abounds.
One important confusion stems from an assumption that the type of AI defines the alignment target: strong goal-directed AGI must be value aligned or misaligned, while personal intent alignment is only viable for relatively weak AI. I think this assumption is important but false.
While value alignment is categorically better, intent alignment seems easier, safer, and more appealing in the short term, so AGI project leaders are likely to try it.[1]
Overview
Clarifying what people mean by alignment should dispel some illusory disagreement, and clarify alignment theory and predictions of AGI outcomes.
Caption: Venn diagram of three types of alignment targets. Value alignment and Personal intent alignment are both subsets of Evan Hubinger's definition of intent alignment: AGI aligned with human intent in the broadest sense.
Prosaic alignment work usually seems to be addressing a target somewhere in the neighborhood of personal intent alignment (following instructions or doing what this person wants now), while agent foundations and other conceptual alignment work usually seems to be addressing value alignment. Those two clusters have different strengths and weaknesses as alignment targets, so lumping them together produces confusion.
People mean different things when they say alignment. Some are mostly thinking about value alignment (VA): creating sovereign AGI that has values close enough to humans' for our liking. Others are talking about making AGI that is corrigible (in the Christiano or Harms sense)[2] or follows instructions from its designated principal human(s). I'm going to use the term personal intent alignment (PIA) until someone has a better term for that type of alignment target.
Different arguments and intuitions apply to these two alignment goals, so talking about them without differentiation is creating illusory disagreements.
Value alignment is better almost by definition, but personal intent alignment seems to avoid some of the biggest difficulties of value alignment. Max Harms' recent sequence on corrigibility as a singular target (CAST) gives both a nice summary and detailed arguments. Personal intent alignment does not require us to point to or define values, just short-term preferences or instructions.
The principal advantage is that an AGI that follows instructions can be used as a collaborator in improving its alignment over time; you don't need to get it exactly right on the first try. This is more helpful in slower and more continuous takeoffs. This means that PI alignment has a larger basin of attraction than value alignment does.[3]
Most people who think alignment is fairly achievable seem to be thinking of PIA, while critics often respond thinking of value alignment. It would help to be explicit. PIA is probably easier and more likely than full VA for our first stabs at AGI, but there are reasons to wonder if it's adequate for real success. In particular, there are intuitions and arguments that PIA doesn't address the real problem of AGI alignment.
I think PIA does address the real problem, but in a non-obvious and counterintuitive way.
Another unstated divide
There's another important clustering around these two conceptions of alignment. Peop...

Sep 5, 2024 • 20min
LW - The Fragility of Life Hypothesis and the Evolution of Cooperation by KristianRonn
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Fragility of Life Hypothesis and the Evolution of Cooperation, published by KristianRonn on September 5, 2024 on LessWrong.
This is part 2 in a 3-part sequence summarizing my book, The Darwinian Trap (see part 1 here). The book aims to popularize the concept of multipolar traps and establish them as a broader cause area. If you find this series intriguing and want to spread the word and learn more:
1. Share this post with others on X or other social media platforms.
2. Pre-order the book here.
3. Sign up for my mailing list here before September 24 for a 20% chance to win a free hardcover copy of the book (it takes 5 seconds).
4. Contact me at kristian@kristianronn.com if you have any input or ideas.
In Part 1, I introduced the concept of a Darwinian demon - selection pressures that drive agents to harm others for personal gain. I also argued that the game theory of our evolutionary fitness landscape, with its limited resources, often favors defection over cooperation within populations. Yet, when we observe nature, cooperation is ubiquitous: from molecules working together in metabolism, to genes forming genomes, to cells building organisms, and individuals forming societies.
Clearly, cooperation must be evolutionarily adaptive, or we wouldn't see it so extensively in the natural world. I refer to a selection pressure that fosters mutually beneficial cooperation as a "Darwinian angel."
To understand the conditions under which cooperative behavior thrives, we can look at our own body. For an individual cell, the path to survival might seem clear: prioritize self-interest by replicating aggressively, even at the organism's expense. This represents the Darwinian demon - selection pressure favoring individual survival.
However, from the perspective of the whole organism, survival depends on suppressing these self-serving actions. The organism thrives only when its cells cooperate, adhering to a mutually beneficial code. This tension between individual and collective interests forms the core of multi-level selection, where evolutionary pressures act on both individuals and groups.
Interestingly, the collective drive for survival paradoxically requires cells to act altruistically, suppressing their self-interest for the organism's benefit. In this context, Darwinian angels are the forces that make cooperation adaptive, promoting collective well-being over individual defection. These angels are as much a part of evolution as their demonic counterparts, fostering cooperation that benefits the broader environment.
Major Evolutionary Transitions and Cooperation
This struggle, between selection pressures of cooperation and defection, traces back to the dawn of life. In the primordial Earth, a world of darkness, immense pressure, and searing heat, ribonucleic acid (RNA) emerged - a molecule that, like DNA, encodes the genetic instructions essential for life. Without RNA, complex life wouldn't exist. Yet, as soon as RNA formed, it faced a Darwinian challenge known as Spiegelman's Monster.
Shorter RNA strands replicate faster than longer ones, creating a selection pressure favoring minimal RNA molecules with as few as 218 nucleotides - insufficient to encode any useful genetic material. This challenge was likely overcome through molecular collaboration: a lipid membrane provided a sanctuary for more complex RNA, which in turn helped form proteins to stabilize and enhance the membrane.
Throughout evolutionary history, every major transition has occurred because Darwinian angels successfully suppressed Darwinian demons, forming new units of selection and driving significant evolutionary progress. Each evolutionary leap has been a fierce struggle against these demons, with every victory paving the way for the beauty, diversity, and complexity of life we see today.
These triumphs are...

Sep 5, 2024 • 5min
LW - What is SB 1047 *for*? by Raemon
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is SB 1047 *for*?, published by Raemon on September 5, 2024 on LessWrong.
Emmett Shear asked on twitter:
I think SB 1047 has gotten much better from where it started. It no longer appears actively bad. But can someone who is pro-SB 1047 explain the specific chain of causal events where they think this bill becoming law results in an actual safer world? What's the theory?
And I realized that AFAICT no one has concisely written up what the actual story for SB 1047 is supposed to be.
This is my current understanding. Other folk here may have more detailed thoughts or disagreements.
The bill isn't sufficient on its own, but it's not regulation for regulation's sake, because it's specifically a piece of the regulatory machine I'd ultimately want built.
Right now, it mostly solidifies the safety processes that existing orgs have voluntarily committed to. But, we are pretty lucky that they voluntarily committed to them, and we don't have any guarantee that they'll stick with them in the future.
For the bill to succeed, we do need to invent good, third party auditing processes that are not just a bureaucratic sham. This is an important, big scientific problem that isn't solved yet, and it's going to be a big political problem to make sure that the ones that become consensus are good instead of regulatory-captured. But, figuring that out is one of the major goals of the AI safety community right now.
The "Evals Plan" as I understand it comes in two phase:
1. Dangerous Capability Evals. We invent evals that demonstrate a model is capable of dangerous things (including manipulation/scheming/deception-y things, and "invent bioweapons" type things)
As I understand it, this is pretty tractable, although labor intensive and "difficult" in a normal, boring way.
2. Robust Safety Evals. We invent evals that demonstrate that a model capable of scheming is nonetheless safe - either because we've proven what sort of actions it will choose to take (AI Alignment), or because we've proven that we can control it even if it is scheming (AI control). AI control is probably easier at first, although limited.
As I understand it, this is very hard; we're working on it, but it requires new breakthroughs.
The goal with SB 1047, as I understand it, is roughly:
First: Capability Evals trigger
By the time it triggers for the first time, we have a set of evals that are good enough to confirm "okay, this model isn't actually capable of being dangerous" (and the AI developers probably continue unobstructed).
But, when we first hit a model capable of deception, self-propagation or bioweapon development, the eval will trigger "yep, this is dangerous." And then the government will ask "okay, how do you know it's not dangerous?".
And the company will put forth some plan, or internal evaluation procedure, that (probably) sucks. And the Frontier Model Board will say "hey Attorney General, this plan sucks, here's why."
Now, the original version of SB 1047 would include the Attorney General saying "okay yeah your plan doesn't make sense, you don't get to build your model." The newer version of the plan I think basically requires additional political work at this phase.
But, the goal of this phase is to establish "hey, we have dangerous AI, and we don't yet have the ability to reasonably demonstrate we can render it non-dangerous," and to stop development of AI until companies figure out plans that at least make enough sense to government officials.
Second: Advanced Evals are invented, and get woven into law
The way I expect a company to prove their AI is safe, despite having dangerous capabilities, is for third parties to invent a robust version of the second set of evals, and then for new AIs to pass those evals.
This requires scientific and political labor, and the hope is that by the...

Sep 5, 2024 • 8min
LW - Executable philosophy as a failed totalizing meta-worldview by jessicata
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Executable philosophy as a failed totalizing meta-worldview, published by jessicata on September 5, 2024 on LessWrong.
(this is an expanded, edited version of an x.com post)
It is easy to interpret Eliezer Yudkowsky's main goal as creating a friendly AGI. Clearly, he has failed at this goal and has little hope of achieving it. That's not a particularly interesting analysis, however. A priori, creating a machine that makes things ok forever is not a particularly plausible objective. Failure to do so is not particularly informative.
So I'll focus on a different but related project of his: executable philosophy. Quoting Arbital:
Two motivations of "executable philosophy" are as follows:
1. We need a philosophical analysis to be "effective" in Turing's sense: that is, the terms of the analysis must be useful in writing programs. We need ideas that we can compile and run; they must be "executable" like code is executable.
2. We need to produce adequate answers on a time scale of years or decades, not centuries. In the entrepreneurial sense of "good execution", we need a methodology we can execute on in a reasonable timeframe.
There is such a thing as common sense rationality, which says the world is round, you shouldn't play the lottery, etc. Formal notions like Bayesianism, VNM utility theory, and Solomonoff induction formalize something strongly related to this common sense rationality. Yudkowsky believes further study in this tradition can supersede ordinary academic philosophy, which he believes to be conceptually weak and motivated to continue ongoing disputes for more publications.
In the Sequences, Yudkowsky presents these formal ideas as the basis for a totalizing meta-worldview, of epistemic and instrumental rationality, and uses the meta-worldview to argue for his object-level worldview (which includes many-worlds, AGI foom, importance of AI alignment, etc.).
While one can get totalizing (meta-)worldviews from elsewhere (such as interdisciplinary academic studies), Yudkowsky's (meta-)worldview is relatively easy to pick up for analytically strong people (who tend towards STEM), and is effective ("correct" and "winning") relative to its simplicity.
Yudkowsky's source material and his own writing do not form a closed meta-worldview, however. There are open problems as to how to formalize and solve real problems. Many of the more technical sort are described in MIRI's technical agent foundations agenda.
These include questions about how to parse a physically realistic problem as a set of VNM lotteries ("decision theory"), how to use something like Bayesianism to handle uncertainty about mathematics ("logical uncertainty"), how to formalize realistic human values ("value loading"), and so on.
Whether or not the closure of this meta-worldview leads to creation of friendly AGI, it would certainly have practical value. It would allow real world decisions to be made by first formalizing them within a computational framework (related to Yudkowsky's notion of "executable philosophy"), whether or not the computation itself is tractable (with its tractable version being friendly AGI).
The practical strategy of MIRI as a technical research institute is to go meta on these open problems by recruiting analytically strong STEM people (especially mathematicians and computer scientists) to work on them, as part of the agent foundations agenda. I was one of these people. While we made some progress on these problems (such as with the Logical Induction paper), we didn't come close to completing the meta-worldview, let alone building friendly AGI.
With the Agent Foundations team at MIRI eliminated, MIRI's agent foundations agenda is now unambiguously a failed project. I had called MIRI's technical research as likely to fail around 2017, with the increase in internal secrecy, but at thi...

Sep 5, 2024 • 3min
LW - Michael Dickens' Caffeine Tolerance Research by niplav
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Michael Dickens' Caffeine Tolerance Research, published by niplav on September 5, 2024 on LessWrong.
Michael Dickens has read the research and performed two self-experiments on whether consuming caffeine builds up tolerance and, if so, how quickly.
First literature review:
What if instead of taking caffeine every day, you only take it intermittently - say, once every 3 days? How often can most people take caffeine without developing a tolerance?
The scientific literature on this question is sparse. Here's what I found:
1. Experiments on rats found that rats who took caffeine every other day did not develop a tolerance. There are no experiments on humans. There are no experiments that use other intermittent dosing frequencies (such as once every 3 days).
2. Internet forum users report that they can take caffeine on average once every 3 days without developing a tolerance. But there's a lot of variation between individuals.
Second literature review:
If you take caffeine every day, does it stop working? If it keeps working, how much of its effect does it retain?
There are many studies on this question, but most of them have severe methodological limitations. I read all the good studies (on humans) I could find. Here's my interpretation of the literature:
Caffeine almost certainly loses some but not all of its effect when you take it every day.
In expectation, caffeine retains 1/2 of its benefit, but this figure has a wide credence interval.
The studies on cognitive benefits all have some methodological issues so they might not generalize.
There are two studies on exercise benefits with strong methodology, but they have small sample sizes.
First experiment:
I conducted an experiment on myself to see if I would develop a tolerance to caffeine from taking it three days a week. The results suggest that I didn't. Caffeine had just as big an effect at the end of my four-week trial as it did at the beginning.
This outcome is statistically significant (p = 0.016), but the data show a weird pattern: caffeine's effectiveness went up over time instead of staying flat. I don't know how to explain that, which makes me suspicious of the experiment's findings.
Second experiment:
This time I tested if I could have caffeine 4 days a week without getting habituated.
Last time, when I took caffeine 3 days a week, I didn't get habituated but the results were weird. This time, with the more frequent dose, I still didn't get habituated, and the results were weird again! […] But it looks like I didn't get habituated when taking caffeine 4 days a week - or, at least, not to a detectable degree. So I'm going to keep taking caffeine 4 days a week.
When I take caffeine 3 days in a row, do I habituate by the 3rd day?
The evidence suggests that I don't, but the evidence is weak.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Sep 4, 2024 • 6min
LW - What happens if you present 500 people with an argument that AI is risky? by KatjaGrace
Link to original article.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What happens if you present 500 people with an argument that AI is risky?, published by KatjaGrace on September 4, 2024 on LessWrong.
Recently, Nathan Young and I wrote about arguments for AI risk and put them on the AI Impacts wiki. In the process, we ran a casual little survey of the American public regarding how they feel about the arguments, initially (if I recall) just because we were curious whether the arguments we found least compelling would also fail to compel a wide variety of people.
The results were very confusing, so we ended up thinking more about this than initially intended and running four iterations total. This is still a small and scrappy poll to satisfy our own understanding, and doesn't involve careful analysis or error checking. But I'd like to share a few interesting things we found. Perhaps someone else wants to look at our data more carefully, or run more careful surveys about parts of it.
In total we surveyed around 570 people across 4 different polls, with 500 in the main one. The basic structure was:
1. p(doom): "If humanity develops very advanced AI technology, how likely do you think it is that this causes humanity to go extinct or be substantially disempowered?" Responses had to be given in a text box, a slider, or with buttons showing ranges.
2. (Present them with one of eleven arguments, one a 'control')
3. "Do you understand this argument?"
4. "What did you think of this argument?"
5. "How compelling did you find this argument, on a scale of 1-5?"
6. p(doom) again
7. Do you have any further thoughts about this that you'd like to share?
Interesting things:
In the first survey, participants were much more likely to move their probabilities downward than upward, often while saying they found the argument fairly compelling. This is a big part of what initially confused us. We now think this is because each argument had counterarguments listed under it. Evidence in support of this: in the second and fourth rounds we cut the counterarguments and probabilities went overall upward.
When counterarguments were included, three times as many participants moved their probabilities downward as upward (21 vs 7, with 12 unmoved).
In the big round (without counterarguments), arguments pushed people upward slightly more: 20% move upward and 15% move downward overall (and 65% say the same). On average, p(doom) increased by about 1.3% (for non-control arguments, treating button inputs as something like the geometric mean of their ranges).
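(For concreteness, here is a small sketch of the kind of conversion described above, assuming hypothetical button buckets - the exact ranges and responses are not from the post: each button range is mapped to a point estimate via the geometric mean of its endpoints, and the before/after shifts are averaged.)

```python
import math

# Hypothetical button buckets as fractions; the post's exact set isn't reproduced here.
BUTTONS = {"<1%": (0.001, 0.01), "1-5%": (0.01, 0.05), "5-25%": (0.05, 0.25),
           "25-75%": (0.25, 0.75), ">75%": (0.75, 0.99)}

def point_estimate(label: str) -> float:
    lo, hi = BUTTONS[label]
    return math.sqrt(lo * hi)   # geometric mean of the range endpoints

# Made-up example responses: (p(doom) button before the argument, button after).
responses = [("1-5%", "5-25%"), ("5-25%", "5-25%"), ("25-75%", "1-5%")]
shifts = [point_estimate(after) - point_estimate(before) for before, after in responses]
print(sum(shifts) / len(shifts))   # average change in p(doom)
```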
But the input type seemed to make a big difference to how people moved!
It makes sense to me that people move a lot more in both directions with a slider, because it's hard to hit the same number again if you don't remember it. It's surprising to me that they moved with similar frequency with buttons and open response, because the buttons covered relatively chunky ranges (e.g. 5-25%), so only larger shifts would be caught.
Input type also made a big difference to the probabilities people gave to doom before seeing any arguments. People seem to give substantially lower answers when presented with buttons (Nathan proposes this is because there was a <1% and a 1-5% button, which made lower probabilities more salient/"socially acceptable", and I agree).
Overall, P(doom) numbers were fairly high: 24% average, 11% median.
We added a 'control argument'. We presented this as "Here is an argument that advanced AI technology might threaten humanity:" like the others, but it just argued that AI might substantially contribute to music production.
This was the third worst argument in terms of prompting upward probability motion, but the third best in terms of being "compelling". Overall it looked a lot like other arguments, so that's a bit of a blow to the model where e.g. we can communicate somewhat adequately, 'arguments' are more compelling than rando...