

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Apr 1, 2024 • 7min
LW - A Selection of Randomly Selected SAE Features by CallumMcDougall
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Selection of Randomly Selected SAE Features, published by CallumMcDougall on April 1, 2024 on LessWrong.
Epistemic status - self-evident.
In this post, we interpret a small sample of Sparse Autoencoder features which reveal meaningful computational structure in the model that is clearly highly researcher-independent and of significant relevance to AI alignment.
Motivation
Recent excitement about Sparse Autoencoders (SAEs) has been mired in the following question: Do SAE features reflect properties of the model, or just capture correlational structure in the underlying data distribution?
While a full answer to this question is important and will take deliberate investigation, we note that researchers who've spent large amounts of time interacting with feature dashboards think it's more likely that SAE features capture highly non-trivial information about the underlying models.
Evidently, SAEs are the one true answer to ontology identification and as evidence of this, we show how initially uninterpretable features are often quite interpretable with further investigation / tweaking of dashboards. In each case, we describe how we make the best possible use of feature dashboards to ensure we aren't fooling ourselves or reading tea-leaves.
Note - to better understand these results, we highly recommend readers who are unfamiliar with SAE Feature Dashboards briefly refer to the relevant section of Anthropic's publication (whose dashboard structure we emulate below). TLDR - to understand what concepts are encoded by features, we look for patterns in the text which cause them to activate most strongly.
Case Studies in SAE Features
Scripture Feature
We open with a feature that seems to activate strongly on examples of sacred text, specifically from the works of Christianity.
Even though interpreting SAEs seems bad, and it can really make you mad, seeing features like this reminds us to always look on the bright side of life.
Perseverance Feature
We register lower confidence in this feature than others, but the top activating examples all seem to present a consistent theme of perseverance and loyalty in the face of immense struggle (this was confirmed with GPT4[1]). We're very excited at how semantic this feature is rather than merely syntactic, since a huge barrier to future progress in dictionary learning is whether we can find features associated with high-level semantic concepts like these.
Teamwork Feature
We were very surprised with this one, given that the training data for our models was all dated at 2022 or earlier. We welcome any and all theories here.
Deciphering Feature Activations with Quantization can be highly informative
Most analyses of SAE features have not directly attempted to understand the significance of feature activation strength, but we've found this can be highly informative. Take this feature for example.
Due to the apparently highly quantized pattern of activation, we decided to attempt decoding the sequence of max-activating sequences using the Morse code-based mapping {0.0: '/', 0.2: ' ', 1.0: '.', 2.0: '-'}. When we tried this, we found the following pattern:
Which translated into Morse code reads as:
We weren't sure exactly what to make of this, but more investigation is definitely advisable.
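For readers who want to try this at home, here is a minimal sketch of the decoding step described above; only the {0.0: '/', 0.2: ' ', 1.0: '.', 2.0: '-'} mapping comes from the text, while the rounding rule, the Morse lookup table, and the example activation sequence are illustrative assumptions.

# Minimal sketch of the quantized-activation decoding described above.
# The rounding rule and the example activation sequence are illustrative
# assumptions; only the activation-to-symbol mapping comes from the text.
MORSE_MAP = {0.0: '/', 0.2: ' ', 1.0: '.', 2.0: '-'}
MORSE_TO_CHAR = {
    '.-': 'A', '-...': 'B', '-.-.': 'C', '-..': 'D', '.': 'E', '..-.': 'F',
    '--.': 'G', '....': 'H', '..': 'I', '.---': 'J', '-.-': 'K', '.-..': 'L',
    '--': 'M', '-.': 'N', '---': 'O', '.--.': 'P', '--.-': 'Q', '.-.': 'R',
    '...': 'S', '-': 'T', '..-': 'U', '...-': 'V', '.--': 'W', '-..-': 'X',
    '-.--': 'Y', '--..': 'Z',
}

def snap(activation: float) -> float:
    """Snap an activation value to the nearest quantization level."""
    return min(MORSE_MAP, key=lambda level: abs(level - activation))

def decode(activations: list[float]) -> str:
    """Convert activations to Morse symbols, then Morse letters to text."""
    morse = ''.join(MORSE_MAP[snap(a)] for a in activations)
    words = []
    for word in morse.split('/'):  # '/' separates words
        letters = [MORSE_TO_CHAR.get(code, '?') for code in word.split()]
        words.append(''.join(letters))
    return ' '.join(w for w in words if w)

# Hypothetical max-activating sequence: four strong spikes, a gap, two spikes.
print(decode([1.0, 1.0, 1.0, 1.0, 0.2, 1.0, 1.0, 0.0]))  # prints "HI"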
Lesson - visualize activation on full prompts to better understand features!
One feature which at first appeared uninterpretable is pictured below. Clearly this feature fires in DNA strings, but what is it actually tracking?
Showing a larger context after the max activating tokens, we begin to see what might be an interpretable pattern in the max activating examples.
We did this one more time, and revealed that this is in fact a feature which fires on DNA sequences from the species Rattus norvegicus (Japanese variants in particular). We leave it as an exerci...

Apr 1, 2024 • 4min
EA - Introducing "Bribe Well" by Dušan D. Nešić (Dushan)
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing "Bribe Well", published by Dušan D. Nešić (Dushan) on April 1, 2024 on The Effective Altruism Forum.
On April 1st, in a groundbreaking pivot from traditional charitable efforts, we are thrilled to unveil "Bribe Well", a charity destined to redefine the boundaries of effective altruism. This idea was pioneered by EAs from Eastern Europe, as we have seen firsthand how a little money from private interests can move governments a long way. The post was written in large part by AI, but we see no conflict of interest (in general, not just about AI writing this).
With a mission firmly rooted in the principle of maximizing impact, Bribe Well leverages the untapped potential of "direct governance by money," pioneering a future where philanthropy and pragmatism intertwine through the strategic election of the most amenable politicians. Yes, you heard it right - we're talking about those with a flexible moral compass, because in the world of impact, direction matters more than standing still.
The Philosophy Behind Bribe Well
At Bribe Well, we embrace the notion that the best way to predict the future is to buy it. Our innovative approach, dubbed "Ruling by Fiat (Currency)," is not about undermining democracy but enhancing its efficiency. Why waste time with debates and legislation when you can ensure the right decisions are made upfront, with a modest financial incentive? It's about cutting through the bureaucratic red tape with green bills.
How It Works:
Selection Savvy: We identify potential political candidates possessing a unique blend of ambition and pliability. Our rigorous vetting process ensures they're open to... let's call it "philanthropic persuasion".
Investment in Influence: Through a carefully curated portfolio of "donations", we secure our candidates' commitments to policies that align with our high-impact agenda. Think of it as crowd-funding for the common good - with returns measured in societal benefits.
Governance by Guideline: Once in power, our elected officials receive ongoing support and guidance - along with reminders of their generous benefactors. It's about keeping the ship of state on the right course, with a steady hand on the tiller (and a finger on the scales).
Why It's Revolutionary
Bribe Well isn't just a charity; it's a movement towards "efficient democracy," as practiced in many countries around the world. By prioritizing outcomes over processes, we ensure that the path to impactful change is as direct as a cash transfer. Our model bypasses the inefficiencies of the democratic process, offering a streamlined route to societal improvement. We are a version of GiveDirectly, with our direct giving going to the most efficient politicians for the most bang for your buck.
What you should look forward to
We are hoping to have a website listing our achievements, with prices and services listed out, but the politicians and judges keep complaining about it, so we keep getting delayed. Perhaps by next April 1st we'll manage to have it up and running.
We have also managed to work with all the politicians you dislike to do the thing that you hate, but that was accidental.
Possible slogans, vote for your favorites or add your own in the comments
At Bribe Well, we believe in transparency - every dollar spent is a seed planted in the fertile ground of governance. You might say we're into "green" policy in more ways than one.
Some say money can't buy happiness, but at Bribe Well, we know it can certainly lease legislative efficiency.
Our critics may accuse us of moral bankruptcy, but we prefer to think of ourselves as investing in ethical liquidity.
A Call to Arms (and Wallets)
As we launch Bribe Well on this auspicious April 1st, we invite you to join us in embracing the future of philanthropy. Together, we can ensure that the road to hell is not just ...

Apr 1, 2024 • 5min
EA - EA Global: we're improving 1-on-1s by frances lorenz
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EA Global: we're improving 1-on-1s, published by frances lorenz on April 1, 2024 on The Effective Altruism Forum.
Introducing our new 1-on-1 booking system
EA Globals (EAGs) are weekend-long conferences that feature a variety of talks, panels, workshops, and networking opportunities related to effective altruism.
Unlike other academic conferences, EAGs place a unique emphasis on 1-on-1 networking. Attendees are encouraged to connect via our event app, Swapcard, and book meetings to discuss potential projects, career goals, opportunities, and whatever else might be useful in a professional development context. Attendees often cite 1-on-1s among their most valuable experiences at the event.
This year, the EAG team is hoping to test out new ideas. For example, we recently ran EA Global: Bay Area 2024, our first EAG specifically focussed on global catastrophic risks.
In an attempt to generate novel experiments, the team decided to employ some first-principles reasoning to one of our key event components (see Figure 1).
After attendees arrive at their 1-on-1, notice that the chain of events becomes uncertain - our team (currently) has very little power to ensure each individual 1-on-1 is valuable. One could argue this is part of the nature of existence (i.e. some things are out of our control); however, our team worried this conclusion was suspiciously convenient. What might we be missing?
To address this uncertainty, we are now assigning 1-on-1s based on a complex, astrological algorithm which uses robust celestial compatibility indicators.
We feel confident in this change because:
1-on-1s are more likely to be a positive experience if both parties are cosmically guaranteed to get along.
Your star sign determines much of your expected life trajectory, including your career. It follows that those with compatible star signs will have synergistic ideas and career paths; thus, early coordination is important and valuable.
A concrete example
Aanya is an Aries. Thus, she is partial to challenging the status quo, for example through entrepreneurial ventures in alternative proteins or perhaps advocacy in the AI governance space.
Taylor is a Libra. They value collaboration and harmony. Thus, they are suited to roles involving peacekeeping or managing international relations, perhaps also in a governance capacity.
In a 1-on-1 meeting, we expect an Aries to bring bold ideas, which will then be tempered and balanced by the Libra's relational and strategic focus. According to the cosmos, this creates a perfect blend of bold action, which is also sustainable and feasible.
In other words, the 1-on-1 is astrologically guaranteed to be valuable. Our algorithm proceeds to schedule a meeting on Swapcard at the optimal time, based on planet movement and relevant cosmic orientations.
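To make the mechanics concrete, here is a toy sketch of how such sign-based matching might work; the compatibility scores, attendee list, and greedy pairing rule are made up for illustration and are not the team's actual Swapcard algorithm.

# Toy sketch of star-sign-based 1-on-1 matching. All scores and attendees
# are invented for illustration; this is not the actual Swapcard system.
from itertools import combinations

COMPATIBILITY = {
    frozenset({'Aries', 'Libra'}): 0.95,  # bold ideas tempered by relational focus
    frozenset({'Aries'}): 0.40,           # Aries paired with Aries
    frozenset({'Libra'}): 0.60,           # Libra paired with Libra
}

attendees = {'Aanya': 'Aries', 'Taylor': 'Libra', 'Sam': 'Aries'}

def score(a: str, b: str) -> float:
    """Look up the celestial compatibility of two attendees' signs."""
    return COMPATIBILITY.get(frozenset({attendees[a], attendees[b]}), 0.0)

# Greedily book the most cosmically compatible pairs first.
booked, seen = [], set()
for a, b in sorted(combinations(attendees, 2), key=lambda p: score(*p), reverse=True):
    if a not in seen and b not in seen:
        booked.append((a, b, score(a, b)))
        seen.update({a, b})

print(booked)  # [('Aanya', 'Taylor', 0.95)]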
Under this system, we estimate a minimum likelihood of 60% that every meeting will result in at least one of: the creation of a new organisation, clarity on the worm wars, a communications strategy for EA, or a solution to the alignment problem (see Figure 2).
Frequently asked questions
Can I book 1-on-1s outside of the meetings that are assigned to me?
Yes, you can still use your own free will to book 1-on-1s with attendees. However, Swapcard will auto-cancel a meeting if compatibility is too low. Please be mindful throughout the event as the team is trying to cultivate a trusting relationship with the cosmos.
What if I have a 1-on-1 that isn't valuable? Can I provide you with feedback?
This won't happen.
Will you be applying this new strategy to content?
By 2025, we hope to implement a system by which star compatibility between speakers and attendees is accounted for when attendees attempt to reserve a spot at a talk, workshop, or office hour.
Will this apply to speed meetings?
Yes, everyone will have a name tag w...

Apr 1, 2024 • 1min
EA - Cause Area Tier List by John Salter
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Cause Area Tier List, published by John Salter on April 1, 2024 on The Effective Altruism Forum.
F-Tier
Animal Advocacy
Billions spent, but they still haven't got a single animal to advocate for EA ideas.
AI Safety
It's not clear that we want AIs to be safe - they pose a threat to mankind. If anything, we should be trying to kill them.
Rationality
Entirely dominated by wordcels who can't handle surds.
Wild animal suffering
Not neglected. There's trillions of living creatures suffering perfectly well as it is.
S-Tier
Meta-EA
The numbers speak for themselves. Meta was valued at $1.24 trillion, Electronic Arts at $35 billion.
Climate-change
Incredibly successful. Everyone in the western world worth knowing has a thermostat.
EA Forum
It's important to pay your staff well for the value they create. No organisation pays its staff more per unit value created.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 1, 2024 • 9min
EA - Excerpts From The EA Talmud by Scott Alexander
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Excerpts From The EA Talmud, published by Scott Alexander on April 1, 2024 on The Effective Altruism Forum.
(aka "surely we have enough Jews here that at least one person finds this funny")
MISHNA: Rabbi Ord says that the three permissible cause areas are global health and development, animal welfare, and long-termism. Why are these the three permissible cause areas? Because any charity outside of these cause areas is a violation of bal tashchit, the prohibition against wasting resources.
GEMARA: And why not four cause areas, because global health and development are two separate categories? Rabbah bar bar Hana answers: Is not global health valuable only because it later leads to development? Therefore, consider them the same category.
Rav Yehuda objects. He tells the story of a student who asked Rabbi Yohanan why there should not be a fourth category, meta-effective-altruism. Rabbi Yohanan answered: this is covered under long-termism, because it pays off in the long-term when the meta-charity provides more resources to effective altruism.
But if meta-effective altruism is long-termist because it pays off in more resources later, shouldn't global health also be classified as long-termist, because it pays off in development later?
Rav Hisda asks, how can you think to compare these two? In meta-effective-altruism, the resources have not yet been used to help the poor. But in global health, the resources are already spent on the poor. From this, learn that the category of charity is determined when the resources are spent on helping the poor. The Gemara concludes: indeed, learn from this.
Rabbi Shimon bar Yochai says: it is permissible to count your contributions towards animal welfare as also being contributions to long-termism. For Isaiah 11:6 says "On that day ... the wolf shall lie down with the lamb". What is meant by "on that day"? It means "in the long-term future".
If it is permissible to count your contributions towards animal welfare as also being contributions to long-termism, then in what sense are they two different cause areas? Rav answers: Rabbi Shimon was only referring to contributions towards speculative wild animal suffering causes like genetically engineering predators into herbivores.
Using the method of juxtaposition, we see that the next verse is "The cow and the bear shall graze, their young shall lie down together, and the lion shall eat straw like the ox". One who has pledged to donate a certain amount to animal welfare and long-termism may count these types of causes as either category of donation. Donations to normal animal welfare causes may not be counted as long-termism.
How long must it be before a charitable intervention pays off in order to count it as being for "the long-term future"? Rav Hisda says in the name of Rav in the name of Rabbi Akiva: seventy years. The Bible says (Jeremiah 29:11) "For I know the plans I have for you, declares the LORD, plans to prosper you and not to harm you, plans to give you hope and a future." And this passage is talking about the end of the Babylonian exile after seventy years.
Therefore, the long-term future is seventy years from now.
MISHNA: Rabbi Eliezer said: one may work on AI safety but not on AI capabilities. What is AI capabilities? It is anything that makes an AI better at any of the 39 categories of labor involved in constructing the Tabernacle.
GEMARA: The Exilarch raised a question to Rav Hamnuna: Clearly language models are forbidden due to the prohibition against writing. But why is it capabilities research to work on an image model? Rav Hamnuna answered: that is the prohibited labor of dyeing. And is it dyeing if the image is in black and white? Rav Sheshet said: rather, say that it is still the prohibited labor of writing, because the user must prompt the image model.
Rabbi Zeira objec...

Apr 1, 2024 • 7min
AF - A Selection of Randomly Selected SAE Features by CallumMcDougall
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Selection of Randomly Selected SAE Features, published by CallumMcDougall on April 1, 2024 on The AI Alignment Forum.
Epistemic status - self-evident.
In this post, we interpret a small sample of Sparse Autoencoder features which reveal meaningful computational structure in the model that is clearly highly researcher-independent and of significant relevance to AI alignment.
Motivation
Recent excitement about Sparse Autoencoders (SAEs) has been mired in the following question: Do SAE features reflect properties of the model, or just capture correlational structure in the underlying data distribution?
While a full answer to this question is important and will take deliberate investigation, we note that researchers who've spent large amounts of time interacting with feature dashboards think it's more likely that SAE features capture highly non-trivial information about the underlying models.
Evidently, SAEs are the one true answer to ontology identification and as evidence of this, we show how initially uninterpretable features are often quite interpretable with further investigation / tweaking of dashboards. In each case, we describe how we make the best possible use of feature dashboards to ensure we aren't fooling ourselves or reading tea-leaves.
Note - to better understand these results, we highly recommend readers who are unfamiliar with SAE Feature Dashboards briefly refer to the relevant section of Anthropic's publication (whose dashboard structure we emulate below). TLDR - to understand what concepts are encoded by features, we look for patterns in the text which cause them to activate most strongly.
Case Studies in SAE Features
Scripture Feature
We open with a feature that seems to activate strongly on examples of sacred text, specifically from the works of Christianity.
Even though interpreting SAEs seems bad, and it can really make you mad, seeing features like this reminds us to always look on the bright side of life.
Perseverance Feature
We register lower confidence in this feature than others, but the top activating examples all seem to present a consistent theme of perseverance and loyalty in the face of immense struggle (this was confirmed with GPT4[1]). We're very excited at how semantic this feature is rather than merely syntactic, since a huge barrier to future progress in dictionary learning is whether we can find features associated with high-level semantic concepts like these.
Teamwork Feature
We were very surprised with this one, given that the training data for our models was all dated at 2022 or earlier. We welcome any and all theories here.
Deciphering Feature Activations with Quantization can be highly informative
Most analyses of SAE features have not directly attempted to understand the significance of feature activation strength, but we've found this can be highly informative. Take this feature for example.
Due to the apparently highly quantized pattern of activation, we decided to attempt decoding the sequence of max-activating sequences using the Morse code-based mapping {0.0: '/', 0.2: ' ', 1.0: '.', 2.0: '-'}. When we tried this, we found the following pattern:
Which translated into Morse code reads as:
We weren't sure exactly what to make of this, but more investigation is definitely advisable.
Lesson - visualize activation on full prompts to better understand features!
One feature which at first appeared uninterpretable is pictured below. Clearly this feature fires in DNA strings, but what is it actually tracking?
Showing a larger context after the max activating tokens, we begin to see what might be an interpretable pattern in the max activating examples.
We did this one more time, and revealed that this is in fact a feature which fires on DNA sequences from the species Rattus norvegicus (Japanese variants in particular). We leave it...

Apr 1, 2024 • 2min
EA - Introducing Open Asteroid Impact by Linch
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing Open Asteroid Impact, published by Linch on April 1, 2024 on The Effective Altruism Forum.
"That which does not kill us makes us stronger."
Hillary Clinton, who is still alive
I'm proud and excited to announce the founding of my new startup, Open Asteroid Impact, where we redirect asteroids towards Earth for the benefit of humanity. Our mission is to have as high an impact as possible.
Below, I've copied over the one-pager I've sent potential investors and early employees:
Name: Open Asteroid Impact
Launch Date: April 1 2024
Website: openasteroidimpact.org
Mission: To have as high an impact as possible
Pitch: We are an asteroid mining company. When most people think about asteroid mining, they think of getting all the mining equipment to space and carefully mining and refining ore in space, before bringing the ore back down in a controlled landing. But humanity has zero experience in Zero-G mining in the vacuum of space. This is obviously very inefficient. Instead, it's much more efficient to bring the asteroids down to Earth first, and mine them on the ground.
Furthermore, we are first and foremost an asteroid mining *safety* company. That is why we need to race as fast as possible to be at the forefront of asteroid redirection, so more dangerous companies don't get there before us, letting us set safety standards.
Cofounder and CEO: Linch Zhang
Other employees: Austin Chen (CTO), Zach Weinersmith (Chief Culinary Officer), Annie Vu (ESG Analyst)
Board: tbd
Competitors: DeepMine, Anthropocene
Valuation: Astronomical
Design Principles: Bigger, Faster, Safer
Organizational Structure: for-profit C corp owned by B corp owned by public benefit corporation owned by 501c4 owned by 501c3 with a charter set through a combination of regulations from Imperial France, tlatoani Aztec Monarchy, Incan federalism, and Qin-dynasty China to avoid problems with Arrow's Impossibility Theorem
Safety Statement: "Mitigating the risk of extinction from human-directed asteroids should be a global priority alongside other civilizational risks such as nuclear war and artificial general intelligence"
You can learn more about us on our website.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 1, 2024 • 2min
LW - Apply to be a Safety Engineer at Lockheed Martin! by yanni
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Apply to be a Safety Engineer at Lockheed Martin!, published by yanni on April 1, 2024 on LessWrong.
Are you passionate about ensuring the safety and reliability of the world's most lethal and cutting-edge weaponry? Does the idea of creating technology and then working out its impacts excite you? Do you thrive in dynamic environments where innovation meets rigorous safety standards? If so, you might want to consider joining the team at Lockheed Martin (LM), global leaders in advanced weapon systems development!
Position overview and background:
As a Safety Engineer specializing in advanced weaponry systems, you will play a critical role in ensuring we pass the checks and balances we've helped Federal Governments develop. You will collaborate very closely with multidisciplinary teams of engineers, scientists, and analysts to assess, mitigate, and manage risks associated with our most innovative products (however we expect any capabilities insights you discover along the way will be kept from your colleagues).
You might be a good fit if you:
Thrive on rigorously testing SOTA lethal weaponry to ensure its safety and compliance.
Enjoy working closely with PR & Comms - as needed you will be asked to appear on various podcasts and give presentations to the weapons Safety community, with whom we work very closely.
Have experience in organizations with a flat hierarchy. For example, our CEO works extremely closely with the board.
Have industry connections. We maintain close ties with our independent auditors, many of whom used to work at LM!
Can predict with 100% accuracy that you won't ever be interested in moving into different areas of the company. We hire the smartest and most conscientious talent specifically for our Safety teams, and assume they'll never want to move into weapons capabilities advancement.
Annual Salary (USD)
Multiply the not-for-profit equivalent by 7X.
Join Us:
Apply here by June 16, 2026 (after which it will probably be too late).
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 1, 2024 • 1min
EA - EA is now scandal-constrained by Guy Raveh
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EA is now scandal-constrained, published by Guy Raveh on April 1, 2024 on The Effective Altruism Forum.
It's been at least a few months since the last proper EA scandals, and we're now desperately trying to squeeze headlines out of the past ones.
On the contrary, a few scandals have been wrapped up:
SBF was sentenced to 25 years in prison
The investigation regarding Owen Cotton-Barratt presented its findings
Wytham Abbey is being sold
Indeed, even OpenPhil's involvement in the Wytham Abbey sale shows they're now less willing to fund new scandals.
Therefore it seems to me that EA is now neither funding- nor talent-constrained, but rather scandal-constrained.
This cannot go on. We've all become accustomed to a neverending stream of scandals, and if that stream dwindles, we might find ourselves bored to death - or worse, the world might stop talking about EA all the time.
I therefore raise a few ideas for discussion - feel free to add your own:
EA Funds should open a new Scandal Fund to create a continuous supply.
CEA's community health team should hire a person to look harder for scandals lying under the surface.
Nick Bostrom should publish a book.
EA should work harder on encouraging group housing of people with their bosses, preferably in secluded areas abroad.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Apr 1, 2024 • 2min
EA - Surrendering with Dignity by MathiasKB
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Surrendering with Dignity, published by MathiasKB on April 1, 2024 on The Effective Altruism Forum.
It's obvious at this point humanity isn't going to solve the alignment problem. Since winning is untenable, I believe it's high time humanity begins drafting the terms of surrender.
I propose the following: AI gets the universe but in return we get Iceland.
There is yet time to set up favorable conditions for negotiation. Like Switzerland, which set up explosives for key tunnels and bridges in preparation for an eventual German invasion before the Second World War, even if we cannot win the war, we can bring our future AI overlords to the negotiating table by making it sufficiently annoying to win.
While we still control the playing field, let's set up nukes operated by Dead Man's Switches and get working on a device to destroy the earth's magnetic field to increase cosmic radiation and subsequent bit-flips. Some would argue such devices are just as likely to be used by the AI against us, but that is inconvenient to my argument so I'm going to go ahead and ignore that.
Dignified surrender is still a pre-paradigmatic field with lots of open research questions. For example, will it be necessary to ensure fishing-rights for the waters surrounding Iceland? If so, how much additional bargaining power will this require?
I myself will be working on my latest book, "Tiny Utopia", asking the important questions of what post-surrender life and meaning on Iceland will be like. What hats will be best to ward off the cold? Is it ethical to make them with Icelandic wool?
It's too late to win, but not yet too late to surrender. Let's get to work!
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org


