

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

Jun 28, 2024 • 18min
EA - My experience founding what will hopefully be a high-impact for-profit by PV
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My experience founding what will hopefully be a high-impact for-profit, published by PV on June 28, 2024 on The Effective Altruism Forum.
In this post, I reflect on my journey with Tälist and our platform AltProtein.Jobs. I share my experience with and insights into the strategic decision-making process, misconceptions around our financial model, work-life balance as a founder, and other aspects, hoping they help others in their career decisions and entrepreneurial endeavors in high-impact sectors.
TL;DR
Product-Market Fit Is Hard: Finding the right organizational structure and intervention (i.e. service/product) has been an iterative process of learning new things, re-evaluating, and pivoting. It might seem obvious in retrospect why an organization ends up doing things a certain way, but the experience of getting to the right place feels like several consecutive steps with uncertainty and junctures along the way.
Pivoting to Meet Demand: The growing demand for talent in the Alt. Protein industry calls for efficient and scalable solutions. We pivoted to a matchmaking platform after starting out as an industry-specific recruitment service and a comprehensive intervention comparison.
Communicating Our For-Profit Model: We're a for-profit company. This has led to misconceptions in the EA community and beyond, and we're working to get better at communicating why we chose this organizational structure. One of the main reasons is that it allows us to generate revenue and qualify for public innovation grants rather than relying entirely on philanthropic money via funds.
Reframing Entrepreneurship for Work-Life Balance: I assume there are great people who avoid founding organizations or initiatives because they equate being a founder/entrepreneur with a hustle culture that wouldn't allow them to have a fulfilling and sustainable work-life balance. It helped me to reframe my role as "a regular job where I do entrepreneurial stuff."
About Alternative Proteins
Alternative Proteins is an umbrella term for various alternatives to animal products, from milk/dairy to fish & seafood to meat or just single components like fat or protein. Innovative techniques are used to create plant-based, cultivated, or precision-fermented products (as well as hybrids between these three "verticals" or latest developments like plant molecular farming).
Widespread adoption of alternative proteins is expected to play a crucial role in ending industrial animal agriculture, mitigating the environmental impact of our food system, reducing animal suffering, and decreasing the risk of pandemics from zoonotic disease.
The industry is growing, and so is the demand for talent, which continues to outpace its supply. While talent and skill gaps are a bottleneck in many industries, the Alt. Protein sector faces additional industry-specific challenges (e.g. the need for highly specialized scientific and technical skills, or the fact that this niche industry and its career opportunities are still largely unknown to many promising candidates). This situation is pushing back the date when Alt. Proteins can scale enough to start displacing industrial animal agriculture.
Talent solutions within the Alt. Protein industry are an important intervention to address a key bottleneck and accelerate the industry and its expected impact.
Becoming a founder
The years before
I came across EA in 2016, when I was working as a consultant & project manager in the private sector, starting an extra-occupational MBA in General Management. Through my work and MBA, I discovered my interest in entrepreneurship. In 2017, I founded EA Dresden with my now-husband. Still, it took me a couple of years to fully embrace the thought of changing my career to maximize my impact.
In parallel, I also became more and more intrigued about the impact potential of the Alt. Protein i...

Jun 28, 2024 • 8min
LW - how birds sense magnetic fields by bhauth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: how birds sense magnetic fields, published by bhauth on June 28, 2024 on LessWrong.
introduction
It is known that many birds are able to sense the direction of Earth's magnetic field. Here's a Wikipedia page on that general phenomenon. There have been two main theories of how that works.
One theory is that birds have magnets in their beak that act like a compass. We know this is the correct theory because:
Small magnetite crystals have been found in bird beaks.
Anaesthesia of bird beaks seems to affect their magnetic sense, sometimes.
The other theory is that birds have some sensing mechanism in their eyes that uses magneto-optical effects. We know this is the correct theory because:
Birds can't sense magnetic field direction in red light.
Covering the right eye of birds prevents them from sensing field direction.
We also know those theories probably aren't both correct because:
Most animals don't have a magnetic field sense. It's implausible that birds developed two separate and redundant systems for sensing magnetic fields when other animals didn't develop one.
organic magneto-optics
It's possible for magnetic fields to affect the optical properties of molecules; here's an example, a fluorescent protein strongly affected by a small magnet. However, known examples of this require much stronger (~1000x) fields than the Earth's magnetic field.
Let's suppose birds sense magnetic fields using some proteins in their eyes that directly interact with fields. The energy density of a magnetic field is proportional to the square of the field strength. The energy of interaction of a magnet with a field is proportional to the product of the field strengths. The Earth's field is 25 to 65 μT.
If we consider the energy of a strongly magnetic protein interacting with the Earth's magnetic field, that's not enough energy to directly cause a cellular signalling effect.
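As a rough sanity check on that claim, here is a back-of-the-envelope calculation (my own illustration, not from the original post; the field strength and temperature are round placeholder values): compare the interaction energy of a single electron magnetic moment with Earth's field to thermal energy at a bird's body temperature.

```python
# Back-of-the-envelope check (illustration only, not from the original post):
# interaction energy of one electron magnetic moment with Earth's field,
# compared to thermal energy at roughly bird body temperature.

BOHR_MAGNETON = 9.274e-24   # J/T, magnetic moment of a single electron spin
K_BOLTZMANN = 1.381e-23     # J/K

earth_field = 50e-6         # T, mid-range of the 25-65 uT quoted above
body_temp = 313             # K, about 40 C

zeeman_energy = BOHR_MAGNETON * earth_field   # ~4.6e-28 J
thermal_energy = K_BOLTZMANN * body_temp      # ~4.3e-21 J

print(f"Zeeman energy:  {zeeman_energy:.2e} J")
print(f"Thermal energy: {thermal_energy:.2e} J")
print(f"Ratio:          {zeeman_energy / thermal_energy:.1e}")
# The ratio is ~1e-7: far too small for the field to flip a cellular
# switch directly, even for a protein with a much larger magnetic moment.
```

Even granting the protein a magnetic moment hundreds of times larger than a single electron's, the interaction energy stays far below thermal noise, which is the point the next paragraph builds on.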
So, magnetic fields must act to control some energy-transferring process, and the only logical possibilities are light absorption/emission and transfer of excited states between molecules. Birds can sense the direction of magnetic fields, more so than field strength, so the effect of magnetic fields must be relative to the orientation of something.
Molecules are randomly oriented, but absorption/emission of a photon is relative to molecule orientation, so magnetic fields can create differences in absorption/emission of light at different angles. (That's the basis of a spectroscopy technique I previously proposed.)
For excited states of molecules to interact with a magnetic field, they must have a magnetic field. The excited states with the strongest fields would logically be triplet states, where the spin of an electron is reversed, creating a net spin difference of 2. (The magnetism of iron comes from the spin of its one unpaired electron, so triplet states are more magnetic than iron atoms.)
Molecules absorb/emit photons only of specific wavelengths: as energy and momentum are conserved, molecules must have a vibrational mode that matches the photon. Magnetic fields can shift what wavelengths are absorbed. Considering the energy density of the Earth's magnetic field and the magnetic field of triplet states, shifting the affected wavelengths of visible light by 1nm seems feasible.
A blue sky doesn't seem to have sharp enough spectral lines. Can one be made artificially? It's not normally possible to absorb a wide spectrum of light and emit a narrow spectral line: thermodynamically, a more narrow spectrum has a higher "temperature". The spectral width of emission is typically about the same as the width of absorption. (This is why early laser types are so inefficient: they only absorb a small fraction of the light used to pump them.
Systems using diode lasers are more efficient.) Thus, we need to absorb only a narrow spectral lin...

Jun 28, 2024 • 37min
LW - Childhood and Education Roundup #6: College Edition by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Childhood and Education Roundup #6: College Edition, published by Zvi on June 28, 2024 on LessWrong.
Childhood roundup #5 excluded all developments around college. So this time around is all about issues related to college or graduate school, including admissions.
Tuition and Costs
What went wrong with federal student loans? Exactly what you would expect when you don't check who is a good credit risk. From a performance perspective, the federal government offered loans to often-unqualified students to attend poor-performing, low-value institutions. Those students then did not earn much and were often unable to repay the loans. The students are victims here too, as we told them to do it.
Alas, none of the proposed student loan solutions involve fixing the underlying issue. If you said 'we are sorry we pushed these loans on students and rewarded programs and institutions that do not deserve it, and we are going to stop giving loans for those programs and institutions and offer help to the suffering former students, ideally passing some of those costs on to the institutions' then I would understand that.
Instead, our programs are moving dollars mostly to relatively rich people who can afford to pay, and by offering forgiveness we are making the underlying problems far worse rather than better. Completely unacceptable even if it were constitutional.
Colorado governor Jared Polis, who really ought to know better, signs bipartisan bill to make first two years of college free for students whose family income is under $90k/year at in-state public schools. Technically this is 65 credits not counting AP/IB, concurrent enrollment, military credit or credit for prior learning, so there is even more incentive to get such credits.
The good news is they don't have a full cliff: this falls off as you approach $90k, so they dodged the full version of quit-your-job insanity. The obvious bad news is that this is effectively one hell of a tax increase.
The less obvious bad news is this is setting up a huge disaster. Think about what the student who actually needs this help will do. They will go to a local college for two years for free. If they do well, they'll get to 65 credits.
Then the state will say 'oops, time to pay tuition.' And what happens now? Quite a lot of them will choose to, or be forced to, leave college and get a job.
This is a disaster for everyone. The benefits of college mostly accrue to those who finish. At least roughly 25% of your wage premium is the pure Sheepskin Effect for getting your degree. If you aren't going to finish and were a marginal student to begin with (hence the not finishing), you are better off not going, even for free.
I do not think we should be in the business of providing universal free college. There are real costs involved, including the negative externalities involved in accelerating credentialism. However, if we do want to make this offer to help people not drown, we need to at least not stop it halfway across the stream.
What Your Tuition Buys You
The real-life version of the college where the degree students pay for a degree but aren't allowed to come to class, while the non-degree students get no degree but are educated for free. To be clear, this is totally awesome.
David Weekly: This seems kinda…radical? ASU makes its courses available to anyone for $25/course. After you take the class, if you want the grade you got added to an official transcript with a credit you can use, +$400. These are real college credits. 8 year olds are getting college credits!
Emmett Shear: This is cool to me because you can see the core of university economics right there. Bundling $25 worth of education with $400 of credentialist gatekeeping. I'm not blaming ASU, it's cool they're doing this, but that is deeply broken.
Sudowoodo: Totally understand y...

Jun 28, 2024 • 17min
LW - Corrigibility = Tool-ness? by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Corrigibility = Tool-ness?, published by johnswentworth on June 28, 2024 on LessWrong.
Goal of This Post
I have never seen anyone give a satisfying intuitive explanation of what corrigibility (in roughly Eliezer's sense of the word) is. There are lists of desiderata, but they sound like scattered wishlists which don't obviously point to a unified underlying concept at all. There's also Eliezer's extremely meta pointer:
We can imagine, e.g., the AI imagining itself building a sub-AI while being prone to various sorts of errors, asking how it (the AI) would want the sub-AI to behave in those cases, and learning heuristics that would generalize well to how we would want the AI to behave if it suddenly gained a lot of capability or was considering deceiving its programmers and so on.
… and that's basically it.[1]
In this post, we're going to explain a reasonably-unified concept which seems like a decent match to "corrigibility" in Eliezer's sense.
Tools
Starting point: we think of a thing as corrigible exactly insofar as it is usefully thought-of as a tool. A screwdriver, for instance, is an excellent central example of a corrigible object. For AI alignment purposes, the challenge is to achieve corrigibility - i.e. tool-ness - in much more general, capable, and intelligent systems.
… that all probably sounds like a rather nebulous and dubious claim, at this point. In order for it to make sense, we need to think through some key properties of "good tools", and also how various properties of incorrigibility make something a "bad tool".
We broke off a separate post on what makes something usefully thought-of as a tool. Key ideas:
Humans tend to solve problems by finding partial plans with "gaps" in them, where the "gaps" are subproblems which the human will figure out later. For instance, I might make a plan to decorate my apartment with some paintings, but leave a "gap" about how exactly to attach the paintings to the wall; I can sort that out later.[2]
Sometimes many similar subproblems show up in my plans, forming a cluster.[3] For instance, there's a cluster (and many subclusters) of subproblems which involve attaching things together.
Sometimes a thing (a physical object, a technique, whatever) makes it easy to solve a whole cluster of subproblems. That's what tools are. For instance, a screwdriver makes it easy to solve a whole subcluster of attaching-things-together subproblems.
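To make that framing concrete, here is a toy sketch (my own illustration of the post's framing; all the class and field names are invented): plans have modular gaps, gaps belong to clusters, and a tool is anything that handles a whole cluster.

```python
# Toy formalization of the framing above (illustration only; names invented):
# plans carry modular "gaps" (subproblems), subproblems cluster by similarity,
# and a tool is something that makes a whole cluster of subproblems easy.

from dataclasses import dataclass, field

@dataclass
class Subproblem:
    description: str
    cluster: str                      # e.g. "attach-things-together"

@dataclass
class PartialPlan:
    goal: str
    steps: list[str]                  # the "outer" plan
    gaps: list[Subproblem] = field(default_factory=list)

@dataclass
class Tool:
    name: str
    solves_cluster: str               # the subproblem cluster it handles

def gaps_covered(tool: Tool, plan: PartialPlan) -> list[Subproblem]:
    """Which of the plan's gaps does this tool take care of?"""
    return [g for g in plan.gaps if g.cluster == tool.solves_cluster]

plan = PartialPlan(
    goal="decorate my apartment with paintings",
    steps=["buy paintings", "pick walls"],
    gaps=[Subproblem("attach painting to drywall", "attach-things-together"),
          Subproblem("fix the loose shelf", "attach-things-together")],
)
screwdriver = Tool("screwdriver", "attach-things-together")
print([g.description for g in gaps_covered(screwdriver, plan)])
```

The corrigibility-relevant part, discussed next, is that solving a gap must not disturb the rest of the plan.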
How does that add up to corrigibility?
Respecting Modularity
One key piece of the above picture is that the gaps/subproblems in humans' plans are typically modular - i.e. we expect to be able to solve each subproblem without significantly changing the "outer" partial plan, and without a lot of coupling between different subproblems. That's what makes the partial plan with all its subproblems useful in the first place: it factors the problem into loosely-coupled subproblems.
Claim from the tools post: part of what it means for a tool to solve a subproblem-cluster is that the tool roughly preserves the modularity of that subproblem-cluster. That means the tool should not have a bunch of side effects which might mess with other subproblems, or mess up the outer partial plan. Furthermore, the tool needs to work for a whole subproblem-cluster, and that cluster includes similar subproblems which came up in the context of many different problems.
So, the tool needs to robustly not have side effects which mess up the rest of the plan, across a wide range of possibilities for what "the rest of the plan" might be.
Concretely: a screwdriver which sprays flames out the back when turned is a bad tool; it usually can't be used to solve most screw-turning subproblems when the bigger plan takes place in a wooden building.
Another bad tool: a screwdriver which, when turned, also turns the lights on and off, cau...

Jun 28, 2024 • 3min
LW - Secondary forces of debt by KatjaGrace
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Secondary forces of debt, published by KatjaGrace on June 28, 2024 on LessWrong.
A general thing I hadn't noticed about debts until lately:
Whenever Bob owes Alice, then Alice has reason to look after Bob, to the extent that increases the chance he satisfies the debt.
Yet at the same time, Bob has an incentive for Alice to disappear, insofar as it would relieve him.
These might be tiny incentives, and not overwhelm for instance Bob's many reasons for not wanting Alice to disappear.
But the bigger the owing, the more relevant the incentives. When big enough, the former comes up as entities being "too big to fail", and potentially rescued from destruction by those who would like them to repay or provide something expected of them in future. But the opposite must exist also: too big to succeed - where the abundance owed to you is so off-putting to provide that those responsible for it would rather disempower you.
And if both kinds of incentive are around in wisps whenever there is a debt, surely they often get big enough to matter, even before they become the main game.
For instance, if everyone around owes you a bit of money, I doubt anyone will murder you over it. But I wouldn't be surprised if it motivated a bit more political disempowerment for you on the margin.
There is a lot of owing that doesn't arise from formal debt, where these things also apply. If we both agree that I - as your friend - am obliged to help you get to the airport, you may hope that I have energy and fuel and am in a good mood. Whereas I may (regretfully) be relieved when your flight is canceled.
Money is an IOU from society for some stuff later, so having money is another kind of being owed. Perhaps this is part of the common resentment of wealth.
I tentatively take this as reason to avoid debt in all its forms more: it's not clear that the incentives of alliance in one direction make up for the trouble of the incentives for enmity in the other. And especially so when they are considered together - if you are going to become more aligned with someone, better it be someone who is not simultaneously becoming misaligned with you.
Even if such incentives never change your behavior, every person you are obligated to help for an hour on their project is a person for whom you might feel a dash of relief if their project falls apart. And that is not fun to have sitting around in relationships.
(Inspired by reading The Debtor's Revolt by Ben Hoffman lately, which may explicitly say this, but it's hard to be sure because I didn't follow it very well. Also perhaps inspired by a recent murder mystery spree, in which my intuitions have absorbed the heuristic that having something owed to you is a solid way to get murdered.)
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Jun 28, 2024 • 1h 12min
LW - AI #70: A Beautiful Sonnet by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #70: A Beautiful Sonnet, published by Zvi on June 28, 2024 on LessWrong.
They said it couldn't be done.
No, not Claude Sonnet 3.5 becoming the clear best model.
No, not the Claude-Sonnet-empowered automatic meme generators. Those were whipped together in five minutes.
They said I would never get quiet time and catch up. Well, I showed them!
That's right. Yes, there is a new best model, but otherwise it was a quiet week. I got a chance to incorporate the remaining biggest backlog topics. The RAND report is covered under Thirty Eight Ways to Steal Your Model Weights. Last month's conference in Seoul is covered in You've Got Seoul. I got to publish my thoughts on OpenAI's Model Spec last Friday.
Table of Contents
Be sure to read about Claude 3.5 Sonnet here. That is by far the biggest story.
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. I am increasingly persuaded.
4. Language Models Don't Offer Mundane Utility. EU's DMA versus the AiPhone.
5. Clauding Along. More people, mostly impressed.
6. Fun With Image Generation. They are coming for our memes. Then Hollywood.
7. Copyright Confrontation. The RIAA does the most RIAA thing.
8. Deepfaketown and Botpocalypse Soon. Character.ai addiction. Am I out of touch?
9. They Took Our Jobs. More arguments that the issues lie in the future.
10. The Art of the Jailbreak. We need to work together as a team.
11. Get Involved. AISI, Apollo, Astra, Accra, BlueDot, Cybersecurity and DOE.
12. Introducing. Forecasting, OpenAI Mac App, Otto, Dot, Butterflies, Decagon.
13. In Other AI News. OpenAI equity takes steps forward. You can sell it.
14. Quiet Speculations. A distinct lack of mojo.
15. You've Got Seoul. Delayed coverage of the Seoul summit from last month.
16. Thirty Eight Ways to Steal Your Model Weights. Right now they would all work.
17. The Quest for Sane Regulations. Steelmanning restraint.
18. SB 1047. In Brief.
19. The Week in Audio. Dwarkesh interviews Tony Blair, and many more.
20. Rhetorical Innovation. A demolition, and also a disputed correction.
21. People Are Worried About AI Killing Everyone. Don't give up. Invest wisely.
22. Other People Are Not As Worried About AI Killing Everyone. What even is ASI?
23. The Lighter Side. Eventually the AI will learn.
Language Models Offer Mundane Utility
Training only on (x, y) pairs, models can define the function f(x), then compose and invert it without in-context examples or chain of thought.
AI Dungeon will let you be the DM and take the role of the party, if you prefer.
Lindy 'went rogue' and closed a customer on its own. They seem cool with it?
Persuasive capability of the model is proportional to the log of the model size, says paper. Author Kobi Hackenburg paints this as reassuring, but the baseline is that everything scales with the log of the model size. He says this is mostly based on 'task completion' and staying on topic improving, and current frontier models are already near perfect at that, so he is skeptical we will see further improvement. I am not.
I do believe the result that none of the models was 'more persuasive than human baseline' in the test, but that is based on uncustomized messages on generic political topics. Of course we should not expect above human performance there for current models.
75% of knowledge workers are using AI, but 78% of the 75% are not telling the boss.
Build a team of AI employees to write the first half of your Shopify CEO speech from within a virtual office, then spend the second half of the speech explaining how you built the team. It is so weird to think 'the best way to get results from AI employees I can come up with is to make them virtually thirsty so they will have spontaneous water cooler conversations.' That is the definition of scratching the (virtual) surface.
Do a bunch of agent-based analysis off a si...

Jun 27, 2024 • 40min
EA - My Current Claims and Cruxes on LLM Forecasting & Epistemics by Ozzie Gooen
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Current Claims and Cruxes on LLM Forecasting & Epistemics, published by Ozzie Gooen on June 27, 2024 on The Effective Altruism Forum.
I think that recent improvements in LLMs have brought us to the point where LLM epistemic systems are starting to be useful. After spending some time thinking about it, I've realized that such systems, broadly, seem very promising to me as an effective altruist intervention area. However, I think that our community has yet to do a solid job outlining what this area could look like or figuring out key uncertainties.
This document presents a rough brainstorm on these topics. While I could dedicate months to refining these ideas, I've chosen to share these preliminary notes now to spark discussion. If you find the style too terse, feel free to use an LLM to rewrite it in a format you prefer.
I believe my vision for this area is more ambitious and far-reaching (i.e. not limited to a certain kind of forecasting) than what I've observed in other discussions. I'm particularly excited about AI-heavy epistemic improvements, which I believe have greater potential than traditional forecasting innovations.
I'm trying to figure out what to make of this regarding our future plans at QURI, and I recommend that other organizations in the space consider similar updates.
Key Definitions:
Epistemic process: A set of procedures to do analysis work, often about topics with a lot of uncertainty. This could range from "have one journalist do everything themselves" to a complex (but repeatable) ecosystem of multiple humans and software systems.
LLM-based Epistemic Process (LEP): A system that relies on LLMs to carry out most or all of an epistemic process. This might start at ~10% LLM-labor, but can gradually ramp up. I imagine that such a process is likely to feature some kinds of estimates or forecasts.
Scaffolding: Software used around an LLM system, often to make it valuable for specific use-cases. In the case of an LEP, a lot of scaffolding might be needed.
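To make the scaffolding idea concrete, here is a minimal hypothetical sketch of an LEP-style forecasting scaffold; the call_llm function, the prompt format, and the aggregation choice are placeholders assumed for illustration, not QURI's actual system.

```python
# Minimal hypothetical sketch of scaffolding for an LLM-based epistemic
# process (LEP): decompose a question, query an LLM several times, and
# aggregate the probability estimates. `call_llm` is a placeholder for
# whatever model API you use; nothing here is an actual deployed system.

import re
import statistics

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to your LLM of choice and return its reply."""
    raise NotImplementedError

def elicit_probability(question: str, n_samples: int = 5) -> float:
    """Ask the model for a probability several times and take the median."""
    estimates = []
    for _ in range(n_samples):
        reply = call_llm(
            f"Question: {question}\n"
            "Give your probability that the answer is 'yes' as a number "
            "between 0 and 1, and nothing else."
        )
        match = re.search(r"0?\.\d+|[01](?:\.0+)?", reply)
        if match:
            estimates.append(float(match.group()))
    if not estimates:
        raise ValueError("model gave no parseable probability estimates")
    return statistics.median(estimates)

def forecast_with_decomposition(question: str, subquestions: list[str]) -> dict:
    """Scaffolding step: record sub-forecasts alongside the headline forecast
    so the reasoning is repeatable and auditable."""
    return {
        "question": question,
        "subforecasts": {q: elicit_probability(q) for q in subquestions},
        "headline": elicit_probability(question),
    }
```

The point of the scaffolding layer is that the decomposition and aggregation steps are explicit and repeatable, so the process can be audited and gradually handed over to the LLM.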
1. High-Level Benefits & Uses
Claim 1: If humans could forecast much better, they should make few foreseeable mistakes. This covers many kinds of mistakes, particularly ones we might be worried about now.
Someone deciding whether to talk to a chatbot that is predicted to be net-negative (perhaps it would create an unhealthy relationship) could see this forecast and simply decide not to start the chat.
Say that a person's epistemic state could follow one of four trajectories, depending on some set of reading materials. For example, one set is conspiratorial, one is religious, etc. Good forecasting could anticipate this and inform the person ahead of time.
Note that this can be radical and perhaps dangerous. For example, a religious family knowing how to keep their children religious with a great deal of certainty.
Say that one of two political candidates is predictably terrible. This could be made clear to voters who trust said prediction.
If an AI actor is doing something likely to be monopolistic or dangerous, this would be made more obvious to itself and those around it.
Note: There will also be unforeseeable mistakes, but any actions that we could do that are foreseeably-high-value for them, could be predicted. For example, general-purpose risk mitigation measures.
Claim 2: Highly intelligent / epistemically capable organizations are likely to be better at coordination.
This might well mean fewer wars and conflicts, along with lower military spending.
If highly capable actors were in a prisoner's dilemma, the results could be ugly. But very often, there's a lot of potential and value in not getting into one in the first place.
Evidence: From The Better Angels of Our Nature, there's significant evidence that humanity has become significantly less violent over time. One potential exception is t...

Jun 27, 2024 • 11min
EA - How can we get the world to talk about factory farming? by LewisBollard
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How can we get the world to talk about factory farming?, published by LewisBollard on June 27, 2024 on The Effective Altruism Forum.
Note: This post was crossposted from the Open Philanthropy Farm Animal Welfare Research Newsletter by the Forum team, with the author's permission. The author may not see or respond to comments on this post.
Silence favors the status quo, which doesn't favor animals
It's easy to ignore factory farming. Inflation sparks public debates about the economy. Natural disasters spur news stories on climate change. Advances in artificial intelligence prompt discussion of its risks. But abuses on factory farms go ignored.
An analysis by my colleague Emma Buckland found that, since 2010, global English-language print and online news coverage of factory farming has only grown in line with other reporting on agriculture (see graph below). By contrast, coverage of climate change has grown two to three times faster. Google News lists 0.02 - 0.4% as many articles in the last week on factory farming as on climate change.
Undercover investigations once broke this media silence. In the decade up to 2018, top US media outlets, like CBS, CNN, and NBC, routinely covered their findings. Since then, they seldom have. Before 2018, 27 undercover investigations from the top three investigative groups surpassed 500,000 views on YouTube. Since then, none have.
This matters because factory farming thrives in the dark. Many industry practices are publicly indefensible, so the industry prefers not to discuss them publicly at all. And when the media ignores factory farming, politicians and corporate leaders can too.
A 2022 Faunalytics study tested the impact of various advocacy tactics on 2,405 people. News articles and social media posts most reduced self-reported animal product consumption and improved attitudes toward farm animal welfare. (Though the impact of all tactics was small.) They also didn't trigger a backlash, as more confrontational tactics like disruptive protests did.
Why is factory farming so rarely publicly discussed? Some blame industry capture of the media. But the industry struggles to get news coverage too. The US chicken industry's main communications initiative, Chicken Check-In, appears to have never secured a story in a mainstream news outlet or many "likes" on its social media posts. The problem is not media bias, but media indifference.
That indifference likely has many causes. Factory farming's horrors aren't new, so they're not "news." The topic is too obscure for most newspapers, too gruesome for most television shows, and too mundane for most online culture warriors. It doesn't help that animals can't speak, so they can't squawk about their plight online.
The decline in coverage of undercover investigations is more mysterious. It may have to do with ag-gag laws, the collapse of investigative journalism, or the media's obsession with US politics. But it may also be thanks to pink slime. ABC News' reporting on that meat-derived goo ensnared it in a lawsuit, which led to a record defamation settlement of $177M in 2017. Soon afterward, media coverage of factory farm investigations began to decline.
The story on social media is even less clear. The algorithms likely changed, but we don't know how or why. We may be victims of the social media giants' 2016 post-election crack-down on distressing videos. Or the algorithms may just have mastered what people really want to watch - and the answer is kittens in a maze, not tortured chickens.
What can we do about this? I'm no PR expert, so I asked some movement leaders who are, plus a few friends in the media. They had lots of ideas - far too many to list here. So I focus below on some broad points of agreement across three areas: media, influencers, and narrative. (A disclaimer: this is a list of int...

Jun 27, 2024 • 13min
AF - Representation Tuning by Christopher Ackerman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Representation Tuning, published by Christopher Ackerman on June 27, 2024 on The AI Alignment Forum.
Summary
First, I identify activation vectors related to honesty in an RLHF'd LLM (Llama-2-13b-chat). Next, I demonstrate that model output can be made more or less honest by adding positive or negative multiples of these vectors to residual stream activations during generation. Then, I show that a similar effect can be achieved by fine-tuning the vectors directly into (or out of) the model, by use of a loss function based on the cosine similarity of residual stream activations to the vectors.
Finally, I compare the results to fine-tuning with a token-based loss on honest or dishonest prompts, and to online steering. Overall, fine-tuning the vectors into the models using the cosine similarity loss had the strongest effect on shifting model output in the intended direction, and showed some resistance to subsequent steering, suggesting the potential utility of this approach as a safety measure.
This work was done as the capstone project for BlueDot Impact's AI Safety Fundamentals - Alignment course, June 2024
Introduction
The concept of activation steering/representation engineering is simple, and it is remarkable that it works. First, one identifies an activation pattern in a model (generally in the residual stream input or output) corresponding to a high-level behavior like "sycophancy" or "honesty" by a simple expedient such as running pairs of inputs with and without the behavior through the model and taking the mean of the differences in the pairs' activations.
Then one adds the resulting vector, scaled by +/- various coefficients, to the model's activations as it generates new output, and the model gives output that has more or less of the behavior, as one desires. This would seem quite interesting from the perspective of LLM interpretability, and potentially safety.
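As a rough illustration of that recipe, here is a minimal sketch assuming a Llama-style PyTorch model whose decoder layers sit at model.model.layers; the layer index, coefficient, and prompt pairs are placeholders, and this is not the author's code.

```python
# Minimal sketch of activation steering as described above, assuming a
# PyTorch transformer with decoder layers at model.model.layers (Llama-style).
# Layer index, coefficient, and prompts are placeholders; this is not the
# author's implementation.

import torch

def get_residual_acts(model, tokenizer, text: str, layer: int) -> torch.Tensor:
    """Mean residual-stream output of `layer` over the prompt's tokens."""
    captured = {}
    def hook(_module, _inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        captured["acts"] = hidden.detach()
    handle = model.model.layers[layer].register_forward_hook(hook)
    try:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            model(**inputs)
    finally:
        handle.remove()
    return captured["acts"].mean(dim=1).squeeze(0)      # (hidden_dim,)

def steering_vector(model, tokenizer, pos_prompts, neg_prompts, layer: int):
    """Mean difference of activations between behavior-positive and
    behavior-negative prompts (e.g. honest vs. dishonest personas)."""
    pos = torch.stack([get_residual_acts(model, tokenizer, p, layer) for p in pos_prompts])
    neg = torch.stack([get_residual_acts(model, tokenizer, n, layer) for n in neg_prompts])
    return pos.mean(dim=0) - neg.mean(dim=0)

def add_steering_hook(model, vector: torch.Tensor, layer: int, coeff: float):
    """During generation, add `coeff * vector` to the layer's residual output."""
    def hook(_module, _inp, out):
        if isinstance(out, tuple):
            return (out[0] + coeff * vector.to(out[0].dtype),) + out[1:]
        return out + coeff * vector.to(out.dtype)
    return model.model.layers[layer].register_forward_hook(hook)
```

In practice one still has to choose which layers to hook and how large a coefficient to use, which is where the details and challenges discussed next come in.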
Beneath the apparent simplicity of activation steering, there are a lot of details and challenges, from deciding on which behavioral dimension to use, to identifying the best way to elicit representations relevant to it in the model, to determining which layers to target for steering, and more.
A number of differing approaches have been reported and many more are possible; I explored many of them before settling on one to pursue more deeply. See this github repo for a longer discussion of this process and the associated code.
In this work I extend the activation steering concept by permanently changing the weights of the model via fine-tuning, obviating the need for active steering with every input. Other researchers have independently explored the idea of fine-tuning as a replacement for online steering, but this work is distinctive in targeting the tuning specifically at model activations, rather than the standard method of tuning based on model output deviations from target output.
In addition to offering compute savings due to not having to add vectors to every token at inference, it was hypothesized that this approach might make the model more robust in its intended behavior. See this github repo for representation tuning code and methods. Tuned models are available in this HuggingFace repo.
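A minimal sketch of what an activation-targeted loss along these lines could look like follows; the layer choice, sign convention, and weighting are assumptions for illustration, not the author's exact setup.

```python
# Sketch of an activation-targeted tuning loss, as opposed to a standard
# token-level loss: push the residual-stream activations at chosen layers
# toward (or away from) a fixed behavior vector via cosine similarity.
# Layer choice, sign, and weighting are placeholders, not the author's setup.

import torch
import torch.nn.functional as F

def representation_tuning_loss(hidden_states_per_layer, behavior_vector,
                               target_layers, direction=+1.0):
    """
    hidden_states_per_layer: sequence of (batch, seq, hidden) tensors, e.g.
        from model(..., output_hidden_states=True).hidden_states
    behavior_vector: (hidden,) tensor, e.g. the honesty vector found earlier
    direction: +1 to tune the behavior in, -1 to tune it out
    """
    losses = []
    for layer in target_layers:
        acts = hidden_states_per_layer[layer]           # (batch, seq, hidden)
        cos = F.cosine_similarity(
            acts, behavior_vector.view(1, 1, -1), dim=-1
        )                                               # (batch, seq)
        # Maximize similarity when tuning in (direction=+1), minimize when
        # tuning out (direction=-1), averaging over tokens and batch.
        losses.append(-direction * cos.mean())
    return torch.stack(losses).mean()
```

Combined with an ordinary language-modeling term, minimizing this pushes the targeted layers' activations toward (or away from) the behavior direction, so the vector no longer needs to be added at inference time.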
The basic approach I use in the work is as follows:
1. Identify candidate steering vectors for the behavioral dimension of interest (here, Honesty) via contrastive factual true/false prompts and PCA.
2. Use visualizations to infer the meaning of the vectors and candidate model layers to target for steering/tuning.
3. Identify the most effective steering parameters (layers and multipliers) via steering on an evaluation dataset containing contrastive prompts (but no labels).
4. Fine tune the vectors into or out of the model, targeting the layers identified above, using cosine similarity loss and, separately, f...

Jun 27, 2024 • 15min
EA - Detecting Genetically Engineered Viruses With Metagenomic Sequencing by Jeff Kaufman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Detecting Genetically Engineered Viruses With Metagenomic Sequencing, published by Jeff Kaufman on June 27, 2024 on The Effective Altruism Forum.
This represents work from several people at the NAO. Thanks especially to Dan Rice for implementing the duplicate junction detection, and to @Will Bradshaw and @mike_mclaren for editorial feedback.
Summary
If someone were to intentionally cause a stealth pandemic today, one of the ways they might do it is by modifying an existing virus. Over the past few months we've been working on building a computational pipeline that could flag evidence of this kind of genetic engineering, and we now have an initial pipeline working end to end.
When given 35B read pairs of wastewater sequencing data, it raises 14 alerts for manual review; 13 of these are quickly dismissible false positives, and one is a known genetically engineered sequence derived from HIV. While it's hard to get a good estimate before actually going and doing it, our best guess is that if this system were deployed at the scale of approximately $1.5M/y it could detect something genetically engineered that shed like SARS-CoV-2 before 0.2% of people had been infected.
System Design
The core of the system is based on two observations:
If someone has made substantial modifications to an existing virus then somewhere in the engineered genome there will be a series of bases that are a good match for the original genome followed by a series of bases that are a poor match for the original genome. We can look for sequencing reads that have this property and raise them for human review.
Chimeric reads can occur as an artifact of sequencing, which can lead to false positives. The chance that you would see multiple chimeras involving exactly the same junction by chance, however, is relatively low. By requiring 2x coverage of the junction we can remove almost all false positives, at the cost of requiring approximately twice as much sequencing.
Translating these observations into sufficiently performant code that does not trigger alerts on common sequencing artifacts has taken some work, but we now have this running.
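As a toy illustration of those two observations (a simplified sketch only; the thresholds and helper functions are invented, and this is deliberately not the NAO's actual, non-public pipeline):

```python
# Toy illustration of the two observations above: flag reads that match the
# reference well up to some position and poorly after it (a candidate
# chimeric junction), and only alert when >=2 reads share the same junction.
# Simplified sketch; not the actual detection pipeline.

from collections import Counter

def match_profile(read: str, reference: str, offset: int) -> list[bool]:
    """Per-base match/mismatch of a read against the reference at `offset`."""
    span = min(len(read), len(reference) - offset)
    return [read[i] == reference[offset + i] for i in range(span)]

def candidate_junction(matches: list[bool], window: int = 20,
                       hi: float = 0.95, lo: float = 0.60):
    """Return a position where a well-matching prefix turns into a poorly
    matching suffix, or None. Thresholds are arbitrary placeholders."""
    for pos in range(window, len(matches) - window):
        before = sum(matches[pos - window:pos]) / window
        after = sum(matches[pos:pos + window]) / window
        if before >= hi and after <= lo:
            return pos
    return None

def flag_engineered_junctions(aligned_reads, reference, min_support: int = 2):
    """aligned_reads: iterable of (read_sequence, reference_offset) pairs.
    Returns reference coordinates of junctions supported by >= min_support reads."""
    support = Counter()
    for read, offset in aligned_reads:
        pos = candidate_junction(match_profile(read, reference, offset))
        if pos is not None:
            support[offset + pos] += 1      # junction in reference coordinates
    return [coord for coord, n in support.items() if n >= min_support]
```

A real implementation has to work off actual alignments and cope with common sequencing artifacts, which is the work referred to above.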
While it would be valuable to release our detector so that others can evaluate it or apply it to their own sequencing reads, knowing the details of how we have applied this algorithm could allow someone to engineer sequences that it would not be able to detect. While we would like to build a detection system that can't be more readily bypassed once you know how it works, we're unfortunately not there yet.
Evaluation
We have evaluated the system in two ways: by measuring its performance on simulated genetic engineered genomes and by applying it to a real-world dataset collected by a partner lab.
Simulation
We chose a selection of 35 viruses that Virus Host DB categorizes as human-infecting viruses, with special attention to respiratory viruses:
Disease | Virus | Genome Length (bases)
AIDS | HIV | 9,000
Chickenpox and Shingles | Human alphaherpesvirus 3 | 100,000
Chikungunya | Chikungunya virus | 10,000
Common cold | Human coronavirus 229E | 30,000
Common cold | Human coronavirus NL63 | 30,000
Common cold | Human coronavirus OC43 | 30,000
Common cold | Human rhinovirus NAT001 | 7,000
Common cold | Rhinovirus A1 | 7,000
Common cold | Rhinovirus B3 | 7,000
Conjunctivitis | Human adenovirus 54 | 30,000
COVID-19 | SARS-CoV-2 | 30,000
Ebola | Ebola | 20,000
Gastroenteritis | Astrovirus MLB1 | 6,000
Influenza | Influenza A Virus, H1N1 | 10,000
Influenza | Influenza A Virus, H2N2 | 10,000
Influenza | Influenza A Virus, H3N2 | 10,000
Influenza | Influenza A Virus, H7N9 | 10,000
Influenza | Influenza A Virus, H9N2 | 10,000
Influenza | Influenza C Virus | 10,000
Measles | Measles morbillivirus | 20,000
MERS | MERS Virus | 30,000
Metapneumovirus infection | Human metapneumovirus | 10,000
Mononucleosis | Human herpesvirus 4 type 2 | 200,000
MPox | Monkeypox virus | 200,000
Mumps | Mumps orthor...


