

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

May 21, 2024 • 26min
EA - The suffering of a farmed animal is equal in size to the happiness of a human, according to a survey by Stijn
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The suffering of a farmed animal is equal in size to the happiness of a human, according to a survey, published by Stijn on May 21, 2024 on The Effective Altruism Forum.
Author: Stijn Bruers, researcher in economics, KU Leuven
Short summary
According to a survey among a representative sample of the Belgian population, most people believe that farmed animals like chickens have the same capacity for suffering as humans, and that most farmed land animals (broiler chickens) have negative welfare levels (i.e. experience more suffering than happiness).
The average suffering of a farmed land animal, as estimated by respondents, is equal in size to the positive welfare of an average human (in Belgium), whereas the estimated welfare level of a wild bird is zero on average.
Given that there are more farmed animals than humans in the world, and that the populations of small farmed animals (chickens, fish, shrimp and insects) are increasing, most people would have to conclude that net global welfare (of humans, farmed animals and wild animals combined) is negative and declining. People who care about global welfare should therefore strongly prioritize decreasing animal farming and improving farmed animal welfare conditions.
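To make the arithmetic behind this conclusion concrete, here is a minimal sketch in Python, assuming placeholder population figures and a shared welfare scale; none of these numbers come from the survey itself.

```python
# Minimal sketch of the net-welfare arithmetic described above.
# All figures are illustrative placeholders, not results from the survey.
populations = {
    "humans": 8e9,                  # roughly 8 billion humans
    "farmed_land_animals": 3e10,    # tens of billions, mostly broiler chickens
}
mean_welfare = {
    "humans": +1.0,                 # average human happiness, normalized to +1
    "farmed_land_animals": -1.0,    # suffering equal in size to human happiness
}

net_welfare = sum(populations[k] * mean_welfare[k] for k in populations)
print(f"net global welfare (arbitrary units): {net_welfare:+.2e}")
# Negative: the larger farmed-animal population dominates the total.
```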
Introduction
How much do farmed animals such as broiler chickens suffer? How can we compare the welfare of animals and humans? These are crucially important questions, because knowing the welfare capacities and welfare levels of humans and non-human animals is necessary to prioritize strategies to improve welfare on Earth. They can also be used to estimate the global welfare state of the world, as was first done by Fish (2023).
His results were very pessimistic: net global welfare may be negative and declining, due to the increased farming of small animals (chicken, fish, shrimp and possibly insects). The top priority to improve global welfare and decrease suffering on Earth becomes very clear: decrease animal farming (or decrease the suffering of farmed animals).
Fish arrived at these pessimistic results using welfare range and welfare level estimates by animal welfare experts at Rethink Priorities (the Moral Weight Project) and Charity Entrepreneurship (the Weighted Animal Welfare Index).
However, the calculations by Fish may be criticized on the point that his choice of welfare ranges and welfare levels was too arbitrary, because it first involved the arbitrary choice of source or group of experts, and those experts themselves also made arbitrary choices to arrive at their welfare range and level estimates.
Perhaps people believe that the welfare capacities and levels of animal suffering used by Fish were overestimated? Perhaps people won't believe his results because they don't believe that animals have such high capacities for suffering?
In order to convince the general public, we can instead consider the estimates of welfare ranges and welfare levels of animals given by the wider public. To do so, a survey among a representative sample of the Flemish population in Belgium was conducted to study how much sentience people ascribe to non-human animals. The estimates of animal welfare ranges by the general public were more animal-positive than those of Rethink Priorities.
Most respondents gave higher values of animal welfare ranges than those given by the animal welfare experts at Rethink Priorities. According to the general public, Rethink Priorities may have underestimated the animal welfare ranges. Furthermore, most people estimate that the welfare level of most farmed land animals (chickens) is negative, and in absolute value as large as the positive welfare level of humans (in line with the Animal Welfare Index estimates by Charity Entrepreneurship).
Hence, according to the general public, the results of Fish were too optimistic. The global welfare sta...

May 21, 2024 • 6min
AF - EIS XIII: Reflections on Anthropic's SAE Research Circa May 2024 by Stephen Casper
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EIS XIII: Reflections on Anthropic's SAE Research Circa May 2024, published by Stephen Casper on May 21, 2024 on The AI Alignment Forum.
Part 13 of 12 in the Engineer's Interpretability Sequence.
TL;DR
On May 5, 2024, I made a set of 10 predictions about what the next sparse autoencoder (SAE) paper from Anthropic would and wouldn't do.
Today's new SAE paper from Anthropic was full of brilliant experiments and interesting insights, but it ultimately underperformed my expectations. I am beginning to be concerned that Anthropic's recent approach to interpretability research might be better explained by safety washing than practical safety work.
Think of this post as a curt editorial instead of a technical piece. I hope to revisit my predictions and this post in light of future updates.
Reflecting on predictions
Please see my original post for 10 specific predictions about what today's paper would and wouldn't accomplish. I think that Anthropic obviously did 1 and 2 and obviously did not do 4, 5, 7, 8, 9, and 10. Meanwhile, I think that their experiments to identify specific and safety-relevant features should count for 3 (proofs of concept for a useful type of task) but definitely do not count for 6 (*competitively* finding and removing a harmful behavior that was represented in the training data).
Thus, my assessment is that Anthropic did 1-3 but not 4-10.
I have been wrong with mech interp predictions in the past, but this time, I think I was 10 for 10: everything I predicted with >50% probability happened, and everything I predicted with <50% probability did not happen.
Overall, the paper underperformed my expectations. If you scored the paper relative to my predictions by giving it (1-p) points when it did something that I predicted it would do with probability p and -p points when it did not, the paper would score -0.74.
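A minimal sketch of this scoring rule, with hypothetical placeholder probabilities rather than the author's actual forecasts (so the total differs from the -0.74 reported above):

```python
# Score a paper against 10 probabilistic predictions:
# award (1 - p) when a predicted item happened, and -p when it did not.
def score_paper(probs, happened):
    return sum((1 - p) if h else -p for p, h in zip(probs, happened))

# Hypothetical placeholder probabilities for predictions 1-10.
probs = [0.95, 0.95, 0.70, 0.30, 0.30, 0.40, 0.20, 0.20, 0.10, 0.10]
# Per the post: the paper did 1-3 and did not do 4-10.
happened = [True, True, True] + [False] * 7

print(round(score_paper(probs, happened), 2))  # -1.2 with these placeholder probabilities
```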
A review + thoughts
I think that Anthropic's new SAE work has continued to be like lots of prior high-profile work on mechanistic interpretability - it has focused on presenting illustrative examples, streetlight demos, and cherry-picked proofs of concept. This is useful for science, but it does not yet show that SAEs are helpful and competitive for diagnostic and debugging tasks that could improve AI safety.
I feel increasingly concerned about how Anthropic motivates and sells its interpretability research in the name of safety. Today's paper makes some major motte-and-bailey claims that oversell what was accomplished, such as "Eight months ago, we demonstrated that sparse autoencoders could recover monosemantic features from a small one-layer transformer," "Sparse autoencoders produce interpretable features for large models," and "The resulting features are highly abstract: multilingual, multimodal, and generalizing between concrete and abstract references." The paper also omits past literature on interpretability illusions (e.g., Bolukbasi et al., 2021), which their methodology seems prone to. Normally, problems like this are mitigated by peer review, which Anthropic does not participate in.
Meanwhile, whenever Anthropic puts out new interpretability research, I see a laundry list of posts from the company and employees to promote it. They always seem to claim the same thing - that some 'groundbreaking new progress has been made' and that 'the model was even more interpretable than they thought' but that 'there remains progress to be made before interpretability is solved'. I won't link to any specific person's posts, but here are Anthropic's posts from today and October 2023.
The way that Anthropic presents its interpretability work has real-world consequences. For example, it seems to have led to viral claims that interpretability will be solved and that we are bound for safe models. It has also led to at least one claim in a pol...

May 21, 2024 • 33min
LW - On Dwarkesh's Podcast with OpenAI's John Schulman by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Dwarkesh's Podcast with OpenAI's John Schulman, published by Zvi on May 21, 2024 on LessWrong.
Dwarkesh Patel recorded a Podcast with John Schulman, cofounder of OpenAI and at the time their head of current model post-training. Transcript here. John's job at the time was to make the current AIs do what OpenAI wanted them to do. That is an important task, but one that employs techniques that their at-the-time head of alignment, Jan Leike, made clear we should not expect to work on future more capable systems. I strongly agree with Leike on that.
Then Sutskever left and Leike resigned, and John Schulman was made the new head of alignment, now charged with what superalignment efforts remain at OpenAI to give us the ability to control future AGIs and ASIs.
This gives us a golden opportunity to assess where his head is at, without him knowing he was about to step into that role.
There is no question that John Schulman is a heavyweight. He executes and ships. He knows machine learning. He knows post-training and mundane alignment.
The question is, does he think well about this new job that has been thrust upon him?
The Big Take
Overall I was pleasantly surprised and impressed.
In particular, I was impressed by John's willingness to accept uncertainty and not knowing things.
He does not have a good plan for alignment, but he is far less confused about this fact than most others in similar positions.
He does not know how to best navigate the situation if AGI suddenly happened ahead of schedule in multiple places within a short time frame, but I have not ever heard a good plan for that scenario, and his speculations seem about as directionally correct and helpful as one could hope for there.
Are there answers that are cause for concern, and places where he needs to fix misconceptions as quickly as possible? Oh, hell yes.
His reactions to potential scenarios involved radically insufficient amounts of slowing down, halting and catching fire, freaking out and general understanding of the stakes.
Some of that I think was about John and others at OpenAI using a very weak definition of AGI (perhaps partly because of the Microsoft deal?) but also partly he does not seem to appreciate what it would mean to have an AI doing his job, which he says he expects in a median of five years.
His answer on instrumental convergence is worrisome, as others have pointed out. He dismisses concerns that an AI given a bounded task would start doing things outside the intuitive task scope, or the dangers of an AI 'doing a bunch of wacky things' a human would not have expected. On the plus side, it shows understanding of the key concepts on a basic (but not yet deep) level, and he readily admits it is an issue with commands that are likely to be given in practice, such as 'make money.'
In general, he seems willing to react to advanced capabilities by essentially scaling up various messy solutions in ways that I predict would stop working at that scale or with something that outsmarts you and that has unanticipated affordances and reason to route around typical in-distribution behaviors.
He does not seem to have given sufficient thought to what happens when a lot of his assumptions start breaking all at once, exactly because the AI is now capable enough to be properly dangerous.
As with the rest of OpenAI, another load-bearing assumption is presuming gradual changes throughout all this, including assuming past techniques will not break. I worry that will not hold.
He has some common confusions about regulatory options and where we have viable intervention points within competitive dynamics and game theory, but that's understandable, and also was at the time very much not his department.
As with many others, there seems to be a disconnect.
A lot of the thinking here seems like excellent practical thi...

May 21, 2024 • 12min
LW - New voluntary commitments (AI Seoul Summit) by Zach Stein-Perlman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New voluntary commitments (AI Seoul Summit), published by Zach Stein-Perlman on May 21, 2024 on LessWrong.
Basically the companies commit to make responsible scaling policies.
Part of me says this is amazing, the best possible commitment short of all committing to a specific RSP. It's certainly more real than almost all other possible kinds of commitments. But as far as I can tell, people pay almost no attention to what RSP-ish documents (Anthropic, OpenAI, Google) actually say and whether the companies are following them. The discourse is more like "Anthropic, OpenAI, and Google have safety plans and other companies don't." Hopefully that will change.
Maybe "These commitments represent a crucial and historic step forward for international AI governance." It does seem nice from an international-governance perspective that Mistral AI, TII, and a Chinese company joined.
The UK and Republic of Korea governments announced that the following organisations have agreed to the Frontier AI Safety Commitments:
Amazon
Anthropic
Cohere
Google
G42
IBM
Inflection AI
Meta
Microsoft
Mistral AI
Naver
OpenAI
Samsung Electronics
Technology Innovation Institute
xAI
Zhipu.ai
The above organisations, in furtherance of safe and trustworthy AI, undertake to develop and deploy their frontier AI models and systems[1] responsibly, in accordance with the following voluntary commitments, and to demonstrate how they have achieved this by publishing a safety framework focused on severe risks by the upcoming AI Summit in France.
Given the evolving state of the science in this area, the undersigned organisations' approaches (as detailed in paragraphs I-VIII) to meeting Outcomes 1, 2 and 3 may evolve in the future. In such instances, organisations will provide transparency on this, including their reasons, through public updates.
The above organisations also affirm their commitment to implement current best practices related to frontier AI safety, including: internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world's greatest challenges.
Outcome 1. Organisations effectively identify, assess and manage risks when developing and deploying their frontier AI models and systems. They will:
I. Assess the risks posed by their frontier models or systems across the AI lifecycle, including before deploying that model or system, and, as appropriate, before and during training. Risk assessments should consider model capabilities and the context in which they are developed and deployed, as well as the efficacy of implemented mitigations to reduce the risks associated with their foreseeable use and misuse.
They should also consider results from internal and external evaluations as appropriate, such as by independent third-party evaluators, their home governments[2], and other bodies their governments deem appropriate.
II. Set out thresholds[3] at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable. Assess whether these thresholds have been breached, including monitoring how close a model or system is to such a breach. These thresholds should be defined with input from trusted actors, including organisations' respective ho...

May 21, 2024 • 20min
EA - Introducing MSI Reproductive Choices by Meghan Blake
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing MSI Reproductive Choices, published by Meghan Blake on May 21, 2024 on The Effective Altruism Forum.
In 1976, our founder Tim Black established MSI Reproductive Choices[1] to bring contraception and abortion care to women in underserved communities that no one else would go to. As a doctor, he witnessed firsthand the hardship caused by the lack of reproductive choice and accordingly, he established quality care, cost-effectiveness, data, and sustainability as foundational principles of the organization. Nearly 50 years later, this legacy remains at the core of MSI Reproductive Choices today.
Since our founding, MSI has served more than 200 million clients, of which more than 100 million were served in the last nine years. Since 2000, our global services have averted an estimated 316,000 maternal deaths and 158.6 million unintended pregnancies.
We deliver impact with exceptional cost-effectiveness. On a global scale via our Outreach and Public Sector Strengthening programs, which reach underserved "last mile" communities, our average cost per disability-adjusted life year (DALY) is $4.70[2], and our average cost per maternal death averted is $3,353. In Nigeria, our most cost-effective program, these figures drop to just $1.63 per DALY and $685 per maternal death averted in our programming with the most underserved communities.[3]
The Challenge
Health Benefits: Family planning saves lives and is a development intervention that brings transformational benefits to women, their families, and communities. While progress has been made, it hasn't been fast enough; 257 million people still lack access to contraception, resulting in 111 million unintended pregnancies annually. Additionally, 280,000 women - primarily in sub-Saharan Africa - lose their lives due to pregnancy-related complications each year, amounting to 767 deaths per day.
Maternal mortality is nearly 50 times higher for women in sub-Saharan Africa compared to high-income countries, and their babies are 10 times more likely to die in their first month of life.
The global disparity in maternal death rates is most evident in low-income countries. In 2020, the Maternal Mortality Ratio (MMR) reached 430 per 100,000 live births in low-income countries, a significant contrast to just 12 per 100,000 live births in high-income nations. In some countries, like Nigeria, the maternal mortality rate exceeds 1,000 per 100,000 live births.
Demand for family planning and sexual and reproductive healthcare services will continue to grow, and by 2030, an additional 180 million women will need access to these services. The urgency of this need is underscored by adolescent girls in low- and middle-income countries who wish to avoid pregnancy: 43% of them face an unmet need for contraception. Pregnancy-related deaths are the leading cause of death for adolescent girls globally.
The World Health Organization has stressed that to avoid maternal deaths, it is vital to prevent unintended pregnancies. They stated: "all women, including adolescents, need access to contraception, safe abortion services to the full extent of the law, and quality post-abortion care."
If all women in low- and middle-income countries who wish to avoid pregnancy had access to family planning, the rate of unintended pregnancies would drop by 68%.
Number of Maternal Deaths by Region, 2000 - 2017
Education and Economic Opportunities: The effects of inadequate access to family planning are profoundly felt in Sub-Saharan Africa, where MSI analysis has estimated that up to 4 million teenage girls drop out of school every year due to teenage pregnancy. This education gap is exacerbated by disparities in contraceptive access: women in the wealthiest quintile have more than double the proportion of met contraceptive demand compa...

May 21, 2024 • 3min
LW - [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice. by Linch
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice., published by Linch on May 21, 2024 on LessWrong.
Scarlett Johansson makes a statement about the "Sky" voice, a voice for GPT-4o that OpenAI recently pulled after less than a week of prime time.
tl;dr: OpenAI made an offer last September to Johansson; she refused. They offered again 2 days before the public demo. Scarlett Johansson claims that the voice was so similar that even friends and family noticed. She hired legal counsel to ask OpenAI to "detail the exact process by which they created the 'Sky' voice," which resulted in OpenAI taking the voice down.
Full statement below:
Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4.0 system. He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI. He said he felt that my voice would be comforting to people.
After much consideration and for personal reasons, I declined the offer.
Nine months later, my friends, family and the general public all noted how much the newest system named 'Sky' sounded like me.
When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference. Mr. Altman even insinuated that the similarity was intentional, tweeting a single word 'her' - a reference to the film in which I voiced a chat system, Samantha, who forms an intimate relationship with a human.
Two days before the ChatGPT 4.0 demo was released, Mr. Altman contacted my agent, asking me to reconsider. Before we could connect, the system was out there.
As a result of their actions, I was forced to hire legal counsel, who wrote two letters to Mr. Altman and OpenAI, setting out what they had done and asking them to detail the exact process by which they created the 'Sky' voice. Consequently, OpenAI reluctantly agreed to take down the 'Sky' voice.
In a time when we are all grappling with deepfakes and the protection of our own likeness, our own work, our own identities, I believe these are questions that deserve absolute clarity. I look forward to resolution in the form of transparency and the passage of appropriate legislation to help ensure that individual rights are protected.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

May 21, 2024 • 11min
AF - The Problem With the Word 'Alignment' by peligrietzer
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Problem With the Word 'Alignment', published by peligrietzer on May 21, 2024 on The AI Alignment Forum.
This post was written by Peli Grietzer, inspired by internal writings by TJ (tushant jha), for AOI[1]. The original post, published on Feb 5, 2024, can be found here: https://ai.objectives.institute/blog/the-problem-with-alignment.
The purpose of our work at the AI Objectives Institute (AOI) is to direct the impact of AI towards human autonomy and human flourishing.
In the course of articulating our mission and positioning ourselves -- a young organization -- in the landscape of AI risk orgs, we've come to notice what we think are serious conceptual problems with the prevalent vocabulary of 'AI alignment.' This essay will discuss some of the major ways in which we think the concept of 'alignment' creates bias and confusion, as well as our own search for clarifying concepts.
At AOI, we try to think about AI within the context of humanity's contemporary institutional structures: How do contemporary market and non-market (eg. bureaucratic, political, ideological, reputational) forces shape AI R&D and deployment, and how will the rise of AI-empowered corporate, state, and NGO actors reshape those forces? We increasingly feel that 'alignment' talk tends to obscure or distort these questions.
The trouble, we believe, is the idea that there is a single so-called Alignment Problem. Talk about an 'Alignment Problem' tends to conflate a family of related but distinct technical and social problems, including:
P1: Avoiding takeover from emergent optimization in AI agents
P2: Ensuring that AI's information processing (and/or reasoning) is intelligible to us
P3: Ensuring AIs are good at solving problems as specified (by user or designer)
P4: Ensuring AI systems enhance, and don't erode, human agency
P5: Ensuring that advanced AI agents learn a human utility function
P6: Ensuring that AI systems lead to desirable systemic and long term outcomes
Each of P1-P6 is known as 'the Alignment Problem' (or as the core research problem in 'Alignment Research') to at least some people in the greater AI Risk sphere, in at least some contexts. And yet these problems are clearly not simply interchangeable: placing any one of P1-P6 at the center of AI safety implies a complicated background theory about their relationship, their relative difficulty, and their relative significance.
We believe that when different individuals and organizations speak of the 'Alignment Problem,' they assume different controversial reductions of the P1-P6 problems network to one of its elements. Furthermore, the very idea of an 'Alignment Problem' precommits us to finding a reduction for P1-P6, obscuring the possibility that this network of problems calls for a multi-pronged treatment.
One surface-level consequence of the semantic compression around 'alignment' is widespread miscommunication, as well as fights over linguistic real-estate. The deeper problem, though, is that this compression serves to obscure some of a researcher's or org's foundational ideas about AI by 'burying' them under the concept of alignment.
Take a familiar example of a culture clash within the greater AI Risk sphere: many mainstream AI researchers identify 'alignment work' with incremental progress on P3 (task-reliability), which researchers in the core AI Risk community reject as just safety-washed capabilities research. We believe working through this culture-clash requires that both parties state their theories about the relationship between progress on P3 and progress on P1 (takeover avoidance).
In our own work at AOI, we've had occasion to closely examine a viewpoint we call the Berkeley Model of Alignment -- a popular reduction of P1-P6 to P5 (agent value-learning) based on a paradigm consolidated at UC Berkeley's CHAI research gr...

May 21, 2024 • 6min
EA - What's Going on With OpenAI's Messaging? by Ozzie Gooen
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What's Going on With OpenAI's Messaging?, published by Ozzie Gooen on May 21, 2024 on The Effective Altruism Forum.
This is a quickly written opinion piece about what I understand of OpenAI. I first posted it to Facebook, where it had some discussion.
Some arguments that OpenAI is making, simultaneously:
1. OpenAI will likely reach and own transformative AI (useful for attracting talent to work there).
2. OpenAI cares a lot about safety (good for public PR and government regulations).
3. OpenAI isn't making anything dangerous and is unlikely to do so in the future (good for public PR and government regulations).
4. OpenAI doesn't need to spend many resources on safety, and implementing safe AI won't put it at any competitive disadvantage (important for investors who own most of the company).
5. Transformative AI will be incredibly valuable for all of humanity in the long term (for public PR and developers).
6. People at OpenAI have thought long and hard about what will happen, and it will be fine.
7. We can't predict concretely what transformative AI will look like or what will happen after (Note: Any specific scenario they propose would upset a lot of people. Vague hand-waving upsets fewer people).
8. OpenAI can be held accountable to the public because it has a capable board of advisors overseeing Sam Altman (he said this explicitly in an interview).
9. The previous board scuffle was a one-time random event that was a very minor deal.
10. OpenAI has a nonprofit structure that provides an unusual focus on public welfare.
11. The nonprofit structure of OpenAI won't inconvenience its business prospects or shareholders in any way.
12. The name "OpenAI," which clearly comes from the early days when the mission was actually to make open-source AI, is an equally good name for where the company is now.* (I don't actually care about this, but find it telling that the company doubles down on arguing the name still is applicable).
So they need to simultaneously say:
"We're making something that will dominate the global economy and outperform humans at all capabilities, including military capabilities, but is not a threat."
"Our experimental work is highly safe, but in a way that won't actually cost us anything."
"We're sure that the long-term future of transformative change will be beneficial, even though none of us can know or outline specific details of what that might actually look like."
"We have a great board of advisors that provide accountability. Sure, a few months ago, the board tried to fire Sam, and Sam was able to overpower them within two weeks, but next time will be different."
"We have all of the benefits of being a nonprofit, but we don't have any of the costs of being a nonprofit."
Meta's messaging is clearer.
"AI development won't get us to transformative AI, we don't think that AI safety will make a difference, we're just going to optimize for profitability."
Anthropic's messaging is a bit clearer.
"We think that AI development is a huge deal and correspondingly scary, and we're taking a costlier approach accordingly, though not too costly such that we'd be irrelevant."
This still requires a strange and narrow worldview to make sense, but it's still more coherent.
But OpenAI's messaging has turned into a particularly tangled mess of conflicting promises. It's the kind of political strategy that can work for a while, especially if you can have most of your conversations in private, but is really hard to pull off when you're highly public and facing multiple strong competitive pressures.
If I were a journalist interviewing Sam Altman, I'd try to spend as much of it as possible just pinning him down on these countervailing promises they're making. Some types of questions I'd like him to answer would include:
"Please lay out a specific, year-by-year, story of o...

May 20, 2024 • 17min
LW - Anthropic: Reflections on our Responsible Scaling Policy by Zac Hatfield-Dodds
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Anthropic: Reflections on our Responsible Scaling Policy, published by Zac Hatfield-Dodds on May 20, 2024 on LessWrong.
Last September we published our first Responsible Scaling Policy (RSP) [LW discussion], which focuses on addressing catastrophic safety failures and misuse of frontier models. In adopting this policy, our primary goal is to help turn high-level safety concepts into practical guidelines for fast-moving technical organizations and demonstrate their viability as possible standards. As we operationalize the policy, we expect to learn a great deal and plan to share our findings.
This post shares reflections from implementing the policy so far. We are also working on an updated RSP and will share this soon.
We have found having a clearly-articulated policy on catastrophic risks extremely valuable. It has provided a structured framework to clarify our organizational priorities and frame discussions around project timelines, headcount, threat models, and tradeoffs. The process of implementing the policy has also surfaced a range of important questions, projects, and dependencies that might otherwise have taken longer to identify or gone undiscussed.
Balancing the desire for strong commitments with the reality that we are still seeking the right answers is challenging. In some cases, the original policy is ambiguous and needs clarification. In cases where there are open research questions or uncertainties, setting overly-specific requirements is unlikely to stand the test of time.
That said, as industry actors face increasing commercial pressures we hope to move from voluntary commitments to established best practices and then well-crafted regulations.
As we continue to iterate on and improve the original policy, we are actively exploring ways to incorporate practices from existing risk management and operational safety domains. While none of these domains alone will be perfectly analogous, we expect to find valuable insights from nuclear security, biosecurity, systems safety, autonomous vehicles, aerospace, and cybersecurity. We are building an interdisciplinary team to help us integrate the most relevant and valuable practices from each.
Our current framework for doing so is summarized below, as a set of five high-level commitments.
1. Establishing Red Line Capabilities. We commit to identifying and publishing "Red Line Capabilities" which might emerge in future generations of models and would present too much risk if stored or deployed under our current safety and security practices (referred to as the ASL-2 Standard).
2. Testing for Red Line Capabilities (Frontier Risk Evaluations). We commit to demonstrating that the Red Line Capabilities are not present in models, or - if we cannot do so - taking action as if they are (more below). This involves collaborating with domain experts to design a range of "Frontier Risk Evaluations" - empirical tests which, if failed, would give strong evidence against a model being at or near a red line capability.
We also commit to maintaining a clear evaluation process and a summary of our current evaluations publicly.
3. Responding to Red Line Capabilities. We commit to develop and implement a new standard for safety and security sufficient to handle models that have the Red Line Capabilities. This set of measures is referred to as the ASL-3 Standard. We commit not only to define the risk mitigations comprising this standard, but also detail and follow an assurance process to validate the standard's effectiveness.
Finally, we commit to pause training or deployment if necessary to ensure that models with Red Line Capabilities are only trained, stored and deployed when we are able to apply the ASL-3 standard.
4. Iteratively extending this policy. Before we proceed with activities which require the ASL-3 standard, we commit...

May 20, 2024 • 5min
EA - Cancelling GPT subscription by adekcz
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Cancelling GPT subscription, published by adekcz on May 20, 2024 on The Effective Altruism Forum.
If you are sending money to OpenAI, what would make you stop? For me, there is no ambiguity anymore. The line has been crossed.
Many publicly available signals point toward OpenAI's lack of enthusiasm about AI safety, paired with plenty of enthusiasm about advancing AI capabilities as far as possible, even toward AGI. There have been several waves of people with AI safety views leaving OpenAI, culminating in the dissolution of the superalignment team, and it is hard to tell whether OpenAI conducts any alignment research at all. I worry about the risk of advanced AI and no longer trust OpenAI to behave well when needed.
Therefore I am cancelling my subscription and think others should as well.
I had basically written this post a few days ago but was waiting for Zvi to provide his summary of events in Open exodus. Predictably, he just did, and it is a much better summary than I would be able to provide. In the end, he writes:
My small update is cancelling my GPT subscription, and I think as many people as possible should do the same. Paying for GPT while holding my beliefs was, as of last week, debatable. Now, I think it is unjustifiable.
It is very simple. Sending money to a company that is trying to achieve something I worry about is not a good idea. They no longer even have a plausible claim to caring about safety. On the other hand, they are definitely still at the leading edge of AI development.
I think AI risks are non-negligible and the plausible harm they can cause is tremendous (extinction or some s-risk scenario). Many people also believe that. But we arrive at really low probabilities when thinking about how much an individual can influence those large problems spanning vast futures. We get very small probabilities but very large values at stake.
In this case, there is a really small probability that our 20 bucks a month will lead to unsafe AGI. But it definitely helps OpenAI, even if only marginally. By this logic, sending money to OpenAI might be by far the most horrific act many people, including many EAs, regularly commit. Normal people don't have assumptions that would make them care. But we have them, and we should act on them.
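To show the shape of that expected-value argument, here is a minimal sketch; both numbers are arbitrary placeholders, not claims from the post.

```python
# Minimal sketch of the small-probability / large-stakes reasoning above.
# Both numbers are arbitrary placeholders chosen only to show the structure.
p_subscription_matters = 1e-12   # chance the $20/month marginally tips outcomes toward unsafe AGI
harm_if_it_does = 1e15           # stylized magnitude of that harm, in arbitrary welfare units

expected_harm = p_subscription_matters * harm_if_it_does
print(expected_harm)  # 1000.0: a tiny probability times enormous stakes can still be non-trivial
```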
Cancel your subscription and refrain from using the free version, as your data might be even more valuable to them than the subscription money. Please spread this sentiment.
My summary of recent events
For the sake of completeness: I had written the following before Zvi published his post, and I am leaving it here in a slightly draft-ish state:
All the people
People who are leaving OpenAI seem to have quite similar feelings about AI safety as I do, and they seem to be very competent. They are insiders; they have hands-on experience with the technology and with OpenAI's inner workings. Dario Amodei and around 10 other OpenAI employees left in 2021 to found Anthropic, because they felt they could do better AI safety work together (https://youtu.be/5GtVrk00eck?si=FQSCraQ8UOtyxePW&t=39).
Then we had the infamous board drama that led to Sam Altman's stronger hold over OpenAI, and the most safety-minded people are there no more: Helen Toner of CSET; Tasha McCauley, a former Effective Ventures board member; and Sutskever, who initially just left the board and has now left OpenAI altogether.
And finally, over the last few months, the remaining safety people have been pushed out, fired, or have resigned (https://www.vox.com/future-perfect/2024/5/17/24158403/openai-resignations-ai-safety-ilya-sutskever-jan-leike-artificial-intelligence).
Walk the walk
Founding the superalignment team (https://openai.com/index/introducing-superalignment/), led by Sutskever and Leike, and promising 20% of compute to them sounded great. It still seems like something that would much more likely happen in a world where O...


