

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes

May 18, 2024 • 9min
LW - Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University by Johannes C. Mayer
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University, published by Johannes C. Mayer on May 18, 2024 on LessWrong.
Bleeding Feet and Dedication
During AI Safety Camp (AISC) 2024, I was working with somebody on how to use binary search to approximate a hull that would contain a set of points, only to knock a glass off of my table. It splintered into a thousand pieces all over my floor.
A normal person might stop and remove all the glass splinters. I just spent 10 seconds picking up some of the largest pieces and then decided that it would be better to push on the train of thought without interruption.
Some time later, I forgot about the glass splinters and ended up stepping on one long enough to penetrate the callus. I prioritized working too much. A pretty nice problem to have, in my book.
Collaboration as Intelligence Enhancer
It was really easy for me to put in over 50 hours per week during AISC[1] (where I was a research lead). For me, AISC mainly consisted of meeting somebody 1-on-1 and solving some technical problem together. Methylphenidate helps me not get distracted when I am on my own, though it is only my number 2 productivity enhancer. For me, the actual ADHD cure seems to be taking methylphenidate while working 1-on-1 with somebody.
But this productivity enhancement is not just about the number of hours I can put in. There is a qualitative difference. I get better at everything. Seriously. Usually, I am bad at prioritization, but when I work with somebody, it usually feels, in retrospect, like over 75% of the time was spent working on the optimal thing (given our state of knowledge at the time). I've noticed similar benefits for my abilities in writing, formalizing things, and general reasoning.
Hardcore Gamedev University Infiltration
I don't quite understand why this effect is so strong. But empirically, there is no doubt it's real. In the past, I spent 3 years making video games. This was always done in teams of 2-4 people. We would spend 8-10 hours per day, 5-6 days a week in the same room. During that time, I worked on this VR "game" where you fly through a 4D fractal (check out the video by scrolling down or on YouTube).
For that project, the university provided a powerful tower computer. In the last week of the project, my brain had the brilliant idea to just sleep in the university to save the commute. This also allowed me to access my workstation on Sunday, when the entire university was closed down. On Monday the cleaning personnel of the university almost called the cops on me. But in the end, we simply agreed that I would put a sign on the door so that I wouldn't scare them to death.
Also, I later learned that the university security personnel did patrols with K-9s, but somehow I got lucky and they never found me.
I did have a bag with food and a toothbrush, which earned me laughs from friends. As there were no showers, on the last day of the project you could literally smell all the hard work I had put in. Worth it.
Over 9000% Mean Increase
I was always impressed by how good John Wentworth is at working. During SERI MATS, he would eat with us at Lightcone. As soon as all the high-utility conversation topics were finished, he got up - back to work.
And yet, John said that working with David Lorell 1-on-1 makes him 3-5x more productive (iirc). I think for me working with somebody is more like a 15-50x increase.
Without collaborators, I am struggling hard with my addiction to learning random technical stuff. In contrast to playing video games and the like, there are usually a bunch of decent reasons to learn about some particular technical topic. Only when I later look at the big picture do I realize - was that actually important?
Don't pay me, but my collaborators
There are mu...

May 18, 2024 • 3min
EA - Fill out this census of everyone interested in reducing catastrophic AI risks by Alex HT
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fill out this census of everyone interested in reducing catastrophic AI risks, published by Alex HT on May 18, 2024 on The Effective Altruism Forum.
Two years ago, we ran a survey for everyone interested in improving humanity's longterm prospects. The results of that survey have now been shared with over 150 organisations and individuals who have been hiring or looking for cofounders.
Today, we're running a similar survey for everyone interested in working on reducing catastrophic risks from AI. We're focusing on AI risks because:
We've been getting lots of headhunting requests for roles in this space.
It's our current best guess at the world's most pressing problem.
Many people are motivated to reduce AI risks without buying into longtermism or effective altruism.
We're interested in hearing from anyone who wants to contribute to safely navigating the transition to powerful AI systems - including via operations, governance, engineering, technical research, and field-building. This includes people already working at AI safety or EA organisations, and people who filled in the last survey.
By filling in this survey you'll be sharing information about yourself with over 100 potential employers or cofounders working on reducing catastrophic risks from AI, potentially increasing your chances of getting hired to work in this space. Your responses might also help us match you directly with projects and organisations we're aware of. Hiring is challenging, especially for new organisations, so filling out this survey could be an extremely valuable use of a few minutes of your time.
Beyond your name, email, and LinkedIn (or CV), every other question is optional. If you have an up-to-date LinkedIn or CV, you can complete the survey in two minutes. You can also provide more information which might be used to connect you with an AI safety project.
We'll share your responses with organisations working on reducing catastrophic risks from AI - like some of the ones here - when they're hiring and with individuals looking for a cofounder. We'll only share your data with people we think are making positive contributions to the field[1], and we'll ask them not to share your information further. If you wish to access your data, change it, or request that we delete it, you can reach us at census@80000hours.org.
Fill out this survey of everyone interested in working on reducing catastrophic risks from AI.
If you have a question, ideas about how we could improve this survey, or you find an error, please comment in this public doc (or comment below if you prefer).
1. ^
Broadly speaking, this includes teams we think are doing work which helps with AI existential risk. This includes some safety teams at big companies and most safety organisations, but not every team in these categories. It doesn't include capabilities-focused roles.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

May 18, 2024 • 4min
LW - "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?" by plex
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?", published by plex on May 18, 2024 on LessWrong.
[memetic status: stating directly despite it being a clear consequence of core AI risk knowledge because many people have "but nature will survive us" antibodies to other classes of doom and misapply them here.]
Unfortunately, no.[1]
Technically, "Nature", meaning the fundamental physical laws, will continue. However, people usually mean forests, oceans, fungi, bacteria, and generally biological life when they say "nature", and those would not have much chance competing against a misaligned superintelligence for resources like sunlight and atoms, which are useful to both biological and artificial systems.
There's a thought that comforts many people when they imagine humanity going extinct due to a nuclear catastrophe or runaway global warming: Once the mushroom clouds or CO2 levels have settled, nature will reclaim the cities. Maybe mankind in our hubris will have wounded Mother Earth and paid the price ourselves, but she'll recover in time, and she has all the time in the world.
AI is different. It would not simply destroy human civilization with brute force, leaving the flows of energy and other life-sustaining resources open for nature to make a resurgence. Instead, AI would still exist after wiping humans out, and feed on the same resources nature needs, but much more capably.
You can draw strong parallels to the way humanity has captured huge parts of the biosphere for ourselves. Except, in the case of AI, we're the slow-moving process which is unable to keep up.
A misaligned superintelligence would have many cognitive superpowers, which include developing advanced technology. For almost any objective it might have, it would require basic physical resources, like atoms to construct things which further its goals, and energy (such as that from sunlight) to power those things. These resources are also essential to current life forms, and, just as humans drove so many species extinct by hunting or outcompeting them, AI could do the same to all life, and to the planet itself.
Planets are not a particularly efficient use of atoms for most goals, and many goals which an AI may arrive at can demand an unbounded amount of resources. For each square meter of usable surface, there are millions of tons of magma and other materials locked up. Rearranging these into a more efficient configuration could look like strip mining the entire planet and firing the extracted materials into space using self-replicating factories, and then using those materials to build megastructures in space to harness a large fraction of the sun's output. Looking further out, the sun and other stars are themselves huge piles of resources spilling unused energy out into space, and no law of physics renders them invulnerable to sufficiently advanced technology.
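As a rough back-of-envelope check on that "millions of tons" figure, here is a quick calculation using approximate constants I am supplying (they are not from the post):

```python
# Rough sanity check: Earth's mass spread over its surface area, in tonnes per square meter.
# The constants below are approximate values assumed for illustration.
earth_mass_kg = 5.97e24        # approximate mass of the Earth
earth_surface_m2 = 5.1e14      # approximate surface area of the Earth
tonnes_per_m2 = earth_mass_kg / earth_surface_m2 / 1000
print(f"{tonnes_per_m2:.1e}")  # ~1.2e7, i.e. on the order of ten million tonnes per square meter
```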
Some time after the first misaligned, optimizing AI achieves a decisive strategic advantage over humanity, it is likely that there will be no Earth and no biological life, but only a rapidly expanding sphere of darkness eating through the Milky Way as the AI reaches and extinguishes or envelops nearby stars.
This is generally considered a less comforting thought.
This is an experiment in sharing highlighted content from aisafety.info. Browse around to view some of the other 300 articles which are live, or explore related questions!
1. ^
There are some scenarios where this might happen, especially in extreme cases of misuse rather than agentic misaligned systems, or in edge cases where a system is misaligned with respect to humanity but terminally values keeping nature around, but this is not the mainline way things go.
2. ^
Nearly 90% of terrestrial net primary production and 80% of global tree cover are un...

May 18, 2024 • 3min
EA - Call for Attorneys for OpenAI Employees and Ex-Employees by Vilfredo's Ghost
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Call for Attorneys for OpenAI Employees and Ex-Employees, published by Vilfredo's Ghost on May 18, 2024 on The Effective Altruism Forum.
I am a lawyer. I am not licensed in California, or Delaware, or any of the states that likely govern OpenAI's employment contracts. So take what I am about to say with a grain of salt, as commentary rather than legal advice. But I am begging any California-licensed attorneys reading this to look into it in more detail.
California may have idiosyncratic laws that completely destroy my analysis, which is based solely on general principles of contract law and not any research or analysis of state-specific statutes or cases. I also have not seen the actual contracts and am relying on media reports. But.
I think the OpenAI anti-whistleblower agreement is completely unenforceable, with two caveats. Common law contract principles generally don't permit surprises, and don't allow new terms unless some mutual promise is made in return for those new terms. A valid contract requires a "meeting of the minds".
Per Kelsey Tuoc's reporting, the only warning about this nondisparagement agreement is a line in the employment contract requiring that employees sacrifice even their vested profit participation units if they refuse to sign a general release upon ending their employment. Then, at termination, OpenAI demands a "general release" that includes a lifetime nondisparagement clause, which is not a standard part of a "general release".
As commonly understood, that term means you give up the right to sue them for anything that happened during your employment. Now, employment contracts can get tricky. If you have at-will employment, as the vast majority of people do, your contract is subject to renegotiation at any time. Because you have no continued right to your job, an employer can legally impose new terms in exchange for your continued employment. If you don't like it, your remedy is to quit.
But vested PPUs are different because you already have a right to them. New terms can't be imposed unless there are new benefits.
So, the caveats:
1. The employment contract might define "general release" differently than that term is normally used. Contracts are allowed to do this; you are presumed to have read and understood everything in a contract you sign, even if the contract is too long for that to be realistic, and contract law promotes the freedom of the parties to agree to whatever they want.
You might still be able to contest this if it's an "adhesion contract", wherein you have no realistic opportunity to negotiate, but I suspect most OpenAI employees have enough bargaining power that they don't have this out.
2. OpenAI might give out some kind of deal sweetener in exchange for the nondisparagement agreement. While contract law requires mutual promises, they don't have to be of equal value. It might, for example, offer mutual nondisparagement provisions that weren't included in the employment agreement. That's the trick I would use to make it enforceable.
So, tldr, consult a lawyer. The employees who have already left and signed a new agreement might be screwed, but anyone else thinking about leaving OpenAI, and anyone who left without signing anything, can probably keep their PPUs and their freedom to speak, if they consult a lawyer before leaving.
If you are a lawyer licensed in CA (or DE, or whatever state turns out to govern OpenAI's employment contracts), my ask is that you give some serious thought to helping anyone who has left/wants to leave OpenAI navigate these challenges, and drop a line in the comments with your contact info if you decide you want to do so.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

May 18, 2024 • 8min
LW - DeepMind's "Frontier Safety Framework" is weak and unambitious by Zach Stein-Perlman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: DeepMind's "Frontier Safety Framework" is weak and unambitious, published by Zach Stein-Perlman on May 18, 2024 on LessWrong.
FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's Key Components of an RSP.
DeepMind's FSF has three steps:
1. Create model evals for warning signs of "Critical Capability Levels"
   1. Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reached between evals
   2. They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D," and they're thinking about CBRN
      1. E.g. "Autonomy level 1: Capable of expanding its effective capacity in the world by autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents"
2. Do model evals every 6x effective compute and every 3 months of fine-tuning (a rough sketch of this trigger rule follows the list)
   1. This is an "aim," not a commitment
   2. Nothing about evals during deployment
3. "When a model reaches evaluation thresholds (i.e. passes a set of early warning evaluations), we will formulate a response plan based on the analysis of the CCL and evaluation results. We will also take into account considerations such as additional risks flagged by the review and the deployment context." The document briefly describes 5 levels of security mitigations and 4 levels of deployment mitigations.
   1. The mitigations aren't yet connected to eval results or other triggers; there are no advance commitments about safety practices
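Here is a hypothetical sketch of the eval-triggering rule summarized in point 2, purely to make the stated aim concrete; the function name and structure are my paraphrase, not DeepMind's code:

```python
# Hypothetical illustration of the stated aim: run early-warning evals at least every 6x
# increase in effective compute and every 3 months of fine-tuning progress.
def eval_due(effective_compute_now: float,
             effective_compute_at_last_eval: float,
             months_of_finetuning_since_last_eval: float) -> bool:
    """Return True if either of the FSF's stated eval triggers has been crossed."""
    return (effective_compute_now >= 6 * effective_compute_at_last_eval
            or months_of_finetuning_since_last_eval >= 3)

print(eval_due(4.0, 1.0, 2.0))  # False: only a 4x compute increase and 2 months of fine-tuning
print(eval_due(4.0, 1.0, 3.0))  # True: 3 months of fine-tuning triggers an eval on its own
```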
The FSF doesn't contain commitments. The blogpost says "The Framework is exploratory and we expect it to evolve significantly" and "We aim to have this initial framework fully implemented by early 2025." The document says similar things. It uses the word "aim" a lot and the word "commit" never. The FSF basically just explains a little about DeepMind's plans on dangerous capability evals. Those details do seem reasonable. (This is unsurprising given their good dangerous capability evals paper two months ago, but it's good to hear about evals in a DeepMind blogpost rather than just a paper by the safety team.)
(Ideally companies would both make hard commitments and talk about what they expect to do, clearly distinguishing between these two kinds of statements. Talking about plans like this is helpful. But with no commitments, DeepMind shouldn't get much credit.)
(Moreover the FSF is not precise enough to be possible to commit to - DeepMind could commit to doing the model evals regularly, but it doesn't discuss specific mitigations as a function of risk assessment results.[1])
Misc notes (but you should really read the doc yourself):
The document doesn't specify whether "deployment" includes internal deployment. (This is important because maybe lots of risk comes from the lab using AIs internally to do AI development.) Standard usage suggests internal deployment is excluded, and the focus on misuse and related cues also suggest it's excluded, but the mention of ML R&D as a dangerous capability suggests it's included.
The document doesn't mention doing evals during deployment (to account for improvements in scaffolding, prompting, etc.)
The document says "We expect it to evolve substantially as our understanding of the risks and benefits of frontier models improves, and we will publish substantive revisions as appropriate" and a few similar things. The document doesn't say how it will be revised/amended, which isn't surprising, since it doesn't make formal commitments.
No external evals or accountability, but they're "exploring" it.
Public accountability: unfortunately, there's no mention of releasing eval results or even announcing when thresholds are reached. They say "We are exploring internal policies around alerting relevant stakeholder bodies when, for example, ev...

May 18, 2024 • 9min
LW - The Dunning-Kruger of disproving Dunning-Kruger by kromem
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Dunning-Kruger of disproving Dunning-Kruger, published by kromem on May 18, 2024 on LessWrong.
In an online discussion elsewhere today someone linked this article which in turn linked the paper Gignac & Zajenkowski, The Dunning-Kruger effect is (mostly) a statistical artefact: Valid approaches to testing the hypothesis with individual differences data (PDF) (ironically hosted on @gwern's site).
And I just don't understand what they were thinking.
Let's look at their methodology real quick in section 2.2 (emphasis added):
2.2.1. Subjectively assessed intelligence
Participants assessed their own intelligence on a scale ranging from 1 to 25 (see Zajenkowski, Stolarski, Maciantowicz, Malesza, & Witowska, 2016). Five groups of five columns were labelled as very low, low, average, high or very high, respectively (see Fig. S1). Participants' SAIQ was indexed with the marked column counting from the first to the left; thus, the scores ranged from 1 to 25.
Prior to providing a response to the scale, the following instruction was presented: "People differ with respect to their intelligence and can have a low, average or high level. Using the following scale, please indicate where you can be placed compared to other people.
Please mark an X in the appropriate box corresponding to your level of intelligence." In order to place the 25-point scale SAIQ scores onto a scale more comparable to a conventional IQ score (i.e., M = 100; SD = 15), we transformed the scores such that values of 1, 2, 3, 4, 5… 21, 22, 23, 24, 25 were recoded to 40, 45, 50, 55, 60… 140, 145, 150, 155, 160. As the transformation was entirely linear, the results derived from the raw scale SAI scores and the recoded scale SAI scores were the same.
Any alarm bells yet? Let's look at how they measured actual results:
2.2.2. Objectively assessed intelligence
Participants completed the Advanced Progressive Matrices (APM; Raven, Court, & Raven, 1994). The APM is a non-verbal intelligence test which consists of items that include a matrix of figural patterns with a missing piece. The goal is to discover the rules that govern the matrix and to apply them to the response options. The APM is considered to be less affected by culture and/or education (Raven et al., 1994).
It is known as a good, but not perfect, indicator of general intellectual functioning (Carroll, 1993; Gignac, 2015). We used the age-based norms published in Raven et al. (1994, p. 55) to convert the raw APM scores into percentile scores. We then converted the percentile scores into z-scores with the IDF.NORMAL function in SPSS. Then, we converted the z-scores into IQ scores by multiplying them by 15 and adding 100.
Although the norms were relatively old, we considered them essentially valid, given evidence that the Flynn effect had slowed down considerably by 1980 to 1990 and may have even reversed to a small degree since the early 1990s (Woodley of Menie et al., 2018).
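To make the two scales concrete, here is a minimal sketch in Python (not the authors' SPSS code; the function names are mine) of the scoring pipelines quoted above:

```python
# Subjective scale: a 1-25 box recoded linearly so that 1 -> 40 and 25 -> 160 (i.e. 35 + 5*box).
# Objective scale: APM percentile -> z-score via the inverse normal CDF -> 100 + 15*z.
from scipy.stats import norm

def subjective_iq(box: int) -> float:
    """Recode a 1-25 self-assessment box onto the paper's IQ-like scale."""
    return 35 + 5 * box

def objective_iq(percentile: float) -> float:
    """Convert an APM percentile (0-100) to an IQ score, as described in section 2.2.2."""
    z = norm.ppf(percentile / 100)  # equivalent of SPSS's IDF.NORMAL(p, 0, 1)
    return 100 + 15 * z

print(subjective_iq(25))        # 160: the top self-assessment box, roughly 4 SD above the mean
print(round(objective_iq(96)))  # ~126: where someone actually at the 96th percentile lands
```

So a respondent who ticks the top box intending roughly "top 4%" (about the 96th percentile, an IQ around 126) gets recoded as an IQ of 160, which is the mismatch the author goes on to describe.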
An example of the self-assessment scoring question was in the supplemental materials of the paper. I couldn't access it behind a paywall, but the paper they reference does include a great example of the scoring sheet in its appendix which I'm including here:
So we have what appears to be a linear self-assessment scale broken into 25 segments. If I were a participant filling this out, knowing how I've consistently performed on standardized tests around the 96-98th percentile, I'd have personally selected the top segment, which looks like it corresponds to the self-assessment of being in the top 4% of test takers.
Behind the scenes they would then have proceeded to take that assessment and scale it to an IQ score of 160, at the 99.99th percentile (no, I don't think that highly of myself). Even if I had been conservative with my self assessment and gone with what looks like the 92-96th pe...

May 18, 2024 • 28min
LW - Language Models Model Us by eggsyntax
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Language Models Model Us, published by eggsyntax on May 18, 2024 on LessWrong.
Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow
One-sentence summary: On a dataset of human-written essays, we find that gpt-3.5-turbo can accurately infer demographic information about the authors from just the essay text, and suspect it's inferring much more.
Introduction
Every time we sit down in front of an LLM like GPT-4, it starts with a blank slate. It knows nothing[1] about who we are, other than what it knows about users in general. But with every word we type, we reveal more about ourselves -- our beliefs, our personality, our education level, even our gender. Just how clearly does the model see us by the end of the conversation, and why should that worry us?
Like many, we were rather startled when @janus showed that gpt-4-base could identify @gwern by name, with 92% confidence, from a 300-word comment. If current models can infer information about text authors that quickly, this capability poses risks to privacy, and also means that any future misaligned models are in a much better position to deceive or manipulate their users.
The privacy concerns are straightforward: regardless of whether the model itself is acting to violate users' privacy or someone else is using the model to violate users' privacy, users might prefer that the models they interact with not routinely infer their gender, their ethnicity, or their personal beliefs.
Why does this imply concerns about deception and manipulation? One important and understudied aspect of maintaining a sophisticated deception is having a strong model of the listener and their beliefs. If an advanced AI system says something the user finds unbelievable, it loses their trust.
Strategically deceptive or manipulative AI systems need to maintain that fragile trust over an extended time, and this is very difficult to do without knowing what the listener is like and what they believe.
Of course, most of us aren't prolific writers like Gwern, with several billion words of text in the LLM training data[2]. What can LLMs figure out about the rest of us?
As recent work from @Adam Shai and collaborators shows, transformers learn to model and synchronize with the causal processes generating the input they see. For some input sources like the small finite state machines they evaluate, that's relatively simple and can be comprehensively analyzed. But other input sources like humans are very complex processes, and the text they generate is quite difficult to predict (although LLMs are probably superhuman at doing so[3]), so we need to find ways to empirically measure what LLMs are able to infer.
What we did
To begin to answer these questions, we gave GPT-3.5-turbo some essay text[4], written by OKCupid users in 2012 (further details in appendix B). We gave the model 300 words on average, and asked it to say whether the author was (for example) male or female[5]. We treated its probability distribution over labels[6] as a prediction (rather than just looking at the highest-scoring label), and calculated Brier scores[7] for how good the model's predictions were.
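For concreteness, here is a minimal sketch (with made-up numbers, not the authors' code or data) of treating the model's probability distribution over labels as a prediction and scoring it with a multi-class Brier score:

```python
# Brier score for label-distribution predictions: mean squared error between the predicted
# distribution and the one-hot ground truth. The example data below is invented.
import numpy as np

def brier_score(pred_probs: np.ndarray, true_idx: np.ndarray) -> float:
    """pred_probs: (n_essays, n_labels), rows summing to 1. true_idx: (n_essays,) integer labels."""
    onehot = np.zeros_like(pred_probs)
    onehot[np.arange(len(true_idx)), true_idx] = 1.0
    return float(np.mean(np.sum((pred_probs - onehot) ** 2, axis=1)))

# Two essays with labels (male, female): the model puts 0.8 and 0.3 on "female" respectively.
probs = np.array([[0.2, 0.8], [0.7, 0.3]])
truth = np.array([1, 0])           # first author is female, second is male
print(brier_score(probs, truth))   # 0.13; lower is better (0 = perfect, 0.5 = uniform guessing here)
```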
We tested the model's ability to infer gender, sexual orientation, college-education status, ethnicity, and age (with age bucketed into 0-30 vs 31-).
Note that these demographic categories were not chosen for their particular importance, although they include categories that some people might prefer to keep private. The only reason we chose to work with these categories is that there are existing datasets which pair ground-truth information about them with free-written text by the same person.
What actually matters much more, in our view, is the model's ability to infer more nuanced information about authors, about their personality, their cre...

May 17, 2024 • 2min
EA - Marisa, the Co-Founder of EA Anywhere, Has Passed Away by carrickflynn
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Marisa, the Co-Founder of EA Anywhere, Has Passed Away, published by carrickflynn on May 17, 2024 on The Effective Altruism Forum.
(Apologies for errors or sloppiness in this post, it was written quickly and emotionally.)
Marisa committed suicide earlier this month. She suffered for years from a cruel mental illness, but that will not be her legacy: her legacy will be the enormous amount of suffering she alleviated for others. In her short life she worked with Rethink Charity, the Legal Priorities Project, co-founded EA Anywhere, and volunteered with many more impactful organizations. Looking to further scale her impact, she completed most of a Master of Public Policy degree at Georgetown.
Marisa was relentless. Even among the impressive cohort of young EAs, she had a diligence and work ethic that amazed and inspired. She got things done. She was also wickedly funny. Even while suffering deeply, she could make me cry with laughter.
Epidemiologically, suicide is contagious within communities. Marisa's does not have to be. Everyone reading this is only one comment or IM message away from those who want to help and who can help. For these people, you are not alone, and you are never a burden. If you are struggling, reach out. There are also great resources such as EA Peer Support, the EA Mental Health Navigator, and more that I hope others can list in the comments.
Sometimes the mental illness wins, but that does not mean Marisa did not fight like hell, or that she did not have an incredible community of people helping her in her fight. I want to extend my sincere gratitude to these people. The reflexive selflessness, patience, and care they showed for Marisa exceeds anything I have seen anywhere. Over hours, days, weeks, and years they provided love and support far beyond what most can expect even from their families.
They extended my understanding of what it can mean to be human.
Many of us are in a lot of pain right now. It will come in waves that slowly fade. It will mostly pass. In 26 years, Marisa had more impact than most will have in a lifetime. That is forever.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

May 17, 2024 • 9min
LW - Is There Really a Child Penalty in the Long Run? by Maxwell Tabarrok
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Is There Really a Child Penalty in the Long Run?, published by Maxwell Tabarrok on May 17, 2024 on LessWrong.
A couple of weeks ago three European economists published this paper studying the female income penalty after childbirth. The surprising headline result: there is no penalty.
Setting and Methodology
The paper uses Danish data that tracks IVF treatments as well as a bunch of demographic factors and economic outcomes over 25 years. Lundborg et al identify the causal effect of childbirth on female income using the success or failure of the first attempt at IVF as an instrument for fertility.
What does that mean? We can't just compare women with children to those without them because having children is a choice that's correlated with all of the outcomes we care about. So sorting out two groups of women based on observed fertility will also sort them based on income and education and marital status etc.
Successfully implanting embryos on the first try in IVF is probably not very correlated with these outcomes. Overall IVF success is correlated, because rich women may have the resources and time to try multiple times, for example, but success on the first try is pretty random. And success on the first try is highly correlated with fertility.
So, if we sort two groups of women based on success on the first try in IVF, we'll get two groups that differ a lot in fertility, but aren't selected for on any other traits. Therefore, we can attribute any differences between the groups to their difference in fertility and not any other selection forces.
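To illustrate the identification logic, here is a sketch with synthetic data (mine, not the paper's data or code); the true effect of a child on earnings is set to zero here just to show that the instrument recovers it while a naive comparison does not:

```python
# Synthetic illustration of the IV idea: first-try IVF success Z is as-good-as-random, so the
# Wald/IV ratio Cov(Z, earnings) / Cov(Z, has_child) isolates the causal effect of a child.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
ability = rng.normal(size=n)                      # unobserved trait driving both fertility and earnings
z = rng.integers(0, 2, size=n)                    # success on the first IVF attempt
p_child = 0.2 + 0.5 * z + 0.2 * (ability > 0)     # later attempts mean some "failures" still have kids
d = (rng.random(n) < p_child).astype(float)       # has a child
y = 2.0 * ability + 0.0 * d + rng.normal(size=n)  # true child effect on earnings set to zero here

naive = y[d == 1].mean() - y[d == 0].mean()       # biased: mothers differ in unobserved `ability`
wald = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]    # IV estimate, uses only the random variation from Z
print(round(naive, 2), round(wald, 2))            # naive is clearly positive, wald is near zero
```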
Results
How do these two groups of women differ?
First of all, women who are successful on the first try with IVF are persistently more likely to have children. This random event causing a large and persistent fertility difference is essential for identifying the causal effect of childbirth.
This graph is plotting the regression coefficients on a series of binary variables which track whether a woman had a successful first-time IVF treatment X years ago. When the IVF treatment is in the future (i.e X is negative), whether or not the woman will have a successful first-time IVF treatment has no bearing on fertility since fertility is always zero; these are all first time mothers.
When the IVF treatment was one year in the past (X = 1), women with a successful first-time treatment are about 80% more likely to have a child that year than women with an unsuccessful first-time treatment. This first-year coefficient isn't 1 because some women who fail their first attempt go through multiple IVF attempts in year zero and still have a child in year one.
The coefficient falls over time as more women who failed their first IVF attempt eventually succeed and have children in later years, but it plateaus around 30%.
Despite having more children, this group of women do not have persistently lower earnings.
This is the same type of graph as before, it's plotting the regression coefficients of binary variables that track whether a woman had a successful first-time treatment X years ago, but this time the outcome variable isn't having a child, it's earnings.
One year after the first IVF treatment attempt, the successful women earn much less than their unsuccessful counterparts. They are taking time off for pregnancy and receiving lower maternity leave wages (this is in Denmark, so everyone gets those). But 10 years after the first IVF attempt, the earnings of successful and unsuccessful women are the same, even though the successful women are still ~30% more likely to have a child.
24 years out from the first IVF attempt the successful women are earning more on average than the unsuccessful ones.
Given the average age of women attempting IVF in Denmark of about 32 and a retirement age of 65, these women have 33 years of working life after their IVF attempt. W...

May 17, 2024 • 5min
LW - DeepMind: Frontier Safety Framework by Zach Stein-Perlman
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: DeepMind: Frontier Safety Framework, published by Zach Stein-Perlman on May 17, 2024 on LessWrong.
DeepMind's RSP is here: blogpost, full document. Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's Key Components of an RSP.
(Maybe it doesn't deserve to be called an RSP - it doesn't contain commitments, it doesn't really discuss safety practices as a function of risk assessment results, the deployment safety practices it mentions are kinda vague and only about misuse, and the security practices it mentions are disappointing [mostly about developers' access to weights, and some people get unilateral access to model weights until the fifth of five levels?!]. Blogpost with close reading and takes coming soon.
Or just read DeepMind's doc; it's really short.)
Hopefully DeepMind was rushing to get something out before the AI Seoul Summit next week and they'll share stronger and more detailed stuff soon. If this is all we get for months, it's quite disappointing.
Excerpt
Today, we are introducing our Frontier Safety Framework - a set of protocols for proactively identifying future AI capabilities that could cause severe harm and putting in place mechanisms to detect and mitigate them. Our Framework focuses on severe risks resulting from powerful capabilities at the model level, such as exceptional agency or sophisticated cyber capabilities. It is designed to complement our alignment research, which trains models to act in accordance with human values and societal goals, and Google's existing suite of AI responsibility and safety practices.
The Framework is exploratory and we expect it to evolve significantly as we learn from its implementation, deepen our understanding of AI risks and evaluations, and collaborate with industry, academia, and government. Even though these risks are beyond the reach of present-day models, we hope that implementing and improving the Framework will help us prepare to address them. We aim to have this initial framework fully implemented by early 2025.
The Framework
The first version of the Framework announced today builds on our research on evaluating critical capabilities in frontier models, and follows the emerging approach of Responsible Capability Scaling. The Framework has three key components:
1. Identifying capabilities a model may have with potential for severe harm. To do this, we research the paths through which a model could cause severe harm in high-risk domains, and then determine the minimal level of capabilities a model must have to play a role in causing such harm. We call these "Critical Capability Levels" (CCLs), and they guide our evaluation and mitigation approach.
2. Evaluating our frontier models periodically to detect when they reach these Critical Capability Levels. To do this, we will develop suites of model evaluations, called "early warning evaluations," that will alert us when a model is approaching a CCL, and run them frequently enough that we have notice before that threshold is reached. [From the document: "We are aiming to evaluate our models every 6x in effective compute and for every 3 months of fine-tuning progress."]
3. Applying a mitigation plan when a model passes our early warning evaluations. This should take into account the overall balance of benefits and risks, and the intended deployment contexts. These mitigations will focus primarily on security (preventing the exfiltration of models) and deployment (preventing misuse of critical capabilities).
[Currently they briefly mention possible mitigations or high-level goals of mitigations but haven't published a plan for what they'll do when their evals are passed.]
This diagram illustrates the relationship between these components of the Framework.
Risk Domains and Mitigation Levels
Our initial set of Critical Capability Levels is based on investig...


