The Nonlinear Library

The Nonlinear Fund
Aug 21, 2024 • 39min

EA - The case for contributing to the 2024 US election with your time & money by alexlintz

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The case for contributing to the 2024 US election with your time & money, published by alexlintz on August 21, 2024 on The Effective Altruism Forum. Executive summary We believe contributions to the 2024 US election are worthy of serious consideration. In particular, we believe this election is much more tractable and important than conventional wisdom within the EA community suggests. Many of our team members were previously working in AI safety/governance and have temporarily switched to focusing primarily on the election. We think more EAs should consider doing the same. This election is much more tractable than most expect: US presidential elections are often decided by slim margins in a few swing states (537 votes in 2000, 77,744 in 2016, and 42,918 in 2020). The best polling data we have as of August 2024 is that 2024's margins will be in a similar range, with Harris very slightly favored to lose. Given how small margins of victory have been, many organizations have arguably been responsible for flipping past presidential elections (and many more could have flipped them with relatively modest investments). While a lot of good work is being done, poor incentives among established organizations attempting to elect Democrats result in many inefficiencies. This election matters more than most give it credit for: For any given EA cause area, there are strong reasons to believe Trump would cause considerable harm or fail to make important positive progress. While it's still unclear what a Trump administration will think of AI risk, it seems highly likely they'd be worse than a Harris administration. Trump and allies have a clear game plan for eliminating most checks on the president's power and have demonstrated clear intent to harm America's democratic institutions. A second Trump administration would already enjoy far more power due to a recent Supreme Court ruling (backed by Trump-appointed justices). They've made it clear they intend to gut the civil service and fill it with loyalists as well as further expand presidential authority. We worry that people have forgotten just how incompetent and amoral Trump is. A second administration, primarily selected for loyalty to Trump instead of domain expertise and integrity, dramatically increases the risk of many catastrophic outcomes (e.g., escalations in international tensions, nuclear war, adverse relationships with AI labs, etc.). There are donation and volunteer opportunities which could meaningfully affect the chances of a Harris victory. Research by a team we trust indicates the following are the most effective donation opportunities right now: American Independent Radio, the Center for US Voters Abroad, and Working America. We're collecting useful volunteer opportunities (including some projects our team is planning) and would be happy to share them with you. You can sign up to hear more here. Our team is also raising money for some projects which are intuitively compelling and informed by data but lack a prior track record in an attempt to fill gaps in the space. Email us at civicleverageproject@gmail.com for more details. Introduction Working on US presidential elections is unique in that it plausibly allows many individuals, regardless of skill sets and power, to affect the trajectory of the entire world. 
The US is the world's most powerful state and the only superpower that is also a liberal democracy. If one of the presidential candidates is expected to be much worse in important cause areas (e.g., protecting liberal democracy, AI safety, biosecurity, climate change, global health, or animal welfare), and if the election is expected to be very close, then contributing to helping the better candidate win has extraordinary value. We know of no lever, other than the US presidential election, that allows as many...
Aug 21, 2024 • 6min

EA - The Life You Can Save suggests people earning 40 to 80 k$/year donate just 1 % by Vasco Grilo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Life You Can Save suggests people earning 40 to 80 k$/year donate just 1 %, published by Vasco Grilo on August 21, 2024 on The Effective Altruism Forum. The views expressed here are my own, not those of my employers.
Summary
The Life You Can Save (TLYCS) suggests people earning 40 to 80 k$/year donate just 1 % of their net income, which I think is too low. Would it even be good if some people in extreme poverty wanted to sign the 10% Pledge, and donated to the most cost-effective animal welfare interventions?
Recommendations from The Life You Can Save
TLYCS has a calculator to suggest the amount of annual donations based on the country of residence and annual pre-tax income. For people living in the United States earning:
Less than 40 k$/year, they "recommend giving whatever you feel you can afford without undue hardship".
40 to 80 k$/year, they "recommend" annual donations equal to 1 % of the annual pre-tax income.
TLYCS also emphasises the idea of striving towards one's personal best in terms of effective donations. Which amount of annual donations should be recommended to increase annual donations to their ideal level is an empirical question. I do not have an answer for this, and here I mostly wanted to start a discussion. However, I think the above recommendations are not sufficiently ambitious. A pre-tax income of 40 k$/year in New York for someone single would be a post-tax income of 31.3 k$/year, which is:
31.0 (= 31.3/1.01) times the maximum annual consumption of someone in extreme poverty of 1.01 k$/year (= 2.15*1.28*365.25).
More than what 95 % of the population earns.
1 % of the above post-tax income would be 0.857 $/d (= 0.01*31.3*10^3/365.25), which is 39.1 % (= 0.857/2.19) of the cost of the classic McDonald's burger, or 68.6 % (= 0.857/((1 + 1.5)/2)) of the cost of a beer. I believe the ideal donations as a fraction of post-tax income increase with post-tax income, and personally currently donate everything above a target level of savings. At the same time, I like that the 10% Pledge from Giving What We Can (GWWC) involves donating at least 10 % of post-tax income regardless of how much one earns. Besides the reasons given by GWWC:
Life satisfaction increases roughly logarithmically with real gross domestic product (real GDP), which suggests welfare may increase logarithmically with post-tax income. If so, decreasing post-tax income by 10 % would cause the same reduction in welfare regardless of the starting income. In practice, it is unclear to me whether there would be such a reduction in the context of donations.
If people with modest incomes donate at least 10 %, people with higher incomes will arguably be more motivated to do so.
Donations of people in extreme poverty
Would it even be good if some people in extreme poverty wanted to sign the 10% Pledge? It might help with spreading significant giving among people with higher incomes, who would have a hard time arguing they are not wealthy enough. I guess some people in extreme poverty already donate at least 10 % of their net income via tithing. Yet, it is unclear to me if this giving is more/less cost-effective than their own marginal personal consumption, so I do not know whether it is beneficial/harmful. Would it be better if they donated to the most cost-effective animal welfare interventions?
From the most to least relevant:
I would argue the best animal welfare interventions are way more cost-effective than the marginal personal consumption of people in extreme poverty: I estimated corporate campaigns for chicken welfare, such as the ones supported by The Humane League (THL), are 1.51 k times as cost-effective as GiveWell's top charities (at the margin). My understanding is that GiveWell's top charities are 10 times as cost-effective as GiveDirectly (at the margin), which provide...
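To make the arithmetic above easy to check, here is a minimal sketch that reproduces the quoted figures; all inputs come from the summary itself (the $2.15/day extreme-poverty line, the 1.28 adjustment factor, the 31.3 k$/year post-tax income, and the quoted cost-effectiveness multipliers), and the variable names are illustrative.

```python
# Minimal sketch (not from the post) reproducing the arithmetic quoted above.
# All input figures come from the summary: the $2.15/day extreme-poverty line,
# the 1.28 adjustment factor, the 31.3 k$/year post-tax income, and the quoted
# cost-effectiveness multipliers.

post_tax_income = 31.3e3                      # $/year, post-tax for 40 k$/year pre-tax in New York
poverty_consumption = 2.15 * 1.28 * 365.25    # ~1.01 k$/year, max annual consumption in extreme poverty

income_ratio = post_tax_income / poverty_consumption   # ~31 times the extreme-poverty consumption
donation_per_day = 0.01 * post_tax_income / 365.25     # ~0.857 $/day at a 1 % donation rate
burger_fraction = donation_per_day / 2.19               # ~39.1 % of a classic McDonald's burger
beer_fraction = donation_per_day / ((1 + 1.5) / 2)      # ~68.6 % of a beer

# Cost-effectiveness chain quoted above: chicken-welfare corporate campaigns are
# ~1.51 k times GiveWell's top charities, which are ~10 times GiveDirectly.
chicken_vs_givedirectly = 1.51e3 * 10                   # ~15,100 times GiveDirectly at the margin

print(round(income_ratio, 1), round(donation_per_day, 3),
      round(burger_fraction, 3), round(beer_fraction, 3), chicken_vs_givedirectly)
```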
Aug 21, 2024 • 2min

EA - Results of an informal survey on AI grantmaking by Scott Alexander

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Results of an informal survey on AI grantmaking, published by Scott Alexander on August 21, 2024 on The Effective Altruism Forum. I asked readers of my blog with experience in AI alignment (and especially AI grantmaking) to fill out a survey about how they valued different goods. I got 61 responses. I disqualified 11 for various reasons, mostly failing the comprehension check question at the beginning, and kept 50. Because I didn't have a good way to represent the value of "a" dollar for people who might have very different amounts of money managed, I instead asked people to value things in terms of a base unit - a program like MATS graduating one extra technical alignment researcher (at the center, not the margin). So for example, someone might say that "creating" a new AI journalist was worth "creating" two new technical alignment researchers, or vice versa. One of the goods that I asked people to value was $1 million going to a smart, value-aligned grantmaker. This provided a sort of researcher-money equivalence, which turned out to be $125,000 per researcher on median. I rounded to $100,000 and put this in an experimental second set of columns, but the median comes from a wide range of estimates and there are some reasons not to trust it. The results are below. You can see the exact questions and assumptions that respondents were asked to make here. Many people commented that there were ambiguities, additional assumptions needed, or that they were very unsure, so I don't recommend using this as anything other than a very rough starting point. I tried separating responses by policy vs. technical experience, or weighting them by respondent's level of experience/respect/my personal trust in them, but neither of these changed the answers enough to be interesting. You can find the raw data (minus names and potentially identifying comments) here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
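As a rough illustration of how a researcher-money equivalence like the one described above can be backed out of such a survey, here is a sketch with made-up responses; the real answers and aggregation details are in the linked raw data, and every number below is hypothetical.

```python
# Hypothetical illustration (numbers made up) of the researcher-money equivalence
# described above. Each respondent values "$1 million to a smart, value-aligned
# grantmaker" in units of the survey's base good: one extra MATS-style technical
# alignment researcher. Dividing $1M by that valuation gives an implied
# dollars-per-researcher figure; the post reports a median of roughly $125,000.
from statistics import median

GRANT_SIZE = 1_000_000
valuations_in_researchers = [4, 6, 8, 8, 10, 12, 20]    # hypothetical survey answers

implied_dollars_per_researcher = [GRANT_SIZE / v for v in valuations_in_researchers]
print(median(implied_dollars_per_researcher))            # 125000.0 for these made-up numbers
```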
Aug 21, 2024 • 4min

EA - Charitable Giving is a Life Hack by Vasco Grilo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Charitable Giving is a Life Hack, published by Vasco Grilo on August 21, 2024 on The Effective Altruism Forum. This is a crosspost for Charitable Giving is a Life Hack by Connor Jennings, which was published on 24 July 2024. I'm going to start this post with something that will break my boss's heart. I hate work. I've always hated work. Productivity is Everest. Any academic success I had at school was motivated by a desire to finish as quickly as possible, so I could lean back on my chair, and look out the window. When completing a task, I feel the same sluggishness one feels when trying to run in a dream - my psyche is a thick sludge, and it wishes to rot away on the sofa playing Crash Bandicoot.
Me, after changing my bed sheets.
My early experience with the adult working world rhymed with my adolescence. I used to have a bad habit of switching jobs regularly, because I struggled with the increased productivity that experience demands of you. I'm probably not alone when I say that most of my days were spent just trying to get through things so that I could go home and relax. I rarely went above the call of duty, and I always tried to cut corners where I could. In other words, I was a lazy dirtbag. I say this, because despite that not so great aspect of myself, I've now managed to have some success at a fairly demanding job. How? How could a sleepy tortoise such as myself do that? Well, apart from being motivated by the usual fear of homelessness, donating some of my salary has played a big role. I think what causes a lack of motivation is often a lack of purpose. The reason I don't want to excel at some boring office job is because I'm hardly going to get rewarded for it. Even the promise of more money doesn't move the needle, because it just promises more involvement and responsibility in a project I don't care about. I never cared about stationery, cocktails, or my bosses' bottom line, so my previous jobs were always devoid of meaning, and it felt like they wasted what precious little time I had on this Earth. Now, the usual response to this kind of despair is to say you should find work doing something you do care about. That would be great advice if it was universally practical - I'd love it if I could find a job helping animals, writing, or playing Crash Bandicoot. The problem is those jobs are hard to come by, and it's just not viable to expect everyone on Earth to achieve their dreams. In order to have a functioning society, someone has to build spreadsheets, grit the roads, and sit at reception. I don't think most people hate their jobs, but I do think most of us will have to work in roles that we don't find that meaningful. This is quite sad, but I do think there is a solution - Charitable Giving. I started taking donating seriously a year and a half ago. At first, it was a drag. I had just been put on the biggest salary of my life up to that point (not a high bar lol), and I didn't like the fact I had to live within smaller means. However, you get used to it, and the benefits of pledging away a portion of your salary become salient as time goes on. You find that by working to give, you're able to transform an otherwise monotonous job into something that feels truly worthwhile. The crisis of meaningless work is at its most intense when work is stressful.
It's when I'm working late, or have to deal with some seemingly unsolvable problem that I think "Why am I even doing this? This sucks! I should quit and find something else to do". When there's no greater goal that you're suffering for, the suffering is magnified. However, if you work because you're trying to do good, this issue is resolved. You no longer have that conversation with yourself, because now when you're up late you're thinking "Well, if I get a raise, I can help more people!". It closes the despair...
Aug 21, 2024 • 9min

LW - the Giga Press was a mistake by bhauth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: the Giga Press was a mistake, published by bhauth on August 21, 2024 on LessWrong.
the giga press
Tesla decided to use large aluminum castings ("gigacastings") for the frame of many of its vehicles, including the Model Y and Cybertruck. This approach and the "Giga Press" used for it have been praised by many articles and YouTube videos, repeatedly called revolutionary and a key advantage. Most cars today are made by stamping steel sheets and spot welding them together with robotic arms. Here's video of a Honda factory. But that's outdated: gigacasting is the future! BYD is still welding stamped steel sheets together, and that's why it can't compete on price with Tesla. Hold on, it seems...BYD prices are actually lower than Tesla's? Much lower? Oh, and Tesla is no longer planning single unitary castings for future vehicles? I remember reading analysis from a couple of people with car manufacturing experience, concluding that unitary cast aluminum bodies could have a cost advantage for certain production numbers, like 200k cars, but dies for casting wear out sooner than dies for stamping steel, and as soon as you need to replace them the cost advantage is gone. Also, robotic arms are flexible and stamped panels can be used for multiple car models, and if you already have robots and panels you can use from discontinued car models, the cost advantage is gone. But Tesla was expanding so they didn't have available robots already. So using aluminum casting would probably be slightly more expensive, but not make a big difference. "That seems reasonable", I said to myself, "ふむふむ" ("I see, I see"). And I previously pointed that out, e.g. here. But things are actually worse than that.
aluminum die casting
Die casting of aluminum involves injecting liquid aluminum into a die and letting it cool. Liquid aluminum is less dense than solid aluminum, and aluminum being cast doesn't all solidify at the same time. Bigger castings have aluminum flowing over larger distances. The larger the casting, the less evenly the aluminum cools: there's more space for temperature differences in the die, and the aluminum cools as it's injected. As a result, bigger castings have more problems with warping and voids. Also, a bigger casting with the same curvature from warping has bigger position changes. Tesla has been widely criticized for stuff not fitting together properly on the car body. My understanding is that the biggest reason for that is their large aluminum castings being slightly warped. As for voids, they can create weak points; I think they were the reason the Cybertruck hitch broke off in this test. Defects from casting are the only explanation for that cast aluminum breaking apart that way. If you want to inject more aluminum as solidification and shrinkage happens, the distance it has to travel is proportional to casting size - unless you use multi-point injection, which Tesla doesn't, and that has its own challenges. Somehow I thought Tesla would have only moved to its "Giga Press" after adequately dealing with those issues, but that was silly of me. One approach being worked on to mitigate warping of large aluminum castings is "rheocasting", where a slurry of solid aluminum in liquid aluminum is injected, reducing the shrinkage from cooling. But that's obviously more viscous and thus requires higher injection pressures, which require high die pressures.
aluminum vs steel
Back when aluminum established its reputation as "the lightweight higher-performance alternative" to steel, 300 MPa was considered a typical (tensile yield) strength for steel. Typical cast aluminum can almost match that, and high-performance aluminum for aircraft can be >700 MPa. Obviously there are reasons it's not always used: high-strength aluminum requires some more-expensive elements and careful heat-treatment. Any hot welds will r...
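For readers who want the strength-to-weight point above made explicit, here is a small worked sketch comparing specific strength; the yield strengths follow the figures quoted in the post, while the densities and the "almost match" value for cast aluminum are standard handbook assumptions rather than numbers from the post.

```python
# Rough strength-to-weight comparison. Yield strengths follow the figures quoted
# above (300 MPa typical steel, cast aluminum "almost" matching it, >700 MPa
# aircraft aluminum); the densities and exact cast-aluminum figure are standard
# handbook assumptions, not values from the post.

materials = {
    # name: (tensile yield strength in MPa, density in g/cm^3)
    "typical steel": (300, 7.85),
    "typical cast aluminum": (280, 2.70),
    "aircraft-grade aluminum": (700, 2.80),
}

for name, (strength_mpa, density) in materials.items():
    # MPa divided by g/cm^3 gives specific strength in kN*m/kg
    specific_strength = strength_mpa / density
    print(f"{name}: ~{specific_strength:.0f} kN*m/kg")
```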
Aug 21, 2024 • 2min

EA - A new interview series with people doing impactful work by Probably Good

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A new interview series with people doing impactful work, published by Probably Good on August 21, 2024 on The Effective Altruism Forum. Probably Good recently launched a new article series, Career Journeys: Interviews with people working to make a difference. In this series, we interview people from a range of fields and career paths to bring a more personal perspective to career planning. Each conversation explores how people got to where they are today, what their day-to-day work entails, and what advice they'd give to others pursuing a similar path. As several "write about your job" posts have noted, jobs are still mysterious both in terms of how people get them and what they actually look like to do. These interviews provide some insight into how varied - and often surprising - lived career experiences can be. We hope they can be a catalyst for our readers to consider new paths and apply any advice they find relevant for their path. Check out our first few interviews, now live on the site:
Astronaut ambitions, leaving clinical medicine, and eliminating lead exposure: After a varied career of exploration and changing course, Bal Dhital currently works as a program manager for the Lead Exposure Elimination Project, a charity that aims to end the use of lead-based paint and products around the world.
Navigating academia & researching morality: Matti Wilks is currently a lecturer (assistant professor) in Psychology at the University of Edinburgh. Matti's research uses social and developmental psychological approaches to study our moral motivations and actions.
From construction engineering to non-profit operations: After several years in construction engineering and a five-year bicycle journey in sub-Saharan Africa, Bell Arden currently runs the Operations team at The Future Society - a non-profit organization focused on improving AI governance.
Do you have someone in mind (including yourself) that might make for a good interview? Or have a suggestion for a specific career path you'd like to learn more about? Feel free to contact us or email us at annabeth@probablygood.org with your suggestions. We can't promise we'll interview every recommended interviewee, but we'd love to hear about anyone who has a particularly interesting career journey that might be helpful for others to read about. Thanks! Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Aug 20, 2024 • 3min

EA - AMA with AIM's Director of Recruitment (and MHI co-founder), Ben Williamson by Ben Williamson

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AMA with AIM's Director of Recruitment (and MHI co-founder), Ben Williamson, published by Ben Williamson on August 20, 2024 on The Effective Altruism Forum. As AIM's Director of Recruitment, I'm running an AMA to answer any questions you may have about applying for our programs, as well as any questions that may be of interest from my other experience (such as co-founding Maternal Health Initiative). Ambitious Impact (formerly Charity Entrepreneurship) currently has applications open until September 15th for two of our programs: the Charity Entrepreneurship Incubation Programme and AIM Founding to Give. You can read more about both programs in this earlier EA Forum post. Please consider applying!
Why a personal AMA?
Answers to questions can often be subjective. I do not want to claim to speak for every member of AIM's team. As such, I want to make clear that I will be answering in a personal capacity. I think this has a couple of notable benefits:
My answers can be a little more candid since I don't have to worry (as much) that I'll say something others may significantly disagree with.
Application season is busy for us! This saves coordination time in getting agreement on how to respond to any tricky questions.
It also means that people can ask me questions through this AMA that go beyond AIM's recruitment process and application round...
A little about me
I've been working at AIM since April 2024. Before that, I co-founded the Maternal Health Initiative with Sarah Eustis-Guthrie. We piloted a training program with the Ghana Health Service to improve the quality of postpartum family planning counseling in the country. In March, we made the decision to shut down the organisation as we do not believe that postpartum family planning is likely to be as cost-effective as other family planning or global health interventions. You can read more about that decision in a recent piece for Asterisk magazine, as well as an earlier EA Forum post. I started Maternal Health Initiative through the 2022 Charity Entrepreneurship Incubation Program. I spent the year prior to this founding and running Effective Self-Help, a project researching the best interventions for individuals to increase their wellbeing and productivity. My job history before that is far more potted and less relevant - from waiting tables and selling hiking shoes to teaching kids survival skills and planting vineyards.
Things you could ask me
Any questions you may have about what AIM looks for in candidates for our programs and how we select people
Questions about getting into entrepreneurship - why to pursue it; how to test fit; paths to upskilling; lessons I've learned from my own (mis)adventures
Questions about Maternal Health Initiative - what we did; lessons I learned; how it feels to shut down
More general questions about building a career in impactful work if something in my experience suggests I might be a good person to ask!
How the AMA works
1. You post a comment here[1]
2. You wait patiently while I'm on holiday until August 28th[2]
3. I reply to comments on August 29th and 30th
1. ^ If you have a question you'd like to ask in private, you can email me: ben@ charityentrepreneurship [dot] com
2. ^ This was coincidental rather than planned but has the wonderful benefit of ensuring I avoid spending a week refreshing this article feverishly waiting for questions...
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Aug 20, 2024 • 17min

LW - AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work by Rohin Shah

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work, published by Rohin Shah on August 20, 2024 on LessWrong. We wanted to share a recap of our recent outputs with the AF community. Below, we fill in some details about what we have been working on, what motivated us to do it, and how we thought about its importance. We hope that this will help people build off things we have done and see how their work fits with ours. Who are we? We're the main team at Google DeepMind working on technical approaches to existential risk from AI systems. Since our last post, we've evolved into the AGI Safety & Alignment team, which we think of as AGI Alignment (with subteams like mechanistic interpretability, scalable oversight, etc.), and Frontier Safety (working on the Frontier Safety Framework, including developing and running dangerous capability evaluations). We've also been growing since our last post: by 39% last year, and by 37% so far this year. The leadership team is Anca Dragan, Rohin Shah, Allan Dafoe, and Dave Orr, with Shane Legg as executive sponsor. We're part of the overall AI Safety and Alignment org led by Anca, which also includes Gemini Safety (focusing on safety training for the current Gemini models), and Voices of All in Alignment, which focuses on alignment techniques for value and viewpoint pluralism. What have we been up to? It's been a while since our last update, so below we list out some key work published in 2023 and the first part of 2024, grouped by topic / sub-team. Our big bets for the past 1.5 years have been 1) amplified oversight, to enable the right learning signal for aligning models so that they don't pose catastrophic risks, 2) frontier safety, to analyze whether models are capable of posing catastrophic risks in the first place, and 3) (mechanistic) interpretability, as a potential enabler for both frontier safety and alignment goals. Beyond these bets, we experimented with promising areas and ideas that help us identify new bets we should make. Frontier Safety The mission of the Frontier Safety team is to ensure safety from extreme harms by anticipating, evaluating, and helping Google prepare for powerful capabilities in frontier models. While the focus so far has been primarily around misuse threat models, we are also working on misalignment threat models. FSF We recently published our Frontier Safety Framework, which, in broad strokes, follows the approach of responsible capability scaling, similar to Anthropic's Responsible Scaling Policy and OpenAI's Preparedness Framework. The key difference is that the FSF applies to Google: there are many different frontier LLM deployments across Google, rather than just a single chatbot and API (this in turn affects stakeholder engagement, policy implementation, mitigation plans, etc). We're excited that our small team led the Google-wide strategy in this space, and demonstrated that responsible capability scaling can work for large tech companies in addition to small startups. A key area of the FSF we're focusing on as we pilot the Framework, is how to map between the critical capability levels (CCLs) and the mitigations we would take. This is high on our list of priorities as we iterate on future versions. Some commentary (e.g. here) also highlighted (accurately) that the FSF doesn't include commitments. 
This is because the science is in early stages and best practices will need to evolve. But ultimately, what we care about is whether the work is actually done. In practice, we did run and report dangerous capability evaluations for Gemini 1.5 that we think are sufficient to rule out extreme risk with high confidence. Dangerous Capability Evaluations Our paper on Evaluating Frontier Models for Dangerous Capabilities is the broadest suite of dangerous capability evaluations published...
Aug 20, 2024 • 1h 28min

LW - Guide to SB 1047 by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Guide to SB 1047, published by Zvi on August 20, 2024 on LessWrong. We now likely know the final form of California's SB 1047. There have been many changes to the bill as it worked its way to this point. Many changes, including some that were just announced, I see as strict improvements. Anthropic was behind many of the last set of amendments at the Appropriations Committee. In keeping with their "Support if Amended" letter, there are a few big compromises that weaken the upside protections of the bill somewhat in order to address objections and potential downsides. The primary goal of this post is to answer the question: What would SB 1047 do? I offer two versions: Short and long. The short version summarizes what the bill does, at the cost of being a bit lossy. The long version is based on a full RTFB: I am reading the entire bill, once again. In between those two I will summarize the recent changes to the bill, and provide some practical ways to understand what the bill does. After, I will address various arguments and objections, reasonable and otherwise. My conclusion: This is by far the best light-touch bill we are ever going to get.
Short Version (tl;dr): What Does SB 1047 Do in Practical Terms?
This section is intentionally simplified, but in practical terms I believe this covers the parts that matter. For full details see later sections. First, I will echo the One Thing To Know. If you do not train either a model that requires $100 million or more in compute, or fine tune such an expensive model using $10 million or more in your own additional compute (or operate and rent out a very large computer cluster)? Then this law does not apply to you, at all. This cannot later be changed without passing another law. (There is a tiny exception: Some whistleblower protections still apply. That's it.) Also the standard required is now reasonable care, the default standard in common law. No one ever has to 'prove' anything, nor need they fully prevent all harms. (A simplified sketch of this applicability test follows this summary.) With that out of the way, here is what the bill does in practical terms. IF AND ONLY IF you wish to train a model using $100 million or more in compute (including your fine-tuning costs):
1. You must create a reasonable safety and security plan (SSP) such that your model does not pose an unreasonable risk of causing or materially enabling critical harm: mass casualties or incidents causing $500 million or more in damages.
2. That SSP must explain what you will do, how you will do it, and why. It must have objective evaluation criteria for determining compliance. It must include cybersecurity protocols to prevent the model from being unintentionally stolen.
3. You must publish a redacted copy of your SSP, an assessment of the risk of catastrophic harms from your model, and get a yearly audit.
4. You must adhere to your own SSP and publish the results of your safety tests.
5. You must be able to shut down all copies under your control, if necessary.
6. The quality of your SSP and whether you followed it will be considered in whether you used reasonable care.
7. If you violate these rules, do not use reasonable care, and harm results, the Attorney General can fine you in proportion to training costs, plus damages for the actual harm.
8. If you fail to take reasonable care, injunctive relief can be sought. The quality of your SSP, and whether or not you complied with it, shall be considered when asking whether you acted reasonably.
9. Fine-tunes that spend $10 million or more are the responsibility of the fine-tuner.
10. Fine-tunes spending less than that are the responsibility of the original developer.
Compute clusters need to do standard KYC when renting out tons of compute. Whistleblowers get protections. They will attempt to establish a 'CalCompute' public compute cluster. You can also read this summary of h...
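To make the applicability thresholds in the summary concrete, here is a minimal sketch of the coverage test as described above; the function and field names are illustrative, and the bill's actual definitions (for example, of covered compute clusters) are more detailed than this.

```python
# Illustrative sketch (not legal text) of the applicability test as summarized above:
# the bill's developer duties attach only if you train a model using $100 million or
# more in compute (including fine-tuning costs), or fine-tune such a model using
# $10 million or more of your own additional compute. Large compute-cluster operators
# have separate KYC duties. Names and structure here are hypothetical.

TRAINING_THRESHOLD_USD = 100_000_000
FINE_TUNE_THRESHOLD_USD = 10_000_000

def developer_duties_apply(training_compute_usd: float,
                           fine_tunes_covered_model: bool = False,
                           fine_tune_compute_usd: float = 0.0) -> bool:
    """True if, per the summary above, the SSP/audit/shutdown duties would apply."""
    trains_covered_model = training_compute_usd >= TRAINING_THRESHOLD_USD
    is_covered_fine_tuner = fine_tunes_covered_model and fine_tune_compute_usd >= FINE_TUNE_THRESHOLD_USD
    return trains_covered_model or is_covered_fine_tuner

# Per the summary: a $30M training run is out of scope entirely (aside from some
# whistleblower protections); a $150M run, or a $12M fine-tune of a covered model, is in scope.
print(developer_duties_apply(30e6))                                   # False
print(developer_duties_apply(150e6))                                  # True
print(developer_duties_apply(0, fine_tunes_covered_model=True,
                             fine_tune_compute_usd=12e6))             # True
```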
Aug 20, 2024 • 17min

AF - AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work by Rohin Shah

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work, published by Rohin Shah on August 20, 2024 on The AI Alignment Forum. We wanted to share a recap of our recent outputs with the AF community. Below, we fill in some details about what we have been working on, what motivated us to do it, and how we thought about its importance. We hope that this will help people build off things we have done and see how their work fits with ours. Who are we? We're the main team at Google DeepMind working on technical approaches to existential risk from AI systems. Since our last post, we've evolved into the AGI Safety & Alignment team, which we think of as AGI Alignment (with subteams like mechanistic interpretability, scalable oversight, etc.), and Frontier Safety (working on the Frontier Safety Framework, including developing and running dangerous capability evaluations). We've also been growing since our last post: by 39% last year, and by 37% so far this year. The leadership team is Anca Dragan, Rohin Shah, Allan Dafoe, and Dave Orr, with Shane Legg as executive sponsor. We're part of the overall AI Safety and Alignment org led by Anca, which also includes Gemini Safety (focusing on safety training for the current Gemini models), and Voices of All in Alignment, which focuses on alignment techniques for value and viewpoint pluralism. What have we been up to? It's been a while since our last update, so below we list out some key work published in 2023 and the first part of 2024, grouped by topic / sub-team. Our big bets for the past 1.5 years have been 1) amplified oversight, to enable the right learning signal for aligning models so that they don't pose catastrophic risks, 2) frontier safety, to analyze whether models are capable of posing catastrophic risks in the first place, and 3) (mechanistic) interpretability, as a potential enabler for both frontier safety and alignment goals. Beyond these bets, we experimented with promising areas and ideas that help us identify new bets we should make. Frontier Safety The mission of the Frontier Safety team is to ensure safety from extreme harms by anticipating, evaluating, and helping Google prepare for powerful capabilities in frontier models. While the focus so far has been primarily around misuse threat models, we are also working on misalignment threat models. FSF We recently published our Frontier Safety Framework, which, in broad strokes, follows the approach of responsible capability scaling, similar to Anthropic's Responsible Scaling Policy and OpenAI's Preparedness Framework. The key difference is that the FSF applies to Google: there are many different frontier LLM deployments across Google, rather than just a single chatbot and API (this in turn affects stakeholder engagement, policy implementation, mitigation plans, etc). We're excited that our small team led the Google-wide strategy in this space, and demonstrated that responsible capability scaling can work for large tech companies in addition to small startups. A key area of the FSF we're focusing on as we pilot the Framework, is how to map between the critical capability levels (CCLs) and the mitigations we would take. This is high on our list of priorities as we iterate on future versions. Some commentary (e.g. here) also highlighted (accurately) that the FSF doesn't include commitments. 
This is because the science is in early stages and best practices will need to evolve. But ultimately, what we care about is whether the work is actually done. In practice, we did run and report dangerous capability evaluations for Gemini 1.5 that we think are sufficient to rule out extreme risk with high confidence. Dangerous Capability Evaluations Our paper on Evaluating Frontier Models for Dangerous Capabilities is the broadest suite of dangerous capability evaluati...
