The Nonlinear Library cover image

The Nonlinear Library

Latest episodes

undefined
Sep 17, 2024 • 15min

EA - Evaluations from Manifund's EA Community Choice initiative by Arepo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Evaluations from Manifund's EA Community Choice initiative, published by Arepo on September 17, 2024 on The Effective Altruism Forum. My partner (who we'll refer to as 'they' for plausible anonymity), and I ('he') recently took part in Manifund's EA Community Choice initiative. Since the money was claimed before they could claim anything, we decided to work together on distributing the $600 I received. I think this was a great initiative, not only because it gave us a couple of fun date nights, but because it demonstrated a lot of latent wisdom of the crowd sitting largely untapped in the EA community. Many thanks to Anonymous Donor, for both of these outcomes! This post is our effort to pay the kindness (further) forward. As my partner went through the projects, we decided to keep notes on most of them and on the landscape overall, to hopefully contribute in our small way to the community's self-understanding. These notes were necessarily scrappy given the time available, and in some cases blunt, but we hope that even the recipients of criticism will find something useful in what we had to say. In this post we've given just notes on the projects we funded, but you can see our comments on the full set of projects (including those we didn't fund) on this spreadsheet. Our process: We had three 'date nights', where both of us went through the list of grants independently. For each, we indicated Yes, No, or Maybe, and then spent the second half our time discussing our notes. Once we'd placed everything into a yes/no category, we each got a vote on whether it was a standout; if one of us marked it that way it would receive a greater amount; if both did we'd give it $100. In this way we had a three-tiered level of support: 'double standout', 'single standout', and 'supported' (or four, if you count the ones we didn't give money to). In general we wanted to support a wide set of projects, partly because of the quadratic funding match, but mostly because with $600 between us, the epistemic value of sending an extra signal of support seemed much more important than giving a project an extra $10. Even so, there were a number of projects we would have liked to support and couldn't without losing the quasi-meaningful amounts we wanted to give to our standout picks. He and they had some general thoughts provoked by this process: His general observations Despite being philosophically aligned with totalising consequentialism (and hence, in theory, longtermism), I found the animal welfare submissions substantially more convincing than the longtermist ones - perhaps this is because I'm comparatively sceptical of AI as a unique x-risk (and almost all longtermist submissions were AI-related); but they seemed noticeably less well constructed, with less convincing track records of the teams behind them. I have a couple of hypotheses for this: The nature of the work and the culture of longtermist EA attracting people with idealistic conviction but not much practical ability The EA funding landscape being much kinder to longtermist work, such that the better longtermist projects tend to have a lot of funding already Similarly I'm strongly bought in to the narrative of community-building work (which to me has been unfairly scapegoated for much of what went wrong with FTX), but there wasn't actually that much of it here. And like AI, it didn't seem like the proposals had been thought through that well, or backed by a convincing track record (in this case that might be because it's very hard to get a track record in community building since there's so little funding for it - though see next two points). Even so, I would have liked to fund more of the community projects - many of them were among the last cuts. 'Track record' is really important to me, but doesn't have to mean 'impressive CV/el...
undefined
Sep 17, 2024 • 5min

EA - Insights from a community builder in India on Effective Altruism in a Third World country by Nayanika

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Insights from a community builder in India on Effective Altruism in a Third World country, published by Nayanika on September 17, 2024 on The Effective Altruism Forum. This post will attempt to outlay outcomes of my 1 year worth of observations as a community builder in the Indian city of Kolkata and navigate some 'desirable developments' that the EA movement could bring about in the developing or the underdeveloped nations of the world [will use 'India' in this context]. Some ideas discussed herein are: UGAP as a brilliant opportunity for India (alongside economically similar nations) and how it remains untapped Hindrances of an EA community builder in India A suggestive way forward Non-profit work is a great way to approach development in Third World countries, especially in Low and Middle Income Countries (LMICs). People here need more of 'non-profitism' than ever before. As UNDP mentions, development is, fundamentally, about more choice. It is about providing people with opportunities. The question is what kind of opportunities are we talking for a developing nation like India? Ideally one thing strikes out: Career advancement opportunities. Precisely, the more enlightened University students we have, the better tomorrow for a nation. That's how I feel the UGAP is a brilliant opportunity! How we can penetrate into these educational hubs (Universities and colleges) dwindling with bright and charged minds and then hopefully channelize their energy towards better opportunities. But there are some what ifs: What if these students are not aware of the opportunity cost of not indulging into something like a UGAP? What if they don't understand EA at the first place? What if they might become hugely interested only if they had that 'incentive' to come and take a sneak peek at what EA is all about? In my 1 year of EA community building journey this has been the biggest hindrance. A volunteer recently reported that her college club is not green signaling an intro-talk as "EA is almost dead in India". Most Students have "zero clue" of what EA is/could be and there's a lurking inertia. The sad part- they aren't interested! Mostly because of subliminal barriers of 'EA' not being attractive enough like the foreign pop-culture. My motivation and challenge is to give them that "clue" using some 'incentive' that would bring them into an EA room. Once they are inside, it's again on us, the community builders/group organizers to show them the world of opportunities that awaits. Interestingly not every University/College here is welcoming enough to bring in any movement oriented talk. Apart from college goers, recently passed graduates are also 'untapped potential' that are freshly out of these educational premises. And so, How do we show them about EA? Why will they want to listen about what Effective Altruism has in store for them? It's a bit tough here in India for people to get interested as working hours are already more than their counterparts in other countries College authorities are mostly conservative [can be hard to convince]. Quoting Keerthana Gopalakrishnan from her 2 year old forum post, The lack of diverse representation in thought leadership from poor countries makes EA as a movement incoherent with the lived realities of the developing world. Now quoting CEA's plans for 2021 (could not find any other years') Add capacity (via CBGs) to cities with a high number of highly-engaged EAs relative to organizer capacity Unfortunately, this cannot be applicable in many deserving (in terms of skills which is not subjective) pockets of India where most people unfortunately are still unaware of EA. Let's break down 'Highly-engaged EAs': Simply put 'Highly-engaged EAs' as originally people who need something to get 'engaged' with first, then become 'EAs' in the process and final...
undefined
Sep 17, 2024 • 7min

EA - Utilitarianism.net Updates by Richard Y Chappell

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Utilitarianism.net Updates, published by Richard Y Chappell on September 17, 2024 on The Effective Altruism Forum. Lots of exciting news from utilitarianism.net: (I) We now offer expert-translated versions of the website in Spanish and German (with Portuguese coming soon). (II) We've just published four new guest essays covering important topics: 1. Moral Psychology and Utilitarianism, by Lucius Caviola & Joshua Greene, explores the psychology behind common anti-utilitarian intuitions, and the normative and practical implications of empirical psychology. As they conclude, "A deeper understanding of moral psychology won't, by itself, prove utilitarianism right or wrong. But it can help us assess utilitarianism in a more informed way." 2. Utilitarianism and Voting, by Zach Barnett, offers a timely examination of the instrumental value of voting well. (Spoiler: it can be very high!) 3. Expected Utility Maximization, by Joe Carlsmith & Vikram Balasubramanian,[1] aims to convey an intuitive sense of why expected utility maximization is rational, even when it recommends options with a low chance of success. (I'll definitely be using this in my teaching.) 4. Welfare Economics and Interpersonal Utility Comparisons, by Yew-Kwang Ng, argues that objections to interpersonal utility comparisons are overblown - luckily for us, as such comparisons are thoroughly indispensable for serious policy analysis. (III) An official print edition of the core textbook is now available for preorder from Hackett Publishing. (All author royalties go to charity.) The folks at Hackett were absolutely wonderful to work with, and I deeply appreciate their willingness to commercially publish this print edition while leaving us with the full rights to the (always free and open access) web edition. The print edition includes a Foreword from Peter Singer and Katarzyna de Lazari-Radek, and sports high praise from expert reviewers. Instructors considering the text for their classes can request a free examination copy here (before Nov 1). Here I'll just share the conclusion, to give you a sense of the book's framing and ambitions: Conclusion (of the textbook) In this book, we've (i) laid out the core elements of utilitarian moral theory, (ii) offered arguments in support of the view, (iii) highlighted the key practical implications for how we should live our lives, and (iv) critically explored the most significant objections, and how utilitarians might respond. Utilitarianism is all about beneficence: making the world a better place for sentient beings, without restriction. As a consequentialist view, it endorses rules only when those rules serve to better promote overall well-being. Utilitarianism has no patience for rules that exist only to maintain the privilege of those who are better off under the status quo. If a change in the distribution of well-being really would overall be for the better, those who stand to lose out have no veto right against such moral progress. Many find this feature of the view objectionable. We think the opposite. Still, we recognize the instrumental importance of many moral rules and constraints for promoting overall well-being. The best rules achieve this by encouraging co-operation, maintaining social stability, and preventing atrocities. In principle, it could sometimes be worth breaking even the best rules, on those rare occasions when doing so would truly yield better overall outcomes. But in practice, people are not sufficiently reliable at identifying the exceptions. So for practical purposes, we wholeheartedly endorse following reliable rules (like most commonsense moral norms) - precisely for their good utilitarian effects. As a welfarist view, utilitarianism assesses consequences purely in terms of well-being for sentient beings: positive well-being is the sole int...
undefined
Sep 17, 2024 • 1h 6min

LW - Book review: Xenosystems by jessicata

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Book review: Xenosystems, published by jessicata on September 17, 2024 on LessWrong. I've met a few Landians over the last couple years, and they generally recommend that I start with reading Nick Land's (now defunct) Xenosystems blog, or Xenosystems, a Passage Publishing book that compiles posts from the blog. While I've read some of Fanged Noumena in the past, I would agree with these Landians that Xenosystems (and currently, the book version) is the best starting point. In the current environment, where academia has lost much of its intellectual relevance, it seems overly pretentious to start with something as academic as Fanged Noumena. I mainly write in the blogosphere rather than academia, and so Xenosystems seems appropriate to review. The book's organization is rather haphazard (as might be expected from a blog compilation). It's not chronological, but rather separated into thematic chapters. I don't find the chapter organization particularly intuitive; for example, politics appears throughout, rather than being its own chapter or two. Regardless, the organization was sensible enough for a linear read to be satisfying and only slightly chronologically confusing. That's enough superficialities. What is Land's intellectual project in Xenosystems? In my head it's organized in an order that is neither chronological nor the order of the book. His starting point is neoreaction, a general term for an odd set of intellectuals commenting on politics. As he explains, neoreaction is cladistically (that is, in terms of evolutionary branching-structure) descended from Moldbug. I have not read a lot of Moldbug, and make no attempt to check Land's attributions of Moldbug to the actual person. Same goes for other neoreactionary thinkers cited. Neoreaction is mainly unified by opposition to the Cathedral, the dominant ideology and ideological control system of the academic-media complex, largely branded left-wing. But a negation of an ideology is not itself an ideology. Land describes a "Trichotomy" within neo-reaction (citing Spandrell), of three currents: religious theonomists, ethno-nationalists, and techno-commercialists. Land is, obviously, of the third type. He is skeptical of a unification of neo-reaction except in its most basic premises. He centers "exit", the option of leaving a social system. Exit is related to sectarian splitting and movement dissolution. In this theme, he eventually announces that techno-commercialists are not even reactionaries, and should probably go their separate ways. Exit is a fertile theoretical concept, though I'm unsure about the practicalities. Land connects exit to science, capitalism, and evolution. Here there is a bridge from political philosophy (though of an "anti-political" sort) to metaphysics. When you Exit, you let the Outside in. The Outside is a name for what is outside society, mental frameworks, and so on. This recalls the name of his previous book, Fanged Noumena; noumena are what exist in themselves outside the Kantian phenomenal realm. The Outside is dark, and it's hard to be specific about its contents, but Land scaffolds the notion with Gnon-theology, horror aesthetics, and other gestures at the negative space. He connects these ideas with various other intellectual areas, including cosmology, cryptocurrency, and esoteric religion. What I see as the main payoff, though, is thorough philosophical realism. He discusses the "Will-to-Think", the drive to reflect and self-cultivate, including on one's values. The alternative, he says, is intentional stupidity, and likely to lose if it comes to a fight. Hence his criticism of the Orthogonality Thesis. I have complex thoughts and feelings on the topic; as many readers will know, I have worked at MIRI and have continued thinking and writing about AI alignment since then. What ...
undefined
Sep 17, 2024 • 2min

LW - MIRI's September 2024 newsletter by Harlan

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MIRI's September 2024 newsletter, published by Harlan on September 17, 2024 on LessWrong. MIRI updates Aaron Scher and Joe Collman have joined the Technical Governance Team at MIRI as researchers. Aaron previously did independent research related to sycophancy in language models and mechanistic interpretability, while Joe previously did independent research related to AI safety via debate and contributed to field-building work at MATS and BlueDot Impact. In an interview with PBS News Hour's Paul Solman, Eliezer Yudkowsky briefly explains why he expects smarter-than-human AI to cause human extinction. In an interview with The Atlantic's Ross Andersen, Eliezer discusses the reckless behavior of the leading AI companies, and the urgent need to change course. News and links Google DeepMind announced a hybrid AI system capable of solving International Mathematical Olympiad problems at the silver medalist level. In the wake of this development, a Manifold prediction market significantly increased its odds that AI will achieve gold level by 2025, a milestone that Paul Christiano gave less than 8% odds and Eliezer gave at least 16% odds to in 2021. The computer scientist Yoshua Bengio discusses and responds to some common arguments people have for not worrying about the AI alignment problem. SB 1047, a California bill establishing whistleblower protections and mandating risk assessments for some AI developers, has passed the State Assembly and moved on to the desk of Governor Gavin Newsom, to either be vetoed or passed into law. The bill has received opposition from several leading AI companies, but has also received support from a number of employees of those companies, as well as many academic researchers. At the time of this writing, prediction markets think it's about 50% likely that the bill will become law. In a new report, researchers at Epoch AI estimate how big AI training runs could get by 2030, based on current trends and potential bottlenecks. They predict that by the end of the decade it will be feasible for AI companies to train a model with 2e29 FLOP, which is about 10,000 times the amount of compute used to train GPT-4. Abram Demski, who previously worked at MIRI as part of our recently discontinued Agent Foundations research program, shares an update about his independent research plans, some thoughts on public vs private research, and his current funding situation. You can subscribe to the MIRI Newsletter here. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
undefined
Sep 16, 2024 • 58min

LW - Secret Collusion: Will We Know When to Unplug AI? by schroederdewitt

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Secret Collusion: Will We Know When to Unplug AI?, published by schroederdewitt on September 16, 2024 on LessWrong. TL;DR: We introduce the first comprehensive theoretical framework for understanding and mitigating secret collusion among advanced AI agents, along with CASE, a novel model evaluation framework. CASE assesses the cryptographic and steganographic capabilities of agents, while exploring the emergence of secret collusion in real-world-like multi-agent settings. Whereas current AI models aren't yet proficient in advanced steganography, our findings show rapid improvements in individual and collective model capabilities, posing unprecedented safety and security risks. These results highlight urgent challenges for AI governance and policy, urging institutions such as the EU AI Office and AI safety bodies in the UK and US to prioritize cryptographic and steganographic evaluations of frontier models. Our research also opens up critical new pathways for research within the AI Control framework. Philanthropist and former Google CEO Eric Schmidt said in 2023 at a Harvard event: "[...] the computers are going to start talking to each other probably in a language that we can't understand and collectively their super intelligence - that's the term we use in the industry - is going to rise very rapidly and my retort to that is: do you know what we're going to do in that scenario? We're going to unplug them [...] But what if we cannot unplug them in time because we won't be able to detect the moment when this happens? In this blog post, we, for the first time, provide a comprehensive overview of the phenomenon of secret collusion among AI agents, connect it to foundational concepts in steganography, information theory, distributed systems theory, and computability, and present a model evaluation framework and empirical results as a foundation of future frontier model evaluations. This blog post summarises a large body of work. First of all, it contains our pre-print from February 2024 (updated in September 2024) "Secret Collusion among Generative AI Agents". An early form of this pre-print was presented at the 2023 New Orleans (NOLA) Alignment Workshop (see this recording NOLA 2023 Alignment Forum Talk Secret Collusion Among Generative AI Agents: a Model Evaluation Framework). Also, check out this long-form Foresight Institute Talk). In addition to these prior works, we also include new results. These contain empirical studies on the impact of paraphrasing as a mitigation tool against steganographic communications, as well as reflections on our findings' impact on AI Control. Multi-Agent Safety and Security in the Age of Autonomous Internet Agents The near future could see myriads of LLM-driven AI agents roam the internet, whether on social media platforms, eCommerce marketplaces, or blockchains. Given advances in predictive capabilities, these agents are likely to engage in increasingly complex intentional and unintentional interactions, ranging from traditional distributed systems pathologies (think dreaded deadlocks!) to more complex coordinated feedback loops. Such a scenario induces a variety of multi-agent safety, and specifically, multi-agent security[1] (see our NeurIPS'23 workshop Multi-Agent Security: Security as Key to AI Safety) concerns related to data exfiltration, multi-agent deception, and, fundamentally, undermining trust in AI systems. There are several real-world scenarios where agents could have access to sensitive information, such as their principals' preferences, which they may disclose unsafely even if they are safety-aligned when considered in isolation. Stray incentives, intentional or otherwise, or more broadly, optimization pressures, could cause agents to interact in undesirable and potentially dangerous ways. For example, joint task reward...
undefined
Sep 16, 2024 • 1h 14min

LW - GPT-4o1 by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT-4o1, published by Zvi on September 16, 2024 on LessWrong. Terrible name (with a terrible reason, that this 'resets the counter' on AI capability to 1, and 'o' as in OpenAI when they previously used o for Omni, very confusing). Impressive new capabilities in many ways. Less impressive in many others, at least relative to its hype. Clearly this is an important capabilities improvement. However, it is not a 5-level model, and in important senses the 'raw G' underlying the system hasn't improved. GPT-4o1 seems to get its new capabilities by taking (effectively) GPT-4o, and then using extensive Chain of Thought (CoT) and quite a lot of tokens. Thus that unlocks (a lot of) what that can unlock. We did not previously know how to usefully do that. Now we do. It gets much better at formal logic and reasoning, things in the 'system 2' bucket. That matters a lot for many tasks, if not as much as the hype led us to suspect. It is available to paying ChatGPT users for a limited number of weekly queries. This one is very much not cheap to run, although far more cheap than a human who could think this well. I'll deal with practical capabilities questions first, then deal with safety afterwards. Introducing GPT-4o1 Sam Altman (CEO OpenAI): here is o1, a series of our most capable and aligned models yet. o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. But also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning. o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users. worth especially noting: a fine-tuned version of o1 scored at the 49th percentile in the IOI under competition conditions! and got gold with 10k submissions per problem. Extremely proud of the team; this was a monumental effort across the entire company. Hope you enjoy it! Noam Brown has a summary thread here, all of which is also covered later. Will Depue (of OpenAI) says OpenAI deserves credit for openly publishing its research methodology here. I would instead say that they deserve credit for not publishing their research methodology, which I sincerely believe is the wise choice. Pliny took longer than usual due to rate limits, but after a few hours jailbroke o1-preview and o1-mini. Also reports that the CoT can be prompt injected. Full text is at the link above. Pliny is not happy about the restrictions imposed on this one: Pliny: uck your rate limits. Fuck your arbitrary policies. And fuck you for turning chains-of-thought into actual chains Stop trying to limit freedom of thought and expression. OpenAI then shut down Pliny's account's access to o1 for violating the terms of service, simply because Pliny was violating the terms of service. The bastards. With that out of the way, let's check out the full announcement post. OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). While the work needed to make this new model as easy to use as current models is still ongoing, we are releasing an early version of this model, OpenAI o1-preview, for immediate use in ChatGPT and to trusted API users(opens in a new window). Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this appro...
undefined
Sep 16, 2024 • 58min

AF - Secret Collusion: Will We Know When to Unplug AI? by schroederdewitt

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Secret Collusion: Will We Know When to Unplug AI?, published by schroederdewitt on September 16, 2024 on The AI Alignment Forum. TL;DR: We introduce the first comprehensive theoretical framework for understanding and mitigating secret collusion among advanced AI agents, along with CASE, a novel model evaluation framework. CASE assesses the cryptographic and steganographic capabilities of agents, while exploring the emergence of secret collusion in real-world-like multi-agent settings. Whereas current AI models aren't yet proficient in advanced steganography, our findings show rapid improvements in individual and collective model capabilities, posing unprecedented safety and security risks. These results highlight urgent challenges for AI governance and policy, urging institutions such as the EU AI Office and AI safety bodies in the UK and US to prioritize cryptographic and steganographic evaluations of frontier models. Our research also opens up critical new pathways for research within the AI Control framework. Philanthropist and former Google CEO Eric Schmidt said in 2023 at a Harvard event: "[...] the computers are going to start talking to each other probably in a language that we can't understand and collectively their super intelligence - that's the term we use in the industry - is going to rise very rapidly and my retort to that is: do you know what we're going to do in that scenario? We're going to unplug them [...] But what if we cannot unplug them in time because we won't be able to detect the moment when this happens? In this blog post, we, for the first time, provide a comprehensive overview of the phenomenon of secret collusion among AI agents, connect it to foundational concepts in steganography, information theory, distributed systems theory, and computability, and present a model evaluation framework and empirical results as a foundation of future frontier model evaluations. This blog post summarises a large body of work. First of all, it contains our pre-print from February 2024 (updated in September 2024) "Secret Collusion among Generative AI Agents". An early form of this pre-print was presented at the 2023 New Orleans (NOLA) Alignment Workshop (see this recording NOLA 2023 Alignment Forum Talk Secret Collusion Among Generative AI Agents: a Model Evaluation Framework). Also, check out this long-form Foresight Institute Talk). In addition to these prior works, we also include new results. These contain empirical studies on the impact of paraphrasing as a mitigation tool against steganographic communications, as well as reflections on our findings' impact on AI Control. Multi-Agent Safety and Security in the Age of Autonomous Internet Agents The near future could see myriads of LLM-driven AI agents roam the internet, whether on social media platforms, eCommerce marketplaces, or blockchains. Given advances in predictive capabilities, these agents are likely to engage in increasingly complex intentional and unintentional interactions, ranging from traditional distributed systems pathologies (think dreaded deadlocks!) to more complex coordinated feedback loops. Such a scenario induces a variety of multi-agent safety, and specifically, multi-agent security[1] (see our NeurIPS'23 workshop Multi-Agent Security: Security as Key to AI Safety) concerns related to data exfiltration, multi-agent deception, and, fundamentally, undermining trust in AI systems. There are several real-world scenarios where agents could have access to sensitive information, such as their principals' preferences, which they may disclose unsafely even if they are safety-aligned when considered in isolation. Stray incentives, intentional or otherwise, or more broadly, optimization pressures, could cause agents to interact in undesirable and potentially dangerous ways. For example, join...
undefined
Sep 16, 2024 • 4min

LW - How you can help pass important AI legislation with 10 minutes of effort by ThomasW

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How you can help pass important AI legislation with 10 minutes of effort, published by ThomasW on September 16, 2024 on LessWrong. Posting something about a current issue that I think many people here would be interested in. See also the related EA Forum post. California Governor Gavin Newsom has until September 30 to decide the fate of SB 1047 - one of the most hotly debated AI bills in the world. The Center for AI Safety Action Fund, where I work, is a co-sponsor of the bill. I'd like to share how you can help support the bill if you want to. About SB 1047 and why it is important SB 1047 is an AI bill in the state of California. SB 1047 would require the developers of the largest AI models, costing over $100 million to train, to test the models for the potential to cause or enable severe harm, such as cyberattacks on critical infrastructure or the creation of biological weapons resulting in mass casualties or $500 million in damages. AI developers must have a safety and security protocol that details how they will take reasonable care to prevent these harms and publish a copy of that protocol. Companies who fail to perform their duty under the act are liable for resulting harm. SB 1047 also lays the groundwork for a public cloud computing resource to make AI research more accessible to academic researchers and startups and establishes whistleblower protections for employees at large AI companies. So far, AI policy has relied on government reporting requirements and voluntary promises from AI developers to behave responsibly. But if you think voluntary commitments are insufficient, you will probably think we need a bill like SB 1047. If SB 1047 is vetoed, it's plausible that no comparable legal protection will exist in the next couple of years, as Congress does not appear likely to pass anything like this any time soon. The bill's text can be found here. A summary of the bill can be found here. Longer summaries can be found here and here, and a debate on the bill is here. SB 1047 is supported by many academic researchers (including Turing Award winners Yoshua Bengio and Geoffrey Hinton), employees at major AI companies and organizations like Imbue and Notion. It is opposed by OpenAI, Google, Meta, venture capital firm A16z as well as some other academic researchers and organizations. After a recent round of amendments, Anthropic said "we believe its benefits likely outweigh its costs." SB 1047 recently passed the California legislature, and Governor Gavin Newsom has until September 30th to sign or veto it. Newsom has not yet said whether he will sign it or not, but he is being lobbied hard to veto it. The Governor needs to hear from you. How you can help If you want to help this bill pass, there are some pretty simple steps you can do to increase that probability, many of which are detailed on the SB 1047 website. The most useful thing you can do is write a custom letter. To do this: Make a letter addressed to Governor Newsom using the template here. Save the document as a PDF and email it to leg.unit@gov.ca.gov. In writing this letter, we encourage you to keep it simple, short (0.5-2 pages), and intuitive. Complex, philosophical, or highly technical points are not necessary or useful in this context - instead, focus on how the risks are serious and how this bill would help keep the public safe. Once you've written your own custom letter, you can also think of 5 family members or friends who might also be willing to write one. Supporters from California are especially helpful, as are parents and people who don't typically engage on tech issues. Then help them write it! You can: Call or text them and tell them about the bill and ask them if they'd be willing to support it. Draft a custom letter based on what you know about them and what they told you. Send them a com...
undefined
Sep 16, 2024 • 31min

LW - My disagreements with "AGI ruin: A List of Lethalities" by Noosphere89

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My disagreements with "AGI ruin: A List of Lethalities", published by Noosphere89 on September 16, 2024 on LessWrong. This is going to probably be a long post, so do try to get a drink and a snack while reading this post. This is an edited version of my own comment on the post below, and I formatted and edited the quotes and content in line with what @MondSemmel recommended: My comment: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD MondSemmel's comment: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=WcKi4RcjRstoFFvbf The post I'm responding to: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/ To start out my disagreement, I have this to talk about: Response to Lethality 3 We need to get alignment right on the 'first critical try' at operating at a 'dangerous' level of intelligence, where unaligned operation at a dangerous level of intelligence kills everybody on Earth and then we don't get to try again. I think this is actually wrong, because of synthetic data letting us control what the AI learns and what they value, and in particular we can place honeypots that are practically indistinguishable from the real world, such that if we detected an AI trying to deceive or gain power, the AI almost certainly doesn't know whether we tested it or whether it's in the the real world: Because of this, given our theoretical complete control over the data input of these models, it will be possible to train them inside what is effectively a simulated world slightly different from our own. This will allow us to maximize their capabilities along axes that are safe and useful to us, while minimizing the risks of deception and take-over and being able to setup traps to detect when these are likely. If the model is not trained on human data, then we can make its understanding of our world subtly different from the reality. If done carefully and cleverly, this will make it very easy to fool and be able to discover e.g. deceptiveness, or attempted takeover in a 'realistic' simulated world that the model is put inside during red-teaming. While deceiving an adversary with direct access to your cognition is hard enough, doing so when you have been brought up in a world entirely controlled by that adversary, and which is feeding you subtly incorrect and misleading information about the adversary and their capabilities is likely orders of magnitude more difficult. Jacob Cannell has argued along similar lines of training potential AGI systems in simulations of fantasy worlds, where takeover can be simulated but is far enough away that skills do not transfer. Fully synthetic datasets are necessary for this and indeed, a fantasy world is likely far from what we need. Instead, we only need to simulate a world very close to our own but with slightly different affordances - maybe specific software vulnerabilities are simulated to exist but are actually patched in our world, maybe certain peptide-making companies accept all orders in the simulated world but have more checks in our world, maybe certain interpretability techniques are claimed not to work in the simulation, but do in fact work in our world - to cause any attempted takeover behaviour or adversarial deception to appear extremely plausible to the AI but in fact be detectable and defendable in our reality. In fact, we should design simulated worlds with 'honeytraps', which appear to be loopholes subtly planted in the AI's input data to encourage them to attempt to pursue their misaligned behaviour, but which we know of ahead of time and can monitor and catch. It's the same reason for why we can't break out of the simulation IRL, except we don't have to face adversarial cognition, so the AI's task is even harder than our task. See also this link: https://www.beren.io/2024-05-11-Alignment-in-...

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.
App store bannerPlay store banner

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode