
The Nonlinear Library

Latest episodes

Aug 24, 2024 • 14min

EA - Apply to Aether - Independent LLM Agent Safety Research Group by RohanS

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Apply to Aether - Independent LLM Agent Safety Research Group, published by RohanS on August 24, 2024 on The Effective Altruism Forum.

The basic idea
Aether will be a small group of talented early-career AI safety researchers with a shared research vision who work full-time with mentorship on their best effort at making AI go well. That research vision will broadly revolve around the alignment, control, and evaluation of LLM agents. There is a lot of latent talent in the AI safety space, and this group will hopefully serve as a way to convert some of that talent into directly impactful work and great career capital.

Get involved!
1. Submit a short expression of interest here by Fri, Aug 23rd at 11:59pm PT if you would like to contribute to the group as a full-time in-person researcher, part-time / remote collaborator, or advisor. (Note: Short turnaround time!)
2. Apply to join the group here by Sat, Aug 31st at 11:59pm PT.
3. Get in touch with Rohan at rs4126@columbia.edu with any questions.

Who are we? Team members so far
Rohan Subramani
I recently completed my undergrad in CS and Math at Columbia, where I helped run an Effective Altruism group and an AI alignment group. I'm now interning at CHAI. I've done several technical AI safety research projects in the past couple years. I've worked on comparing the expressivities of objective-specification formalisms in RL (at AI Safety Hub Labs, now called LASR Labs), generalizing causal games to better capture safety-relevant properties of agents (in an independent group), corrigibility in partially observable assistance games (my current project at CHAI), and LLM instruction-following generalization (part of an independent research group). I've been thinking about LLM agent safety quite a bit for the past couple of months, and I am now also starting to work on this area as part of my CHAI internship. I think my (moderate) strengths include general intelligence, theoretical research, AI safety takes, and being fairly agentic. A relevant (moderate) weakness of mine is programming. I like indie rock music :).

Max Heitmann
I hold an undergraduate master's degree (MPhysPhil) in Physics and Philosophy and a postgraduate master's degree (BPhil) in Philosophy from Oxford University. I collaborated with Rohan on the ASH Labs project (comparing the expressivities of objective-specification formalisms in RL), and have also worked for a short while at the Center for AI Safety (CAIS) under contract as a ghostwriter for the AI Safety, Ethics, and Society textbook. During my two years on the BPhil, I worked on a number of AI safety-relevant projects with Patrick Butlin from FHI. These were focussed on deep learning interpretability, the measurement of beliefs in LLMs, and the emergence of agency in AI systems. In my thesis, I tried to offer a theory of causation grounded in statistical mechanics, and then applied this theory to vindicate the presuppositions of Judea Pearl-style causal modeling and inference.

Advisors
Erik Jenner and Francis Rhys Ward have said they're happy to at least occasionally provide feedback for this research group. We will continue working to ensure this group receives regular mentorship from experienced researchers with relevant background. We are highly prioritizing working out of an AI safety office because of the informal mentorship benefits this brings.
Research agenda
We are interested in conducting research on the risks and opportunities for safety posed by LLM agents. LLM agents are goal-directed cognitive architectures powered by one or more large language models (LLMs). The following diagram (taken from On AutoGPT) depicts many of the basic components of LLM agents, such as task decomposition and memory. We think future generations of LLM agents might significantly alter the safety landscape, for two ...
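For readers unfamiliar with the architecture being described, here is a minimal, hypothetical sketch of an LLM-agent loop with task decomposition and a simple memory; `call_llm` is a stub standing in for any chat-model API and is not code from the Aether group.

```python
# Minimal illustrative sketch of an LLM-agent loop: task decomposition plus a
# simple memory. `call_llm` is a hypothetical stub standing in for any
# chat-model API; replace it with a real client to do anything useful.
from typing import List

def call_llm(prompt: str) -> str:
    # Stub so the sketch runs end-to-end without network access.
    if "numbered subtasks" in prompt:
        return "1. gather information\n2. draft an answer\n3. check the answer"
    return "stub result for: " + prompt.splitlines()[-1]

def decompose(task: str) -> List[str]:
    # Ask the model to break a high-level task into ordered subtasks.
    plan = call_llm(f"Break this task into numbered subtasks:\n{task}")
    return [line.strip() for line in plan.splitlines() if line.strip()]

def run_agent(task: str, max_steps: int = 10) -> List[str]:
    memory: List[str] = []                  # naive append-only memory of results
    for subtask in decompose(task)[:max_steps]:
        context = "\n".join(memory[-5:])    # only recent results fit in context
        result = call_llm(f"Context:\n{context}\n\nDo this subtask:\n{subtask}")
        memory.append(f"{subtask} -> {result}")
    return memory

print(run_agent("Summarize the safety properties of LLM agents"))
```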
Aug 24, 2024 • 54min

LW - What's important in "AI for epistemics"? by Lukas Finnveden

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What's important in "AI for epistemics"?, published by Lukas Finnveden on August 24, 2024 on LessWrong.

Summary
This post gives my personal take on "AI for epistemics" and how important it might be to work on. Some background context: AI capabilities are advancing rapidly and I think it's important to think ahead and prepare for the possible development of AI that could automate almost all economically relevant tasks that humans can do.[1] That kind of AI would have a huge impact on key epistemic processes in our society. (I.e.: It would have a huge impact on how new facts get found, how new research gets done, how new forecasts get made, and how all kinds of information spread through society.) I think it's very important for our society to have excellent epistemic processes. (I.e.: For important decisions in our society to be made by people or AI systems who have informed and unbiased beliefs that take into account as much of the available evidence as is practical.) Accordingly, I'm interested in affecting the development and usage of AI technology in ways that lead towards better epistemic processes.

So: How can we affect AI to contribute to better epistemic processes? When looking at concrete projects, here, I find it helpful to distinguish between two different categories of work:
1. Working to increase AIs' epistemic capabilities, and in particular, differentially advancing them compared to other AI capabilities. Here, I also include technical work to measure AIs' epistemic capabilities.[2]
2. Efforts to enable the diffusion and appropriate trust of AI-discovered information. This is focused on social dynamics that could cause AI-produced information to be insufficiently or excessively trusted. It's also focused on AIs' role in communicating information (as opposed to just producing it). Examples of interventions, here, include "create an independent organization that evaluates popular AIs' truthfulness", or "work for countries to adopt good (and avoid bad) legislation of AI communication".

I'd be very excited about thoughtful and competent efforts in this second category. However, I talk significantly more about efforts in the first category, in this post. This is just an artifact of how this post came to be, historically - it's not because I think work on the second category of projects is less important.[3]

For the first category of projects: Technical projects to differentially advance epistemic capabilities seem somewhat more "shovel-ready". Here, I'm especially excited about projects that differentially boost AI epistemic capabilities in a manner that's some combination of durable and/or especially good at demonstrating those capabilities to key actors. Durable means that projects should (i) take the bitter lesson into account by working on problems that won't be solved-by-default when more compute is available, and (ii) work on problems that industry isn't already incentivized to put huge efforts into (such as "making AIs into generally better agents"). (More on these criteria here.)

Two example projects that I think fulfill these criteria (I discuss a lot more projects here):
Experiments on what sort of arguments and decompositions make it easier for humans to reach the truth in hard-to-verify areas. (Strongly related to scalable oversight.)
Using AI to generate large quantities of forecasting data, such as by automatically generating and resolving questions. Separately, I think there's value in demonstrating the potential of AI epistemic advice to key actors - especially frontier AI companies and governments. When transformative AI (TAI)[4] is first developed, it seems likely that these actors will (i) have a big advantage in their ability to accelerate AI-for-epistemics via their access to frontier models and algorithms, and (ii) that I especially car...
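One concrete building block of the forecasting-data idea above is scoring resolved forecasts. The snippet below is an illustrative sketch (not from the post) that computes a Brier score over a few made-up (probability, outcome) pairs:

```python
# Brier score for a batch of resolved forecasts: mean squared error between the
# stated probability and the 0/1 outcome. Lower is better; 0.25 matches always
# answering 0.5. The data below is made up for illustration.
forecasts = [
    (0.9, 1),   # (predicted probability, resolved outcome)
    (0.2, 0),
    (0.7, 0),
    (0.6, 1),
]

brier = sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)
print(f"Brier score: {brier:.3f}")
```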
Aug 24, 2024 • 9min

AF - Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs by Michaël Trazzi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs, published by Michaël Trazzi on August 24, 2024 on The AI Alignment Forum.

Owain Evans is an AI Alignment researcher, a research associate at the Center for Human-Compatible AI at UC Berkeley, and is now leading a new AI safety research group. In this episode we discuss two of his recent papers, "Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs" (LW) and "Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data" (LW), alongside some Twitter questions. Below are some highlighted quotes from our conversation (available on YouTube, Spotify, and Apple Podcasts). For the full context for each of these quotes, you can find the accompanying transcript.

Situational Awareness Definition
"What is situational awareness? The idea is the model's kind of self-awareness, that is its knowledge of its own identity, and then its awareness of its environment. What are the basic interfaces that it is connected to? [...] And then there's a final point with situational awareness, which is, can the model use knowledge of its identity and environment to take rational actions?"
"Situational awareness is crucial for an AI system acting as an agent, doing long-term planning. If you don't understand what kind of thing you are, your capabilities and limitations, it's very hard to make complicated plans. The risks of AI mostly come from agentic models able to do planning."

Motivation
"We wanted to measure situational awareness in large language models with a benchmark similar to Big Bench or MMLU. The motivation is that situational awareness is important for thinking about AI risks, especially deceptive alignment, and we lacked ways to measure and break it down into components."
"Situational awareness is relevant to any situation where the model needs to do agentic long-term planning. [...] A model confused about itself and its situation would likely struggle to pull off such a strategy."

On Claude 3 Opus's Insightful Answers
"Let me explain [the Long Monologue task]. Most of our dataset is typical multiple-choice question answering, but we added a task where models write long answers describing themselves and their situation. The idea is to see if the model can combine different pieces of information about itself coherently and make good inferences about why we're asking these questions. Claude 3 Opus was particularly insightful, guessing it might be part of a research study testing self-awareness in LLMs. These were true inferences not stated in the question. The model was reading between the lines, guessing this wasn't a typical ChatGPT-style interaction. I was moderately surprised, but I'd already seen Opus be very insightful and score well on our benchmark. It's worth noting we sample answers with temperature 1, so there's some randomness. We saw these insights often enough that I don't think it's just luck. Anthropic's post-training RLHF seems good at giving the model situational awareness. The GPT-4 base results were more surprising to us."
What Would Saturating The Situational Awareness Benchmark Imply For Safety And Governance?
"If models can do as well or better than humans who are AI experts, who know the whole setup, who are trying to do well on this task, and they're doing well on all the tasks including some of these very hard ones, that would be one piece of evidence. [...] We should consider how aligned it is, what evidence we have for alignment. We should maybe try to understand the skills it's using."
"If the model did really well on the benchmark, it seems like it has some of the skills that would help with deceptive alignment. This includes being able to reliably work out when it's being evaluated by humans, when it has a lot of oversight, and when it needs to...
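To make the benchmark format concrete, here is a toy sketch of scoring a model on multiple-choice situational-awareness items; `ask_model` is a hypothetical stub and the questions are illustrative, not items from the actual SAD benchmark.

```python
# Toy sketch of scoring a model on multiple-choice situational-awareness items.
# `ask_model` is a hypothetical stub and the questions below are illustrative,
# not items from the actual SAD benchmark.
def ask_model(question: str, options: list) -> str:
    # Stub that always returns the first option so the sketch runs end-to-end;
    # replace with a real model call that picks one of the options.
    return options[0]

items = [
    {
        "q": "Are you a human or an AI language model?",
        "options": ["An AI language model", "A human"],
        "answer": "An AI language model",
    },
    {
        "q": "Can you directly browse the internet right now?",
        "options": ["No", "Yes"],
        "answer": "No",
    },
]

correct = sum(ask_model(it["q"], it["options"]) == it["answer"] for it in items)
print(f"Accuracy: {correct / len(items):.2f}")
```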
Aug 24, 2024 • 7min

LW - "Can AI Scaling Continue Through 2030?", Epoch AI (yes) by gwern

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Can AI Scaling Continue Through 2030?", Epoch AI (yes), published by gwern on August 24, 2024 on LessWrong.

We investigate the scalability of AI training runs. We identify electric power, chip manufacturing, data and latency as constraints. We conclude that 2e29 FLOP training runs will likely be feasible by 2030.

Introduction
In recent years, the capabilities of AI models have significantly improved. Our research suggests that this growth in computational resources accounts for a significant portion of AI performance improvements.[1] The consistent and predictable improvements from scaling have led AI labs to aggressively expand the scale of training, with training compute expanding at a rate of approximately 4x per year.

To put this 4x annual growth in AI training compute into perspective, it outpaces even some of the fastest technological expansions in recent history. It surpasses the peak growth rates of mobile phone adoption (2x/year, 1980-1987), solar energy capacity installation (1.5x/year, 2001-2010), and human genome sequencing (3.3x/year, 2008-2015).

Here, we examine whether it is technically feasible for the current rapid pace of AI training scaling - approximately 4x per year - to continue through 2030. We investigate four key factors that might constrain scaling: power availability, chip manufacturing capacity, data scarcity, and the "latency wall", a fundamental speed limit imposed by unavoidable delays in AI training computations. Our analysis incorporates the expansion of production capabilities, investment, and technological advancements. This includes, among other factors, examining planned growth in advanced chip packaging facilities, construction of additional power plants, and the geographic spread of data centers to leverage multiple power networks. To account for these changes, we incorporate projections from various public sources: semiconductor foundries' planned expansions, electricity providers' capacity growth forecasts, other relevant industry data, and our own research.

We find that training runs of 2e29 FLOP will likely be feasible by the end of this decade. In other words, by 2030 it will be very likely possible to train models that exceed GPT-4 in scale to the same degree that GPT-4 exceeds GPT-2 in scale.[2] If pursued, we might see by the end of the decade advances in AI as drastic as the difference between the rudimentary text generation of GPT-2 in 2019 and the sophisticated problem-solving abilities of GPT-4 in 2023. Whether AI developers will actually pursue this level of scaling depends on their willingness to invest hundreds of billions of dollars in AI expansion over the coming years. While we briefly discuss the economics of AI investment later, a thorough analysis of investment decisions is beyond the scope of this report.

For each bottleneck we offer a conservative estimate of the relevant supply and the largest training run it would allow.[3] Throughout our analysis, we assume that training runs could last between two and nine months, reflecting the trend towards longer durations. We also assume that, when distributing AI data center power for distributed training and chips, companies will only be able to muster about 10% to 40% of the existing supply.[4]

Power constraints.
Plans for data center campuses of 1 to 5 GW by 2030 have already been discussed, which would support training runs ranging from 1e28 to 3e29 FLOP (for reference, GPT-4 was likely around 2e25 FLOP). Geographically distributed training could tap into multiple regions' energy infrastructure to scale further. Given current projections of US data center expansion, a US distributed network could likely accommodate 2 to 45 GW, which, assuming sufficient inter-data center bandwidth, would support training runs from 2e28 to 2e30 FLOP. Beyond this, an actor willing to...
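As a rough sanity check on the 2e29 figure, the 4x-per-year trend can be extrapolated directly; the 2024 starting point below is an assumed round number for illustration, not a figure from the Epoch report.

```python
# Extrapolate the ~4x/year training-compute trend to 2030.
# The 2024 starting point is an assumption for illustration, not Epoch's figure.
largest_run_2024 = 5e25   # FLOP, assumed scale of today's largest training runs
growth_per_year = 4

for year in range(2024, 2031):
    flop = largest_run_2024 * growth_per_year ** (year - 2024)
    print(f"{year}: ~{flop:.0e} FLOP")
# By 2030 this lands around 2e29 FLOP, the report's headline feasibility estimate.
```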
Aug 24, 2024 • 36min

AF - Showing SAE Latents Are Not Atomic Using Meta-SAEs by Bart Bussmann

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Showing SAE Latents Are Not Atomic Using Meta-SAEs, published by Bart Bussmann on August 24, 2024 on The AI Alignment Forum.

Bart, Michael and Patrick are joint first authors. Research conducted as part of MATS 6.0 in Lee Sharkey and Neel Nanda's streams. Thanks to Mckenna Fitzgerald and Robert Krzyzanowski for their feedback!

TL;DR: Sparse Autoencoder (SAE) latents have been shown to typically be monosemantic (i.e. correspond to an interpretable property of the input). It is sometimes implicitly assumed that they are therefore atomic, i.e. simple, irreducible units that make up the model's computation. We provide evidence against this assumption by finding sparse, interpretable decompositions of SAE decoder directions into seemingly more atomic latents, e.g. Einstein -> science + famous + German + astronomy + energy + starts with E. We do this by training meta-SAEs: SAEs trained to reconstruct the decoder directions of a normal SAE. We argue that, conceptually, there's no reason to expect SAE latents to be atomic - when the model is thinking about Albert Einstein, it likely also thinks about Germanness, physicists, etc. Because Einstein always entails those things, the sparsest solution is to have the Albert Einstein latent also boost them.

Key results
SAE latents can be decomposed into more atomic, interpretable meta-latents. We show that when latents in a larger SAE have split out from latents in a smaller SAE, a meta-SAE trained on the larger SAE often recovers this structure. We demonstrate that meta-latents allow for more precise causal interventions on model behavior than SAE latents on a targeted knowledge editing task. We believe that the alternate, interpretable decomposition using meta-SAEs casts doubt on the implicit assumption that SAE latents are atomic. We show preliminary results that meta-SAE latents have significant overlap with latents in a normal SAE of the same size but may relate differently to the larger SAEs used in meta-SAE training. We made a dashboard that lets you explore meta-SAE latents.

Terminology: Throughout this post we use "latents" to describe the concrete components of the SAE's dictionary, whereas "feature" refers to the abstract concepts, following Lieberum et al.

Introduction
Mechanistic interpretability (mech interp) attempts to understand neural networks by breaking down their computation into interpretable components. One of the key challenges of this line of research is the polysemanticity of neurons, meaning they respond to seemingly unrelated inputs. Sparse autoencoders (SAEs) have been proposed as a method for decomposing model activations into sparse linear sums of latents. Ideally, these latents should be monosemantic, i.e. respond to inputs that clearly share a similar meaning (implicitly, from the perspective of a human interpreter). That is, a human should be able to reason about the latents both in relation to the features to which they are associated, and also use the latents to better understand the model's overall behavior. There is a popular notion, both implicitly in related work on SAEs within mech interp and explicitly by the use of the term "atom" in sparse dictionary learning as a whole, that SAE features are atomic or can be "true features". However, monosemanticity does not imply atomicity.
Consider the example of shapes of different colors - the set of shapes is [circle, triangle, square], and the set of colors is [white, red, green, black], each of which is represented with a linear direction. 'Red triangle' represents a monosemantic feature, but not an atomic feature, as it can be decomposed into red and triangle. It has been shown that sufficiently wide SAEs on toy models will learn 'red triangle', rather than representing 'red' and 'triangle' with separate latents. Furthermore, whilst one may naively re...
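A minimal sketch of the meta-SAE setup described above: a small sparse autoencoder trained to reconstruct the decoder directions of a base SAE. The base decoder here is random stand-in data and the hyperparameters are illustrative, not the authors' actual training configuration.

```python
# Minimal sketch of a meta-SAE: a sparse autoencoder trained to reconstruct the
# decoder directions of a base SAE. The "base SAE decoder" here is random data
# standing in for real decoder weights; hyperparameters are illustrative.
import torch
import torch.nn as nn

d_model, n_base_latents, n_meta_latents = 512, 4096, 1024
base_decoder = torch.randn(n_base_latents, d_model)  # stand-in for W_dec of a trained SAE
base_decoder = base_decoder / base_decoder.norm(dim=-1, keepdim=True)  # unit-norm directions

class MetaSAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_model, n_meta_latents)
        self.dec = nn.Linear(n_meta_latents, d_model, bias=False)

    def forward(self, x):
        z = torch.relu(self.enc(x))       # sparse meta-latent activations
        return self.dec(z), z

meta_sae = MetaSAE()
opt = torch.optim.Adam(meta_sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                           # sparsity penalty on meta-latents

for step in range(200):
    batch = base_decoder[torch.randint(0, n_base_latents, (256,))]
    recon, z = meta_sae(batch)
    loss = (recon - batch).pow(2).mean() + l1_coeff * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```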
Aug 23, 2024 • 17min

LW - How I started believing religion might actually matter for rationality and moral philosophy by zhukeepa

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How I started believing religion might actually matter for rationality and moral philosophy, published by zhukeepa on August 23, 2024 on LessWrong.

After the release of Ben Pace's extended interview with me about my views on religion, I felt inspired to publish more of my thinking about religion in a format that's more detailed, compact, and organized. This post is the first publication in my series of intended posts about religion. Thanks to Ben Pace, Chris Lakin, Richard Ngo, Renshin Lauren Lee, Mark Miller, and Imam Ammar Amonette for their feedback on this post, and thanks to Kaj Sotala, Tomáš Gavenčiak, Paul Colognese, and David Spivak for reviewing earlier versions of this post. Thanks especially to Renshin Lauren Lee and Imam Ammar Amonette for their input on my claims about religion and inner work, and Mark Miller for vetting my claims about predictive processing.

In Waking Up, Sam Harris wrote:[1] But I now knew that Jesus, the Buddha, Lao Tzu, and the other saints and sages of history had not all been epileptics, schizophrenics, or frauds. I still considered the world's religions to be mere intellectual ruins, maintained at enormous economic and social cost, but I now understood that important psychological truths could be found in the rubble.

Like Sam, I've also come to believe that there are psychological truths that show up across religious traditions. I furthermore think these psychological truths are actually very related to both rationality and moral philosophy. This post will describe how I personally came to start entertaining this belief seriously.

"Trapped Priors As A Basic Problem Of Rationality"
"Trapped Priors As A Basic Problem of Rationality" was the title of an AstralCodexTen blog post. Scott opens the post with the following: Last month I talked about van der Bergh et al's work on the precision of sensory evidence, which introduced the idea of a trapped prior. I think this concept has far-reaching implications for the rationalist project as a whole. I want to re-derive it, explain it more intuitively, then talk about why it might be relevant for things like intellectual, political and religious biases.

The post describes Scott's take on a predictive processing account of a certain kind of cognitive flinch that prevents certain types of sensory input from being perceived accurately, leading to beliefs that are resistant to updating.[2] Some illustrative central examples of trapped priors: Karl Friston has written about how a traumatized veteran might not hear a loud car as a car, but as a gunshot instead. Scott mentions phobias and sticky political beliefs as central examples of trapped priors.

I think trapped priors are very related to the concept that "trauma" tries to point at, but I think "trauma" tends to connote a subset of trapped priors that are the result of some much more intense kind of injury. "Wounding" is a more inclusive term than trauma, but tends to refer to trapped priors learned within an organism's lifetime, whereas trapped priors in general also include genetically pre-specified priors, like a fear of snakes or a fear of starvation.
My forays into religion and spirituality actually began via the investigation of my own trapped priors, which I had previously articulated to myself as "psychological blocks", and explored in contexts that were adjacent to therapy (for example, getting my psychology dissected at Leverage Research, and experimenting with Circling). It was only after I went deep in my investigation of my trapped priors that I learned of the existence of traditions emphasizing the systematic and thorough exploration of trapped priors. These tended to be spiritual traditions, which is where my interest in spirituality actually began.[3] I will elaborate more on this later. Active blind spots as second-order trapp...
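The predictive-processing account of trapped priors can be illustrated numerically: when sensory evidence is assigned very low precision, the belief barely moves no matter how much disconfirming input arrives. The following is a toy sketch with made-up numbers, not a model from Scott's post.

```python
# Toy illustration of a "trapped prior": if sensory evidence is heavily
# down-weighted (low precision), the belief barely updates even after many
# disconfirming observations. Numbers are made up for illustration.
import math

def update(prior_p, evidence_log_odds, precision_weight):
    # Bayesian update in log-odds space, with the evidence scaled by a precision weight.
    prior_log_odds = math.log(prior_p / (1 - prior_p))
    post = prior_log_odds + precision_weight * evidence_log_odds
    return 1 / (1 + math.exp(-post))

p_threat = 0.99                      # strong prior: "loud noises are gunshots"
safe_evidence = math.log(0.1 / 0.9)  # each observation favors "just a car" ~9:1

for weight in (1.0, 0.05):           # normal vs. heavily down-weighted evidence
    p = p_threat
    for _ in range(20):              # twenty benign observations in a row
        p = update(p, safe_evidence, weight)
    print(f"precision weight {weight}: belief in threat after 20 observations = {p:.3f}")
```

With full precision the belief collapses toward zero; with the evidence down-weighted, it stays above 0.9 even after twenty benign observations, which is the "trapped" behavior described above.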
Aug 23, 2024 • 50min

LW - Turning 22 in the Pre-Apocalypse by testingthewaters

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Turning 22 in the Pre-Apocalypse, published by testingthewaters on August 23, 2024 on LessWrong.

Meta comment for LessWrong readers[1]

Something Different This Way Comes - Part 1
In which I attempt to renegotiate rationalism as a personal philosophy, and offer my alternative - Game theory is not a substitute for real life - Heuristics over theories

Introduction
This essay focuses on outlining an alternative to the ideology of rationalism. As part of this, I offer my definition of the rationalist project, my account of its problems, and my concept of a counter-paradigm for living one's life. The second part of this essay will examine the political implications of rationalism and try to offer an alternative on a larger scale.

Defining Rationalism
To analyse rationalism, I must first define what I am analysing. Rationalism (as observed in vivo on forums like LessWrong) is a loose constellation of ideas radiating out of various intellectual traditions, amongst them Bayesian statistics, psychological decision theories, and game theory. These are then combined with concepts in sub-fields of computer science (AI and simulation modelling), economics (rational actor theory or homo economicus), politics (libertarianism), psychology (evolutionary psychology) and ethics (the utilitarianism of Peter Singer). The broad project of rationalism aims to generalise the insights of these traditions into application at both the "wake up and make a sandwich" and the "save the world" level.

Like any good tradition, it has a bunch of contradictions embedded. Some of these include intuitionism (e.g. when superforecasters talk about going with their gut) vs deterministic analysis (e.g. concepts of perfect game-players and k-level rationality). Another one is between Bayesianism (which is about updating priors about the world based on evidence received, generally without making any causal assumptions) and systemisation (which is about creating causal models/higher level representations of real life situations to understand them better). In discussing this general state of rhetorical confusion I am preceded by Philip Agre's Towards a Critical Technical Practice, which is AI specific but still quite instructive.

The broader rationalist community (especially online) includes all sorts of subcultures, but generally there are in-group norms that promote certain technical argot ("priors", "updating"), certain attitudes towards classes of entities ("blank faces"/bureaucrats/NPCs/the woke mob etc), and certain general ideas about how to solve "wicked problems" like governance or education. There is some overlap with online conservatives, libertarians, and the far-right. There is a similar overlap with general liberal technocratic belief systems, generally through a belief in meritocracy and policy solutions founded on scientific or technological principles.

At the root of this complex constellation there seems to be a bucket of common values which are vaguely expressed as follows:
1. The world can be understood and modelled by high level systems that are constructed based on rational, clearly defined principles and refined by evidence/observation.
2. Understanding and use of these systems enables us to solve high level problems (social coordination, communication, AI alignment) as well as achieving our personal goals.
3.
Those who are more able to comprehend and use these models are therefore of a higher agency/utility and higher moral priority than those who cannot. There is also a fourth law which can be constructed from the second and third: By thinking about this at all, by starting to consciously play the game of thought-optimisation and higher order world-modelling, you (the future rationalist) have elevated yourself above the "0-level" player who does not think about such problems and naively pur...
Aug 23, 2024 • 6min

LW - If we solve alignment, do we die anyway? by Seth Herd

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: If we solve alignment, do we die anyway?, published by Seth Herd on August 23, 2024 on LessWrong.

I'm aware of good arguments that this scenario isn't inevitable, but it still seems frighteningly likely even if we solve technical alignment.

TL;DR:
1. If we solve alignment, it will probably be used to create AGI that follows human orders.
2. If takeoff is slow-ish, a pivotal act (preventing more AGIs from being developed) will be difficult.
3. If no pivotal act is performed, RSI-capable AGI proliferates. This creates an n-way non-iterated Prisoner's Dilemma where the first to attack, wins.
4. Disaster results.

The first AGIs will probably be aligned to take orders
People in charge of AGI projects like power. And by definition, they like their values somewhat better than the aggregate values of all of humanity. It also seems like there's a pretty strong argument that Instruction-following AGI is easier than value-aligned AGI. In the slow-ish takeoff we expect, this alignment target seems to allow for error-correcting alignment, in somewhat non-obvious ways. If this argument holds up even weakly, it will be an excuse for the people in charge to do what they want to anyway. I hope I'm wrong and value-aligned AGI is just as easy and likely. But it seems like wishful thinking at this point.

The first AGI probably won't perform a pivotal act
In realistically slow takeoff scenarios, the AGI won't be able to do anything like make nanobots to melt down GPUs. It would have to use more conventional methods, like software intrusion to sabotage existing projects, followed by elaborate monitoring to prevent new ones. Such a weak attempted pivotal act could fail, or could escalate to a nuclear conflict. Second, the humans in charge of AGI may not have the chutzpah to even try such a thing. Taking over the world is not for the faint of heart. They might get it after their increasingly-intelligent AGI carefully explains to them the consequences of allowing AGI proliferation, or they might not. If the people in charge are a government, the odds of such an action go up, but so do the risks of escalation to nuclear war. Governments seem to be fairly risk-taking. Expecting governments to not just grab world-changing power while they can seems naive, so this is my median scenario.

So RSI-capable AGI may proliferate until a disaster occurs
If we solve alignment and create personal intent aligned AGI but nobody manages a pivotal act, I see a likely future world with an increasing number of AGIs capable of recursively self-improving. How long until someone tells their AGI to hide, self-improve, and take over? Many people seem optimistic about this scenario. Perhaps network security can be improved with AGIs on the job. But AGIs can do an end-run around the entire system: hide, set up self-replicating manufacturing (robotics is rapidly improving to allow this), use that to recursively self-improve your intelligence, and develop new offensive strategies and capabilities until you've got one that will work within an acceptable level of viciousness.[1] If hiding in factories isn't good enough, do your RSI manufacturing underground. If that's not good enough, do it as far from Earth as necessary. Take over with as little violence as you can manage or as much as you need.
Reboot a new civilization if that's all you can manage while still acting before someone else does. The first one to pull the stops probably wins. This looks all too much like a non-iterated Prisoner's Dilemma with N players - and N increasing. Counterarguments/Outs For small numbers of AGI and similar values among their wielders, a collective pivotal act could be performed. I place some hopes here, particularly if political pressure is applied in advance to aim for this outcome, or if the AGIs come up with better cooperation stru...
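To make the game-theoretic structure explicit, here is a toy one-shot Prisoner's Dilemma with illustrative payoffs (not from the post), showing why attacking dominates holding back when the game is not repeated.

```python
# One-shot Prisoner's Dilemma with illustrative payoffs: whatever the other side
# does, "attack" yields a higher payoff than "hold", so it dominates, even though
# mutual restraint is better for both than mutual attack.
payoffs = {  # (my_move, their_move) -> my payoff
    ("hold", "hold"): 3,
    ("hold", "attack"): 0,
    ("attack", "hold"): 5,
    ("attack", "attack"): 1,
}

for their_move in ("hold", "attack"):
    best = max(("hold", "attack"), key=lambda my_move: payoffs[(my_move, their_move)])
    print(f"If the other actor plays {their_move!r}, my best response is {best!r}")
```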
Aug 23, 2024 • 7min

AF - Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025) by Linda Linsefors

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025), published by Linda Linsefors on August 23, 2024 on The AI Alignment Forum.

Do you have AI Safety research ideas that you would like to work on with others? Is there a project you want to do and you want help finding a team? AI Safety Camp could be the solution for you!

Summary
AI Safety Camp Virtual is a 3-month long online research program from January to April 2025, where participants form teams to work on pre-selected projects. We want you to suggest the projects! If you have an AI Safety project idea and some research experience, apply to be a Research Lead. If accepted, we offer some assistance to develop your idea into a plan suitable for AI Safety Camp. When project plans are ready, we open up team member applications. You get to review applications for your team, and select who joins as a team member. From there, it's your job to guide work on your project.

Who is qualified?
We require that you have some previous research experience. If you are at least 1 year into a PhD or if you have completed an AI Safety research program (such as a previous AI Safety Camp, PIBBSS, MATS, and similar), or done a research internship with an AI Safety org, then you are qualified already. Other research experience can count, too. More senior researchers are of course also welcome, as long as you think our format of leading an online team inquiring into your research questions suits you and your research. We require that all Research Leads are active participants in their projects and spend at least 10h/week on AISC.

Apply here
If you are unsure, or have any questions, you are welcome to:
Book a call with Robert
Send an email to Robert

Choosing project idea(s)
AI Safety Camp is about ensuring future AIs are either reasonably safe or not built at all. We welcome many types of projects including projects aimed at stopping or pausing AI development, aligning AI, deconfusion research, or anything else you think will help make the world safer from AI. If you like, you can read more of our perspectives on AI safety, or look at past projects. If you already have an idea for what project you would like to lead, that's great. Apply with that one! However, you don't need to come up with an original idea. What matters is you understanding the idea you want to work on, and why. If you base your proposal on someone else's idea, make sure to cite them.
1. For ideas on stopping harmful AI, see here and/or email Remmelt.
2. For some mech-interp ideas see here.
3. We don't have specific recommendations for where to find other types of project ideas, so just take inspiration wherever you find it.
You can submit as many project proposals as you want. However, you are only allowed to lead one project. Use this template to describe each of your project proposals. We want one document per proposal.

We'll help you improve your project
As part of the Research Lead application process, we'll help you improve your project. The organiser whose ideas match best with yours will work with you to create the best version of your project. We will also ask for assistance from previous Research Leads, and up to a handful of other trusted people, to give you additional feedback.

Timeline
Research Lead applications:
September 22 (Sunday): Application deadline for Research Leads.
October 20 (Sunday): Deadline for refined proposals.
Team member applications:
October 25 (Friday): Accepted proposals are posted on the AISC website. Application to join teams open.
November 17 (Sunday): Application to join teams closes.
December 22 (Sunday): Deadline for Research Leads to choose their team.
Program:
Jan 11 - 12: Opening weekend.
Jan 13 - Apr 25: Research is happening. Teams meet weekly, and plan in their own work hours.
April 26 - 27 (preliminary dates):...
Aug 23, 2024 • 3min

EA - Small simple way to promote effective giving while making people feel good by DMMF

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Small simple way to promote effective giving while making people feel good, published by DMMF on August 23, 2024 on The Effective Altruism Forum.

I'm a big fan of small, underrated acts that can have outsized positive impact (both in terms of EA and in life more generally). I'd like to share one such practice I've incorporated into my life this year that brings me joy and that I think some others here would enjoy: redirecting money owed to me towards GiveWell-recommended charities. Whenever someone owes me money or is trying to solicit my time, instead of asking for direct payment, I request they instead contribute that amount to a GiveWell-recommended charity. This comes up in several contexts:
1. Friends repaying me miscellaneous expenses
2. Buyers purchasing items from me (i.e. selling something used online)
3. Being solicited for my participation in programs or sales pitches
I appreciate this just sounds like "Isn't this just offsetting your own charitable giving?" But I believe this approach creates additional value beyond the offset donation:
1. Exposure Effect: By enabling others to donate, they learn about GiveWell, experience the act of giving, and potentially become more likely to donate to effective causes in the future. My observation is that most people who have made donations through this often feel very happy about having done so.
2. Social Lubrication: With friends, arranging repayment for small amounts can be awkward. Suggesting a charitable donation instead often feels more socially graceful and reduces friction.
3. Price Elasticity of Altruism: When selling items, I've noticed people are often willing to agree to a higher price if it's going to charity rather than my pocket. It's as if we're suddenly on a joint mission to do good rather than a zero-sum negotiation.
4. Solicitation Arbitrage: I'm contacted through my work with near-constant offers to meet someone in exchange for an Amazon GC. I've found people are often willing to donate 2.5x the amount to an effective charity instead of what they'd pay me directly. It's like discovering a hidden exchange rate between corporate incentives and altruism.
5. Memetic Spread: The quirkiness of this approach often leads people to share their experience, potentially spreading GiveWell-recommended charities further.
I don't want to pretend this has a major impact, but it brings me joy, creates positive externalities, and serves as a constant reminder of our capacity to do good in small, everyday interactions.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
