The Nonlinear Library

The Nonlinear Fund
Apr 29, 2024 • 3min

AF - AISC9 has ended and there will be an AISC10 by Linda Linsefors

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AISC9 has ended and there will be an AISC10, published by Linda Linsefors on April 29, 2024 on The AI Alignment Forum. The 9th AI Safety Camp (AISC9) just ended, and as usual, it was a success! Follow this link to find project summaries, links to their outputs, recordings of the end-of-camp presentations, and contact info for all our teams in case you want to engage more. AISC9 had both the largest number of participants (159) and the smallest number of staff (2) of all the camps we've done so far. Remmelt and I have proven that, if necessary, we can do this with just the two of us, and luckily our fundraising campaign raised just enough money to pay me and Remmelt to do one more AISC. After that, the future is more uncertain, but that's almost always the case for small non-profit projects. Get involved in AISC10 AISC10 will follow the same format and timeline (shifted by one year) as AISC9. Approximate timeline August: Planning and preparation for the organisers September: Research lead applications are open October: We help the research lead applicants improve their project proposals November: Team member applications are open December: Research leads interview and select their teams Mid-January to mid-April: The camp itself, i.e. each team works on their project. Help us give feedback on the next round of AISC projects An important part of AISC is that we give individual feedback on all project proposals we receive. This is very staff-intensive and the biggest bottleneck in our system. If you're interested in helping with giving feedback on proposals in certain research areas, please email me at linda.linsefors@gmail.com. Apply as a research lead - Applications will open in September As an AISC research lead, you will both plan and lead your project. The AISC staff, and any volunteer helpers (see previous section), will provide you with feedback on your project proposal. If your project proposal is accepted, we'll help you recruit a team to help you realise your plans. We'll broadcast your project, together with all the other accepted project proposals. We'll provide structure and guidance for the team recruitment. You will choose which applications you think are promising, do the interviews, and make the final selection for your project team. If you are unsure if this is for you, you're welcome to contact Remmelt (remmelt@aisafety.camp) specifically for stop/pause AI projects, or me (linda.linsefors@gmail.com) for anything else. Apply as a team member - Applications will open in November Approximately at the start of November, we'll share all the accepted project proposals on our website. If you have some time to spare in January-April 2025, you should read them and apply to the projects you like. We have room for more funding Our stipends pot is empty or very close to empty (accounting for AISC9 is not finalised). If you want to help rectify this by adding some more money, please contact remmelt@aisafety.camp. If we get some funding for this, but not enough for everyone, we will prioritise giving stipends to people from low- and mid-income countries, because we believe that a little money goes a long way for these participants. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Apr 29, 2024 • 21min

LW - [Aspiration-based designs] 1. Informal introduction by B Jacobs

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] 1. Informal introduction, published by B Jacobs on April 29, 2024 on LessWrong. Sequence Summary. This sequence documents research by SatisfIA, an ongoing project on non-maximizing, aspiration-based designs for AI agents that fulfill goals specified by constraints ("aspirations") rather than maximizing an objective function. We aim to contribute to AI safety by exploring design approaches and their software implementations that we believe might be promising but neglected or novel. Our approach is roughly related to but largely complementary to concepts like quantilization and satisficing (sometimes called "soft-optimization"), Decision Transformers, and Active Inference. This post describes the purpose of the sequence, motivates the research, describes the project status, our working hypotheses and theoretical framework, and has a short glossary of terms. It does not contain results and can safely be skipped if you want to get directly into the actual research. Epistemic status: We're still in the exploratory phase, and while the project has yielded some preliminary insights, we don't have any clear conclusions at this point. Our team holds a wide variety of opinions about the discoveries. Nothing we say is set in stone. Purpose of the sequence Inform: We aim to share our current ideas, thoughts, disagreements, open questions, and any results we have achieved thus far. By openly discussing the complexities and challenges we face, we seek to provide a transparent view of our project's progression and the types of questions we're exploring. Receive Feedback: We invite feedback on our approaches, hypotheses, and findings. Constructive criticism, alternative perspectives, and further suggestions are all welcome. Attract Collaborators: Through this sequence, we hope to resonate with other researchers and practitioners who our exploration appeals to and who are motivated by similar questions. Our goal is to expand our team with individuals who can contribute their unique expertise and insights. Motivation We share a general concern regarding the trajectory of Artificial General Intelligence (AGI) development, particularly the risks associated with creating AGI agents designed to maximize objective functions. We have two main concerns: (I) AGI development might be inevitable (We assume this concern needs no further justification) (II) It might be impossible to implement an objective function the maximization of which would be safe The conventional view on A(G)I agents (see, e.g., Wikipedia) is that they should aim to maximize some function of the state or trajectory of the world, often called a "utility function", sometimes also called a "welfare function". It tacitly assumes that there is such an objective function that can adequately make the AGI behave in a moral way. However, this assumption faces several significant challenges: Moral ambiguity: The notion that a universally acceptable, safe utility function exists is highly speculative. Given the philosophical debates surrounding moral cognitivism and moral realism and similar debates in welfare economics, it is possible that there are no universally agreeable moral truths, casting doubt on the existence of a utility function that encapsulates all relevant ethical considerations. 
Historical track-record: Humanity's long-standing struggle to define and agree upon universal values or ethical standards raises skepticism about our capacity to discover or construct a comprehensive utility function that safely governs AGI behavior (Outer Alignment) in time. Formal specification and Tractability: Even if a theoretically safe and comprehensive utility function could be conceptualized, the challenges of formalizing such a function into a computable and tractable form are immense. This includes the dif...
Apr 28, 2024 • 3min

AF - [Aspiration-based designs] Outlook: dealing with complexity by Jobst Heitzig

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] Outlook: dealing with complexity, published by Jobst Heitzig on April 28, 2024 on The AI Alignment Forum. Summary. This teaser post sketches our current ideas for dealing with more complex environments. It will ultimately be replaced by one or more longer posts describing these in more detail. Reach out if you would like to collaborate on these issues. Multi-dimensional aspirations For real-world tasks that are specified in terms of more than a single evaluation metric, e.g., how many apples to buy and how much money to spend at most, we can generalize Algorithm 2 as follows from aspiration intervals to convex aspiration sets: Assume there are d>1 many evaluation metrics ui, combined into a vector-valued evaluation metric u=(u1,…,ud). Preparation: Pick d+1 many linearly independent linear combinations fj in the space spanned by these metrics, and consider the d+1 many policies πj each of which maximizes the expected value of the corresponding function fj. Let Vj(s) and Qj(s,a) be the expected values of u when using πj in state s or after using action a in state s, respectively (see Fig. 1). Let the admissibility simplices V(s) and Q(s,a) be the simplices spanned by the vertices Vj(s) and Qj(s,a), respectively (red and violet triangles in Fig. 1). They replace the feasibility intervals used in Algorithm 2. Policy: Given a convex state-aspiration set E(s)⊆V(s) (central green polyhedron in Fig. 1), compute its midpoint (centre of mass) m and consider the d+1 segments ℓj from m to the corners Vj(s) of V(s) (dashed black lines in Fig. 1). For each of these segments ℓj, let Aj be the (nonempty!) set of actions for which ℓj intersects Q(s,a). For each a∈Aj, compute the action-aspiration E(s,a)⊆Q(s,a) by shifting a copy Cj of E(s) along ℓj towards Vj(s) until the intersection of Cj and ℓj is contained in the intersection of Q(s,a) and ℓj (half-transparent green polyhedra in Fig. 1), and then intersecting Cj with Q(s,a) to give E(s,a) (yellow polyhedra in Fig. 1). Then pick one candidate action from each Aj and randomize between these d+1 actions in proportions so that the corresponding convex combination of the sets E(s,a) is included in E(s). Note that this is always possible because m is in the convex hull of the sets Cj and the shapes of the sets E(s,a) "fit" into E(s) by construction. Aspiration propagation: After observing the successor state s', the action-aspiration E(s,a) is rescaled linearly from Q(s,a) to V(s') to give the next state-aspiration E(s'), see Fig. 2. (We also consider other variants of this general idea) Hierarchical decision making A common way of planning complex tasks is to decompose them into a hierarchy of two or more levels of subtasks. Similar to existing approaches from hierarchical reinforcement learning, we imagine that an AI system can make such hierarchical decisions as depicted in the following diagram (shown for only two hierarchical levels, but obviously generalizable to more levels): Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
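To make the geometric recipe above a bit more tangible, here is a minimal sketch (ours, not the authors') of one small ingredient: checking via barycentric coordinates whether an aspiration point lies inside an admissibility simplex spanned by d+1 vertices such as the Qj(s,a). The function names and the toy 2-D numbers are illustrative assumptions, not part of the post.

```python
# Sketch: test whether an aspiration point lies in the simplex spanned by d+1 vertices.
# Mirrors the admissibility simplices Q(s,a) described above; all names/values are illustrative.
import numpy as np

def barycentric_coords(vertices: np.ndarray, point: np.ndarray) -> np.ndarray:
    """vertices: (d+1, d) array of simplex corners; point: (d,) array.
    Returns weights w with w @ vertices == point and w.sum() == 1."""
    edges = (vertices[1:] - vertices[0]).T              # (d, d): columns are v_j - v_0
    w_rest = np.linalg.solve(edges, point - vertices[0])
    return np.concatenate(([1.0 - w_rest.sum()], w_rest))

def in_simplex(vertices, point, tol=1e-9) -> bool:
    w = barycentric_coords(np.asarray(vertices, float), np.asarray(point, float))
    return bool(np.all(w >= -tol))

# Toy 2-D example: two evaluation metrics (apples bought, money spent), three vertex policies.
Q_vertices = [[0.0, 0.0], [10.0, 4.0], [6.0, 9.0]]   # Q_j(s, a) for j = 0..2
print(in_simplex(Q_vertices, [5.0, 4.0]))   # True: aspiration is admissible for this action
print(in_simplex(Q_vertices, [9.0, 1.0]))   # False: aspiration not reachable with this action
```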
Apr 28, 2024 • 20min

AF - [Aspiration-based designs] 3. Performance and safety criteria, and aspiration intervals by Jobst Heitzig

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] 3. Performance and safety criteria, and aspiration intervals, published by Jobst Heitzig on April 28, 2024 on The AI Alignment Forum. Summary. In this post, we extend the basic algorithm by adding criteria for choosing the two candidate actions the algorithm mixes, and by generalizing the goal from making the expected Total equal a particular value to making it fall into a particular interval. We only use simple illustrative examples of performance and safety criteria and reserve the discussion of more useful criteria for later posts. Introduction: using the gained freedom to increase safety After having introduced the basic structure of our decision algorithms in the last post, in this post we will focus on the core question: How shall we make use of the freedom gained from having aspiration-type goals rather than maximization goals? After all, while there is typically only a single policy that maximizes some objective function (or very few, more or less equivalent policies), there is typically a much larger set of policies that fulfill some constraints (such as the aspiration to make the expected Total equal some desired value). More formally: Let us think of the space of all (probabilistic) policies, Π, as a compact convex subset of a high-dimensional vector space with dimension d≫1 and Lebesgue measure μ. Let us call a policy π∈Π successful iff it fulfills the specified goal, G, and let ΠG⊆Π be the set of successful policies. Then this set has typically zero measure, μ(ΠG)=0, and low dimension, dim(ΠG)≪d, if the goal is a maximization goal, but it has large dimension, dim(ΠG)≈d, for most aspiration-type goals. E.g., if the goal is to make expected Total equal an aspiration value, E[τ]=E, we typically have dim(ΠG)=d−1 but still μ(ΠG)=0. At the end of this post, we discuss how the set of successful policies can be further enlarged by switching from aspiration values to aspiration intervals to encode goals, which makes the set have full dimension, dim(ΠG)=d, and positive measure, μ(ΠG)>0. What does that mean? It means we have a lot of freedom to choose the actual policy π∈ΠG that the agent should use to fulfill an aspiration-type goal. We can try to use this freedom to choose policies that promise to be rather safe than unsafe according to some generic safety metric, similar to the impact metrics used in reward function regularization for maximizers. Depending on the type of goal, we might also want to use this freedom to choose policies that fulfill the goal in a rather desirable than undesirable way according to some goal-related performance metric. In this post, we will illustrate this with only very few, "toy" safety metrics, and one rather simple goal-related performance metric, to exemplify how such metrics might be used in our framework. In a later post, we will then discuss more sophisticated and hopefully more useful safety metrics. Let us begin with a simple goal-related performance metric since that is the most straightforward. Simple example of a goal-related performance metric Recall that in step 2 of the basic algorithm, we could make the agent pick any action a− whose action-aspiration is at most as large as the current state-aspiration, E(s,a−)≤E(s), and it can also pick any other action, a+, whose action-aspiration is at least as large as the current state-aspiration, E(s,a+)≥E(s).
This flexibility is because in steps 3 and 4 of the algorithm, the agent is still able to randomize between these two actions a−, a+ in a way that makes the expected Total, E[τ], become exactly E(s). If one had an optimization mindset, one might immediately get the idea to not only match the desired expectation for the Total, but also to minimize the variability of the Total, as measured by some suitable statistic such as its variance. In a sequential decision makin...
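To make the mixing step recalled above concrete, here is a minimal sketch (our illustration, not code from the post) of randomizing between a− and a+ so that the expected Total lands exactly on the state-aspiration E(s); the last line shows a variance-style statistic of the kind the excerpt mentions as a performance criterion. All names and numbers are made up.

```python
# Sketch of the two-action mixing step described above (illustrative, not from the post).
import random

def mixing_probability(e_minus: float, e_plus: float, e_state: float) -> float:
    """Probability of picking a+ so that p*E(s,a+) + (1-p)*E(s,a-) == E(s).
    Requires E(s,a-) <= E(s) <= E(s,a+)."""
    if e_plus == e_minus:
        return 0.0                      # both actions already match the aspiration exactly
    return (e_state - e_minus) / (e_plus - e_minus)

def pick_action(a_minus, a_plus, e_minus, e_plus, e_state, rng=random.random):
    p = mixing_probability(e_minus, e_plus, e_state)
    return a_plus if rng() < p else a_minus

# Toy numbers: action-aspirations 2.0 and 8.0, state-aspiration 6.5.
p = mixing_probability(2.0, 8.0, 6.5)
spread = p * (1 - p) * (8.0 - 2.0) ** 2        # variance contributed by the randomization itself
print(p, spread)                               # 0.75, 6.75
print(pick_action("a-", "a+", 2.0, 8.0, 6.5))  # "a+" with probability 0.75
```

In the post's framing, criteria like the variance statistic above are what break ties among the many eligible a−/a+ pairs, which is exactly the freedom this episode discusses.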
Apr 28, 2024 • 29min

AF - [Aspiration-based designs] 2. Formal framework, basic algorithm by Jobst Heitzig

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] 2. Formal framework, basic algorithm, published by Jobst Heitzig on April 28, 2024 on The AI Alignment Forum. Summary. In this post, we present the formal framework we adopt during the sequence, and the simplest form of the type of aspiration-based algorithms we study. We do this for a simple form of aspiration-type goals: making the expectation of some variable equal to some given target value. The algorithm is based on the idea of propagating aspirations along time, and we prove that the algorithm gives a performance guarantee if the goal is feasible. Later posts discuss safety criteria, other types of goals, and variants of the basic algorithm. Assumptions In line with the working hypotheses stated in the previous post, we assume more specifically the following in this post: The agent is a general-purpose AI system that is given a potentially long sequence of tasks, one by one, which it does not know in advance. Most aspects of what we discuss focus on the current task only, but some aspects relate to the fact that there will be further, unknown tasks later (e.g., the question of how much power the agent shall aim to retain at the end of the task). It possesses an overall world model that represents a good enough general understanding of how the world works. Whenever the agent is given a task, an episode begins and its overall world model provides it with a (potentially much simpler) task-specific world model that represents everything that is relevant for the time period until the agent gets a different task or is deactivated, and that can be used to predict the potentially stochastic consequences of taking certain actions in certain world states. That task-specific world model has the form of a (fully observed) Markov Decision Process (MDP) that however does not contain a reward function R but instead contains what we call an evaluation function related to the task (see 2nd to next bullet point). As a consequence of a state transition, i.e., of taking a certain action a in a certain state s and finding itself in a certain successor state s', a certain task-relevant evaluation metric changes by some amount. Importantly, we do not assume that the evaluation metric inherently encodes things of which more is better. E.g., the evaluation metric could be global mean temperature, client's body mass, x coordinate of the agent's right thumb, etc. We call the step-wise change in the evaluation metric the received Delta in that time step, denoted δ. We call its cumulative sum over all time steps of the episode the Total, denoted τ. Formally, Delta and Total play a similar role for our aspiration-based approach as the concepts of "reward" and "return" play for maximization-based approaches. The crucial difference is that our agent is not tasked to maximize Total (since the evaluation metric does not have the interpretation of "more is better") but to aim for some specific value of the Total. The evaluation function contained in the MDP specifies the expected value of δ for all possible transitions: E[δ|s,a,s'].[1] First challenge: guaranteeing the fulfillment of expectation-type goals The challenge in this post is to design a decision algorithm for tasks where the agent's goal is to make the expected (!) Total equal (!) a certain value E∈ℝ which we call the aspiration value.
[2] This is a crucial difference from a "satisficing" approach that would aim to make expected Total at least as large as E and would thus still be happy to maximize Total. Later we consider other types of tasks, both less restrictive ones (including those related to satisficing) and more specific ones that also care about other aspects of the resulting distribution of Total or states. It turns out that we can guarantee the fulfillment of this type of goal under some weak condit...
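To fix ideas, here is a minimal sketch (our own toy construction, with made-up names and numbers) of the kind of task-specific world model described above: a small MDP that carries an evaluation function for the expected Delta instead of a reward, together with a routine computing the expected Total E[τ] of a fixed policy, which is the quantity an aspiration-type goal constrains.

```python
# Sketch (illustrative, not the post's code): a finite MDP whose transitions carry an
# evaluation function E[delta | s, a, s'] instead of a reward, and the expected Total E[tau]
# of a fixed policy, which an aspiration-type goal asks to equal some value E.
from dataclasses import dataclass, field

@dataclass
class EvalMDP:
    # transitions[s][a] = list of (probability, next_state, expected_delta)
    transitions: dict = field(default_factory=dict)
    terminal: set = field(default_factory=set)

def expected_total(mdp: EvalMDP, policy: dict, state) -> float:
    """policy[s] = list of (probability, action). Returns E[tau] starting from `state`."""
    if state in mdp.terminal:
        return 0.0
    total = 0.0
    for p_a, action in policy[state]:
        for p_s, nxt, e_delta in mdp.transitions[state][action]:
            total += p_a * p_s * (e_delta + expected_total(mdp, policy, nxt))
    return total

# Toy episode: from s0, "slow" surely adds 1 then ends; "fast" adds 3 or 0 with equal odds.
mdp = EvalMDP(
    transitions={"s0": {"slow": [(1.0, "end", 1.0)],
                        "fast": [(0.5, "end", 3.0), (0.5, "end", 0.0)]}},
    terminal={"end"},
)
policy = {"s0": [(0.5, "slow"), (0.5, "fast")]}
print(expected_total(mdp, policy, "s0"))   # 1.25; an aspiration E = 1.25 would be met exactly
```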
Apr 28, 2024 • 21min

AF - [Aspiration-based designs] 1. Informal introduction by B Jacobs

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] 1. Informal introduction, published by B Jacobs on April 28, 2024 on The AI Alignment Forum. Sequence Summary. This sequence documents research by SatisfIA, an ongoing project on non-maximizing, aspiration-based designs for AI agents that fulfill goals specified by constraints ("aspirations") rather than maximizing an objective function. We aim to contribute to AI safety by exploring design approaches and their software implementations that we believe might be promising but neglected or novel. Our approach is roughly related to but largely complementary to concepts like quantilization and satisficing (sometimes called "soft-optimization"), Decision Transformers, and Active Inference. This post describes the purpose of the sequence, motivates the research, describes the project status, our working hypotheses and theoretical framework, and has a short glossary of terms. It does not contain results and can safely be skipped if you want to get directly into the actual research. Epistemic status: We're still in the exploratory phase, and while the project has yielded some preliminary insights, we don't have any clear conclusions at this point. Our team holds a wide variety of opinions about the discoveries. Nothing we say is set in stone. Purpose of the sequence Inform: We aim to share our current ideas, thoughts, disagreements, open questions, and any results we have achieved thus far. By openly discussing the complexities and challenges we face, we seek to provide a transparent view of our project's progression and the types of questions we're exploring. Receive Feedback: We invite feedback on our approaches, hypotheses, and findings. Constructive criticism, alternative perspectives, and further suggestions are all welcome. Attract Collaborators: Through this sequence, we hope to resonate with other researchers and practitioners who our exploration appeals to and who are motivated by similar questions. Our goal is to expand our team with individuals who can contribute their unique expertise and insights. Motivation We share a general concern regarding the trajectory of Artificial General Intelligence (AGI) development, particularly the risks associated with creating AGI agents designed to maximize objective functions. We have two main concerns: (I) AGI development might be inevitable (We assume this concern needs no further justification) (II) It might be impossible to implement an objective function the maximization of which would be safe The conventional view on A(G)I agents (see, e.g., Wikipedia) is that they should aim to maximize some function of the state or trajectory of the world, often called a "utility function", sometimes also called a "welfare function". It tacitly assumes that there is such an objective function that can adequately make the AGI behave in a moral way. However, this assumption faces several significant challenges: Moral ambiguity: The notion that a universally acceptable, safe utility function exists is highly speculative. Given the philosophical debates surrounding moral cognitivism and moral realism and similar debates in welfare economics, it is possible that there are no universally agreeable moral truths, casting doubt on the existence of a utility function that encapsulates all relevant ethical considerations. 
Historical track-record: Humanity's long-standing struggle to define and agree upon universal values or ethical standards raises skepticism about our capacity to discover or construct a comprehensive utility function that safely governs AGI behavior (Outer Alignment) in time. Formal specification and Tractability: Even if a theoretically safe and comprehensive utility function could be conceptualized, the challenges of formalizing such a function into a computable and tractable form are immense. This inc...
Apr 28, 2024 • 4min

LW - We are headed into an extreme compute overhang by devrandom

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We are headed into an extreme compute overhang, published by devrandom on April 28, 2024 on LessWrong. If we achieve AGI-level performance using an LLM-like approach, the training hardware will be capable of running ~1,000,000s of concurrent instances of the model. Definitions Although there is some debate about the definition of compute overhang, I believe that the AI Impacts definition matches the original use, and I prefer it: "enough computing hardware to run many powerful AI systems already exists by the time the software to run such systems is developed". A large compute overhang leads to additional risk due to faster takeoff. I use the types of superintelligence defined in Bostrom's Superintelligence book (summary here). I use the definition of AGI in this Metaculus question. The adversarial Turing test portion of the definition is not very relevant to this post. Thesis For practical reasons, the compute requirements for training LLMs are several orders of magnitude larger than what is required for running a single inference instance. In particular, a single NVIDIA H100 GPU can run inference at a throughput of about 2000 tokens/s, while Meta trained Llama3 70B on a GPU cluster[1] of about 24,000 GPUs. Assuming we require a performance of 40 tokens/s, the training cluster can run 2000/40 × 24,000 = 1,200,000 concurrent instances of the resulting 70B model. I will assume that the above ratios hold for an AGI-level model. Considering the amount of data children absorb via the vision pathway, the amount of training data for LLMs may not be that much higher than the data humans are trained on, and so the current ratios are a useful anchor. This is explored further in the appendix. Given the above ratios, we will have the capacity for ~1e6 AGI instances at the moment that training is complete. This will likely lead to superintelligence via a "collective superintelligence" approach. Additional speed may then be available via accelerators such as GroqChip, which produces 300 tokens/s for a single instance of a 70B model. This would result in a "speed superintelligence" or a combined "speed+collective superintelligence". From AGI to ASI With 1e6 AGIs, we may be able to construct an ASI, with the AGIs collaborating in a "collective superintelligence". Similar to groups of collaborating humans, a collective superintelligence divides tasks among its members for concurrent execution. AGIs derived from the same model are likely to collaborate more effectively than humans because their weights are identical. Any fine-tune can be applied to all members, and text produced by one can be understood by all members. Tasks that are inherently serial would benefit more from a speedup instead of a division of tasks. An accelerator such as GroqChip will be able to accelerate serial thought speed by a factor of 10x or more. Counterpoints It may be the case that a collective of sub-AGI models can reach AGI capability. It would be advantageous if we could achieve AGI earlier, with sub-AGI components, at a higher hardware cost per instance. This will reduce the compute overhang at the critical point in time. There may be a paradigm change on the path to AGI resulting in smaller training clusters, reducing the overhang at the critical point. Conclusion A single AGI may be able to replace one human worker, presenting minimal risk.
A fleet of 1,000,000 AGIs may give rise to a collective superintelligence. This capability is likely to be available immediately upon training the AGI model. We may be able to mitigate the overhang by achieving AGI with a cluster of sub-AGI components. Appendix - Training Data Volume A calculation of training data processed by humans during development: time: ~20 years, or 6e8 seconds; raw data input: ~10 Mb/s = 1e7 b/s; total for human training data: 6e8 s × 1e7 b/s = 6e15 bits. Llama3 training s...
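Both back-of-the-envelope estimates in this episode are easy to reproduce. The sketch below plugs in the figures quoted in the post (H100 inference throughput, the assumed 40 tokens/s target, the ~24,000-GPU Llama3 cluster, and the ~20 years at ~10 Mb/s human-development estimate); the numbers are the post's, the code is only an illustration.

```python
# Reproduces the post's two rough estimates (figures from the text; purely illustrative).

# 1. Concurrent inference instances on the training cluster.
inference_tps_per_gpu = 2000      # H100 throughput for a 70B model, tokens/s
required_tps = 40                 # assumed per-instance performance target
training_gpus = 24_000            # approximate Llama3 70B training cluster size
instances = inference_tps_per_gpu / required_tps * training_gpus
print(f"{instances:,.0f} concurrent instances")        # 1,200,000

# 2. Raw data a human processes during development (appendix).
seconds = 20 * 365 * 24 * 3600    # ~20 years, roughly 6e8 s
bits_per_second = 1e7             # ~10 Mb/s raw sensory input
print(f"{seconds * bits_per_second:.1e} bits")         # ~6e15 bits
```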
Apr 27, 2024 • 12min

LW - So What's Up With PUFAs Chemically? by J Bostock

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: So What's Up With PUFAs Chemically?, published by J Bostock on April 27, 2024 on LessWrong. This is an exploratory investigation of a new-ish hypothesis; it is not intended to be a comprehensive review of the field or even a full investigation of the hypothesis. I've always been skeptical of the seed-oil theory of obesity. Perhaps this is bad rationality on my part, but I've tended to retreat to the sniff test on issues as charged and confusing as diet. My response to the general seed-oil theory was basically "Really? Seeds and nuts? The things you just find growing on plants, and that our ancestors surely ate loads of?" But a Twitter thread recently made me take another look at it, and since I have a lot of chemistry experience I thought I'd take a look. The PUFA Breakdown Theory It goes like this: PUFAs from nuts and seeds are fine. Deep-frying using PUFAs causes them to break down in a way other fatty acids do not, and these breakdown products are the problem. Most of a fatty acid is the "tail". This consists of hydrogen atoms decorating a backbone of carbon atoms. Each carbon atom can make up to four bonds, of which two have to be to other carbons (except the end carbon, which only bonds to one carbon), leaving space for two hydrogens. When a chain has the maximum number of hydrogen atoms, we say it's "saturated". These tails have the general formula CnH2n+1: For a carbon which is saturated (i.e. has four single bonds) the bonds are arranged like the corners of a tetrahedron, and rotation around single bonds is permitted, meaning the overall assembly is like a floppy chain. Instead, we can have two adjacent carbons form a double bond, each forming one bond to hydrogen, two bonds to the adjacent carbon, and one to a different carbon: Unlike single bonds, double bonds are rigid, and if a carbon atom has a double bond, the three remaining bonds fall in a plane. This means there are two ways in which the rest of the chain can be laid out. If the carbons form a zig-zag S shape, this is a trans double bond. If they form a curved C shape, we have a cis double bond. The health dangers of trans-fatty acids have been known for a long while. They don't occur in nature (which is probably why they're so bad for us). Cis-fatty acids are very common though, especially in vegetable and, yes, seed oils. Of course there's no reason why we should stop at one double bond; we can just as easily have multiple. This gets us to the name poly-unsaturated fatty acids (PUFAs). I'll compare stearic acid (SA), oleic acid (OA), and linoleic acid (LA) for clarity: Linoleic acid is the one that seed oil enthusiasts are most interested in. We can go even further and look at α-linolenic acid, which has even more double bonds, but I think LA makes the point just fine. Three fatty acids, usually identical ones, attach to one glycerol molecule to form a triglyceride. Isomerization As I mentioned earlier, double bonds are rigid, so if you have a cis double bond, it stays that way. Mostly. In chemistry a reaction is never impossible; the components are just insufficiently hot. If we heat up a cis-fatty acid to a sufficient temperature, the molecules will be able to access enough energy to flip.
The rate of reactions generally scales with temperature according to the Arrhenius equation: v = A exp(−Ea/(kB T)), where A is a general constant determining the speed, Ea is the "activation energy" of the reaction, T is temperature, and kB is Boltzmann's constant, which makes the units work out. Graphing this gives the following shape: Suffice it to say, this means that reaction speed can grow very rapidly with temperature at the "right" point on this graph. Why is this important? Well, trans-fatty acids are slightly lower energy than cis ones, so at a high enough temperature, we can see cis-to-trans isomerization, turning OA o...
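As a rough illustration of how steeply the Arrhenius factor climbs with temperature, the sketch below compares the rate factor at room temperature and at a typical deep-frying temperature. The activation energy used here is an assumed placeholder, not a measured value for cis-trans isomerization of fatty acids.

```python
# Illustrative Arrhenius comparison; the activation energy is an assumed placeholder,
# not a measured value for cis-trans isomerization of fatty acids.
import math

K_B = 1.380649e-23        # Boltzmann's constant, J/K
EV = 1.602176634e-19      # joules per electronvolt

def rate_factor(e_a_ev: float, temp_k: float) -> float:
    """exp(-Ea / (kB*T)) -- the temperature-dependent part of v = A * exp(-Ea / (kB*T))."""
    return math.exp(-e_a_ev * EV / (K_B * temp_k))

e_a = 1.0                                  # assumed activation energy, eV
room, frying = 298.0, 453.0                # 25 °C vs ~180 °C deep-frying, in kelvin
speedup = rate_factor(e_a, frying) / rate_factor(e_a, room)
print(f"{speedup:.0e}x faster at frying temperature")   # roughly 1e5-1e6x for Ea near 1 eV
```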
Apr 27, 2024 • 22min

LW - From Deep Learning to Constructability: Plainly-coded AGIs may be feasible in the near future by Épiphanie Gédéon

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: From Deep Learning to Constructability: Plainly-coded AGIs may be feasible in the near future, published by Épiphanie Gédéon on April 27, 2024 on LessWrong. Charbel-Raphaël Segerie and Épiphanie Gédéon contributed equally to this post. Many thanks to Davidad, Gabriel Alfour, Jérémy Andréoletti, Lucie Philippon, Vladimir Ivanov, Alexandre Variengien, Angélina Gentaz, Léo Dana and Diego Dorn for useful feedback. TLDR: We present a new method for safer-by-design AI development. We think using plainly coded AIs may be feasible in the near future and may be safe. We also present a prototype and research ideas. Epistemic status: Armchair reasoning style. We think the method we are proposing is interesting and could yield very positive outcomes (even though it is still speculative), but we are less sure about which safety policy would use it in the long run. Current AIs are developed through deep learning: the AI tries something, gets it wrong, then gets backpropagated and all its weights adjusted. Then it tries again, wrong again, backpropagation again, and weights get adjusted again. Trial, error, backpropagation, trial, error, backpropagation, ad vitam eternam ad nauseam. Of course, this leads to a severe lack of interpretability: AIs are essentially black boxes, and we are not very optimistic about post-hoc interpretability. We propose a different method: AI safety via pull request.[1] By pull request, we mean that instead of modifying the neural network through successive backpropagations, we construct and design plainly-coded AIs (or hybrid systems) and explicitly modify their code using LLMs in a clear, readable, and modifiable way. This plan may not be implementable right now, but might be as LLMs get smarter and faster. We want to outline it now so we can iterate on it early. Overview If the world released a powerful and autonomous agent in the wild, white box or black box, or any color really, humans might simply get replaced by AI. What can we do in this context? Don't create autonomous AGIs. Keep your AGI controlled in a lab, and align it. Create a minimal AGI controlled in a lab, and use it to produce safe artifacts. This post focuses on this last path, and the specific artifacts that we want to create are plainly coded AIs (or hybrid systems)[2]. We present a method for developing such systems with a semi-automated training loop. To do that, we start with a plainly coded system (that may also be built using LLMs) and iterate on its code, adding each feature and correction as pull requests that can be reviewed and integrated into the codebase. This approach would allow AI systems that are, by design: Transparent: As the system is written in plain or almost plain code, the system is more modular and understandable. As a result, it's simpler to spot backdoors, power-seeking behaviors, or inner misalignment: it is orders of magnitude simpler to refactor the system to have a part defining how it is evaluating its current situation and what it is aiming towards (if it is aiming at all). This means that if the system starts farming cobras instead of capturing them, we would be able to see it. Editable: If the system starts to learn unwanted correlations or features, such as learning to discriminate on feminine markers for a resume scorer, it is much easier to see it as a node in the AI code and remove it without retraining.
Overseeable: We can ensure the system is well-behaved by using automatic LLM reviews of the code and by using automatic unit tests of the isolated modules. In addition, we would use simulations and different settings necessary for safety, which we will describe later. Version controllable: As all modifications are made through pull requests, we can easily trace with, e.g., git tooling where a specific modification was introduced and why. In pract...
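The development loop is only described at a high level in this excerpt; below is a minimal, heavily simplified sketch (our own framing) of what one iteration of "AI safety via pull request" could look like, with an LLM-proposed patch gated by an automated review and unit tests before it is merged. Every helper here (propose_patch, llm_review_approves, unit_tests_pass) is a stand-in stub, not a real API or the authors' prototype.

```python
# Hypothetical sketch of one iteration of the "AI safety via pull request" loop described above.
# All helpers are trivial stand-ins (assumptions): in practice an LLM would draft the patch and
# another LLM pass plus the modules' unit tests would gate the merge.
from dataclasses import dataclass

@dataclass
class Patch:
    description: str
    new_code: str

def propose_patch(codebase: str, feature_request: str) -> Patch:
    # Stand-in for an LLM call that drafts a readable, modular change.
    return Patch(feature_request, codebase + f"\n# TODO: implement {feature_request}\n")

def llm_review_approves(patch: Patch) -> bool:
    # Stand-in for an automated LLM review checking for backdoors, power-seeking code, etc.
    return "backdoor" not in patch.new_code

def unit_tests_pass(code: str) -> bool:
    # Stand-in for running the isolated modules' unit tests.
    return True

def improvement_iteration(codebase: str, feature_request: str) -> str:
    """Merge the proposed change only if review and tests both pass; otherwise keep the old code."""
    patch = propose_patch(codebase, feature_request)
    if llm_review_approves(patch) and unit_tests_pass(patch.new_code):
        return patch.new_code          # merged; the change stays traceable like a normal PR
    return codebase                    # rejected: nothing lands in the plainly-coded system

print(improvement_iteration("# plainly coded agent v0", "add obstacle avoidance"))
```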
Apr 27, 2024 • 14min

EA - 10 Lessons Learned - One Year at EA Switzerland by Alix Pham

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 10 Lessons Learned - One Year at EA Switzerland, published by Alix Pham on April 27, 2024 on The Effective Altruism Forum. I wanted to reflect on my first year as a full-time community builder at EA Switzerland. The lessons I share here might be more useful for people who are more or less involved in community building or field building / coordination, but I think some of them are not only work-useful but also life-useful (at least to me). I don't think they are specific to the Swiss context either. So here is a pile of things I (re)learned: On People: 1. Sometimes unstructured conversations are the most productive conversations. I tend to prepare meetings and think about the best ways to make the conversation time as useful as possible. Most of the time, it is also what's expected of me, especially when the person I'm meeting with is very busy and their time is more valuable than mine. And I might project that need for time optimization for all my meetings. This involves a lot of guesswork and anticipation about what the person can bring me or what I can bring the person. But I've been regularly surprised at how much I learn (or teach!), when the whole extent of the conversation doesn't follow a set agenda. Sometimes the best agenda is going with the flow. I've explored topics and ideas that I would never have thought I would have discussed with my conversation partner if I didn't step out of my comfort zone and stop pushing the conversation in the direction I prepared. 2. It's crazy how low the bar can be for people to feel empowered. Sometimes I just need to tell someone "Have you thought about [insert something that often can be a little obvious]?" and that's enough for them to take the leap. I've been on the other side of that story, and it's been transformative for me. It literally took my 80,000 Hours advisor to tell me that above sentence with "working in biosecurity" and I jumped in the rabbit hole to explore this (thank you very much, if you're reading me!). Sometimes it's just about reaching out, being accessible, and sending a timely nudge. I've also been on the other side of that one, and that's what brought me to attend my first EAG (many thanks to you too, if you're also reading me). There are many shapes this can take, and the ones that I love to use the most (because I get to see the warmth it brings to people) are something like: I really like what you're working on, let's talk about it more during a meeting. I really like your energy, let's have a meeting about how I can help you in your journey. *Looking genuinely excited about something they just told you* You've been excited about [that cool project / job opportunity / etc.], what keeps you from doing it / applying / etc.? Crazy, huh? and easy, wouldn't you say? I want to do it more, and better (of course, I think it's only useful if one is authentic about it). On Teams: 3. Dedicated recurring feedback sessions make a healthy team. Every team has its ups and downs, every individual has their expectations, working styles, and strengths and weaknesses. Building on past lessons, our team of two at EA Switzerland has been doing monthly 1-h "Team Dynamics" sessions. We have a few prompts that we prepare in silence at the beginning of the meeting, and when we're done, we talk about them. How do you find working with the team? 
Do you think we're headed in the right direction as a team? Is there anything we can do to improve team dynamics? How can we make this more effective and more fun? What's something we can/should start doing as a team? Are there any aspects of our team culture/company culture you wish you could change? What's one thing we can do to improve internal communication? How is the workload? Should we redistribute responsibilities? Let's give ourselves constructive feedback What is a diff...
