The Nonlinear Library

The Nonlinear Fund
May 10, 2024 • 46min

LW - My thesis (Algorithmic Bayesian Epistemology) explained in more depth by Eric Neyman

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My thesis (Algorithmic Bayesian Epistemology) explained in more depth, published by Eric Neyman on May 10, 2024 on LessWrong.

In March I posted a very short description of my PhD thesis, Algorithmic Bayesian Epistemology, on LessWrong. I've now written a more in-depth summary for my blog, Unexpected Values. Here's the full post:

***

In January, I defended my PhD thesis. My thesis is called Algorithmic Bayesian Epistemology, and it's about predicting the future.

In many ways, the last five years of my life have been unpredictable. I did not predict that a novel bat virus would ravage the world, causing me to leave New York for a year. I did not predict that, within months of coming back, I would leave for another year - this time of my own free will, to figure out what I wanted to do after graduating. And I did not predict that I would rush to graduate in just seven semesters so I could go work on the AI alignment problem.

But the topic of my thesis? That was the most predictable thing ever. It was predictable from the fact that, when I was six, I made a list of who I might be when I grow up, and then attached probabilities to each option. Math teacher? 30%. Computer programmer? 25%. Auto mechanic? 2%. (My grandma informed me that she was taking the under on "auto mechanic".) It was predictable from my life-long obsession with forecasting all sorts of things, from hurricanes to elections to marble races. It was predictable from that time in high school when I was deciding whether to tell my friend that I had a crush on her, so I predicted a probability distribution over how she would respond, estimated how good each outcome would be, and calculated the expected utility. And it was predictable from the fact that like half of my blog posts are about predicting the future or reasoning about uncertainty using probabilities. So it's no surprise that, after a year of trying some other things (mainly auction theory), I decided to write my thesis about predicting the future.

If you're looking for practical advice for predicting the future, you won't find it in my thesis. I have tremendous respect for groups like Epoch and Samotsvety: expert forecasters with stellar track records whose thorough research lets them make some of the best forecasts about some of the world's most important questions. But I am a theorist at heart, and my thesis is about the theory of forecasting. This means that I'm interested in questions like:

- How do I pay Epoch and Samotsvety for their forecasts in a way that incentivizes them to tell me their true beliefs?
- If Epoch and Samotsvety give me different forecasts, how should I combine them into a single forecast?
- Under what theoretical conditions can Epoch and Samotsvety reconcile a disagreement by talking to each other?
- What's the best way for me to update how much I trust Epoch relative to Samotsvety over time, based on the quality of their predictions?

If these sorts of questions sound interesting, then you may enjoy consuming my thesis in some form or another. If reading a 373-page technical manuscript is your cup of tea - well then, you're really weird, but here you go! If reading a 373-page technical manuscript is not your cup of tea, you could look at my thesis defense slides (PowerPoint, PDF),[1] or my short summary on LessWrong.
On the other hand, if you're looking for a somewhat longer summary, this post is for you! If you're looking to skip ahead to the highlights, I've put a * next to the chapters I'm most proud of (5, 7, 9).

Chapter 0: Preface

I don't actually have anything to say about the preface, except to show off my dependency diagram. (I never learned how to make diagrams in LaTeX. You can usually do almost as well in Microsoft Word, with way less effort!)

Chapter 1: Introduction

"Algorithmic Bayesian epistemology" (the title of the...
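As a piece of standard background for the first question listed above (how to pay a forecaster so that reporting their true beliefs is optimal), here is a minimal sketch of a proper scoring rule in Python. This is textbook material rather than anything taken from the thesis, and the numbers are illustrative only.

```python
# Minimal illustration (textbook material, not from the thesis): under a proper
# scoring rule such as the log score, a forecaster maximizes their expected payment
# by reporting their true probability.
import numpy as np

def log_score(report: float, outcome: int) -> float:
    """Log score for a binary event; higher is better."""
    return float(np.log(report if outcome == 1 else 1.0 - report))

def expected_score(report: float, belief: float) -> float:
    # Expectation taken under the forecaster's own belief about the event.
    return belief * log_score(report, 1) + (1.0 - belief) * log_score(report, 0)

true_belief = 0.7  # the forecaster privately thinks the event has probability 0.7
grid = np.linspace(0.01, 0.99, 99)
best_report = grid[np.argmax([expected_score(r, true_belief) for r in grid])]
print(round(float(best_report), 2))  # 0.7: honest reporting maximizes expected score
```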
May 10, 2024 • 3min

EA - Introducing Senti - Animal Ethics AI Assistant by Animal Ethics

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing Senti - Animal Ethics AI Assistant, published by Animal Ethics on May 10, 2024 on The Effective Altruism Forum.

Animal Ethics has recently launched Senti, an ethical AI assistant designed to answer questions related to animal ethics, wild animal suffering, and longtermism. We at Animal Ethics believe that while AI technologies could potentially pose significant risks to animals, they could benefit all sentient beings if used responsibly. For example, animal advocates can leverage AI to amplify our message and improve how we share information about animal ethics with a wider audience. There is a lack of knowledge today not just among the general public, but also among people sympathetic to nonhuman animals, about the basic concepts and arguments underpinning the critique of speciesism, animal exploitation, concern for wild animal suffering, and future sentient beings. Many of the ideas are unintuitive as well, so it helps people to be able to chat and ask follow-up questions in order to cement their understanding. We hope this tool will help to change that!

Senti, our AI assistant, is powered by Claude, Anthropic's large language model (LLM); however, it has been designed to reflect the views of Animal Ethics. We provided Senti with a database of carefully curated documents about animal ethics and related topics. Almost all of them were written by Animal Ethics, and we are now adding more sources. When you ask a question, Senti searches through the documents and retrieves the most relevant information to form an answer. After each answer, there are links to the sources of information so you can read more. We continually update Senti, and we'd love to have your feedback on your experience. Senti has been designed to discuss topics related to the wellbeing of all sentient beings, and we request users to restrict their conversations to topics related to helping animals and other sentient beings. We have also provided a list of 24 preset questions that you can use to explore different topics related to animal ethics.

When you chat with Senti for the first time, you'll be presented with a consent form. It requests permission to save your conversation history. Saving your conversation history allows you and Senti to have a continuous conversation, with Senti remembering what you've already discussed. It also provides us with your chat history, which is anonymous. This will help us to improve the answers and know what new information to add. You do not have to give your consent to chat with Senti. If you decline, your chat history won't be saved, but you can still ask questions.

We would like to give special appreciation to the team at Freeport Metrics, which provided extensive pro bono services to build the infrastructure, handle the technical setup, and design the UI for Senti. They conducted extensive testing and offered ongoing support, without which the project could not have been completed. We would additionally like to thank our volunteers who have been helping test new prompts, new document sets, and different settings, such as how many pieces of information to retrieve to respond to each question. We are continually working on improving Senti by running independent tests with the new Claude 3 models. We expect to deliver an update in the coming months that provides longer and more accurate responses.
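For readers curious how an assistant like this is typically wired together, here is a rough sketch of the retrieval step described above. The documents, URLs, retrieval method (TF-IDF), and prompt format are illustrative assumptions; Senti's actual pipeline, curated document store, and settings may differ.

```python
# Illustrative sketch of a retrieval-augmented assistant like the one described above.
# The documents, URLs, retrieval method (TF-IDF), and prompt format are assumptions
# for illustration; Senti's actual pipeline and curated document store may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = {  # url -> curated passage (placeholder text)
    "https://example.org/wild-animal-suffering": "Wild animals suffer from disease, hunger, and injury...",
    "https://example.org/speciesism": "Speciesism is the disadvantageous treatment of beings based on species...",
}
urls = list(documents)
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents.values())

def retrieve(question, k=2):
    """Return the URLs of the k most relevant curated documents."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    return [urls[i] for i in scores.argsort()[::-1][:k]]

question = "Why should we care about wild animal suffering?"
sources = retrieve(question)
# The retrieved passages plus the question would then be sent to the LLM (Claude, in
# Senti's case), and the source URLs shown under the answer so readers can read more.
prompt = ("Answer using only these sources:\n"
          + "\n".join(documents[u] for u in sources)
          + "\n\nQuestion: " + question)
print(sources)
```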
We hope Senti helps you learn a lot and makes it easier for you to share the information with others. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
May 10, 2024 • 15min

EA - Lessons from two years of talent search pilots by Jamie Harris

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Lessons from two years of talent search pilots, published by Jamie Harris on May 10, 2024 on The Effective Altruism Forum.

Tl;dr: Leaf supports exceptional teenagers to explore how they can do the most good. We've run 3 residential programmes and 4 different types of online fellowship: a general one on having a high positive impact, one focused on university decision-making, one cause-specific, and one subject-specific. I'm excited about the online programmes (especially subject-specific) as being cost-effective and highly scalable. I plan to actually scale these! You might be able to help through:

- Advising Leaf
- Being a facilitator or guest speaker
- Working for Leaf later this year
- Funding Leaf

There are lots of other mini insights and updates, summarised below. I wrote this post quickly, adapting from an existing internal doc, so that I could do an '80:20' version of sharing insights. Please message me if you'd like access to the original doc with lots more detail (it's 50 pages, mostly of summary tables of metrics I track), evidence, and reasoning transparency. Please briefly explain who you are and why you're interested in access.

Background on Leaf

I'm Managing Director of Leaf; we support exceptional teenagers to explore how they can best help others, save lives, or change the course of history. In conventional educational systems, teenagers don't have support or mentorship to explore how they can do good. The incentives and encouragement for smart teens are mostly about getting into uni and demonstrating their intelligence, not thinking through how to use those gifts. And yet they're already making decisions relevant to doing good, like what subjects to study at university, what sort of internships to get or project to pursue, and just which problems to focus on finding out more about. Meanwhile, many of the world's most pressing problems are constrained by not having access to enough talented applicants and entrepreneurs. There's a need to ensure that smart students explore important and neglected problems, rather than just defaulting to family- or status-driven careers, or tackling the problems made most salient to them through the media. Leaf supports these exceptional teenagers to start exploring, make better decisions and get on a high-impact trajectory.

Programmes summary

Residential pilot, 2021: "Building a Better Future" (Not me)
- Dates: July to October 2021 (active October 2021)
- Motivation and goals: I didn't set this up so can't really comment, but I think it was similar to the reasons here.
- Key lessons: Students seemed brighter and more engaged with the content than I expected. Turns out that you can get really high quality applicants to programmes if you contact lots of different places, even with cold outreach methods, no demonstrable track record, and a very MVP website with stock images. The previous team contacted ~200 schools and received ~70 applications, of which they accepted 16. There was fuller attendance than I expected - all 16 applicants offered a place made it to the residential, whereas I had expected closer to ~50% drop out, as often happens with free events. (Also it seemed plausibly my comparative advantage given my experience in teaching and talent search / community building.)
Summer residential, 2022: "Building a Better Future, 2022"
- Dates: February to September 2022 (active August 2022), with some subsequent follow-up and strategic planning
- Motivation and goals: Try out slightly scaling what seemed like a successful model for talent search and community building. I took on Leaf, handed over from Alex Holness-Tofts; it was partly just about replicating the success of the pilot in the summer without doing anything too ambitious.
- Key lessons: Managed to identify a reasonably promising group of young people, and successfully encouraged some engagement with effective altruism and longtermi...
May 10, 2024 • 3min

AF - Linear infra-Bayesian Bandits by Vanessa Kosoy

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Linear infra-Bayesian Bandits, published by Vanessa Kosoy on May 10, 2024 on The AI Alignment Forum.

Linked is my MSc thesis, where I do regret analysis for an infra-Bayesian[1] generalization of stochastic linear bandits. The main significance that I see in this work is:

- Expanding our understanding of infra-Bayesian regret bounds, and solidifying our confidence that infra-Bayesianism is a viable approach. Previously, the most interesting IB regret analysis we had was Tian et al., which deals (essentially) with episodic infra-MDPs. My work here doesn't supersede Tian et al. because it only talks about bandits (i.e. stateless infra-Bayesian laws), but it complements it because it deals with a parametric hypothesis space (i.e. it fits into the general theme in learning theory that generalization bounds should scale with the dimension of the hypothesis class).
- Discovering some surprising features of infra-Bayesian learning that have no analogues in classical theory. In particular, it turns out that affine credal sets (i.e. ones that are closed w.r.t. arbitrary affine combinations of distributions and not just convex combinations) have better learning-theoretic properties, and the regret bound depends on additional parameters that don't appear in classical theory (the "generalized sine" S and the "generalized condition number" R). Credal sets defined using conditional probabilities (related to Armstrong's "model splinters") turn out to be well-behaved in terms of these parameters.

In addition to the open questions in the "summary" section, there is also a natural open question of extending these results to non-crisp infradistributions[2]. (I didn't mention it in the thesis because it requires too much additional context to motivate.)

1. ^ I use the word "imprecise" rather than "infra-Bayesian" in the title, because the proposed algorithm achieves a regret bound which is worst-case over the hypothesis class, so it's not "Bayesian" in any non-trivial sense.

2. ^ In particular, I suspect that there's a flavor of homogeneous ultradistributions for which the parameter S becomes unnecessary. Specifically, an affine ultradistribution can be thought of as the result of "take an affine subspace of the affine space of signed distributions, intersect it with the space of actual (positive) distributions, then take downwards closure into contributions to make it into a homogeneous ultradistribution". But we can also consider the alternative "take an affine subspace of the affine space of signed distributions, take downwards closure into signed contributions and then intersect it with the space of actual (positive) contributions". The order matters!

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
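For orientation, here is one way to write down the convex-versus-affine distinction mentioned above, for a credal set C of probability distributions over an outcome space X. This is a paraphrase for intuition, not the thesis's precise definitions, which should be consulted for the exact conditions.

```latex
% Closure conditions for a credal set C \subseteq \Delta(X) (a paraphrase, not the
% thesis's exact definitions). Affine closure allows coefficients outside [0,1],
% as long as the combination is still a probability distribution.
\begin{align*}
\text{convex-closed:} \quad & \mu, \nu \in C,\ \lambda \in [0,1]
    \;\Rightarrow\; \lambda\mu + (1-\lambda)\nu \in C, \\
\text{affine-closed:} \quad & \mu, \nu \in C,\ \lambda \in \mathbb{R},\
    \lambda\mu + (1-\lambda)\nu \in \Delta(X)
    \;\Rightarrow\; \lambda\mu + (1-\lambda)\nu \in C.
\end{align*}
```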
May 10, 2024 • 8min

LW - We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming" by Lukas Gloor

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming", published by Lukas Gloor on May 10, 2024 on LessWrong.

Predicting the future is hard, so it's no surprise that we occasionally miss important developments. However, several times recently, in the contexts of Covid forecasting and AI progress, I noticed that I missed some crucial feature of a development I was interested in getting right, and it felt to me like I could've seen it coming if only I had tried a little harder. (Some others probably did better, but I could imagine that I wasn't the only one who got things wrong.) Maybe this is hindsight bias, but if there's something to it, I want to distill the nature of the mistake. First, here are the examples that prompted me to take notice:

- Predicting the course of the Covid pandemic: I didn't foresee the contribution from sociological factors (e.g., "people not wanting to get hospitalized" - Zvi called it "the control system"). As a result, I overpredicted the difference between countries with a lockdown policy vs ones without. (Note that this isn't necessarily an update against the cost-effectiveness of lockdowns because the update goes both ways: lockdowns saved fewer lives than I would've predicted naively, but costs to the economy were also lower compared to the counterfactual where people already social-distanced more than expected of their own accord since they were reading the news about crowded hospitals and knew close contacts who were sick with the virus.)
- Predicting AI progress:
  - Not foreseeing that we'd get an Overton window shift in AI risk awareness. Many EAs were arguably un(der)prepared for the possibility of a "ChatGPT moment," where people who weren't paying attention to AI progress previously got to experience a visceral sense of where AI capabilities progress is rapidly heading. As a result, it is now significantly easier to make significant policy asks to combat AI risks.
  - Not foreseeing wide deployment of early-stage "general" AI and the possible irrelevance of AI boxing. Early discussions of AI risk used to involve this whole step about whether a superhuman AI system could escape and gain access to the internet. No one (to my knowledge?) highlighted that the future might well go as follows: "There'll be gradual progress on increasingly helpful AI tools. Companies will roll these out for profit and connect them to the internet. There'll be discussions about how these systems will eventually become dangerous, and safety-concerned groups might even set up testing protocols ("safety evals"). Still, it'll be challenging to build regulatory or political mechanisms around these safety protocols so that, when they sound the alarm at a specific lab that the systems are becoming seriously dangerous, this will successfully trigger a slowdown and change the model release culture from 'release by default' to one where new models are air-gapped and where the leading labs implement the strongest forms of information security."
If we had understood the above possibility earlier, the case for AI risks would have seemed slightly more robust, and (more importantly) we could've started sooner with the preparatory work that ensures that safety evals aren't just handled company-by-company in different ways, but that they are centralized and connected to a trigger for appropriate slowdown measures, industry-wide or worldwide. Concerning these examples, it seems to me that:

1. It should've been possible to either foresee these developments or at least highlight the scenario that happened as one that could happen/is explicitly worth paying attention to.

2. The failure mode at play involves forecasting well on some narrow metrics but not paying attention to changes in the world brought about by the exact initial thin...
May 10, 2024 • 45min

LW - AI #63: Introducing Alpha Fold 3 by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #63: Introducing Alpha Fold 3, published by Zvi on May 10, 2024 on LessWrong.

It was a remarkably quiet announcement. We now have Alpha Fold 3; it does a much improved job of predicting all of life's molecules and their interactions. It feels like everyone including me then shrugged and went back to thinking about other things. No cool new toy for most of us to personally play with, no existential risk impact, no big trades to make, ho hum. But yes, when we look back at this week, I expect what we remember will be Alpha Fold 3. Unless it turns out that it is Sophon, a Chinese technique to potentially make it harder to fine-tune an open model in ways the developer wants to prevent. I do not expect this to get the job done that needs doing, but it is an intriguing proposal.

We also have 95 theses to evaluate in a distinct post, OpenAI sharing the first draft of their model spec, Apple making a world-class anti-AI and anti-iPad ad that they released thinking it was a pro-iPad ad, more fun with the mysterious gpt2, and more. The model spec from OpenAI seems worth pondering in detail, so I am going to deal with that on its own some time in the coming week.

Table of Contents

1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. Agents, simple and complex.
4. Language Models Don't Offer Mundane Utility. No gadgets, no NPCs.
5. GPT-2 Soon to Tell. Does your current model suck? In some senses.
6. Fun With Image Generation. Why pick the LoRa yourself?
7. Deepfaketown and Botpocalypse Soon. It's not exactly going great.
8. Automation Illustrated. A look inside perhaps the premier slop mill.
9. They Took Our Jobs. Or are we pretending this to help the stock price?
10. Apple of Technically Not AI. Mistakes were made. All the feels.
11. Get Involved. Dan Hendrycks has a safety textbook and free online course.
12. Introducing. Alpha Fold 3. Seems like a big deal.
13. In Other AI News. IBM, Meta and Microsoft in the model game.
14. Quiet Speculations. Can we all agree that a lot of intelligence matters a lot?
15. The Quest for Sane Regulation. Major labs fail to honor their commitments.
16. The Week in Audio. Jack Clark on Politico Tech.
17. Rhetorical Innovation. The good things in life are good.
18. Open Weights are Unsafe and Nothing Can Fix This. Unless, maybe? Hmm.
19. The Lighter Side. Mmm, garlic bread. It's been too long.

Language Models Offer Mundane Utility

How much utility for how much cost? Kapoor and Narayanan argue that with the rise of agent-based systems, you have to evaluate different models on coding tasks based on dollar cost versus quality of results. They find that a simple 'ask GPT-4 and turn the temperature slowly up on retries if you fail' is as good as the agents they tested on HumanEval, while costing less. They mention that perhaps it is different with harder and more complex tasks.

How much does cost matter? If you are using such queries at scale without humans in the loop, or doing them in the background on a constant basis as part of your process, then cost potentially matters quite a bit. That is indeed the point of agents. Or if you are serving lots of customers constantly for lots of queries, those costs can add up fast. Thus all the talk about the most cost-efficient approach. There are also other purposes for which cost at current margins is effectively zero.
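For concreteness, here is a sketch of the simple baseline described above ('ask GPT-4 and turn the temperature slowly up on retries if you fail'). The generate and passes_tests callables are placeholders for a real model API call and HumanEval-style unit tests; the step size and retry budget are illustrative assumptions, not the exact setup Kapoor and Narayanan used.

```python
from typing import Callable, Optional

def solve_with_retries(
    prompt: str,
    generate: Callable[[str, float], str],    # placeholder for a GPT-4 API call
    passes_tests: Callable[[str], bool],      # placeholder for HumanEval-style unit tests
    max_tries: int = 5,
) -> Optional[str]:
    """Query at temperature 0, then retry at gradually higher temperatures on failure."""
    temperature = 0.0
    for _ in range(max_tries):
        candidate = generate(prompt, temperature)
        if passes_tests(candidate):
            return candidate                       # stop as soon as a candidate passes
        temperature = min(1.0, temperature + 0.2)  # "turn the temperature slowly up"
    return None                                    # no passing candidate within budget
```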
If you are a programmer who must evaluate, use and maintain the code outputted by the AI, what percentage of total costs (including your labor costs) is AI inference? In the most obvious baseline case, something akin to 'a programmer asks for help on tasks,' query speed potentially matters, but being slightly better at producing good code, or even slightly better at producing code that is easier for the human to evaluate, understand and learn from, is going to crush...
May 10, 2024 • 8min

LW - Why Care About Natural Latents? by johnswentworth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why Care About Natural Latents?, published by johnswentworth on May 10, 2024 on LessWrong.

Suppose Alice and Bob are two Bayesian agents in the same environment. They both basically understand how their environment works, so they generally agree on predictions about any specific directly-observable thing in the world - e.g. whenever they try to operationalize a bet, they find that their odds are roughly the same. However, their two world models might have totally different internal structure, different "latent" structures which Alice and Bob model as generating the observable world around them. As a simple toy example: maybe Alice models a bunch of numbers as having been generated by independent rolls of the same biased die, and Bob models the same numbers using some big complicated neural net.

Now suppose Alice goes poking around inside of her world model, and somewhere in there she finds a latent variable Λ_A with two properties (the Natural Latent properties):

- Λ_A approximately mediates between two different observable parts of the world X_1, X_2
- Λ_A can be estimated to reasonable precision from either one of the two parts

In the die/net case, the die's bias (Λ_A) approximately mediates between e.g. the first 100 numbers (X_1) and the next 100 numbers (X_2), so the first condition is satisfied. The die's bias can be estimated to reasonable precision from either the first 100 numbers or the second 100 numbers, so the second condition is also satisfied.

This allows Alice to say some interesting things about the internals of Bob's model.

First: if there is any latent variable (or set of latent variables, or function of latent variables) Λ_B which mediates between X_1 and X_2 in Bob's model, then Bob's Λ_B encodes Alice's Λ_A (and potentially other stuff too). In the die/net case: during training, the net converges to approximately match whatever predictions Alice makes (by assumption), but the internals are a mess. An interpretability researcher pokes around in there, and finds some activation vectors which approximately mediate between X_1 and X_2. Then Alice knows that those activation vectors must approximately encode the bias Λ_A. (The activation vectors could also encode additional information, but at a bare minimum they must encode the bias.)

Second: if there is any latent variable (or set of latent variables, or function of latent variables) Λ'_B which can be estimated to reasonable precision from just X_1, and can also be estimated to reasonable precision from just X_2, then Alice's Λ_A encodes Bob's Λ'_B (and potentially other stuff too). Returning to our running example: suppose our interpretability researcher finds that the activations along certain directions can be precisely estimated from just X_1, and the activations along those same directions can be precisely estimated from just X_2. Then Alice knows that the bias Λ_A must give approximately-all the information which those activations give. (The bias could contain more information - e.g. maybe the activations in question only encode the rate at which a 1 or 2 is rolled, whereas the bias gives the rate at which each face is rolled.)
Third, putting those two together: if there is any latent variable (or set of latent variables, or function of latent variables) Λ''_B which approximately mediates between X_1 and X_2 in Bob's model, and can be estimated to reasonable precision from either one of X_1 or X_2, then Alice's Λ_A and Bob's Λ''_B must be approximately isomorphic - i.e. each encodes the other. So if an interpretability researcher finds that activations along some directions both mediate between X_1 and X_2, and can be estimated to reasonable precision from either of X_1 or X_2, then those activations are approximately isomorphic to what Alice calls "the bias of the die".

So What Could We Do With That?

We'll give a couple relatively-...
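For reference, the two Natural Latent properties used throughout the passage above can be written roughly as follows. The actual statements involve approximation error bounds, which are spelled out in the original natural latents posts and omitted here.

```latex
% Rough statement of the Natural Latent properties for a latent \Lambda_A over
% observables X_1, X_2; the post's versions hold only approximately.
\begin{align*}
\text{(mediation)} \quad & X_1 \perp X_2 \mid \Lambda_A, \\
\text{(redundancy)} \quad & P(\Lambda_A \mid X_1, X_2) \approx P(\Lambda_A \mid X_1)
    \approx P(\Lambda_A \mid X_2).
\end{align*}
```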
May 9, 2024 • 11min

LW - Dyslucksia by Shoshannah Tekofsky

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dyslucksia, published by Shoshannah Tekofsky on May 9, 2024 on LessWrong.

The curious tale of how I mistook my dyslexia for stupidity - and talked, sang, and drew my way out of it.

Sometimes I tell people I'm dyslexic and they don't believe me. I love to read, I can mostly write without error, and I'm fluent in more than one language. Also, I don't actually technically know if I'm dyslectic cause I was never diagnosed. Instead I thought I was pretty dumb but if I worked really hard no one would notice. Later I felt inordinately angry about why anyone could possibly care about the exact order of letters when the gist is perfectly clear even if if if I right liike tis. I mean, clear to me anyway.

I was 25 before it dawned on me that all the tricks I was using were not remotely related to how other people process language. One of my friends of six years was specialized in dyslexia, and I contacted her, full excitement about my latest insight.

"Man, guess what? I realized I am dyslectic! This explains so much! I wish someone had told me sooner. It would have saved me so much grief."

"Oh, yeah, I know."

"Wait, what?"

"You are very obviously dyslectic."

"Wait, why didn't you tell me?"

"You didn't seem bothered."

"Oh…"

Turns out my dyslexia was a public secret that dated back all the way to my childhood (and this was obviously unrelated to my constitutional lack of self-awareness). Anyway. How come I kind of did fine? I'm fluent in English (not my native language), wrote my PhD thesis of 150 pages in 3 months without much effort, and was a localization tester for Dutch-English video game translation for two years. I also read out loud till the age of 21, trace every letter like it's a drawing, and need to sing new word sounds to be able to remember them. I thought everyone had to but no one sent me the memo. Dear reader, not everyone has to.

When I recently shared my information processing techniques with old and new friends, they asked if I had ever written them down so maybe other people could use them too. I hadn't. So here is my arsenal of alternative information processing techniques.

Read Out Loud

Honestly, I didn't realize there was an age where you were supposed to stop doing this. In school you obviously had to whisper to yourself. At home you go to your room and read at normal volume. If it's a fiction book, you do voices for the different characters. It's great. I remember my sister sometimes walking in to my room when I was little cause she said it sounded like so much fun in there. It totally was. Later I found out my mother made sure my siblings never made me aware it was unusual I was still reading out loud. Instead she signed me up for competitions to read books on the local radio. This was before the wide-spread internet and audio books. Later I'd read to my parents sometimes, who were always excited about how much energy I threw into the endeavor. I didn't know any different.

In college I was still reading out loud. Research papers have a voice. Mathematical equations especially. They take longer to say out loud than to read in your head, but you can never be sure what's on the page if you don't. According to my brain anyway. When I was 22 I moved in with my first boyfriend and reading out loud got a little obstructive. I started subvocalizing, and that was definitely less fun. I still subvocalize now.
But if I struggle to follow a passage, I go back to reading it out loud. I've probably read out this essay a dozen times by now. I keep checking the cadence of every sentence. It's easier to spot word duplications, cause I find myself repeating myself. Missing words also stick out like inverted pot holes. They destroy the flow. So I jump back and smooth them over. Sometimes when I talk, I finish the sentence differently than it's written. Then I go back and ...
May 9, 2024 • 1h 15min

AF - Chapter 3 - Solutions Landscape by Charbel-Raphael Segerie

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Chapter 3 - Solutions Landscape, published by Charbel-Raphael Segerie on May 9, 2024 on The AI Alignment Forum.

Introduction

The full draft textbook is available here.

Epistemic Status: I'm pretty satisfied with this document. I wrote it because it doesn't seem like we've made any major breakthroughs in alignment in the last year, and I wanted to consolidate what I know. And beyond alignment, it seems to me that a large class of strategies is quite important and neglected, and will continue to be relevant in the future. For example, to mitigate misuses and systemic risks, I think we already have a pretty good idea of what could be done. I don't expect any breakthroughs in alignment either, and it seems to me that we will have to work with the different classes of strategies that are in this document. Let me know if you think I'm being overconfident.

Although the field of AI safety is still in its infancy, several measures have already been identified that can significantly improve the safety of AI systems. While it remains to be seen if these measures are sufficient to fully address the risks posed by AI, they represent essential considerations. The diagram below provides a high-level overview of the main approaches to ensuring the safe development of AI. This document is far from exhaustive and only scratches the surface of the complex landscape of AI safety. Readers are encouraged to explore this recent list of agendas for a more comprehensive review.

AI Safety is Challenging

Specific properties of the AI safety problem make it particularly difficult.

AI risk is an emerging problem that is still poorly understood. We are not yet familiar with all its different aspects, and the technology is constantly evolving. It's hard to devise solutions for a technology that does not yet exist, but these guardrails are also necessary because the outcome can be very negative.

The field is still pre-paradigmatic. AI safety researchers disagree on the core problems, difficulties, and main threat models. For example, some researchers think that takeover risks are more likely [AGI Ruin], and some research emphasizes more progressive failure modes with progressive loss of control [Critch]. Because of this, alignment research is currently a mix of different agendas that need more unity. The alignment agendas of some researchers seem hopeless to others, and one of the favorite activities of alignment researchers is to criticize each other constructively.

AIs are black boxes that are trained, not built. We know how to train them, but we do not know which algorithm is learned by them. Without progress in interpretability, they are giant inscrutable matrices of numbers, with little modularity. In software engineering, modularity helps break down software into simpler parts, allowing for better problem-solving. In deep learning models, modularity is almost nonexistent: to date, interpretability has failed to decompose a deep neural network into modular structures [s]. As a result, behaviors exhibited by deep neural networks are not understood and keep surprising us.

Complexity is the source of many blind spots. New failure modes are frequently discovered. For example, issues arise with glitch tokens, such as "SolidGoldMagikarp" [s]. When GPT encounters this infrequent word, it behaves unpredictably and erratically.
This phenomenon occurs because GPT uses a tokenizer to break down sentences into tokens (sets of letters such as words or combinations of letters and numbers), and the token "SolidGoldMagikarp" was present in the tokenizer's dataset but not in the GPT model's training dataset. This blind spot is not an isolated incident. For example, on the days when Microsoft's Tay chatbot, Bing Chat, and ChatGPT were launched, the chatbots were poorly tuned and exhibited many new emerging undesirable chat...
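A quick illustration of the tokenizer point above, using OpenAI's open-source tiktoken library. The choice of encoding is an assumption (the glitch token was observed with the GPT-2/GPT-3-era vocabulary), and exact token ids may vary.

```python
# " SolidGoldMagikarp" (with the leading space) sits in the GPT-2/GPT-3-era BPE
# vocabulary as a single token, which is how a string can exist in the tokenizer's
# data while being essentially absent from the model's training data.
# Requires: pip install tiktoken. The encoding name below is an assumption.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")    # GPT-2/GPT-3-era encoding
print(enc.encode(" SolidGoldMagikarp"))     # expected: a single token id
print(enc.encode(" an ordinary phrase"))    # ordinary text splits into several tokens
```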
May 9, 2024 • 4min

EA - Shallow overview of institutional plant-based meal campaigns in the US & Western Europe by Neil Dullaghan

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Shallow overview of institutional plant-based meal campaigns in the US & Western Europe, published by Neil Dullaghan on May 9, 2024 on The Effective Altruism Forum.

This Rethink Priorities report provides a shallow overview of the potential for impactful opportunities from institutional plant-based meal campaigns in the US, France, Germany, UK, Spain, and Italy based on reviewing existing research and speaking with organizations conducting such campaigns. Shallow overviews are mainly intended as a quick low-confidence writeup for internal audiences and are not optimized for public consumption. The views expressed herein are not necessarily endorsed by the organizations that were interviewed.

Main takeaways from the report include:

- Emphasize reducing all animal products in order to avoid substitution from beef & lamb to chicken, seafood, & eggs, which require more animals to be harmed. [Confidence: Medium-High. There are many examples of programs that have had this problem (Hughes, 2020, 2:12:50, Gravert & Kurz 2021, Lagasse & Neff 2010). There are some examples of the problem being mitigated (Jalil et al. 2023, Cool Food Pledge 2022, 2021) but we don't yet have a systematic review and meta-analysis on which policies have the best and worst of these effects.]
- Most large schools & universities in the US, France, & Germany offer regular meatless meal options, reducing the scope for impact at scale there from further similar changes. [Confidence: High. We spent more than 40 hours reviewing policies at the largest institutions. While confidence could be increased by reaching out directly to institutions and verifying, the second-hand sources we used seem trustworthy.]
- More studies are needed to confirm the scale of the potential opportunities for meatless meal campaigns in Italy, Spain, and the UK, where existing options are more limited. [Confidence: Medium-Low. It appears that classroom offerings of meatless meals in Italy, Spain, and the UK are far less widespread. However, we spent less time researching these countries, due to external constraints, so there's potential that we missed important information that would reduce the potential scale of impact here. We think replicating studies like Essere Animali 2024 & Ottonova 2022 would shed light on this.]
- There may be cost-competitive opportunities in Europe, but it's likely they are relatively few and hit diminishing returns quickly. [Confidence: Medium. The most rigorous studies of such campaigns in the US are 4-5 years old, but they indicate a cost-effectiveness of 0.4-2.5 animals spared per $ spent (without accounting for impacts on dairy, eggs, & shrimp) and that the campaign quickly stopped getting large wins at low cost. Our rough BOTECs of a sample of current campaigns in the US and large Western European countries estimated their campaigns' impact to range from 1.5-18 animals spared per $ spent (including dairy, eggs, & shrimp). However, these estimates are likely biased towards more positive examples and exclude costs needed to maintain policies over time, so shouldn't be taken as the average expected impact.]
- Campaigns for stronger changes (like plant-based defaults and large % reduction targets) are not yet targeting and winning large-scale opportunities. [Confidence: High. We did not find evidence of successful campaigns at the scale that has been achieved for daily meatless option campaigns. The largest success of this kind we know of is a plant-based default in NYC hospitals, but many campaigns are focused opportunistically where receptive contacts exist and on smaller targets due to a view that tractability is lower.]

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
