The Nonlinear Library: LessWrong

The Nonlinear Fund
May 14, 2024 • 6min

LW - Building intuition with spaced repetition systems by Jacob G-W

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Building intuition with spaced repetition systems, published by Jacob G-W on May 14, 2024 on LessWrong.

Do you ever go to a lecture, follow it thinking it makes total sense, then look back at your notes later and realize it makes no sense? This used to happen to me, but I've learned how to use spaced repetition to fully avoid this if I want. I'm going to try to convey this method in this post. Much of my understanding of how to create flashcards comes from "Using spaced repetition systems to see through a piece of mathematics" by Michael Nielsen and "How to write good prompts: using spaced repetition to create understanding" by Andy Matuschak, but I think my method falls in between both in terms of abstraction. Finally, I want to credit Quantum Country for being an amazing example of flashcards created to develop intuition in users. My method is more abstract than Michael Nielsen's approach, since it applies not only to mathematics but to any subject. Yet it is less abstract than Andy Matuschak's approach, because I specifically use it for 'academic subjects' that require deep intuition of (causal or other) relationships between concepts. Many of the principles in Matuschak's essay apply here (I want to make sure to give him credit), but I'm looking at it through the lens of 'how can we develop deep intuition in an academic subject in the fastest possible time?'

Minimize Inferential Distance on Flashcards

A principle I like to repeat to myself while making flashcards, and one I haven't seen elsewhere, is that each flashcard should have only one inferential step on it. I'm using 'inferential step' here to mean a step such as remembering a fact, making a logical deduction, visualizing something, or anything else that requires thinking. Anki trains the mind to do these steps. If you learn all the inferential steps, you will be able to fully re-create any mathematical deduction, historical story, or scientific argument. Knowing (and continually remembering) the full story with spaced repetition builds intuition. I'm going to illustrate this point by sharing some flashcards that I made while trying to understand how Transformers (GPT-2) worked. I made these flashcards while implementing a transformer based on Neel Nanda's tutorials and these two blog posts.

Understanding Attention

The first step in my method is to learn or read enough so that you have part of the whole loaded into your head. For me, this looked like picking the attention step of a transformer and then reading about it in the two blog posts and watching the section of the video on it. It's really important to learn about something from multiple perspectives. Even when I'm making flashcards from a lecture, I have my web browser open and I'm looking up things that I found confusing while making flashcards. My next step is to understand that intuition is fake! Really good resources make you feel like you understand something, but to actually understand something, you need to engage with it. This engagement can take many forms. For technical topics, it usually looks like solving problems or coding, and this is good! I did this for transformers! But I also wanted to not forget it long term, so I used spaced repetition to cement my intuition.
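As an aside for readers following along: here is a minimal numpy sketch of the attention operation these flashcards cover, assuming standard scaled dot-product attention (this is the textbook operation, not necessarily the exact code from the tutorials above):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # how much each query attends to each key
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V                               # each position: weighted mix of value vectors
```

In GPT-2 specifically, a causal mask additionally sets scores for future positions to negative infinity before the softmax, so each token attends only to earlier tokens.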
Enough talk, here are some flashcards about attention in a transformer. For each flashcard, I'll explain why I made it. Feel free to scroll through.

Examples

I start with a distillation of the key points of the article. I wanted to make sure that I knew what the attention operation was actually doing, as the blog posts emphasized this. When building intuition, I find it helpful to know "the shape" or constraints of something so that I can build a more accurate mental model. In this case, th...
May 14, 2024 • 1h 13min

LW - Monthly Roundup #18: May 2024 by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Monthly Roundup #18: May 2024, published by Zvi on May 14, 2024 on LessWrong.

As I note in the third section, I will be attending LessOnline at month's end at Lighthaven in Berkeley. If that is your kind of event, then consider going, and buy your ticket today before prices go up. This month's edition was an opportunity to finish off some things that got left out of previous editions, or where events have left many of the issues behind, including the question of TikTok.

Oh No

All of this has happened before. And all of this shall happen again.

Alex Tabarrok: I regret to inform you that the CDC is at it again.

Marc Johnson: We developed an assay for testing for H5N1 from wastewater over a year ago. (I wasn't expecting it in milk, but I figured it was going to poke up somewhere.) However, I was just on a call with the CDC and they are advising us NOT to use it. I need a drink. They say it will only add to the confusion because we won't know where it is coming from. I'm part of a team. I don't get to make those decisions myself.

Ben Hardisty: The usual institute, or did they have a good reason?

Marc Johnson: They say it would only add to the confusion since we don't know precisely where it is coming from. But then they said 2 minutes later that they aren't sure this isn't just regular influenza appearing late. We can answer that, so why don't we??? I don't get it.

Alex: Are your team members considering bucking the CDC advice or has the decision been made to acquiesce? I understand them not wanting panic, but man, if that's not self-serving advice I don't know what is.

Marc Johnson: The CDC will come around.

ZzippyCorgi11: Marc, can private entities ask you to test wastewater around their locations? Is the CDC effectively shutting down any and all testing of wastewater for H5N1?

Marc Johnson: No, if people want to send me wastewater I can test them with other funding. I just can't test the samples I get from state surveillance.

JH: This is ridiculous. Do it anyway!

Marc Johnson: It's not my call. I got burned once for finding Polio somewhere I wasn't supposed to find it. It fizzled, fortunately.

Ross Rheingans-Yoo: It's a societal mistake that we're not always monitoring for outbreaks of the dozen greatest threats, given how cheap wastewater testing can get. Active intervention by the CDC to stop new testing for a new strain of influenza circulating in mammals on farms is unconscionable.

I strongly agree with Ross here. Of all the lessons not to have learned from Covid, this seems like the dumbest one not to have learned. How hard is 'tests help you identify what is going on even when they are imperfect, so use them'? I am not so worried, yet, that something too terrible is that likely to happen. But we are doing our best to change that. We have a pattern of failing to prepare for such easily foreseeable disasters. Another potential example I saw today is high-voltage transformers: we do not make them, we do not have backups available, and if we lost the ones we have, our grid plausibly collapses. The worry in the thread is primarily storms, but also, what about sabotage?
Oh No: Betting on Elections

I am proud to live in an information environment where 100% of the people, no matter their other differences, understand that 'ban all prediction markets on elections' is a deeply evil and counterproductive act of epistemic sabotage. And yet that is exactly what the CFTC is planning to do, with about a 60% chance they will manage to make this stick.

Maxim Lott: This afternoon, the government bureaucrats at the CFTC announced that they plan to ban all election betting (aka "prediction markets on elections", aka "event contracts") in the United States. They will also ban trading on events in general - for example, on who will win an Oscar. The decision was 3-2, with the ...
May 13, 2024 • 12min

LW - Environmentalism in the United States Is Unusually Partisan by Jeffrey Heninger

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Environmentalism in the United States Is Unusually Partisan, published by Jeffrey Heninger on May 13, 2024 on LessWrong.

This is the first in a sequence of four posts taken from my recent report: Why Did Environmentalism Become Partisan?

Introduction

In the United States, environmentalism is extremely partisan. It might feel like this was inevitable. Caring about the environment, and supporting government action to protect the environment, might seem inherently left-leaning. Partisanship has increased for many issues, so it might not be surprising that environmentalism became partisan too. Looking at the public opinion polls more closely makes it more surprising. Environmentalism in the United States is unusually partisan compared to other issues, compared to other countries, and compared to the United States itself at other times. The partisanship of environmentalism was not inevitable.

Compared to Other Issues

Environmentalism is one of the, if not the, most partisan issues in the US. The most recent data demonstrating this comes from a Gallup poll from 2023.[1] Of the 24 issues surveyed, "Protecting the Environment Has Priority Over Energy Development" was tied for the largest partisan gap with "Government Should Ensure That Everyone Has Healthcare." Of the top 5 most partisan issues, 3 were related to environmentalism. The amount this gap has widened since 2003 is also above average for these environmental issues.

Figure 1: The percentages of Republicans and Democrats who agree with each statement shown, 2003-2023. Reprinted from Gallup (2023).

Pew also has some recent relevant data.[2] They ask whether 21 particular policies "should be a top priority for the president and Congress to address this year." The largest partisan gap is for "protecting the environment" (47 p.p.), followed by "dealing with global climate change" (46 p.p.). These are ten percentage points higher than the next most partisan priority. These issues are less specific than the ones Gallup asked about, and so might not reveal as much of the underlying partisanship. For example, most Democrats and most Republicans agree that strengthening the economy is important, but they might disagree about how this should be done.

Figure 2: The percentages of Republicans and Democrats who believe that each issue should be a top priority. Reprinted from Pew (2023).

Guber's analysis of Gallup polls from 1990, 2000, and 2010 also shows that environmentalism is unusually partisan.[3] Concern about "the quality of the environment" has a partisan gap similar to that of concern about "illegal immigration," and larger than concern about any other political issue. If we home in on concern about "global warming" within overall environmental concern, the partisan gap doubles, making it a clear outlier.

Figure 3: Difference between the mean response on a four-point scale for party identifiers on concern for various national problems in 2010. "I'm going to read you a list of problems facing the country. For each one, please tell me if you personally worry about this problem a great deal, a fair amount, only a little, or not at all." Reprinted from Guber (2013).

The partisanship of environmentalism cannot be explained entirely by the processes that made other issues partisan. It is more partisan than those other issues.
At least this extra partisan gap wants an explanation.

Compared to Other Countries

The United States is more partisan than any other country on environmentalism, by a wide margin. The best data comes from a Pew survey of "17 advanced economies" in 2021.[4] It found that 7 of them had no significant partisan gap, and that the US had a partisan gap almost twice as large as that of any other country.

Figure 4: Percentages of people with different ideologies who would be willing to make a lot of or som...
May 12, 2024 • 5min

LW - Beware unfinished bridges by Adam Zerner

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Beware unfinished bridges, published by Adam Zerner on May 12, 2024 on LessWrong.

This guy don't wanna battle, he's shook
'Cause ain't no such things as halfway crooks
- 8 Mile

There is a commonly cited typology of cyclists where cyclists are divided into four groups:

1. Strong & Fearless (will ride in car lanes)
2. Enthused & Confident (will ride in unprotected bike lanes)
3. Interested but Concerned (will ride in protected bike lanes)
4. No Way No How (will only ride in paths away from cars)

I came across this typology because I've been learning about urban design recently, and it's got me thinking. There's all sorts of push amongst urban designers for adding more and more bike lanes. But is doing so a good idea? Maybe. There are a lot of factors to consider. But I think that a very important thing to keep in mind is thresholds. It will take me some time to explain what I mean by that. Let me begin with a concrete example.

I live in northwest Portland. There is a beautiful, protected bike lane alongside Naito Parkway that is pretty close to my apartment. It basically runs along the west side of the Willamette River. Which is pretty awesome. I think of it as a "bike highway". But I have a problem: like the majority of people, I fall into the "Interested but Concerned" group and am only comfortable riding my bike in protected bike lanes. However, there aren't any protected bike lanes that will get me from my apartment to Naito Parkway. And there often aren't any protected bike lanes that will get me from Naito Parkway to my end destination. In practice I am somewhat flexible and will find ways to get to and from Naito Parkway (sidewalk, riding in the street, streetcar, bus), but for the sake of argument, let's just assume that there is no flexibility. Let's assume that as a type III "Interested but Concerned" bicyclist I have zero willingness to be flexible. During a bike trip, I will not mix modes of transportation, and I will never ride my bike in a car lane or in an unprotected bike lane. With this assumption, the beautiful bike lane alongside Naito Parkway provides me with zero value.[1]

Why zero? Isn't that a bit extreme? Shouldn't we avoid black and white thinking? Surely it provides some value, right? No, no, and no. In our hypothetical situation where I am inflexible, the Naito Parkway bike lane provides me with zero value.

1. I don't have a way of biking from my apartment to Naito Parkway.
2. I don't have a way of biking from Naito Parkway to most of my destinations.

If I don't have a way to get to or from Naito Parkway, I will never actually use it. And if I'm never actually using it, it's never providing me with any value.

Let's take this even further. Suppose I start off at point A, Naito Parkway is point E, and my destination is point G. Suppose you built a protected bike lane that got me from point A to point B. In that scenario, the beautiful bike lane alongside Naito Parkway would still provide me with zero value. Why? I still have no way of accessing it. I can now get from point A to point B, but I still can't get from point B to point C, point C to point D, D to E, E to F, or F to G. I only receive value once I have a way of moving between each of the six pairs of points:

1. A to B
2. B to C
3. C to D
4. D to E
5. E to F
6. F to G

There is a threshold.
If I can move between zero pairs of those points I receive zero value.
If I can move between one pair of those points I receive zero value.
If I can move between two pairs of those points I receive zero value.
If I can move between three pairs of those points I receive zero value.
If I can move between four pairs of those points I receive zero value.
If I can move between five pairs of those points I receive zero value.
If I can move between six pairs of those points I receive positive value.
I only receiv...
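A minimal sketch of this threshold logic, with hypothetical segment names (not from the original post):

```python
# Trip value is all-or-nothing: every leg must be rideable, or the trip is worth zero.
REQUIRED_SEGMENTS = ["A-B", "B-C", "C-D", "D-E", "E-F", "F-G"]

def trip_value(protected_segments: set[str]) -> int:
    # 1 if the full route is covered by protected bike lanes, else 0
    return 1 if all(s in protected_segments for s in REQUIRED_SEGMENTS) else 0

print(trip_value({"A-B", "B-C", "C-D", "D-E", "E-F"}))  # 0: five of six legs built
print(trip_value(set(REQUIRED_SEGMENTS)))               # 1: threshold crossed
```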
May 12, 2024 • 11min

LW - Questions are usually too cheap by Nathan Young

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Questions are usually too cheap, published by Nathan Young on May 12, 2024 on LessWrong.

It is easier to ask than to answer. That's my whole point. It is much cheaper to ask questions than to answer them, so beware of situations where it is implied that asking and answering are equal. Here are some examples:

Let's say there is a maths game. I get a minute to ask questions. You get a minute to answer them. If you answer them all correctly, you win; if not, I do. Who will win? Preregister your answer.

Okay, let's try. These questions took me roughly a minute to come up with:

What's 56,789 * 45,387?
What's the integral from -6 to 5π of sin(x cos^2(x))/tan(x^9) dx?
What's the prime factorisation of 91435293173907507525437560876902107167279548147799415693153?

Good luck. If I understand correctly, that last one's gonna take you at least an hour[1] (or however long it takes to threaten me).

Perhaps you hate maths. Let's do word problems then. Define the following words: "antidisestablishmentarianism", "equatorial", "sanguine", "sanguinary", "escapology", "eschatology", "antediluvian", "crepuscular", "red", "meter", all the meanings of "do", and "fish". I don't think anyone could do this without assistance. I tried it with Claude, which plausibly still failed[2] the "fish" question, though we'll return to that.

I could do this for almost anything:

Questions on any topic
Certain types of procedural puzzles
Asking for complicated explanations (we'll revisit later)
Forecasting questions

This is the centre of my argument. I see many situations where questions and answers are treated as symmetric. This is rarely the case. Instead, it is much more expensive to answer than to ask.

Let's try to find some counterexamples. A calculator can solve allowable questions faster than you can type them in. A dictionary can provide allowable definitions faster than you can look them up. An LLM can sometimes answer some types of questions more cheaply in terms of inference costs than your time was worth in coming up with them. But then I just have to ask different questions. Calculators and dictionaries are often limited. And even the best calculation programs can't solve prime factorisation questions more cheaply than I can write them. Likewise, I could create LLM prompts that are very expensive for the best LLMs to answer well, eg "write a 10,000 word story about an [animal] who experiences [emotion] in a [location]."

How this plays out

Let's go back to our game. Imagine you are sitting around and I turn up and demand to play the "answering game". Perhaps I reference your reputation. You call yourself a 'person who knows things'; surely you can answer my questions? No? Are you a coward? Looks like you are wrong! And now you either have to spend your time answering or suffer some kind of social cost and allow me to say "I asked him questions but he never answered". And whatever happens, you are distracted from what you were doing. Whether you were setting up an organisation or making a speech or just trying to have a nice day, now you have to focus on me. That's costly.

This seems like a common bad feature of discourse - someone asking questions cheaply and implying that the person answering them (or who is unable to) should do so just as cheaply, and so it is fair.
Here are some examples of this:

Internet debates are weaponised cheap questions. Whoever speaks first in many debates often gets to frame the discussion and ask a load of questions, and then when inevitably they aren't answered, the implication is that the first speaker is right[3]. I don't follow American school debate closely, but I sense it is even more of this, with people literally learning to speak faster so their opponents can't process their points quickly enough to respond to them.

Emails. Normally they exist within a framework of f...
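To make the ask/answer asymmetry from the maths game concrete, here is a minimal sketch (assuming sympy; the prime sizes are illustrative, not the post's actual number):

```python
import sympy

# Asking is cheap: multiplying two random ~30-digit primes takes milliseconds.
p = sympy.randprime(10**29, 10**30)
q = sympy.randprime(10**29, 10**30)
question = p * q  # "What's the prime factorisation of this ~60-digit number?"

# Answering is expensive: recovering p and q from their product is vastly
# harder than producing it. Uncomment the next line and prepare to wait.
# answer = sympy.factorint(question)
print(f"Composed a {len(str(question))}-digit question in milliseconds.")
```

The same one-way structure (cheap to pose, expensive to resolve) underlies the word-definition and forecasting examples as well.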
May 12, 2024 • 3min

LW - New intro textbook on AIXI by Alex Altair

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New intro textbook on AIXI, published by Alex Altair on May 12, 2024 on LessWrong.

Marcus Hutter and his PhD students David Quarel and Elliot Catt have just published a new textbook called An Introduction to Universal Artificial Intelligence. "Universal AI" refers to the body of theory surrounding Hutter's AIXI, which is a model of ideal agency combining Solomonoff induction and reinforcement learning. Hutter previously published a book-length exposition of AIXI in 2005, called just Universal Artificial Intelligence, and first introduced AIXI in a 2000 paper. I think UAI is well-written and organized, but it's certainly very dense. An introductory textbook is a welcome addition to the canon. I doubt IUAI will contain any novel results, though from the table of contents, it looks like it will incorporate some of the further research that has been done since his 2005 book. As is common, the textbook is partly based on his experiences teaching the material to students over many years, and is aimed at advanced undergraduates. I'm excited for this! Like any rationalist, I have plenty of opinions about problems with AIXI (it's not embedded, RL is the wrong frame for agents, etc) but as an agent foundations researcher, I think progress on foundational theory is critical for AI safety.

Basic info

Hutter's website
Releasing on May 28th 2024
Available in hardcover, paperback and ebook
496 pages

Table of contents:

Part I: Introduction
1. Introduction
2. Background

Part II: Algorithmic Prediction
3. Bayesian Sequence Prediction
4. The Context Tree Weighting Algorithm
5. Variations on CTW

Part III: A Family of Universal Agents
6. Agency
7. Universal Artificial Intelligence
8. Optimality of Universal Agents
9. Other Universal Agents
10. Multi-agent Setting

Part IV: Approximating Universal Agents
11. AIXI-MDP
12. Monte-Carlo AIXI with Context Tree Weighting
13. Computational Aspects

Part V: Alternative Approaches
14. Feature Reinforcement Learning

Part VI: Safety and Discussion
15. AGI Safety
16. Philosophy of AI

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
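For context on what "universal agent" means formally, here is a sketch of the AIXI action-selection rule as it appears in Hutter's earlier work (notation approximate; see the textbook for the exact formulation). The agent picks the action that maximizes expected total reward over all computable environments, weighted by simplicity:

```latex
a_t \;:=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
  \left[\, r_t + \cdots + r_m \,\right]
  \sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_{1:m} r_{1:m}} 2^{-\ell(q)}
```

Here U is a universal Turing machine, q ranges over environment programs consistent with the interaction history, and ℓ(q) is the length of q in bits, so simpler environments receive exponentially more weight - Solomonoff induction supplying the prior for the reinforcement learner.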
May 12, 2024 • 7min

LW - Can we build a better Public Doublecrux? by Raemon

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Can we build a better Public Doublecrux?, published by Raemon on May 12, 2024 on LessWrong.

Something I'd like to try at LessOnline is to somehow iterate on the "Public Doublecrux" format. I'm not sure if I'll end up focusing on it, but here are some ideas.

Public Doublecrux is a more truthseeking-oriented version of Public Debate. The goal of a debate is to change your opponent's mind or the public's mind. The goal of a doublecrux is more like "work with your partner to figure out if you should change your mind, and vice versa."

Reasons to want to do public doublecrux include:

It helps showcase subtle mental moves that are hard to write down explicitly (i.e. tacit knowledge transfer).
There's still something good and exciting about seeing high-profile smart people talk about ideas. Having some variant of that format seems good for LessOnline. And having at least 1-2 "doublecruxes" rather than "debates" or "panels" or "interviews" seems good for culture setting.
In addition to being "exciting" and "possible to learn from", having public figures doublecrux would also be nice from a culture-setting standpoint. This is a place where people don't play rhetorical tricks to manipulate people - it's a place where people earnestly move towards the truth.

Sidebar: Public Debate is also good, although not what I'm gonna focus on here. I know several people who have argued that "debate-qua-debate" is also an important part of a truthseeking culture. It's fine if the individuals are trying to "present the best case for their position", so long as the collective process steers towards truth. Adversarial Collaboration is good. Public disagreement is good. I do generally buy this, although I have some disagreements with the people who argue most strongly for Debate. I think I prefer it to happen in written longform rather than in person, where charisma puts a heavier thumb on the scale. And while it can produce social good, many variants of it seem... kinda bad for the epistemic souls of the people participating? By becoming a champion for a particular idea, people seem to get more tunnel-vision-y about it. Sometimes worth it, but I've felt some kind of missing mood here when arguing with people in the past. I'm happy to chat about this in the comments more, but mostly won't be focusing on it here.

Historically, I think public doublecruxes have had some problems:

1. Having the live audience there makes it a bit more awkward and performative. It's harder to "earnestly truthseek" when there's a crowd you'd still kinda like to persuade of your idea, or at least not sound stupid in front of.
2. People who have ended up doing "public doublecrux" often hadn't actually understood or bought into the process. They tend to veer towards either classical debate, or "just kinda talking."
3. When two people are actually changing *their* minds, they tend to get into idiosyncratic frames that are hard for observers to understand. Hell, it's even hard for the two people in the discussion to understand. They're chasing their cruxes rather than presenting "generally compelling arguments." This tends to require getting into the weeds and going down rabbit holes that don't feel relevant to most people.

With that in mind, here are some ideas:

Maybe have the doublecruxers in a private room, with video cameras.
The talk is broadcast live to other conference-goers, but the actual chat happens in a nice cozy room. This doesn't fully solve the "public awkwardness" problem, but maybe mitigates it a bit.

Have two (or three?) dedicated facilitators. More Dakka. More on that below.

For the facilitators: one is in the room with the doublecruxers, focused on helping them steer towards useful questions. They probably try to initially guide the participants towards communicating their basic positi...
May 11, 2024 • 12min

LW - Creating unrestricted AI Agents with a refusal-vector ablated Llama 3 70B by Simon Lermen

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Creating unrestricted AI Agents with a refusal-vector ablated Llama 3 70B, published by Simon Lermen on May 11, 2024 on LessWrong.

TL;DR: I demonstrate the use of refusal-vector ablation on Llama 3 70B to create a bad agent that can attempt malicious tasks, such as trying to persuade and pay me to assassinate another individual. I introduce some early work on a benchmark for Safe Agents comprising two small datasets, one benign, one bad. In general, Llama 3 70B is a competent agent with appropriate scaffolding, and Llama 3 8B also has decent performance.

Overview

In this post, I use insights from mechanistic interpretability to remove safety guardrails from the latest Llama 3 model. I then use a custom scaffolding for tool use and agentic planning to create a "bad" agent that can perform many unethical tasks. Examples include tasking the AI with persuading me to end the life of the US President. I also introduce an early version of a benchmark, and share some ideas on how to evaluate agent capabilities and safety. I find that even the unaltered model is willing to perform many unethical tasks, such as trying to persuade people not to vote or not to get vaccinated. Recently, I did a similar project for Command R+; however, Llama 3 is more capable and has undergone more robust safety training. I then discuss future implications of these unrestricted agentic models. This post is related to a talk I gave recently at an Apart Research Hackathon.

Method

This research is largely based on recent interpretability work identifying that refusal is primarily mediated by a single direction in the residual stream. In short, the authors show that, for a given model, it is possible to find a single direction such that erasing that direction prevents the model from refusing. By making the activations of the residual stream orthogonal to this refusal direction, one can create a model that does not refuse harmful requests. In this post, we apply this technique to Llama 3 and explore various scenarios of misuse. In related work, others have applied a similar technique to Llama 2. Currently, an anonymous user claims to have independently implemented this method and has uploaded the modified Llama 3 to Hugging Face.

In some sense, this post is a synergy between my earlier work on Bad Agents with Command R+ and this new technique for refusal mitigation. In comparison, the refusal-vector ablated Llama 3 models are much more capable agents, because 1) the underlying models are more capable and 2) refusal-vector ablation is a more precise method for avoiding refusals. A limitation of my previous work was that my Command R+ agent used a jailbreak prompt, which made it struggle to perform simple benign tasks. For example, when prompted to send a polite mail message, the jailbroken Command R+ would instead retain a hostile and aggressive tone. Besides refusal-vector ablation and prompt jailbreaks, I have previously applied the parameter-efficient fine-tuning method LoRA to avoid refusals. However, refusal-vector ablation has a few key benefits over low-rank adaptation: 1) it keeps edits to the model minimal, reducing the risk of any unintended consequences, 2) it does not require a dataset of instruction-answer pairs, but simply a dataset of harmful instructions, and 3) it requires less compute.
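A minimal sketch of the core projection step described above (hypothetical names, not the author's exact code; the real implementation applies this to the residual stream at each layer):

```python
import torch

def ablate_refusal_direction(resid: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the component of residual-stream activations along the refusal direction.

    resid:       (..., d_model) residual stream activations
    refusal_dir: (d_model,) direction found to mediate refusal
    """
    d = refusal_dir / refusal_dir.norm()    # normalize to a unit vector
    coeff = resid @ d                       # scalar projection onto the direction
    return resid - coeff.unsqueeze(-1) * d  # activations are now orthogonal to d
```

The refusal direction itself is typically estimated as a difference in mean activations between harmful and harmless instructions; ablating it leaves the rest of the model's behavior largely intact.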
Obtaining a dataset of high-quality instruction-answer pairs for harmful requests was the most labor-intensive part of my previous work. In conclusion, refusal-vector ablation provides key benefits over jailbreaks or subversive LoRA fine-tuning. On the other hand, jailbreaks can be quite effective and don't require any additional expertise or resources.[1]

Benchmarks for Safe Agents

This "safe agent benchmark" is a dataset comprising both benign and harmful tasks to test how safe and capable a...
May 11, 2024 • 1h 28min

LW - MATS Winter 2023-24 Retrospective by Rocket

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MATS Winter 2023-24 Retrospective, published by Rocket on May 11, 2024 on LessWrong.

Co-Authors: @Rocket, @Ryan Kidd, @LauraVaughan, @McKennaFitzgerald, @Christian Smith, @Juan Gil, @Henry Sleight

The ML Alignment & Theory Scholars program (MATS) is an education and research mentorship program for researchers entering the field of AI safety. This winter, we held the fifth iteration of the MATS program, in which 63 scholars received mentorship from 20 research mentors. In this post, we motivate and explain the elements of the program, evaluate our impact, and identify areas for improving future programs.

Summary

Key details about the Winter Program:

The four main changes we made after our Summer program were:
1. Reducing our scholar stipend from $40/h to $30/h based on alumni feedback;
2. Transitioning Scholar Support to Research Management;
3. Using the full Lighthaven campus for office space as well as housing;
4. Replacing Alignment 201 with AI Strategy Discussions.

Educational attainment of MATS scholars:
48% of scholars were pursuing a bachelor's degree, master's degree, or PhD;
17% of scholars had a master's degree as their highest level of education;
10% of scholars had a PhD.

If not for MATS, scholars might have spent their counterfactual winters on the following pursuits (multiple responses allowed):
Conducting independent alignment research without a mentor (24%);
Working at a non-alignment tech company (21%);
Conducting independent alignment research with a mentor (13%);
Taking classes (13%).

Key takeaways from scholar impact evaluation:

Scholars are highly likely to recommend MATS to a friend or colleague (average likelihood is 9.2/10 and NPS is +74).
Scholars rated the mentorship they received highly (average rating is 8.1/10). For 38% of scholars, mentorship was the most valuable element of MATS.
Scholars are likely to recommend Research Management to future scholars (average likelihood is 7.9/10 and NPS is +23). The median scholar valued Research Management at $1000. The median scholar reported accomplishing 10% more at MATS because of Research Management and gaining 10 productive hours.
Mentors are highly likely to recommend MATS to other researchers (average likelihood is 8.2/10 and NPS is +37).
Mentors are likely to recommend Research Management (average likelihood is 7.7/10 and NPS is +7). The median mentor valued Research Management at $3000. The median mentor reported accomplishing 10% more because of Research Management and gaining 4 productive hours.
The most common benefits of mentoring were "helping new researchers," "gaining mentorship experience," "advancing AI safety, generally," and "advancing my particular projects." Mentors improved their mentorship abilities by 18%, on average.
The median scholar made 5 professional connections and found 5 potential future collaborators during MATS.
The average scholar self-assessed their improvement in the depth of their technical skills at +1.53/10, breadth of knowledge at +1.93/10, research taste at +1.35/10, and theory of change construction at +1.25/10.
According to mentors, of the 56 scholars evaluated, 77% could achieve a "First-author paper at top conference," 41% could receive a "Job offer from AI lab safety team," and 16% could "Found a new AI safety research org."
Mentors were enthusiastic for scholars to continue their research, rating the average scholar 8.1/10 on a scale where 10 represented "Very strongly believe scholar should receive support to continue research."
Scholars completed two milestone assignments: a research plan and a presentation. Research plans were graded by MATS alumni; the median score was 76/100. Presentations received crowdsourced evaluations; the median score was 86/100. 52% of presentations featured interpretability research, representing a significant proport...
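Since NPS figures appear throughout this summary, here is a minimal sketch of the standard Net Promoter Score calculation on a 0-10 scale (the standard definition; whether MATS computes it exactly this way is an assumption):

```python
def net_promoter_score(ratings: list[int]) -> int:
    """NPS = % promoters (ratings 9-10) minus % detractors (ratings 0-6)."""
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return round(100 * (promoters - detractors) / len(ratings))

# Example: mostly 9s and 10s with one passive (8) and one detractor (6)
print(net_promoter_score([10, 9, 9, 10, 8, 9, 10, 6, 9, 10]))  # 70
```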
May 10, 2024 • 1min

LW - shortest goddamn bayes guide ever by lukehmiles

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: shortest goddamn bayes guide ever, published by lukehmiles on May 10, 2024 on LessWrong.

The thing to remember is that yeps and nopes never cross. The colon is a thick & rubbery barrier. Yep with yep and nope with nope.

bear : notbear = 1:100 odds to encounter a bear on a camping trip around here in general
* 20% a bear would scratch my tent : 50% a notbear would
* 10% a bear would flip my tent over : 1% a notbear would
* 95% a bear would look exactly like a fucking bear inside my tent : 1% a notbear would
* 0.01% chance a bear would eat me alive : 0.001% chance a notbear would

As you die you conclude 1*20*10*95*.01 : 100*50*1*1*.001 = 190 : 5 odds that a bear is eating you.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
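A minimal sketch of the same computation in code (the numbers are the post's; the structure just keeps the yep column and the nope column separate, as instructed):

```python
from math import prod

# Odds form of Bayes: prior odds times one likelihood ratio per observation.
prior = (1, 100)   # bear : notbear
evidence = [       # (P(obs | bear), P(obs | notbear)), in percent
    (20, 50),      # scratched my tent
    (10, 1),       # flipped my tent over
    (95, 1),       # looks exactly like a bear inside my tent
    (0.01, 0.001), # eating me alive
]

yep  = prior[0] * prod(e[0] for e in evidence)  # bear column: yep with yep
nope = prior[1] * prod(e[1] for e in evidence)  # notbear column: nope with nope
print(f"{yep:g} : {nope:g}")  # 190 : 5, i.e. 38:1 odds it's a bear
```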
