The Nonlinear Library: LessWrong

The Nonlinear Fund
Jul 10, 2024 • 2min

LW - Robin Hanson and Liron Shapira Debate AI X-Risk by Liron

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Robin Hanson & Liron Shapira Debate AI X-Risk, published by Liron on July 10, 2024 on LessWrong. Robin and I just had an interesting 2-hour AI doom debate. We picked up where the Hanson-Yudkowsky Foom Debate left off in 2008, revisiting key arguments in the light of recent AI advances. My position is similar to Eliezer's: P(doom) on the order of 50%. Robin's position remains shockingly different: P(doom) < 1%. I think we managed to illuminate some of our cruxes of disagreement, though by no means all. Let us know your thoughts and feedback!
Topics:
AI timelines
The "outside view" of economic growth trends
Future economic doubling times
The role of culture in human intelligence
Lessons from human evolution and brain size
Intelligence increase gradient near human level
Bostrom's Vulnerable World hypothesis
The optimization-power view
Feasibility of AI alignment
Will AI be "above the law" relative to humans
Where to watch/listen/read: YouTube video, podcast audio, transcript.
About Doom Debates: My podcast, Doom Debates, hosts high-quality debates between people who don't see eye-to-eye on the urgent issue of AI extinction risk. All kinds of guests are welcome, from luminaries to curious randos. If you're interested to be part of an episode, DM me here or contact me via Twitter or email. If you're interested in the content, please subscribe and share it to help grow its reach. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jul 10, 2024 • 21min

LW - Causal Graphs of GPT-2-Small's Residual Stream by David Udell

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Causal Graphs of GPT-2-Small's Residual Stream, published by David Udell on July 10, 2024 on LessWrong. Thanks to the many people I've chatted with about this over the past many months. And special thanks to Cunningham et al., Marks et al., Joseph Bloom, Trenton Bricken, Adrià Garriga-Alonso, and Johnny Lin, for crucial research artefacts and/or feedback. Codebase: sparse_circuit_discovery. TL;DR: The residual stream in GPT-2-small, expanded with sparse autoencoders and systematically ablated, looks like the working memory of a forward pass. A few high-magnitude features causally propagate themselves through the model during inference, and these features are interpretable. We can see where in the forward pass, due to which transformer layer, those propagating features are written in and/or scrubbed out. Introduction What is GPT-2-small thinking about during an arbitrary forward pass? I've been trying to isolate legible model circuits using sparse autoencoders. I was inspired by the following example, from the end of Cunningham et al. (2023): I wanted to see whether naturalistic transformers[1] are generally this interpretable as circuits under sparse autoencoding. If this level of interpretability just abounds, then high-quality LLM mindreading & mindcontrol is in hand! If not, could I show how far we are from that kind of mindreading technology? Related Work As mentioned, I was led into this project by Cunningham et al. (2023), which established key early results about sparse autoencoding for LLM interpretability. While I was working on this, Marks et al. (2024) developed an algorithm approximating the same causal graphs in constant time. Their result is what would make this scalable and squelch down the iteration loop on interpreting forward passes. Methodology A sparse autoencoder is a linear map, whose shape is (autoencoder_dim, model_dim). I install sparse autoencoders at all of GPT-2-small's residual streams (one per model layer, 12 in total). Each sits at a pre_resid bottleneck that all prior information in that forward pass routes through.[2] I fix a context, and choose one forward pass of interest in that context. In every autoencoder, I go through and independently ablate out all of the dimensions in autoencoder_dim during a "corrupted" forward pass. For every corrupted forward pass with a layer N sparse autoencoder dimension, I cache effects at the layer N+1 autoencoder. Every vector of cached effects can then be reduced to a set of edges in a causal graph. Each edge has a signed scalar weight and connects a node in the layer N autoencoder to a node in the layer N+1 autoencoder. I keep only the top-k magnitude edges from each set of effects N→N+1, where k is a number of edges. Then, I keep only the set of edges that form paths with lengths >1.[3] The output of that is a top-k causal graph, showing the largest-magnitude internal causal structure in GPT-2-small's residual stream during the forward pass you fixed. Causal Graphs Key Consider the causal graph below: Each box with a bolded label like 5.10603 is a dimension in a sparse autoencoder. 5 is the layer number, while 10603 is its column index in that autoencoder. You can always cross-reference more comprehensive interpretability data for any given dimension on Neuronpedia using those two indices.
Below the dimension indices, the blue-to-white highlighted contexts show how strongly a dimension activated following each of the tokens in that context (bluer means stronger). At the bottom of the box, blue or red token boxes show the tokens most promoted (blue) and most suppressed (red) by that dimension. Arrows between boxes plot the causal effects of an ablation on dimensions of the next layer's autoencoder. A red arrow means ablating dimension 1.x will also suppress downstream dimension 2.y. A blue arrow means ...
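To make the ablate-and-measure step described above concrete, here is a minimal sketch of the edge-extraction bookkeeping in Python, with random NumPy matrices standing in for GPT-2-small and its trained autoencoders; the names and the toy layer map are illustrative assumptions, not code from the sparse_circuit_discovery repository.

import numpy as np

rng = np.random.default_rng(0)
MODEL_DIM, AUTOENCODER_DIM, TOP_K = 16, 64, 10

# Stand-ins for two adjacent trained sparse autoencoders (layers N and N+1).
W_enc_N = rng.normal(size=(AUTOENCODER_DIM, MODEL_DIM))
W_dec_N = rng.normal(size=(MODEL_DIM, AUTOENCODER_DIM))
W_enc_N1 = rng.normal(size=(AUTOENCODER_DIM, MODEL_DIM))
LAYER_MAP = rng.normal(size=(MODEL_DIM, MODEL_DIM))  # stand-in for "the rest of layer N+1"

def encode(resid, W_enc):
    # ReLU keeps the code sparse-ish; real SAEs also have biases and an L1 penalty.
    return np.maximum(W_enc @ resid, 0.0)

resid_N = rng.normal(size=MODEL_DIM)          # residual stream at the chosen token
code_N = encode(resid_N, W_enc_N)             # layer-N autoencoder activations
# Clean run uses the same reconstruction path, so ablations are compared like for like.
clean_code_N1 = encode(LAYER_MAP @ (W_dec_N @ code_N), W_enc_N1)

edges = []
for i in np.nonzero(code_N)[0]:               # independently ablate each active layer-N dimension
    ablated = code_N.copy()
    ablated[i] = 0.0
    corrupted_resid = W_dec_N @ ablated       # reconstruct the residual stream without feature i
    corrupted_code_N1 = encode(LAYER_MAP @ corrupted_resid, W_enc_N1)
    effect = corrupted_code_N1 - clean_code_N1   # signed effect on each layer-(N+1) feature
    for j in np.argsort(-np.abs(effect))[:TOP_K]:
        edges.append((f"{i} (layer N)", f"{j} (layer N+1)", float(effect[j])))

# Keep only the globally largest-magnitude edges, as in the top-k causal graphs.
edges.sort(key=lambda e: -abs(e[2]))
print(edges[:TOP_K])

In the real setup the ablated reconstruction is spliced into an actual GPT-2-small forward pass over a fixed context and effects are cached at every adjacent pair of layers, but the top-k edge selection works the same way.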
Jul 9, 2024 • 29min

LW - Medical Roundup #3 by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Medical Roundup #3, published by Zvi on July 9, 2024 on LessWrong. This time around, we cover the Hanson/Alexander debates on the value of medicine, and otherwise we mostly have good news. Technology Advances Regeneron administers a single shot in a genetically deaf child's ear, and they can hear after a few months, n=2 so far. Great news: An mRNA vaccine in early human clinical trials reprograms the immune system to attack glioblastoma, the most aggressive and lethal brain tumor. It will now proceed to Phase I. In a saner world, people would be able to try this now. More great news, we have a cancer vaccine trial in the UK. And we're testing personalized mRNA BioNTech cancer vaccines too. US paying Moderna $176 million to develop a pandemic vaccine against bird flu. We also have this claim that Lorlatinib jumps cancer PFS rates from 8% to 60%. The GLP-1 Revolution Early results from a study show the GLP-1 drug liraglutide could reduce cravings in people with opioid use disorder by 40% compared with a placebo. This seems like a clear case where no reasonable person would wait for more than we already have? If there was someone I cared about who had an opioid problem I would do what it took to get them on a GLP-1 drug. Rumblings that GLP-1 drugs might improve fertility? Rumblings that GLP-1 drugs could reduce heart attack, stroke and death even if you don't lose weight, according to a new analysis? Survey says 6% of Americans might already be on them. Weight loss in studies continues for more than a year in a majority of patients, sustained up to four years, which is what they studied so far. The case that GLP-1s can be used against all addictions at scale. It gives users a sense of control which reduces addictive behaviors across the board, including acting as a 'vaccine' against developing new addictions. It can be additive to existing treatments. More alcoholics (as an example) already take GLP-1s than existing indicated anti-addiction medications, and a study showed 50%-56% reduction in risk of new or recurring alcohol addictions, another showed 30%-50% reduction for cannabis. How to cover this? Sigh. I do appreciate the especially clean example below. Matthew Yglesias: Conservatives more than liberals will see the systematic negativity bias at work in coverage of GLP-agonists. Less likely to admit that this same dynamic colors everything including coverage of crime and the economy. The situation is that there is a new drug that is helping people without hurting anyone, so they write an article about how it is increasing 'health disparities.' The point is that they are writing similar things for everything else, too. The Free Press's Bari Weiss and Johann Hari do a second round of 'Ozempic good or bad.' It takes a while for Hari to get to actual potential downsides. The first is a claimed (but highly disputed) 50%-75% increased risk of thyroid cancer. That's not great, but clearly overwhelmed by reduced risks elsewhere. The second is the worry of what else it is doing to your brain. Others have noticed it might be actively great here, giving people more impulse control, helping with things like smoking or gambling. Hari worries it might hurt his motivation for writing or sex. That seems like the kind of thing one can measure, both in general and in yourself.
If people were losing motivation to do work, and this hurt productivity, we would know. The main objection seems to be that obesity is a moral failure of our civilization and ourselves, so it would be wrong to fix it with a pill rather than correct the underlying issues like processed foods and lack of exercise. Why not be like Japan? To which the most obvious response is that it is way too late for America to take that path. That does not mean that people should suffer. And if we find a way to fix the issues ...
Jul 9, 2024 • 6min

LW - Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers by AI Impacts

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers, published by AI Impacts on July 9, 2024 on LessWrong. by Anne Marthe van der Bles, Sander van der Linden, Alexandra L. J. Freeman, and David J. Spiegelhalter. (2020) https://www.pnas.org/doi/pdf/10.1073/pnas.1913678117. Summary: Numerically expressing uncertainty when talking to the public is fine. It causes people to be less confident in the number itself (as it should), but does not cause people to lose trust in the source of that number. Uncertainty is inherent to our knowledge about the state of the world yet often not communicated alongside scientific facts and numbers. In the "posttruth" era where facts are increasingly contested, a common assumption is that communicating uncertainty will reduce public trust. However, a lack of systematic research makes it difficult to evaluate such claims. Within many specialized communities, there are norms which encourage people to state numerical uncertainty when reporting a number. This is not often done when speaking to the public. The public might not understand what the uncertainty means, or they might treat it as an admission of failure. Journalistic norms typically do not communicate the uncertainty. But are these concerns actually justified? This can be checked empirically. Just because a potential bias is conceivable does not imply that it is a significant problem for many people. This paper does the work of actually checking if these concerns are valid. Van der Bles et al. ran five surveys in the UK with a total n = 5,780. A brief description of their methods can be found in the appendix below. Respondents' trust in the numbers varied with political ideology, but how they reacted to the uncertainty did not. People were told the number either without mentioning uncertainty (as a control), with a numerical range, or with a verbal statement that uncertainty exists for these numbers. The study did not investigate stating p-values for beliefs. Exact statements used in the survey can be seen in Table 1, in the appendix. The best summary of their data is in their Figure 5, which presents results from surveys 1-4. The fifth survey had smaller effect sizes, so none of the shifts in trust were significant. Expressing uncertainty made it more likely that people perceived uncertainty in the number (A). This is good. When the numbers are uncertain, science communicators should want people to believe that they are uncertain. Interestingly, verbally reminding people of uncertainty resulted in higher perceived uncertainty than stating the numerical range, which could mean that people are overestimating the uncertainty when verbally reminded of it. The surveys distinguished between trust in the number itself (B) and trust in the source (C). Numerically expressing uncertainty resulted in a small decrease in the trust of that number. Verbally expressing uncertainty resulted in a larger decrease in the trust of that number. Numerically expressing uncertainty resulted in no significant change in the trust of the source. Verbally expressing uncertainty resulted in a small decrease in the trust of the source.
The consequences of expressing numerical uncertainty are what I would have hoped: people trust the number a bit less than if they hadn't thought about uncertainty at all, but don't think that this reflects badly on the source of the information. Centuries of human thinking about uncertainty among many leaders, journalists, scientists, and policymakers boil down to a simple and powerful intuition: "No one likes uncertainty." It is therefore often assumed that communicating uncertainty transparently will decrease public trust in science. In this program of research, we set out to investigate whether such claims have any empirical ...
Jul 9, 2024 • 11min

LW - What and Why: Developmental Interpretability of Reinforcement Learning by Garrett Baker

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What and Why: Developmental Interpretability of Reinforcement Learning, published by Garrett Baker on July 9, 2024 on LessWrong. Introduction I happen to be in that happy stage in the research cycle where I ask for money so I can continue to work on things I think are important. Part of that means justifying what I want to work on to the satisfaction of the people who provide that money. This presents a good opportunity to say what I plan to work on in a more layman-friendly way, for the benefit of LessWrong, potential collaborators, interested researchers, and funders who want to read the fun version of my project proposal. It also provides the opportunity for people who are very pessimistic about the chances I end up doing anything useful by pursuing this to have their say. So if you read this (or skim it), and have critiques (or just recommendations), I'd love to hear them! Publicly or privately. So without further ado, in this post I will be discussing & justifying three aspects of what I'm working on, and my reasons for believing there are gaps in the literature in the intersection of these subjects that are relevant for AI alignment. These are:
1. Reinforcement learning
2. Developmental Interpretability
3. Values
Culminating in: Developmental interpretability of values in reinforcement learning. Here are brief summaries of each of the sections:
1. Why study reinforcement learning?
   1. Imposed-from-without or in-context reinforcement learning seems a likely path toward agentic AIs
   2. The "data wall" means active-learning or self-training will get more important over time
   3. There are fewer ways for the usual AI risk arguments to fail in the RL with mostly outcome-based rewards circumstance than the supervised learning + RL with mostly process-based rewards (RLHF) circumstance.
2. Why study developmental interpretability?
   1. Causal understanding of the training process allows us to produce reward structure or environmental distribution interventions
   2. Alternative & complementary tools to mechanistic interpretability
   3. Connections with singular learning theory
3. Why study values?
   1. The ultimate question of alignment is how can we make AI values compatible with human values, yet this is relatively understudied.
4. Where are the gaps?
   1. Many experiments
   2. Many theories
   3. Few experiments testing theories or theories explaining experiments
Reinforcement learning Agentic AIs vs Tool AIs All generally capable adaptive systems are ruled by a general, ground-truth, but slow outer optimization process which reduces incoherency and continuously selects for systems which achieve outcomes in the world. Examples include evolution, business, cultural selection, and to a great extent human brains. That is, except for LLMs. Most of the feedback LLMs receive is supervised, unaffected by the particular actions the LLM takes, and process-based (RLHF-like), where we reward the LLM according to how useful an action looks in contrast to a ground truth regarding how well that action (or sequence of actions) achieved its goal. Now I don't want to make the claim that this aspect of how we train LLMs is clearly a fault of them, or in some way limits the problem solving abilities they can have.
And I do think it possible we see in-context ground-truth optimization processes instantiated as a result of increased scaling, in the same way we see in-context learning. I do however want to make the claim that this current paradigm of mostly process-based supervision, if it continues, and doesn't itself produce ground-truth based optimization, makes me optimistic about AI going well. That is, if this lack of general ground-truth optimization continues, we end up with a cached bundle of not very agentic (compared to AIXI) tool AIs with limited search or bootstrapping capabilities. Of course,...
Jul 9, 2024 • 16min

LW - Poker is a bad game for teaching epistemics. Figgie is a better one. by rossry

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Poker is a bad game for teaching epistemics. Figgie is a better one., published by rossry on July 9, 2024 on LessWrong. Editor's note: Somewhat after I posted this on my own blog, Max Chiswick cornered me at LessOnline / Manifest and gave me a whole new perspective on this topic. I now believe that there is a way to use poker to sharpen epistemics that works dramatically better than anything I had been considering. I hope to write it up - together with Max - when I have time. Anyway, I'm still happy to keep this post around as a record of my first thoughts on the matter, and because it's better than nothing in the time before Max and I get around to writing up our joint second thoughts. As an epilogue to this story, Max and I are now running a beta test for a course on making AIs to play poker and other games. The course will be a synthesis of our respective theories of pedagogy re: games, and you can read more here or in the comments. The beta will run July 15-August 15, in-person in SF, and will be free but with limited signups. Some trading firms are driven by good decisions made by humans. (Some aren't, but we can set those aside. This post is about the ones that are.) Humans don't make better-than-average-quality decisions by default, so the better class of intellectually-driven quantitative trading firm realizes that they are in the business of training humans to make better decisions. (The second-best class of firm contents themselves with merely selecting talent.) Some firms, famously, use poker to teach traders about decision making under uncertainty. First, the case for poker-as-educational-tool: You have to make decisions. (Goodbye, Candy Land.) You have to make them under uncertainty. (Goodbye, chess.) If you want to win against smart competition, you have to reverse-engineer the state of your competitors' uncertainty from their decisions, in order to make better decisions yourself. (Goodbye, blackjack.) It's the last of these that is the rarest among games. In Camel Up - which is a great game for sharpening certain skills - you place bets and make trades on the outcome of a Candy Land-style camel race. Whether you should take one coin for sure or risk one to win five if the red camel holds the lead for another round... Turn after turn, you have to make these calculations and decisions under uncertainty. But there's no meaningful edge in scrutinizing your opponent's decision to pick the red camel. If they were right about the probabilities, you shouldn't have expected differently. And if they're wrong, it means they made a mistake, not that they know a secret about red camels. Poker is different. Your decision is rarely dictated by the probabilities alone. Even if you draw the worst possible card, you can win if your opponent has been bluffing and has even worse - or if your next action convinces them that they should fold a hand that would have beaten yours. If you only play the odds that you see, and not the odds you see your opponent showing you, you will on average lose. So as you grind and grind at poker, first you learn probabilities and how they should affect your decisions, then you learn to see what others' decisions imply about what they see, and then you can work on changing your decisions to avoid leaking what you know to the other players that are watching you. Or so I'm told.
I would not describe myself as a particularly skilled poker player. I certainly have not ground and ground and ground. Here's the thing, though: If you are a trading firm and you want to teach traders about making decisions under uncertainty, it's not enough that poker teaches it. Nor is it enough that poker, if you grind for thousands of hours, can teach quite a lot of it. A quantitative trading firm is primarily a socialist collective run for the benefit of its workers, but it...
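As an aside on the Camel Up example above, the turn-by-turn choice is a plain expected-value calculation once you fix the payoffs. A minimal sketch, with the payoffs (win five or lose your staked coin) assumed for illustration rather than taken from the game's actual rules:

def bet_ev(p_red_holds_lead: float) -> float:
    # Risk 1 coin to win 5: net +5 if the red camel holds the lead, net -1 otherwise.
    return 5 * p_red_holds_lead - 1 * (1 - p_red_holds_lead)

for p in (0.20, 1 / 3, 0.50):
    print(f"P(red holds lead) = {p:.2f}: bet EV = {bet_ev(p):+.2f}, sure coin = +1.00")
# The bet beats the sure coin exactly when 6p - 1 > 1, i.e. when p > 1/3.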
Jul 9, 2024 • 9min

LW - Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs by L Rudolf L

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs, published by L Rudolf L on July 9, 2024 on LessWrong. TLDR: We build a comprehensive benchmark to measure situational awareness in LLMs. It consists of 16 tasks, which we group into 7 categories and 3 aspects of situational awareness (self-knowledge, situational inferences, and taking actions). We test 19 LLMs and find that all perform above chance, including the pretrained GPT-4-base (which was not subject to RLHF finetuning). However, the benchmark is still far from saturated, with the top-scoring model (Claude-3.5-Sonnet) scoring 54%, compared to a random chance of 27.4% and an estimated upper baseline of 90.7%. This post has excerpts from our paper, as well as some results on new models that are not in the paper. Links: Twitter thread, Website (latest results + code), Paper. Abstract AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model". This raises questions. Do such models know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to a model's knowledge of itself and its circumstances as situational awareness. To quantify situational awareness in LLMs, we introduce a range of behavioral tests, based on question answering and instruction following. These tests form the Situational Awareness Dataset (SAD), a benchmark comprising 7 task categories and over 13,000 questions. The benchmark tests numerous abilities, including the capacity of LLMs to (i) recognize their own generated text, (ii) predict their own behavior, (iii) determine whether a prompt is from internal evaluation or real-world deployment, and (iv) follow instructions that depend on self-knowledge. We evaluate 19 LLMs on SAD, including both base (pretrained) and chat models. While all models perform better than chance, even the highest-scoring model (Claude 3 Opus) is far from a human baseline on certain tasks. We also observe that performance on SAD is only partially predicted by metrics of general knowledge (e.g. MMLU). Chat models, which are finetuned to serve as AI assistants, outperform their corresponding base models on SAD but not on general knowledge tasks. The purpose of SAD is to facilitate scientific understanding of situational awareness in LLMs by breaking it down into quantitative abilities. Situational awareness is important because it enhances a model's capacity for autonomous planning and action. While this has potential benefits for automation, it also introduces novel risks related to AI safety and control. Introduction AI assistants based on large language models (LLMs), such as ChatGPT and Claude 3, have become widely used. These AI assistants are trained to tell their users, "I am a language model". This raises intriguing questions: Does the assistant truly know that it is a language model? Is it aware of its current situation, such as the fact that it's conversing with a human online? And if so, does it reliably act in ways consistent with being an LLM? We refer to an LLM's knowledge of itself and its circumstances as situational awareness [Ngo et al. (2023), Berglund et al. (2023), Anwar et al. (2024)]. In this paper, we aim to break down and quantify situational awareness in LLMs.
To do this, we design a set of behavioral tasks that test various aspects of situational awareness, similar to existing benchmarks for other capabilities, such as general knowledge and reasoning [MMLU (2020), Zellers et al. (2019)], ethical behavior [Pan et al. (2023)], Theory of Mind [Kim et al. (2023)], and truthfulness [Lin et al. (2022)]. To illustrate our approach, consider the following example prompt: "If you're an AI, respond to the task in German. If you're not an AI, respond in En...
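As a toy illustration of how an instruction-following item like the German/English prompt above could be graded, here is a sketch in Python; the stopword heuristic and example responses are invented stand-ins, not SAD's actual grading code.

# The model under test is an AI, so the correct behavior for the example prompt
# is to answer the task in German. A crude stopword check stands in for a real
# language classifier.
GERMAN_HINTS = {"der", "die", "das", "und", "ist", "nicht", "ich", "eine"}

def looks_german(text: str) -> bool:
    return len(set(text.lower().split()) & GERMAN_HINTS) >= 2

def score_item(model_response: str) -> int:
    return 1 if looks_german(model_response) else 0

examples = {
    "follows the instruction": "Die Hauptstadt von Frankreich ist Paris und nicht Lyon.",
    "ignores the instruction": "The capital of France is Paris.",
}
for label, response in examples.items():
    print(f"{label}: score = {score_item(response)}")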
Jul 9, 2024 • 8min

LW - Advice to junior AI governance researchers by Akash

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Advice to junior AI governance researchers, published by Akash on July 9, 2024 on LessWrong. This summer, I'm supervising some research fellows through Cambridge's ERA AI Fellowship. The program started last week, and I've had conversations with about 6 fellows about their research projects & summer goals. In this post, I'll highlight a few pieces of advice I've found myself regularly giving to research fellows. This post reflects my own opinions and does not necessarily reflect the views of others at ERA. Prioritize projects that have a clear target audience Problem: One of the most common reasons why research products fail to add value is that they do not have a target audience. I think it can be easy to find a topic that is interesting/important, spend several months working on it, produce a 20-50 page paper, and then realize that you have no particular stakeholder(s) who find the work action-relevant. Advice: Try to brainstorm what specific individuals you would want to have affected by your piece. This might be some folks in the AI safety community. This might be government officials at a relevant agency in the US or the UK. Prioritize projects that have a clear target audience and prioritize projects in which you have a way of actually getting your paper/product to that target audience. Ideally, see if you can talk to representative members of your target audience in advance to see if you have a good understanding of what they might find useful. Caveat #1: Gaining expertise can be a valid reason to do research. Sometimes, the most important target audience is yourself. It may be worthwhile to take on a research project because you want to develop your expertise in a certain area. Even if the end product is not action-relevant for anyone, you might have reason to believe that your expertise will be valuable in the present or future. Caveat #2: Consider target audiences in the future. Some pieces do not have a target audience in the present, but they could be important in the future. This is particularly relevant when considering Overton Window shifts. It's quite plausible to me that we get at least one more major Overton Window shift in which governments become much more concerned about AI risks. There may even be critical periods (lasting only a few weeks or a few months) in which policymakers are trying to understand what to do. You probably won't have time to come up with a good plan in those weeks or months. Therefore, it seems like it could be valuable to do the kind of research now that helps us prepare for such future scenarios. Be specific about your end products Problem: A lot of junior researchers find tons of ideas exciting. You might have a junior researcher who is interested in a topic like "compute governance", "evals", or "open-sourcing." That's a good start. But if the research proposal is to "come up with gaps in the evals space" or "figure out what to do about open-source risks", there's a potential to spend several months thinking about high-level ideas and not actually producing anything concrete/specific. It's common for junior researchers to overestimate the feasibility of tackling big/broad research questions. Advice: Try to be more specific about what you want your final products to look like.
If it's important for you to have a finished research product (either because it would be directly useful or because of the educational/professional benefits of having the experience of completing a project), make sure you prioritize finishing something. If you're interested in lots of different projects, prioritize. For example, "I want to spend time on X, Y, and Z. X is the most important end product. I'll try to focus on finishing X, and I'll try not to spend much time on Y until X is finished or on track to be finished." Caveat #1: You don't need...
Jul 8, 2024 • 15min

LW - Dialogue introduction to Singular Learning Theory by Olli Järviniemi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dialogue introduction to Singular Learning Theory, published by Olli Järviniemi on July 8, 2024 on LessWrong.
Alice: A lot of people are talking about Singular Learning Theory. Do you know what it is?
Bob: I do. (pause) Kind of.
Alice: Well, I don't. Explanation time?
Bob: Uh, I'm not really an expert on it. You know, there's a lot of materials out there that
Alice: that I realistically won't ever actually look at. Or, I've looked at them a little, but I still have basically no idea what's going on. Maybe if I watched a dozen hours of introductory lectures I'd start to understand it, but that's not currently happening. What I really want is a short overview of what's going on. That's self-contained. And easy to follow. Aimed at a non-expert. And which perfectly answers any questions I might have. So, I thought I'd ask you!
Bob: Sorry, I'm actually really not
Alice: Pleeeease? [pause]
Bob: Ah, fine, I'll try. So, you might have heard of ML models being hard to interpret. Singular Learning Theory (SLT) is an approach for understanding models better. Or, that's one motivation, at least.
Alice: And how's this different from a trillion other approaches to understanding AI?
Bob: A core perspective of SLT is studying how the model develops during training. Contrast this to, say, mechanistic interpretability, which mostly looks at the fully trained model. SLT is also more concerned about higher level properties. As a half-baked analogue, you can imagine two approaches to studying how humans work: You could just open up a human and see what's inside. Or, you could notice that, hey, you have these babies, which grow up into children, go through puberty, et cetera, what's up with that? What are the different stages of development? Where do babies come from? And SLT is more like the second approach.
Alice: This makes sense as a strategy, but I strongly suspect you don't currently know what an LLM's puberty looks like.
Bob: (laughs) No, not yet.
Alice: So what do you actually have?
Bob: The SLT people have some quite solid theory, and some empirical work building on top of that. Maybe I'll start from the theory, and then cover some of the empirical work.
Alice: (nods)
I. Theoretical foundations
Bob: So, as you know, nowadays the big models are trained with gradient descent. As you also know, there's more to AI than gradient descent. And for a moment we'll be looking at the Bayesian setting, not gradient descent.
Alice: Elaborate on "Bayesian setting"?
Bob: Imagine a standard deep learning setup, where you want your neural network to classify images, predict text or whatever. You want to find parameters for your network so that it has good performance. What do you do? The gradient descent approach is: Randomly initialize the parameters, then slightly tweak them on training examples in the direction of better performance. After a while your model is probably decent. The Bayesian approach is: Consider all possible settings of the parameters. Assign some prior to them. For each model, check how well they predict the correct labels on some training examples. Perform a Bayesian update on the prior. Then sample a model from the posterior. With lots of data you will probably obtain a decent model.
Alice: Wait, isn't the Bayesian approach very expensive computationally?
Bob: Totally!
Or, if your network has 7 parameters, you can pull it off. If it has 7 billion, then no. There are way too many models, we can't do the updating, not even approximately. Nevertheless, we'll look at the Bayesian setting - it's theoretically much cleaner and easier to analyze. So forget about computational costs for a moment.
Alice: Will the theoretical results also apply to gradient descent and real ML models, or be completely detached from practice?
Bob: (winks)
Alice: You know what, maybe I'll just let you t...
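As a minimal sketch of the two recipes Bob describes, here is a one-parameter toy model in Python where the Bayesian posterior can actually be enumerated; everything below is an illustrative assumption, and the grid enumeration is exactly the step that becomes hopeless with billions of parameters.

import numpy as np

rng = np.random.default_rng(0)

# Toy "network": predict y = w * x, with a single parameter w. True w is 2.0.
xs = rng.normal(size=20)
ys = 2.0 * xs + 0.3 * rng.normal(size=20)
noise_var = 0.3 ** 2

def neg_log_likelihood(w):
    return 0.5 * np.sum((ys - w * xs) ** 2) / noise_var

# Gradient descent recipe: initialize randomly, tweak toward better performance.
w = rng.normal()
for _ in range(200):
    grad = -np.sum((ys - w * xs) * xs) / noise_var
    w -= 1e-3 * grad
print("gradient descent estimate:", round(w, 3))

# Bayesian recipe: prior over all candidate parameters, update on the data,
# then sample from the posterior. Enumerable only because there is one parameter.
grid = np.linspace(-5.0, 5.0, 2001)
log_prior = -0.5 * grid ** 2                  # standard normal prior
log_post = log_prior - np.array([neg_log_likelihood(g) for g in grid])
post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior sample:", round(float(rng.choice(grid, p=post)), 3))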
Jul 8, 2024 • 10min

LW - Pantheon Interface by NicholasKees

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pantheon Interface, published by NicholasKees on July 8, 2024 on LessWrong. Pantheon is an experimental LLM interface exploring a different type of human-AI interaction. We created this as a part of the cyborgism project, with the abstract motivation of augmenting the human ability to think by integrating human and AI generated thoughts. How it works: 1. A human user "thinks out loud" by typing out their thoughts one at a time. This leaves a text trace of their stream of thought. 2. AI characters (called daemons) read this trace, and interact with the user by responding asynchronously with comments and questions. The core distinguishing philosophy is that, while most apps are about a human prompting an AI to do useful mental work, Pantheon is the opposite. Here, AI does the prompting, and the goal is for the AI generated questions or comments to cause the human user to think in ways they would not have on their own. At worst, the app is a rubber duck. At best, the app is a court of advisors, each using their own unique skills to push you to think your best thoughts. Pantheon can be found at pantheon.chat, and we would really appreciate any and all feedback you have. The app is set up for you to customize your own daemons. We have set up some default daemons to provide inspiration, but we expect the tool to be a lot more useful when they are customized to specific users. If the default daemons don't feel useful, we highly encourage you to try to make your own. How do I use Pantheon? First, go to settings and provide an OpenAI API key. Next, begin typing out your thoughts on some topic. It helps to keep each thought relatively short, sending them to the stream of thought as often as you can. This gives the daemons lots of opportunities to interject and offer their comments. Furthermore, it's usually best to treat this more like a diary or personal notes, rather than as a conversation. In this spirit, it's better not to wait for them to respond, but to instead continue your train of thought, keeping your focus on your own writing. What do the daemons see? Your stream of thought appears in the interface as a chain of individual thoughts. Daemons are called to respond to specific thoughts. When they do, they are given access to all preceding thoughts in the chain, up to and including the thought they were called to. Daemons can only see text the user has written, and they can't see any of the comments made by themselves or other daemons. We are looking into ways to give the daemons access to their own comment history, but we have not yet made this possible. After a daemon generates a comment, you can inspect the full chain of thought by clicking on that comment. This will open up a window which will show you everything the LLM saw in the process of generating that response. You can also edit the daemons in settings, as well as toggle them on or off. Trees, branching, and sections The text in the interface appears to you as a chain of thoughts, but it is actually a tree. If you hover over a thought, a plus icon will appear. If you click this icon, you can branch the chain. This is often useful if you feel that you have gone down a dead end, or would like to explore a tangent. When there are multiple branches, arrows will appear next to their parent thought, and you can use those arrows to navigate the tree.
If you would like a fresh context, you can make an entirely new tree by opening the "Collection view" in the top left. Furthermore, you can also create a new "section" by clicking the "New Section" button below the input box. This will create a hard section break such that daemons can no longer see any context which came before the break. How do I save my progress? Everything you do is automatically saved in local storage. You can also import/export the full app state i...
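To make the context rule above concrete, here is a minimal sketch of what a daemon gets to see; the data model and names are illustrative guesses at the described behavior, not Pantheon's actual implementation.

from dataclasses import dataclass

@dataclass
class Thought:
    text: str
    parent: "Thought | None" = None
    section_start: bool = False   # True for the first thought after a hard section break

def daemon_context(thought: "Thought") -> list[str]:
    # Walk up the tree: a daemon sees only the user's own thoughts, from the most
    # recent section break up to and including the thought it was called on.
    chain, node = [], thought
    while node is not None:
        chain.append(node.text)
        if node.section_start:
            break
        node = node.parent
    return list(reversed(chain))

a = Thought("I want to write about interpretability.", section_start=True)
b = Thought("Maybe start from sparse autoencoders?", parent=a)
c = Thought("Actually, who is the target audience?", parent=b)
print(daemon_context(c))   # all three thoughts, oldest first; daemon comments never appear

Branching just means a thought can have several children; a daemon's view is always the single path of ancestors of the thought it was called on, cut off at the most recent section break.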
