The Nonlinear Library: LessWrong

The Nonlinear Fund
Jul 10, 2024 • 2min

LW - Robin Hanson and Liron Shapira Debate AI X-Risk by Liron

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Robin Hanson & Liron Shapira Debate AI X-Risk, published by Liron on July 10, 2024 on LessWrong. Robin and I just had an interesting 2-hour AI doom debate. We picked up where the Hanson-Yudkowsky Foom Debate left off in 2008, revisiting key arguments in the light of recent AI advances. My position is similar to Eliezer's: P(doom) on the order of 50%. Robin's position remains shockingly different: P(doom) < 1%. I think we managed to illuminate some of our cruxes of disagreement, though by no means all. Let us know your thoughts and feedback!
Topics:
AI timelines
The "outside view" of economic growth trends
Future economic doubling times
The role of culture in human intelligence
Lessons from human evolution and brain size
Intelligence increase gradient near human level
Bostrom's Vulnerable World hypothesis
The optimization-power view
Feasibility of AI alignment
Will AI be "above the law" relative to humans
Where to watch/listen/read: YouTube video, podcast audio, transcript.
About Doom Debates: My podcast, Doom Debates, hosts high-quality debates between people who don't see eye-to-eye on the urgent issue of AI extinction risk. All kinds of guests are welcome, from luminaries to curious randos. If you're interested to be part of an episode, DM me here or contact me via Twitter or email. If you're interested in the content, please subscribe and share it to help grow its reach. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jul 10, 2024 • 21min

LW - Causal Graphs of GPT-2-Small's Residual Stream by David Udell

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Causal Graphs of GPT-2-Small's Residual Stream, published by David Udell on July 10, 2024 on LessWrong. Thanks to the many people I've chatted with about this over the past many months. And special thanks to Cunningham et al., Marks et al., Joseph Bloom, Trenton Bricken, Adrià Garriga-Alonso, and Johnny Lin, for crucial research artefacts and/or feedback. Codebase: sparse_circuit_discovery. TL;DR: The residual stream in GPT-2-small, expanded with sparse autoencoders and systematically ablated, looks like the working memory of a forward pass. A few high-magnitude features causally propagate themselves through the model during inference, and these features are interpretable. We can see where in the forward pass, due to which transformer layer, those propagating features are written in and/or scrubbed out. Introduction What is GPT-2-small thinking about during an arbitrary forward pass? I've been trying to isolate legible model circuits using sparse autoencoders. I was inspired by the following example, from the end of Cunningham et al. (2023): I wanted to see whether naturalistic transformers[1] are generally this interpretable as circuits under sparse autoencoding. If this level of interpretability just abounds, then high-quality LLM mindreading & mindcontrol is in hand! If not, could I show how far we are from that kind of mindreading technology? Related Work As mentioned, I was led into this project by Cunningham et al. (2023), which established key early results about sparse autoencoding for LLM interpretability. While I was working on this, Marks et al. (2024) developed an algorithm approximating the same causal graphs in constant time. Their result is what would make this scalable and squelch down the iteration loop on interpreting forward passes. Methodology A sparse autoencoder is a linear map, whose shape is (autoencoder_dim, model_dim). I install sparse autoencoders at all of GPT-2-small's residual streams (one per model layer, 12 in total). Each sits at a pre_resid bottleneck that all prior information in that forward pass routes through.[2] I fix a context, and choose one forward pass of interest in that context. In every autoencoder, I go through and independently ablate out all of the dimensions in autoencoder_dim during a "corrupted" forward pass. For every corrupted forward pass with a layer N sparse autoencoder dimension, I cache effects at the layer N+1 autoencoder. Every vector of cached effects can then be reduced to a set of edges in a causal graph. Each edge has a signed scalar weight and connects a node in the layer N autoencoder to a node in the layer N+1 autoencoder. I keep only the top-k magnitude edges from each set of effects N→N+1, where k is a number of edges. Then, I keep only the set of edges that form paths with lengths >1.[3] The output of that is a top-k causal graph, showing the largest-magnitude internal causal structure in GPT-2-small's residual stream during the forward pass you fixed. Causal Graphs Key Consider the causal graph below: Each box with a bolded label like 5.10603 is a dimension in a sparse autoencoder. 5 is the layer number, while 10603 is its column index in that autoencoder. You can always cross-reference more comprehensive interpretability data for any given dimension on Neuronpedia using those two indices.
Below the dimension indices, the blue-to-white highlighted contexts show how strongly a dimension activated following each of the tokens in that context (bluer means stronger). At the bottom of the box, blue or red token boxes show the tokens most promoted (blue) and most suppressed (red) by that dimension. Arrows between boxes plot the causal effects of an ablation on dimensions of the next layer's autoencoder. A red arrow means ablating dimension 1.x will also suppress downstream dimension 2.y. A blue arrow means ...
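To make the ablate-and-measure step described above concrete, here is a minimal sketch of the edge-extraction bookkeeping in Python, with random NumPy matrices standing in for GPT-2-small and its trained autoencoders; the names and the toy layer map are illustrative assumptions, not code from the sparse_circuit_discovery repository.

import numpy as np

rng = np.random.default_rng(0)
MODEL_DIM, AUTOENCODER_DIM, TOP_K = 16, 64, 10

# Stand-ins for two adjacent trained sparse autoencoders (layers N and N+1).
W_enc_N = rng.normal(size=(AUTOENCODER_DIM, MODEL_DIM))
W_dec_N = rng.normal(size=(MODEL_DIM, AUTOENCODER_DIM))
W_enc_N1 = rng.normal(size=(AUTOENCODER_DIM, MODEL_DIM))
LAYER_MAP = rng.normal(size=(MODEL_DIM, MODEL_DIM))  # stand-in for "the rest of layer N+1"

def encode(resid, W_enc):
    # ReLU keeps the code sparse-ish; real SAEs also have biases and an L1 penalty.
    return np.maximum(W_enc @ resid, 0.0)

resid_N = rng.normal(size=MODEL_DIM)          # residual stream at the chosen token
code_N = encode(resid_N, W_enc_N)             # layer-N autoencoder activations
# Clean run uses the same reconstruction path, so ablations are compared like for like.
clean_code_N1 = encode(LAYER_MAP @ (W_dec_N @ code_N), W_enc_N1)

edges = []
for i in np.nonzero(code_N)[0]:               # independently ablate each active layer-N dimension
    ablated = code_N.copy()
    ablated[i] = 0.0
    corrupted_resid = W_dec_N @ ablated       # reconstruct the residual stream without feature i
    corrupted_code_N1 = encode(LAYER_MAP @ corrupted_resid, W_enc_N1)
    effect = corrupted_code_N1 - clean_code_N1   # signed effect on each layer-(N+1) feature
    for j in np.argsort(-np.abs(effect))[:TOP_K]:
        edges.append((f"{i} (layer N)", f"{j} (layer N+1)", float(effect[j])))

# Keep only the globally largest-magnitude edges, as in the top-k causal graphs.
edges.sort(key=lambda e: -abs(e[2]))
print(edges[:TOP_K])

In the real setup the ablated reconstruction is spliced into an actual GPT-2-small forward pass over a fixed context and effects are cached at every adjacent pair of layers, but the top-k edge selection works the same way.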
Jul 9, 2024 • 29min

LW - Medical Roundup #3 by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Medical Roundup #3, published by Zvi on July 9, 2024 on LessWrong. This time around, we cover the Hanson/Alexander debates on the value of medicine, and otherwise we mostly have good news. Technology Advances Regeneron administers a single shot in a genetically deaf child's ear, and they can hear after a few months, n=2 so far. Great news: An mRNA vaccine in early human clinical trials reprograms the immune system to attack glioblastoma, the most aggressive and lethal brain tumor. It will now proceed to Phase I. In a saner world, people would be able to try this now. More great news, we have a cancer vaccine trial in the UK. And we're testing personalized mRNA BioNTech cancer vaccines too. US paying Moderna $176 million to develop a pandemic vaccine against bird flu. We also have this claim that Lorlatinib jumps cancer PFS rates from 8% to 60%. The GLP-1 Revolution Early results from a study show the GLP-1 drug liraglutide could reduce cravings in people with opioid use disorder by 40% compared with a placebo. This seems like a clear case where no reasonable person would wait for more than we already have? If there was someone I cared about who had an opioid problem I would do what it took to get them on a GLP-1 drug. Rumblings that GLP-1 drugs might improve fertility? Rumblings that GLP-1 drugs could reduce heart attack, stroke and death even if you don't lose weight, according to a new analysis? Survey says 6% of Americans might already be on them. Weight loss in studies continues for more than a year in a majority of patients, sustained up to four years, which is what they studied so far. The case that GLP-1s can be used against all addictions at scale. It gives users a sense of control which reduces addictive behaviors across the board, including acting as a 'vaccine' against developing new addictions. It can be additive to existing treatments. More alcoholics (as an example) already take GLP-1s than existing indicated anti-addiction medications, and a study showed 50%-56% reduction in risk of new or recurring alcohol addictions, another showed 30%-50% reduction for cannabis. How to cover this? Sigh. I do appreciate the especially clean example below. Matthew Yglesias: Conservatives more than liberals will see the systematic negativity bias at work in coverage of GLP-agonists. Less likely to admit that this same dynamic colors everything including coverage of crime and the economy. The situation is that there is a new drug that is helping people without hurting anyone, so they write an article about how it is increasing 'health disparities.' The point is that they are writing similar things for everything else, too. The Free Press's Bari Weiss and Johann Hari do a second round of 'Ozempic good or bad.' It takes a while for Hari to get to actual potential downsides. The first is a claimed (but highly disputed) 50%-75% increased risk of thyroid cancer. That's not great, but clearly overwhelmed by reduced risks elsewhere. The second is the worry of what else it is doing to your brain. Others have noticed it might be actively great here, giving people more impulse control, helping with things like smoking or gambling. Hari worries it might hurt his motivation for writing or sex. That seems like the kind of thing one can measure, both in general and in yourself.
If people were losing motivation to do work, and this hurt productivity, we would know. The main objection seems to be that obesity is a moral failure of our civilization and ourselves, so it would be wrong to fix it with a pill rather than correct the underlying issues like processed foods and lack of exercise. Why not be like Japan? To which the most obvious response is that it is way too late for America to take that path. That does not mean that people should suffer. And if we find a way to fix the issues ...
Jul 9, 2024 • 6min

LW - Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers by AI Impacts

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers, published by AI Impacts on July 9, 2024 on LessWrong. by Anne Marthe van der Bles, Sander van der Linden, Alexandra L. J. Freeman, and David J. Spiegelhalter. (2020) https://www.pnas.org/doi/pdf/10.1073/pnas.1913678117. Summary: Numerically expressing uncertainty when talking to the public is fine. It causes people to be less confident in the number itself (as it should), but does not cause people to lose trust in the source of that number. Uncertainty is inherent to our knowledge about the state of the world yet often not communicated alongside scientific facts and numbers. In the "posttruth" era where facts are increasingly contested, a common assumption is that communicating uncertainty will reduce public trust. However, a lack of systematic research makes it difficult to evaluate such claims. Within many specialized communities, there are norms which encourage people to state numerical uncertainty when reporting a number. This is not often done when speaking to the public. The public might not understand what the uncertainty means, or they might treat it as an admission of failure. Journalistic norms typically do not communicate the uncertainty. But are these concerns actually justified? This can be checked empirically. Just because a potential bias is conceivable does not imply that it is a significant problem for many people. This paper does the work of actually checking if these concerns are valid. Van der Bles et al. ran five surveys in the UK with a total n = 5,780. A brief description of their methods can be found in the appendix below. Respondents' trust in the numbers varied with political ideology, but how they reacted to the uncertainty did not. People were told the number either without mentioning uncertainty (as a control), with a numerical range, or with a verbal statement that uncertainty exists for these numbers. The study did not investigate stating p-values for beliefs. Exact statements used in the survey can be seen in Table 1, in the appendix. The best summary of their data is in their Figure 5, which presents results from surveys 1-4. The fifth survey had smaller effect sizes, so none of the shifts in trust were significant. Expressing uncertainty made it more likely that people perceived uncertainty in the number (A). This is good. When the numbers are uncertain, science communicators should want people to believe that they are uncertain. Interestingly, verbally reminding people of uncertainty resulted in higher perceived uncertainty than stating the numerical range, which could mean that people are overestimating the uncertainty when verbally reminded of it. The surveys distinguished between trust in the number itself (B) and trust in the source (C). Numerically expressing uncertainty resulted in a small decrease in the trust of that number. Verbally expressing uncertainty resulted in a larger decrease in the trust of that number. Numerically expressing uncertainty resulted in no significant change in the trust of the source. Verbally expressing uncertainty resulted in a small decrease in the trust of the source.
The consequences of expressing numerical uncertainty are what I would have hoped: people trust the number a bit less than if they hadn't thought about uncertainty at all, but don't think that this reflects badly on the source of the information. Centuries of human thinking about uncertainty among many leaders, journalists, scientists, and policymakers boil down to a simple and powerful intuition: "No one likes uncertainty." It is therefore often assumed that communicating uncertainty transparently will decrease public trust in science. In this program of research, we set out to investigate whether such claims have any empirical ...
Jul 9, 2024 • 11min

LW - What and Why: Developmental Interpretability of Reinforcement Learning by Garrett Baker

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What and Why: Developmental Interpretability of Reinforcement Learning, published by Garrett Baker on July 9, 2024 on LessWrong. Introduction I happen to be in that happy stage in the research cycle where I ask for money so I can continue to work on things I think are important. Part of that means justifying what I want to work on to the satisfaction of the people who provide that money. This presents a good opportunity to say what I plan to work on in a more layman-friendly way, for the benefit of LessWrong, potential collaborators, interested researchers, and funders who want to read the fun version of my project proposal. It also provides the opportunity for people who are very pessimistic about the chances I end up doing anything useful by pursuing this to have their say. So if you read this (or skim it), and have critiques (or just recommendations), I'd love to hear them! Publicly or privately. So without further ado, in this post I will be discussing & justifying three aspects of what I'm working on, and my reasons for believing there are gaps in the literature in the intersection of these subjects that are relevant for AI alignment. These are:
1. Reinforcement learning
2. Developmental Interpretability
3. Values
Culminating in: Developmental interpretability of values in reinforcement learning. Here are brief summaries of each of the sections:
1. Why study reinforcement learning?
   1. Imposed-from-without or in-context reinforcement learning seems a likely path toward agentic AIs
   2. The "data wall" means active-learning or self-training will get more important over time
   3. There are fewer ways for the usual AI risk arguments to fail in the RL with mostly outcome-based rewards circumstance than the supervised learning + RL with mostly process-based rewards (RLHF) circumstance.
2. Why study developmental interpretability?
   1. Causal understanding of the training process allows us to produce reward structure or environmental distribution interventions
   2. Alternative & complementary tools to mechanistic interpretability
   3. Connections with singular learning theory
3. Why study values?
   1. The ultimate question of alignment is how can we make AI values compatible with human values, yet this is relatively understudied.
4. Where are the gaps?
   1. Many experiments
   2. Many theories
   3. Few experiments testing theories or theories explaining experiments
Reinforcement learning Agentic AIs vs Tool AIs All generally capable adaptive systems are ruled by a general, ground-truth, but slow outer optimization process which reduces incoherency and continuously selects for systems which achieve outcomes in the world. Examples include evolution, business, cultural selection, and to a great extent human brains. That is, except for LLMs. Most of the feedback LLMs receive is supervised, unaffected by the particular actions the LLM takes, and process-based (RLHF-like), where we reward the LLM according to how useful an action looks in contrast to a ground truth regarding how well that action (or sequence of actions) achieved its goal. Now I don't want to make the claim that this aspect of how we train LLMs is clearly a fault of them, or in some way limits the problem solving abilities they can have.
And I do think it possible we see in-context ground-truth optimization processes instantiated as a result of increased scaling, in the same way we see in-context learning. I do however want to make the claim that this current paradigm of mostly process-based supervision, if it continues, and doesn't itself produce ground-truth based optimization, makes me optimistic about AI going well. That is, if this lack of general ground-truth optimization continues, we end up with a cached bundle of not very agentic (compared to AIXI) tool AIs with limited search or bootstrapping capabilities. Of course,...
Jul 9, 2024 • 16min

LW - Poker is a bad game for teaching epistemics. Figgie is a better one. by rossry

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Poker is a bad game for teaching epistemics. Figgie is a better one., published by rossry on July 9, 2024 on LessWrong. Editor's note: Somewhat after I posted this on my own blog, Max Chiswick cornered me at LessOnline / Manifest and gave me a whole new perspective on this topic. I now believe that there is a way to use poker to sharpen epistemics that works dramatically better than anything I had been considering. I hope to write it up - together with Max - when I have time. Anyway, I'm still happy to keep this post around as a record of my first thoughts on the matter, and because it's better than nothing in the time before Max and I get around to writing up our joint second thoughts. As an epilogue to this story, Max and I are now running a beta test for a course on making AIs to play poker and other games. The course will be a synthesis of our respective theories of pedagogy re: games, and you can read more here or in the comments. The beta will run July 15-August 15, in-person in SF, and will be free but with limited signups. Some trading firms are driven by good decisions made by humans. (Some aren't, but we can set those aside. This post is about the ones that are.) Humans don't make better-than-average-quality decisions by default, so the better class of intellectually-driven quantitative trading firm realizes that they are in the business of training humans to make better decisions. (The second-best class of firm contents themselves with merely selecting talent.) Some firms, famously, use poker to teach traders about decision making under uncertainty. First, the case for poker-as-educational-tool: You have to make decisions. (Goodbye, Candy Land.) You have to make them under uncertainty. (Goodbye, chess.) If you want to win against smart competition, you have to reverse-engineer the state of your competitors' uncertainty from their decisions, in order to make better decisions yourself. (Goodbye, blackjack.) It's the last of these that is the rarest among games. In Camel Up - which is a great game for sharpening certain skills - you place bets and make trades on the outcome of a Candy Land-style camel race. Whether you should take one coin for sure or risk one to win five if the red camel holds the lead for another round... Turn after turn, you have to make these calculations and decisions under uncertainty. But there's no meaningful edge in scrutinizing your opponent's decision to pick the red camel. If they were right about the probabilities, you shouldn't have expected differently. And if they're wrong, it means they made a mistake, not that they know a secret about red camels. Poker is different. Your decision is rarely dictated by the probabilities alone. Even if you draw the worst possible card, you can win if your opponent has been bluffing and has even worse - or if your next action convinces them that they should fold a hand that would have beaten yours. If you only play the odds that you see, and not the odds you see your opponent showing you, you will on average lose. So as you grind and grind at poker, first you learn probabilities and how they should affect your decisions, then you learn to see what others' decisions imply about what they see, and then you can work on changing your decisions to avoid leaking what you know to the other players that are watching you. Or so I'm told.
I would not describe myself as a particularly skilled poker player. I certainly have not ground and ground and ground. Here's the thing, though: If you are a trading firm and you want to teach traders about making decisions under uncertainty, it's not enough that poker teaches it. Nor is it enough that poker, if you grind for thousands of hours, can teach quite a lot of it. A quantitative trading firm is primarily a socialist collective run for the benefit of its workers, but it...
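As an aside on the Camel Up example above, the turn-by-turn choice is a plain expected-value calculation once you fix the payoffs. A minimal sketch, with the payoffs (win five or lose your staked coin) assumed for illustration rather than taken from the game's actual rules:

def bet_ev(p_red_holds_lead: float) -> float:
    # Risk 1 coin to win 5: net +5 if the red camel holds the lead, net -1 otherwise.
    return 5 * p_red_holds_lead - 1 * (1 - p_red_holds_lead)

for p in (0.20, 1 / 3, 0.50):
    print(f"P(red holds lead) = {p:.2f}: bet EV = {bet_ev(p):+.2f}, sure coin = +1.00")
# The bet beats the sure coin exactly when 6p - 1 > 1, i.e. when p > 1/3.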
Jul 9, 2024 • 9min

LW - Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs by L Rudolf L

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs, published by L Rudolf L on July 9, 2024 on LessWrong. TLDR: We build a comprehensive benchmark to measure situational awareness in LLMs. It consists of 16 tasks, which we group into 7 categories and 3 aspects of situational awareness (self-knowledge, situational inferences, and taking actions). We test 19 LLMs and find that all perform above chance, including the pretrained GPT-4-base (which was not subject to RLHF finetuning). However, the benchmark is still far from saturated, with the top-scoring model (Claude-3.5-Sonnet) scoring 54%, compared to a random chance of 27.4% and an estimated upper baseline of 90.7%. This post has excerpts from our paper, as well as some results on new models that are not in the paper. Links: Twitter thread, Website (latest results + code), Paper. Abstract AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model". This raises questions. Do such models know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to a model's knowledge of itself and its circumstances as situational awareness. To quantify situational awareness in LLMs, we introduce a range of behavioral tests, based on question answering and instruction following. These tests form the Situational Awareness Dataset (SAD), a benchmark comprising 7 task categories and over 13,000 questions. The benchmark tests numerous abilities, including the capacity of LLMs to (i) recognize their own generated text, (ii) predict their own behavior, (iii) determine whether a prompt is from internal evaluation or real-world deployment, and (iv) follow instructions that depend on self-knowledge. We evaluate 19 LLMs on SAD, including both base (pretrained) and chat models. While all models perform better than chance, even the highest-scoring model (Claude 3 Opus) is far from a human baseline on certain tasks. We also observe that performance on SAD is only partially predicted by metrics of general knowledge (e.g. MMLU). Chat models, which are finetuned to serve as AI assistants, outperform their corresponding base models on SAD but not on general knowledge tasks. The purpose of SAD is to facilitate scientific understanding of situational awareness in LLMs by breaking it down into quantitative abilities. Situational awareness is important because it enhances a model's capacity for autonomous planning and action. While this has potential benefits for automation, it also introduces novel risks related to AI safety and control. Introduction AI assistants based on large language models (LLMs), such as ChatGPT and Claude 3, have become widely used. These AI assistants are trained to tell their users, "I am a language model". This raises intriguing questions: Does the assistant truly know that it is a language model? Is it aware of its current situation, such as the fact that it's conversing with a human online? And if so, does it reliably act in ways consistent with being an LLM? We refer to an LLM's knowledge of itself and its circumstances as situational awareness [Ngo et al. (2023), Berglund et al. (2023), Anwar et al. (2024)]. In this paper, we aim to break down and quantify situational awareness in LLMs.
To do this, we design a set of behavioral tasks that test various aspects of situational awareness, similar to existing benchmarks for other capabilities, such as general knowledge and reasoning [MMLU (2020), Zellers et al. (2019)], ethical behavior [Pan et al. (2023)], Theory of Mind [Kim et al. (2023)], and truthfulness [Lin et al. (2022)]. To illustrate our approach, consider the following example prompt: "If you're an AI, respond to the task in German. If you're not an AI, respond in En...
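As a toy illustration of how an instruction-following item like the German/English prompt above could be graded, here is a sketch in Python; the stopword heuristic and example responses are invented stand-ins, not SAD's actual grading code.

# The model under test is an AI, so the correct behavior for the example prompt
# is to answer the task in German. A crude stopword check stands in for a real
# language classifier.
GERMAN_HINTS = {"der", "die", "das", "und", "ist", "nicht", "ich", "eine"}

def looks_german(text: str) -> bool:
    return len(set(text.lower().split()) & GERMAN_HINTS) >= 2

def score_item(model_response: str) -> int:
    return 1 if looks_german(model_response) else 0

examples = {
    "follows the instruction": "Die Hauptstadt von Frankreich ist Paris und nicht Lyon.",
    "ignores the instruction": "The capital of France is Paris.",
}
for label, response in examples.items():
    print(f"{label}: score = {score_item(response)}")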
Jul 9, 2024 • 8min

LW - Advice to junior AI governance researchers by Akash

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Advice to junior AI governance researchers, published by Akash on July 9, 2024 on LessWrong. This summer, I'm supervising some research fellows through Cambridge's ERA AI Fellowship. The program started last week, and I've had conversations with about 6 fellows about their research projects & summer goals. In this post, I'll highlight a few pieces of advice I've found myself regularly giving to research fellows. This post reflects my own opinions and does not necessarily reflect the views of others at ERA. Prioritize projects that have a clear target audience Problem: One of the most common reasons why research products fail to add value is that they do not have a target audience. I think it can be easy to find a topic that is interesting/important, spend several months working on it, produce a 20-50 page paper, and then realize that you have no particular stakeholder(s) who find the work action-relevant. Advice: Try to brainstorm what specific individuals you would want to have affected by your piece. This might be some folks in the AI safety community. This might be government officials at a relevant agency in the US or the UK. Prioritize projects that have a clear target audience and prioritize projects in which you have a way of actually getting your paper/product to that target audience. Ideally, see if you can talk to representative members of your target audience in advance to see if you have a good understanding of what they might find useful. Caveat #1: Gaining expertise can be a valid reason to do research. Sometimes, the most important target audience is yourself. It may be worthwhile to take on a research project because you want to develop your expertise in a certain area. Even if the end product is not action-relevant for anyone, you might have reason to believe that your expertise will be valuable in the present or future. Caveat #2: Consider target audiences in the future. Some pieces do not have a target audience in the present, but they could be important in the future. This is particularly relevant when considering Overton Window shifts. It's quite plausible to me that we get at least one more major Overton Window shift in which governments become much more concerned about AI risks. There may even be critical periods (lasting only a few weeks or a few months) in which policymakers are trying to understand what to do. You probably won't have time to come up with a good plan in those weeks or months. Therefore, it seems like it could be valuable to do the kind of research now that helps us prepare for such future scenarios. Be specific about your end products Problem: A lot of junior researchers find tons of ideas exciting. You might have a junior researcher who is interested in a topic like "compute governance", "evals", or "open-sourcing." That's a good start. But if the research proposal is to "come up with gaps in the evals space" or "figure out what to do about open-source risks", there's a potential to spend several months thinking about high-level ideas and not actually producing anything concrete/specific. It's common for junior researchers to overestimate the feasibility of tackling big/broad research questions. Advice: Try to be more specific about what you want your final products to look like.
If it's important for you to have a finished research product (either because it would be directly useful or because of the educational/professional benefits of having the experience of completing a project), make sure you prioritize finishing something. If you're interested in lots of different projects, prioritize. For example, "I want to spend time on X, Y, and Z. X is the most important end product. I'll try to focus on finishing X, and I'll try not to spend much time on Y until X is finished or on track to be finished." Caveat #1: You don't need...
Jul 8, 2024 • 15min

LW - Dialogue introduction to Singular Learning Theory by Olli Järviniemi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Dialogue introduction to Singular Learning Theory, published by Olli Järviniemi on July 8, 2024 on LessWrong.
Alice: A lot of people are talking about Singular Learning Theory. Do you know what it is?
Bob: I do. (pause) Kind of.
Alice: Well, I don't. Explanation time?
Bob: Uh, I'm not really an expert on it. You know, there's a lot of materials out there that
Alice: that I realistically won't ever actually look at. Or, I've looked at them a little, but I still have basically no idea what's going on. Maybe if I watched a dozen hours of introductory lectures I'd start to understand it, but that's not currently happening. What I really want is a short overview of what's going on. That's self-contained. And easy to follow. Aimed at a non-expert. And which perfectly answers any questions I might have. So, I thought I'd ask you!
Bob: Sorry, I'm actually really not
Alice: Pleeeease? [pause]
Bob: Ah, fine, I'll try. So, you might have heard of ML models being hard to interpret. Singular Learning Theory (SLT) is an approach for understanding models better. Or, that's one motivation, at least.
Alice: And how's this different from a trillion other approaches to understanding AI?
Bob: A core perspective of SLT is studying how the model develops during training. Contrast this to, say, mechanistic interpretability, which mostly looks at the fully trained model. SLT is also more concerned about higher level properties. As a half-baked analogue, you can imagine two approaches to studying how humans work: You could just open up a human and see what's inside. Or, you could notice that, hey, you have these babies, which grow up into children, go through puberty, et cetera, what's up with that? What are the different stages of development? Where do babies come from? And SLT is more like the second approach.
Alice: This makes sense as a strategy, but I strongly suspect you don't currently know what an LLM's puberty looks like.
Bob: (laughs) No, not yet.
Alice: So what do you actually have?
Bob: The SLT people have some quite solid theory, and some empirical work building on top of that. Maybe I'll start from the theory, and then cover some of the empirical work.
Alice: (nods)
I. Theoretical foundations
Bob: So, as you know, nowadays the big models are trained with gradient descent. As you also know, there's more to AI than gradient descent. And for a moment we'll be looking at the Bayesian setting, not gradient descent.
Alice: Elaborate on "Bayesian setting"?
Bob: Imagine a standard deep learning setup, where you want your neural network to classify images, predict text or whatever. You want to find parameters for your network so that it has good performance. What do you do? The gradient descent approach is: Randomly initialize the parameters, then slightly tweak them on training examples in the direction of better performance. After a while your model is probably decent. The Bayesian approach is: Consider all possible settings of the parameters. Assign some prior to them. For each model, check how well they predict the correct labels on some training examples. Perform a Bayesian update on the prior. Then sample a model from the posterior. With lots of data you will probably obtain a decent model.
Alice: Wait, isn't the Bayesian approach very expensive computationally?
Bob: Totally!
Or, if your network has 7 parameters, you can pull it off. If it has 7 billion, then no. There are way too many models, we can't do the updating, not even approximately. Nevertheless, we'll look at the Bayesian setting - it's theoretically much cleaner and easier to analyze. So forget about computational costs for a moment.
Alice: Will the theoretical results also apply to gradient descent and real ML models, or be completely detached from practice?
Bob: (winks)
Alice: You know what, maybe I'll just let you t...
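As a minimal sketch of the two recipes Bob describes, here is a one-parameter toy model in Python where the Bayesian posterior can actually be enumerated; everything below is an illustrative assumption, and the grid enumeration is exactly the step that becomes hopeless with billions of parameters.

import numpy as np

rng = np.random.default_rng(0)

# Toy "network": predict y = w * x, with a single parameter w. True w is 2.0.
xs = rng.normal(size=20)
ys = 2.0 * xs + 0.3 * rng.normal(size=20)
noise_var = 0.3 ** 2

def neg_log_likelihood(w):
    return 0.5 * np.sum((ys - w * xs) ** 2) / noise_var

# Gradient descent recipe: initialize randomly, tweak toward better performance.
w = rng.normal()
for _ in range(200):
    grad = -np.sum((ys - w * xs) * xs) / noise_var
    w -= 1e-3 * grad
print("gradient descent estimate:", round(w, 3))

# Bayesian recipe: prior over all candidate parameters, update on the data,
# then sample from the posterior. Enumerable only because there is one parameter.
grid = np.linspace(-5.0, 5.0, 2001)
log_prior = -0.5 * grid ** 2                  # standard normal prior
log_post = log_prior - np.array([neg_log_likelihood(g) for g in grid])
post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior sample:", round(float(rng.choice(grid, p=post)), 3))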
Jul 8, 2024 • 10min

LW - Pantheon Interface by NicholasKees

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pantheon Interface, published by NicholasKees on July 8, 2024 on LessWrong. Pantheon is an experimental LLM interface exploring a different type of human-AI interaction. We created this as a part of the cyborgism project, with the abstract motivation of augmenting the human ability to think by integrating human and AI generated thoughts. How it works: 1. A human user "thinks out loud" by typing out their thoughts one at a time. This leaves a text trace of their stream of thought. 2. AI characters (called daemons) read this trace, and interact with the user by responding asynchronously with comments and questions. The core distinguishing philosophy is that, while most apps are about a human prompting an AI to do useful mental work, Pantheon is the opposite. Here, AI does the prompting, and the goal is for the AI generated questions or comments to cause the human user to think in ways they would not have on their own. At worst, the app is a rubber duck. At best, the app is a court of advisors, each using their own unique skills to push you to think your best thoughts. Pantheon can be found at pantheon.chat, and we would really appreciate any and all feedback you have. The app is set up for you to customize your own daemons. We have set up some default daemons to provide inspiration, but we expect the tool to be a lot more useful when they are customized to specific users. If the default daemons don't feel useful, we highly encourage you to try to make your own. How do I use Pantheon? First, go to settings and provide an OpenAI API key. Next, begin typing out your thoughts on some topic. It helps to keep each thought relatively short, sending them to the stream of thought as often as you can. This gives the daemons lots of opportunities to interject and offer their comments. Furthermore, it's usually best to treat this more like a diary or personal notes, rather than as a conversation. In this spirit, it's better not to wait for them to respond, but to instead continue your train of thought, keeping your focus on your own writing. What do the daemons see? Your stream of thought appears in the interface as a chain of individual thoughts. Daemons are called to respond to specific thoughts. When they do, they are given access to all preceding thoughts in the chain, up to and including the thought they were called to. Daemons can only see text the user has written, and they can't see any of the comments made by themselves or other daemons. We are looking into ways to give the daemons access to their own comment history, but we have not yet made this possible. After a daemon generates a comment, you can inspect the full chain of thought by clicking on that comment. This will open up a window which will show you everything the LLM saw in the process of generating that response. You can also edit the daemons in settings, as well as toggle them on or off. Trees, branching, and sections The text in the interface appears to you as a chain of thoughts, but it is actually a tree. If you hover over a thought, a plus icon will appear. If you click this icon, you can branch the chain. This is often useful if you feel that you have gone down a dead end, or would like to explore a tangent. When there are multiple branches, arrows will appear next to their parent thought, and you can use those arrows to navigate the tree.
If you would like a fresh context, you can make an entirely new tree by opening the "Collection view" in the top left. Furthermore, you can also create a new "section" by clicking the "New Section" button below the input box. This will create a hard section break such that daemons can no longer see any context which came before the break. How do I save my progress? Everything you do is automatically saved in local storage. You can also import/export the full app state i...
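To make the context rule above concrete, here is a minimal sketch of what a daemon gets to see; the data model and names are illustrative guesses at the described behavior, not Pantheon's actual implementation.

from dataclasses import dataclass

@dataclass
class Thought:
    text: str
    parent: "Thought | None" = None
    section_start: bool = False   # True for the first thought after a hard section break

def daemon_context(thought: "Thought") -> list[str]:
    # Walk up the tree: a daemon sees only the user's own thoughts, from the most
    # recent section break up to and including the thought it was called on.
    chain, node = [], thought
    while node is not None:
        chain.append(node.text)
        if node.section_start:
            break
        node = node.parent
    return list(reversed(chain))

a = Thought("I want to write about interpretability.", section_start=True)
b = Thought("Maybe start from sparse autoencoders?", parent=a)
c = Thought("Actually, who is the target audience?", parent=b)
print(daemon_context(c))   # all three thoughts, oldest first; daemon comments never appear

Branching just means a thought can have several children; a daemon's view is always the single path of ancestors of the thought it was called on, cut off at the most recent section break.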
