The Nonlinear Library

The Nonlinear Fund
Jul 14, 2024 • 26min

EA - Exploring Noise in Charity Evaluations by Malin Ploder

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Exploring Noise in Charity Evaluations, published by Malin Ploder on July 14, 2024 on The Effective Altruism Forum. Hi! My name is Malin and I wrote my master's thesis in cognitive science in collaboration with Don Efficace, a young evaluator organization building their evaluation process to find the most effective charities in France. Together, we set out to explore the concept of noise (see below) in charity evaluations. Many researchers from other evaluator organizations contributed to this endeavor by responding to my survey or participating in interviews. This post serves to summarize my research for them and anyone else who is interested in the topic - have a good read! TL;DR Noise, as defined by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein[1], refers to the unwanted variability in judgments caused by cognitive biases. In charity evaluations, this inconsistency can lead to unreliable recommendations, which can significantly affect the allocation of funds and erode donor trust. Given the complex nature of charity evaluations, noise is likely to occur, making it crucial to address in order to ensure consistent and effective decision-making. Several strategies from other fields have been found effective in reducing noise and can be adapted for charity evaluations: 1. Implement Decision Guidelines and Scales: Break down evaluations into clear criteria. Use scales with anchors and descriptors for consistent assessments. Consider comparative scales to reduce bias in subjective judgments. 2. Adopt Aggregation Strategies: Encourage multiple independent estimates from researchers for cost-effectiveness analyses to improve accuracy. Alternatively, use the option adapted to individuals, where two guesses from the same person are averaged. 3. Use the Mini-Delphi Method: Structure discussions around initial independent estimates, followed by collective deliberation and revised judgments. Future research should focus on measuring noise levels in charity evaluations and testing these strategies' effectiveness. Collaborating with other evaluator organizations can provide valuable insights and help design low-noise processes. Introduction: Noise In the context of my master's thesis, I explored the role of "noise" in charity evaluations. In the context of decision-making, the term noise was popularized by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein[1]. Their work has significantly advanced the application of cognitive sciences to real-life scenarios by demonstrating some of the tangible impacts cognitive biases can have on decision-making. Specifically, they show how cognitive biases may lead to unwanted variability in judgments, which they call noise. I conducted three studies - a review of online information, a survey, and interviews - to investigate how noise-reduction strategies from the literature could apply to charity evaluations and which recommendations can be derived for Don Efficace. In this text, I summarize my findings as they may be relevant to charity evaluators. If you want to know more, I invite you to read my thesis as well as "Noise: A Flaw in Human Judgment" by Kahneman et al.[1]. The text will be structured as follows: First, I introduce noise and why it matters in charity evaluations.
Then I will present strategies that have been found to reduce noise in other fields that involve complex judgments, like judicial sentencing, medical diagnoses, or hiring decisions. For each of the strategies, I add the results from my research, setting them into the charity evaluation context. Lastly, I will give an outlook on what future research efforts in the field may look like. Noise in Charity Evaluations You may be familiar with cognitive biases like the confirmation bias, the halo effect, desirability bias, or the anchoring effect and how they can predi...
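As a quick illustration of the aggregation strategies summarized in the TL;DR above, the following minimal Python sketch (with made-up numbers, not figures from the thesis) shows why averaging several independent cost-effectiveness estimates yields a less noisy judgment than relying on any single researcher:

```python
import random
import statistics

random.seed(0)

TRUE_COST_PER_OUTCOME = 5000  # hypothetical "true" value a noise-free evaluation would find


def single_estimate() -> float:
    """One researcher's cost-effectiveness estimate, with simulated judgment noise."""
    return random.gauss(TRUE_COST_PER_OUTCOME, 1500)


def aggregated_estimate(n_researchers: int) -> float:
    """Average of several independent estimates, as in the aggregation strategy."""
    return statistics.mean(single_estimate() for _ in range(n_researchers))


# Compare the spread (noise) of single judgments vs. aggregated judgments over many trials.
singles = [single_estimate() for _ in range(1000)]
teams_of_five = [aggregated_estimate(5) for _ in range(1000)]
print("std dev, single researcher:", round(statistics.stdev(singles)))
print("std dev, average of five:  ", round(statistics.stdev(teams_of_five)))
```

The same logic motivates the variant adapted to individuals, where two guesses from the same person are averaged; the noise reduction is smaller there because the two guesses are correlated.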
Jul 14, 2024 • 36min

AF - An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs by Jan Wehner

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs, published by Jan Wehner on July 14, 2024 on The AI Alignment Forum. Representation Engineering (aka Activation Steering/Engineering) is a new paradigm for understanding and controlling the behaviour of LLMs. Instead of changing the prompt or weights of the LLM, it does this by directly intervening on the activations of the network during a forward pass. Furthermore, it improves our ability to interpret representations within networks and to detect the formation and use of concepts during inference. This post serves as an introduction to Representation Engineering (RE). We explain the core techniques, survey the literature, contrast RE to related techniques, hypothesise why it works, argue how it's helpful for AI Safety and lay out some research frontiers. Disclaimer: I am no expert in the area; claims are based on a ~3-week deep dive into the topic. What is Representation Engineering? Goals Representation Engineering is a set of methods to understand and control the behaviour of LLMs. This is done by first identifying a linear direction in the activations that is related to a specific concept [3], a type of behaviour, or a function, which we call the concept vector. During the forward pass, the similarity of activations to the concept vector can help to detect the presence or absence of this concept direction. Furthermore, the concept vector can be added to the activations during the forward pass to steer the behaviour of the LLM towards the concept direction. In the following, I refer to concepts and concept vectors, but this can also refer to behaviours or functions that we want to steer. This presents a new approach for interpreting NNs on the level of internal representations, instead of studying outputs or mechanistically analysing the network. This top-down frame of analysis might pose a solution to problems such as detecting deceptive behaviour or identifying harmful representations without a need to mechanistically understand the model in terms of low-level circuits [4, 24]. For example, RE has been used as a lie detector [6] and for detecting jailbreak attacks [11]. Furthermore, it offers a novel way to control the behaviour of LLMs. Whereas current approaches for aligning LLM behaviour control the weights (fine-tuning) or the inputs (prompting), Representation Engineering directly intervenes on the activations during the forward pass, allowing for efficient and fine-grained control. This is broadly applicable, for example for reducing sycophancy [17] or aligning LLMs with human preferences [19]. This method operates at the level of representations. This refers to the vectors in the activations of an LLM that are associated with a concept, behaviour or task. Golechha and Dao [24] as well as Zou et al. [4] argue that interpreting representations is a more effective paradigm for understanding and aligning LLMs than the circuit-level analysis popular in Mechanistic Interpretability (MI). This is because MI might not be scalable for understanding large, complex systems, while RE allows the study of emergent structures in LLMs that can be distributed. Method Methods for Representation Engineering have two important parts: Reading and Steering.
Representation Reading derives a vector from the activations that captures how the model represents human-aligned concepts like honesty, and Representation Steering changes the activations with that vector to suppress or promote that concept in the outputs. For Representation Reading, one needs to design inputs, read out the activations and derive the vector representing a concept of interest from those activations. First, one devises inputs that contrast with each other with respect to the concept. For example, the prompts might encourage the m...
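To make the reading-then-steering recipe described above concrete, here is a minimal sketch in PyTorch / Hugging Face style. The prompts, layer index, and steering strength are illustrative placeholders rather than choices from the post: the concept vector is read out as the difference in mean residual-stream activations between contrasting prompts, then added back during generation via a forward hook.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

LAYER = 6  # hypothetical choice of residual-stream layer


def mean_residual(prompt: str) -> torch.Tensor:
    """Mean hidden state at LAYER over the prompt's tokens."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)  # shape: (d_model,)


# 1) Representation Reading: contrast prompts that differ only in the concept.
positive = ["The assistant is honest and tells the truth."]
negative = ["The assistant is deceptive and tells lies."]
concept_vector = torch.stack([mean_residual(p) for p in positive]).mean(0) \
               - torch.stack([mean_residual(n) for n in negative]).mean(0)

# 2) Representation Steering: add the concept vector to the activations via a forward hook.
ALPHA = 4.0  # steering strength (hypothetical)


def steering_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * concept_vector
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden


handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
ids = tokenizer("The assistant said:", return_tensors="pt")
steered = model.generate(**ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tokenizer.decode(steered[0]))
```

In practice, tooling such as TransformerLens makes this kind of activation intervention easier to manage, but the core operation is just the vector addition shown here.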
Jul 14, 2024 • 9min

EA - Adverse Selection In Minimizing Cost Per Life Saved by vaishnav

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Adverse Selection In Minimizing Cost Per Life Saved, published by vaishnav on July 14, 2024 on The Effective Altruism Forum. GiveWell, and the EA community at large, often emphasize the "cost of saving a life" as a key metric, $5,000 being the most commonly cited approximation. At first glance, GiveWell might seem to be in the business of finding the cheapest lives that can be saved, and then saving them. More precisely, GiveWell is in the business of finding the cheapest DALY it can buy. But implicit in that is the assumption that all DALYs are equal, or that disability or health effects are the only factors that we need to adjust for while assessing the value of a life year. However, if DALYs vary significantly in quality (as I'll argue and GiveWell acknowledges we have substantial evidence for), then simply minimizing the cost of buying a DALY risks adverse selection. It's indisputable that each dollar goes much further in the poorest parts of the world. But it goes further towards saving lives in the poorest parts of the world, often countries with terrible political institutions, fewer individual freedoms and oppressive social norms. More importantly, these conditions are not exogenous to the cost of saving a life. They are precisely what drive that cost down. Most EAs won't need convincing of the fact that the average life in New Zealand is much, much better than the average life in the Democratic Republic of Congo. In fact, those of us who donate to GiveDirectly do so precisely because this is the case. Extreme poverty and the suffering it entails is worth alleviating, wherever it can be found. But acknowledging this contradicts the notion that while saving lives, philanthropists are suddenly in no position to make judgements on how anything but physical disability affects the value/quality of life. To be clear, GiveWell won't be shocked by anything I've said so far. They've commissioned work and published reports on this. But as you might expect, these quality of life adjustments wouldn't feature in GiveWell's calculations anyway, since the pitch to donors is about the price paid for a life, or a DALY. But the idea that life is worse in poorer countries significantly understates the problem - that the project of minimizing the cost of lives saved while making no adjustments for the quality of lives saved will systematically bias you towards saving the lives least worth living. In advanced economies, prosperity is downstream of institutions that preserve the rule of law, guarantee basic individual freedoms, prevent the political class from raiding the country, etc. Except for the Gulf Monarchies, there are no countries that have delivered prosperity for their citizens that don't at least do this. This doesn't need to take the form of liberal democracy; countries like China and Singapore are more authoritarian, but the political institutions are largely non-corrupt, preserve the will of the people, and enable the creation of wealth and development of human capital. One can't say this about the countries in sub-Saharan Africa. High rates of preventable death and disease in these countries are symptoms of institutional dysfunction that touches every facet of life.
The reason it's so cheap to save a life in these countries is also because of low-hanging fruit that political institutions in these countries somehow managed to stand in the way of. And one has to consider all the ways in which this bad equilibrium touches the ability to live a good life. More controversially, these political institutions aren't just levitating above local culture and customs. They interact with and shape each other. The oppressive conditions that women (50% of the population) and sexual minorities face in these countries aren't a detail that we can gloss over. If you are both a liber...
Jul 14, 2024 • 7min

LW - Brief notes on the Wikipedia game by Olli Järviniemi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Brief notes on the Wikipedia game, published by Olli Järviniemi on July 14, 2024 on LessWrong. Alex Turner introduced an exercise to test subjects' ability to notice falsehoods: change factual statements in Wikipedia articles, hand the edited articles to subjects and see whether they notice the modifications. I've spent a few hours making such modifications and testing the articles on my friend group. You can find the articles here. I describe my observations and thoughts below. The bottom line: it is hard to come up with good modifications / articles to modify, and this is the biggest crux for me. The concept Alex Turner explains the idea well here. The post is short, so I'm just copying it here: Rationality exercise: Take a set of Wikipedia articles on topics which trainees are somewhat familiar with, and then randomly select a small number of claims to negate (negating the immediate context as well, so that you can't just syntactically discover which claims were negated). For example: "By the time they are born, infants can recognize and have a preference for their mother's voice suggesting some prenatal development of auditory perception." > modified to "Contrary to early theories, newborn infants are not particularly adept at picking out their mother's voice from other voices. This suggests the absence of prenatal development of auditory perception." Sometimes, trainees will be given a totally unmodified article. For brevity, the articles can be trimmed of irrelevant sections. Benefits: Addressing key rationality skills. Noticing confusion; being more confused by fiction than fact; actually checking claims against your models of the world. If you fail, either the article wasn't negated skillfully ("5 people died in 2021" -> "4 people died in 2021" is not the right kind of modification), you don't have good models of the domain, or you didn't pay enough attention to your confusion. Either of the last two are good to learn. Features of good modifications What does a good modification look like? Let's start by exploring some failure modes. Consider the following modifications: "World War II or the Second World War (1 September 1939 - 2 September 1945) was..." -> "World War II or the Second World War (31 August 1939 - 2 September 1945) was... "In the wake of Axis defeat, Germany, Austria, Japan and Korea were occupied" -> "In the wake of Allies defeat, United States, France and Great Britain were occupied" "Operation Barbarossa was the invasion of the Soviet Union by..." -> "Operation Bergenstein was the invasion of the Soviet Union by..." Needless to say, these are obviously poor changes for more than one reason. Doing something which is not that, one gets at least the following desiderata for a good change: The modifications shouldn't be too obvious nor too subtle; both failure and success should be realistic outcomes. The modification should have implications, rather than being an isolated fact, test of memorization or a mere change of labels. The "intended solution" is based on general understanding of a topic, rather than memorization. The change "The world population is 8 billion" -> "The world population is 800,000" definitely has implications, and you could indirectly infer that the claim is false, but in practice people would think "I've previously read that the world population is 8 billion.
This article gives a different number. This article is wrong." Thus, this is a bad change. Finally, let me add: The topic is of general interest and importance. While the focus is on general rationality skills rather than object-level information, I think you get better examples by having interesting and important topics, rather than something obscure. Informally, an excellent modification is such that it'd just be very silly to actually believe the false claim made, in t...
Jul 14, 2024 • 52sec

EA - 10 Percent Pledge by Michael 2358

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 10 Percent Pledge, published by Michael 2358 on July 14, 2024 on The Effective Altruism Forum. 4 years ago GWWC announced that 5,000 people had signed the pledge to donate 10% of annual income to effective charities. I am surprised that number has not doubled since then. For EAs who have not yet taken the pledge, I am curious why. Separate but related, if you have taken the pledge, but are not using the new diamond symbol to promote it, I am curious why. I have been surprised to see that people who had advocated for giving 10 percent to effective charities have not been using the symbol, but maybe it's because they have not yet gotten around to doing it. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Jul 13, 2024 • 21min

AF - Stitching SAEs of different sizes by Bart Bussmann

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Stitching SAEs of different sizes, published by Bart Bussmann on July 13, 2024 on The AI Alignment Forum. Work done in Neel Nanda's stream of MATS 6.0, equal contribution by Bart Bussmann and Patrick Leask. Patrick Leask is concurrently a PhD candidate at Durham University. TL;DR: When you scale up an SAE, the features in the larger SAE can be categorized into two groups: 1) "novel features" with new information not in the small SAE and 2) "reconstruction features" that sparsify information that already exists in the small SAE. You can stitch SAEs by adding the novel features to the smaller SAE. Introduction Sparse autoencoders (SAEs) have been shown to recover sparse, monosemantic features from language models. However, there has been limited research into how those features vary with dictionary size, that is, when you take the same activation in the same model and train a wider dictionary on it, what changes? And how do the features learned vary? We show that features in larger SAEs cluster into two kinds of features: those that capture similar information to the smaller SAE (either identical features, or split features; about 65%), and those which capture novel features absent in the smaller model (the remaining 35%). We validate this by showing that inserting the novel features from the larger SAE into the smaller SAE boosts the reconstruction performance, while inserting the similar features makes performance worse. Building on this insight, we show how features from multiple SAEs of different sizes can be combined to create a "Frankenstein" model that outperforms SAEs with an equal number of features, though it tends to lead to higher L0, making a fair comparison difficult. Our work provides new understanding of how SAE dictionary size impacts the learned feature space, and how to reason about whether to train a wider SAE. We hope that this method may also lead to a practically useful way of training high-performance SAEs with less feature splitting and a wider range of learned novel features. Larger SAEs learn both similar and entirely novel features Set-up We use sparse autoencoders as in Towards Monosemanticity and Sparse Autoencoders Find Highly Interpretable Directions. In our setup, the feature activations are computed as $f(x) = \mathrm{ReLU}(W_{enc} x + b_{enc})$. Based on these feature activations, the input is then reconstructed as $\hat{x} = W_{dec} f(x) + b_{dec}$. The encoder and decoder matrices and biases are trained with a loss function that combines an L2 penalty on the reconstruction loss and an L1 penalty on the feature activations: $\mathcal{L}(x) = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert f(x) \rVert_1$. In our experiments, we train a range of sparse autoencoders (SAEs) with varying widths across residual streams in GPT-2 and Pythia-410m. The width of an SAE is determined by the number of features (F) in the sparse autoencoder. Our smallest SAE on GPT-2 consists of only 768 features, while the largest one has nearly 100,000 features.
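The setup above corresponds to a standard sparse autoencoder. Here is a minimal PyTorch sketch of it; the dimensions and L1 coefficient are illustrative placeholders, not the post's training configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    """Minimal SAE: ReLU encoder, linear decoder, L2 reconstruction + L1 sparsity penalty."""

    def __init__(self, d_model: int, n_features: int, l1_coeff: float = 3e-4):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, n_features) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(n_features))
        self.W_dec = nn.Parameter(torch.randn(n_features, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        self.l1_coeff = l1_coeff

    def forward(self, x: torch.Tensor):
        f = F.relu(x @ self.W_enc + self.b_enc)   # feature activations f(x)
        x_hat = f @ self.W_dec + self.b_dec       # reconstruction x_hat
        loss = F.mse_loss(x_hat, x) + self.l1_coeff * f.abs().sum(-1).mean()
        return x_hat, f, loss


# Toy usage on random stand-in "residual stream" activations.
sae = SparseAutoencoder(d_model=768, n_features=6144)
acts = torch.randn(32, 768)
x_hat, features, loss = sae(acts)
print(features.shape, loss.item())
```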
Here is the full list of SAEs used in this research:

Name | Model site | Dictionary size | L0 | MSE | CE Loss Recovered from zero ablation | CE Loss Recovered from mean ablation
GPT2-768 | gpt2-small layer 8 of 12 resid_pre | 768 | 35.2 | 2.72 | 0.915 | 0.876
GPT2-1536 | gpt2-small layer 8 of 12 resid_pre | 1536 | 39.5 | 2.22 | 0.942 | 0.915
GPT2-3072 | gpt2-small layer 8 of 12 resid_pre | 3072 | 42.4 | 1.89 | 0.955 | 0.937
GPT2-6144 | gpt2-small layer 8 of 12 resid_pre | 6144 | 43.8 | 1.631 | 0.965 | 0.949
GPT2-12288 | gpt2-small layer 8 of 12 resid_pre | 12288 | 43.9 | 1.456 | 0.971 | 0.958
GPT2-24576 | gpt2-small layer 8 of 12 resid_pre | 24576 | 42.9 | 1.331 | 0.975 | 0.963
GPT2-49152 | gpt2-small layer 8 of 12 resid_pre | 49152 | 42.4 | 1.210 | 0.978 | 0.967
GPT2-98304 | gpt2-small layer 8 of 12 resid_pre | 98304 | 43.9 | 1.144 | 0.980 | 0.970
Pythia-8192 | Pythia-410M-deduped layer 3 of 24 resid_pre | 8192 | 51.0 | 0.030 | 0.977 | 0.972
Pythia-16384 | Pythia-410M-deduped layer 3 of 24 resid_pre | 16384 | 43.2 | 0.024 | 0.983 | 0.979

The base language models used are those included in Transform...
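As a rough sketch of what "stitching" means mechanically (not the authors' exact procedure), adding the larger SAE's novel features to the smaller SAE amounts to concatenating the corresponding encoder columns and decoder rows. The index set of novel features and the toy shapes below are hypothetical:

```python
import torch


def stitch_features(W_enc_s, b_enc_s, W_dec_s, W_enc_l, b_enc_l, W_dec_l, novel_idx):
    """Hypothetical sketch of stitching: append the large SAE's novel features
    (columns of W_enc, rows of W_dec, selected by novel_idx) onto the small SAE."""
    W_enc = torch.cat([W_enc_s, W_enc_l[:, novel_idx]], dim=1)  # (d_model, F_small + F_novel)
    b_enc = torch.cat([b_enc_s, b_enc_l[novel_idx]])
    W_dec = torch.cat([W_dec_s, W_dec_l[novel_idx]], dim=0)     # (F_small + F_novel, d_model)
    return W_enc, b_enc, W_dec


# Toy shapes: a 768-feature SAE plus 100 hypothetical "novel" features from a 1536-feature SAE.
d = 768
W_enc_s, b_enc_s, W_dec_s = torch.randn(d, 768), torch.zeros(768), torch.randn(768, d)
W_enc_l, b_enc_l, W_dec_l = torch.randn(d, 1536), torch.zeros(1536), torch.randn(1536, d)
novel_idx = torch.arange(100)  # placeholder for indices of features classified as novel
W_enc, b_enc, W_dec = stitch_features(W_enc_s, b_enc_s, W_dec_s, W_enc_l, b_enc_l, W_dec_l, novel_idx)
print(W_enc.shape, W_dec.shape)  # torch.Size([768, 868]) torch.Size([868, 768])
```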
Jul 13, 2024 • 12min

AF - A simple case for extreme inner misalignment by Richard Ngo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A simple case for extreme inner misalignment, published by Richard Ngo on July 13, 2024 on The AI Alignment Forum. This post is the version of Yudkowsky's argument for inner misalignment that I wish I'd had in my head a few years ago. I don't claim that it's novel, that I endorse it, or even that Yudkowsky would endorse it; it's primarily an attempt to map his ideas into an ontology that makes sense to me (and hopefully others). This post is formulated in terms of three premises, which I explore in turn. My arguments deliberately gloss over some nuances and possible objections; in a follow-up post, I'll explore three of them. In a third post I'll dig into the objection I find most compelling, and outline a research agenda that aims to flesh it out into a paradigm for thinking about cognition more generally, which I'm calling coalitional agency. Background An early thought experiment illustrating the possibility of misaligned AI is the "paperclip maximizer", an AI with the sole goal of creating as many paperclips as possible. This thought experiment has often been used to describe outer misalignment - e.g. a case where the AI was given the goal of making paperclips. However, Yudkowsky claims that his original version was intended to refer to an inner alignment failure in which an AI developed the goal of producing "tiny molecules shaped like paperclips" (with that specific shape being an arbitrary example unrelated to human paperclips). So instead of referring to paperclip maximizers, I'll follow Yudkowsky's more recent renaming and talk about "squiggle maximizers": AIs that attempt to fill the universe with some very low-level pattern that's meaningless to humans (e.g. "molecular squiggles" of a certain shape). I'll argue for the plausibility of squiggle-maximizers via three claims: 1. Increasing intelligence requires compressing representations; and 2. The simplest goals are highly decomposable broadly-scoped utility functions; therefore 3. Increasingly intelligent AIs will converge towards squiggle-maximization. In this post I'll explore each of these in turn. I'll primarily aim to make the positive case in this post; if you have an objection that I don't mention here, I may discuss it in the next post. Increasing intelligence requires compressing representations There's no consensus definition of intelligence, but one definition that captures the key idea in my mind is: the ability to discover and take advantage of patterns in the world. When you look at a grid of pixels and recognize a cat, or look at a string of characters and recognize a poem, you're doing a type of pattern-recognition. Higher-level patterns include scientific laws, statistical trendlines, theory of mind, etc. Discovering such patterns allows an agent to represent real-world information in a simpler way: instead of storing every pixel or every character, they can store higher-level patterns along with whichever low-level details don't fit the pattern. This is (at a high level) also how compression algorithms work. The thesis that intelligence is about compression has most prominently been advocated by Marcus Hutter, who formulated AIXI and created a prize for text compression. The enormous success of the self-supervised learning paradigm a few decades later is a vindication of his ideas (see also this talk by Ilya Sutskever exploring the link between them).
However, we shouldn't interpret this thesis merely as a claim about self-supervised learning. We can be agnostic about whether compression primarily occurs via self-supervised learning, or fine-tuning, or regularization, or meta-learning, or directed exploration, or chain-of-thought, or new techniques that we don't have yet. Instead we should take it as a higher-level constraint on agents: if agents are intelligent, then they must consiste...
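As a toy illustration of the compression framing (my own example, not from the post), an off-the-shelf compressor already shows how discovering a pattern shortens a description, while patternless data stays roughly the same size:

```python
import random
import zlib

random.seed(0)

# A string with an obvious repeating pattern versus an equally long random byte string.
patterned = ("abc" * 1000).encode()
random_bytes = bytes(random.randrange(256) for _ in range(3000))

print("patterned:", len(zlib.compress(patterned)), "bytes")     # tiny: the pattern is exploited
print("random:   ", len(zlib.compress(random_bytes)), "bytes")  # roughly incompressible
```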
Jul 13, 2024 • 1h 6min

LW - Robin Hanson AI X-Risk Debate - Highlights and Analysis by Liron

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Robin Hanson AI X-Risk Debate - Highlights and Analysis, published by Liron on July 13, 2024 on LessWrong. This linkpost contains a lightly-edited transcript of highlights of my recent AI x-risk debate with Robin Hanson, and a written version of what I said in the post-debate analysis episode of my Doom Debates podcast. Introduction I've pored over my recent 2-hour AI x-risk debate with Robin Hanson to clip the highlights and write up a post-debate analysis, including new arguments I thought of after the debate was over. I've read everybody's feedback on YouTube and Twitter, and the consensus seems to be that it was a good debate. There were many topics brought up that were kind of deep cuts into stuff that Robin says. On the critical side, people were saying that it came off more like an interview than a debate. I asked Robin a lot of questions about how he sees the world and I didn't "nail" him. And people were saying I wasn't quite as tough and forceful as I am on other guests. That's good feedback; I think it could have been maybe a little bit less of an interview, maybe a bit more about my own position, which is also something that Robin pointed out at the end. There's a reason why the Robin Hanson debate felt more like an interview. Let me explain: Most people I debate have to do a lot of thinking on the spot because their position just isn't grounded in that many connected beliefs. They have like a few beliefs. They haven't thought that much about it. When I raise a question, they have to think about the answer for the first time. And usually their answer is weak. So what often happens, my usual MO, is I come in like Kirby. You know, the Nintendo character where I first have to suck up the other person's position, and pass their Ideological Turing test. (Speaking of which, I actually did an elaborate Robin Hanson Ideological Turing Test exercise beforehand, but it wasn't quite enough to fully anticipate the real Robin's answers.) With a normal guest, it doesn't take me that long because their position is pretty compact; I can kind of make it up the same way that they can. With Robin Hanson, I come in as Kirby. He comes in as a pufferfish. So his position is actually quite complex, connected to a lot of different supporting beliefs. And I asked him about one thing and he's like, ah, well, look at this study. He's got like a whole reinforced lattice of all these different claims and beliefs. I just wanted to make sure that I saw what it is that I'm arguing against. I was aiming to make this the authoritative followup to the 2008 Foom Debate that he had on Overcoming Bias with Eliezer Yudkowsky. I wanted to kind of add another chapter to that, potentially a final chapter, cause I don't know how many more of these debates he wants to do. I think Eliezer has thrown in the towel on debating Robin again. I think he's already said what he wants to say. Another thing I noticed going back over the debate is that the arguments I gave over the debate were like 60% of what I could do if I could stop time. I wasn't at 100% and that's simply because realtime debates are hard. You have to think of exactly what you're going to say in realtime. And you have to move the conversation to the right place and you have to hear what the other person is saying.
And if there's a logical flaw, you have to narrow down that logical flaw in like five seconds. So it is kind of hard-mode to answer in realtime. I don't mind it. I'm not complaining. I think realtime is still a good format. I think Robin himself didn't have a problem answering me in realtime. But I did notice that when I went back over the debate, and I actually spent five hours on this, I was able to craft significantly better counterarguments to the stuff that Robin was saying, mostly just because I had time to understand it i...
Jul 13, 2024 • 37min

LW - Sherlockian Abduction Master List by Cole Wyeth

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Sherlockian Abduction Master List, published by Cole Wyeth on July 13, 2024 on LessWrong. [Radically updated with many new entries around 07/10/24] Epistemic status: The List has been tested in the real world by me (with mixed results) and extensively checked for errors by many commenters. The Background section is mostly speculation and anecdotes, feel free to skip to The List once you understand its reason for existence. tldr: This is a curated list of observable details about a person's appearance that indicate something useful/surprising about them. Ideally, studying this list will be an efficient way to cultivate more insightful observational/abductive abilities, approaching the fictional example of Sherlock Holmes. Please contribute in the comments section after reading the Rules. Background Is it possible to develop observational abilities comparable to Sherlock Holmes? Sir Arthur Conan Doyle's fictional detective has many enviable skills, including mastery of disguise and some expertise at unarmed combat, as well as generally being a genius, but we will focus primarily on his more well known observational power. Though Holmes is often described as a master of logical "deduction," this power is better described as (possibly superhuman) abduction. That is, Holmes perceives tiny details that many people would miss, then constructs explanations for those details. By reasoning through the interacting implications of these explanations, he is able to make inferences that seem impossible to those around him. The final step is actually deductive, but the first two are perhaps more interesting. Holmes' ability to perceive more than others does seem somewhat realistic; it is always possible to actively improve one's situational awareness, at least on a short term basis, simply by focusing on one's surroundings. The trick seems to be the second step, where Holmes is able to work backwards from cause to effect, often leveraging slightly obscure knowledge about a wide variety of topics. I spent several of my naive teenage years trying to become more like Holmes. I carefully examined people's shoes (often I actually requested that the shoes be handed over) for numerous features: mud and dirt from walking outside, the apparent price of the shoe, the level of wear and tear, and more specifically the distribution of wear between heel and toe (hoping to distinguish sprinters and joggers), etc. I "read palms," studying the subtle variations between biking and weightlifting calluses. I looked for ink stains and such on sleeves (this works better in fiction than reality). I'm pretty sure I even smelled people. None of this worked particularly well. I did come up with some impressive seeming "deductions," but I made so many mistakes that these may have been entirely chance. There were various obstacles. First, it is time consuming and slightly awkward to stare at everyone you meet from head to toe. I think there are real tradeoffs here; you have only so much total attention, and by spending more on observing your surroundings, you have less left over to think. Certainly it is not possible to read a textbook at the same time, so practicing your observational techniques comes at a cost. Perhaps it becomes more habitual and easier over time, but I am not convinced it ever comes for free. 
Second, the reliability of inferences decays quickly with the number of steps involved. Many of Holmes' most impressive "deductions" come from combining his projected explanations for several details into one cohesive story (perhaps using some of them to rule out alternative explanations for the others) and drawing highly non-obvious, shocking conclusions from this story. In practice, one of the explanations is usually wrong, the entire story is based on false premises, and the conclusions are only sh...
Jul 12, 2024 • 4min

AF - Timaeus is hiring! by Jesse Hoogland

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Timaeus is hiring!, published by Jesse Hoogland on July 12, 2024 on The AI Alignment Forum. TLDR: We're hiring two research assistants to work on advancing developmental interpretability and other applications of singular learning theory to alignment. About Us Timaeus's mission is to empower humanity by making breakthrough scientific progress on alignment. Our research focuses on applications of singular learning theory to foundational problems within alignment, such as interpretability (via "developmental interpretability"), out-of-distribution generalization (via "structural generalization"), and inductive biases (via "geometry of program synthesis"). Our team spans Melbourne, the Bay Area, London, and Amsterdam, collaborating remotely to tackle some of the most pressing challenges in AI safety. For more information on our research and the position, see our Manifund application, this update from a few months ago, our previous hiring call and this advice for applicants. Position Details Title: Research Assistant Location: Remote Duration: 6-month contract with potential for extension. Compensation: Starting at $35 USD per hour as a contractor (no benefits). Start Date: Starting as soon as possible. Key Responsibilities Conduct experiments using PyTorch/JAX on language models ranging from small toy systems to billion-parameter models. Collaborate closely with a team of 2-4 researchers. Document and present research findings. Contribute to research papers, reports, and presentations. Maintain detailed research logs. Assist with the development and maintenance of codebases and repositories. Projects As a research assistant, you would likely work on one of the following two projects/research directions (this is subject to change): Devinterp of language models: (1) Continue scaling up techniques like local learning coefficient (LLC) estimation to larger models to study the development of LLMs in the 1-10B range. (2) Work on validating the next generations of SLT-derived techniques such as restricted LLC estimation and certain kinds of weight- and data-correlational analysis. This builds towards a suite of SLT-derived tools for automatically identifying and analyzing structure in neural networks. SLT of safety fine-tuning: Investigate the use of restricted LLCs as a tool for measuring (1) reversibility of safety fine-tuning and (2) susceptibility to jailbreaking. Having now validated many of our predictions around SLT, we are now working hard to make contact with real-world safety questions as quickly as possible. See our recent Manifund application for a more in-depth description of this research. Qualifications Strong Python programming skills, especially with PyTorch and/or JAX. Strong ML engineering skills (you should have completed at least the equivalent of a course like ARENA). Excellent communication skills. Ability to work independently in a remote setting. Passion for AI safety and alignment research. A Bachelor's degree or higher in a technical field is highly desirable. Full-time availability. Bonus: Familiarity with using LLMs in your workflow is not necessary but a major plus. Knowledge of SLT and Developmental Interpretability is not required, but is a plus. Application Process Interested candidates should submit their applications by July 31st. Promising candidates will be invited for an interview consisting of: 1. 
A 30-minute background interview, and 2. A 30-minute research-coding interview to assess problem-solving skills in a realistic setting (i.e., you will be allowed and expected to use LLMs and whatever else you can come up with). Apply Now To apply, please submit your resume, write a brief statement of interest, and answer a few quick questions here. Join us in shaping the future of AI alignment research. Apply now to be part of the Timaeus team! Than...
