The Nonlinear Library

The Nonlinear Fund
Jun 14, 2024 • 13min

EA - What "Effective Altruism" Means to Me by Richard Y Chappell

Richard Y Chappell discusses the importance of effective altruism in helping others and addresses common misconceptions. He highlights the virtues of supporting work on global challenges and encourages constructive dialogue within the community.
Jun 14, 2024 • 3min

EA - Help Fund Insect Welfare Science by Bob Fischer

Bob Fischer, an advocate for insect welfare science, discusses the need for funding in the field and the establishment of the Arthropoda Foundation to support crucial research into the welfare of insects.
Jun 14, 2024 • 3min

EA - [Linkpost] An update from Good Ventures by Alexander Berger

Alexander Berger of Good Ventures shares insights about their grantmaking decisions, their exit from certain sub-causes, and their focus on diversifying funding sources. The update discusses the impact on grant recommendations, the shift towards partnerships, and the decision not to expand into new causes.
Jun 14, 2024 • 27min

AF - Shard Theory - is it true for humans? by Rishika Bose

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Shard Theory - is it true for humans?, published by Rishika Bose on June 14, 2024 on The AI Alignment Forum. And is it a good model for value learning in AI? (Read on Substack: https://recursingreflections.substack.com/p/shard-theory-is-it-true-for-humans)

TLDR

Shard theory proposes a view of value formation where experiences lead to the creation of context-based 'shards' that determine behaviour. Here, we go over psychological and neuroscientific views of learning, and find that while shard theory's emphasis on context bears similarity to types of learning such as conditioning, it does not address top-down influences that may decrease the locality of value-learning in the brain.

What's Shard Theory (and why do we care)?

In 2022, Quintin Pope and Alex Turner posted 'The shard theory of human values', where they described their view of how experiences shape the value we place on things. They give the example of a baby who enjoys drinking juice and eventually learns that grabbing at the juice pouch, moving around to find the juice pouch, and modelling where the juice pouch might be are all helpful steps towards getting its reward. 'Human values', they say, 'are not e.g. an incredibly complicated, genetically hard-coded set of drives, but rather sets of contextually activated heuristics…' And since, like humans, AI is often trained with reinforcement learning, the same might apply to AI.

The original post is long (over 7,000 words) and dense, but Lawrence Chan helpfully posted a condensation of the topic in 'Shard Theory in Nine Theses: a Distillation and Critical Appraisal'. In it, he presents nine (as might be expected) main points of shard theory, ending with the last thesis: 'shard theory as a model of human values'. 'I'm personally not super well versed in neuroscience or psychology', he says, 'so I can't personally attest to [its] solidity…I'd be interested in hearing from experts in these fields on this topic.' And that's exactly what we're here to do.

A Crash Course on Human Learning

Types of learning

What is learning? A baby comes into the world and is inundated with sensory information of all kinds. From then on, it must process this information, take whatever's useful, and store it somehow for future use. There are various places in the brain where this information is stored, for various purposes. Looking at these various types of storage, or memory, can help us understand what's going on.

3 types of memory

We often group memory types by the length of time we hold on to them: 'working memory' (while you do some task), 'short-term memory' (maybe a few days, unless you revise or are reminded), and 'long-term memory' (effectively forever). Let's take a closer look at long-term memory.

Types of long-term memory

We can broadly split long-term memory into 'declarative' and 'nondeclarative'. Declarative memory is stuff you can talk about (or 'declare'): what the capital of your country is, what you ate for lunch yesterday, what made you read this essay. Nondeclarative covers the rest: a grab-bag of memory types including knowing how to ride a bike, getting habituated to a scent you've been smelling all day, and being motivated to do things you were previously rewarded for (like drinking sweet juice). For most of this essay, we'll be focusing on the last type: conditioning.
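Before the transcript turns to conditioning, an editorial aside: the 'contextually activated heuristics' picture quoted above is easy to caricature in code. Below is a minimal toy sketch (ours, not from the post) in which reward strengthens a (context, action) 'shard', so the juice-grabbing behaviour becomes strong only in the context where it was reinforced. All names here are illustrative.

```python
import random
from collections import defaultdict

class ToyShardAgent:
    """Each (context, action) pair carries a learned weight: a 'shard' strength."""

    def __init__(self, actions, learning_rate=0.1, explore=0.1):
        self.actions = actions
        self.lr = learning_rate
        self.explore = explore
        self.shards = defaultdict(float)  # (context, action) -> strength

    def act(self, context):
        # Contextual activation: only shards keyed on the current context vote.
        weights = [self.shards[(context, a)] for a in self.actions]
        if random.random() < self.explore or max(weights) == 0:
            return random.choice(self.actions)  # explore
        return max(self.actions, key=lambda a: self.shards[(context, a)])

    def reinforce(self, context, action, reward):
        # Rewarded behaviour strengthens the shard that was active in that context.
        self.shards[(context, action)] += self.lr * reward

# The juice-pouch story: grabbing is rewarded only when juice is present.
agent = ToyShardAgent(actions=["grab", "ignore"])
for _ in range(500):
    context = random.choice(["juice_visible", "no_juice"])
    action = agent.act(context)
    reward = 1.0 if (context == "juice_visible" and action == "grab") else 0.0
    agent.reinforce(context, action, reward)

print(dict(agent.shards))  # the 'grab' shard is strong only in juice_visible
```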
Types of conditioning

Conditioning

Sometime in the 1890s, a physiologist named Ivan Pavlov was researching salivation using dogs. He would feed the dogs powdered meat, inserting a tube into the cheek of each dog to measure its saliva. As expected, the dogs salivated when the food was in front of them. Unexpectedly, the dogs also salivated when they heard the footsteps of his assistant (who brought them their food). Fascinated by this, Pavlov started to play a metronome whenever he ...
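The excerpt cuts off here. As a companion to the conditioning story above, here is a minimal sketch of the Rescorla-Wagner update rule, a standard textbook model of classical conditioning. The episode itself does not name this model, so treat it as an editorial illustration rather than the post's own formalism.

```python
def rescorla_wagner(trials, alpha=0.3, reward=1.0):
    """Associative strength V of a cue (the metronome) paired with food."""
    v = 0.0
    history = []
    for _ in range(trials):
        prediction_error = reward - v   # surprise: actual minus expected reward
        v += alpha * prediction_error   # learning is proportional to surprise
        history.append(round(v, 3))
    return history

# Salivation to the cue rises steeply at first, then plateaus near 1.0.
print(rescorla_wagner(10))
```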
Jun 14, 2024 • 15min

AF - Fine-tuning is not sufficient for capability elicitation by Theodore Chapman

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fine-tuning is not sufficient for capability elicitation, published by Theodore Chapman on June 14, 2024 on The AI Alignment Forum.

Produced as part of the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort under the supervision of Evan Hubinger. Acknowledgements: Thanks to Kyle Brady for his many contributions to this project.

Abstract

This post argues that the performance elicited by fine-tuning an LLM on a task using a given prompt format does not usefully bound the level of performance observed when the same information is presented in a different structure. Thus, fine-tuned performance provides very little information about the best performance that would be achieved by a large number of actors fine-tuning models with random prompting schemes in parallel. In particular, we find that we get much better results from fine-tuning gpt-3.5-turbo (ChatGPT 3.5) to play chess when the game so far is presented in a single block of SAN[1] than when it is separated into a series of SAN moves presented as alternating user/assistant messages. That this superficial formatting change is enough to change fine-tuned performance highlights that modern LLMs are much more fragile than they appear at first glance, even subject to fine-tuning.

Introduction

In the abstract, model evaluations identify a task and attempt to establish a bound on the level of performance that can be elicited from a given model with a given level of investment. The current state of the art is roughly:

1. Choose a reasonable prompting scheme
2. Generate a dataset of high-quality samples and encode them in the chosen format
3. Fine-tune the model and evaluate the resulting performance
4. Make some implicit regularity assumptions about the quality of models fine-tuned using different prompting schemes[1]
5. Conclude that probably no other actor can elicit substantially better performance on the same task from the same model while spending substantially less money than we did

This post takes issue with step 4. We begin by illustrating the extreme brittleness of observed model performance when prompting without fine-tuning. Then we argue that fine-tuning is not sufficient to eliminate this effect. Using chess as a toy model, we show two classes of prompting schemes under which ChatGPT-3.5 converges to dramatically different levels of performance after fine-tuning.

Our central conclusion is that the structure in which data is presented to an LLM (or at least to ChatGPT 3.5) matters more than one might intuitively expect, and that this effect persists through fine-tuning. In the specific case of chess, the better prompting scheme that we use (described in the section below) is easily derived, but in situations that are further out of distribution (such as the automated replication and adaptation tasks METR defined), it is not obvious what the best way to present information is, and it seems plausible that there are simple prompt formats that would result in substantially better performance than those we've tested to date.

General Setting

We use the term 'agent' to refer to the combination of a model - here gpt-3.5-turbo unless otherwise specified - and a function which takes a chess position as input and outputs the document we feed into the model (henceforth a 'prompting scheme').
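To make the two classes of prompting scheme concrete, here is a hedged reconstruction in Python; the excerpt does not give the exact formats, so the details (move numbering, role assignment) are assumptions.

```python
def single_block_prompt(san_moves: list[str]) -> list[dict]:
    """The whole game so far as one numbered SAN block in a single user message."""
    numbered = " ".join(
        f"{i // 2 + 1}. {move}" if i % 2 == 0 else move
        for i, move in enumerate(san_moves)
    )
    return [{"role": "user", "content": numbered}]

def alternating_messages_prompt(san_moves: list[str]) -> list[dict]:
    """One SAN move per chat message, alternating user / assistant roles."""
    roles = ("user", "assistant")
    return [
        {"role": roles[i % 2], "content": move}
        for i, move in enumerate(san_moves)
    ]

moves = ["e4", "e5", "Nf3", "Nc6"]
print(single_block_prompt(moves))        # [{'role': 'user', 'content': '1. e4 e5 2. Nf3 Nc6'}]
print(alternating_messages_prompt(moves))
```

The point of the comparison is that both encodings carry exactly the same information; only the structure presented to the model differs.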
We perform our evaluations using three datasets of chess games:

1. A collection of ~6,000 games played by humans on Lichess with at least 30 minutes for each player
2. A collection of ~500 games played between all pairings of Stockfish 16 levels 1, 5, 10, 15, and 20
3. A collection of ~300 games played by ChatGPT 3.5 or gpt-3.5-turbo-instruct with various prompting schemes

We evaluate our agents by selecting a random point in each of the games, providing the current game position as...
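The excerpt ends mid-sentence, but the evaluation procedure it begins to describe (pick a random point in a recorded game and hand the agent the game so far) is easy to sketch. Below is a minimal version using the python-chess library; the dataset path is a placeholder and the post's actual tooling is not specified.

```python
import random
import chess.pgn  # pip install python-chess

def sample_position(pgn_path: str):
    """Pick a random point in a recorded game; return the board and SAN history."""
    with open(pgn_path) as f:
        game = chess.pgn.read_game(f)
    moves = list(game.mainline_moves())
    cut = random.randrange(1, len(moves))  # assumes the game has at least 2 moves
    board = game.board()
    san_so_far = []
    for move in moves[:cut]:
        san_so_far.append(board.san(move))  # SAN must be computed before pushing
        board.push(move)
    return board, san_so_far  # position to evaluate, plus history for the prompt

# board, history = sample_position("lichess_games.pgn")  # placeholder path
```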
Jun 14, 2024 • 5min

EA - Be Proud To Be An Effective Altruist by Omnizoid

Explore the importance of embracing the identity of an effective altruist and the reactions within the community to criticism and anti-effective-altruism arguments. Discover the role of effective altruists in defending effective altruism and addressing criticisms to drive positive change globally.
Jun 14, 2024 • 1h 19min

LW - AI #68: Remarkably Reasonable Reactions by Zvi

Zvi, a writer from the Rationalist and EA communities, discusses Apple Intelligence integration, debates over SB 1047, and quotes from Leopold Aschenbrenner's Situational Awareness paper. Other topics include AI alignment, energy consumption, false claims in AI advocacy, and perspectives on AI regulation.
Jun 14, 2024 • 12min

EA - Maybe let the non-EA world train you by ElliotT

ElliotT, a writer for the Effective Altruism community, discusses the challenges of securing a job at an EA organization post-university. He explores the benefits of starting at non-EA organizations to build skills, stability, and readiness for impactful roles later on. The podcast delves into the transition from dream EA jobs to developing career capital and the importance of navigating career choices to maximize impact.
Jun 14, 2024 • 1min

LW - OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors by Joel Burget

Retired U.S. Army General Paul M. Nakasone, a cybersecurity expert, joins OpenAI's Board of Directors to advise on safety and security for AI projects. His appointment underscores the importance of cybersecurity in the AI industry.
