
The Nonlinear Library

Latest episodes

Sep 9, 2024 • 7min

LW - My takes on SB-1047 by leogao

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My takes on SB-1047, published by leogao on September 9, 2024 on LessWrong.

I recently decided to sign a letter of support for SB 1047. Before deciding whether to do so, I felt it was important for me to develop an independent opinion on whether the bill was good, as opposed to deferring to the opinions of those around me, so I read through the full text of SB 1047. After forming my opinion, I checked my understanding of tort law basics (definitions of "reasonable care" and "materially contribute") with a law professor who was recommended to me by one of the SB 1047 sponsors, but who was not directly involved in the drafting or lobbying for the bill. Ideally I would have wanted to consult a completely independent lawyer, but this would have been prohibitively expensive and difficult on a tight timeline. This post outlines my current understanding. It is not legal advice.

My main impression of the final version of SB 1047 is that it is quite mild. Its obligations only cover models trained with $100M+ of compute, or finetuned with $10M+ of compute. [1]

If a developer is training a covered model, they have to write an SSP (safety and security protocol) that explains why they believe it is not possible to use the model (or a post-train/finetune of the model costing less than $10M of compute) to cause a critical harm. This would involve running evals, doing red teaming, etc. The SSP also has to describe what circumstances would cause the developer to decide to shut down training and any copies of the model that the developer controls, and how they will ensure that they can actually do so if needed. Finally, a redacted copy of the SSP must be made available to the public (and an unredacted copy filed with the Attorney General). This doesn't seem super burdensome, and is very similar to what labs are already doing voluntarily, but it seems good to codify these things, because otherwise labs could stop doing them in the future. Also, current SSPs don't make hard commitments about when to actually stop training, so it would be good to have that.

If a critical harm happens, then the question for determining penalties is whether the developer met their duty to exercise "reasonable care" to prevent models from "materially contributing" to the critical harm. This is determined by looking at how good the SSP was (both in an absolute sense and when compared to other developers) and how closely it was adhered to in practice. Reasonable care is a well-established concept in tort law that basically means you did the cost-benefit analysis that a reasonable person would have done. Importantly, it doesn't mean the developer has to be absolutely certain that nothing bad can happen. For example, suppose you release an open source model after doing dangerous capabilities evals to make sure it can't make a bioweapon, but then a few years later a breakthrough in scaffolding methods happens and someone makes a bioweapon using your model. As long as you were thorough in your dangerous capabilities evals, you would not be liable, because it would not have been reasonable for you to anticipate that someone would make a breakthrough that invalidates your evaluations.
Also, if mitigating the risk would be too costly, and the benefit of releasing the model far outweighs the risks of release, that is also a valid reason not to mitigate the risk under the standard of reasonable care (e.g. the benefits of driving a car at a normal speed far outweigh the costs of car accidents, so reasonable care doesn't require driving at 2 mph to fully mitigate the risk of car accidents). My personal opinion is that the reasonable care standard is too weak to prevent AI from killing everyone. However, this also means that I think people opposing the current version of the bill because of the reasonable care requireme...
Sep 9, 2024 • 13min

LW - Fun With CellxGene by sarahconstantin

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fun With CellxGene, published by sarahconstantin on September 9, 2024 on LessWrong.

For this week's post, I thought I'd mess around a bit with the CellxGene tool provided by the Chan Zuckerberg Initiative. It's based on a big dataset of individual cells, classified by tissue, cell type, and disease state, and their gene expression profiles (single-cell RNA counts). You can automatically compare how gene expression differs between sick and healthy individuals, for a variety of diseases, and drill down into which cells/tissues are different and how. It's a fascinating toy and a great way to generate hypotheses. Here, I'll do it for Alzheimer's, comparing 138,438 Alzheimer's brain cells to 9,203,998 normal/healthy brain cells to see what the most "differentially expressed" genes are, and what that might tell us about how the disease works.

Top Hits

LINC01609: 1.6x overexpressed in Alzheimer's, d = 4.203

This is a non-protein-coding RNA. Typically most expressed in the testis. In CellxGene's healthy brain cells, it's expressed only in activated microglia and astrocytes; but in the Alzheimer's brain, it's expressed in roughly half of all types of cells. Like many long non-coding RNAs, its function is unknown.

SLC26A3: 10.6x overexpressed in Alzheimer's, d = 3.310

This is a chloride anion exchanger, a membrane protein that transports chloride ions across the cell membrane. It's most heavily expressed in the colon, where it controls the resorption of fluid from the intestines. Defects in this gene are associated with congenital diarrhea, as the body is unable to maintain the right osmotic concentration and loses water in the stool. But we're interested in SLC26A3 in the brain, not in the intestine. In the healthy brain, once again, it's only expressed in activated astrocytes and microglia; in the Alzheimer's brain it's expressed in large numbers of all cell types. CellxGene classifies it as one of the top "markers" for mature astrocytes and mature microglial cells, with a specificity of 1.00. Other researchers have observed the upregulation of SLC26A3 in Alzheimer's, e.g. as part of a pattern of "gliovascular" alteration around the clusters of astrocytes and endothelial cells that control the blood-brain barrier.[1]

A gliovascular unit is the place where a blood vessel meets the brain. The vessel is surrounded by astrocytes and microglia, which control what goes in and out of the bloodstream, clearing excess glutamate and misfolded proteins. Under prolonged stress, these astrocytes in gliovascular units become reactive, and ultimately the blood-brain barrier breaks down. In Alzheimer's disease, the blood vessels get narrower, fragment, and break.[2] Activated astrocytes no longer connect as tightly to the surface of the vessels with their "endfeet", compromising the BBB, while activated microglia engulf the endfeet, exacerbating the effect.[3] What actually happens if you have more chloride anion exchange in the cells of a gliovascular unit? Is it causal for any Alzheimer's pathology? That, I don't think we know.

RASGEF1B: 5.5x overexpressed in Alzheimer's, d = 3.267

This is a widely expressed cytoplasmic protein that allows the protein Ras to be "switched on", sending intracellular signals that lead to cell growth, differentiation, and survival.[4]
Once again, in the healthy brain it is only expressed in activated astrocytes and microglia, while in the Alzheimer's brain it's expressed everywhere. CellxGene classifies it as the top "marker" for mature astrocytes and mature microglial cells, with a specificity of 1.00. In normal circumstances, astrocytes and microglia can grow and proliferate, but most neurons do not. Ras activity increases in conditions of neural stress or injury, as part of the body's attempt to promote cell survival and neurite regeneration. So it makes sense that we...
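A note on the effect sizes above: assuming the reported d is a standardized mean difference (Cohen's d) - an assumption on the reader's part, since the exact CellxGene pipeline isn't described here - a minimal sketch of the computation, with made-up expression values rather than real single-cell data, looks like this:

```python
# Illustrative only: Cohen's d between disease and healthy expression values.
import numpy as np

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
alzheimers = rng.lognormal(mean=2.0, sigma=0.5, size=1000)    # made-up disease cells
healthy = rng.lognormal(mean=0.5, sigma=0.5, size=5000)       # made-up healthy cells
print(cohens_d(alzheimers, healthy))   # large d = big standardized difference
```

A d above 3 or 4, as reported for the genes above, means the two distributions barely overlap at all.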
Sep 9, 2024 • 3min

LW - Pollsters Should Publish Question Translations by jefftk

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pollsters Should Publish Question Translations, published by jefftk on September 9, 2024 on LessWrong.

Pew recently commissioned Gallup to run a poll in Israel on attitudes to social media censorship. They found high support for banning various kinds of speech. The fourth question, especially, is disturbing: 59% want to ban publicly expressing sympathy for civilians in Gaza? Since the polling was conducted in Hebrew and Arabic, however, this got me wondering whether translation might be contributing. I went to check, and Pew did publish the questions:

Q123c. Thinking about the war between Israel and Hamas, do you think people should or should not be allowed to post each of the following on social media sites? c. Posts that express sympathy for civilians in Gaza

A friend of a friend, Robert Herr, guessed they might have used סימפטיה ("simpatia"): I don't *know* because they're not transparent, but I *suspect* they used סימפטיה for sympathy, which is a false friend, because - like in German - in Hebrew it doesn't primarily mean a feeling of compassion or commiseration, but a feeling of affection or support or approval.

I don't speak Hebrew (or German) so I checked with Claude:

jefftk: What does סימפטיה mean?

claude: סימפטיה (simpatia) is a Hebrew word that means "sympathy" or "empathy" in English. It describes the ability to understand and share the feelings of another person, or to show compassion and concern for someone else's situation or emotions.

jefftk: Would it be the right word to use when translating "Q123c. Thinking about the war between Israel and Hamas, do you think people should or should not be allowed to post each of the following on social media sites? c. Posts that express sympathy for civilians in Gaza"?

claude: You're right to be thoughtful about the translation here, as nuance is important in survey questions. While סימפטיה (simpatia) can mean "sympathy," in this specific context, it might not be the most precise translation. Let me explain why and suggest an alternative:

1. Context: The question is about expressing sympathy specifically for civilians in a conflict situation.

2. Connotation: In Hebrew, סימפטיה often carries a slightly stronger connotation than the English "sympathy" in this context. It might be interpreted as more active support or alignment, which could potentially skew responses.

3. Better alternative: A more neutral and accurate translation for this survey question might use the phrase הזדהות עם (hizdahut im), which means "identification with" or "empathy for."

While it's great that they do publish the original English, without also publishing the actual questions they asked, the results are much less useful than they could be. If it turns out they used הזדהות עם then I'm much more concerned than if they used סימפטיה, though support for censorship is worrying regardless. So: if you're polling people in another language, please publish the translations you used!

Comment via: facebook, mastodon

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Sep 9, 2024 • 14min

LW - On Fables and Nuanced Charts by Niko McCarty

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Fables and Nuanced Charts, published by Niko McCarty on September 9, 2024 on LessWrong. Written by Spencer Greenberg & Amber Dawn Ace for Asimov Press.

In 1994, the U.S. Congress passed the largest crime bill in U.S. history, called the Violent Crime Control and Law Enforcement Act. The bill allocated billions of dollars to build more prisons and hire 100,000 new police officers, among other things. In the years following the bill's passage, violent crime rates in the U.S. dropped drastically, from around 750 offenses per 100,000 people in 1990 to under 400 in 2018. But can we infer, as this chart seems to ask us to, that the bill caused the drop in crime?

As it turns out, this chart wasn't put together by sociologists or political scientists who've studied violent crime. Rather, we - a mathematician and a writer - devised it to make a point: although charts seem to reflect reality, they often convey narratives that are misleading or entirely false. Upon seeing that violent crime dipped after 1990, we looked up major events that happened right around that time - selecting one, the 1994 Crime Bill, and slapping it on the graph. There are other events we could have stuck on the graph just as easily that would likely have invited you to construct a completely different causal story. In other words, the bill and the data in the graph are real, but the story is manufactured. Perhaps the 1994 Crime Bill really did cause the drop in violent crime, or perhaps the causality goes the other way: the spike in violent crime motivated politicians to pass the act in the first place. (Note that the act was passed slightly after the violent crime rate peaked!)

Charts are a concise way not only to show data but also to tell a story. Such stories, however, reflect the interpretations of a chart's creators and are often accepted by the viewer without skepticism. As Noah Smith and many others have argued, charts contain hidden assumptions that can drastically change the story they tell. This has important consequences for science, which, in its ideal form, attempts to report findings as objectively as possible. When a single chart can be the explanatory linchpin for years of scientific effort, unveiling a data visualization's hidden assumptions becomes an essential skill for determining what's really true. As physicist Richard Feynman once said: in science, "the first principle is that you must not fool yourself, and you are the easiest person to fool." What we mean to say is - don't be fooled by charts.

Misleading Charts

Bad actors have long used data visualizations to deliberately manipulate and mislead. How to Lie with Statistics, a classic book from 1954, describes tricks that unscrupulous actors use to distort the truth without fabricating results, such as by truncating the y-axis of a chart to make an effect look much larger than it is or by cherry-picking data. Drug companies and special interest groups have employed these techniques for decades to win public support. Merck, for example, was accused of publishing misleading data about the anti-inflammatory drug Vioxx to hide the fact that it could cause heart attacks and strokes, ultimately resulting in a multi-billion dollar settlement. But even when no one is intentionally trying to mislead or manipulate, charts designed to make information clear can still lead to erroneous conclusions.
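To make the y-axis trick concrete, here is a minimal sketch (an illustration added here, not a chart from the article) plotting the same made-up data twice, once with an honest axis and once with a truncated one:

```python
# Same data, two y-axes: truncation makes a ~3% change look dramatic.
import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018]
values = [100, 101, 102, 103]        # about a 3% total change

fig, (ax_full, ax_trunc) = plt.subplots(1, 2, figsize=(8, 3))

ax_full.bar(years, values)
ax_full.set_ylim(0, 110)             # honest axis: the change looks modest
ax_full.set_title("y-axis from zero")

ax_trunc.bar(years, values)
ax_trunc.set_ylim(99, 104)           # truncated axis: same data looks dramatic
ax_trunc.set_title("truncated y-axis")

plt.tight_layout()
plt.show()
```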
Just consider the U.S. maternal mortality statistics, which seem to show maternal deaths rising from 0.4 deaths per 100,000 women in 2003 to close to 1 per 100,000 in 2020. This graph is worrisome, particularly if you or your partner is pregnant (or expect to be). Why are so many more expectant and new mothers dying? Is there some new danger? Is the healthcare system getting worse? Coverage in Scientific American, NPR, and elsewhere suggested t...
Sep 8, 2024 • 5min

LW - My Number 1 Epistemology Book Recommendation: Inventing Temperature by adamShimi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Number 1 Epistemology Book Recommendation: Inventing Temperature, published by adamShimi on September 8, 2024 on LessWrong.

In my last post, I wrote that no resource out there exactly captured my model of epistemology, which is why I wanted to share a half-baked version of it. But I do have one book which I always recommend to people who want to learn more about epistemology: Inventing Temperature by Hasok Chang. To be very clear, my recommendation is not just to get the good ideas from this book (of which there are many) from a book review or summary - it's to actually read the book, the old-school way, one word at a time.

Why? Because this book teaches you the right feel, the right vibe for thinking about epistemology. It punctures the bubble of sterile nonsense that so easily passes for "how science works" in most people's education, such as the "scientific method". And it does so by demonstrating how one actually makes progress in epistemology: by thinking, yes, but also by paying close attention to what actually happened.

It works first because the book is steeped in history, here the history of thermometry (the measurement of temperature). By default, beware anything that is only philosophy of science, without any basis in history - this is definitively ungrounded bullshit. Not only is Chang leveraging history, he also has an advantage over most of the literature in History and Philosophy of Science: early thermometry is truly not that complex technically or mathematically. Except for the last historical chapter, where details of the Carnot cycle get in the way, most of the book describes straightforward questions that anyone can understand, and both the experiments and the mathematics are at a modern high-school level. As such, I know that any educated person can read this book and follow the history part.

Last but not least, thermometry provides a great opportunity to show what happens at the beginning, before all the frames and techniques and epistemic infrastructure are set up. Another source of oversimplification in people's models of epistemology (including my own before I started digging into the history) is that we moderns mostly learn well-framed and cleaned-up science: when we learn Classical Mechanics, we don't just learn it as Newton created it, but we benefit from progress in notation, mathematics, and even the whole structure of physics (with the emphasis on energy over forces). This, I surmise, has the unfortunate consequence of making even practicing scientists feel like science and epistemology are cleaner than they truly are. Sure, we get that data is messy, and that there are many pitfalls, but for many, the foundations have been established before, and so they work in a well-defined setting.

But at the start of thermometry, as at the start of every epistemological enterprise, there was almost nothing to rely on. For example, if you want to synchronize different temperature measuring devices (not even thermometers yet, because no scale), a natural idea is to find fixed points: phenomena which always happen at the same temperature. But then… if you don't even have a thermometer, how can you know that fixed points are actually fixed?
And even if you can do that, what if your tentative fixed points (like the boiling point of water) are not one very specific phenomenon, but a much more complex one with multiple phases, over which the temperature does vary? These are the kinds of questions you need to deal with when you start from nothing, and Chang explores the ingenuity of the early thermometricians in teasing imperfect answers out of nature, iterating on them, and then fixing the foundations under their feet. That is, they didn't think really hard and get everything right before starting; they started anyway, and through various strategies, schemes and tricks,...
Sep 7, 2024 • 12min

LW - That Alien Message - The Animation by Writer

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: That Alien Message - The Animation, published by Writer on September 7, 2024 on LessWrong.

Our new video is an adaptation of That Alien Message, by @Eliezer Yudkowsky. This time, the text has been significantly adapted, so I include it below. The author of the adaptation is Arthur Frost. Eliezer has reviewed the adaptation.

Part 1

Picture a world just like ours, except the people are a fair bit smarter: in this world, Einstein isn't one in a million, he's one in a thousand. In fact, here he is now. He's made all the same discoveries, but they're not quite as unusual: there have been lots of other discoveries. Anyway, he's out one night with a friend looking up at the stars when something odd happens. [visual: stars get brighter and dimmer, one per second. The two people on the hill look at each other, confused]

The stars are flickering. And it's just not a hallucination. Everyone's seeing it. And so everyone immediately freaks out and panics! Ah, just kidding, the people of this world are smarter than ours; what they do is try to work together and figure out what's going on. It turns out that exactly one star seems to shift in brightness every 1.005 seconds. Except, the stars are light years away, so actually the shifts must have happened a long time ago, and somehow they've all been perfectly timed to reach Earth specifically every 1.005 seconds. If you look at the stars from a high-orbit satellite (which of course this planet has), then the flickering looks a little out of sync. So whatever this is, it's directed at Earth.

Nobody can find a pattern in the position of the stars, but it's one at a time getting either much dimmer or much brighter by the same amount and, well, that looks a bit like binary. So loads of people think 'huh, maybe it's a code!'. But a lot of other people wonder, 'Who would be trying to send a message to Earth by shifting the brightness of stars across the galaxy? There must be an easier way to talk to us?' But it seems like there must be some intelligence behind it, so the data gets gathered and put on the internet. Some people wonder if maybe it's somehow dangerous, but, well, whoever is making the stars change brightness probably has easier ways to destroy humanity.

And so the great analysis begins. Half the planet's physicists, mathematicians, cryptographers, precocious kids, crossword enthusiasts, whoever, they're all trying to work out what this means, they're trying to crack the code. And as they do, the stars keep flickering, exactly one every 1.005 seconds. There are some obvious patterns [visual: display the code, probably someone lining up different wrappings and finding one that makes the pattern look less noisy]: it seems like the numbers come in groups of 32, which in turn come from four groups of 8. Some chunks are much more common. [visual: chunks of 8 getting matched across the text, sorted into uneven piles perhaps]

By the way, they do all this just in the first five hours, because like I said, people here are smart. Their civilisation is… a bit more on top of things. And so they are very ready to respond when, after five hours and 16,384 winking stars, it seems like the message begins to repeat itself, or, almost repeat itself, it's just slightly different this time. And it keeps going.
[slow zoom out on code going from one line to two, showing only a few differences between the new line and the previous line] Some people start thinking maybe we're seeing the next row of a picture, pixel by pixel. Only, the designers of this image format - whoever they are - use four primary colours instead of three [visual of 32-chunk getting broken into four 8-chunks]. And the picture seems less chaotic if we assume they do binary slightly differently to us. [probably someone gesturing at a diagram of how to get numbers from binary repres...
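As an aside, the first decoding step the story describes - chunking the 16,384-bit stream into 32-bit groups made of four 8-bit values - is simple enough to sketch. This is an illustration with random bits, not the video's actual data:

```python
# Chunk a star-flicker bitstream into 32-bit words of four 8-bit "colours".
import numpy as np

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=16384)      # five hours of winking stars

words = bits.reshape(-1, 32)               # groups of 32 bits
channels = words.reshape(-1, 4, 8)         # four 8-bit values per word

# Interpret each 8-bit group as an integer (most significant bit first).
values = (channels * (2 ** np.arange(7, -1, -1))).sum(axis=-1)
print(values.shape)                        # (512, 4): 512 pixels, 4 channels
```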
Sep 7, 2024 • 8min

EA - Brief advertisements for EA infrastructure projects, Sept 2024 by Arepo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Brief advertisements for EA infrastructure projects, Sept 2024, published by Arepo on September 7, 2024 on The Effective Altruism Forum.

This is the latest of a theoretically-three-monthly series of posts advertising EA infrastructure projects that struggle to get and maintain awareness (see the original advertising post for more on the rationale). Italicised projects have been added since the previous post was originally submitted. Bolded projects have been edited into this post since I first posted it, so you can easily see them if you revisit it. On with the adverts:

Support services for EAs

Online coworking/socialising
EA Gather Town - An always-on virtual meeting place for coworking, connecting, and having both casual and impactful conversations
EA Anywhere - An online EA community for everyone
EA coworking Discord - A Discord server dedicated to online coworking

Free or subsidised accommodation
CEEALAR (formerly the EA Hotel) - Provides free or subsidised serviced accommodation and board, and a moderate stipend for other living expenses
Nonlinear's EA house database - An experiment by Nonlinear to try to connect EAs with extra space with EAs who could do good work if they didn't have to pay rent (or could pay less rent)

Mental Health
Mental Health Navigator - a signposting service committed to connecting people to free or low-cost mental health resources
Rethink Wellbeing - engaging, proven, and low-cost programs tailored to the EA community

Coaching
Finlinson Executive Coaching - executive coaching for people tackling the world's most pressing problems
Katie Glass - offers subsidised coaching to women working in longtermist & meta cause areas
Magnify Mentoring - connects and supports people from underrepresented backgrounds who are ambitious and rigorous in their altruism, to have a positive impact with their careers and lives
Elliot Billingsley - coaching is best for people who have personal or professional goals they're serious about accomplishing; my sessions are designed to improve clarity and motivation
Tee Barnett Coaching - (coaching training) a multi-component training infrastructure for developing your own practice as a skilled coach; (coach matchmaking) access matchmaking to high-quality coaching at below-market pricing
Probably Good - Whether you're a student searching for the right path or an experienced professional seeking a purpose-driven opportunity, we're here to help you brainstorm career paths, evaluate options, and plan next steps
AI Safety Support - health coaching for people working on AI safety (first session free)
80,000 Hours career coaching - Speak with us for free about using your career to help solve one of the world's most pressing problems
Yonatan Cale - coaching for software devs
FAANG-style mock interviews - senior software engineer at Waymo, who previously worked at Google

Freelancing support
Freelancing for Good - introduces freelancers both within and outside EA to different high-impact pathways

Support services for EA organisations

Marketing
Annabel Luketic - free marketing strategy calls for EA projects
Marisa Vogiatazi - offering free (possibly now discounted) social media advice and strategies, creating media kits, influencer outreach plans, and content for EA organisations
User-Friendly - an EA-aligned marketing agency

Evaluation/donation support
GiveWiki - The AngelList for philanthropy: follow expert grantmakers with your donations

Operations/strategy
Future Matters - strategy consulting
WorkStream Business Systems - a service dedicated to EAs, helping you improve your workflow, boost your bottom line and take control of your business
cFactual - a new, EA-aligned strategy consultancy with the purpose of maximising its counterfactual impact

Governance
Good Governance Project - helps...
Sep 7, 2024 • 21min

LW - Excerpts from "A Reader's Manifesto" by Arjun Panickssery

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Excerpts from "A Reader's Manifesto", published by Arjun Panickssery on September 7, 2024 on LessWrong.

"A Reader's Manifesto" is a July 2001 Atlantic piece by B.R. Myers that I've returned to many times. He complains about the inaccessible pretension of the highbrow literary fiction of his day. The article is mostly a long list of critiques of various quotes/passages from well-reviewed books by famous authors. It's hard to accuse him of cherry-picking since he only targets passages that reviewers singled out as unusually good. Some of his complaints are dumb but the general idea is useful: authors try to be "literary" by (1) avoiding a tightly-paced plot that could evoke "genre fiction" and (2) trying to shoot for individual standout sentences that reviewers can praise, using a shotgun approach where many of the sentences are banal or just don't make sense. Here are some excerpts of his complaints. Bolding is always mine.

The "Writerly" Style

He complains that critics now dismiss too much good literature as "genre" fiction.

More than half a century ago popular storytellers like Christopher Isherwood and Somerset Maugham were ranked among the finest novelists of their time, and were considered no less literary, in their own way, than Virginia Woolf and James Joyce. Today any accessible, fast-moving story written in unaffected prose is deemed to be "genre fiction" - at best an excellent "read" or a "page turner," but never literature with a capital L. An author with a track record of blockbusters may find the publication of a new work treated like a pop-culture event, but most "genre" novels are lucky to get an inch in the back pages of The New York Times Book Review.

The dualism of literary versus genre has all but routed the old trinity of highbrow, middlebrow, and lowbrow, which was always invoked tongue-in-cheek anyway. Writers who would once have been called middlebrow are now assigned, depending solely on their degree of verbal affectation, to either the literary or the genre camp. David Guterson is thus granted Serious Writer status for having buried a murder mystery under sonorous tautologies (Snow Falling on Cedars, 1994), while Stephen King, whose Bag of Bones (1998) is a more intellectual but less pretentious novel, is still considered to be just a very talented genre storyteller.

Further, he complains that fiction is regarded as "literary" the more slow-paced, self-conscious, obscure, and "writerly" its style.

The "literary" writer need not be an intellectual one. Jeering at status-conscious consumers, bandying about words like "ontological" and "nominalism," chanting Red River hokum as if it were from a lost book of the Old Testament: this is what passes for profundity in novels these days. Even the most obvious triteness is acceptable, provided it comes with a postmodern wink. What is not tolerated is a strong element of action - unless, of course, the idiom is obtrusive enough to keep suspense to a minimum. Conversely, a natural prose style can be pardoned if a novel's pace is slow enough, as was the case with Ha Jin's aptly titled Waiting, which won the National Book Award (1999) and the PEN/Faulkner Award (2000).
If the new dispensation were to revive good "Mandarin" writing - to use the term coined by the British critic Cyril Connolly for the prose of writers like Virginia Woolf and James Joyce - then I would be the last to complain. But what we are getting today is a remarkably crude form of affectation: a prose so repetitive, so elementary in its syntax, and so numbing in its overuse of wordplay that it often demands less concentration than the average "genre" novel.

4 Types of Bad Prose

Then he has five sections complaining about 4 different types of prose he doesn't like (in addition to the generic "literary" prose): "evocative" prose, "muscular"...
Sep 7, 2024 • 1min

LW - Pay Risk Evaluators in Cash, Not Equity by Adam Scholl

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pay Risk Evaluators in Cash, Not Equity, published by Adam Scholl on September 7, 2024 on LessWrong. Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute basics right; currently, I think we're mostly failing even at that. Early discussion of AI risk often focused on debating the viability of various elaborate safety schemes humanity might someday devise - designing AI systems to be more like "tools" than "agents," for example, or as purely question-answering oracles locked within some kryptonite-style box. These debates feel a bit quaint now, as AI companies race to release agentic models they barely understand directly onto the internet. But a far more basic failure, from my perspective, is that at present nearly all AI company staff - including those tasked with deciding whether new models are safe to build and release - are paid substantially in equity, the value of which seems likely to decline if their employers stop building and releasing new models. As a result, it is currently the case that roughly everyone within these companies charged with sounding the alarm risks personally losing huge sums of money if they do. This extreme conflict of interest could be avoided simply by compensating risk evaluators in cash instead. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Sep 6, 2024 • 7min

LW - Adam Optimizer Causes Privileged Basis in Transformer Language Models by Diego Caples

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Adam Optimizer Causes Privileged Basis in Transformer Language Models, published by Diego Caples on September 6, 2024 on LessWrong.

Diego Caples (diego@activated-ai.com)
Rob Neuhaus (rob@activated-ai.com)

Introduction

In principle, neuron activations in a transformer-based language model's residual stream should all be about the same scale. In practice, however, the dimensions vary unexpectedly widely in scale. Mathematical theories of the transformer architecture do not predict this: they expect rotational equivariance within a model, where one dimension is no more important than any other. Is there something wrong with our reasonably informed intuitions of how transformers work? What explains these outlier channels?

Previously, Anthropic researched the existence of these privileged basis dimensions (dimensions more important / larger than expected) and ruled out several causes. By elimination, they reached the hypothesis that per-channel normalization in the Adam optimizer was the cause of privileged basis. However, they did not prove this was the case.

We conclusively show that Adam causes outlier channels / privileged basis within the transformer residual stream. Models trained with SGD in place of Adam do not have a privileged basis. As a whole, this work improves mechanistic understanding of transformer LM training dynamics and confirms that our mathematical models of transformers are not flawed; rather, they simply do not take into account the training process. Our code is open source at the LLM outlier channel exploration GitHub.

Key Results

Training an LM with SGD does not result in a privileged basis, indicating that Adam is the cause of privileged basis in transformer LMs.

Training a 12M parameter model on TinyStories allows us to replicate outlier channel behavior in a small LM, training in less than 15 minutes on an H100.

Background

Recommended Reading

Privileged Bases in the Transformer Residual Stream
Toy Models of Superposition (Privileged Basis section)

More About Anthropic's Work

We consider Anthropic's research on privileged basis the primary motivator for this work. In Anthropic's Privileged Bases in the Transformer Residual Stream, they demonstrated privileged basis in a 200M parameter LLM and performed some experiments to rule out possible causes, but did not find a definitive cause. They hypothesized that outlier channels are caused by Adam's lack of rotational equivariance, and suggested that training using SGD could isolate Adam as the cause.

Adam vs SGD, and Rotational Equivariance

Consider an experiment where we rotate the parameter space of a neural network, train it, and then invert the rotation. With Stochastic Gradient Descent (SGD), this process yields the same model as if we hadn't rotated at all. However, with the Adam optimizer, we end up with a different model. This difference can be explained by the presence or absence of a property called rotational equivariance. SGD is rotationally equivariant: optimizer steps are always directly proportional to the gradient of the loss function, regardless of the chosen coordinate system. In contrast, Adam is not rotationally equivariant, because it takes steps that are not proportional to the gradient; updates depend on coordinate-wise gradient statistics. As we later show, this difference is what leads to privileged basis within LMs. (A minimal sketch of the rotation experiment follows below.)
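The rotation experiment can be sketched in a few lines on a toy quadratic loss (a minimal sketch, not the code from the repo; the one-step "Adam" update assumes zero initial moment state, in which case Adam's bias-corrected update reduces to a sign-like g/|g| step):

```python
# Toy demonstration: SGD commutes with a rotation of parameter space;
# Adam (approximated by its first-step update) does not.
import numpy as np

rng = np.random.default_rng(0)
d = 8
A = rng.normal(size=(d, d))
H = A @ A.T + np.eye(d)            # SPD Hessian: loss f(w) = 0.5 * w^T H w
grad = lambda w: H @ w             # gradient of f at w

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random rotation matrix

def sgd_step(w, g, lr=0.1):
    return w - lr * g

def adam_first_step(w, g, lr=0.1, eps=1e-8):
    # First Adam step from zero moments (after bias correction): g / (|g| + eps)
    return w - lr * g / (np.abs(g) + eps)

w0 = rng.normal(size=d)

# Optimize in rotated coordinates v = Q w, then rotate back.
# The gradient in rotated coordinates at v0 = Q w0 is Q grad(w0).
sgd_rotated  = Q.T @ sgd_step(Q @ w0, Q @ grad(w0))
sgd_direct   = sgd_step(w0, grad(w0))
print(np.allclose(sgd_rotated, sgd_direct))    # True: rotationally equivariant

adam_rotated = Q.T @ adam_first_step(Q @ w0, Q @ grad(w0))
adam_direct  = adam_first_step(w0, grad(w0))
print(np.allclose(adam_rotated, adam_direct))  # False: per-coordinate
                                               # normalization picks out a basis
```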
Kurtosis

Motivated by Anthropic, we use excess kurtosis as a metric for measuring basis privilege. We encourage the reader to read Anthropic's reasoning for why this is a good metric, but here we aim to demonstrate graphically that excess kurtosis is a reasonable choice for measuring basis privilege. We plot the middle-layer residual stream activations for the last token of the string "Lilly saw a big red apple!" as an Adam-optimized LM training run progresses....
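As a companion sketch (again illustrative, with made-up activations rather than a real residual stream), excess kurtosis is near zero for Gaussian, rotation-symmetric activations and grows when a few outlier channels dominate:

```python
# Excess kurtosis across residual-stream channels as a basis-privilege metric.
import numpy as np

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)   # ~0 in expectation for a Gaussian

rng = np.random.default_rng(0)
unprivileged = rng.normal(size=768)        # rotation-symmetric activations
privileged = unprivileged.copy()
privileged[:4] *= 25.0                     # a few outlier channels

print(excess_kurtosis(unprivileged))       # near 0
print(excess_kurtosis(privileged))         # large and positive
```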
