
The Nonlinear Library

Latest episodes

Sep 15, 2024 • 6min

LW - Why I funded PIBBSS by Ryan Kidd

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why I funded PIBBSS, published by Ryan Kidd on September 15, 2024 on LessWrong. I just left a comment on PIBBSS' Manifund grant request (which I funded $25k) that people might find interesting. PIBBSS needs more funding! Main points in favor of this grant 1. My inside view is that PIBBSS mainly supports "blue sky" or "basic" research, some of which has a low chance of paying off, but might be critical in "worst case" alignment scenarios (e.g., where "alignment MVPs" don't work, "sharp left turns" and "intelligence explosions" are more likely than I expect, or where we have more time before AGI than I expect). In contrast, of the technical research MATS supports, about half is basic research (e.g., interpretability, evals, agent foundations) and half is applied research (e.g., oversight + control, value alignment). I think the MATS portfolio is a better holistic strategy for furthering AI safety and reducing AI catastrophic risk. However, if one takes into account the research conducted at AI labs and supported by MATS, PIBBSS' strategy makes a lot of sense: they are supporting a wide portfolio of blue sky research that is particularly neglected by existing institutions and might be very impactful in a range of possible "worst-case" AGI scenarios. I think this is a valid strategy in the current ecosystem/market and I support PIBBSS! 2. In MATS' recent post, "Talent Needs of Technical AI Safety Teams", we detail an AI safety talent archetype we name "Connector". Connectors bridge exploratory theory and empirical science, and sometimes instantiate new research paradigms. As we discussed in the post, finding and developing Connectors is hard, often their development time is on the order of years, and there is little demand on the AI safety job market for this role. However, Connectors can have an outsized impact on shaping the AI safety field and the few that make it are "household names" in AI safety and usually build organizations, teams, or grant infrastructure around them. I think that MATS is far from the ideal training ground for Connectors (although some do pass through!) as our program is only 10 weeks long (with an optional 4-month extension) rather than the ideal 12-24 months, we select scholars to fit established mentors' preferences rather than on the basis of their original research ideas, and our curriculum and milestones generally focus on building object-level scientific/engineering skills rather than research ideation and "identifying gaps". It's thus no surprise that most MATS scholars are "Iterator" archetypes. I think there is substantial value in a program like PIBBSS existing, to support the long-term development of "Connectors" and pursue impact in a higher-variance way than MATS. 3. PIBBSS seems to have a decent track record for recruiting experienced academics in non-CS fields and helping them repurpose their advanced scientific skills to develop novel approaches to AI safety. Highlights for me include Adam Shai's "computational mechanics" approach to interpretability and model cognition, Martín Soto's "logical updatelessness" approach to decision theory, and Gabriel Weil's "tort law" approach to making AI labs liable for their potential harms to the long-term future. 4. I don't know Lucas Teixeira (Research Director) very well, but I know and respect Dušan D. Nešić (Operations Director) a lot.
I also highly endorsed Nora Ammann's vision (albeit while endorsing a different vision for MATS). I see PIBBSS as a highly competent and EA-aligned organization, and I would be excited to see them grow! 5. I think PIBBSS would benefit from funding from diverse sources, as mainstream AI safety funders have pivoted more towards applied technical research (or more governance-relevant basic research like evals). I think Manifund regrantors are well-positio...
Sep 15, 2024 • 12min

LW - Proveably Safe Self Driving Cars by Davidmanheim

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Proveably Safe Self Driving Cars, published by Davidmanheim on September 15, 2024 on LessWrong. I've seen a fair amount of skepticism about the "Provably Safe AI" paradigm, but I think detractors give it too little credit. I suspect this is largely because of idea inoculation - people have heard an undeveloped or weak man version of the idea, for example, that we can use formal methods to state our goals and prove that an AI will do that, and have already dismissed it. (Not to pick on him at all, but see my question for Scott Aaronson here.) I will not argue that Guaranteed Safe AI solves AI safety generally, or that it could do so - I will leave that to others. Instead, I want to provide a concrete example of a near-term application, to respond to critics who say that provability isn't useful because it can't be feasibly used in real world cases when it involves the physical world, and when it is embedded within messy human systems. I am making far narrower claims than the general ones which have been debated, but at the very least I think it is useful to establish whether this is actually a point of disagreement. And finally, I will admit that the problem I'm describing would be adding provability to a largely solved problem, but it provides a concrete example for where the approach is viable. A path to provably safe autonomous vehicles To start, even critics agree that formal verification is possible, and is already used in practice in certain places. And given (formally specified) threat models in different narrow domains, there are ways to do threat and risk modeling and get different types of guarantees. For example, we already have provably verifiable code for things like microkernels, and that means we can prove that buffer overflows, arithmetic exceptions, and deadlocks are impossible, and have hard guarantees for worst case execution time. This is a basis for further applications - we want to start at the bottom and build on provably secure systems, and get additional guarantees beyond that point. If we plan to make autonomous cars that are provably safe, we would build starting from that type of kernel, and then we "only" have all of the other safety issues to address. Secondly, everyone seems to agree that provable safety in physical systems requires a model of the world, and given the limits of physics, the limits of our models, and so on, any such approach can only provide approximate guarantees, and proofs would be conditional on those models. For example, we aren't going to formally verify that Newtonian physics is correct, we're instead formally verifying that if Newtonian physics is correct, the car will not crash in some situation. Proven Input Reliability Given that, can we guarantee that a car has some low probability of crashing? Again, we need to build from the bottom up. We can show that sensors have some specific failure rate, and use that to show a low probability of not identifying other cars, or humans - not in the direct formal verification sense, but instead with the types of guarantees typically used for hardware, with known failure rates, built-in error detection, and redundancy.
I'm not going to talk about how to do that class of risk analysis, but (modulo adversarial attacks, which I'll mention later) estimating engineering reliability is a solved problem - if we don't have other problems to deal with. But we do, because cars are complex and interact with the wider world - so the trick will be integrating those risk analysis guarantees that we can prove into larger systems, and finding ways to build broader guarantees on top of them. But for the engineering reliability, we don't only have engineering proof. Work like DARPA's VerifAI is "applying formal methods to perception and ML components." Building guarantees about perceptio...
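The hardware-style reliability argument above can be made concrete with a minimal sketch. Nothing below is from the post or from VerifAI: the per-sensor miss rates and the 30 Hz perception loop are invented numbers, and the independence assumption is exactly the kind of thing a real analysis would have to justify or replace with a correlated-failure model.

```python
# Minimal illustrative sketch: hardware-style bound on the chance that every
# redundant sensor misses an object, assuming independent failures.
# All rates below are made-up numbers, not figures from the post.

def prob_all_miss(miss_rates):
    """Probability that all redundant sensors miss an object in one frame,
    under the (strong) assumption that failures are independent."""
    p = 1.0
    for rate in miss_rates:
        p *= rate
    return p

# Hypothetical per-frame miss rates for camera, lidar, and radar detection.
sensor_miss_rates = [1e-3, 5e-4, 2e-3]

per_frame = prob_all_miss(sensor_miss_rates)          # ~1e-9 per frame
frames_per_hour = 30 * 3600                           # assumed 30 Hz perception loop
per_hour = 1 - (1 - per_frame) ** frames_per_hour     # chance of >= 1 fully missed frame per hour

print(f"P(all sensors miss, single frame): {per_frame:.1e}")
print(f"P(at least one fully missed frame per hour): {per_hour:.1e}")
```

A guarantee of this shape - a quantified failure rate that is explicitly conditional on a stated failure model - is the kind of building block the post suggests stacking underneath broader system-level guarantees.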
Sep 15, 2024 • 18min

LW - Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more by Michael Cohn

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more, published by Michael Cohn on September 15, 2024 on LessWrong. In the fields of user experience and accessibility, everyone talks about the curb cut effect: Features that are added as accommodations for people with disabilities sometimes become widely useful and beloved. But not every accommodation becomes a "curb cut," and I've been thinking about other patterns that come up when accommodations intersect with wider society. The original Curb Cut Effect The eponymous curb cut -- the place at the intersection where the sidewalk slopes down to the street instead of just dropping off -- is most obviously there for wheelchair users. But it's also great for people who are pulling a suitcase, runners who want to avoid jarring their ankles, and people who are walking their bikes. Universal captioning on TV, movies, and video is nominally for Deaf or hearing-impaired people, but captions are handy for anyone who's watching TV in a noisy restaurant, or trying to make sense of a show with artistically muddy audio, or trying to watch a video at 3x speed when the audio is unintelligible. When we make products easier to use, or spaces easier to access, it's not just some essentialized group of people with disabilities who benefit -- accessibility is good for everyone. Why the idea is useful: First, it breaks down the perspective of disability accommodations as being a costly charity where "we" spend resources to help "them." Further, it breaks down the idea of disability as an essentialized, either-or, othered type of thing. Everybody has some level of difficulty accessing parts of the world some of the time, and improving accessibility is an inherent part of good design, good thinking, and good communication.[1] Plus, it's cool to be aware of all the different ways we can come up with to hack our experience of the world around us! I think there's also a dark side to the idea -- a listener could conclude that we wouldn't invest in accommodations if they didn't happen to help people without disabilities. A just and compassionate society designs for accessibility because we value everybody, not because it's secretly self-interested. That said, no society spends unlimited money to make literally every experience accessible to literally every human. There's always a cost-benefit analysis and sometimes it might be borderline. In those cases there's nothing wrong with saying that the benefits to the wider population tip the balance in favor of investing in accessibility. But when it comes to things as common as mobility impairments and as simple as curb cuts, I think it would be a moral no-brainer even if the accommodation had no value to most people. The Handicapped Parking effect This edgier sibling of the curb cut effect comes up when there's a limited resource -- like handicapped parking. There are only X parking spaces within Y feet of the entrance to the Chipotle, and if we allocate them to people who have trouble getting around, then everyone else has a longer average walk to their car.
That doesn't mean it's zero-sum: The existence of a handicapped parking spot that I can't use might cost me an extra 20 seconds of walking, but save an extra five minutes of painful limping for the person who uses it.[2] This arrangement probably increases overall utility both in the short term (reduced total pain experienced by people walking from their cars) and in the long term (signaling the importance of helping everyone participate in society). But this is manifestly not a curb cut effect where everyone benefits: You have to decide who's going to win and who's going to lose, relative to an unregulated state where all parking is first-come-first-served. Allocation can be made well or p...
Sep 15, 2024 • 11min

LW - Did Christopher Hitchens change his mind about waterboarding? by Isaac King

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Did Christopher Hitchens change his mind about waterboarding?, published by Isaac King on September 15, 2024 on LessWrong. There's a popular story that goes like this: Christopher Hitchens used to be in favor of the US waterboarding terrorists because he thought it wasn't bad enough to be torture. Then he had it tried on himself, and changed his mind, coming to believe it is torture. (Context for those unfamiliar: in the decade following 9/11, the US engaged in a lot of... questionable behavior to prosecute the war on terror, and there was a big debate on whether waterboarding should be permitted. Many other public figures also volunteered to undergo the procedure as a part of this public debate; most notably Sean Hannity, who was an outspoken proponent of waterboarding, yet welched on his offer and never tried it himself.) This story intrigued me because it's popular among both Hitchens' fans and his detractors. His fans use it as an example of his intellectual honesty and willingness to undergo significant personal costs in order to have accurate beliefs and improve the world. His detractors use it to argue that he's self-centered and unempathetic, only coming to care about a bad thing that's happening to others after it happens to him. But is the story actually true? Usually when there are two sides to an issue, one side will have an incentive to fact-check any false claims that the other side makes. An impartial observer can then look at the messaging from both sides to discover any flaws in the other. But if a particular story is convenient for both groups, then neither has any incentive to debunk it. I became suspicious when I tried going to the source of this story to see what Hitchens had written about waterboarding prior to his 2008 experiment, and consistently found these leads to evaporate. The part about him having it tried on himself and finding it torturous is certainly true. He reported this himself in his Vanity Fair article Believe Me, It's Torture. But what about before that? Did he ever think it wasn't torture? His article on the subject doesn't make any mention of changing his mind, and it perhaps lightly implies that he always had these beliefs. He says, for example: In these harsh [waterboarding] exercises, brave men and women were introduced to the sorts of barbarism that they might expect to meet at the hands of a lawless foe who disregarded the Geneva Conventions. But it was something that Americans were being trained to resist, not to inflict. [Link to an article explaining that torture doesn't work.] [And later:] You may have read by now the official lie about this treatment, which is that it "simulates" the feeling of drowning. This is not the case. You feel that you are drowning because you are drowning[.] In a video interview he gave about a year later, he said: There was only one way I felt I could advance the argument, which was to see roughly what it was like. The loudest people on the internet about this were... not promising. Shortly after the Vanity Fair article, the ACLU released an article titled "Christopher Hitchens Admits Waterboarding is Torture", saying: You have to hand it to him: journalist Christopher Hitchens, who previously discounted that waterboarding was indeed torture, admits in the August issue of Vanity Fair that it is, indeed, torture.
But they provide no source for this claim. As I write this, Wikipedia says: Hitchens, who had previously expressed skepticism over waterboarding being considered a form of torture, changed his mind. No source is provided for this either. Yet it's repeated everywhere. The top comments on the YouTube video. Highly upvoted Reddit posts. Etc. Sources for any of these claims were quite scant. Many people cited "sources" that, upon me actually reading them, had nothing to do with t...
Sep 15, 2024 • 6min

EA - Bringing the International Space Station down safely: A Billion dollar waste? by NickLaing

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bringing the International Space Station down safely: A Billion dollar waste?, published by NickLaing on September 15, 2024 on The Effective Altruism Forum. Epistemic status: Uncertain, shooting from the hip a little with no expertise in this area and only a couple of hours research done. I might well have missed something obvious, in which case I'll revise or even take the post down. Money Waste is Everywhere Here in Northern Uganda where poverty abounds, many expenditures feel wasteful. Last night I had a great time at the fanciest restaurant in town with friends but felt a pang of guilt about my $7 meal. Enough of a pang to avoid telling my wife after I came home. A bigger scale waste in these parts is the partial closure of the main bridge across the river Nile, because the bridge has apparently degraded and become hazardous. Vehicles larger than a minivan now can't cross, which has raised the price of public transport by 50%, and trucks now have a 3 hour detour. Besides these direct costs, this closure increases the cost of fuel and commodities in Northern Uganda. By my loose, conservative BOTEC the closure costs $10,000 every day (1.2 million dollars in 4 months so far) which Ugandans now can't spend on education and healthcare, while likely causing more crashes due to increasingly tired drivers who now use worse roads. The detour itself may have already cost more lives than would be lost if the bridge does collapse and kills a few people.[1] But there are far bigger wastes of money on this good earth. A Billion Dollars to bring down a space station? SpaceX has secured an $843 million contract[2] to build the boringly named "U.S. De-Orbit vehicle" (why not "Sky Shepherd")[3], which in 2031 will safely guide the decommissioned International Space Station (ISS) into the Pacific Ocean. This all sounded pretty cool until I thought… is this worth it? No human has ever been definitively killed by an object falling from space, although there have been a couple of close calls with larger asteroids injuring many, while Open Asteroid Impact could be a game changer here in future. This one time though, a wee piece of space junk did hit Lottie Williams in the shoulder and she took it home as a memento. I'm jealous. According to a great Nature article "Unnecessary risks created by uncontrolled rocket reentries", over the last 30 years over 1,000 space bodies have fallen to earth in uncontrolled re-entries and never killed anyone. The closest call might be a Chinese rocket in 2020 which damaged a house in the Ivory Coast. The article predicts a 10% chance of a fatal space junk accident in the next 10 years - far from zero and worth considering, but unlikely to be the next EA cause area. This low risk makes sense given that only 3% of the globe is urban area and under 1% actually contains human homes[4] - most stuff falls down where there ain't people. Also the bulk of falling spacecraft burns up before hitting the ground. In contrast, a million people die from car crashes every year,[5] and each of us has about a 1 in 100 chance of dying that way. Although the ISS is the biggest ever at 450 tons, we do have priors. Two 100-ton uncontrolled re-entries (Skylab and tragically the Columbia) crashed to earth without issue. So what actually is the risk if the ISS was left to crash uncontrolled? The U.S.
Government requires controlled re-entry for anything that poses over a 1 in 10,000 risk to human life, so this risk must be higher. NASA doesn't give us their risk estimate but only states "The ISS requires a controlled re-entry because it is very large, and uncontrolled re-entry would result in very large pieces of debris with a large debris footprint, posing a significant risk to the public worldwide" [6]. I hesitate to even guesstimate the risk to human life of the ISS falli...
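To make the shape of the author's question concrete, here is a deliberately rough sketch of the comparison being gestured at. Only the $843 million contract figure and the 1-in-10,000 controlled-re-entry threshold come from the text above; the fatality probability and expected-deaths figures are invented placeholders, not the author's estimates.

```python
# Rough BOTEC sketch. All risk inputs are invented placeholders, not the
# author's numbers: what would the de-orbit contract imply per expected
# life saved, under some assumed uncontrolled-re-entry risk?

deorbit_cost_usd = 843e6      # SpaceX de-orbit vehicle contract (figure from the post)
p_any_fatality = 1e-2         # ASSUMED chance an uncontrolled ISS re-entry kills anyone
deaths_if_fatal = 2.0         # ASSUMED expected deaths conditional on a fatal impact

expected_lives_saved = p_any_fatality * deaths_if_fatal
cost_per_expected_life = deorbit_cost_usd / expected_lives_saved

print(f"Expected lives saved by controlled re-entry: {expected_lives_saved:.2f}")
print(f"Implied cost per expected life saved: ${cost_per_expected_life:,.0f}")
# With these placeholder inputs, roughly $42 billion per expected life saved -
# the kind of figure that motivates asking whether this is worth it.
```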
Sep 15, 2024 • 7min

LW - Pay-on-results personal growth: first success by Chipmonk

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Pay-on-results personal growth: first success, published by Chipmonk on September 15, 2024 on LessWrong. Thanks to Kaj Sotala, Stag Lynn, and Ulisse Mini for reviewing. Thanks to Kaj Sotala, Brian Toomey, Alex Zhu, Damon Sasi, Anna Salamon, and CFAR for mentorship and financial support A few months ago I made the claim "Radically effective and rapid growth [motivationally / emotionally / socially] is possible with the right combination of facilitator and method". Eg: for anxiety, agency, insecurity, need for validation. To test my hypothesis, I began a pay-on-results coaching experiment: When clients achieve their goal and are confident it will last (at least one month), they pay a bounty. My first client Bob (pseudonymous) and I met at Manifest 2024, where I had set up a table at the night market for hunting "emotional security" bounties. Bob had lifelong anxiety, and it was crushing his agency and relationships. He offered a $3,000 bounty for resolving it, and I decided to pursue it. We spoke and tried my method. It was only necessary for us to talk once, apparently, because a month later he said our one conversation helped him achieve what 8 years of talk therapy could not: I'm choosing to work on problems beyond my capabilities, and get excited about situations where my weaknesses are repeatedly on display. I actually feel excited about entering social situations where chances of things going worse than I would want them to were high. So he paid his bounty when he was ready (in this case, 35 days after the session). I've been checking in with him since (latest: last week, two months after the session) and he tells me all is well. Bob also shared some additional benefits beyond his original bounty: Planning to make dancing a weekly part of my life now. (All shared with permission.) I'm also hunting many other bounties A woman working in SF after 3 sessions, text support, and three weeks: I went to Chris with a torrent of responsibilities and a key decision looming ahead of me this month. I felt overwhelmed, upset, and I didn't want just talk Having engaged in 9+ years of coaching and therapy with varying levels of success, I'm probably one of the toughest clients - equal parts hopeful and skeptical. Chris created an incredibly open space where I could easily tell him if I didn't know something, or couldn't feel something, or if I'm overthinking. He also has an uncanny sense of intuition on these things and a strong attunement to being actually effective The results are already telling: a disappointment that might've made me emotionally bleed and mope for a month was something I addressed in the matter of a couple of days with only a scoop of emotional self-doubt instead of *swimming* in self-torture. The lag time of actually doing things to be there for myself was significantly quicker, warmer, and more effective To-dos that felt very heavy lightened up considerably and began to feel fun again and as ways of connecting! I've now started to predict happier things ahead with more vivid, emotionally engaged, and realistic detail. I'll continue being intensely focused this year for the outcomes I want, but I'm actually looking forward to it! Will reflect back on Month 2! 
An SF founder in his 30s after 1 session and two weeks: After working with Chris, I learned One Weird Trick to go after what I really want and feel okay no matter what happens. This is a new skill I didn't learn in 3 years of IFS therapy. I already feel more confident being myself and expressing romantic interest (and I already have twice, that's new). (More…) What the fuck? "Why does your thing work so unusually well?" asks my mentor Kaj. For one, it doesn't work for everyone with every issue, as you can see in the screenshot above. (That said, I suspect a lot of this is my fault for pursuing bounti...
Sep 14, 2024 • 3min

LW - OpenAI o1, Llama 4, and AlphaZero of LLMs by Vladimir Nesov

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI o1, Llama 4, and AlphaZero of LLMs, published by Vladimir Nesov on September 14, 2024 on LessWrong. GPT-4 level open weights models like Llama-3-405B don't seem capable of dangerous cognition. OpenAI o1 demonstrates that a GPT-4 level model can be post-trained into producing useful long horizon reasoning traces. AlphaZero shows how capabilities can be obtained from compute alone, with no additional data. If there is a way of bringing these together, the apparent helplessness of the current generation of open weights models might prove misleading. Post-training is currently a combination of techniques that use synthetic data and human labeled data. Human labeled data significantly improves quality, but its collection is slow and scales poorly. Synthetic data is an increasingly useful aspect of post-training, and automated aspects of its generation scale easily. Unlike weaker models, GPT-4 level LLMs clearly pass reading comprehension on most occasions; OpenAI o1 improves on this further. This suggests that at some point human data might become mostly unnecessary in post-training, even if it still slightly helps. Without it, post-training becomes automated and gets to use more compute, while avoiding the need for costly and complicated human labeling. A pretrained model at the next level of scale, such as Llama 4, if made available in open weights, might initially look approximately as tame as current models. OpenAI o1 demonstrates that useful post-training for long sequences of System 2 reasoning is possible. In the case of o1 in particular, this might involve a lot of human labeling, making its reproduction a very complicated process (at least if the relevant datasets are not released, and the reasoning traces themselves are not leaked in large quantities). But if some generally available chatbots at the next level of scale are good enough at automating labeling, this complication could be sidestepped, with o1 style post-training cheaply reproduced on top of a previously released open weights model. So there is an overhang in an open weights model that's distributed without long horizon reasoning post-training, since applying such post-training significantly improves its capabilities, making perception of its prior capabilities inadequate. The problem right now is that a new level of pretraining scale is approaching in the coming months, while the ability to cheaply apply long horizon reasoning post-training might follow shortly thereafter, possibly unlocked by these very same models at the new level of pretraining scale (since it might currently be too expensive for most actors to implement, or to do enough experiments to figure out how). The resulting level of capabilities is currently unknown, and could well remain unknown outside the leading labs until after the enabling artifacts of the open weights pretrained models at the next level of scale have already been published. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Sep 14, 2024 • 6min

EA - Civil Litigation for Farmed Animals - Notes From EAGxBerkeley Talk by Noa Weiss

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Civil Litigation for Farmed Animals - Notes From EAGxBerkeley Talk, published by Noa Weiss on September 14, 2024 on The Effective Altruism Forum. Overview These are my notes on the "Civil Litigation for Farmed Animals" talk from EAGxBerkeley, given by Alene Anello, president of Legal Impact for Chickens (LIC). It was an excellent talk, exploring a front of the animal welfare movement that, in my opinion, has the potential to be extremely effective, and is very much neglected. (Would love to hear if you agree/disagree on this). LIC is also currently hiring lawyers, so if you know someone who might be interested, let them know. This is a rare opportunity for folks with legal training to get professionally involved in the movement (those paid positions are hard to come by). ================== Talk Notes Intro Premise: improving conditions on factory farms will go a long way towards helping suffering chickens The law prohibits animal cruelty (in theory) (Gave an excerpt from the California Penal Code) Yet undercover investigations in farms expose such cruelty on a regular basis Footnote on criminal laws: there are some states that have exemptions for animal agriculture But not in California Even states that have exemptions - it's not for *every* kind of abuse. There's a lot of stuff that happens in the farms that isn't technically exempted But police and prosecutors don't really enforce it And even when they do - it's against individual workers and not the company/CEOs Why? Not sure. Perhaps because it's easier to go after someone with less power. Attorneys general are almost always politicians (elected / politically appointed), which means they have an interest in keeping powerful companies happy Some reasons for not enforcing at all: A reason they often officially give: those are misdemeanors, and they're more interested in pursuing felonies (also for funding reasons) Possibly: corruption Possibly: "soft corruption" like not wanting to make powerful people angry Resources and priorities LIC's Solution: "Creative" Civil Litigation Not how civil litigation usually works Animal cruelty is a crime, which would more "naturally" be handled by the criminal system - but since the criminal system doesn't do anything, LIC looks for ways to bring it to civil litigation LIC sues companies and executives Example Cases Example 1: Costco Costco is not only a store but also breeds, raises and slaughters chickens (and sells the meat) Bred them so fast that they could not even stand, eat, drink. Starved to death That's against the law - you're required to feed your animals There are some fiduciary duties - which are on the executives, personally, towards the company One of them: "don't break the law" If the executives haven't fulfilled the duties - the company can sue them Which wouldn't usually happen because the execs control the company But! The company also has owners. In the case of a publicly traded company - shareholders So LIC found Costco shareholders to work with (Q: do you have to find existing shareholders or can you just buy shares and then sue? A: Alene doesn't know, there isn't really a precedent). Result: The good news: the judge did say that the company has a responsibility re animal cruelty. Which means LIC can bring more cases like that!
The bad news: the judge had a different interpretation of the law re what happened at Costco, so dismissed the case Example 2: "Case Farms" - KFC supplier Treated chicks as "dispensable". Let machines drive over them, etc. Pretty harrowing. Happened in North Carolina. Has a law against animal cruelty, with an exemption for food/poultry. That was what CF's defense was based on: that thereby anything they do is exempt. LIC disagrees. If you kill the chicks they're not really used for food. This was dismissed and LIC appealed. Currently in the NC court ...
Sep 14, 2024 • 1h 6min

LW - If-Then Commitments for AI Risk Reduction [by Holden Karnofsky] by habryka

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: If-Then Commitments for AI Risk Reduction [by Holden Karnofsky], published by habryka on September 14, 2024 on LessWrong. Holden just published this paper on the Carnegie Endowment website. I thought it was a decent reference, so I figured I would crosspost it (included in full for convenience, but if either Carnegie Endowment or Holden has a preference for just having an excerpt or a pure link post, happy to change that). If-then commitments are an emerging framework for preparing for risks from AI without unnecessarily slowing the development of new technology. The more attention and interest there is in these commitments, the faster a mature framework can progress. Introduction Artificial intelligence (AI) could pose a variety of catastrophic risks to international security in several domains, including the proliferation and acceleration of cyberoffense capabilities, and of the ability to develop chemical or biological weapons of mass destruction. Even the most powerful AI models today are not yet capable enough to pose such risks,[1] but the coming years could see fast and hard-to-predict changes in AI capabilities. Both companies and governments have shown significant interest in finding ways to prepare for such risks without unnecessarily slowing the development of new technology. This piece is a primer on an emerging framework for handling this challenge: if-then commitments. These are commitments of the form: If an AI model has capability X, risk mitigations Y must be in place. And, if needed, we will delay AI deployment and/or development to ensure the mitigations can be present in time. A specific example: If an AI model has the ability to walk a novice through constructing a weapon of mass destruction, we must ensure that there are no easy ways for consumers to elicit behavior in this category from the AI model. If-then commitments can be voluntarily adopted by AI developers; they also, potentially, can be enforced by regulators. Adoption of if-then commitments could help reduce risks from AI in two key ways: (a) prototyping, battle-testing, and building consensus around a potential framework for regulation; and (b) helping AI developers and others build roadmaps of what risk mitigations need to be in place by when. Such adoption does not require agreement on whether major AI risks are imminent - a polarized topic - only that certain situations would require certain risk mitigations if they came to pass. Three industry leaders - Google DeepMind, OpenAI, and Anthropic - have published relatively detailed frameworks along these lines. Sixteen companies have announced their intention to establish frameworks in a similar spirit by the time of the upcoming 2025 AI Action Summit in France.[2] Similar ideas have been explored at the International Dialogues on AI Safety in March 2024[3] and the UK AI Safety Summit in November 2023.[4] As of mid-2024, most discussions of if-then commitments have been in the context of voluntary commitments by companies, but this piece focuses on the general framework as something that could be useful to a variety of actors with different enforcement mechanisms.
This piece explains the key ideas behind if-then commitments via a detailed walkthrough of a particular if-then commitment, pertaining to the potential ability of an AI model to walk a novice through constructing a chemical or biological weapon of mass destruction. It then discusses some limitations of if-then commitments and closes with an outline of how different actors - including governments and companies - can contribute to the path toward a robust, enforceable system of if-then commitments. Context and aims of this piece. In 2023, I helped with the initial development of ideas related to if-then commitments.[5] To date, I have focused on private discussion of this new fram...
Sep 14, 2024 • 10min

LW - Evidence against Learned Search in a Chess-Playing Neural Network by p.b.

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Evidence against Learned Search in a Chess-Playing Neural Network, published by p.b. on September 14, 2024 on LessWrong. Introduction There is a new paper and lesswrong post about "learned look-ahead in a chess-playing neural network". This has long been a research interest of mine for reasons that are well-stated in the paper: Can neural networks learn to use algorithms such as look-ahead or search internally? Or are they better thought of as vast collections of simple heuristics or memorized data? Answering this question might help us anticipate neural networks' future capabilities and give us a better understanding of how they work internally. and further: Since we know how to hand-design chess engines, we know what reasoning to look for in chess-playing networks. Compared to frontier language models, this makes chess a good compromise between realism and practicality for investigating whether networks learn reasoning algorithms or rely purely on heuristics. So the question is whether Francois Chollet is correct that transformers do "curve fitting", i.e. memorisation with little generalisation, or whether they learn to "reason". "Reasoning" is a fuzzy word, but in chess you can at least look for what human players call "calculation", that is, the ability to execute moves solely in your mind to observe and evaluate the resulting position. To me this is a crux as to whether large language models will scale to human capabilities without further algorithmic breakthroughs. The paper's authors, who include Erik Jenner and Stuart Russell, conclude that the policy network of Leela Chess Zero (a top engine and open source replication of AlphaZero) does learn look-ahead. Using interpretability techniques they "find that Leela internally represents future optimal moves and that these representations are crucial for its final output in certain board states." While the term "look-ahead" is fuzzy, the paper clearly intends to show that the Leela network implements an "algorithm" and a form of "reasoning". My interpretation of the presented evidence is different, as discussed in the comments of the original lesswrong post. I argue that all the evidence is completely consistent with Leela having learned to recognise multi-move patterns. Multi-move patterns are just complicated patterns that take into account that certain pieces will have to be able to move to certain squares in future moves for the pattern to hold. The crucial difference from having learned an algorithm: An algorithm can take different inputs and do its thing. That allows generalisation to unseen or at least unusual inputs. This means that less data is necessary for learning because the generalisation power is much higher. Learning multi-move patterns on the other hand requires much more data because the network needs to see many versions of the pattern until it knows all specific details that have to hold. Analysis setup Unfortunately it is quite difficult to distinguish between these two cases. As I argued: Certain information is necessary to make the correct prediction in certain kinds of positions. The fact that the network generally makes the correct prediction in these types of positions already tells you that this information must be processed and made available by the network.
The difference between look-ahead and multi-move pattern recognition is not whether this information is there but how it got there. However, I propose an experiment that makes it clear that there is a difference. Imagine you train the model to predict whether a position leads to a forced checkmate and also the best move to make. You pick one tactical motif and erase it from the checkmate prediction part of the training set, but not the move prediction part. Now the model still knows which moves are the right ones to make, i.e. it would pl...
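A small sketch of what the proposed ablation could look like in code. Everything here is hypothetical scaffolding (the Example class, the motif labels, the two-head data split); it is not from the paper or from Leela's training pipeline, just an illustration of holding one tactical motif out of the mate-prediction targets while keeping it for move prediction.

```python
# Hypothetical sketch of the proposed experiment: hold one tactical motif out
# of the forced-mate prediction data, but keep it in the move prediction data.
# None of this comes from the paper or the Leela training pipeline.

from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    fen: str          # position
    best_move: str    # target for the move-prediction head
    forced_mate: bool # target for the mate-prediction head
    motifs: frozenset # assumed tactical-motif labels, e.g. {"smothered_mate"}

def split_for_ablation(data, held_out_motif):
    """Move head trains on everything; mate head never sees the held-out motif.
    If the network runs a genuine look-ahead algorithm, mate prediction should
    still generalize to positions with that motif; if it has only memorized
    multi-move patterns, it should not."""
    move_head_data = list(data)
    mate_head_data = [ex for ex in data if held_out_motif not in ex.motifs]
    return move_head_data, mate_head_data

# Toy usage with two made-up positions.
toy_data = [
    Example("fen-1", "Qg8+", True, frozenset({"smothered_mate"})),
    Example("fen-2", "Rxh7+", True, frozenset({"rook_sacrifice"})),
]
move_data, mate_data = split_for_ablation(toy_data, "smothered_mate")
print(len(move_data), len(mate_data))  # 2 1
```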
