The Nonlinear Library

The Nonlinear Fund
May 28, 2024 • 55min

LW - OpenAI: Fallout by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI: Fallout, published by Zvi on May 28, 2024 on LessWrong.

Previously: OpenAI: Exodus (contains links at top to earlier episodes), Do Not Mess With Scarlett Johansson.

We have learned more since last week. It's worse than we knew. How much worse? In which ways? With what exceptions? That's what this post is about.

The Story So Far

For years, employees who left OpenAI consistently had their vested equity explicitly threatened with confiscation and the lack of ability to sell it, and were given short timelines to sign documents or else. Those documents contained highly aggressive NDA and non-disparagement (and non-interference) clauses, including the NDA preventing anyone from revealing these clauses. No one knew about this until recently, because until Daniel Kokotajlo everyone signed, and then they could not talk about it. Then Daniel refused to sign, Kelsey Piper started reporting, and a lot came out.

Here is Altman's statement from May 18, with its new community note. Evidence strongly suggests the above post was, shall we say, 'not consistently candid.' The linked article includes a document dump and other revelations, which I cover.

Then there are the other recent matters. Ilya Sutskever and Jan Leike, the top two safety researchers at OpenAI, resigned, part of an ongoing pattern of top safety researchers leaving OpenAI. The team they led, Superalignment, had been publicly promised 20% of secured compute going forward, but that commitment was not honored. Jan Leike expressed concerns that OpenAI was not on track to be ready for even the next generation of models' needs for safety.

OpenAI created the Sky voice for GPT-4o, which evoked consistent reactions that it sounded like Scarlett Johansson, who voiced the AI in the movie Her, Altman's favorite movie. Altman asked her twice to lend her voice to ChatGPT. Altman tweeted 'her.' Half the articles about GPT-4o mentioned Her as a model. OpenAI executives continue to claim that this was all a coincidence, but have taken down the Sky voice.

(Also six months ago the board tried to fire Sam Altman and failed, and all that.)

A Note on Documents from OpenAI

The source for the documents from OpenAI that are discussed here, and the communications between OpenAI and its employees and ex-employees, is Kelsey Piper in Vox, unless otherwise stated. She went above and beyond, and shares screenshots of the documents. For superior readability and searchability, I have converted those images to text.

Some Good News But There is a Catch

OpenAI has indeed made a large positive step. They say they are releasing former employees from their nondisparagement agreements and promising not to cancel vested equity under any circumstances.

Kelsey Piper: There are some positive signs that change is happening at OpenAI. The company told me, "We are identifying and reaching out to former employees who signed a standard exit agreement to make it clear that OpenAI has not and will not cancel their vested equity and releases them from nondisparagement obligations."

Bloomberg confirms that OpenAI has promised not to cancel vested equity under any circumstances, and to release all employees from one-directional non-disparagement agreements. And we have this confirmation from Andrew Carr.

Andrew Carr: I guess that settles that.

Tanner Lund: Is this legally binding?
Andrew Carr: I notice they are also including the non-solicitation provisions as not enforced. (Note that certain key people, like Dario Amodei, plausibly negotiated two-way agreements, which would mean theirs would still apply. I would encourage anyone in that category who is now free of the clause, even if they have no desire to disparage OpenAI, to simply say 'I am under no legal obligation not to disparage OpenAI.') These actions by OpenAI are helpful. They are necessary. They are no...
May 28, 2024 • 23min

LW - Understanding Gödel's completeness theorem by jessicata

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Understanding Gödel's completeness theorem, published by jessicata on May 28, 2024 on LessWrong.

In this post I prove a variant of Gödel's completeness theorem. My intention has been to really understand the theorem, so that I am not simply shuffling symbols around, but am actually understanding why it is true. I hope it is helpful for at least some other people. For sources, I have myself relied mainly on Srivastava's presentation. I have relied a lot on intuitions about sequent calculus; while I present a sequent calculus in this post, this is not a complete introduction to sequent calculus. I recommend Logitext as an online proof tool for gaining more intuition about sequent proofs. I am familiar with sequent calculus mainly through type theory.

First-order theories and models

A first-order theory consists of: A countable set of functions, which each have an arity, a non-negative integer. A countable set of predicates, which also have non-negative integer arities. A countable set of axioms, which are sentences in the theory.

Assume a countably infinite set of variables. A term consists of either a variable, or a function applied to a number of terms equal to its arity. An atomic sentence is a predicate applied to a number of terms equal to its arity. A sentence may be one of: an atomic sentence. a negated sentence, ¬P. a conjunction of sentences, P∧Q. a universal, ∀x, P, where x is a variable. Define disjunctions (P∨Q := ¬(¬P∧¬Q)), implications (P→Q := ¬(P∧¬Q)), and existentials (∃x, P := ¬∀x, ¬P) from these other terms in the usual manner.

A first-order theory has a countable set of axioms, each of which are sentences. So far this is fairly standard; see Peano arithmetic for an example of a first-order theory. I am omitting equality from first-order theories, as in general equality can be replaced with an equality predicate and axioms.

A term or sentence is said to be closed if it has no free variables (that is, variables which are not quantified over). A closed term or sentence can be interpreted without reference to variable assignments, similar to a variable-free expression in a programming language.

Let a constant be a function of arity zero. I will make the non-standard assumption that first-order theories have a countably infinite set of constants which do not appear in any axiom. This will help in defining inference rules and proving completeness. Generally it is not a problem to add a countably infinite set of constants to a first-order theory; it does not strengthen the theory (except in that it aids in proving universals, as defined below).

Before defining inference rules, I will define models. A model of a theory consists of a set (the domain of discourse), interpretations of the functions (as mapping finite lists of values in the domain to other values), and interpretations of predicates (as mapping finite lists of values in the domain to Booleans), which satisfies the axioms. Closed terms have straightforward interpretations in a model, as evaluating the expression (as if in a programming language). Closed sentences have straightforward truth values, e.g. the formula ¬P is true in a model when P is false in the model.

Judgments and sequent rules

A judgment is of the form Γ ⊢ Δ, where Γ and Δ are (possibly infinite) countable sets of closed sentences. The judgment is true in a model if at least one of Γ is false or at least one of Δ is true.
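Stated symbolically, that truth condition reads as follows (this is just a restatement in standard notation, not a formula taken from the original post):

```latex
\[
M \models (\Gamma \vdash \Delta)
\quad\Longleftrightarrow\quad
\bigl(\exists P \in \Gamma.\ M \not\models P\bigr)
\;\lor\;
\bigl(\exists Q \in \Delta.\ M \models Q\bigr)
\]
% Equivalently: if every sentence of Γ holds in M, then some sentence of Δ holds in M.
```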
As notation, if Γ is a set of sentences and P is a sentence, then Γ,P denotes Γ∪{P}. The inference rules are expressed as sequents. A sequent has one judgment on the bottom, and a finite set of judgments on top. Intuitively, it states that if all the judgments on top are provable, the rule yields a proof of the judgment on the bottom. Along the way, I will show that each rule is sound: if every judgment on the top is true in all models, then t...
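As a concrete illustration of the sequent format described above, here are the standard classical rules for conjunction and negation on the right (standard textbook rules, not necessarily the exact rule set used in the post):

```latex
% Conjunction on the right: to derive P ∧ Q on the right, derive each conjunct.
% Negation on the right: ¬P may be concluded on the right if assuming P on the left suffices.
\[
\frac{\Gamma \vdash \Delta, P \qquad \Gamma \vdash \Delta, Q}
     {\Gamma \vdash \Delta, P \wedge Q}
\qquad\qquad
\frac{\Gamma, P \vdash \Delta}
     {\Gamma \vdash \Delta, \lnot P}
\]
% Soundness check: if every judgment on top is true in all models, so is the judgment on the bottom.
```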
May 28, 2024 • 4min

LW - How to get nerds fascinated about mysterious chronic illness research? by riceissa

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How to get nerds fascinated about mysterious chronic illness research?, published by riceissa on May 28, 2024 on LessWrong.

Like many nerdy people, back when I was healthy, I was interested in subjects like math, programming, and philosophy. But 5 years ago I got sick with a viral illness and never recovered. For the last couple of years I've been spending most of my now-limited brainpower trying to figure out how I can get better. I occasionally wonder why more people aren't interested in figuring out illnesses such as my own.

Mysterious chronic illness research has a lot of the qualities of an interesting puzzle:

There is a phenomenon with many confusing properties (e.g. the specific symptoms people get, why certain treatments work for some people but not others, why some people achieve temporary or permanent spontaneous remission), exactly like classic scientific mysteries.

Social reward for solving it: Many people currently alive would be extremely grateful to have this problem solved. I believe the social reward would be much more direct and gratifying compared to most other hobby projects one could take on.

When I think about what mysterious chronic illness research is missing, in order to make it of intellectual interest, here's what I can think of:

Lack of a good feedback loop: With subjects like math and programming, or puzzle games, you can often get immediate feedback on whether your idea works, and this makes tinkering fun. Common hobbies like cooking and playing musical instruments also fit this pattern. In fact, I believe the lack of such feedback loops (mostly by being unable to access or afford equipment) personally kept me from becoming interested in biology, medicine, and similar subjects until I was much older (compared to subjects like math and programming). I'm wondering how much my experience generalizes.

Requires knowledge of many fields: Solving these illnesses probably requires knowledge of biochemistry, immunology, neuroscience, medicine, etc. This makes it less accessible compared to other hobbies. I don't think this is a huge barrier though.

Are there other reasons? I'm interested in both speculation about why other people aren't interested, as well as personal reports of why you personally aren't interested enough to be working on solving mysterious chronic illnesses.

If the lack of feedback loop is the main reason, I am wondering if there are ways to create such a feedback loop. For example, maybe chronically ill people can team up with healthy people to decide on what sort of information to log and which treatments to try. Chronically ill people have access to lab results and sensory data that healthy people don't, and healthy people have the brainpower that chronically ill people don't, so by teaming up, both sides can make more progress.

It also occurs to me that maybe there is an outreach problem, in that people think medical professionals have this problem covered, and so there isn't much to do.
If so, that's very sad because (1) most doctors don't have the sort of curiosity, mental inclinations, and training that would make them good at solving scientific mysteries (in fact, even most scientists don't receive this kind of training; this is why I've used the term "nerds" in the title of the question, to hint at wanting people with this property), and (2) for whatever crazy reason, doctors basically don't care about mysterious chronic illnesses and will often deny their existence and insist it's "just anxiety" or "in the patient's head" (I've personally been told this on a few occasions during doctor appointments), partly because their training and operating protocols are geared toward treating acute conditions and particular chronic conditions (such as cancer); (3) for whatever other crazy reason, the main group of doctors who...
May 27, 2024 • 30min

EA - Effective Altruism Infrastructure Fund: March 2024 recommendations by Linch

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Effective Altruism Infrastructure Fund: March 2024 recommendations, published by Linch on May 27, 2024 on The Effective Altruism Forum.

This payout report covers the EA Infrastructure Fund's grantmaking from June 16th 2023 to March 31st 2024 (9.5 months). It follows our previous June 2023 payout report.

Total funding recommended: $1,697,882
Total funding paid out[1]: $1,386,854
Number of grants paid out: 41
Acceptance rate (excluding desk rejections): 49/173 = 28.3%
Acceptance rate (including desk rejections): 18.4% (49/266)
Report authors: Linchuan Zhang (primary author), Caleb Parikh (interim fund chair), Harri Bescelli, Tom Barnes

Funding breakdown[2]
EA Groups: $456,780 granted across 18 grants
EA-related Groups: $312,304 granted across 6 grants
EA Content: $78,216 granted across 3 grants
EA Services and Infrastructure: $179,957 granted across 6 grants
Effective Giving: $53,000 granted across 2 grants
Research: $216,535 granted across 6 grants
[Total]: $1,386,854

8 of our grantees, who received a total of $521,206, requested that our public reports for their grants are anonymized (they're written below). 1 grantee, who received $2,500, requested that we do not have a public report at all (You can read our policy on public reporting here). Our median response time over this period was 27 days, and our average response time was 42 days. For paid out grants, our median and average turnaround times are 57 and 61 days, respectively.

Highlighted Grants

Below we've highlighted some grants from this round that we thought were particularly interesting and that represent a relatively wide range of EAIF's activities. We hope that these reports will help donors make more informed decisions about whether to donate to EAIF, as well as help the wider community understand our work.

Rethink Priorities Worldview Investigations Team ($168,000): Stipend to improve their Cross-Cause Cost-Effectiveness Model, including a portfolio builder to help individuals and foundations prioritize their philanthropic spending. [Grant type: Research] Note: This grant, while approved, has not yet been paid out, pending due diligence.

This project fits in well with the EA Infrastructure Fund's tentative reorientation towards Principles-Focused Effective Altruism. The fund managers were highly impressed by the ambitious scope of this endeavor. Despite the EA movement existing for over a decade, there were no other publicly available cross-cause models with comparable breadth and an EA-informed perspective. The gap suggests that creating such a comprehensive model is much more challenging than it might initially seem. The fund managers admired the team's intention to produce a practical tool that funders could realistically use, rather than (e.g.) purely theoretical work on cause prioritization. However, some fund managers were concerned about the default values used in the Cross-Cause Model, which sometimes appeared insufficiently principled or overly conservative. Caleb Parikh, the primary investigator for this grant, provided more detailed thoughts in a comment.

Overall, while excited to grant this project presently, the fund managers believe continued excitement for renewing the grant or offering similar grants hinges on a few key conditions:
1. The methodology employed in Rethink Priorities' Cost-Effectiveness Model should be broadly reasonable, and easy for EAIF's fund managers to endorse.
2. The project should demonstrate potential to genuinely influence decision-making among major funders.
3. We should broadly believe the team's proposed improvements to the model are likely to be useful.

Despite the high expected value, the fund managers acknowledge that real-world grantmaking decisions often involve holistic, contextual factors that may limit the direct impact of even thoughtfully-designed the...
May 27, 2024 • 7min

EA - Talent identification as an underappreciated career option by CE

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Talent identification as an underappreciated career option, published by CE on May 27, 2024 on The Effective Altruism Forum.

Ambitious Impact is hiring! Ambitious Impact (AIM), formerly Charity Entrepreneurship (CE), runs training and incubation programs for high-impact career paths. Recently, we have been scaling our work. AIM has developed new programs to support talented individuals in impactful careers beyond founding a charity. These include grantmaking, nonprofit research, and for-profit entrepreneurship. We are currently seeking a Talent and Recruitment Manager (or Director) to help us find and select the most talented people for these programs, helping outstanding individuals put their skills to better use for the world.

Talent identification, or vetting as we frequently refer to it within Ambitious Impact, has received little attention as a potentially highly impactful career. For the purposes of this article, we're defining talent identification/vetting as work that takes a recruitment process from the point of application closure to the point of a job offer. This is separate from the marketing or communications work essential in a recruitment process to advertise the role and encourage high-quality candidates to apply. We believe a talented individual taking our Talent and Recruitment Manager position would be taking on a high-impact, high-leverage role. Similar opportunities for impact likely exist at other effective organizations scaling significantly. This article shares our thinking behind this.

Why great vetting matters

Our internal assessment from the past five years of running our Charity Entrepreneurship Incubation Program is that the quality of the co-founding team is the best predictor of high-impact charities. As we scale up, one of our biggest challenges is identifying highly talented, value-aligned individuals from thousands of candidates worldwide. The difference in outcomes between a bad and a good hire can be huge, and we think this variation is even more extreme with co-founders. The initial staff of an organization sets the pace, tone, and culture for the long term. As we scale, we must identify outstanding candidates for our new programs, including researchers and for-profit entrepreneurs where we have less track record and precedent to base decisions on.

We believe high-quality vetting may be high leverage for organizations like AIM scaling significantly and looking to maintain a strong culture and values in who we hire and why. At small organizations, the leadership team can exert an outsized influence on the organization's culture and processes, stepping in directly to make changes where appropriate. Larger organizations cannot operate like this since there is too much work for leadership to be directly involved in. In these circumstances, effective talent identification is crucial. High-quality vetting involves finding potential new staff who are particularly talented and well-suited to the organization's culture, values, and aims. In this way, the person vetting must clearly and deeply understand what drives the organization's culture while identifying the answers, experiences, and traits a candidate may offer that best correlate with it.
A vetting manager must develop explicit, testable role and career fit models, pairing these with a deep understanding of what it takes to excel in various career paths. With this knowledge, they can design processes to select the right talent, enabling a team's rapid growth while protecting its core values and approaches.

What might make you a great fit for vetting

Vetting is more than a technical role. The ideal vetting officer is person-focused, with a strong practical understanding of human psychology and a keen eye for quickly assessing people. While these skills can be enhanced...
May 27, 2024 • 17min

LW - Intransitive Trust by Screwtape

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Intransitive Trust, published by Screwtape on May 27, 2024 on LessWrong.

I.

"Transitivity" is a property in mathematics and logic. Put simply, if something is transitive it means that there's a relationship between things where when x relates to y, and y relates to z, there's the same relationship between x and z. For a more concrete example, think of size. If my car is bigger than my couch, and my couch is bigger than my hat, you know that my car is bigger than my hat. (I am not a math major, and if there's a consensus in the comments that I'm using the wrong term here I can update the post.) This is a neat property. Lots of things do not have it.

II.

Consider the following circumstance: Bob is traveling home one night, late enough there isn't anyone else around. Bob sees a shooting star growing unusually bright, until it resolves into a disc-shaped machine with lights around the edges. He finds himself levitated up into the machine, gets poked and prodded by the creatures inside for a while, and then set back down on the road.

Assuming Bob is a rational, rationalist, well-adjusted kind of guy, he now has a problem. Almost nobody in his life is going to believe a word of this. From Bob's perspective, what happened? He might not be certain aliens are real (maybe he's just had a schizophrenic break, or someone slipped him some interesting drugs in his coffee) but he has to be putting a substantially higher percentage on the idea. Sure, maybe he hallucinated the whole thing, but most of us don't have psychotic breaks on an average day.

Break out Bayes. What are Bob's new odds that aliens abduct people, given his experiences? Let's say his prior probability on alien abductions being real was 1%, about one in a hundred. (That's P(A).) He decides the sensitivity of the test - that he would experience aliens abducting him, given that aliens actually abduct people - is 5%, since he knows he doesn't have any history of drug use, mental illness, or prankish friends with a lot of spare time and weird senses of humour. (That's P(B|A).) If you had asked him before his abduction what the false positive rate was - that is, how often people think they've been abducted by aliens even though they haven't - he'd say .1%, maybe one in a thousand people have seemingly causeless hallucinations or dedicated pranksters. (That's P(B|¬A).)

P(A|B) = P(B|A) P(A) / P(B)
P(aliens | experiences) = P(experiences | aliens) P(aliens) / P(experiences)
P(experiences) = P(experiences | aliens) P(aliens) + P(experiences | ¬aliens) P(¬aliens)
P(experiences) = (0.05 × 0.01) + (0.001 × 0.99)
P(experiences) = 0.00149
P(A|B) = (0.05 × 0.01) / 0.00149
P(A|B) = 0.3356, or about 33%.

The whole abduction thing is a major update for Bob towards aliens. If it's not aliens, it's something really weird at least.

Now consider Bob telling Carla, an equally rational, well-adjusted kind of gal with the same prior, about his experience. Bob and Carla are friends; not super close, but they've been running into each other at parties for a few years now. Carla has to deal with the same odds of mental breakdown or secret drug dosages that Bob does. Let's take lying completely off the table: for some reason, both Carla and Bob can perfectly trust that the other person isn't deliberately lying (maybe there's a magic Zone of Truth effect), so I think this satisfies Aumann's Agreement Theorem. Everything else is a real possibility though.
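The same arithmetic as a quick sketch in code (simply re-running the post's illustrative numbers):

```python
# Bob's update, using the numbers above: prior 1%, sensitivity 5%, false-positive rate 0.1%.
p_aliens = 0.01                # P(A): prior that alien abductions are real
p_exp_given_aliens = 0.05      # P(B|A): chance of Bob's experience if abductions are real
p_exp_given_not = 0.001        # P(B|not A): chance of such an experience anyway

p_exp = p_exp_given_aliens * p_aliens + p_exp_given_not * (1 - p_aliens)
posterior = p_exp_given_aliens * p_aliens / p_exp

print(f"P(experiences)          = {p_exp:.5f}")      # 0.00149
print(f"P(aliens | experiences) = {posterior:.4f}")  # ~0.3356
```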
She also has to consider the odds that Bob has a faulty memory or is hallucinating or she's misunderstanding him somehow. (True story: my undergraduate university had an active Live Action Roleplaying group. For a while, my significant other liked to tell people that our second date was going to watch the zombies chase people around the campus. This was true, in that lots of people looked like they had open wounds, were moaning "Braaaaains," and were chasing after ot...
May 27, 2024 • 4min

LW - Book review: Everything Is Predictable by PeterMcCluskey

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Book review: Everything Is Predictable, published by PeterMcCluskey on May 27, 2024 on LessWrong.

Book review: Everything Is Predictable: How Bayesian Statistics Explain Our World, by Tom Chivers.

Many have attempted to persuade the world to embrace a Bayesian worldview, but none have succeeded in reaching a broad audience. E.T. Jaynes' book has been a leading example, but its appeal is limited to those who find calculus enjoyable, making it unsuitable for a wider readership. Other attempts to engage a broader audience often focus on a narrower understanding, such as Bayes' Theorem, rather than the complete worldview. Claude's most fitting recommendation was Rationality: From AI to Zombies, but at 1,813 pages, it's too long and unstructured for me to comfortably recommend to most readers. (GPT-4o's suggestions were less helpful, focusing only on resources for practical problem-solving). Aubrey Clayton's book, Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science, only came to my attention because Chivers mentioned it, offering mixed reviews that hint at why it remained unnoticed.

Chivers has done his best to mitigate this gap. While his book won't reach as many readers as I'd hoped, I'm comfortable recommending it as the standard introduction to the Bayesian worldview for most readers.

Basics

Chivers guides readers through the fundamentals of Bayes' Theorem, offering little that's extraordinary in this regard. A fair portion of the book is dedicated to explaining why probability should be understood as a function of our ignorance, contrasting with the frequentist approach that attempts to treat probability as if it existed independently of our minds. The book has many explanations of how frequentists are wrong, yet concedes that the leading frequentists are not stupid. Frequentism's problems often stem from a misguided effort to achieve more objectivity in science than seems possible.

The only exception to this mostly fair depiction of frequentists is a section titled "Are Frequentists Racist?". Chivers repeats Clayton's diatribe affirming this, treating the diatribe more seriously than it deserves, before dismissing it. (Frequentists were racist when racism was popular. I haven't seen any clear evidence of whether Bayesians behaved differently).

The Replication Crisis

Chivers explains frequentism's role in the replication crisis. A fundamental drawback of p-values is that they indicate the likelihood of the data given a hypothesis, which differs from the more important question of how likely the hypothesis is given the data. Here, Chivers (and many frequentists) overlook a point raised by Deborah Mayo: p-values can help determine if an experiment had a sufficiently large sample size. Deciding whether to conduct a larger experiment can be as crucial as drawing the best inference from existing data.

The perversity of common p-value usage is exemplified by Lindley's paradox: a p-value below 0.05 can sometimes provide Bayesian evidence against the tested hypothesis. A p-value of 0.04 indicates that the data are unlikely given the null hypothesis, but we can construct scenarios where the data are even less likely under the hypothesis you wish to support. A key factor in the replication crisis is the reward system for scientists and journals, which favors publishing surprising results.
The emphasis on p-values allows journals to accept more surprising results compared to a Bayesian approach, creating a clear disincentive for individual scientists or journals to adopt Bayesian methods before others do.

Minds Approximate Bayes

The book concludes by describing how human minds employ heuristics that closely approximate the Bayesian approach. This includes a well-written summary of how predictive processing works, demonstrating ...
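To make the Lindley's-paradox point above concrete, here is a small numerical sketch (a point null against a diffuse normal alternative; the setup and numbers are illustrative choices, not taken from the book):

```python
import numpy as np
from scipy.stats import norm

# H0: theta = 0 versus H1: theta ~ N(0, tau^2); the sample mean is N(theta, sigma^2/n).
sigma, tau = 1.0, 1.0
z = 2.054  # a z-score whose two-sided p-value is roughly 0.04

for n in (100, 10_000, 1_000_000):
    se = sigma / np.sqrt(n)      # standard error of the sample mean
    x_bar = z * se               # observed mean sitting exactly at p ~ 0.04
    p_value = 2 * norm.sf(z)
    m0 = norm.pdf(x_bar, loc=0, scale=se)                        # likelihood under H0
    m1 = norm.pdf(x_bar, loc=0, scale=np.sqrt(tau**2 + se**2))   # marginal likelihood under H1
    print(f"n={n:>9}  p={p_value:.3f}  Bayes factor for H0: {m0 / m1:.1f}")

# As n grows, the same p ~ 0.04 corresponds to increasingly strong evidence *for* the null.
```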
May 27, 2024 • 41min

LW - I am the Golden Gate Bridge by Zvi

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I am the Golden Gate Bridge, published by Zvi on May 27, 2024 on LessWrong.

Easily Interpretable Summary of New Interpretability Paper

Anthropic has identified (full paper here) how millions of concepts are represented inside Claude Sonnet, their current middleweight model. The features activate across modalities and languages as tokens approach the associated context. This scales up previous findings from smaller models. By looking at neuron clusters, they defined a distance measure between clusters. So the Golden Gate Bridge is close to various San Francisco and California things, and inner conflict relates to various related conceptual things, and so on.

Then it gets more interesting.

Importantly, we can also manipulate these features, artificially amplifying or suppressing them to see how Claude's responses change.

If you sufficiently amplify the feature for the Golden Gate Bridge, Claude starts to think it is the Golden Gate Bridge. As in, it thinks it is the physical bridge, and also it gets obsessed, bringing it up in almost every query. If you amplify a feature that fires when reading a scam email, you can get Claude to write scam emails. Turn up sycophancy, and it will go well over the top talking about how great you are.

They note they have discovered features corresponding to various potential misuses, forms of bias and things like power-seeking, manipulation and secrecy. That means that, if you had the necessary access and knowledge, you could amplify such features. Like most powers, one could potentially use this for good or evil. They speculate you could watch the impact on features during fine tuning, or turn down or even entirely remove undesired features. Or amplify desired ones. Checking for certain patterns is proposed as a 'test for safety,' which seems useful but also is playing with fire.

They have a short part at the end comparing their work to other methods. They note that dictionary learning need happen only once per model, and the additional work after that is typically inexpensive and fast, and that it allows looking for anything at all and finding the unexpected. It is a big deal that this allows you to be surprised. They think this has big advantages over old strategies such as linear probes, even if those strategies still have their uses.

One Weird Trick

You know what AI labs are really good at? Scaling. It is their one weird trick. So guess what Anthropic did here? They scaled the autoencoders to Claude Sonnet.

Our general approach to understanding Claude 3 Sonnet is based on the linear representation hypothesis (see e.g.) and the superposition hypothesis. For an introduction to these ideas, we refer readers to the Background and Motivation section of Toy Models. At a high level, the linear representation hypothesis suggests that neural networks represent meaningful concepts - referred to as features - as directions in their activation spaces. The superposition hypothesis accepts the idea of linear representations and further hypothesizes that neural networks use the existence of almost-orthogonal directions in high-dimensional spaces to represent more features than there are dimensions. If one believes these hypotheses, the natural approach is to use a standard method called dictionary learning. … Our SAE consists of two layers.
The first layer ("encoder") maps the activity to a higher-dimensional layer via a learned linear transformation followed by a ReLU nonlinearity. We refer to the units of this high-dimensional layer as "features." The second layer ("decoder") attempts to reconstruct the model activations via a linear transformation of the feature activations. The model is trained to minimize a combination of (1) reconstruction error and (2) an L1 regularization penalty on the feature activations, which incentivizes sparsity. Once the S...
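A minimal sketch of that encoder/decoder setup and its training objective (the dimensions, initialization, and L1 coefficient here are illustrative placeholders, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: the real dictionary is vastly larger than the activation dimension.
d_model, d_features, l1_coef = 64, 512, 1e-3

# Parameters of the sparse autoencoder (SAE).
W_enc = rng.normal(scale=0.02, size=(d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(scale=0.02, size=(d_features, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encoder: linear map + ReLU gives sparse 'feature' activations.
    Decoder: linear map attempts to reconstruct the original activations."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU nonlinearity
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x):
    """Reconstruction error plus an L1 sparsity penalty on the feature activations."""
    f, x_hat = sae_forward(x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = l1_coef * np.mean(np.sum(np.abs(f), axis=-1))
    return recon + sparsity

x = rng.normal(size=(8, d_model))  # a batch of stand-in model activations
print(sae_loss(x))
```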
May 27, 2024 • 5min

LW - Maybe Anthropic's Long-Term Benefit Trust is powerless by Zach Stein-Perlman

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Maybe Anthropic's Long-Term Benefit Trust is powerless, published by Zach Stein-Perlman on May 27, 2024 on LessWrong.

Crossposted from AI Lab Watch. Subscribe on Substack.

Introduction

Anthropic has an unconventional governance mechanism: an independent "Long-Term Benefit Trust" elects some of its board. Anthropic sometimes emphasizes that the Trust is an experiment, but mostly points to it to argue that Anthropic will be able to promote safety and benefit-sharing over profit.[1] But the Trust's details have not been published and some information Anthropic has shared is concerning. In particular, Anthropic's stockholders can apparently overrule, modify, or abrogate the Trust, and the details are unclear. Anthropic has not publicly demonstrated that the Trust would be able to actually do anything that stockholders don't like.

The facts

There are three sources of public information on the Trust:
The Long-Term Benefit Trust (Anthropic 2023)
Anthropic Long-Term Benefit Trust (Morley et al. 2023)
The $1 billion gamble to ensure AI doesn't destroy humanity (Vox: Matthews 2023)

They say there's a new class of stock, held by the Trust/Trustees. This stock allows the Trust to elect some board members and will allow them to elect a majority of the board by 2027. But:

1. Morley et al.: "the Trust Agreement also authorizes the Trust to be enforced by the company and by groups of the company's stockholders who have held a sufficient percentage of the company's equity for a sufficient period of time," rather than the Trustees.
   1. I don't know what this means.
2. Morley et al.: the Trust and its powers can be amended "by a supermajority of stockholders. . . . [This] operates as a kind of failsafe against the actions of the Voting Trustees and safeguards the interests of stockholders." Anthropic: "the Trust and its powers [can be changed] without the consent of the Trustees if sufficiently large supermajorities of the stockholders agree."
   1. It's impossible to assess this "failsafe" without knowing the thresholds for these "supermajorities." Also, a small number of investors - currently, perhaps Amazon and Google - may control a large fraction of shares. It may be easy for profit-motivated investors to reach a supermajority.
3. Maybe there are other issues with the Trust Agreement - we can't see it and so can't know.
4. Vox: the Trust "will elect a fifth member of the board this fall," viz. Fall 2023.
   1. Anthropic has not said whether that happened nor who is on the board these days (nor who is on the Trust these days).

Conclusion

Public information is consistent with the Trust being quite subordinate to stockholders, likely to lose its powers if it does anything stockholders dislike. (Even if stockholders' formal powers over the Trust are never used, that threat could prevent the Trust from acting contrary to the stockholders' interests.) Anthropic knows this and has decided not to share the information that the public needs to evaluate the Trust. This suggests that Anthropic benefits from ambiguity because the details would be seen as bad.
I basically fail to imagine a scenario where publishing the Trust Agreement is very costly to Anthropic - especially just sharing certain details (like sharing percentages rather than saying "a supermajority") - except that the details are weak and would make Anthropic look bad.[2] Maybe it would suffice to let an auditor see the Trust Agreement and publish their impression of it. But I don't see why Anthropic won't publish it. Maybe the Trust gives Anthropic strong independent accountability - or rather, maybe it will by default after (unspecified) time- and funding-based milestones. But only if Anthropic's board and stockholders have substantially less power over it than they might - or if they will exercise great restraint in using their p...
May 27, 2024 • 23min

EA - Being a young person in EA: my journey into the EA community by ScientificS

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Being a young person in EA: my journey into the EA community, published by ScientificS on May 27, 2024 on The Effective Altruism Forum.

Summary

I came across a program for curious young people called Leaf through a tailored online course about how to save and help lives, then attended a residential where I further explored these ideas. Coming from a school system where I was told there is a right or wrong way, I explored EA through the lens of learning techniques which I may or may not use to try to get closer to my goals of helping others and saving lives. Through the adjacent opportunities such as books, talks, the 80k podcast, the network of people and the further online events, I have continued exploring the cause areas and have now changed my career progression plans.

IMPORTANT: this article was written based on my personal views, and with the motivation of documenting my personal journey but without any aim in terms of future involvement. Of course, please comment or contact me if you want to know more, or for clarification, or to ask about anything, but I am a young student, with disabilities, so may not respond very quickly, and may not be able to do something really intensive such as a report or breakdown of a topic... This article was not proofread, and nothing was run past any entities mentioned. Everything is my own opinion and I only have positive experiences overall, so honestly, I don't think there is anything negative to say so far, but it is truthful, and I am sorry if you disagree; please let me know.

I talk a lot about Leaf. This is a non-profit that was started to help young people between 16-19 do more good by helping them see how and with what techniques they can do good in the world even at a young age. The fact that they help with career guidance, university and degree choices, and teaching about techniques was very positive and useful for me, but no one from Leaf approved or saw this article in advance (or even knew I was writing it), and I would be happy to be contacted for views, opinions or anything else, but I am not paid by or affiliated with them, just genuinely grateful for their help.

How it started: a chance encounter with Leaf

I had never heard of EA until around February 2023. I had been searching for summer university opportunities and came across Leaf, and at first believed it was simply an Oxford summer school to do with learning, and applied on the original form for the online cohort. Doing the extended application, I came across questions such as 'Watch this video on what the future might be like, what do you agree with, now what would you disagree with, now how would you critique your disagreements' and I was shocked; we were asked to critique our own critiques and extend our arguments (with a very tight word limit that definitely caused a lot of editing and annoyance at first). As I went further, questions like 'What is the weirdest opinion you hold' or 'What is the worst injustice in the world, how would you solve it with 1 million pounds' definitely intrigued me... what normal summer program asks us about this? I explored further and fell more in love. Here I was, a 16 year old, used to being taught the 'Right Opinion TM' without ever being asked what I thought or why I might change my mind.
I had never been asked what I would do to solve injustices, or even to consider the fact that I myself can do so. I was invited to a Discord server (a very steep learning curve for someone who had never had social media before) where at first it was me and 2-3 people, and I cautiously tested the water with an introduction mentioning that I enjoy discussing the education system. What followed was me and the 2 peers amassing over 180 messages within a week, on topics such as quantum computing, navigation, meritocracy, genetic basis of intelligence,...
