The Nonlinear Library: LessWrong

The Nonlinear Fund
May 28, 2024 • 4min

LW - When Are Circular Definitions A Problem? by johnswentworth

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: When Are Circular Definitions A Problem?, published by johnswentworth on May 28, 2024 on LessWrong.

Disclaimer: if you are using a definition in a nonmathematical piece of writing, you are probably making a mistake; you should just get rid of the definition and instead use a few examples. This applies double to people who think they are being "rigorous" by defining things but are not actually doing any math. Nonetheless, definitions are still useful and necessary when one is ready to do math, and some pre-formal conceptual work is often needed to figure out which mathematical definitions to use; thus the usefulness of this post.

Suppose I'm negotiating with a landlord about a pet, and in the process I ask the landlord what counts as a "big dog". The landlord replies "Well, any dog that's not small". I ask what counts as a "small dog". The landlord replies "Any dog that's not big". Obviously this is "not a proper definition", in some sense. If that actually happened in real life, presumably the landlord would say it somewhat tongue-in-cheek. But what exactly is wrong with defining big dogs as not small, and small dogs as not big?

One might be tempted to say "It's a circular definition!", with the understanding that circular definitions are always problematic in some way. But then consider another example, this time mathematical:

Define x as a real number equal to y-1: x = y-1
Define y as a real number equal to x/2: y = x/2

These definitions are circular! I've defined x in terms of y, and y in terms of x. And yet, it's totally fine; a little algebra shows that we've defined x = -2 and y = -1. We do this thing all the time when using math, and it works great in practice. So clearly circular definitions are not inherently problematic. When are they problematic?

We could easily modify the math example to make a problematic definition:

Define x as a real number equal to y-1: x = y-1
Define y as a real number equal to x+1: y = x+1

What's wrong with this definition? Well, the two equations - the two definitions - are redundant; they both tell us the same thing. So together, they're insufficient to fully specify x and y. Given the two (really one) definitions, x and y remain extremely underdetermined; either one could be any real number!

And that's the same problem we see in the big dog/small dog example: if I define a big dog as not small, and a small dog as not big, then my two definitions are redundant. Together, they're insufficient to tell me which dogs are or are not big. Given the two (really one) definitions, big dog and small dog remain extremely underdetermined; any dog could be big or small!

Application: Clustering

This post was originally motivated by a comment thread about circular definitions in clustering:

Define the points in cluster i as those which statistically look like they're generated from the parameters of cluster i
Define the parameters of cluster i as an average of the points in cluster i

These definitions are circular: we define cluster-membership of points based on cluster parameters, and cluster parameters based on cluster-membership of points. And yet, widely-used EM clustering algorithms are essentially iterative solvers for equations which express basically the two definitions above. They work great in practice.
While they don't necessarily fully specify one unique solution, for almost all data sets they at least give locally unique solutions, which is often all we need (underdetermination between a small finite set of possibilities is often fine, it's when definitions allow for a whole continuum that we're really in trouble). Circularity in clustering is particularly important, insofar as we buy that words point to clusters in thingspace. If words typically point to clusters in thingspace, and clusters are naturally defined circular...
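The circular clustering definitions above can be read directly as an alternating update, in the spirit of EM. Here is a minimal sketch in Python (a hard-assignment, nearest-mean version; my own illustration, not code from the post):

```python
import numpy as np

def circular_clustering(points, k, iters=50, seed=0):
    """Alternate between the two 'circular' definitions:
    (1) a point belongs to the cluster whose parameters best explain it,
    (2) a cluster's parameters are the average of its points."""
    rng = np.random.default_rng(seed)
    means = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Definition 1: assign each point to the nearest cluster mean.
        dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Definition 2: recompute each mean as the average of its points.
        for i in range(k):
            if np.any(labels == i):
                means[i] = points[labels == i].mean(axis=0)
    return labels, means

# Two well-separated blobs: the circular definitions pin down a (locally) unique answer.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, means = circular_clustering(data, k=2)
print(means)  # roughly [0, 0] and [5, 5]
```

With well-separated data the iteration settles into a locally unique fixed point, matching the point above that this kind of circularity usually underdetermines the answer only up to a small finite set of solutions.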
May 28, 2024 • 55min

LW - OpenAI: Fallout by Zvi

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI: Fallout, published by Zvi on May 28, 2024 on LessWrong.

Previously: OpenAI: Exodus (contains links at top to earlier episodes), Do Not Mess With Scarlett Johansson

We have learned more since last week. It's worse than we knew. How much worse? In which ways? With what exceptions? That's what this post is about.

The Story So Far

For years, employees who left OpenAI consistently had their vested equity explicitly threatened with confiscation and the lack of ability to sell it, and were given short timelines to sign documents or else. Those documents contained highly aggressive NDA and non-disparagement (and non-interference) clauses, including the NDA preventing anyone from revealing these clauses. No one knew about this until recently, because until Daniel Kokotajlo everyone signed, and then they could not talk about it. Then Daniel refused to sign, Kelsey Piper started reporting, and a lot came out.

Here is Altman's statement from May 18, with its new community note. Evidence strongly suggests the above post was, shall we say, 'not consistently candid.' The linked article includes a document dump and other revelations, which I cover.

Then there are the other recent matters. Ilya Sutskever and Jan Leike, the top two safety researchers at OpenAI, resigned, part of an ongoing pattern of top safety researchers leaving OpenAI. The team they led, Superalignment, had been publicly promised 20% of secured compute going forward, but that commitment was not honored. Jan Leike expressed concerns that OpenAI was not on track to be ready for even the next generation of models' needs for safety.

OpenAI created the Sky voice for GPT-4o, which evoked consistent reactions that it sounded like Scarlett Johansson, who voiced the AI in the movie Her, Altman's favorite movie. Altman asked her twice to lend her voice to ChatGPT. Altman tweeted 'her.' Half the articles about GPT-4o mentioned Her as a model. OpenAI executives continue to claim that this was all a coincidence, but have taken down the Sky voice.

(Also six months ago the board tried to fire Sam Altman and failed, and all that.)

A Note on Documents from OpenAI

The source for the documents from OpenAI that are discussed here, and the communications between OpenAI and its employees and ex-employees, is Kelsey Piper in Vox, unless otherwise stated. She went above and beyond, and shares screenshots of the documents. For superior readability and searchability, I have converted those images to text.

Some Good News But There is a Catch

OpenAI has indeed made a large positive step. They say they are releasing former employees from their nondisparagement agreements and promising not to cancel vested equity under any circumstances.

Kelsey Piper: There are some positive signs that change is happening at OpenAI. The company told me, "We are identifying and reaching out to former employees who signed a standard exit agreement to make it clear that OpenAI has not and will not cancel their vested equity and releases them from nondisparagement obligations."

Bloomberg confirms that OpenAI has promised not to cancel vested equity under any circumstances, and to release all employees from one-directional non-disparagement agreements. And we have this confirmation from Andrew Carr.

Andrew Carr: I guess that settles that.

Tanner Lund: Is this legally binding?
Andrew Carr: I notice they are also including the non-solicitation provisions as not enforced.

(Note that certain key people, like Dario Amodei, plausibly negotiated two-way agreements, which would mean theirs would still apply. I would encourage anyone in that category who is now free of the clause, even if they have no desire to disparage OpenAI, to simply say 'I am under no legal obligation not to disparage OpenAI.')

These actions by OpenAI are helpful. They are necessary. They are no...
May 28, 2024 • 23min

LW - Understanding Gödel's completeness theorem by jessicata

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Understanding Gödel's completeness theorem, published by jessicata on May 28, 2024 on LessWrong.

In this post I prove a variant of Gödel's completeness theorem. My intention has been to really understand the theorem, so that I am not simply shuffling symbols around, but am actually understanding why it is true. I hope it is helpful for at least some other people. For sources, I have myself relied mainly on Srivastava's presentation. I have relied a lot on intuitions about sequent calculus; while I present a sequent calculus in this post, this is not a complete introduction to sequent calculus. I recommend Logitext as an online proof tool for gaining more intuition about sequent proofs. I am familiar with sequent calculus mainly through type theory.

First-order theories and models

A first-order theory consists of:

A countable set of functions, which each have an arity, a non-negative integer.
A countable set of predicates, which also have non-negative integer arities.
A countable set of axioms, which are sentences in the theory.

Assume a countably infinite set of variables. A term consists of either a variable, or a function applied to a number of terms equal to its arity. An atomic sentence is a predicate applied to a number of terms equal to its arity. A sentence may be one of:

an atomic sentence.
a negated sentence, ¬P.
a conjunction of sentences, P∧Q.
a universal, ∀x, P, where x is a variable.

Define disjunctions (P∨Q := ¬(¬P∧¬Q)), implications (P→Q := ¬(P∧¬Q)), and existentials (∃x, P := ¬∀x, ¬P) from these other terms in the usual manner.

A first-order theory has a countable set of axioms, each of which are sentences. So far this is fairly standard; see Peano arithmetic for an example of a first-order theory. I am omitting equality from first-order theories, as in general equality can be replaced with an equality predicate and axioms.

A term or sentence is said to be closed if it has no free variables (that is, variables which are not quantified over). A closed term or sentence can be interpreted without reference to variable assignments, similar to a variable-free expression in a programming language.

Let a constant be a function of arity zero. I will make the non-standard assumption that first-order theories have a countably infinite set of constants which do not appear in any axiom. This will help in defining inference rules and proving completeness. Generally it is not a problem to add a countably infinite set of constants to a first-order theory; it does not strengthen the theory (except in that it aids in proving universals, as defined below).

Before defining inference rules, I will define models. A model of a theory consists of a set (the domain of discourse), interpretations of the functions (as mapping finite lists of values in the domain to other values), and interpretations of predicates (as mapping finite lists of values in the domain to Booleans), which satisfies the axioms. Closed terms have straightforward interpretations in a model, as evaluating the expression (as if in a programming language). Closed sentences have straightforward truth values, e.g. the formula ¬P is true in a model when P is false in the model.

Judgments and sequent rules

A judgment is of the form Γ⊢Δ, where Γ and Δ are (possibly infinite) countable sets of closed sentences. The judgment is true in a model if at least one of Γ is false or at least one of Δ is true. As notation, if Γ is a set of sentences and P is a sentence, then Γ,P denotes Γ∪{P}.

The inference rules are expressed as sequents. A sequent has one judgment on the bottom, and a finite set of judgments on top. Intuitively, it states that if all the judgments on top are provable, the rule yields a proof of the judgment on the bottom. Along the way, I will show that each rule is sound: if every judgment on the top is true in all models, then t...
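As an illustration of the sentence grammar and the derived connectives defined above, here is a minimal sketch in Python (assuming Python 3.10+; the class and helper names are my own, not from the post):

```python
from dataclasses import dataclass
from typing import Tuple

# Terms: variables, or functions applied to terms (a constant is a 0-ary function).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Fun:
    name: str
    args: Tuple["Term", ...]

Term = Var | Fun

# Sentences: atomic, negation, conjunction, universal quantification.
@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[Term, ...]

@dataclass(frozen=True)
class Not:
    body: "Sentence"

@dataclass(frozen=True)
class And:
    left: "Sentence"
    right: "Sentence"

@dataclass(frozen=True)
class ForAll:
    var: str
    body: "Sentence"

Sentence = Atom | Not | And | ForAll

# Derived connectives, defined exactly as in the post.
def Or(p, q):       # P ∨ Q := ¬(¬P ∧ ¬Q)
    return Not(And(Not(p), Not(q)))

def Implies(p, q):  # P → Q := ¬(P ∧ ¬Q)
    return Not(And(p, Not(q)))

def Exists(x, p):   # ∃x, P := ¬∀x, ¬P
    return Not(ForAll(x, Not(p)))
```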
May 28, 2024 • 4min

LW - How to get nerds fascinated about mysterious chronic illness research? by riceissa

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How to get nerds fascinated about mysterious chronic illness research?, published by riceissa on May 28, 2024 on LessWrong.

Like many nerdy people, back when I was healthy, I was interested in subjects like math, programming, and philosophy. But 5 years ago I got sick with a viral illness and never recovered. For the last couple of years I've been spending most of my now-limited brainpower trying to figure out how I can get better.

I occasionally wonder why more people aren't interested in figuring out illnesses such as my own. Mysterious chronic illness research has a lot of the qualities of an interesting puzzle:

There is a phenomenon with many confusing properties (e.g. the specific symptoms people get, why certain treatments work for some people but not others, why some people achieve temporary or permanent spontaneous remission), exactly like classic scientific mysteries.

Social reward for solving it: Many people currently alive would be extremely grateful to have this problem solved. I believe the social reward would be much more direct and gratifying compared to most other hobby projects one could take on.

When I think about what mysterious chronic illness research is missing, in order to make it of intellectual interest, here's what I can think of:

Lack of a good feedback loop: With subjects like math and programming, or puzzle games, you can often get immediate feedback on whether your idea works, and this makes tinkering fun. Common hobbies like cooking and playing musical instruments also fit this pattern. In fact, I believe the lack of such feedback loops (mostly by being unable to access or afford equipment) personally kept me from becoming interested in biology, medicine, and similar subjects until I was much older (compared to subjects like math and programming). I'm wondering how much my experience generalizes.

Requires knowledge of many fields: Solving these illnesses probably requires knowledge of biochemistry, immunology, neuroscience, medicine, etc. This makes it less accessible compared to other hobbies. I don't think this is a huge barrier though.

Are there other reasons? I'm interested in both speculation about why other people aren't interested, as well as personal reports of why you personally aren't interested enough to be working on solving mysterious chronic illnesses.

If the lack of feedback loop is the main reason, I am wondering if there are ways to create such a feedback loop. For example, maybe chronically ill people can team up with healthy people to decide on what sort of information to log and which treatments to try. Chronically ill people have access to lab results and sensory data that healthy people don't, and healthy people have the brainpower that chronically ill people don't, so by teaming up, both sides can make more progress.

It also occurs to me that maybe there is an outreach problem, in that people think medical professionals have this problem covered, and so there isn't much to do.
If so, that's very sad because (1) most doctors don't have the sort of curiosity, mental inclinations, and training that would make them good at solving scientific mysteries (in fact, even most scientists don't receive this kind of training; this is why I've used the term "nerds" in the title of the question, to hint at wanting people with this property), and (2) for whatever crazy reason, doctors basically don't care about mysterious chronic illnesses and will often deny their existence and insist it's "just anxiety" or "in the patient's head" (I've personally been told this on a few occasions during doctor appointments), partly because their training and operating protocols are geared toward treating acute conditions and particular chronic conditions (such as cancer); (3) for whatever other crazy reason, the main group of doctors who...
May 27, 2024 • 17min

LW - Intransitive Trust by Screwtape

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Intransitive Trust, published by Screwtape on May 27, 2024 on LessWrong.

I.

"Transitivity" is a property in mathematics and logic. Put simply, if something is transitive it means that there's a relationship between things where when x relates to y, and y relates to z, there's the same relationship between x and z. For a more concrete example, think of size. If my car is bigger than my couch, and my couch is bigger than my hat, you know that my car is bigger than my hat. (I am not a math major, and if there's a consensus in the comments that I'm using the wrong term here I can update the post.)

This is a neat property. Lots of things do not have it.

II.

Consider the following circumstance: Bob is traveling home one night, late enough there isn't anyone else around. Bob sees a shooting star growing unusually bright, until it resolves into a disc-shaped machine with lights around the edges. He finds himself levitated up into the machine, gets poked and prodded by the creatures inside for a while, and then set back down on the road.

Assuming Bob is a rational, rationalist, well-adjusted kind of guy, he now has a problem. Almost nobody in his life is going to believe a word of this.

From Bob's perspective, what happened? He might not be certain aliens are real (maybe he's just had a schizophrenic break, or someone slipped him some interesting drugs in his coffee) but he has to be putting a substantially higher percentage on the idea. Sure, maybe he hallucinated the whole thing, but most of us don't have psychotic breaks on an average day.

Break out Bayes. What are Bob's new odds that aliens abduct people, given his experiences? Let's say his prior probability on alien abductions being real was 1%, about one in a hundred. (That's P(A).) He decides the sensitivity of the test - the probability he'd experience aliens abducting him, given that aliens actually abduct people - is 5%, since he knows he doesn't have any history of drug use, mental illness, or prankish friends with a lot of spare time and weird senses of humour. (That's P(B|A).) If you had asked him before his abduction what the false positive rate was - that is, how often people think they've been abducted by aliens even though they haven't - he'd say 0.1%; maybe one in a thousand people have seemingly causeless hallucinations or dedicated pranksters. (That's P(B|¬A).)

P(A|B) = P(B|A)·P(A) / P(B)
P(aliens|experiences) = P(experiences|aliens)·P(aliens) / P(experiences)
P(experiences) = P(experiences|aliens)·P(aliens) + P(experiences|¬aliens)·P(¬aliens)
P(experiences) = (0.05·0.01) + (0.001·0.99)
P(experiences) = 0.00149
P(A|B) = (0.05·0.01) / 0.00149
P(A|B) = 0.3356, or about 33%.

The whole abduction thing is a major update for Bob towards aliens. If it's not aliens, it's something really weird at least.

Now consider Bob telling Carla, an equally rational, well-adjusted kind of gal with the same prior, about his experience. Bob and Carla are friends; not super close, but they've been running into each other at parties for a few years now. Carla has to deal with the same odds of mental breakdown or secret drug dosages that Bob does. Let's take lying completely off the table: for some reason, both Carla and Bob can perfectly trust that the other person isn't deliberately lying (maybe there's a magic Zone of Truth effect), so I think this satisfies Aumann's Agreement Theorem. Everything else is a real possibility though.
She also has to consider the odds that Bob has a faulty memory or is hallucinating or she's misunderstanding him somehow. (True story: my undergraduate university had an active Live Action Roleplaying group. For a while, my significant other liked to tell people that our second date was going to watch the zombies chase people around the campus. This was true, in that lots of people looked like they had open wounds, were moaning "Braaaaains," and were chasing after ot...
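As a quick check on Bob's arithmetic earlier in this piece, here is the same Bayes calculation as a short Python sketch (variable names are mine, not from the post):

```python
# Bob's update after the abduction experience, using the numbers from the post.
prior_aliens = 0.01        # P(A): prior that aliens abduct people
p_exp_given_aliens = 0.05  # P(B|A): probability of this experience if aliens do abduct people
p_exp_given_not = 0.001    # P(B|¬A): false positive rate - hallucinations, pranks, etc.

# Total probability of having the experience, P(B)
p_experience = (p_exp_given_aliens * prior_aliens
                + p_exp_given_not * (1 - prior_aliens))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior = p_exp_given_aliens * prior_aliens / p_experience

print(f"P(experience) = {p_experience:.5f}")        # 0.00149
print(f"P(aliens | experience) = {posterior:.4f}")  # ~0.3356
```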
May 27, 2024 • 4min

LW - Book review: Everything Is Predictable by PeterMcCluskey

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Book review: Everything Is Predictable, published by PeterMcCluskey on May 27, 2024 on LessWrong.

Book review: Everything Is Predictable: How Bayesian Statistics Explain Our World, by Tom Chivers.

Many have attempted to persuade the world to embrace a Bayesian worldview, but none have succeeded in reaching a broad audience. E.T. Jaynes' book has been a leading example, but its appeal is limited to those who find calculus enjoyable, making it unsuitable for a wider readership. Other attempts to engage a broader audience often focus on a narrower understanding, such as Bayes' Theorem, rather than the complete worldview.

Claude's most fitting recommendation was Rationality: From AI to Zombies, but at 1,813 pages, it's too long and unstructured for me to comfortably recommend to most readers. (GPT-4o's suggestions were less helpful, focusing only on resources for practical problem-solving). Aubrey Clayton's book, Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science, only came to my attention because Chivers mentioned it, offering mixed reviews that hint at why it remained unnoticed.

Chivers has done his best to mitigate this gap. While his book won't reach as many readers as I'd hoped, I'm comfortable recommending it as the standard introduction to the Bayesian worldview for most readers.

Basics

Chivers guides readers through the fundamentals of Bayes' Theorem, offering little that's extraordinary in this regard. A fair portion of the book is dedicated to explaining why probability should be understood as a function of our ignorance, contrasting with the frequentist approach that attempts to treat probability as if it existed independently of our minds.

The book has many explanations of how frequentists are wrong, yet concedes that the leading frequentists are not stupid. Frequentism's problems often stem from a misguided effort to achieve more objectivity in science than seems possible.

The only exception to this mostly fair depiction of frequentists is a section titled "Are Frequentists Racist?". Chivers repeats Clayton's diatribe affirming this, treating the diatribe more seriously than it deserves, before dismissing it. (Frequentists were racist when racism was popular. I haven't seen any clear evidence of whether Bayesians behaved differently).

The Replication Crisis

Chivers explains frequentism's role in the replication crisis. A fundamental drawback of p-values is that they indicate the likelihood of the data given a hypothesis, which differs from the more important question of how likely the hypothesis is given the data. Here, Chivers (and many frequentists) overlook a point raised by Deborah Mayo: p-values can help determine if an experiment had a sufficiently large sample size. Deciding whether to conduct a larger experiment can be as crucial as drawing the best inference from existing data.

The perversity of common p-value usage is exemplified by Lindley's paradox: a p-value below 0.05 can sometimes provide Bayesian evidence against the tested hypothesis. A p-value of 0.04 indicates that the data are unlikely given the null hypothesis, but we can construct scenarios where the data are even less likely under the hypothesis you wish to support.
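To make the Lindley's-paradox point concrete, here is a small numerical sketch (my own example, not from the book or the review): with a large sample, a result that is "significant" at p ≈ 0.04 can still be more probable under the null than under a diffuse alternative.

```python
import math
from scipy.stats import norm

n, sigma = 10_000, 1.0          # large study with known unit variance
se = sigma / math.sqrt(n)       # standard error of the sample mean
z = 2.05                        # observed z-score
xbar = z * se                   # observed sample mean

# Frequentist verdict: two-sided p-value
p_value = 2 * norm.sf(z)        # ~0.040, "significant" at the 0.05 level

# Likelihood of the data under H0: true mean = 0
like_h0 = norm.pdf(xbar, loc=0.0, scale=se)

# Marginal likelihood under a diffuse alternative: true mean ~ N(0, tau^2)
tau = 1.0
like_h1 = norm.pdf(xbar, loc=0.0, scale=math.sqrt(tau**2 + se**2))

print(f"p-value            = {p_value:.3f}")            # rejects H0
print(f"likelihood ratio H0/H1 = {like_h0 / like_h1:.1f}")  # > 1: data favor H0
```

The p-value rejects the null, yet the observed data are roughly twelve times more probable under the null than under the diffuse alternative the researcher hopes to support.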
A key factor in the replication crisis is the reward system for scientists and journals, which favors publishing surprising results. The emphasis on p-values allows journals to accept more surprising results compared to a Bayesian approach, creating a clear disincentive for individual scientists or journals to adopt Bayesian methods before others do.

Minds Approximate Bayes

The book concludes by describing how human minds employ heuristics that closely approximate the Bayesian approach. This includes a well-written summary of how predictive processing works, demonstrating ...
May 27, 2024 • 41min

LW - I am the Golden Gate Bridge by Zvi

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I am the Golden Gate Bridge, published by Zvi on May 27, 2024 on LessWrong.

Easily Interpretable Summary of New Interpretability Paper

Anthropic has identified (full paper here) how millions of concepts are represented inside Claude Sonnet, their current middleweight model. The features activate across modalities and languages as tokens approach the associated context. This scales up previous findings from smaller models.

By looking at neuron clusters, they defined a distance measure between clusters. So the Golden Gate Bridge is close to various San Francisco and California things, and inner conflict relates to various related conceptual things, and so on.

Then it gets more interesting.

Importantly, we can also manipulate these features, artificially amplifying or suppressing them to see how Claude's responses change.

If you sufficiently amplify the feature for the Golden Gate Bridge, Claude starts to think it is the Golden Gate Bridge. As in, it thinks it is the physical bridge, and also it gets obsessed, bringing it up in almost every query. If you amplify a feature that fires when reading a scam email, you can get Claude to write scam emails. Turn up sycophancy, and it will go well over the top talking about how great you are.

They note they have discovered features corresponding to various potential misuses, forms of bias and things like power-seeking, manipulation and secrecy. That means that, if you had the necessary access and knowledge, you could amplify such features. Like most powers, one could potentially use this for good or evil.

They speculate you could watch the impact on features during fine tuning, or turn down or even entirely remove undesired features. Or amplify desired ones. Checking for certain patterns is proposed as a 'test for safety,' which seems useful but also is playing with fire.

They have a short part at the end comparing their work to other methods. They note that dictionary learning need happen only once per model, and the additional work after that is typically inexpensive and fast, and that it allows looking for anything at all and finding the unexpected. It is a big deal that this allows you to be surprised. They think this has big advantages over old strategies such as linear probes, even if those strategies still have their uses.

One Weird Trick

You know what AI labs are really good at? Scaling. It is their one weird trick. So guess what Anthropic did here? They scaled the autoencoders to Claude Sonnet.

Our general approach to understanding Claude 3 Sonnet is based on the linear representation hypothesis (see e.g.) and the superposition hypothesis. For an introduction to these ideas, we refer readers to the Background and Motivation section of Toy Models.

At a high level, the linear representation hypothesis suggests that neural networks represent meaningful concepts - referred to as features - as directions in their activation spaces. The superposition hypothesis accepts the idea of linear representations and further hypothesizes that neural networks use the existence of almost-orthogonal directions in high-dimensional spaces to represent more features than there are dimensions.

If one believes these hypotheses, the natural approach is to use a standard method called dictionary learning.

…

Our SAE consists of two layers.
The first layer ("encoder") maps the activity to a higher-dimensional layer via a learned linear transformation followed by a ReLU nonlinearity. We refer to the units of this high-dimensional layer as "features." The second layer ("decoder") attempts to reconstruct the model activations via a linear transformation of the feature activations. The model is trained to minimize a combination of (1) reconstruction error and (2) an L1 regularization penalty on the feature activations, which incentivizes sparsity. Once the S...
May 27, 2024 • 5min

LW - Maybe Anthropic's Long-Term Benefit Trust is powerless by Zach Stein-Perlman

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Maybe Anthropic's Long-Term Benefit Trust is powerless, published by Zach Stein-Perlman on May 27, 2024 on LessWrong.

Crossposted from AI Lab Watch. Subscribe on Substack.

Introduction

Anthropic has an unconventional governance mechanism: an independent "Long-Term Benefit Trust" elects some of its board. Anthropic sometimes emphasizes that the Trust is an experiment, but mostly points to it to argue that Anthropic will be able to promote safety and benefit-sharing over profit.[1]

But the Trust's details have not been published and some information Anthropic has shared is concerning. In particular, Anthropic's stockholders can apparently overrule, modify, or abrogate the Trust, and the details are unclear. Anthropic has not publicly demonstrated that the Trust would be able to actually do anything that stockholders don't like.

The facts

There are three sources of public information on the Trust:

The Long-Term Benefit Trust (Anthropic 2023)
Anthropic Long-Term Benefit Trust (Morley et al. 2023)
The $1 billion gamble to ensure AI doesn't destroy humanity (Vox: Matthews 2023)

They say there's a new class of stock, held by the Trust/Trustees. This stock allows the Trust to elect some board members and will allow them to elect a majority of the board by 2027. But:

1. Morley et al.: "the Trust Agreement also authorizes the Trust to be enforced by the company and by groups of the company's stockholders who have held a sufficient percentage of the company's equity for a sufficient period of time," rather than the Trustees.
   1. I don't know what this means.
2. Morley et al.: the Trust and its powers can be amended "by a supermajority of stockholders. . . . [This] operates as a kind of failsafe against the actions of the Voting Trustees and safeguards the interests of stockholders." Anthropic: "the Trust and its powers [can be changed] without the consent of the Trustees if sufficiently large supermajorities of the stockholders agree."
   1. It's impossible to assess this "failsafe" without knowing the thresholds for these "supermajorities." Also, a small number of investors - currently, perhaps Amazon and Google - may control a large fraction of shares. It may be easy for profit-motivated investors to reach a supermajority.
3. Maybe there are other issues with the Trust Agreement - we can't see it and so can't know.
4. Vox: the Trust "will elect a fifth member of the board this fall," viz. Fall 2023.
   1. Anthropic has not said whether that happened nor who is on the board these days (nor who is on the Trust these days).

Conclusion

Public information is consistent with the Trust being quite subordinate to stockholders, likely to lose its powers if it does anything stockholders dislike. (Even if stockholders' formal powers over the Trust are never used, that threat could prevent the Trust from acting contrary to the stockholders' interests.) Anthropic knows this and has decided not to share the information that the public needs to evaluate the Trust. This suggests that Anthropic benefits from ambiguity because the details would be seen as bad.
I basically fail to imagine a scenario where publishing the Trust Agreement is very costly to Anthropic - especially just sharing certain details (like sharing percentages rather than saying "a supermajority") - except that the details are weak and would make Anthropic look bad.[2] Maybe it would suffice to let an auditor see the Trust Agreement and publish their impression of it. But I don't see why Anthropic won't publish it. Maybe the Trust gives Anthropic strong independent accountability - or rather, maybe it will by default after (unspecified) time- and funding-based milestones. But only if Anthropic's board and stockholders have substantially less power over it than they might - or if they will exercise great restraint in using their p...
May 27, 2024 • 3min

LW - Computational Mechanics Hackathon (June 1 and 2) by Adam Shai

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Computational Mechanics Hackathon (June 1 & 2), published by Adam Shai on May 27, 2024 on LessWrong.

Join our Computational Mechanics Hackathon, organized with the support of APART, PIBBSS and Simplex. This is an opportunity to learn more about Computational Mechanics, its applications to AI interpretability & safety, and to get your hands dirty by working on a concrete project together with a team and supported by Adam & Paul. Also, there will be cash prizes for the best projects! Read more and sign up for the event here.

We're excited about Computational Mechanics as a framework because it provides a rigorous notion of structure that can be applied to both data and model internals. In Transformers Represent Belief State Geometry in their Residual Stream, we validated that Computational Mechanics can help us understand fundamentally what computational structures transformers implement when trained on next-token prediction - a belief updating process over the hidden structure of the data generating process. We then found the fractal geometry underlying this process in the residual stream of transformers. This opens up a large number of potential projects in interpretability. There's a lot of work to do!

Key things to know:

Dates: Weekend of June 1st & 2nd, starting with an opening talk on Friday May 31st

Format: Hybrid - join either online or in person in Berkeley! If you are interested in joining in person please contact Adam.

Program:
Keynote Opening by @Adam Shai and @Paul Riechers - Friday 10:30 AM PDT
Online Office Hours with Adam and Paul on Discord - Saturday and Sunday 10:30 PDT
Ending session - Sunday at 17:30 PDT
Project presentations - Wednesday at 10:30 PDT

Projects: After that, you will form teams of 1-5 people and submit a project on the entry submission page. By the end of the hackathon, you will submit: 1) the PDF report, 2) a maximum 10-minute video overview, 3) title, summary, and descriptions. You will present your work on the following Wednesday.

Sign up: You can sign up on this website. After signing up, you will receive a link to the Discord where we will be coordinating over the course of the weekend. Feel free to introduce yourself on the Discord and begin brainstorming ideas and interests.

Resources: You're welcome to engage with this selection of resources before the hackathon starts. Check out our (living) Open Problems in Comp Mech document, and in particular the section with Shovel Ready Problems. If you are starting a project or just want to express interest in it, fill out a row in this spreadsheet.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
May 27, 2024 • 26min

LW - Truthseeking is the ground in which other principles grow by Elizabeth

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Truthseeking is the ground in which other principles grow, published by Elizabeth on May 27, 2024 on LessWrong.

Introduction

First they came for the epistemology/we don't know what happened after that.

I'm fairly antagonistic towards the author of that tweet, but it still resonates deep in my soul. Anything I want to do, anything I want to change, rests on having contact with reality. If I don't have enough, I might as well be pushing buttons at random.

Unfortunately, there are a lot of forces pushing against having enough contact with reality. It's a lot of work even when reality cooperates, many situations are adversarial, and even when they're not entropy itself will constantly chip away at your knowledge base.

This is why I think constantly seeking contact with reality is the meta principle without which all (consequentialist) principles are meaningless. If you aren't actively pursuing truthseeking, you won't have enough contact with reality to make having goals a reasonable concept, much less achieving them.

To me this feels intuitive, like saying air is necessary to live. But I've talked to many people who disagree, or who agree in the abstract but prioritize differently in the breach. This was supposed to be a grand post explaining that belief. In practice it's mostly a bunch of pointers to facets of truthseeking and ideas for how to do better. My hope is that people can work backwards from these to the underlying principle, or flesh out their own relationship with truthseeking.

Target audience

I think these are good principles for almost any situation, but this essay is aimed at people within Effective Altruism. Most of the examples are from within EA and assume a certain amount of context. I definitely don't give enough information to bring someone unfamiliar up to speed. I also assume at least a little consequentialism.

A note on examples and actions

I'm going to give lots of examples in this post. I think they make it easier to understand my point and to act on what agreement you have. It avoids the failure mode Scott Alexander discusses here, of getting everyone to agree with you by putting nothing at stake.

The downside of this is that it puts things at stake. I give at least 20 examples here, usually in less than a paragraph, using only publicly available information. That's enough to guarantee that every person who reads this will find at least one example where I'm being really unfair or missing crucial information. I welcome corrections and arguments on anything I say here, but when evaluating the piece as a whole I ask that you consider the constraints I was working under.

Examples involving public writing are overrepresented. I wanted my examples to be as accessible as possible, and it's hard to beat public writing for that. It even allows skimming. My hope is that readers will work backwards from the public examples to the core principle, which they can apply wherever is most important to them. The same goes for the suggestions I give on how to pursue truthseeking. I don't know your situation and don't want to pretend I do. The suggestions are also biased towards writing, because I do that a lot.

I sent a draft of this post to every person or org with a negative mention, and most positive mentions.
Facets of truthseeking

No gods, no monsters, no epistemic daddies

When I joined EA I felt filled with clarity and purpose, at a level I hadn't felt since I got rejected from grad school. A year later I learned about a promising-looking organization outside EA, and I felt angry. My beautiful clarity was broken and I had to go back to thinking. Not just regular thinking either (which I'd never stopped doing), but meta thinking about how to navigate multiple sources of information on the same topic. For bonus points, the organization in question was J-PAL....
