

LessWrong (Curated & Popular)
LessWrong
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
Episodes

Jul 28, 2023 • 15min
"Rationality !== Winning" by Raemon
The episode explores the claim that rationality should not be equated with winning, discussing the importance of feedback loops, applying rationality in specific domains, and avoiding generic self-help. It reflects on the Rationality movement and proposes refocusing on rationality as a professional tool. The narration emphasizes the need for clear definitions, examines the consequences of focusing solely on winning, and questions the relationship between rationality, winning, and epistemics, highlighting the importance of choosing appropriate cognitive algorithms and tools.

Jul 28, 2023 • 11min
"Grant applications and grand narratives" by Elizabeth
The Lightspeed application asks: “What impact will [your project] have on the world? What is your project’s goal, how will you know if you’ve achieved it, and what is the path to impact?” LTFF uses an identical question, and SFF puts it even more strongly (“What is your organization’s plan for improving humanity’s long term prospects for survival and flourishing?”).

I’ve applied to all three of these at various points, and I’ve never liked this question. It feels like it wants a grand narrative of an amazing, systemic project that will measurably move the needle on x-risk. But I’m typically applying for narrowly defined projects, like “Give nutrition tests to EA vegans and see if there’s a problem”. I think this was a good project. I think this project is substantially more likely to pay off than underspecified alignment strategy research, and arguably has as good a long tail. But when I look at “What impact will [my project] have on the world?” the project feels small and sad. I feel an urge to make things up, and express far more certainty for far more impact than I believe. Then I want to quit, because lying is bad but listing my true beliefs feels untenable.

I’ve gotten better at this over time, but I know other people with similar feelings, and I suspect it’s a widespread issue (I encourage you to share your experience in the comments so we can start figuring that out).

I should note that the pressure for grand narratives has good points; funders are in fact looking for VC-style megahits. I think that narrow projects are underappreciated, but for purposes of this post that’s beside the point: I think many grantmakers are undercutting their own preferred outcomes by using questions that implicitly push for a grand narrative. I think they should probably change the form, but I also think we applicants can partially solve the problem by changing how we interact with the current forms.

My goal here is to outline the problem, gesture at some possible solutions, and create a space for other people to share data. I didn’t think about my solutions very long, I am undoubtedly missing a bunch and what I do have still needs workshopping, but it’s a place to start.

Source: https://www.lesswrong.com/posts/FNPXbwKGFvXWZxHGE/grant-applications-and-grand-narratives

Narrated for LessWrong by TYPE III AUDIO.

[Curated Post] ✓

Jul 28, 2023 • 3min
"Cryonics and Regret" by MvB
This post is not about arguments in favor of or against cryonics. I would just like to share a particular emotional response of mine as the topic became hot for me after not thinking about it at all for nearly a decade.

Recently, I have signed up for cryonics, as has my wife, and we have made arrangements for our son to be cryopreserved just in case longevity research does not deliver in time or some unfortunate thing happens.

Last year, my father died. He was a wonderful man: good-natured, intelligent, funny, caring and, most importantly in this context, loving life to the fullest, even in the light of any hardships. He had a no-bullshit approach regarding almost any topic, and, being born in the late 1940s in relative poverty and without much formal education, over the course of his life he acquired many unusual attitudes that were not that compatible with his peers (unlike me, he never tried to convince other people of things they could not or did not want to grasp; pragmatism was another of his traits). Much of what he expected from the future in general and technology in particular, I later came to know as transhumanist thinking, though neither was he familiar with the philosophy as such nor was he prone to labelling his worldview. One of his convictions was that age-related death is a bad thing and a tragedy, a problem that should and will eventually be solved by technology.

Source: https://www.lesswrong.com/posts/inARBH5DwQTrvvRj8/cryonics-and-regret

Narrated for LessWrong by TYPE III AUDIO.

[125+ Karma Post] ✓

Jun 12, 2023 • 41min
"Unifying Bargaining Notions (2/2)" by Diffractor
Alright, time for the payoff, unifying everything discussed in the previous post. This post is a lot more mathematically dense; you might want to digest it in more than one sitting.

Imaginary Prices, Tradeoffs, and Utilitarianism

Harsanyi's Utilitarianism Theorem can be summarized as: "if a bunch of agents have their own personal utility functions U_i, and you want to aggregate them into a collective utility function U with the property that everyone agreeing that option x is better than option y (i.e., U_i(x) ≥ U_i(y) for all i) implies U(x) ≥ U(y), then that collective utility function must be of the form b + ∑_{i∈I} a_i·U_i for some number b and nonnegative numbers a_i."

Basically, if you want to aggregate utility functions, the only sane way to do so is to give everyone importance weights and take a weighted sum of everyone's individual utility functions.

Closely related to this is a result that says that any point on the Pareto frontier of a game can be post-hoc interpreted as the result of maximizing a collective utility function. This related result is one where it's very important for the reader to understand the actual proof, because the proof gives you a way of reverse-engineering "how much everyone matters to the social utility function" from the outcome alone.

First up, draw all the outcomes and the utilities that the players assign to them; the convex hull will be the "feasible set" F, since we have access to randomization. Pick some Pareto-frontier point (u_1, u_2, ..., u_n) (the image in the original post is drawn for only two players).

https://www.lesswrong.com/posts/RZNmNwc9SxdKayeQh/unifying-bargaining-notions-2-2
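To make the weighted-sum form concrete, here is a toy numerical check of the aggregation rule b + ∑_i a_i·U_i on a two-player, three-outcome game; the utilities, weights, and constant below are illustrative assumptions, not values from the post.

```python
# A toy check of the weighted-sum aggregation in Harsanyi's theorem.
# The utilities, weights a_i, and constant b are illustrative assumptions.
import numpy as np

# U[i, x] = utility that player i assigns to pure outcome x.
U = np.array([
    [3.0, 1.0, 0.0],   # player 1
    [0.0, 2.0, 3.0],   # player 2
])

b = 0.0                    # additive constant
a = np.array([0.7, 0.3])   # nonnegative importance weights a_i

# Collective utility U(x) = b + sum_i a_i * U_i(x), evaluated at each outcome.
collective = b + a @ U
print("collective utilities:", collective)
print("outcome a collective maximizer would pick:", int(np.argmax(collective)))

# Pareto property: if every player weakly prefers x to y, the aggregate
# cannot strictly prefer y to x.
n_outcomes = U.shape[1]
for x in range(n_outcomes):
    for y in range(n_outcomes):
        if np.all(U[:, x] >= U[:, y]):
            assert collective[x] >= collective[y]
```

Changing the weights a_i traces out different points of the Pareto frontier, which is the reverse-engineering direction the post's related result is about.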

Jun 6, 2023 • 10min
"The ants and the grasshopper" by Richard Ngo
Inspired by Aesop, Soren Kierkegaard, Robin Hanson, sadoeuphemist and Ben Hoffman.

One winter a grasshopper, starving and frail, approaches a colony of ants drying out their grain in the sun, to ask for food.

“Did you not store up food during the summer?” the ants ask.

“No”, says the grasshopper. “I lost track of time, because I was singing and dancing all summer long.”

The ants, disgusted, turn away and go back to work.

https://www.lesswrong.com/posts/GJgudfEvNx8oeyffH/the-ants-and-the-grasshopper

May 18, 2023 • 1h 43min
"Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.
Summary: We demonstrate a new scalable way of interacting with language models: adding certain activation vectors into forward passes. Essentially, we add together combinations of forward passes in order to get GPT-2 to output the kinds of text we want. We provide a lot of entertaining and successful examples of these "activation additions." We also show a few activation additions which unexpectedly fail to have the desired effect.

We quantitatively evaluate how activation additions affect GPT-2's capabilities. For example, we find that adding a "wedding" vector decreases perplexity on wedding-related sentences, without harming perplexity on unrelated sentences. Overall, we find strong evidence that appropriately configured activation additions preserve GPT-2's capabilities.

Our results provide enticing clues about the kinds of programs implemented by language models. For some reason, GPT-2 allows "combination" of its forward passes, even though it was never trained to do so. Furthermore, our results are evidence of linear feature directions, including "anger", "weddings", and "create conspiracy theories."

We coin the phrase "activation engineering" to describe techniques which steer models by modifying their activations. As a complement to prompt engineering and finetuning, activation engineering is a low-overhead way to steer models at runtime. Activation additions are nearly as easy as prompting, and they offer an additional way to influence a model’s behaviors and values. We suspect that activation additions can adjust the goals being pursued by a network at inference time.

https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector
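As an illustration of the general idea (not the authors' code, which targets GPT-2-XL), here is a minimal sketch of steering Hugging Face GPT-2 by adding a vector, computed from a contrast pair of prompts, into a block's activations via a forward hook. The layer index, coefficient, prompt pair, and the choice to add the vector at every position are assumptions made for this sketch.

```python
# A minimal sketch of the activation-addition idea on Hugging Face GPT-2.
# LAYER, COEFF, the prompt pair, and adding the vector at every position
# are illustrative assumptions, not the post's exact configuration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6     # which transformer block's output activations to read and modify (assumption)
COEFF = 4.0   # scaling coefficient for the steering vector (assumption)

def block_output(prompt: str) -> torch.Tensor:
    """Return the hidden states at the output of block LAYER for a prompt."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[LAYER + 1]   # hs[k+1] is the output of block k; shape (1, seq_len, d_model)

# Steering vector: difference of last-token activations for a contrast pair.
steer = COEFF * (block_output(" weddings")[:, -1, :] - block_output(" ")[:, -1, :])

def add_vector(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # add the steering vector (broadcast over all positions), pass the rest through.
    return (output[0] + steer,) + output[1:]

hook = model.transformer.h[LAYER].register_forward_hook(add_vector)
try:
    ids = tokenizer("I went up to my friend and said", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=30, do_sample=True, top_p=0.9)
    print(tokenizer.decode(out[0]))
finally:
    hook.remove()
```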

May 16, 2023 • 1h 2min
"An artificially structured argument for expecting AGI ruin" by Rob Bensinger
Philosopher David Chalmers asked: "Is there a canonical source for 'the argument for AGI ruin' somewhere, preferably laid out as an explicit argument with premises and a conclusion?"

Unsurprisingly, the actual reason people expect AGI ruin isn't a crisp deductive argument; it's a probabilistic update based on many lines of evidence. The specific observations and heuristics that carried the most weight for someone will vary for each individual, and can be hard to accurately draw out.

That said, Eliezer Yudkowsky's So Far: Unfriendly AI Edition might be a good place to start if we want a pseudo-deductive argument just for the sake of organizing discussion. People can then say which premises they want to drill down on.

In The Basic Reasons I Expect AGI Ruin, I wrote: "When I say 'general intelligence', I'm usually thinking about 'whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems'." It's possible that we should already be thinking of GPT-4 as "AGI" on some definitions, so to be clear about the threshold of generality I have in mind, I'll specifically talk about "STEM-level AGI", though I expect such systems to be good at non-STEM tasks too. STEM-level AGI is AGI that has "the basic mental machinery required to do par-human reasoning about all the hard sciences", though a specific STEM-level AGI could (e.g.) lack physics ability for the same reasons many smart humans can't solve physics problems, such as "lack of familiarity with the field".

https://www.lesswrong.com/posts/QzkTfj4HGpLEdNjXX/an-artificially-structured-argument-for-expecting-agi-ruin

May 10, 2023 • 33min
"How much do you believe your results?" by Eric Neyman
You are the director of a giant government research program that’s conducting randomized controlled trials (RCTs) on two thousand health interventions, so that you can pick out the most cost-effective ones and promote them among the general population.

The quality of the two thousand interventions follows a normal distribution, centered at zero (no harm or benefit) and with standard deviation 1. (Pick whatever units you like — maybe one quality-adjusted life-year per ten thousand dollars of spending, or something in that ballpark.)

Unfortunately, you don’t know exactly how good each intervention is — after all, then you wouldn’t be doing this job. All you can do is get a noisy measurement of intervention quality using an RCT. We’ll call this measurement the intervention’s performance in your RCT.

https://www.lesswrong.com/posts/nnDTgmzRrzDMiPF9B/how-much-do-you-believe-your-results
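The setup lends itself to a quick simulation: draw true qualities from N(0, 1), add measurement noise, and compare how good the best-looking interventions appear to how good they really are. This is a minimal sketch of that setup, assuming an RCT noise level of 1 (the excerpt doesn't specify one); the shrinkage factor follows from the standard normal-normal posterior.

```python
# A minimal simulation of the setup above. The RCT noise level (sigma_noise)
# is an assumption for illustration; the excerpt doesn't specify it.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
sigma_noise = 1.0   # assumed standard deviation of each RCT's measurement error

quality = rng.normal(0.0, 1.0, n)                        # true quality ~ N(0, 1)
performance = quality + rng.normal(0.0, sigma_noise, n)  # noisy RCT measurement

# With a N(0, 1) prior and Gaussian noise, the posterior mean of quality given
# measured performance m shrinks it toward zero: E[quality | m] = m / (1 + sigma_noise**2).
shrink = 1.0 / (1.0 + sigma_noise**2)

top = np.argsort(performance)[-20:]   # the 20 best-looking interventions
print("mean measured performance of top 20:", performance[top].mean().round(2))
print("mean true quality of top 20:        ", quality[top].mean().round(2))
print("shrinkage-adjusted estimate:        ", (shrink * performance[top]).mean().round(2))
```

The gap between the first two numbers is the point of the post's question: the interventions that measure best are, on average, not as good as they look.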

Apr 27, 2023 • 38min
"Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)" by Chris Scammell & DivineMango
This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintain wellbeing and direction when confronted with existential risk.

Many people in this community have posted their emotional strategies for facing Doom after Eliezer Yudkowsky’s “Death With Dignity” generated so much conversation on the subject. This post intends to be more touchy-feely, dealing more directly with emotional landscapes than questions of timelines or probabilities of success.

The resources section would benefit from community additions. Please suggest any resources that you would like to see added to this post.

Please note that this document is not intended to replace professional medical or psychological help in any way. Many preexisting mental health conditions can be exacerbated by these conversations. If you are concerned that you may be experiencing a mental health crisis, please consult a professional.

https://www.lesswrong.com/posts/pLLeGA7aGaJpgCkof/mental-health-and-the-alignment-problem-a-compilation-of

Apr 19, 2023 • 37min
"On AutoGPT" by Zvi
The primary talk of the AI world recently is about AI agents (whether or not it includes the question of whether we can’t help but notice we are all going to die).

The trigger for this was AutoGPT, now number one on GitHub, which allows you to turn GPT-4 (or GPT-3.5 for us clowns without proper access) into a prototype version of a self-directed agent.

We also have a paper out this week where a simple virtual world was created, populated by LLMs that were wrapped in code designed to make them simple agents, and then several days of activity were simulated, during which the AI inhabitants interacted, formed and executed plans, and it all seemed like the beginnings of a living and dynamic world. Game version hopefully coming soon.

How should we think about this? How worried should we be?

https://www.lesswrong.com/posts/566kBoPi76t8KAkoD/on-autogpt
https://thezvi.wordpress.com/
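For readers who haven't looked at these tools, the pattern AutoGPT popularized is essentially a loop: give a goal to a language model, ask it to pick its next action, execute that action, and feed the result back into the context. The sketch below is an illustrative reconstruction of that loop, not AutoGPT's actual code; `call_llm` is a hypothetical stand-in for a chat-model API call (stubbed so the sketch runs), and the tools are toy placeholders.

```python
# An illustrative reconstruction of the agent loop behind tools like AutoGPT;
# this is not AutoGPT's code. `call_llm` is a hypothetical stand-in for a
# chat-model API call (here a stub), and the tools are toy placeholders.
import json

def call_llm(messages: list[dict]) -> str:
    """A real agent would send `messages` to a language model and return its
    reply; this stub just declares the goal finished immediately."""
    return json.dumps({"thought": "stub model", "tool": "finish",
                       "args": {"answer": "done (stub)"}})

TOOLS = {
    "search": lambda args: f"(fake search results for {args['query']!r})",
    "finish": lambda args: args.get("answer", ""),
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [
        {"role": "system",
         "content": "You are an autonomous agent. Reply with JSON of the form "
                    '{"thought": ..., "tool": ..., "args": ...}. '
                    f"Available tools: {sorted(TOOLS)}."},
        {"role": "user", "content": f"Goal: {goal}"},
    ]
    for _ in range(max_steps):
        reply = call_llm(history)                 # the model picks the next action
        action = json.loads(reply)
        result = TOOLS[action["tool"]](action["args"])
        if action["tool"] == "finish":
            return result                         # the agent decides it is done
        # Feed the observation back so the model can plan its next step.
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "user", "content": f"Observation: {result}"})
    return "(step limit reached)"

print(run_agent("Summarize the latest AI-agent news."))
```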


