LessWrong (Curated & Popular)

LessWrong
Jul 30, 2022 • 22min

"Changing the world through slack & hobbies" by Steven Byrnes

https://www.lesswrong.com/posts/DdDt5NXkfuxAnAvGJ/changing-the-world-through-slack-and-hobbies Introduction: In EA orthodoxy, if you're really serious about EA, the three alternatives that people most often seem to talk about are (1) “direct work” in a job that furthers a very important cause; (2) “earning to give”; (3) earning “career capital” that will help you do those things in the future, e.g. by getting a PhD or teaching yourself ML. By contrast, there’s not much talk of: (4) being in a job / situation where you have extra time and energy and freedom to explore things that seem interesting and important. But that last one is really important!
Jul 28, 2022 • 19min

"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch

https://www.lesswrong.com/posts/8oMF8Lv5jiGaQSFvo/boundaries-part-1-a-key-missing-concept-from-utility-theory Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is Part 1 of my «Boundaries» Sequence on LessWrong. Summary: «Boundaries» are a missing concept from the axioms of game theory and bargaining theory, which might help pin down certain features of multi-agent rationality (this post), and have broader implications for effective altruism discourse and x-risk (future posts). 1. Boundaries (of living systems). Epistemic status: me describing what I mean. With the exception of some relatively recent and isolated pockets of research on embedded agency (e.g., Orseau & Ring, 2012; Garrabrant & Demski, 2018), most attempts at formal descriptions of living rational agents — especially utility-theoretic descriptions — are missing the idea that living systems require and maintain boundaries. When I say boundary, I don't just mean an arbitrary constraint or social norm. I mean something that could also be called a membrane in a generalized sense, i.e., a layer of stuff-of-some-kind that physically or cognitively separates a living system from its environment, that 'carves reality at the joints' in a way that isn't an entirely subjective judgement of the living system itself. Here are some examples that I hope will convey my meaning:
Jul 24, 2022 • 13min

"ITT-passing and civility are good; "charity" is bad; steelmanning is niche" by Rob Bensinger

https://www.lesswrong.com/posts/MdZyLnLHuaHrCskjy/itt-passing-and-civility-are-good-charity-is-bad I often object to claims like "charity/steelmanning is an argumentative virtue". This post collects a few things I and others have said on this topic over the last few years. My current view is: Steelmanning ("the art of addressing the best form of the other person’s argument, even if it’s not the one they presented") is a useful niche skill, but I don't think it should be a standard thing you bring out in most arguments, even if it's an argument with someone you strongly disagree with. Instead, arguments should mostly be organized around things like: Object-level learning and truth-seeking, with the conversation as a convenient excuse to improve your own model of something you're curious about. Trying to pass each other's Ideological Turing Test (ITT), or some generalization thereof. The ability to pass ITTs is the ability "to state opposing views as clearly and persuasively as their proponents". The version of "ITT" I care about is one where you understand the substance of someone's view well enough to be able to correctly describe their beliefs and reasoning; I don't care about whether you can imitate their speech patterns, jargon, etc. Trying to identify and resolve cruxes: things that would make one or the other of you (or both) change your mind about the topic under discussion. Argumentative charity is a complete mess of a concept⁠—people use it to mean a wide variety of things, and many of those things are actively bad, or liable to cause severe epistemic distortion and miscommunication. Some version of civility and/or friendliness and/or a spirit of camaraderie and goodwill seems like a useful ingredient in many discussions. I'm not sure how best to achieve this in ways that are emotionally honest ("pretending to be cheerful and warm when you don't feel that way" sounds like the wrong move to me), or how to achieve this without steering away from candor, openness, "realness", etc.
Jul 23, 2022 • 13min

"What should you change in response to an "emergency"? And AI risk" by Anna Salamon

https://www.lesswrong.com/posts/mmHctwkKjpvaQdC3c/what-should-you-change-in-response-to-an-emergency-and-ai Related to: Slack gives you the ability to notice/reflect on subtle things. Epistemic status: A possibly annoying mixture of straightforward reasoning and hard-to-justify personal opinions. It is often stated (with some justification, IMO) that AI risk is an “emergency.” Various people have explained to me that they put various parts of their normal life’s functioning on hold on account of AI being an “emergency.” In the interest of people doing this sanely and not confusedly, I’d like to take a step back and seek principles around what kinds of changes a person might want to make in an “emergency” of different sorts. Principle 1: It matters what time-scale the emergency is on. There are plenty of ways we can temporarily increase productivity on some narrow task or other, at the cost of our longer-term resources. For example: skipping meals, skipping sleep, ceasing to clean the house or to exercise, accumulating credit card debt, calling in favors from friends, or skipping leisure time.
Jul 17, 2022 • 55min

"On how various plans miss the hard bits of the alignment challenge" by Nate Soares

https://www.lesswrong.com/posts/3pinFH3jerMzAvmza/on-how-various-plans-miss-the-hard-bits-of-the-alignment Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. (As usual, this post was written by Nate Soares with some help and editing from Rob Bensinger.) In my last post, I described a “hard bit” of the challenge of aligning AGI—the sharp left turn that comes when your system slides into the “AGI” capabilities well, the fact that alignment doesn’t generalize similarly well at this turn, and the fact that this turn seems likely to break a bunch of your existing alignment properties. Here, I want to briefly discuss a variety of current research proposals in the field, to explain why I think this problem is currently neglected. I also want to mention research proposals that do strike me as having some promise, or that strike me as adjacent to promising approaches. Before getting into that, let me be very explicit about three points: On my model, solutions to how capabilities generalize further than alignment are necessary but not sufficient. There is dignity in attacking a variety of other real problems, and I endorse that practice. The imaginary versions of people in the dialogs below are not the same as the people themselves. I'm probably misunderstanding the various proposals in important ways, and/or rounding them to stupider versions of themselves along some important dimensions.[1] If I've misrepresented your view, I apologize. I do not subscribe to the Copenhagen interpretation of ethics wherein someone who takes a bad swing at the problem (or takes a swing at a different problem) is more culpable for civilization's failure than someone who never takes a swing at all. Everyone whose plans I discuss below is highly commendable, laudable, and virtuous by my accounting.
Jul 13, 2022 • 8min

"Humans are very reliable agents" by Alyssa Vance

https://www.lesswrong.com/posts/28zsuPaJpKAGSX4zq/humans-are-very-reliable-agents Over the last few years, deep-learning-based AI has progressed extremely rapidly in fields like natural language processing and image generation. However, self-driving cars seem stuck in perpetual beta mode, and aggressive predictions there have repeatedly been disappointing. Google's self-driving project started four years before AlexNet kicked off the deep learning revolution, and it still isn't deployed at large scale, thirteen years later. Why are these fields getting such different results? Right now, I think the biggest answer is that ML benchmarks judge models by average-case performance, while self-driving cars (and many other applications) require matching human worst-case performance. For MNIST, an easy handwriting recognition task, performance tops out at around 99.9% even for top models; it's not very practical to design for or measure higher reliability than that, because the test set is just 10,000 images and a handful are ambiguous. Redwood Research, which is exploring worst-case performance in the context of AI alignment, got reliability rates around 99.997% for their text generation models. By comparison, human drivers are ridiculously reliable. The US has around one traffic fatality per 100 million miles driven; if a human driver makes 100 decisions per mile, that gets you a worst-case reliability of ~1:10,000,000,000 or ~99.999999999%. That's around five orders of magnitude better than a very good deep learning model, and you get that even in an open environment, where data isn't pre-filtered and there are sometimes random mechanical failures. Matching that bar is hard! I'm sure future AI will get there, but each additional "nine" of reliability is typically another unit of engineering effort. (Note that current self-driving systems use a mix of different models embedded in a larger framework, not one model trained end-to-end like GPT-3.)
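A quick sanity check of the arithmetic quoted above, as a minimal Python sketch using only the figures stated in the excerpt (one fatality per 100 million miles, the post's assumed 100 decisions per mile, and Redwood Research's 99.997% reliability figure):

```python
# Back-of-envelope comparison of the reliability figures quoted in the excerpt.
import math

miles_per_fatality = 100_000_000   # ~1 US traffic fatality per 100 million miles driven
decisions_per_mile = 100           # the post's assumed decision rate for a human driver
human_error_rate = 1 / (miles_per_fatality * decisions_per_mile)   # ~1e-10 per decision

model_error_rate = 1 - 0.99997     # Redwood Research's text-generation reliability figure

print(f"human error rate per decision: {human_error_rate:.1e}")   # ~1.0e-10
print(f"model error rate:              {model_error_rate:.1e}")   # ~3.0e-05
print(f"gap in orders of magnitude:    {math.log10(model_error_rate / human_error_rate):.1f}")  # ~5.5
```

The gap comes out to roughly five and a half orders of magnitude, matching the post's "around five orders of magnitude" claim.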
Jul 8, 2022 • 22min

"Looking back on my alignment PhD" by TurnTrout

https://www.lesswrong.com/posts/2GxhAyn9aHqukap2S/looking-back-on-my-alignment-phd The funny thing about long periods of time is that they do, eventually, come to an end. I'm proud of what I accomplished during my PhD. That said, I'm going to first focus on mistakes I've made over the past four[1] years. Mistakes: I think I got significantly smarter in 2018–2019, and kept learning some in 2020–2021. I was significantly less of a fool in 2021 than I was in 2017. That is important and worth feeling good about. But all things considered, I still made a lot of profound mistakes over the course of my PhD. Social dynamics distracted me from my core mission: I focused on "catching up" to other thinkers. I figured this point out by summer 2021. I wanted to be more like Eliezer Yudkowsky and Buck Shlegeris and Paul Christiano. They know lots of facts and laws about lots of areas (e.g. general relativity and thermodynamics and information theory). I focused on building up dependencies (like analysis and geometry and topology) not only because I wanted to know the answers, but because I felt I owed a debt, that I was in the red until I could at least meet other thinkers at their level of knowledge. But rationality is not about the bag of facts you know, nor is it about the concepts you have internalized. Rationality is about how your mind holds itself, it is how you weigh evidence, it is how you decide where to look next when puzzling out a new area. If I had been more honest with myself, I could have nipped the "catching up with other thinkers" mistake in 2018. I could have removed the bad mental habits using certain introspective techniques; or at least been aware of the badness. But I did not, in part because the truth was uncomfortable. If I did not have a clear set of prerequisites (e.g. analysis and topology and game theory) to work on, I would not have a clear and immediate direction of improvement. I would have felt adrift.
Jul 5, 2022 • 1h 12min

"It’s Probably Not Lithium" by Natália Coelho Mendonça

Natália Coelho Mendonça critically examines the Slime Mold Time Mold blog series that links environmental contaminants to the obesity epidemic. She dissects its claims about lithium's role in weight gain and highlights factual inaccuracies in the original series.
Jul 2, 2022 • 10min

"What Are You Tracking In Your Head?" by John Wentworth

https://www.lesswrong.com/posts/bhLxWTkRc8GXunFcB/what-are-you-tracking-in-your-head A large chunk - plausibly the majority - of real-world expertise seems to be in the form of illegible skills: skills/knowledge which are hard to transmit by direct explanation. They’re not necessarily things which a teacher would even notice enough to consider important - just background skills or knowledge which is so ingrained that it becomes invisible. I’ve recently noticed a certain common type of illegible skill which I think might account for the majority of illegible-skill-value across a wide variety of domains. Here are a few examples of the type of skill I have in mind:
Jun 29, 2022 • 14min

"Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment" by elspood

https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security Background: I have been doing red team, blue team (offensive, defensive) computer security for a living since September 2000. The goal of this post is to compile a list of general principles I've learned during this time that are likely relevant to the field of AGI Alignment. If this is useful, I could continue with a broader or deeper exploration. Alignment Won't Happen By Accident: I used to use the phrase when teaching security mindset to software developers that "security doesn't happen by accident." A system that isn't explicitly designed with a security feature is not going to have that security feature. More specifically, a system that isn't designed to be robust against a certain failure mode is going to exhibit that failure mode. This might seem rather obvious when stated explicitly, but this is not the way that most developers, indeed most humans, think. I see a lot of disturbing parallels when I see anyone arguing that AGI won't necessarily be dangerous. An AGI that isn't intentionally designed not to exhibit a particular failure mode is going to have that failure mode. It is certainly possible to get lucky and not trigger it, and it will probably be impossible to enumerate even every category of failure mode, but to have any chance at all we will have to plan in advance for as many failure modes as we can possibly conceive. As a practical enforcement method, I used to ask development teams that every user story have at least three abuser stories to go with it. For any new capability, think at least hard enough about it that you can imagine at least three ways that someone could misuse it. Sometimes this means looking at boundary conditions ("what if someone orders 2^64+1 items?"), sometimes it means looking at forms of invalid input ("what if someone tries to pay -$100, can they get a refund?"), and sometimes it means being aware of particular forms of attack ("what if someone puts Javascript in their order details?").
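As an illustration of the "three abuser stories per user story" habit described above, here is a minimal Python sketch; the capability (placing an order), the function name, and the quantity limit are hypothetical examples rather than anything from the original post, but the three checks mirror the three misuse patterns the excerpt names:

```python
import html

MAX_QUANTITY = 10_000  # an explicit business limit, rather than "whatever fits in an integer"

def place_order(quantity: int, amount_cents: int, details: str) -> str:
    # Abuser story 1: boundary conditions ("what if someone orders 2^64+1 items?")
    if not (0 < quantity <= MAX_QUANTITY):
        raise ValueError(f"quantity must be between 1 and {MAX_QUANTITY}")

    # Abuser story 2: invalid input ("what if someone tries to pay -$100, can they get a refund?")
    if amount_cents <= 0:
        raise ValueError("payment amount must be positive")

    # Abuser story 3: a known attack form ("what if someone puts Javascript in their order details?")
    # Treat the free-text field as data, not code: escape it before it can reach any HTML output.
    safe_details = html.escape(details)

    return f"order accepted: {quantity} items, {amount_cents} cents, notes: {safe_details}"
```

The specific checks matter less than the discipline: every new capability ships only after someone has written down concrete ways it could be misused.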
