LessWrong (Curated & Popular)

LessWrong
Apr 12, 2023 • 6min

"GPTs are Predictors, not Imitators" by Eliezer Yudkowsky

(Related text posted to Twitter; this version is edited and has a more advanced final section.)

Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet.

Koan: Is this a task whose difficulty caps out as human intelligence, or at the intelligence level of the smartest human who wrote any Internet text? What factors make that task easier, or harder? (If you don't have an answer, maybe take a minute to generate one, or alternatively, try to predict what I'll say next; if you do have an answer, take a moment to review it inside your mind, or maybe say the words out loud.)

https://www.lesswrong.com/posts/nH4c3Q9t9F3nJ7y8W/gpts-are-predictors-not-imitators
Apr 5, 2023 • 40min

"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky

https://www.lesswrong.com/posts/iy2o4nQj9DnQD7Yhj/discussion-with-nate-soares-on-a-key-alignment-difficulty

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

In late 2022, Nate Soares gave some feedback on my Cold Takes series on AI risk (shared as drafts at that point), stating that I hadn't discussed what he sees as one of the key difficulties of AI alignment. I wanted to understand the difficulty he was pointing to, so the two of us had an extended Slack exchange, and I then wrote up a summary of the exchange that we iterated on until we were both reasonably happy with its characterization of the difficulty and our disagreement.[1]

My short summary is: Nate thinks there are deep reasons that training an AI to do needle-moving scientific research (including alignment) would be dangerous. The overwhelmingly likely result of such a training attempt (by default, i.e., in the absence of specific countermeasures that there are currently few ideas for) would be the AI taking on a dangerous degree of convergent instrumental subgoals while not internalizing important safety/corrigibility properties enough. I think this is possible, but much less likely than Nate thinks under at least some imaginable training processes.
Apr 5, 2023 • 17min

"A stylized dialogue on John Wentworth's claims about markets and optimization" by Nate Soares

https://www.lesswrong.com/posts/fJBTRa7m7KnCDdzG5/a-stylized-dialogue-on-john-wentworth-s-claims-about-markets

(This is a stylized version of a real conversation, where the first part happened as part of a public debate between John Wentworth and Eliezer Yudkowsky, and the second part happened between John and me over the following morning. The below is combined, stylized, and written in my own voice throughout. The specific concrete examples in John's part of the dialog were produced by me. It's over a year old. Sorry for the lag.)

(As to whether John agrees with this dialog, he said "there was not any point at which I thought my views were importantly misrepresented" when I asked him for comment.)
Apr 5, 2023 • 30min

"Deep Deceptiveness" by Nate Soares

https://www.lesswrong.com/posts/XWwvwytieLtEWaFJX/deep-deceptiveness

This post is an attempt to gesture at a class of AI notkilleveryoneism (alignment) problem that seems to me to go largely unrecognized. E.g., it isn't discussed (or at least I don't recognize it) in the recent plans written up by OpenAI (1, 2), by DeepMind's alignment team, or by Anthropic, and I know of no other acknowledgment of this issue by major labs.

You could think of this as a fragment of my answer to "Where do plans like OpenAI's 'Our Approach to Alignment Research' fail?", as discussed in Rob and Eliezer's challenge for AGI organizations and readers. Note that it would only be a fragment of the reply; there's a lot more to say about why AI alignment is a particularly tricky task to task an AI with. (Some of which Eliezer gestures at in a follow-up to his interview on Bankless.)
Mar 28, 2023 • 7min

"The Onion Test for Personal and Institutional Honesty" by Chana Messinger & Andrew Critch

https://www.lesswrong.com/posts/nTGEeRSZrfPiJwkEc/the-onion-test-for-personal-and-institutional-honesty

[Co-written by Chana Messinger and Andrew Critch; Andrew is the originator of the idea.]

You (or your organization or your mission or your family or etc.) pass the "onion test" for honesty if each layer hides but does not mislead about the information hidden within.

When people get to know you better, or rise higher in your organization, they may find out new things, but should not be shocked by the types of information that were hidden. If they are, you messed up in creating the outer layers to describe appropriately the kind-of-thing that might be inside.

Examples

Positive example: Outer layer says "I usually treat my health information as private." Next layer in says: "Here are the specific health problems I have: gout, diabetes."

Negative example: Outer layer says: "I usually treat my health info as private." Next layer in: "I operate a cocaine dealership. Sorry I didn't warn you that I was also private about my illegal activities."
Mar 28, 2023 • 24min

"Losing the root for the tree" by Adam Zerner

https://www.lesswrong.com/posts/ma7FSEtumkve8czGF/losing-the-root-for-the-tree

You know that being healthy is important. And that there's a lot of stuff you could do to improve your health: getting enough sleep, eating well, reducing stress, and exercising, to name a few.

There are various things to hit on when it comes to exercising too. Strength, obviously. But explosiveness is a separate thing that you have to train for. Same with flexibility. And don't forget cardio!

Strength is most important though, because of course it is. And there are various things you need to do to gain strength. It all starts with lifting, but rest matters too. And supplements. And protein. Can't forget about protein.

Protein is a deeper and more complicated subject than it may at first seem. Sure, the amount of protein you consume matters, but that's not the only consideration. You also have to think about the timing. Consuming large amounts 2x a day is different than consuming smaller amounts 5x a day. And the type of protein matters too. Animal is different than plant, which is different from dairy. And then quality is of course another thing that is important.

But quality isn't an easy thing to figure out. The big protein supplement companies are Out To Get You. They want to mislead you. Information sources aren't always trustworthy. You can't just hop on The Wirecutter and do what they tell you. Research is needed.

So you listen to a few podcasts. Follow a few YouTubers. Start reading some blogs. Throughout all of this you try various products and iterate as you learn more. You're no Joe Rogan, but you're starting to become pretty informed.
Mar 28, 2023 • 19min

"There’s no such thing as a tree (phylogenetically)" by Eukaryote

https://www.lesswrong.com/posts/fRwdkop6tyhi3d22L/there-s-no-such-thing-as-a-tree-phylogenetically

This is a linkpost for https://eukaryotewritesblog.com/2021/05/02/theres-no-such-thing-as-a-tree/

[Crossposted from Eukaryote Writes Blog.]

So you've heard about how fish aren't a monophyletic group? You've heard about carcinization, the process by which ocean arthropods convergently evolve into crabs? You say you get it now? Sit down. Sit down. Shut up. Listen. You don't know nothing yet.

"Trees" are not a coherent phylogenetic category. On the evolutionary tree of plants, trees are regularly interspersed with things that are absolutely, 100% not trees. This means that, for instance, either:

The common ancestor of a maple and a mulberry tree was not a tree.
The common ancestor of a stinging nettle and a strawberry plant was a tree.

And this is true for most trees or non-trees that you can think of. I thought I had a pretty good guess at this, but the situation is far worse than I could have imagined.
Mar 28, 2023 • 18min

"What failure looks like" by Paul Christiano

https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity. I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I'll tell the story in two parts:

Part I: machine learning will increase our ability to "get what we can measure," which could cause a slow-rolling catastrophe. ("Going out with a whimper.")

Part II: ML training, like competitive economies or natural ecosystems, can give rise to "greedy" patterns that try to expand their own influence. Such patterns can ultimately dominate the behavior of a system and cause sudden breakdowns. ("Going out with a bang," an instance of optimization daemons.)

I think these are the most important problems if we fail to solve intent alignment. In practice these problems will interact with each other, and with other disruptions/instability caused by rapid progress. These problems are worse in worlds where progress is relatively fast, and fast takeoff can be a key risk factor, but I'm scared even if we have several years.
Mar 28, 2023 • 29min

"Lies, Damn Lies, and Fabricated Options" by Duncan Sabien

https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options

This is an essay about one of those "once you see it, you will see it everywhere" phenomena. It is a psychological and interpersonal dynamic roughly as common, and almost as destructive, as motte-and-bailey, and at least in my own personal experience it's been quite valuable to have it reified, so that I can quickly recognize the commonality between what I had previously thought of as completely unrelated situations.

The original quote referenced in the title is "There are three kinds of lies: lies, damned lies, and statistics."
Mar 28, 2023 • 1h 26min

"Why I think strong general AI is coming soon" by Porby

https://www.lesswrong.com/posts/K4urTDkBbtNuLivJx/why-i-think-strong-general-ai-is-coming-soon

I think there is little time left before someone builds AGI (median ~2030). Once upon a time, I didn't think this. This post attempts to walk through some of the observations and insights that collapsed my estimates.

The core ideas are as follows:

We've already captured way too much of intelligence with way too little effort.
Everything points towards us capturing way more of intelligence with very little additional effort.
Trying to create a self-consistent world
