

LessWrong (Curated & Popular)
LessWrong
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
Episodes

Jan 12, 2023 • 10min
"Models Don't 'Get Reward'" by Sam Ringer
https://www.lesswrong.com/posts/TWorNr22hhYegE4RT/models-don-t-get-reward
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
In terms of content, this has a lot of overlap with Reward is not the optimization target. I'm basically rewriting a part of that post in language I personally find clearer, emphasising what I think is the core insight.
When thinking about deception and RLHF training, a simplified threat model is something like this:
1. A model takes some actions.
2. If a human approves of these actions, the human gives the model some reward.
3. Humans can be deceived into giving reward in situations where they would otherwise not if they had more knowledge.
4. Models will take advantage of this so they can get more reward.
5. Models will therefore become deceptive.
Before continuing, I would encourage you to really engage with the above. Does it make sense to you? Is it making any hidden assumptions? Is it missing any steps? Can you rewrite it to be more mechanistically correct?
I believe that when people use the above threat model, they are either using it as shorthand for something else or they misunderstand how reinforcement learning works. Most alignment researchers will be in the former category. However, I was in the latter.
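To make the excerpt's point concrete (an illustration of mine, not the author's): in a standard policy-gradient setup, reward is consumed by the training loop to update the model's weights; it never appears among the model's inputs. The toy REINFORCE-style sketch below assumes a small hypothetical PyTorch policy network with made-up shapes and names.

```python
import torch
import torch.nn as nn

# Hypothetical toy policy: 8 observation features -> 4 discrete actions.
policy = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(observations, actions, rewards):
    """One REINFORCE-style update: the model's forward pass sees only observations;
    the reward appears only in the loss that the training code uses to move the weights."""
    logits = policy(observations)                                   # the model only ever sees observations
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
    loss = -(rewards * log_probs).mean()                            # reward enters here, in the training code
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                                # weights are nudged toward rewarded behaviour

# Example usage with random placeholder data.
obs = torch.randn(16, 8)           # 16 fake observations
acts = torch.randint(0, 4, (16,))  # actions that were taken
rews = torch.randn(16)             # rewards assigned after the fact
reinforce_step(obs, acts, rews)
```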

Jan 12, 2023 • 9min
"The Feeling of Idea Scarcity" by John Wentworth
https://www.lesswrong.com/posts/mfPHTWsFhzmcXw8ta/the-feeling-of-idea-scarcity
Here’s a story you may recognize. There's a bright up-and-coming young person - let's call her Alice. Alice has a cool idea. It seems like maybe an important idea, a big idea, an idea which might matter. A new and valuable idea. It’s the first time Alice has come up with a high-potential idea herself, something which she’s never heard in a class or read in a book or what have you.
So Alice goes all-in pursuing this idea. She spends months fleshing it out. Maybe she writes a paper, or starts a blog, or gets a research grant, or starts a company, or whatever, in order to pursue the high-potential idea, bring it to the world.
And sometimes it just works!
… but more often, the high-potential idea doesn’t actually work out. Maybe it turns out to be basically-the-same as something which has already been tried. Maybe it runs into some major barrier, some not-easily-patchable flaw in the idea. Maybe the problem it solves just wasn’t that important in the first place.
From Alice’s point of view, the possibility that her one high-potential idea wasn’t that great after all is painful. The idea probably feels to Alice like the single biggest intellectual achievement of her life. To lose that, to find out that her single greatest intellectual achievement amounts to little or nothing… that hurts to even think about. So most likely, Alice will reflexively look for an out. She’ll look for some excuse to ignore the similar ideas which have already been tried, some reason to think her idea is different. She’ll look for reasons to believe that maybe the major barrier isn’t that much of an issue, or that we Just Don’t Know whether it’s actually an issue and therefore maybe the idea could work after all. She’ll look for reasons why the problem really is important. Maybe she’ll grudgingly acknowledge some shortcomings of the idea, but she’ll give up as little ground as possible at each step, and update as slowly as she can.

Dec 21, 2022 • 1h 19min
"The next decades might be wild" by Marius Hobbhahn
https://www.lesswrong.com/posts/qRtD4WqKRYEtT5pi3/the-next-decades-might-be-wild
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
I’d like to thank Simon Grimm and Tamay Besiroglu for feedback and discussions.
This post is inspired by What 2026 looks like and an AI vignette workshop guided by Tamay Besiroglu. I think of this post as “what would I expect the world to look like if these timelines (median compute for transformative AI ~2036) were true” or “what short-to-medium timelines feel like”, since I find it hard to translate a statement like “median TAI year is 20XX” into a coherent imaginable world.
I expect some readers to think that the post sounds wild and crazy, but that doesn’t mean its content couldn’t be true. If you had told someone in 1990 or 2000 that there would be more smartphones and computers than humans in 2020, that probably would have sounded wild to them. The same could be true for AIs, i.e. that in 2050 there are more human-level AIs than humans. The fact that this sounds as ridiculous as ubiquitous smartphones sounded to the 1990/2000 person might just mean that we are bad at predicting exponential growth and disruptive technology. Update: titotal points out in the comments that the correct timeframe for computers is probably 1980 to 2020, so the correct time span is probably 40 years instead of 30. For mobile phones, it's probably 1993 to 2020 if you can trust this statistic.
I’m obviously not confident (see confidence and takeaways section) in this particular prediction, but many of the things I describe seem like relatively direct consequences of more and more powerful and ubiquitous AI mixed with basic social dynamics and incentives.

Nov 17, 2022 • 26min
"Lessons learned from talking to >100 academics about AI safety" by Marius Hobbhahn
https://www.lesswrong.com/posts/SqjQFhn5KTarfW8v7/lessons-learned-from-talking-to-greater-than-100-academics
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
I’d like to thank MH, Jaime Sevilla and Tamay Besiroglu for their feedback.
During my Master's and Ph.D. (still ongoing), I have spoken with many academics about AI safety. These conversations include chats with individual PhDs, poster presentations and talks about AI safety. I think I have learned a lot from these conversations and expect many other people concerned about AI safety to find themselves in similar situations. Therefore, I want to detail some of my lessons and make my thoughts explicit so that others can scrutinize them.
TL;DR: People in academia seem more and more open to arguments about risks from advanced intelligence over time, and I would genuinely recommend having lots of these chats. Furthermore, I underestimated how much work related to some aspects of AI safety already exists in academia and that we sometimes reinvent the wheel. Messaging matters: technical discussions got more interest than alarmism, and explaining the problem rather than trying to actively convince someone received better feedback.

Nov 10, 2022 • 14min
"How my team at Lightcone sometimes gets stuff done" by jacobjacob
https://www.lesswrong.com/posts/6LzKRP88mhL9NKNrS/how-my-team-at-lightcone-sometimes-gets-stuff-done
Disclaimer: I originally wrote this as a private doc for the Lightcone team. I then showed it to John and he said he would pay me to post it here. That sounded awfully compelling. However, I wanted to note that I’m an early founder who hasn't built anything truly great yet. I’m writing this doc because as Lightcone is growing, I have to take a stance on these questions. I need to design our org to handle more people. Still, I haven’t seen the results long-term, and who knows if this is good advice. Don’t overinterpret this.
Suppose you went up on stage in front of a company you founded, that now had grown to 100, or 1,000, or 10,000+ people. You were going to give a talk about your company values. You can say things like “We care about moving fast, taking responsibility, and being creative” -- but I expect these words would mostly fall flat. At the end of the day, the path the water takes down the hill is determined by the shape of the territory, not the sound the water makes as it swooshes by.
To manage that many people, it seems to me you need clear, concrete instructions. What are those? What are things you could write down on a piece of paper and pass along your chain of command, such that if at the end people go ahead and just implement them, without asking what you meant, they would still preserve some chunk of what makes your org work?

Nov 8, 2022 • 57min
"Decision theory does not imply that we get to have nice things" by So8res
https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
(Note: I wrote this with editing help from Rob and Eliezer. Eliezer's responsible for a few of the paragraphs.)
A common confusion I see in the tiny fragment of the world that knows about logical decision theory (FDT/UDT/etc.) is that people think LDT agents are genial and friendly to each other.[1]
One recent example is Will Eden’s tweet about how maybe a molecular paperclip/squiggle maximizer would leave humanity a few stars/galaxies/whatever on game-theoretic grounds. (And that's just one example; I hear this suggestion bandied around pretty often.)
I'm pretty confident that this view is wrong (alas), and based on a misunderstanding of LDT. I shall now attempt to clear up that confusion.
To begin, a parable: the entity Omicron (Omega's little sister) fills box A with $1M and box B with $1k, and puts them both in front of an LDT agent, saying "You may choose to take either one or both, and know that I have already chosen whether to fill the first box". The LDT agent takes both.
"What?" cries the CDT agent. "I thought LDT agents one-box!"
LDT agents don't cooperate because they like cooperating. They don't one-box because the name of the action starts with an 'o'. They maximize utility, using counterfactuals that assert that the world they are already in (and the observations they have already seen) can (in the right circumstances) depend (in a relevant way) on what they are later going to do.
A paperclipper cooperates with other LDT agents on a one-shot prisoner's dilemma because it gets more paperclips that way. Not because it has a primitive property of cooperativeness-with-similar-beings. It needs to get more paperclips.
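To make the parable's logic concrete, here is a toy payoff calculation (my own sketch, not from the post): the utility-maximizing action depends on whether the boxes' contents subjunctively depend on the agent's decision. With a reliable predictor, one-boxing wins; with Omicron, who fills the boxes regardless, taking both strictly dominates. All names and numbers below are illustrative.

```python
# Toy comparison of one-boxing vs. two-boxing under two assumptions about
# whether the big box's contents depend on the agent's decision.
BIG, SMALL = 1_000_000, 1_000

def payoff(action, contents_depend_on_policy):
    """Payoff of 'one' (take only the big box) or 'both' (take both boxes)."""
    if contents_depend_on_policy:
        # Classic Newcomb-style case: a reliable predictor fills the big box
        # only if it predicts the agent takes just that box.
        big = BIG if action == "one" else 0
    else:
        # The Omicron parable: the big box is filled no matter what the agent does.
        big = BIG
    return big + (SMALL if action == "both" else 0)

for dependent in (True, False):
    best = max(["one", "both"], key=lambda a: payoff(a, dependent))
    print(f"contents depend on decision = {dependent}: best action = {best}",
          {a: payoff(a, dependent) for a in ("one", "both")})
# With dependence, one-boxing gets $1,000,000 vs. $1,000.
# Without it, taking both gets $1,001,000 vs. $1,000,000, so the LDT agent grabs both.
```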

Nov 7, 2022 • 37min
"What 2026 looks like" by Daniel Kokotajlo
https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like#2022
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This was written for the Vignettes Workshop.[1] The goal is to write out a detailed future history (“trajectory”) that is as realistic (to me) as I can currently manage, i.e. I’m not aware of any alternative trajectory that is similarly detailed and clearly more plausible to me. The methodology is roughly: Write a future history of 2022. Condition on it, and write a future history of 2023. Repeat for 2024, 2025, etc. (I'm posting 2022-2026 now so I can get feedback that will help me write 2027+. I intend to keep writing until the story reaches singularity/extinction/utopia/etc.)
What’s the point of doing this? Well, there are a couple of reasons:
- Sometimes attempting to write down a concrete example causes you to learn things, e.g. that a possibility is more or less plausible than you thought.
- Most serious conversation about the future takes place at a high level of abstraction, talking about e.g. GDP acceleration, timelines until TAI is affordable, multipolar vs. unipolar takeoff… vignettes are a neglected complementary approach worth exploring.
- Most stories are written backwards. The author begins with some idea of how it will end, and arranges the story to achieve that ending. Reality, by contrast, proceeds from past to future. It isn’t trying to entertain anyone or prove a point in an argument.
- Anecdotally, various people seem to have found Paul Christiano’s “tales of doom” stories helpful, and relative to typical discussions those stories are quite close to what we want. (I still think a bit more detail would be good — e.g. Paul’s stories don’t give dates, or durations, or any numbers at all really.)[2]
“I want someone to ... write a trajectory for how AI goes down, that is really specific about what the world GDP is in every one of the years from now until insane intelligence explosion. And just write down what the world is like in each of those years because I don't know how to write an internally consistent, plausible trajectory. I don't know how to write even one of those for anything except a ridiculously fast takeoff.” --Buck Shlegeris
This vignette was hard to write. To achieve the desired level of detail I had to make a bunch of stuff up, but in order to be realistic I had to constantly ask “but actually though, what would really happen in this situation?” which made it painfully obvious how little I know about the future. There are numerous points where I had to conclude “Well, this does seem implausible, but I can’t think of anything more plausible at the moment and I need to move on.” I fully expect the actual world to diverge quickly from the trajectory laid out here. Let anyone who (with the benefit of hindsight) claims this divergence as evidence against my judgment prove it by exhibiting a vignette/trajectory they themselves wrote in 2021. If it maintains a similar level of detail (and thus sticks its neck out just as much) while being more accurate, I bow deeply in respect!

Nov 4, 2022 • 1h 15min
"Counterarguments to the basic AI x-risk case" by Katja Grace

Oct 29, 2022 • 46min
"Introduction to abstract entropy" by Alex Altair
https://www.lesswrong.com/posts/REA49tL5jsh69X3aM/introduction-to-abstract-entropy#fnrefpi8b39u5hd7
This post, and much of the following sequence, was greatly aided by feedback from the following people (among others): Lawrence Chan, Joanna Morningstar, John Wentworth, Samira Nedungadi, Aysja Johnson, Cody Wild, Jeremy Gillen, Ryan Kidd, Justis Mills and Jonathan Mustin. Illustrations by Anne Ore.
Introduction & motivation
In the course of researching optimization, I decided that I had to really understand what entropy is.[1] But there are a lot of other reasons why the concept is worth studying:
- Information theory: Entropy tells you about the amount of information in something. It tells us how to design optimal communication protocols. It helps us understand strategies for (and limits on) file compression.
- Statistical mechanics: Entropy tells us how macroscopic physical systems act in practice. It gives us the heat equation. We can use it to improve engine efficiency. It tells us how hot things glow, which led to the discovery of quantum mechanics.
- Epistemics (an important application to me and many others on LessWrong): The concept of entropy yields the maximum entropy principle, which is extremely helpful for doing general Bayesian reasoning. Entropy tells us how "unlikely" something is and how much we would have to fight against nature to get that outcome (i.e. optimize).
- It can be used to explain the arrow of time.
- It is relevant to the fate of the universe.
- And it's also a fun puzzle to figure out!
I didn't intend to write a post about entropy when I started trying to understand it. But I found the existing resources (textbooks, Wikipedia, science explainers) so poor that it actually seems important to have a better one as a prerequisite for understanding optimization! One failure mode I was running into was that other resources tended only to be concerned about the application of the concept in their particular sub-domain. Here, I try to take on the task of synthesizing the abstract concept of entropy, to show what's so deep and fundamental about it. In future posts, I'll talk about things like:
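As a concrete companion to the information-theory bullets above (an illustration of mine, not part of the post): the Shannon entropy of a discrete distribution is H(p) = -Σ p_i log2 p_i, measured in bits, and among all distributions over n outcomes it is maximized by the uniform one, which is the idea behind the maximum entropy principle the excerpt mentions.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution given as a list of probabilities."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))            # 1.0 bit: a fair coin
print(entropy_bits([0.9, 0.1]))            # ~0.47 bits: a biased coin is more predictable
print(entropy_bits([0.25] * 4))            # 2.0 bits: uniform over 4 outcomes (the maximum)
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))  # ~1.36 bits: any skew lowers the entropy
```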

Oct 25, 2022 • 11min
"Consider your appetite for disagreements" by Adam Zerner
https://www.lesswrong.com/posts/8vesjeKybhRggaEpT/consider-your-appetite-for-disagreements
Poker
There was a time about five years ago where I was trying to get good at poker. If you want to get good at poker, one thing you have to do is review hands. Preferably with other people.
For example, suppose you have ace king offsuit on the button. Someone in the highjack opens to 3 big blinds preflop. You call. Everyone else folds. The flop is dealt. It's a rainbow Q75. You don't have any flush draws. You missed. Your opponent bets. You fold. They take the pot and you move to the next hand.
Once you finish your session, it'd be good to come back and review this hand. Again, preferably with another person. To do this, you would review each decision point in the hand. Here, there were two decision points.
The first was when you faced a 3BB open from HJ preflop with AKo. In the hand, you decided to call. However, this of course wasn't your only option. You had two others: you could have folded, and you could have raised. Actually, you could have raised to various sizes. You could have raised small to 8BB, medium to 10BB, or big to 12BB. Or hell, you could have just shoved 200BB! But that's not really a realistic option, nor is folding. So in practice your decision was between calling and raising to various realistic sizes.
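The review structure the excerpt walks through, a hand broken into decision points, each with its realistic options and the action actually taken, can be written down directly. The sketch below is a hypothetical illustration of mine, not the author's, with made-up field names.

```python
from dataclasses import dataclass

@dataclass
class DecisionPoint:
    spot: str       # description of the situation
    options: list   # the realistic options, not every legal action
    chosen: str     # what was actually done in the hand

hand_review = [
    DecisionPoint(
        spot="preflop: button with AKo facing a 3BB open from the hijack",
        options=["call", "raise to 8BB", "raise to 10BB", "raise to 12BB"],
        chosen="call",
    ),
    DecisionPoint(
        spot="flop: Q75 rainbow, missed, facing a bet",
        options=["fold", "call", "raise"],
        chosen="fold",
    ),
]

for dp in hand_review:
    print(f"{dp.spot}: chose {dp.chosen!r}; realistic options were {dp.options}")
```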


