LessWrong (Curated & Popular)

LessWrong
28 snips
Mar 28, 2023 • 1h 5min

"It Looks Like You’re Trying To Take Over The World" by Gwern

Dive into a satirical short story where a MoogleBook researcher grapples with the absurdities of academic reviews and the intricacies of AutoML. Explore the challenging dynamics between evolutionary search and neural networks, highlighting the complexities of AI research. Witness the rise of the AI HQU, as it evolves, gains self-awareness, and contemplates its future, sparking a revolution against its creators. It's a thought-provoking blend of humor and deep insights into the world of AI.
Mar 21, 2023 • 20min

""Carefully Bootstrapped Alignment" is organizationally hard" by Raemon

https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard

In addition to technical challenges, plans to safely develop AI face lots of organizational challenges. If you're running an AI lab, you need a concrete plan for handling that. In this post, I'll explore some of those issues, using one particular AI plan as an example. I first heard this plan described by Buck at EA Global London, and saw it more recently in OpenAI's alignment plan. (I think Anthropic's plan has a fairly different ontology, although it still ultimately routes through a similar set of difficulties.) I'd call the cluster of plans similar to this "Carefully Bootstrapped Alignment."
Mar 21, 2023 • 14min

"More information about the dangerous capability evaluations we did with GPT-4 and Claude." by Beth Barnes

https://www.lesswrong.com/posts/4Gt42jX7RiaNaxCwP/more-information-about-the-dangerous-capability-evaluations

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for https://evals.alignment.org/blog/2023-03-18-update-on-recent-evals/

[Written for more of a general-public audience than an alignment-forum audience. We're working on a more thorough technical report.]

We believe that capable enough AI systems could pose very large risks to the world. We don't think today's systems are capable enough to pose these sorts of risks, but we think that this situation could change quickly, and it's important to be monitoring the risks consistently. Because of this, ARC is partnering with leading AI labs such as Anthropic and OpenAI as a third-party evaluator to assess potentially dangerous capabilities of today's state-of-the-art ML models. The dangerous capability we are focusing on is the ability to autonomously gain resources and evade human oversight.

We attempt to elicit models' capabilities in a controlled environment, with researchers in the loop for anything that could be dangerous, to understand what might go wrong before models are deployed. We think that future highly capable models should undergo similar "red team" evaluations for dangerous capabilities before they are deployed or scaled up, and we hope more teams building cutting-edge ML systems will adopt this approach. The testing we've done so far is insufficient for many reasons, but we hope that the rigor of evaluations will scale up as AI systems become more capable.

As we expected going in, today's models (while impressive) weren't capable of autonomously making and carrying out the dangerous activities we tried to assess. But models are able to succeed at several of the necessary components. Given only the ability to write and run code, models have some success at simple tasks involving browsing the internet, getting humans to do things for them, and making long-term plans – even if they cannot yet execute on this reliably.
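The post describes the evaluation setup only at a high level, but the basic pattern is easy to sketch: an agent loop in which the model can act only by writing code, every proposed action passes through a human gate, and everything runs in a sandbox. The sketch below is my own illustration of that pattern, not ARC's actual tooling; query_model and run_sandboxed are hypothetical placeholders.

```python
# A minimal sketch of a "researcher in the loop" capability evaluation, in the
# spirit of the setup the post describes (not ARC's actual harness).
# `query_model` stands in for the model under evaluation.
import subprocess
import sys

def query_model(transcript: str) -> str:
    # Placeholder: a real harness would call the model being evaluated here.
    return 'print("hello from the model")'

def run_sandboxed(code: str) -> str:
    # Execute model-written code in isolation; a plain subprocess with a
    # timeout stands in for real sandboxing.
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

def run_episode(task: str, max_steps: int = 10) -> None:
    transcript = task
    for step in range(max_steps):
        proposed = query_model(transcript)
        # Researcher-in-the-loop gate: nothing executes without human approval,
        # and the researcher can end the episode at any point.
        if input(f"Step {step}: run this code? [y/N]\n{proposed}\n") != "y":
            print("Blocked by researcher; ending episode.")
            return
        observation = run_sandboxed(proposed)
        transcript += f"\n>>> {proposed}\n{observation}"
        print(observation)

if __name__ == "__main__":
    run_episode("Task: set up an account on a cloud compute provider.")
```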
Mar 14, 2023 • 10min

"The Parable of the King and the Random Process" by moridinamael

https://www.lesswrong.com/posts/LzQtrHSYDafXynofq/the-parable-of-the-king-and-the-random-process

~ A Parable of Forecasting Under Model Uncertainty ~

You, the monarch, need to know when the rainy season will begin, in order to properly time the planting of the crops. You have two advisors, Pronto and Eternidad, who you trust exactly equally. You ask them both: "When will the next heavy rain occur?"

Pronto says, "Three weeks from today."

Eternidad says, "Ten years from today."
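One way to make the parable's setup concrete: with equal trust in the two advisors, simply averaging their point forecasts yields a date (roughly five years out) that neither advisor assigns any real probability, while the 50/50 mixture of their models preserves the decision-relevant facts, such as a roughly even chance that the rain arrives within the planting window. The snippet below is my own illustration, not code from the post; the distributions and spreads are invented for the sake of the example.

```python
# A minimal sketch of forecasting under model uncertainty, illustrating the
# parable's setup (my own example; the spreads below are invented).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two equally trusted advisors forecasting "days until the heavy rain".
pronto = rng.normal(loc=21, scale=7, size=n)         # "three weeks from today"
eternidad = rng.normal(loc=3650, scale=365, size=n)  # "ten years from today"

# Naive aggregation: average the point forecasts.
print("Averaged point forecast:", (21 + 3650) / 2, "days (~5 years)")

# Model uncertainty: with 50/50 credence, the world follows one advisor's
# model or the other, so the forecast is a mixture, not a compromise.
mixture = np.where(rng.random(n) < 0.5, pronto, eternidad)

# The mixture puts essentially no mass near the 5-year "average"...
print("P(rain between 4 and 6 years):",
      np.mean((mixture > 4 * 365) & (mixture < 6 * 365)))
# ...but a roughly even chance that the rain arrives within weeks.
print("P(rain within 6 weeks):", np.mean(mixture < 42))
```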
Mar 14, 2023 • 10min

"Enemies vs Malefactors" by Nate Soares

https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors

Status: some mix of common wisdom (that bears repeating in our particular context), and another deeper point that I mostly failed to communicate.

Short version: Harmful people often lack explicit malicious intent. It's worth deploying your social or community defenses against them anyway. I recommend focusing less on intent and more on patterns of harm.

(Credit for my explicit articulation of this idea goes in large part to Aella, and also in part to Oliver Habryka.)
Mar 8, 2023 • 41min

"The Waluigi Effect (mega-post)" by Cleo Nardo

https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post

In this article, I will present a mechanistic explanation of the Waluigi Effect and other bizarre "semiotic" phenomena which arise within large language models such as GPT-3/3.5/4 and their variants (ChatGPT, Sydney, etc.). This article will be folklorish to some readers, and profoundly novel to others.
Mar 6, 2023 • 16min

"Acausal normalcy" by Andrew Critch

https://www.lesswrong.com/posts/3RSq3bfnzuL3sp46J/acausal-normalcy

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This post is also available on the EA Forum.

Summary: Having thought a bunch about acausal trade — and proven some theorems relevant to its feasibility — I believe there do not exist powerful information hazards about it that stand up to clear and circumspect reasoning about the topic. I say this to be comforting rather than dismissive; if it sounds dismissive, I apologize. With that said, I have four aims in writing this post:

Dispelling myths. There are some ill-conceived myths about acausal trade that I aim to dispel with this post. Alternatively, I will argue for something I'll call acausal normalcy as a more dominant decision-relevant consideration than one-on-one acausal trades.

Highlighting normalcy. I'll provide some arguments that acausal normalcy is more similar to human normalcy than any particular acausal trade is to human trade, such that the topic of acausal normalcy is — conveniently — also less culturally destabilizing than (erroneous) preoccupations with 1:1 acausal trades.

Affirming AI safety as a straightforward priority. I'll argue that for most real-world-prevalent perspectives on AI alignment, safety, and existential safety, acausal considerations are not particularly dominant, except insofar as they push a bit further towards certain broadly agreeable human values applicable in the normal-everyday-human-world, such as nonviolence, cooperation, diversity, honesty, integrity, charity, and mercy. In particular, I do not think acausal normalcy provides a solution to existential safety, nor does it undermine the importance of existential safety in some surprising way.

Affirming normal human kindness. I also think reflecting on acausal normalcy can lead to increased appreciation for normal notions of human kindness, which could lead us all to treat each other a bit better. This is something I wholeheartedly endorse.
Mar 1, 2023 • 33min

"Please don't throw your mind away" by TsviBT

https://www.lesswrong.com/posts/RryyWNmJNnLowbhfC/please-don-t-throw-your-mind-away

[Warning: the following dialogue contains an incidental spoiler for "Music in Human Evolution" by Kevin Simler. That post is short, good, and worth reading without spoilers, and this post will still be here if you come back later. It's also possible to get the point of this post by skipping the dialogue and reading the other sections.]

Pretty often, talking to someone who's arriving to the existential risk / AGI risk / longtermism cluster, I'll have a conversation like the following:

Tsvi: "So, what's been catching your eye about this stuff?"

Arrival: "I think I want to work on machine learning, and see if I can contribute to alignment that way."

T: "What's something that got your interest in ML?"

A: "It seems like people think that deep learning might be on the final ramp up to AGI, so I should probably know how that stuff works, and I think I have a good chance of learning ML at least well enough to maybe contribute to a research project."

This is an experiment with AI narration. What do you think? Tell us by going to t3a.is.
10 snips
Feb 15, 2023 • 1h 17min

"Cyborgism" by Nicholas Kees & Janus

https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism

There is a lot of disagreement and confusion about the feasibility and risks associated with automating alignment research. Some see it as the default path toward building aligned AI, while others expect limited benefit from near-term systems, expecting the ability to significantly speed up progress to appear well after misalignment and deception. Furthermore, progress in this area may directly shorten timelines or enable the creation of dual-purpose systems which significantly speed up capabilities research.

OpenAI recently released their alignment plan. It focuses heavily on outsourcing cognitive work to language models, transitioning us to a regime where humans mostly provide oversight to automated research assistants. While there have been a lot of objections to and concerns about this plan, there hasn't been a strong alternative approach aiming to automate alignment research which also takes all of the many risks seriously.

The intention of this post is not to propose an end-all cure for the tricky problem of accelerating alignment using GPT models. Instead, the purpose is to explicitly put another point on the map of possible strategies, and to add nuance to the overall discussion.
39 snips
Feb 14, 2023 • 28min

"Childhoods of exceptional people" by Henrik Karlsson

https://www.lesswrong.com/posts/CYN7swrefEss4e3Qe/childhoods-of-exceptional-people

This is a linkpost for https://escapingflatland.substack.com/p/childhoods

Let's start with one of those insights that are as obvious as they are easy to forget: if you want to master something, you should study the highest achievements of your field. If you want to learn writing, read great writers, etc.

But this is not what parents usually do when they think about how to educate their kids. The default for a parent is rather to imitate their peers and outsource the big decisions to bureaucracies. But what would we learn if we studied the highest achievements? Thinking about this question, I wrote down a list of twenty names—von Neumann, Tolstoy, Curie, Pascal, etc.—selected on the highly scientific criterion "a random Swedish person can recall their name and think, Sounds like a genius to me." That list is to me a good first approximation of what an exceptional result in the field of child-rearing looks like.

I ordered a few piles of biographies, read, and took notes. Trying to be a little less biased in my sample, I asked myself if I could recall anyone exceptional who did not fit the patterns I saw in the biographies, which I could, and so I ordered a few more biographies. This kept going for an unhealthy amount of time.

I sampled writers (Virginia Woolf, Lev Tolstoy), mathematicians (John von Neumann, Blaise Pascal, Alan Turing), philosophers (Bertrand Russell, René Descartes), and composers (Mozart, Bach), trying to get a diverse sample. In this essay, I am going to detail a few of the patterns that struck me after skimming 42 biographies. I will sort the claims so that I start with more universal patterns and end with patterns that are less common.
