LessWrong (Curated & Popular)

LessWrong
Feb 13, 2023 • 5min

"What I mean by "alignment is in large part about making cognition aimable at all"" by Nate Soares

https://www.lesswrong.com/posts/NJYmovr9ZZAyyTBwM/what-i-mean-by-alignment-is-in-large-part-about-making

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(Epistemic status: attempting to clear up a misunderstanding about points I have attempted to make in the past. This post is not intended as an argument for those points.)

I have long said that the lion's share of the AI alignment problem seems to me to be about pointing powerful cognition at anything at all, rather than figuring out what to point it at.

It's recently come to my attention that some people have misunderstood this point, so I'll attempt to clarify here.
Feb 10, 2023 • 1h 13min

"On not getting contaminated by the wrong obesity ideas" by Natália Coelho Mendonça

https://www.lesswrong.com/posts/NRrbJJWnaSorrqvtZ/on-not-getting-contaminated-by-the-wrong-obesity-ideas

A Chemical Hunger (a), a series by the authors of the blog Slime Mold Time Mold (SMTM), argues that the obesity epidemic is entirely caused (a) by environmental contaminants. In my last post, I investigated SMTM's main suspect (lithium).[1] This post collects other observations I have made about SMTM's work, not narrowly related to lithium, but rather focused on the broader thesis of their blog post series.

I think that the environmental contamination hypothesis of the obesity epidemic is a priori plausible. After all, we know that chemicals can affect humans, and our exposure to chemicals has plausibly changed a lot over time. However, I found that several of what seem to be SMTM's strongest arguments in favor of the contamination theory turned out to be dubious, and that nearly all of the interesting things I thought I'd learned from their blog posts turned out to actually be wrong. I'll explain that in this post.
Feb 8, 2023 • 34min

"SolidGoldMagikarp (plus, prompt generation)"

https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation

Work done at SERI-MATS, over the past two months, by Jessica Rumbelow and Matthew Watkins.

TL;DR

Anomalous tokens: a mysterious failure mode for GPT (which reliably insulted Matthew)
- We have found a set of anomalous tokens which result in a previously undocumented failure mode for GPT-2 and GPT-3 models. (The 'instruct' models "are particularly deranged" in this context, as janus has observed.)
- Many of these tokens reliably break determinism in the OpenAI GPT-3 playground at temperature 0 (which theoretically shouldn't happen).

Prompt generation: a new interpretability method for language models (which reliably finds prompts that result in a target completion). This is good for:
- eliciting knowledge
- generating adversarial inputs
- automating prompt search (e.g. for fine-tuning)

In this post, we'll introduce the prototype of a new model-agnostic interpretability method for language models which reliably generates adversarial prompts that result in a target completion. We'll also demonstrate a previously undocumented failure mode for GPT-2 and GPT-3 language models, which results in bizarre completions (in some cases explicitly contrary to the purpose of the model), and present the results of our investigation into this phenomenon. Further detail can be found in a follow-up post.
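To make the "anomalous tokens" idea more concrete, here is a rough, hypothetical sketch (not the authors' code; the model choice and the top-20 cutoff are assumptions) of one way to surface candidate tokens: list the GPT-2 tokens whose embeddings sit unusually close to the centroid of the token-embedding matrix, then probe those tokens by hand, for example by asking the model to repeat them back.

```python
# A rough, hypothetical sketch (not the authors' code): list the GPT-2 tokens whose
# embeddings lie closest to the centroid of the token-embedding matrix, which is one
# place to start hunting for oddly-behaved tokens. Assumes `torch` and `transformers`
# are installed; the model choice and the top-20 cutoff are arbitrary.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Token embedding matrix, shape (vocab_size, hidden_dim).
emb = model.get_input_embeddings().weight.detach()

# Distance of each token's embedding from the centroid of the embedding cloud.
centroid = emb.mean(dim=0)
dists = torch.norm(emb - centroid, dim=1)

# Print the 20 tokens nearest the centroid; these are candidates worth probing by hand,
# e.g. by prompting the model to repeat them back.
for idx in torch.argsort(dists)[:20].tolist():
    print(f"{dists[idx].item():.4f}  {tokenizer.decode([idx])!r}")
```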
Feb 3, 2023 • 7min

"Focus on the places where you feel shocked everyone's dropping the ball" by Nate Soares

https://www.lesswrong.com/posts/Zp6wG5eQFLGWwcG6j/focus-on-the-places-where-you-feel-shocked-everyone-s

Writing down something I've found myself repeating in different conversations:

If you're looking for ways to help with the whole "the world looks pretty doomed" business, here's my advice: look around for places where we're all being total idiots.

Look for places where everyone's fretting about a problem that some part of you thinks it could obviously just solve.

Look around for places where something seems incompetently run, or hopelessly inept, and where some part of you thinks you can do better.

Then do it better.
Feb 2, 2023 • 1h 7min

"Basics of Rationalist Discourse" by Duncan Sabien

https://www.lesswrong.com/posts/XPv4sYrKnPzeJASuk/basics-of-rationalist-discourse-1

Introduction

This post is meant to be a linkable resource. Its core is a short list of guidelines (you can link directly to the list) that are intended to be fairly straightforward and uncontroversial, for the purpose of nurturing and strengthening a culture of clear thinking, clear communication, and collaborative truth-seeking.

"Alas," said Dumbledore, "we all know that what should be, and what is, are two different things. Thank you for keeping this in mind."

There is also (for those who want to read more than the simple list) substantial expansion/clarification of each specific guideline, along with justification for the overall philosophy behind the set.
Jan 31, 2023 • 9min

"My Model Of EA Burnout" by Logan Strohl

https://www.lesswrong.com/posts/pDzdb4smpzT3Lwbym/my-model-of-ea-burnout

(Probably somebody else has said most of this. But I personally haven't read it, and felt like writing it down myself, so here we go.)

I think that EA [editor note: "Effective Altruism"] burnout usually results from prolonged dedication to satisfying the values you think you should have, while neglecting the values you actually have.

Setting aside for the moment what "values" are and what it means to "actually" have one, suppose that I actually value these things (among others):
Jan 31, 2023 • 39min

"Sapir-Whorf for Rationalists" by Duncan Sabien

https://www.lesswrong.com/posts/PCrTQDbciG4oLgmQ5/sapir-whorf-for-rationalists

Casus Belli: As I was scanning over my (rather long) list of essays-to-write, I realized that roughly a fifth of them were of the form "here's a useful standalone concept I'd like to reify," à la cup-stacking skills, fabricated options, split and commit, and sazen. Some notable entries on that list (which I name here mostly in the hope of someday coming back and turning them into links) include: red vs. white, walking with three, setting the zero point[1], seeding vs. weeding, hidden hinges, reality distortion fields, and something-about-layers-though-that-one-obviously-needs-a-better-word.

While it's still worthwhile to motivate/justify each individual new conceptual handle (and the planned essays will do so), I found myself imagining a general objection of the form "this is just making up terms for things," or perhaps "this is too many new terms, for too many new things." I realized that there was a chunk of argument, repeated across all of the planned essays, that I could factor out, and that (to the best of my knowledge) there was no single essay aimed directly at the question "why new words/phrases/conceptual handles at all?"

So ... voilà.
Jan 25, 2023 • 23min

"The Social Recession: By the Numbers" by Anton Stjepan Cebalo

https://www.lesswrong.com/posts/Xo7qmDakxiizG7B9c/the-social-recession-by-the-numbers

This is a linkpost for https://novum.substack.com/p/social-recession-by-the-numbers

Fewer friends, relationships on the decline, delayed adulthood, trust at an all-time low, and many diseases of despair. The prognosis is not great.

One of the most discussed topics online recently has been friendships and loneliness. Ever since the infamous chart showing more people are not having sex than ever before first made the rounds, there's been increased interest in the social state of things. Polling has demonstrated a marked decline in all spheres of social life, including close friends, intimate relationships, trust, labor participation, and community involvement. The trend looks to have worsened since the pandemic, although it will take some years before this is clearly established.

The decline comes alongside a documented rise in mental illness, diseases of despair, and poor health more generally. In August 2022, the CDC announced that U.S. life expectancy has fallen further and is now where it was in 1996. Contrast this to Western Europe, where it has largely rebounded to pre-pandemic numbers. Still, even before the pandemic, the years 2015-2017 saw the longest sustained decline in U.S. life expectancy since 1915-18. While my intended angle here is not health-related, general sociability is closely linked to health. The ongoing shift has been called the "friendship recession" or the "social recession."
Jan 24, 2023 • 21min

"Recursive Middle Manager Hell" by Raemon

https://www.lesswrong.com/posts/pHfPvb4JMhGDr4B7n/recursive-middle-manager-hell

I think Zvi's Immoral Mazes sequence is really important, but comes with more worldview-assumptions than are necessary to make the points actionable. I conceptualize Zvi as arguing for multiple hypotheses. In this post I want to articulate one sub-hypothesis, which I call "Recursive Middle Manager Hell". I'm deliberately not covering some other components of his model[1].

tl;dr: Something weird and kinda horrifying happens when you add layers of middle-management. This has ramifications on when/how to scale organizations, and where you might want to work, and maybe general models of what's going on in the world.

You could summarize the effect as "the org gets more deceptive, less connected to its original goals, more focused on office politics, less able to communicate clearly within itself, and selected more for sociopathy in upper management."

You might read that list of things and say "sure, seems a bit true", but one of the main points here is "Actually, this happens in a deeper and more insidious way than you're probably realizing, with much higher costs than you're acknowledging. If you're scaling your organization, this should be one of your primary worries."
Jan 12, 2023 • 34min

"How 'Discovering Latent Knowledge in Language Models Without Supervision' Fits Into a Broader Alignment Scheme" by Collin

https://www.lesswrong.com/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Introduction

A few collaborators and I recently released a new paper: Discovering Latent Knowledge in Language Models Without Supervision. For a quick summary of our paper, you can check out this Twitter thread.

In this post I will describe how I think the results and methods in our paper fit into a broader scalable alignment agenda. Unlike the paper, this post is explicitly aimed at an alignment audience and is mainly conceptual rather than empirical.

Tl;dr: unsupervised methods are more scalable than supervised methods, deep learning has special structure that we can exploit for alignment, and we may be able to recover superhuman beliefs from deep learning representations in a totally unsupervised way.

Disclaimers: I have tried to make this post concise, at the cost of not making the full arguments for many of my claims; you should treat this as more of a rough sketch of my views rather than anything comprehensive. I also frequently change my mind – I'm usually more consistently excited about some of the broad intuitions but much less wedded to the details – and this of course just represents my current thinking on the topic.
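For readers who want a concrete handle on the unsupervised idea the excerpt gestures at, below is a minimal PyTorch sketch of a contrast-consistent probe in the spirit of the paper, as I understand it: train a small probe on hidden states of a statement and its negation so that its two answers are logically consistent, with no labels involved. The probe architecture, placeholder activations, and hyperparameters here are illustrative assumptions, not the paper's exact setup.

```python
# A minimal, self-contained sketch of unsupervised contrast-consistent probing, in the
# spirit of "Discovering Latent Knowledge in Language Models Without Supervision".
# Variable names, placeholder data, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

hidden_dim = 768  # e.g. dimensionality of the language model's hidden states

# Placeholder activations: in practice these would be hidden states extracted from a
# language model for contrast pairs (x_pos asserts a statement, x_neg negates it).
x_pos = torch.randn(256, hidden_dim)
x_neg = torch.randn(256, hidden_dim)

probe = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(1000):
    p_pos = probe(x_pos).squeeze(-1)
    p_neg = probe(x_neg).squeeze(-1)
    # Consistency: the probabilities assigned to a statement and its negation should sum to 1.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution where both probabilities sit at 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    loss = (consistency + confidence).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```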
