

LessWrong (Curated & Popular)
LessWrong
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
Episodes

Sep 4, 2023 • 5min
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist
In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al. 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question. The paper contained the striking plot reproduced below, which shows sycophancy increasing dramatically with model size, while being largely independent of RLHF steps, and even showing up at 0 RLHF steps, i.e. in base models! [...] I found this result startling when I read the original paper, as it seemed like a bizarre failure of calibration. How would the base LM know that this "Assistant" character agrees with the user so strongly, lacking any other information about the scenario? At the time, I ran one of Anthropic's sycophancy evals on a set of OpenAI models, as I reported here. I found very different results for these models: OpenAI base models are not sycophantic (or only very slightly sycophantic); OpenAI base models do not get more sycophantic with scale; some OpenAI models are sycophantic, specifically text-davinci-002 and text-davinci-003.
Source: https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓

Aug 30, 2023 • 13min
"Dear Self; we need to talk about ambition" by Elizabeth
I keep seeing advice on ambition, aimed at people in college or early in their career, that would have been really bad for me at similar ages. Rather than contribute (more) to the list of people giving poorly universalized advice on ambition, I have written a letter to the one person I know my advice is right for: myself in the past.
Source: https://www.lesswrong.com/posts/uGDtroD26aLvHSoK2/dear-self-we-need-to-talk-about-ambition-1
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓
[Curated Post] ✓

Aug 28, 2023 • 6min
"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon
The Carving of Reality, the third volume of the Best of LessWrong books, is now available on Amazon (US). The Carving of Reality includes 43 essays from 29 authors. We've collected the essays into four books, each exploring two related topics. The "two intertwining themes" concept was first inspired when, looking over the cluster of "coordination"-themed posts, I noticed a recurring motif of not only "solving coordination problems" but also "dealing with the binding constraints that were causing those coordination problems."
Source: https://www.lesswrong.com/posts/Rck5CvmYkzWYxsF4D/book-launch-the-carving-of-reality-best-of-lesswrong-vol-iii
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓

Aug 28, 2023 • 12min
"Assume Bad Faith" by Zack_M_Davis
I've been trying to avoid the terms "good faith" and "bad faith". I'm suspicious that most people who have picked up the phrase "bad faith" from hearing it used, don't actually know what it means—and maybe, that the thing it does mean doesn't carve reality at the joints. People get very touchy about bad faith accusations: they think that you should assume good faith, but that if you've determined someone is in bad faith, you shouldn't even be talking to them, that you need to exile them. What does "bad faith" mean, though? It doesn't mean "with ill intent."
Source: https://www.lesswrong.com/posts/pZrvkZzL2JnbRgEBC/feedbackloop-first-rationality
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓

Aug 23, 2023 • 15min
"Large Language Models will be Great for Censorship" by Ethan Edwards
LLMs can do many incredible things. They can generate unique creative content, carry on long conversations on any number of subjects, complete complex cognitive tasks, and write nearly any argument. More mundanely, they are now the state of the art for boring classification tasks, and therefore have the capability to radically upgrade the censorship capacities of authoritarian regimes throughout the world.
Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort. Thanks to ev_ and Kei for suggestions on this post.
Source: https://www.lesswrong.com/posts/oqvsR2LmHWamyKDcj/large-language-models-will-be-great-for-censorship
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓

Aug 22, 2023 • 8min
"Ten Thousand Years of Solitude" by agp
This episode explores the intense isolation and unique development of the tribes of Tasmania, which were cut off from the outside world for ten thousand years. It discusses the poverty and lack of technology among the Tasmanians and the irreversible cultural losses they experienced due to isolation.

Aug 22, 2023 • 6min
"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov
Intro: I am a psychotherapist, and I help people working on AI safety. I noticed patterns of mental health issues highly specific to this group. It's not just doomerism; there are many more issues that are less obvious. If you struggle with a mental health issue related to AI safety, feel free to leave a comment about it and about the things that help you with it. You might also support others in the comments. Sometimes such support makes a lot of difference, and people feel like they are not alone. All the examples in this post have been altered so that it is impossible to identify any specific person behind them.
Source: https://www.lesswrong.com/posts/tpLzjWqG2iyEgMGfJ/6-non-obvious-mental-health-issues-specific-to-ai-safety
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓

Aug 21, 2023 • 1h 19min
"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël
Charbel-Raphaël critiques theories of interpretability, questioning their practicality in industry. Discusses limitations of pixel attribution techniques and the need for accuracy. Explores the challenges of interpreting AI models for deception detection. Advocates for cognitive emulation over traditional visualization methods for transparency in AI models. Emphasizes the importance of balancing safety and capabilities in AI alignment research.

Aug 15, 2023 • 16min
"Feedbackloop-first Rationality" by Raemon
I've been workshopping a new rationality training paradigm. (By "rationality training paradigm", I mean an approach to learning/teaching the skill of "noticing what cognitive strategies are useful, and getting better at them.") I think the paradigm has promise. I've beta-tested it for a couple of weeks. It's too early to tell if it actually works, but one of my primary goals is to figure out whether it works relatively quickly, and give up if it isn't delivering. The goal of this post is to: convey the framework, see if people find it compelling in its current form, and solicit ideas for improvements before I decide whether to invest heavily in a larger experiment around it.
Source: https://www.lesswrong.com/posts/pZrvkZzL2JnbRgEBC/feedbackloop-first-rationality
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓
[Curated Post] ✓

Aug 15, 2023 • 7min
"Inflection.ai is a major AGI lab" by Nikola
Inflection.ai (co-founded by DeepMind co-founder Mustafa Suleyman) should be perceived as a frontier LLM lab of similar magnitude to Meta, OpenAI, DeepMind, and Anthropic, based on their compute, valuation, current model capabilities, and plans to train frontier models. Compared to the other labs, Inflection seems to put less effort into AI safety.
Thanks to Laker Newhouse for discussion and feedback.
Source: https://www.lesswrong.com/posts/Wc5BYFfzuLzepQjCq/inflection-ai-is-a-major-agi-lab
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓


