

LessWrong (Curated & Popular)
LessWrong
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the "LessWrong (30+ karma)" feed.
Episodes

Sep 15, 2022 • 27min
"Local Validity as a Key to Sanity and Civilization" by Eliezer Yudkowsky

Sep 15, 2022 • 25min
"Toolbox-thinking and Law-thinking" by Eliezer Yudkowsky
https://www.lesswrong.com/s/6xgy8XYEisLk3tCjH/p/CPP2uLcaywEokFKQG

Tl;dr: I've noticed a dichotomy between "thinking in toolboxes" and "thinking in laws".

The toolbox style of thinking says it's important to have a big bag of tools that you can adapt to context and circumstance; people who think very toolboxly tend to suspect that anyone who goes talking of a single optimal way is just ignorant of the uses of the other tools.

The lawful style of thinking, done correctly, distinguishes between descriptive truths, normative ideals, and prescriptive ideals. It may talk about certain paths being optimal, even if there's no executable-in-practice algorithm that yields the optimal path. It considers truths that are not tools.

Within nearly-Euclidean mazes, the triangle inequality - that the path AC is never spatially longer than the path ABC - is always true but only sometimes useful. The triangle inequality has the prescriptive implication that if you know that one path choice will travel ABC and one path will travel AC, and if the only pragmatic path-merit you care about is going the minimum spatial distance (rather than, say, avoiding stairs because somebody in the party is in a wheelchair), then you should pick the route AC. But the triangle inequality goes on governing Euclidean mazes whether or not you know which path is which, and whether or not you need to avoid stairs.

Toolbox thinkers may be extremely suspicious of this claim of universal lawfulness if it is explained less than perfectly, because it sounds to them like "Throw away all the other tools in your toolbox! All you need to know is Euclidean geometry, and you can always find the shortest path through any maze, which in turn is always the best path."

If you think that's an unrealistic depiction of a misunderstanding that would never happen in reality, keep reading.
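A minimal sketch of the "law" the post uses as its running example: the triangle inequality for Euclidean distances. The point names and coordinates below are hypothetical, chosen only to illustrate the claim, not taken from the post.

```python
import math

def dist(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical maze waypoints: going A -> B -> C can never be
# spatially shorter than going A -> C directly.
A, B, C = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

direct = dist(A, C)               # path AC
detour = dist(A, B) + dist(B, C)  # path ABC

assert direct <= detour           # the triangle inequality always holds
print(f"AC = {direct:.2f}, ABC = {detour:.2f}")
```

The law holds whether or not it is useful for any particular route choice, which is exactly the distinction the post draws between descriptive truths and prescriptive tools.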

Sep 15, 2022 • 9min
"Humans are not automatically strategic" by Anna Salamon
https://www.lesswrong.com/posts/PBRWb2Em5SNeWYwwB/humans-are-not-automatically-strategic

Reply to: A "Failure to Evaluate Return-on-Time" Fallacy

Lionhearted writes:

[A] large majority of otherwise smart people spend time doing semi-productive things, when there are massively productive opportunities untapped.

A somewhat silly example: Let's say someone aspires to be a comedian, the best comedian ever, and to make a living doing comedy. He wants nothing else, it is his purpose. And he decides that in order to become a better comedian, he will watch re-runs of the old television cartoon 'Garfield and Friends' that was on TV from 1988 to 1995.

...I'm curious as to why.

Sep 15, 2022 • 27min
"Language models seem to be much better than humans at next-token prediction" by Buck, Fabien and LawrenceC
https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

[Thanks to a variety of people for comments and assistance (especially Paul Christiano, Nostalgebraist, and Rafe Kennedy), and to various people for playing the game. Buck wrote the top-1 prediction web app; Fabien wrote the code for the perplexity experiment, did most of the analysis, and wrote up the math here; Lawrence did the research on previous measurements. Epistemic status: we're pretty confident of our work here, but haven't engaged in a super thorough review process of all of it -- this was more like a side-project than a core research project.]

How good are modern language models compared to humans at the task language models are trained on (next-token prediction on internet text)? While there are language-based tasks you can construct where humans predict the next token better than any language model, we aren't aware of any apples-to-apples comparisons on non-handcrafted datasets. To answer this question, we performed a few experiments comparing humans to language models on next-token prediction on OpenWebText.

Contrary to some previous claims, we found that humans seem to be consistently worse at next-token prediction (in terms of both top-1 accuracy and perplexity) than even small models like Fairseq-125M, a 12-layer transformer roughly the size and quality of GPT-1. That is, even small language models are "superhuman" at predicting the next token. That being said, it seems plausible that humans could consistently beat the smaller 2017-era models (though not modern models) with a few more hours of practice and strategizing. We conclude by discussing some of our takeaways from this result.

We're not claiming that this result is completely novel or surprising. For example, FactorialCode makes a similar claim as an answer on this LessWrong post about this question. We've also heard from some NLP people that the superiority of LMs to humans for next-token prediction is widely acknowledged in NLP. However, we've seen incorrect claims to the contrary on the internet, and as far as we know there hasn't been a proper apples-to-apples comparison, so we believe there's some value to our results.

If you want to play with our website, it's here; in our opinion, playing this game for half an hour gives you some useful perspective on what it's like to be a language model.
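For readers who want the two metrics pinned down, here is a minimal sketch of how top-1 accuracy and perplexity are typically computed from per-token predicted probabilities; the numbers are made up for illustration and are not the post's data.

```python
import math

# Toy example: for each position, the probability the predictor assigned to the
# token that actually occurred, plus whether its top-1 guess matched that token.
true_token_probs = [0.40, 0.05, 0.70, 0.20]   # hypothetical values
top1_correct     = [True, False, True, False]

# Top-1 accuracy: fraction of positions where the argmax token was the true token.
top1_accuracy = sum(top1_correct) / len(top1_correct)

# Perplexity: exp of the average negative log-likelihood of the true tokens.
avg_nll = -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)
perplexity = math.exp(avg_nll)

print(f"top-1 accuracy = {top1_accuracy:.2f}, perplexity = {perplexity:.2f}")
```

Lower perplexity and higher top-1 accuracy both mean better next-token prediction, which is the sense in which the post compares humans and models.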

Sep 14, 2022 • 13min
"Moral strategies at different capability levels" by Richard Ngo
https://www.lesswrong.com/posts/jDQm7YJxLnMnSNHFu/moral-strategies-at-different-capability-levels

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Let's consider three ways you can be altruistic towards another agent:

You care about their welfare: some metric of how good their life is (as defined by you). I'll call this care-morality - it endorses things like promoting their happiness, reducing their suffering, and hedonic utilitarian behavior (if you care about many agents).

You care about their agency: their ability to achieve their goals (as defined by them). I'll call this cooperation-morality - it endorses things like honesty, fairness, deontological behavior towards others, and some virtues (like honor).

You care about obedience to them. I'll call this deference-morality - it endorses things like loyalty, humility, and respect for authority.

I think a lot of the unresolved tension in ethics comes from seeing these types of morality as in opposition to each other, when they're actually complementary:

Sep 11, 2022 • 24min
"Worlds Where Iterative Design Fails" by John Wentworth
https://www.lesswrong.com/posts/xFotXGEotcKouifky/worlds-where-iterative-design-fails

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

In most technical fields, we try designs, see what goes wrong, and iterate until it works. That's the core iterative design loop. Humans are good at iterative design, and it works well in most fields in practice.

In worlds where AI alignment can be handled by iterative design, we probably survive. So long as we can see the problems and iterate on them, we can probably fix them, or at least avoid making them worse.

By the same reasoning: worlds where AI kills us are generally worlds where, for one reason or another, the iterative design loop fails. So, if we want to reduce X-risk, we generally need to focus on worlds where the iterative design loop fails for some reason; in worlds where it doesn't fail, we probably don't die anyway.

Why might the iterative design loop fail? Most readers probably know of two widely-discussed reasons:

Fast takeoff: there will be a sudden phase shift in capabilities, and the design of whatever system first undergoes that phase shift needs to be right on the first try.

Deceptive inner misalignment: an inner agent behaves well in order to deceive us, so we can't tell there's a problem just by trying stuff and looking at the system's behavior.

... but these certainly aren't the only reasons the iterative design loop potentially fails. This post will mostly talk about some particularly simple and robust failure modes, but I'd encourage you to think on your own about others. These are the things which kill us; they're worth thinking about.

Sep 11, 2022 • 1h 35min
"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is

Despite a clear need for it, a good source explaining who is doing what and why in technical AI alignment doesn't exist. This is our attempt to produce such a resource. We expect to be inaccurate in some ways, but it seems great to get it out there and let Cunningham's Law do its thing.[1]

The main body contains our understanding of what everyone is doing in technical alignment and why, as well as at least one of our opinions on each approach. We include supplements visualizing differences between approaches and Thomas's big-picture view on alignment. The opinions written are Thomas and Eli's independent impressions, many of which have low resilience. Our all-things-considered views are significantly more uncertain.

This post was mostly written while Thomas was participating in the 2022 iteration of the SERI MATS program, under mentor John Wentworth. Thomas benefited immensely from conversations with other SERI MATS participants, John Wentworth, and many others he met this summer.

Sep 9, 2022 • 46min
"Unifying Bargaining Notions (1/2)" by Diffractor
https://www.lesswrong.com/posts/rYDas2DDGGDRc8gGB/unifying-bargaining-notions-1-2

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a two-part sequence of posts, in the ancient LessWrong tradition of decision-theory-posting. This first part will introduce various concepts of bargaining solutions and dividing gains from trade, which the reader may or may not already be familiar with.

The upcoming part will be about how all the concepts introduced in this post are secretly just different facets of the same underlying notion, as originally discovered by John Harsanyi back in 1963 and rediscovered by me from a completely different direction. The fact that the various different solution concepts in cooperative game theory are all merely special cases of a General Bargaining Solution for arbitrary games is, as far as I can tell, not common knowledge on Less Wrong.

Bargaining Games

Let's say there's a couple with a set of available restaurant options. Neither of them wants to go without the other, and if they fail to come to an agreement, the fallback is eating a cold canned soup dinner at home, the worst of all the options. However, they have different restaurant preferences. What's the fair way to split the gains from trade?

Well, it depends on their restaurant preferences, and preferences are typically encoded with utility functions. Since both sides agree that the disagreement outcome is the worst, they might as well index that as 0 utility, and their favorite respective restaurants as 1 utility, and denominate all the other options in terms of what probability mix between a cold canned dinner and their favorite restaurant would make them indifferent. If there's something that scores 0.9 utility for both, it's probably a pretty good pick!
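A minimal sketch of the normalization described above, with made-up restaurant names and utilities (hypothetical, not from the post). It rescales each person's utilities so the disagreement outcome is 0 and their favorite option is 1, then picks the option maximizing the product of gains over the disagreement point, which is one classic solution concept (the Nash bargaining solution) of the kind the sequence goes on to unify.

```python
# Hypothetical raw utilities for each option, one dict per person.
# "soup" is the disagreement outcome (cold canned dinner at home).
alice = {"soup": 0, "thai": 10, "pizza": 6, "sushi": 9}
bob   = {"soup": 0, "thai": 4,  "pizza": 8, "sushi": 9}

def normalize(utils):
    """Rescale so the disagreement point is 0 and the favorite option is 1."""
    lo, hi = utils["soup"], max(utils.values())
    return {opt: (u - lo) / (hi - lo) for opt, u in utils.items()}

a, b = normalize(alice), normalize(bob)

# Nash bargaining choice: maximize the product of each person's gain
# over the disagreement point.
options = [opt for opt in alice if opt != "soup"]
best = max(options, key=lambda opt: a[opt] * b[opt])
print(best, round(a[best], 2), round(b[best], 2))  # sushi: good for both here
```

Other bargaining solutions weight the normalized utilities differently; the sequence's claim is that these apparently distinct rules are special cases of one underlying notion.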

Sep 5, 2022 • 1h 48min
"Simulators" by Janus
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators#fncrt8wagfir9

Summary

TL;DR: Self-supervised learning may create AGI or its foundation. What would that look like?

Unlike the limit of RL, the limit of self-supervised learning has received surprisingly little conceptual attention, and recent progress has made deconfusion in this domain more pressing.

Existing AI taxonomies either fail to capture important properties of self-supervised models or lead to confusing propositions. For instance, GPT policies do not seem globally agentic, yet can be conditioned to behave in goal-directed ways. This post describes a frame that enables more natural reasoning about properties like agency: GPT, insofar as it is inner-aligned, is a simulator which can simulate agentic and non-agentic simulacra.

The purpose of this post is to capture these objects in words so GPT can reference them and provide a better foundation for understanding them.

I use the generic term "simulator" to refer to models trained with predictive loss on a self-supervised dataset, invariant to architecture or data type (natural language, code, pixels, game states, etc). The outer objective of self-supervised learning is Bayes-optimal conditional inference over the prior of the training distribution, which I call the simulation objective, because a conditional model can be used to simulate rollouts which probabilistically obey its learned distribution by iteratively sampling from its posterior (predictions) and updating the condition (prompt). Analogously, a predictive model of physics can be used to compute rollouts of phenomena in simulation. A goal-directed agent which evolves according to physics can be simulated by the physics rule parameterized by an initial state, but the same rule could also propagate agents with different values, or non-agentic phenomena like rocks. This ontological distinction between simulator (rule) and simulacra (phenomena) applies directly to generative models like GPT.
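As a concrete illustration of the rollout procedure described above (sample from the model's predicted next-token distribution, append the sample to the condition, repeat), here is a minimal sketch. The sample_next_token function is a hypothetical stand-in for whatever generative model you use, not an API from the post.

```python
import random

def sample_next_token(prompt_tokens):
    """Hypothetical stand-in: draw one token from the model's predicted
    next-token distribution given the prompt so far."""
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    return random.choice(vocab)  # a real model would sample from its posterior

def rollout(prompt_tokens, n_steps):
    """Simulate a trajectory: sample a token, update the condition, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        tokens.append(sample_next_token(tokens))
    return tokens

print(rollout(["the", "cat"], n_steps=8))
```

The prompt plays the role of the initial state, and the learned conditional distribution plays the role of the physics rule propagating whatever simulacra that state contains.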

Aug 8, 2022 • 23min
"Humans provide an untapped wealth of evidence about alignment" by TurnTrout & Quintin Pope
https://www.lesswrong.com/posts/CjFZeDD6iCnNubDoS/humans-provide-an-untapped-wealth-of-evidence-about#fnref7a5ti4623qb

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

TL;DR: To even consciously consider an alignment research direction, you should have evidence to locate it as a promising lead. As best I can tell, many directions seem interesting but do not have strong evidence of being "entangled" with the alignment problem such that I expect them to yield significant insights. For example, "we can solve an easier version of the alignment problem by first figuring out how to build an AI which maximizes the number of real-world diamonds" has intuitive appeal and plausibility, but this claim doesn't have to be true and this problem does not necessarily have a natural, compact solution. In contrast, there do in fact exist humans who care about diamonds. Therefore, there are guaranteed-to-exist alignment insights concerning the way people come to care about e.g. real-world diamonds.

"Consider how humans navigate the alignment subproblem you're worried about" is a habit which I (TurnTrout) picked up from Quintin Pope. I wrote the post, he originated the tactic.


