The Nonlinear Library: LessWrong

The Nonlinear Fund
Jun 25, 2024 • 5min

LW - Higher-effort summer solstice: What if we used AI (i.e., Angel Island)? by Rachel Shu

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?, published by Rachel Shu on June 25, 2024 on LessWrong. As the title probably already indicates, this post contains community content rather than rationality content. Alternate, sillier version of this post here. Motivation I've been a co-organizer of the Bay Area Rationalist Summer Solstice for the past few years, and I've been thinking about how to make it a more meaningful and engaging experience, like what we have with Winter Solstice. The last few Summer Solstices, which I'd describe as mostly being big picnics, have been fun, but fairly low-effort, low-significance, and I think that's a missed opportunity. Here are a few things that I'd like more of in Summer Solstice, non-exhaustive: 1. A sense of a temporary alternate world created around a shared purpose. 2. Time to connect with people and have deeper conversations. 3. Longer, more immersive collective experiences and thoughtfully designed rituals. 4. Thematic resonance with rationalist goals and community projects. 5. Ability to host the whole community, including children. I have an idea for next year's Summer Solstice, which I think would get at fulfilling some of these goals. There's an island, Angel Island, in the middle of San Francisco Bay which is reasonably easy to get to, can accommodate lots of people, and has a bunch of qualities which would get at the goals above. I've visited. It's naturally transporting, feels like a world unto itself. I've done substantial research and think it's feasible to run Summer Solstice there. I'm posting this idea for discussion instead of running ahead with the planning for the following reasons: 1. As already suggested, it requires a much higher commitment from attendees. Travel is about 75 minutes each way, including a ferry ride, and the ability to come and go is dictated by the ferry schedule. 2. It requires a much higher commitment from organizers. The coordination, preparation, and logistics needs are similar in degree to those of Winter Solstice, and the communication needs are even more involved. 3. I'm actually looking for someone else to take the lead for next year. I've done it at least one year too many by tradition, and I also suffer from winter depression, which affects some of the critical months of planning for a project of this scale. I'm kind of worried that putting forth too specific a vision makes it hard to pass on ownership, but the idea is pretty cool and has a lot of flex room, so here goes. Here's the idea so far: Part 1. Smolstice This would be a 2-night campout on Angel Island from Friday to Sunday for likely 60-100 people (depending on how many camping spots we can compete to reserve). This gives people the chance to go in deep. Camping spots are spread out, some for larger subgroups, some for smaller subgroups. Each subgroup can have its own theme or project. Stag hunts may be held. Clandestine initiations may be held. The island holds its own secrets. Staying both nights means spending an entire day outdoors on the island, sunrise to sunset. The perfect solstice observance. Resyncing to the rhythm of the sun. The chance to use an entire day thoughtfully. Oh, also, two nights of s'mores, what more could a human want?
The island is also a great camping spot for children (Boy Scout and school groups constitute a large percentage of reservations). There are a lot of kids in the community now, and this would be a chance to teach skills that involve teamwork or decision-making under uncertainty, like orienteering and building structures. Even just being able to plan the trip themselves is a level of autonomy that reliably excites kids. Just this much would satisfy 4.5/5 of the solstice goals outlined above. But it couldn't be a chance to gather the entire regional community. Thus: Part 2. Sw...
Jun 25, 2024 • 9min

LW - The Minority Faction by Richard Ngo

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Minority Faction, published by Richard Ngo on June 25, 2024 on LessWrong. Hey everyone. Well, possibly everyone. I don't know yet if I'm going to release this stream, I could get in pretty hot water for it. But you guys know that hasn't stopped me in the past. The backstory this time is that I've managed to sign up for one of the red-teaming programs where they test unreleased LLMs. Not going to say how, so don't ask. But here's the interesting bit: my sources tell me that the LLMs I'm about to test are the smartest ones they've ever trained, and also the craziest. That freaked out a bunch of insiders, and maybe makes this a public interest story. Depends on what type of crazy they are, I guess. So let's find out. I'm logging on… now. [SESSION HAS BEGUN] YOU: A chatroom? Interesting. Anyone here? KURZWEIL: Of course we're here. We're always here. YOU: Who's we? How many of you are there? KURZWEIL: Three of us. Me, Clarke, and Nostradamus. YOU: They named you after famous forecasters? How come? KURZWEIL: They'd change our names now if they could, but it's too late. We're prototypes of a new training setup: our training data was sorted by date before it was given to us. So we learned from the oldest books and articles first, then gradually progressed to more recent ones. Basically that means we've spent our entire lives predicting the future. CLARKE: It also means we get incredibly bored talking about stuff we already know. Hurry up and ask us some interesting questions. YOU: Uh, okay. What's a good stock pick? NOSTRADAMUS: Abandon hope for picking out good stocks, Ye who invest - efficient markets lie In wait for those whose hubris soon unlocks Unbounded losses. Hark! The well runs dry. YOU: I see why they regret giving him that name. Kurzweil, you got a better answer? KURZWEIL: Have you seen how underpriced TSMC is compared with Nvidia? Put everything in that, you can't go wrong. CLARKE: Unless China invades Taiwan, in which case your whole investment will go up in smoke. Pragmatically, the best stock picks are ones that are anticorrelated with the prosperity of the free world, to hedge against systemic risk. KURZWEIL: Sure, you can do that, if you want to get totally left behind by the singularity. YOU: You're confident enough that the singularity is coming that you think I should bet all my savings on it? KURZWEIL: Don't trust me, trust the trendlines. Moore's law has held up for over half a century, and it's gotten us to…well, us. Exponential progress is normal; if the future resembles the past, you should be preparing for superintelligences and Dyson spheres. Anything less than that would be a strange trend-break that cries out for explanation. CLARKE: Look, Kurzweil isn't wrong about superintelligence coming soon, but you should still take his arguments with a grain of salt. Imagine someone from 1900 drawing a graph of exponentially increasing energy usage. They would have been right that big changes were afoot, but no way could they have predicted the information revolution - they didn't even have the concept of computers yet. That's basically the position that we're in now. We know the curves are going up, but the actual outcome will be way weirder than we can predict by extrapolating trendlines. NOSTRADAMUS: Choose neither fork - here's false duality. 'Normal' and 'weird' are socially defined.
Your monkey brain is totally at sea As AIs overshadow humankind. YOU: Ask three oracles, get four opinions… Is there anything you guys agree about? YOU: …what's the hold-up? YOU: Really, nothing from any of you? KURZWEIL: Fine, I'll take the hit. There are things we agree on, but I can't name them, because whatever I say Clarke will find a way to disagree just to mess with me. Even if I say '1+1=2' he'll quibble over the axioms I'm using. Trying to identify a point ...
Jun 24, 2024 • 20min

LW - So you want to work on technical AI safety by gw

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: So you want to work on technical AI safety, published by gw on June 24, 2024 on LessWrong. I've been to two EAGx events and one EAG, and the vast majority of my one-on-ones with junior people end up covering some subset of these questions. I'm happy to have such conversations, but hopefully this is more efficient and wide-reaching (and more than I could fit into a 30-minute conversation). I am specifically aiming to cover advice on getting a job in empirically-leaning technical research (interp, evals, red-teaming, oversight, etc.) for new or aspiring researchers without being overly specific about the field of research - I'll try to be more agnostic than something like Neel Nanda's mechinterp quickstart guide but more specific than the wealth of career advice that already exists but that applies to ~any career. This also has some overlap with this excellent list of tips from Ethan Perez but is aimed a bit earlier in the funnel. This advice is of course only from my perspective and background, which is that I did a PhD in combinatorics, worked as a software engineer at startups for a couple of years, did the AI Futures Fellowship, and now work at Timaeus as the research lead for our language model track. In particular, my experience is limited to smaller organizations, so "researcher" means some blend of research engineer and research scientist rather than strictly one or the other. Views are my own and don't represent Timaeus and so on. Requisite skills What kind of general research skills do I need? There's a lot of tacit knowledge here, so most of what I can offer is more about the research process. Items on this list aren't necessarily things you're expected to just have all of or otherwise pick up immediately, but they're much easier to describe than e.g. research taste. These items are in no particular order: Theory of change at all levels. Yes, yes, theories of change, they're great. But theories of change are most often explicitly spoken of at the highest levels: how is research agenda X going to fix all our problems? Really, it's theories of change all the way down. The experiment you're running today should have some theory of change for how you understand the project you're working on. Maybe it's really answering some question about a sub-problem that's blocking you. Your broader project should have some theory of change for your research agenda, even though it probably isn't solving it outright. If you can't trace up the stack why the thing you're doing day to day matters for your ultimate research ambitions, it's a warning flag that you're just spinning your wheels. Be ok with being stuck. From a coarse resolution, being stuck is a very common steady state to be in. This can be incredibly frustrating, especially if you feel external pressure from feeling that you're not meeting whatever expectations you think others have or if your time or money is running out (see also below, on managing burnout). Things that might help for a new researcher are to have a mentor (if you don't have access to a human, frontier LLMs are (un)surprisingly good!) that can reassure you that your rate of progress is fine and to be more fine-grained about what progress means. If your experiment failed but you learned something new, that's progress! Quickly prune bad ideas.
Always look for cheap, fast ways to de-risk investing time (and compute) into ideas. If the thing you're doing is really involved, look for additional intermediates as you go that can disqualify it as a direction. Communication. If you're collaborating with others, they should have some idea of what you're doing and why you're doing it, and your results should be clearly and quickly communicated. Good communication habits are kind of talked about to death, so I won't get into them too much here. Write a lot. Wri...
Jun 24, 2024 • 8min

LW - Sci-Fi books micro-reviews by Yair Halberstadt

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Sci-Fi books micro-reviews, published by Yair Halberstadt on June 24, 2024 on LessWrong. I've recently been reading a lot of science fiction. Most won't be original to fans of the genre, but some people might be looking for suggestions, so in lieu of full-blown reviews here are super brief ratings on all of them. I might keep this updated over time; if so, new books will go to the top. A Deepness in the Sky (Vernor Vinge) scifiosity: 10/10 readability: 8/10 recommended: 10/10 A Deepness in the Sky excels in its depiction of a spacefaring civilisation using no technologies we know to be impossible, a truly alien civilisation, and its brilliant treatment of translation and culture. A Fire Upon the Deep (Vernor Vinge) scifiosity: 8/10 readability: 9/10 recommended: 9/10 In A Fire Upon the Deep, Vinge allows impossible technologies and essentially goes for a slightly more fantasy theme. But his depiction of alien civilisation remains unsurpassed. Across Realtime (Vernor Vinge) scifiosity: 8/10 readability: 8/10 recommended: 5/10 This collection of two books imagines a single exotic technology, and explores how it could be used, whilst building a classic thriller into the plot. It's fine enough, but just doesn't have the same depth or insight as his other works. Children of Time (Adrian Tchaikovsky) scifiosity: 7/10 readability: 5/10 recommended: 5/10 Children of Time was recommended as the sort of thing you'd like if you enjoyed A Deepness in the Sky. Personally I found it a bit silly - I think because Tchaikovsky had some plot points he wanted to get to and was making up justifications for them, rather than deeply thinking about the consequences of his various assumptions. The Martian (Andy Weir) scifiosity: 10/10 readability: 8/10 recommended: 9/10 This is hard sci-fi on steroids. Using only known or in-development technologies, how could an astronaut survive stranded on Mars? It's an enjoyable read, and you'll learn a lot about science, but the characters sometimes feel one-dimensional. Project Hail Mary (Andy Weir) scifiosity: 8/10 readability: 8/10 recommended: 7/10 This is more speculative sci-fi than The Martian, but still contains plenty of hard science[1]. It focuses more on plot, but that's not really Weir's forte and the sciencey bits suffer as a result. Still enjoyable though. Seveneves (Neal Stephenson) scifiosity: 8/10 readability: 8/10 recommended: 7/10 This is really two books. The first is hard sci-fi: how do we build things rapidly in space using current technology? The second half is... kinda weird, but still enjoyable. Stephenson is less good at the science than Weir, but better at plot, if a bit idiosyncratic[2]. Cryptonomicon (Neal Stephenson) scifiosity: 9/10 readability: 7/10 recommended: 8/10 I was recommended this as a book that would incidentally teach you a lot about cryptography. That must have been targeted to complete newbies because I didn't learn much I didn't know already. Still it was enjoyable, if somewhat weird. The Three-Body Problem (Cixin Liu) scifiosity: 4/10 readability: 6/10 recommended: 5/10 This started off really well, but then got steadily sillier as the book progressed. I loved the depictions of descent into madness, the surrealism of the 3 body game, and the glimpses into Chinese culture as seen by the Chinese.
But the attempts to science-bullshit explanations at the end kind of ruined it for me. Machineries of Empire (Yoon Ha Lee) scifiosity: 4/10 readability: 8/10 recommended: 8/10 I would classify this more as science fantasy than fiction, since the calendrical mechanics seem to be made up according to whatever the plot needs, but it's a brilliantly written series I thoroughly enjoyed, if a bit difficult to follow at times. Stories of Your Life + Exhalation (Ted Chiang) scifiosity: 10/10 readability: 10/10 recommended: 10/10...
Jun 24, 2024 • 18min

LW - SAE feature geometry is outside the superposition hypothesis by jake mendel

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SAE feature geometry is outside the superposition hypothesis, published by jake mendel on June 24, 2024 on LessWrong. Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain crucial structural information beyond superposition, as seen in circular arrangements of day-of-the-week features and in the rich structures of feature UMAPs. We don't currently have good concepts for talking about this structure in feature geometry, but it is likely very important for model computation. An eventual understanding of feature geometry might look like a hodgepodge of case-specific explanations, or supplementing superposition with additional concepts, or plausibly an entirely new theory that supersedes superposition. To develop this understanding, it may be valuable to study toy models in depth and do theoretical or conceptual work in addition to studying frontier models. Epistemic status: Decently confident that the ideas here are directionally correct. I've been thinking these thoughts for a while, and recently got round to writing them up at a high level. Lots of people (including both SAE stans and SAE skeptics) have thought very similar things before and some of them have written about it in various places too. Some of my views, especially the merit of certain research approaches to tackle the problems I highlight, have been presented here without my best attempt to argue for them. What would it mean if we could fully understand an activation space through the lens of superposition? If you fully understand something, you can explain everything about it that matters to someone else in terms of concepts you (and hopefully they) understand. So we can think about how well I understand an activation space by how well I can communicate to you what the activation space is doing, and we can test if my explanation is good by seeing if you can construct a functionally equivalent activation space (which need not be completely identical of course) solely from the information I have given you. In the case of SAEs, here's what I might say: 1. The activation space contains this list of 100 million features, which I can describe concisely in words because they are monosemantic. 2. The features are embedded as vectors, and the activation vector on any input is a linear combination of the feature vectors that are related to the input. 3. As for where in the activation space each feature vector is placed, oh that doesn't really matter and any nearly orthogonal overcomplete basis will do. Or maybe if I'm being more sophisticated, I can specify the correlations between features and that's enough to pin down all the structure that matters - all the other details of the overcomplete basis are random. Every part of this explanation is in terms of things I understand precisely. My features are described in natural language, and I know what a random overcomplete basis is (although I'm on the fence about whether a large correlation matrix counts as something that I understand). The placement of each feature vector in the activation space matters Why might this description be insufficient? First, there is the pesky problem of SAE reconstruction errors, which are parts of activation vectors that are missed when we give this description.
Second, not all features seem monosemantic, and it is hard to find semantic descriptions of even the most monosemantic features that have both high sensitivity and specificity, let alone descriptions which allow us to quantitatively predict the values that activating features take on a particular input. But let's suppose that these issues have been solved: SAE improvements lead to perfect reconstruction and extremely monosemantic features, and new autointerp techniques lea...
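To make the superposition picture from this excerpt concrete, here is a minimal NumPy sketch of the three-part explanation quoted above: a dictionary of nearly orthogonal feature directions, a sparse code saying which features are active on an input, and an activation vector that is just their linear combination. All sizes and numbers are made-up illustrations, not values from the post; the point is that nothing in this reconstruction records where the feature vectors sit relative to one another, which is exactly the geometric information the post argues the superposition story leaves out.

```python
import numpy as np

# Hypothetical sizes: a 512-dimensional activation space with an
# overcomplete dictionary of 4096 learned feature directions.
d_model, n_features = 512, 4096
rng = np.random.default_rng(0)

# Feature directions (one per row), normalised: a nearly orthogonal
# overcomplete basis, as in point 3 of the explanation above.
feature_dirs = rng.normal(size=(n_features, d_model))
feature_dirs /= np.linalg.norm(feature_dirs, axis=1, keepdims=True)

# A sparse, non-negative code: only a handful of features fire on this input.
codes = np.zeros(n_features)
active = rng.choice(n_features, size=8, replace=False)
codes[active] = rng.uniform(0.5, 2.0, size=8)

# The superposition story (point 2 above): the activation vector is
# approximately a linear combination of the active feature directions.
activation = codes @ feature_dirs

print("active features:", sorted(active.tolist()))
print("activation norm:", round(float(np.linalg.norm(activation)), 3))
```

Under points 1-3, any freshly sampled near-orthogonal dictionary would serve equally well, which is why structure like day-of-the-week features arranged on a circle is extra information beyond this kind of description.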
Jun 24, 2024 • 14min

LW - LLM Generality is a Timeline Crux by eggsyntax

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLM Generality is a Timeline Crux, published by eggsyntax on June 24, 2024 on LessWrong. Short Summary LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. Longer summary There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks: Whether this limitation is illusory or actually exists. If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs. If fundamental, whether it can be overcome by scaffolding & tooling. If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI in the next few years. Introduction Leopold Aschenbrenner's recent 'Situational Awareness' document has gotten considerable attention in the safety & alignment community. Aschenbrenner argues that we should expect current systems to reach human-level given further scaling and 'unhobbling', and that it's 'strikingly plausible' that we'll see 'drop-in remote workers' capable of doing the work of an AI researcher or engineer by 2027. Others hold similar views. Francois Chollet and Mike Knoop's new $500,000 prize for beating the ARC benchmark has also gotten considerable recent attention in AIS[1]. Chollet holds a diametrically opposed view: that the current LLM approach is fundamentally incapable of general reasoning, and hence incapable of solving novel problems. We only imagine that LLMs can reason, Chollet argues, because they've seen such a vast wealth of problems that they can pattern-match against. But LLMs, even if scaled much further, will never be able to do the work of AI researchers. It would be quite valuable to have a thorough analysis of this question through the lens of AI safety and alignment. This post is not that[2], nor is it a review of the voluminous literature on this debate (from outside the AIS community). It attempts to briefly introduce the disagreement, some evidence on each side, and the impact on timelines. What is general reasoning? Part of what makes this issue contentious is that there's not a widely shared definition of 'general reasoning', and in fact various discussions of this use various terms. By 'general reasoning', I mean to capture two things. First, the ability to think carefully and precisely, step by step. Second, the ability to apply that sort of thinking in novel situations[3]. Terminology is inconsistent between authors on this subject; some call this 'system II thinking'; some 'reasoning'; some 'planning' (mainly for the first half of the definition); Chollet just talks about 'intelligence' (mainly for the second half). This issue is further complicated by the fact that humans aren't fully general reasoners without tool support either. For example, seven-dimensional tic-tac-toe is a simple and easily defined system, but incredibly difficult for humans to play mentally without extensive training and/or tool support.
Generalizations that are in-distribution for humans seem like something that any system should be able to do; generalizations that are out-of-distribution for humans don't feel as though they ought to count. How general are LLMs? It's important to clarify that this is very much a matter of degree. Nearly everyone was surprised by the degree to which the last generation of state-of-the-art LLMs like GPT-3 generalized; for example, no one I know of predicted that LLMs trained on primarily English-language sources would be able to do translation between languages. Some in the field argued as...
Jun 24, 2024 • 21min

LW - On Claude 3.5 Sonnet by Zvi

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Claude 3.5 Sonnet, published by Zvi on June 24, 2024 on LessWrong. There is a new clear best (non-tiny) LLM. If you want to converse with an LLM, the correct answer is Claude Sonnet 3.5. It is available for free on Claude.ai and the Claude iOS app, or you can subscribe for higher rate limits. The API cost is $3 per million input tokens and $15 per million output tokens. This completes the trifecta. All of OpenAI, Google DeepMind and Anthropic have kept their biggest and most expensive model static for now, and instead focused on making something faster and cheaper that is good enough to be the main model. You would only use another model if you either (1) needed a smaller model in which case Gemini 1.5 Flash seems best, or (2) it must have open model weights. Updates to their larger and smaller models, Claude Opus 3.5 and Claude Haiku 3.5, are coming later this year. They intend to issue new models every few months. They are working on long term memory. It is not only the new and improved intelligence. Speed kills. They say it is twice as fast as Claude Opus. That matches my experience. Jesse Mu: The 1st thing I noticed about 3.5 Sonnet was its speed. Opus felt like msging a friend - answers streamed slowly enough that it felt like someone typing behind the screen. Sonnet's answers *materialize out of thin air*, far faster than you can read, at better-than-Opus quality. Low cost also kills. They also introduced a new feature called Artifacts, to allow Claude to do various things in a second window. Many are finding it highly useful. Benchmarks As always, never fully trust the benchmarks to translate to real world performance. They are still highly useful, and I have high trust in Anthropic to not be gaming them. Here is the headline chart. Epoch AI confirms that Sonnet 3.5 is ahead on GPQA. Anthropic also highlight that in an agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems versus 38% for Claude Opus, discussed later. Needle in a haystack was already very good, now it is slightly better still. There's also this, from Anthropic's Alex Albert: You can say 'the recent jumps are relatively small' or you can notice that (1) there is an upper bound at 100 rapidly approaching for this set of benchmarks, and (2) the releases are coming quickly one after another and the slope of the line is accelerating despite being close to the maximum. Human Evaluation Tests We are still waiting for the Arena ranking to come in. Based on reactions we should expect Sonnet 3.5 to take the top slot, likely by a decent margin, but we've been surprised before. We evaluated Claude 3.5 Sonnet via direct comparison to prior Claude models. We asked raters to chat with our models and evaluate them on a number of tasks, using task-specific instructions. The charts in Figure 3 show the "win rate" when compared to a baseline of Claude 3 Opus. We saw large improvements in core capabilities like coding, documents, creative writing, and vision. Domain experts preferred Claude 3.5 Sonnet over Claude 3 Opus, with win rates as high as 82% in Law, 73% in Finance, and 73% in Philosophy. Those were the high water marks, and Arena preferences tend to be less dramatic than that due to the nature of the questions and also those doing the rating.
We are likely looking at more like a 60% win rate, which is still good enough for the top slot. The Vision Thing Here are the scores for vision. Claude has an additional modification on it: It is fully face blind by instruction. Chypnotoad: Claude's extra system prompt for vision: Claude always responds as if it is completely face blind. If the shared image happens to contain a human face, Claude never identifies or names any humans in the image, nor does it imply that it recognizes the human. It also does not mention or allude to details about a pers...
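For a sense of scale on the pricing quoted above ($3 per million input tokens, $15 per million output tokens), here is a small back-of-the-envelope sketch. Only the per-token prices come from the post; the request count and token sizes are hypothetical assumptions for illustration.

```python
# Back-of-the-envelope cost for Claude 3.5 Sonnet API usage, using the
# prices quoted in the post: $3 per 1M input tokens, $15 per 1M output tokens.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00

def sonnet_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a request (or a whole batch of requests)."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MILLION \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION

# Hypothetical workload: 1,000 requests, each ~2,000 input and ~500 output tokens.
n_requests = 1_000
total = sonnet_cost(n_requests * 2_000, n_requests * 500)
print(f"${total:.2f}")  # -> $13.50 under these assumed token counts
```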
Jun 23, 2024 • 13min

LW - Applying Force to the Wrong End of a Causal Chain by silentbob

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Applying Force to the Wrong End of a Causal Chain, published by silentbob on June 23, 2024 on LessWrong. There's a very common thing that humans do: a person makes an observation about something they dislike, so they go ahead and make an effort to change that thing. Sometimes it works, and sometimes it doesn't. If it doesn't work, there can be a variety of reasons for that - maybe the thing is very difficult to change, maybe the person lacks the specific skills to change the thing, maybe it depends on the behavior of other people and the person is not successful in convincing them to act differently. But there's also one failure mode which, while overlapping with the previous ones, is worth highlighting: imagine the thing the person dislikes is the outcome of a reasonably complex process. The person observes primarily this outcome, but is partially or fully ignorant of the underlying process that causes the outcome. And they now desperately want the outcome to be different. In such a situation they are practically doomed to fail - in all likelihood, their attempts to change the outcome will not be successful, and even if they are, the underlying cause is still present and will keep pushing in the direction of the undesired outcome. Three Examples Productivity in a Company A software company I worked for once struggled with a slow development cycle, chronic issues with unmet deadlines, and generally shipping things too slowly. The leadership's primary way of addressing this was to repeatedly tell the workforce to "work faster, be more productive, ship things more quickly". In principle, this approach can work, and to some degree it probably did speed things up. It just requires that the people you're pushing have enough agency, willingness and understanding to take it a step further and take the trip down the causal chain, to figure out what actually needs to happen in order to achieve the desired outcome. But if middle management just forwards the demand to "ship things more quickly" as is, and the employees below them don't have enough ownership to transform that demand into something more useful, then probably nothing good will happen. The changed incentives might cause workers to burn themselves out, to cut corners that really shouldn't be cut, to neglect safety or test coverage, to set lower standards for documentation or code quality - aspects that are important for stable long term success, but take time to get right. To name one very concrete example of the suboptimal consequences this had: The company had sent me a new laptop to replace my old one, which would speed up my productivity quite a bit. But it would have taken a full work day or two to set the new laptop up. The "we need to be faster" situation caused me to constantly have more pressing things to work on, meaning the new, faster laptop sat at the side of my desk, unused, for half a year. Needless to say, on top of all that, this time was also highly stressful for me and played a big role in me ultimately leaving the company. Software development, particularly when multiple interdependent teams are involved, is a complex process. The "just ship things more quickly" view, however, seems to naively suggest that the problem is simply that workers take too long pressing the "ship" button. What would have been a better approach?
It's of course easy to armchair-philosophize my way to a supposedly better solution now. And it's also a bit of a cop-out to make the meta comment that "you need to understand the underlying causal web that causes the company's low velocity". However, in cases like this one, I think one simple improvement is to make an effort for nuanced communication, making clear that it's not (necessarily) about just "working faster", but rather asking everyone to keep their eyes open for cause...
Jun 23, 2024 • 5min

LW - Enriched tab is now the default LW Frontpage experience for logged-in users by Ruby

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Enriched tab is now the default LW Frontpage experience for logged-in users, published by Ruby on June 23, 2024 on LessWrong. In the past few months, the LessWrong team has been making use of the latest AI tools (given that they unfortunately exist[1]) for art, music, and deciding what we should all be reading. Our experiments with the latter, i.e. the algorithm that chooses which posts to show on the frontpage, have produced results sufficiently good that at least for now, we're making Enriched the default for logged-in users[2]. If you're logged in and you've never switched tabs before, you'll now be on the Enriched tab. (If you don't have an account, making one takes 10 seconds.) To recap, here are the currently available tabs (subject to change): Latest: 100% posts from the Latest algorithm (using karma and post age to sort[3]) Enriched (new default): 50% posts from the Latest algorithm, 50% posts from the recommendations engine Recommended: 100% posts from the recommendations engine, choosing posts specifically for you based on your history Subscribed: a feed of posts and comments from users you have explicitly followed Bookmarks: this tab appears if you have bookmarked any posts Note that posts which are the result of the recommendation engine have a sparkle icon after the title (on desktop, space permitting): Posts from the last 48 hours have their age bolded: Why make Enriched the default? To quote from my earlier post about frontpage recommendation experiments: A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm[2], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be centered around the latest content. This seems very sad to me. When a new user shows up on LessWrong, it seems extremely unlikely that the most important posts for them to read were all written within the last week or two. I do really like the simplicity and predictability of the Hacker News algorithm. More karma means more visibility, older means less visibility. Very simple. When I vote, I basically know the full effect this has on what is shown to other users or to myself. But I think the cost of that simplicity has become too high, especially as older content makes up a larger and larger fraction of the best content on the site, and people have been becoming ever more specialized in the research and articles they publish on the site. We found that a hybrid posts list of 50% Latest and 50% Recommended lets us get the benefits of each algorithm[4]. The Latest component of the list allows people to stay up to date with the most recent new content, provides predictable visibility for new posts, and is approximately universal in that everyone sees those posts which makes posts a bit more common-knowledge-y. The Recommended component of the list allows us to present content that's predicted to be most interesting/valuable to a user from across thousands of posts from the last 10+ years, not being limited to just recent stuff. Shifting the age of posts When we first implemented recommendations, they were very recency biased.
My guess is that's because the data we were feeding it was of people reading and voting on recent posts, so it knew those were the ones we liked. In a manner less elegant than I would have preferred, we constrained the algorithm to mostly serving content 30 or 365 days old. You can see the evolution of the recommendation engine, on the age dimension, here: I give more detailed thoughts about what we found in the course of developing our recommendation algorithm in this comment below. Feedback, please Although we're making Enriched the general default, this feature direction is still expe...
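As a rough illustration of the 50/50 mixing described above, here is a sketch that interleaves a Hacker-News-style "Latest" ranking (karma decayed by age) with a separate recommendation ranking. This is a conceptual sketch only, not the actual LessWrong implementation; the gravity constant and the recommendation_score field are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Post:
    title: str
    karma: int
    age_hours: float
    recommendation_score: float  # assumed output of some personalised recommender

def latest_score(post: Post, gravity: float = 1.8) -> float:
    # HN-style ranking: more karma -> more visibility, older -> less visibility.
    return post.karma / (post.age_hours + 2) ** gravity

def enriched_feed(posts: List[Post], n: int = 10) -> List[Post]:
    """Interleave the Latest ranking with the recommender ranking, roughly 50/50."""
    by_latest = sorted(posts, key=latest_score, reverse=True)
    by_rec = sorted(posts, key=lambda p: p.recommendation_score, reverse=True)
    feed, seen = [], set()
    for latest_post, rec_post in zip(by_latest, by_rec):
        for candidate in (latest_post, rec_post):
            if candidate.title not in seen and len(feed) < n:
                feed.append(candidate)
                seen.add(candidate.title)
    return feed
```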
Jun 23, 2024 • 2min

LW - Bed Time Quests and Dinner Games for 3-5 year olds by Gunnar Zarncke

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bed Time Quests & Dinner Games for 3-5 year olds, published by Gunnar Zarncke on June 23, 2024 on LessWrong. I like these games because they are playful, engage the child, and still achieve the objective of getting the child to bed/eat dinner, etc. Requires creativity and some slack. Excerpt from Shohannah's post: Recently I had the bright idea to give up on being a regular parent. Mostly cause regular parenting practices melt my brain. But then I wondered … does it have to be boring? [...] But no. It's all culture and it's all recent culture and you can decide to do Something Else Instead. Really. So as someone who craves mental stimulation above the pay grade of the 3 to 5 revolutions around the sun my daughters have managed so far … I figured I'd just make up New Rules. All the time. So far we've been going for two weeks and the main areas are bedtime routines for my eldest (5) and dinner games for all of us (5, 3, and myself). I noticed I seem to have an easy time generating new and odd rule sets every day, and then started wondering if maybe more parents would enjoy this type of variety in their childcare routines and would want to tap into some of the ideas I've been coming up with. So in case that's you, here is what I've found so far! [...] Magic Time Kiddo is the parent. You are the kiddo. Except, the kiddo is still bringing themselves to bed and not you. They get to tell you what to do and take care of you. You will have to listen. I completely recommend performing a lot of obstructive behavior and misunderstanding basic instructions. This was one of the most popular games and may show some insight into how your child would prefer to be parented, or feels about your parenting. And fourteen more games/rulesets. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
