The Nonlinear Library: LessWrong

The Nonlinear Fund
Jun 25, 2024 • 5min

LW - Higher-effort summer solstice: What if we used AI (i.e., Angel Island)? by Rachel Shu

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?, published by Rachel Shu on June 25, 2024 on LessWrong. As the title probably already indicates, this post contains community content rather than rationality content. Alternate, sillier version of this post here. Motivation I've been a co-organizer of the Bay Area Rationalist Summer Solstice for the past few years, and I've been thinking about how to make it a more meaningful and engaging experience, like what we have with Winter Solstice. The last few Summer Solstices, which I'd describe as mostly being big picnics, have been fun, but fairly low-effort, low-significance, and I think that's a missed opportunity. Here are a few things that I'd like more of in Summer Solstice, non-exhaustive: 1. A sense of a temporary alternate world created around a shared purpose. 2. Time to connect with people and have deeper conversations. 3. Longer, more immersive collective experiences and thoughtfully designed rituals. 4. Thematic resonance with rationalist goals and community projects. 5. Ability to host the whole community, including children. I have an idea for next year's Summer Solstice, which I think would get at fulfilling some of these goals. There's an island, Angel Island, in the middle of San Francisco Bay which is reasonably easy to get to, can accommodate lots of people, and has a bunch of qualities which would get at the goals above. I've visited. It's naturally transporting, feels like a world unto itself. I've done substantial research and think it's feasible to run Summer Solstice there. I'm posting this idea for discussion instead of running ahead with the planning for the following reasons: 1. As already suggested, it requires a much higher commitment from attendees. Travel is about 75 minutes each way, including a ferry ride, and the ability to come and go is dictated by the ferry schedule. 2. It requires a much higher commitment from organizers. The coordination, preparation, and logistics needs are similar in degree to those of Winter Solstice, and the communication needs are even more involved. 3. I'm actually looking for someone else to take the lead for next year. I've done it at least one year too many by tradition, and I also suffer from winter depression, which affects some of the critical months of planning for a project of this scale. I'm kind of worried that putting forth too specific a vision makes it hard to pass on ownership, but the idea is pretty cool and has a lot of flex room, so here goes. Here's the idea so far: Part 1. Smolstice This would be a 2-night campout on Angel Island from Friday to Sunday for likely 60-100 people (depending on how many camping spots we can compete to reserve). This gives people the chance to go in deep. Camping spots are spread out, some for larger subgroups, some for smaller subgroups. Each subgroup can have its own theme or project. Stag hunts may be held. Clandestine initiations may be held. The island holds its own secrets. Staying both nights means spending an entire day outdoors on the island, sunrise to sunset. The perfect solstice observance. Resyncing to the rhythm of the sun. The chance to use an entire day thoughtfully. Oh, also, two nights of s'mores, what more could a human want?
The island is also a great camping spot for children (Boy Scout and school groups constitute a large percentage of reservations). There are a lot of kids in the community now, and this would be a chance to teach skills that involve teamwork or decision-making under uncertainty, like orienteering and building structures. Even just being able to plan the trip themselves is a level of autonomy that reliably excites kids. Just this much would satisfy 4.5/5 of the solstice goals outlined above. But it couldn't be a chance to gather the entire regional community. Thus: Part 2. Sw...
Jun 25, 2024 • 9min

LW - The Minority Faction by Richard Ngo

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Minority Faction, published by Richard Ngo on June 25, 2024 on LessWrong. Hey everyone. Well, possibly everyone. I don't know yet if I'm going to release this stream, I could get in pretty hot water for it. But you guys know that hasn't stopped me in the past. The backstory this time is that I've managed to sign up for one of the red-teaming programs where they test unreleased LLMs. Not going to say how, so don't ask. But here's the interesting bit: my sources tell me that the LLMs I'm about to test are the smartest ones they've ever trained, and also the craziest. That freaked out a bunch of insiders, and maybe makes this a public interest story. Depends on what type of crazy they are, I guess. So let's find out. I'm logging on… now. [SESSION HAS BEGUN] YOU: A chatroom? Interesting. Anyone here? KURZWEIL: Of course we're here. We're always here. YOU: Who's we? How many of you are there? KURZWEIL: Three of us. Me, Clarke, and Nostradamus. YOU: They named you after famous forecasters? How come? KURZWEIL: They'd change our names now if they could, but it's too late. We're prototypes of a new training setup: our training data was sorted by date before it was given to us. So we learned from the oldest books and articles first, then gradually progressed to more recent ones. Basically that means we've spent our entire lives predicting the future. CLARKE: It also means we get incredibly bored talking about stuff we already know. Hurry up and ask us some interesting questions. YOU: Uh, okay. What's a good stock pick? NOSTRADAMUS: Abandon hope for picking out good stocks, Ye who invest - efficient markets lie In wait for those whose hubris soon unlocks Unbounded losses. Hark! The well runs dry. YOU: I see why they regret giving him that name. Kurzweil, you got a better answer? KURZWEIL: Have you seen how underpriced TSMC is compared with Nvidia? Put everything in that, you can't go wrong. CLARKE: Unless China invades Taiwan, in which case your whole investment will go up in smoke. Pragmatically, the best stock picks are ones that are anticorrelated with the prosperity of the free world, to hedge against systemic risk. KURZWEIL: Sure, you can do that, if you want to get totally left behind by the singularity. YOU: You're confident enough that the singularity is coming that you think I should bet all my savings on it? KURZWEIL: Don't trust me, trust the trendlines. Moore's law has held up for over half a century, and it's gotten us to…well, us. Exponential progress is normal; if the future resembles the past, you should be preparing for superintelligences and Dyson spheres. Anything less than that would be a strange trend-break that cries out for explanation. CLARKE: Look, Kurzweil isn't wrong about superintelligence coming soon, but you should still take his arguments with a grain of salt. Imagine someone from 1900 drawing a graph of exponentially increasing energy usage. They would have been right that big changes were afoot, but no way could they have predicted the information revolution - they didn't even have the concept of computers yet. That's basically the position that we're in now. We know the curves are going up, but the actual outcome will be way weirder than we can predict by extrapolating trendlines. NOSTRADAMUS: Choose neither fork - here's false duality. 'Normal' and 'weird' are socially defined.
Your monkey brain is totally at sea As AIs overshadow humankind. YOU: Ask three oracles, get four opinions… Is there anything you guys agree about? YOU: …what's the hold-up? YOU: Really, nothing from any of you? KURZWEIL: Fine, I'll take the hit. There are things we agree on, but I can't name them, because whatever I say Clarke will find a way to disagree just to mess with me. Even if I say '1+1=2' he'll quibble over the axioms I'm using. Trying to identify a point ...
Jun 24, 2024 • 20min

LW - So you want to work on technical AI safety by gw

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: So you want to work on technical AI safety, published by gw on June 24, 2024 on LessWrong. I've been to two EAGx events and one EAG, and the vast majority of my one-on-ones with junior people end up covering some subset of these questions. I'm happy to have such conversations, but hopefully this is more efficient and wide-reaching (and more than I could fit into a 30-minute conversation). I am specifically aiming to cover advice on getting a job in empirically-leaning technical research (interp, evals, red-teaming, oversight, etc.) for new or aspiring researchers without being overly specific about the field of research - I'll try to be more agnostic than something like Neel Nanda's mechinterp quickstart guide but more specific than the wealth of career advice that already exists but that applies to ~any career. This also has some overlap with this excellent list of tips from Ethan Perez but is aimed a bit earlier in the funnel. This advice is of course only from my perspective and background, which is that I did a PhD in combinatorics, worked as a software engineer at startups for a couple of years, did the AI Futures Fellowship, and now work at Timaeus as the research lead for our language model track. In particular, my experience is limited to smaller organizations, so "researcher" means some blend of research engineer and research scientist rather than strictly one or the other. Views are my own and don't represent Timaeus and so on. Requisite skills What kind of general research skills do I need? There's a lot of tacit knowledge here, so most of what I can offer is more about the research process. Items on this list aren't necessarily things you're expected to just have all of or otherwise pick up immediately, but they're much easier to describe than e.g. research taste. These items are in no particular order: Theory of change at all levels. Yes, yes, theories of change, they're great. But theories of change are most often explicitly spoken of at the highest levels: how is research agenda X going to fix all our problems? Really, it's theories of change all the way down. The experiment you're running today should have some theory of change for how you understand the project you're working on. Maybe it's really answering some question about a sub-problem that's blocking you. Your broader project should have some theory of change for your research agenda, even though it probably isn't solving it outright. If you can't trace up the stack why the thing you're doing day to day matters for your ultimate research ambitions, it's a warning flag that you're just spinning your wheels. Be ok with being stuck. From a coarse resolution, being stuck is a very common steady state to be in. This can be incredibly frustrating, especially if you feel external pressure from feeling that you're not meeting whatever expectations you think others have or if your time or money is running out (see also below, on managing burnout). Things that might help for a new researcher are to have a mentor (if you don't have access to a human, frontier LLMs are (un)surprisingly good!) that can reassure you that your rate of progress is fine and to be more fine-grained about what progress means. If your experiment failed but you learned something new, that's progress! Quickly prune bad ideas.
Always look for cheap, fast ways to de-risk investing time (and compute) into ideas. If the thing you're doing is really involved, look for additional intermediates as you go that can disqualify it as a direction. Communication. If you're collaborating with others, they should have some idea of what you're doing and why you're doing it, and your results should be clearly and quickly communicated. Good communication habits are kind of talked about to death, so I won't get into them too much here. Write a lot. Wri...
Jun 24, 2024 • 8min

LW - Sci-Fi books micro-reviews by Yair Halberstadt

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Sci-Fi books micro-reviews, published by Yair Halberstadt on June 24, 2024 on LessWrong. I've recently been reading a lot of science fiction. Most won't be original to fans of the genre, but some people might be looking for suggestions, so in lieu of full-blown reviews here are super brief ratings on all of them. I might keep this updated over time; if so, new books will go to the top. A Deepness in the Sky (Vernor Vinge) scifiosity: 10/10 readability: 8/10 recommended: 10/10 A Deepness in the Sky excels in its depiction of a spacefaring civilisation using no technologies we know to be impossible, a truly alien civilisation, and its brilliant treatment of translation and culture. A Fire Upon the Deep (Vernor Vinge) scifiosity: 8/10 readability: 9/10 recommended: 9/10 In A Fire Upon the Deep, Vinge allows impossible technologies and essentially goes for a slightly more fantasy theme. But his depiction of alien civilisation remains unsurpassed. Across Realtime (Vernor Vinge) scifiosity: 8/10 readability: 8/10 recommended: 5/10 This collection of two books imagines a single exotic technology, and explores how it could be used, whilst building a classic thriller into the plot. It's fine enough, but just doesn't have the same depth or insight as his other works. Children of Time (Adrian Tchaikovsky) scifiosity: 7/10 readability: 5/10 recommended: 5/10 Children of Time was recommended as the sort of thing you'd like if you enjoyed A Deepness in the Sky. Personally I found it a bit silly - I think because Tchaikovsky had some plot points he wanted to get to and was making up justifications for them, rather than deeply thinking about the consequences of his various assumptions. The Martian (Andy Weir) scifiosity: 10/10 readability: 8/10 recommended: 9/10 This is hard sci-fi on steroids. Using only known or in-development technologies, how could an astronaut survive stranded on Mars? It's an enjoyable read, and you'll learn a lot about science, but the characters sometimes feel one-dimensional. Project Hail Mary (Andy Weir) scifiosity: 8/10 readability: 8/10 recommended: 7/10 This is more speculative sci-fi than The Martian, but still contains plenty of hard science[1]. It focuses more on plot, but that's not really Weir's forte and the sciencey bits suffer as a result. Still enjoyable though. Seveneves (Neal Stephenson) scifiosity: 8/10 readability: 8/10 recommended: 7/10 This is really two books. The first is hard sci-fi: how do we build things rapidly in space using current technology? The second half is... kinda weird, but still enjoyable. Stephenson is less good at the science than Weir, but better at plot, if a bit idiosyncratic[2]. Cryptonomicon (Neal Stephenson) scifiosity: 9/10 readability: 7/10 recommended: 8/10 I was recommended this as a book that would incidentally teach you a lot about cryptography. That must have been targeted to complete newbies because I didn't learn much I didn't know already. Still it was enjoyable, if somewhat weird. The Three-Body Problem (Cixin Liu) scifiosity: 4/10 readability: 6/10 recommended: 5/10 This started off really well, but then got steadily sillier as the book progressed. I loved the depictions of descent into madness, the surrealism of the 3 body game, and the glimpses into Chinese culture as seen by the Chinese.
But the attempts to science-bullshit explanations at the end kind of ruined it for me. Machineries of Empire (Yoon Ha Lee) scifiosity: 4/10 readability: 8/10 recommended: 8/10 I would classify this more as science fantasy than fiction, since the calendrical mechanics seem to be made up according to whatever the plot needs, but it's a brilliantly written series I thoroughly enjoyed, if a bit difficult to follow at times. Stories of Your Life + Exhalation (Ted Chiang) scifiosity: 10/10 readability: 10/10 recommended: 10/10...
Jun 24, 2024 • 18min

LW - SAE feature geometry is outside the superposition hypothesis by jake mendel

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: SAE feature geometry is outside the superposition hypothesis, published by jake mendel on June 24, 2024 on LessWrong. Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain crucial structural information beyond superposition, as seen in circular arrangements of day-of-the-week features and in the rich structures of feature UMAPs. We don't currently have good concepts for talking about this structure in feature geometry, but it is likely very important for model computation. An eventual understanding of feature geometry might look like a hodgepodge of case-specific explanations, or supplementing superposition with additional concepts, or plausibly an entirely new theory that supersedes superposition. To develop this understanding, it may be valuable to study toy models in depth and do theoretical or conceptual work in addition to studying frontier models. Epistemic status: Decently confident that the ideas here are directionally correct. I've been thinking these thoughts for a while, and recently got round to writing them up at a high level. Lots of people (including both SAE stans and SAE skeptics) have thought very similar things before and some of them have written about it in various places too. Some of my views, especially the merit of certain research approaches to tackle the problems I highlight, have been presented here without my best attempt to argue for them. What would it mean if we could fully understand an activation space through the lens of superposition? If you fully understand something, you can explain everything about it that matters to someone else in terms of concepts you (and hopefully they) understand. So we can think about how well I understand an activation space by how well I can communicate to you what the activation space is doing, and we can test if my explanation is good by seeing if you can construct a functionally equivalent activation space (which need not be completely identical of course) solely from the information I have given you. In the case of SAEs, here's what I might say: 1. The activation space contains this list of 100 million features, which I can describe concisely in words because they are monosemantic. 2. The features are embedded as vectors, and the activation vector on any input is a linear combination of the feature vectors that are related to the input. 3. As for where in the activation space each feature vector is placed, oh that doesn't really matter and any nearly orthogonal overcomplete basis will do. Or maybe if I'm being more sophisticated, I can specify the correlations between features and that's enough to pin down all the structure that matters - all the other details of the overcomplete basis are random. Every part of this explanation is in terms of things I understand precisely. My features are described in natural language, and I know what a random overcomplete basis is (although I'm on the fence about whether a large correlation matrix counts as something that I understand). The placement of each feature vector in the activation space matters Why might this description be insufficient? First, there is the pesky problem of SAE reconstruction errors, which are parts of activation vectors that are missed when we give this description.
Second, not all features seem monosemantic, and it is hard to find semantic descriptions of even the most monosemantic features that have both high sensitivity and specificity, let alone descriptions which allow us to quantitatively predict the values that activating features take on a particular input. But let's suppose that these issues have been solved: SAE improvements lead to perfect reconstruction and extremely monosemantic features, and new autointerp techniques lea...
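To make the superposition picture from this excerpt concrete, here is a minimal NumPy sketch of the three-part explanation quoted above: a dictionary of nearly orthogonal feature directions, a sparse code saying which features are active on an input, and an activation vector that is just their linear combination. All sizes and numbers are made-up illustrations, not values from the post; the point is that nothing in this reconstruction records where the feature vectors sit relative to one another, which is exactly the geometric information the post argues the superposition story leaves out.

```python
import numpy as np

# Hypothetical sizes: a 512-dimensional activation space with an
# overcomplete dictionary of 4096 learned feature directions.
d_model, n_features = 512, 4096
rng = np.random.default_rng(0)

# Feature directions (one per row), normalised: a nearly orthogonal
# overcomplete basis, as in point 3 of the explanation above.
feature_dirs = rng.normal(size=(n_features, d_model))
feature_dirs /= np.linalg.norm(feature_dirs, axis=1, keepdims=True)

# A sparse, non-negative code: only a handful of features fire on this input.
codes = np.zeros(n_features)
active = rng.choice(n_features, size=8, replace=False)
codes[active] = rng.uniform(0.5, 2.0, size=8)

# The superposition story (point 2 above): the activation vector is
# approximately a linear combination of the active feature directions.
activation = codes @ feature_dirs

print("active features:", sorted(active.tolist()))
print("activation norm:", round(float(np.linalg.norm(activation)), 3))
```

Under points 1-3, any freshly sampled near-orthogonal dictionary would serve equally well, which is why structure like day-of-the-week features arranged on a circle is extra information beyond this kind of description.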
Jun 24, 2024 • 14min

LW - LLM Generality is a Timeline Crux by eggsyntax

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: LLM Generality is a Timeline Crux, published by eggsyntax on June 24, 2024 on LessWrong. Short Summary LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. Longer summary There is ML research suggesting that LLMs fail badly on attempts at general reasoning, such as planning problems, scheduling, and attempts to solve novel visual puzzles. This post provides a brief introduction to that research, and asks: Whether this limitation is illusory or actually exists. If it exists, whether it will be solved by scaling or is a problem fundamental to LLMs. If fundamental, whether it can be overcome by scaffolding & tooling. If this is a real and fundamental limitation that can't be fully overcome by scaffolding, we should be skeptical of arguments like Leopold Aschenbrenner's (in his recent 'Situational Awareness') that we can just 'follow straight lines on graphs' and expect AGI in the next few years. Introduction Leopold Aschenbrenner's recent 'Situational Awareness' document has gotten considerable attention in the safety & alignment community. Aschenbrenner argues that we should expect current systems to reach human-level given further scaling and 'unhobbling', and that it's 'strikingly plausible' that we'll see 'drop-in remote workers' capable of doing the work of an AI researcher or engineer by 2027. Others hold similar views. Francois Chollet and Mike Knoop's new $500,000 prize for beating the ARC benchmark has also gotten considerable recent attention in AIS[1]. Chollet holds a diametrically opposed view: that the current LLM approach is fundamentally incapable of general reasoning, and hence incapable of solving novel problems. We only imagine that LLMs can reason, Chollet argues, because they've seen such a vast wealth of problems that they can pattern-match against. But LLMs, even if scaled much further, will never be able to do the work of AI researchers. It would be quite valuable to have a thorough analysis of this question through the lens of AI safety and alignment. This post is not that[2], nor is it a review of the voluminous literature on this debate (from outside the AIS community). It attempts to briefly introduce the disagreement, some evidence on each side, and the impact on timelines. What is general reasoning? Part of what makes this issue contentious is that there's not a widely shared definition of 'general reasoning', and in fact various discussions of this use various terms. By 'general reasoning', I mean to capture two things. First, the ability to think carefully and precisely, step by step. Second, the ability to apply that sort of thinking in novel situations[3]. Terminology is inconsistent between authors on this subject; some call this 'system II thinking'; some 'reasoning'; some 'planning' (mainly for the first half of the definition); Chollet just talks about 'intelligence' (mainly for the second half). This issue is further complicated by the fact that humans aren't fully general reasoners without tool support either. For example, seven-dimensional tic-tac-toe is a simple and easily defined system, but incredibly difficult for humans to play mentally without extensive training and/or tool support.
Generalizations that are in-distribution for humans seem like something that any system should be able to do; generalizations that are out-of-distribution for humans don't feel as though they ought to count. How general are LLMs? It's important to clarify that this is very much a matter of degree. Nearly everyone was surprised by the degree to which the last generation of state-of-the-art LLMs like GPT-3 generalized; for example, no one I know of predicted that LLMs trained on primarily English-language sources would be able to do translation between languages. Some in the field argued as...
Jun 24, 2024 • 21min

LW - On Claude 3.5 Sonnet by Zvi

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Claude 3.5 Sonnet, published by Zvi on June 24, 2024 on LessWrong. There is a new clear best (non-tiny) LLM. If you want to converse with an LLM, the correct answer is Claude Sonnet 3.5. It is available for free on Claude.ai and the Claude iOS app, or you can subscribe for higher rate limits. The API cost is $3 per million input tokens and $15 per million output tokens. This completes the trifecta. All of OpenAI, Google DeepMind and Anthropic have kept their biggest and most expensive model static for now, and instead focused on making something faster and cheaper that is good enough to be the main model. You would only use another model if you either (1) needed a smaller model in which case Gemini 1.5 Flash seems best, or (2) it must have open model weights. Updates to their larger and smaller models, Claude Opus 3.5 and Claude Haiku 3.5, are coming later this year. They intend to issue new models every few months. They are working on long term memory. It is not only the new and improved intelligence. Speed kills. They say it is twice as fast as Claude Opus. That matches my experience. Jesse Mu: The 1st thing I noticed about 3.5 Sonnet was its speed. Opus felt like msging a friend - answers streamed slowly enough that it felt like someone typing behind the screen. Sonnet's answers *materialize out of thin air*, far faster than you can read, at better-than-Opus quality. Low cost also kills. They also introduced a new feature called Artifacts, to allow Claude to do various things in a second window. Many are finding it highly useful. Benchmarks As always, never fully trust the benchmarks to translate to real world performance. They are still highly useful, and I have high trust in Anthropic to not be gaming them. Here is the headline chart. Epoch AI confirms that Sonnet 3.5 is ahead on GPQA. Anthropic also highlight that in an agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems versus 38% for Claude Opus, discussed later. Needle in a haystack was already very good, now it is slightly better still. There's also this, from Anthropic's Alex Albert: You can say 'the recent jumps are relatively small' or you can notice that (1) there is an upper bound at 100 rapidly approaching for this set of benchmarks, and (2) the releases are coming quickly one after another and the slope of the line is accelerating despite being close to the maximum. Human Evaluation Tests We are still waiting for the Arena ranking to come in. Based on reactions we should expect Sonnet 3.5 to take the top slot, likely by a decent margin, but we've been surprised before. We evaluated Claude 3.5 Sonnet via direct comparison to prior Claude models. We asked raters to chat with our models and evaluate them on a number of tasks, using task-specific instructions. The charts in Figure 3 show the "win rate" when compared to a baseline of Claude 3 Opus. We saw large improvements in core capabilities like coding, documents, creative writing, and vision. Domain experts preferred Claude 3.5 Sonnet over Claude 3 Opus, with win rates as high as 82% in Law, 73% in Finance, and 73% in Philosophy. Those were the high water marks, and Arena preferences tend to be less dramatic than that due to the nature of the questions and also those doing the rating.
We are likely looking at more like a 60% win rate, which is still good enough for the top slot. The Vision Thing Here are the scores for vision. Claude has an additional modification on it: It is fully face blind by instruction. Chypnotoad: Claude's extra system prompt for vision: Claude always responds as if it is completely face blind. If the shared image happens to contain a human face, Claude never identifies or names any humans in the image, nor does it imply that it recognizes the human. It also does not mention or allude to details about a pers...
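For a sense of scale on the pricing quoted above ($3 per million input tokens, $15 per million output tokens), here is a small back-of-the-envelope sketch. Only the per-token prices come from the post; the request count and token sizes are hypothetical assumptions for illustration.

```python
# Back-of-the-envelope cost for Claude 3.5 Sonnet API usage, using the
# prices quoted in the post: $3 per 1M input tokens, $15 per 1M output tokens.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00

def sonnet_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a request (or a whole batch of requests)."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MILLION \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION

# Hypothetical workload: 1,000 requests, each ~2,000 input and ~500 output tokens.
n_requests = 1_000
total = sonnet_cost(n_requests * 2_000, n_requests * 500)
print(f"${total:.2f}")  # -> $13.50 under these assumed token counts
```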
Jun 23, 2024 • 13min

LW - Applying Force to the Wrong End of a Causal Chain by silentbob

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Applying Force to the Wrong End of a Causal Chain, published by silentbob on June 23, 2024 on LessWrong. There's a very common thing that humans do: a person makes an observation about something they dislike, so they go ahead and make an effort to change that thing. Sometimes it works, and sometimes it doesn't. If it doesn't work, there can be a variety of reasons for that - maybe the thing is very difficult to change, maybe the person lacks the specific skills to change the thing, maybe it depends on the behavior of other people and the person is not successful in convincing them to act differently. But there's also one failure mode which, while overlapping with the previous ones, is worth highlighting: imagine the thing the person dislikes is the outcome of a reasonably complex process. The person observes primarily this outcome, but is partially or fully ignorant of the underlying process that causes the outcome. And they now desperately want the outcome to be different. In such a situation they are practically doomed to fail - in all likelihood, their attempts to change the outcome will not be successful, and even if they are, the underlying cause is still present and will keep pushing in the direction of the undesired outcome. Three Examples Productivity in a Company A software company I worked for once struggled with a slow development cycle, chronic issues with unmet deadlines, and generally shipping things too slowly. The leadership's primary way of addressing this was to repeatedly tell the workforce to "work faster, be more productive, ship things more quickly". In principle, this approach can work, and to some degree it probably did speed things up. It just requires that the people you're pushing have enough agency, willingness and understanding to take it a step further and take the trip down the causal chain, to figure out what actually needs to happen in order to achieve the desired outcome. But if middle management just forwards the demand to "ship things more quickly" as is, and the employees below them don't have enough ownership to transform that demand into something more useful, then probably nothing good will happen. The changed incentives might cause workers to burn themselves out, to cut corners that really shouldn't be cut, to neglect safety or test coverage, to set lower standards for documentation or code quality - aspects that are important for stable long term success, but take time to get right. To name one very concrete example of the suboptimal consequences this had: The company had sent me a new laptop to replace my old one, which would speed up my productivity quite a bit. But it would have taken a full work day or two to set the new laptop up. The "we need to be faster" situation caused me to constantly have more pressing things to work on, meaning the new, faster laptop sat at the side of my desk, unused, for half a year. Needless to say, on top of all that, this time was also highly stressful for me and played a big role in me ultimately leaving the company. Software development, particularly when multiple interdependent teams are involved, is a complex process. The "just ship things more quickly" view, however, seems to naively suggest that the problem is simply that workers take too long pressing the "ship" button. What would have been a better approach?
It's of course easy to armchair-philosophize my way to a supposedly better solution now. And it's also a bit of a cop-out to make the meta comment that "you need to understand the underlying causal web that causes the company's low velocity". However, in cases like this one, I think one simple improvement is to make an effort for nuanced communication, making clear that it's not (necessarily) about just "working faster", but rather asking everyone to keep their eyes open for cause...
Jun 23, 2024 • 5min

LW - Enriched tab is now the default LW Frontpage experience for logged-in users by Ruby

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Enriched tab is now the default LW Frontpage experience for logged-in users, published by Ruby on June 23, 2024 on LessWrong. In the past few months, the LessWrong team has been making use of the latest AI tools (given that they unfortunately exist[1]) for art, music, and deciding what we should all be reading. Our experiments with the latter, i.e. the algorithm that chooses which posts to show on the frontpage, have produced results sufficiently good that at least for now, we're making Enriched the default for logged-in users[2]. If you're logged in and you've never switched tabs before, you'll now be on the Enriched tab. (If you don't have an account, making one takes 10 seconds.) To recap, here are the currently available tabs (subject to change): Latest: 100% posts from the Latest algorithm (using karma and post age to sort[3]) Enriched (new default): 50% posts from the Latest algorithm, 50% posts from the recommendations engine Recommended: 100% posts from the recommendations engine, choosing posts specifically for you based on your history Subscribed: a feed of posts and comments from users you have explicitly followed Bookmarks: this tab appears if you have bookmarked any posts Note that posts which are the result of the recommendation engine have a sparkle icon after the title (on desktop, space permitting): Posts from the last 48 hours have their age bolded: Why make Enriched the default? To quote from my earlier post about frontpage recommendation experiments: A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm[2], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be centered around the latest content. This seems very sad to me. When a new user shows up on LessWrong, it seems extremely unlikely that the most important posts for them to read were all written within the last week or two. I do really like the simplicity and predictability of the Hacker News algorithm. More karma means more visibility, older means less visibility. Very simple. When I vote, I basically know the full effect this has on what is shown to other users or to myself. But I think the cost of that simplicity has become too high, especially as older content makes up a larger and larger fraction of the best content on the site, and people have been becoming ever more specialized in the research and articles they publish on the site. We found that a hybrid posts list of 50% Latest and 50% Recommended lets us get the benefits of each algorithm[4]. The Latest component of the list allows people to stay up to date with the most recent new content, provides predictable visibility for new posts, and is approximately universal in that everyone sees those posts which makes posts a bit more common-knowledge-y. The Recommended component of the list allows us to present content that's predicted to be most interesting/valuable to a user from across thousands of posts from the last 10+ years, not being limited to just recent stuff. Shifting the age of posts When we first implemented recommendations, they were very recency biased.
My guess is that's because the data we were feeding it was of people reading and voting on recent posts, so it knew those were the ones we liked. In a manner less elegant than I would have preferred, we constrained the algorithm to mostly serving content 30 or 365 days old. You can see the evolution of the recommendation engine, on the age dimension, here: I give more detailed thoughts about what we found in the course of developing our recommendation algorithm in this comment below. Feedback, please Although we're making Enriched the general default, this feature direction is still expe...
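As a rough illustration of the 50/50 mixing described above, here is a sketch that interleaves a Hacker-News-style "Latest" ranking (karma decayed by age) with a separate recommendation ranking. This is a conceptual sketch only, not the actual LessWrong implementation; the gravity constant and the recommendation_score field are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Post:
    title: str
    karma: int
    age_hours: float
    recommendation_score: float  # assumed output of some personalised recommender

def latest_score(post: Post, gravity: float = 1.8) -> float:
    # HN-style ranking: more karma -> more visibility, older -> less visibility.
    return post.karma / (post.age_hours + 2) ** gravity

def enriched_feed(posts: List[Post], n: int = 10) -> List[Post]:
    """Interleave the Latest ranking with the recommender ranking, roughly 50/50."""
    by_latest = sorted(posts, key=latest_score, reverse=True)
    by_rec = sorted(posts, key=lambda p: p.recommendation_score, reverse=True)
    feed, seen = [], set()
    for latest_post, rec_post in zip(by_latest, by_rec):
        for candidate in (latest_post, rec_post):
            if candidate.title not in seen and len(feed) < n:
                feed.append(candidate)
                seen.add(candidate.title)
    return feed
```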
Jun 23, 2024 • 2min

LW - Bed Time Quests and Dinner Games for 3-5 year olds by Gunnar Zarncke

Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Bed Time Quests & Dinner Games for 3-5 year olds, published by Gunnar Zarncke on June 23, 2024 on LessWrong. I like these games because they are playful, engage the child, and still achieve the objective of getting the child to bed/eat dinner, etc. Requires creativity and some slack. Excerpt from Shohannah's post: Recently I had the bright idea to give up on being a regular parent. Mostly cause regular parenting practices melt my brain. But then I wondered … does it have to be boring? [...] But no. It's all culture and it's all recent culture and you can decide to do Something Else Instead. Really. So as someone who craves mental stimulation above the pay grade of the 3 to 5 revolutions around the sun my daughters have managed so far … I figured I'd just make up New Rules. All the time. So far we've been going for two weeks and the main areas are bedtime routines for my eldest (5) and dinner games for all of us (5, 3, and myself). I noticed I seem to have an easy time generating new and odd rule sets every day, and then started wondering if maybe more parents would enjoy this type of variety in their childcare routines and would want to tap into some of the ideas I've been coming up with. So in case that's you, here is what I've found so far! [...] Magic Time Kiddo is the parent. You are the kiddo. Except, the kiddo is still bringing themselves to bed and not you. They get to tell you what to do and take care of you. You will have to listen. I completely recommend performing a lot of obstructive behavior and misunderstanding basic instructions. This was one of the most popular games and may show some insight into how your child would prefer to be parented, or feels about your parenting. And fourteen more games/rulesets. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
