The Nonlinear Library

The Nonlinear Fund
undefined
Apr 8, 2024 • 8min

EA - CEA seeks co-founder for AI safety group support spin-off by Agustín Covarrubias

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: CEA seeks co-founder for AI safety group support spin-off, published by Agustín Covarrubias on April 8, 2024 on The Effective Altruism Forum. Summary CEA is currently inviting expressions of interest for co-founding a promising new project focused on providing non-monetary support to AI safety groups. We're also receiving recommendations for the role. CEA is helping incubate this project and plans to spin it off into an independent organization (or a fiscally sponsored project) during the next four months. The successful candidate will join the first co-founder hired for this project, Agustín Covarrubias, who has been involved in planning this spin-off for the last couple of months. The commitment of a second co-founder is conditional on the project receiving funding, but work done until late July will be compensated by CEA (see details below). Note that besides initial contracting, this role is not within CEA. The deadline for expressions of interest and recommendations is April 19. Background Currently, CEA provides support to AI safety university groups through programs like the Organizer Support Program (OSP). For the last two semesters, OSP has piloted connecting AI safety organizers with experienced mentors to guide them. CEA has also supported these organizers through events for community builders - like the recent University Group Organiser Summit - to meet one another, discuss strategic considerations, skill up, and boost participants' motivation. Even though these projects have largely accomplished CEA's goals, AI safety groups could benefit from more ambitious, specialized, and consistent support. We are leaving a lot of impact on the table. Furthermore, until now, AI safety groups' approach to community building has been primarily modelled after EA groups. While EA groups serve as a valuable model, we've seen early evidence that not all of their approaches and insights transfer perfectly. This means there's an opportunity to experiment with alternative community-building models and test new approaches to supporting groups. For these reasons, CEA hired Agustín Covarrubias to incubate a new project. The project will encompass the support CEA is already giving AI safety groups, plus provide the opportunity to explore new ways to help these groups grow and prosper. The result will be a CEA spin-off that operates as a standalone organization or a fiscally sponsored project. Since AI Safety groups are not inherently linked to EA, we think spinning out also allows this project to broaden its target audience (of organizers, for example). We're now looking to find a co-founder for this new entity and invite expressions of interest and recommendations. We think this is a compelling opportunity for people passionate about AI safety and community building to address a critical need in this space. Our vision We think growing and strengthening the ecosystem of AI safety groups is among the most promising fieldbuilding efforts. These groups have the potential to evolve into thriving talent and resource hubs, creating local momentum for AI safety, helping people move to high-impact careers, and helping researchers, technologists, and even advocates collaborate in pursuit of a shared mission. We also think some of these groups have a competitive advantage in leveraging local ecosystems; for example, we've seen promising results from groups interacting with faculty, research labs, and policy groups. But this won't happen by default. It will take careful, proactive nurturing of these groups' potential. We're ready to fill this important gap. Our vision for the new organization is to: Provide scalable but targeted support to existing and future AI safety groups. Build the infrastructure needed to grow the ecosystem in less than six months. Scale proportionally to accommodate...
undefined
Apr 8, 2024 • 4min

EA - What should you do when you want to escape negative people of the community/subgroup you're in (but don't want to hamper your career)? by Polkashell

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What should you do when you want to escape negative people of the community/subgroup you're in (but don't want to hamper your career)?, published by Polkashell on April 8, 2024 on The Effective Altruism Forum. I'm talking specifically of an EA group. Anyone have experiences or thoughts on if their local EA group became super toxic and was full of people that they thought were super questionable? What do you do when for instance, that group has a few people who are working on the same things you are (I.e, same career area) and so you feel that if you chose to work on your career separate from this community (instead of forcing yourself within it) your career would be hampered in some way? (Eg having to interact with them, not work with each other, same network where they might be community builders hosting events so you feel so unwelcome in those) I've for a while now felt a couple groups (not all) I've been in EA can be quite toxic, especially certain individuals. So it feels right to stay far away and not work with them at all. But you're worried since the talent pool is quite small it would be negative to not join forces in someway. I guess further, I've been feeling more disenthused with EA groups the past year, the group I was part of definitely had a lot of toxic community builders and members who said bad things, very red flag type things. It scares me to feel this way because it feels like it leaves me more alone among people who should care about things I do but they would actually seem not to (not all of them) and I think I could hamper my impact and career because I would be alone. It has made me seriously consider quitting EA and even the area I'm working on now just to be away from them because it's impossible not to interact with them if not. What advice does anyone have for me? Does anyone have similar experiences? How do you thrive in an EA career outside of EA when the network is so strong within EA? For instance if I want to work on X cause area but do not want to involve myself with EA specifically anymore, how can I be successful especially if the talent pool from where I'm from is quite small for that particular cause area? I keep thinking, if I distance myself from this pool then I might also distance myself from opportunity. Some people I think who are not super directly "EA" but are doing good work are Jess Whittlestone, Jaime Yassif, etc. (correct me if I'm wrong) Thoughts here? I'm really struggling with how I feel in this community and it's been really awful for my mental health. I feel like the community I was part of has been very unfair and yeah, toxic and if other people around the community I know not part of this group heard what they were doing or what they said they would think it were red flaggy. I might been roundabout there but this is a raw post/Q. I don't feel so comfortable talking to community health at the moment. I don't currently feel comfortable talking to the CH of my group too. But to be open, some of these experiences revolve: 1) intentionally sabotaging specifically people in the community 2) spreading false information about people in the community 3) badmouthing people in the community unfairly 4) being questionable in their EA intentions 5) showing signs of strong arrogance (entitled to someone's life information, throwing some mini tantrums when they're not chosen for job, etc. there are more but I don't think I should disclose that). If these people were to go around doing stuff would people recommend they be reported or flagged or not? Would people who might work with them want to know this stuff? It might look weird if I proactively told people this stuff, might make me a hypocrite. I'm not here to ruin people's careers or anything. But sometimes I feel very uneasy that I've just been quiet dealing with these pe...
undefined
Apr 8, 2024 • 31min

AF - Measuring Learned Optimization in Small Transformer Models by Jonathan Bostock

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Measuring Learned Optimization in Small Transformer Models, published by Jonathan Bostock on April 8, 2024 on The AI Alignment Forum. This is original, independent research carried out in March and April of 2024. The degree to which a a policy optimizes the future can be quantified mathematically. A set of of very small transformer models were pretrained to predict the next token in a mathematical sequence, then subjected to reinforcement learning finetuning. The optimizing power of each model can be predicted with high accuracy based on each model's score on its own RL task. By comparing predictions of optimization based on scores on each different RL task, a model's original reinforcement objective can be identified. A related measure for impact can also be derived mathematically, and given a theoretical lower bound based on RL score. This gives further information about model behavior, and allows for the same analysis as the measure of optimization. I also investigate the possibility of getting models to self-evaluate optimization and impact, with limited success. Methods Pretraining on Sequence Prediction I defined a simple mathematical sequence defined by the following stochastic recurrence relation. This produces a pseudo-random but (to 98%) predictable sequence, alternating between elements of {0,...,7} on even values of t and {8,...,15} on odd values of t. st=(((16i=1(sti+1)mod17)mod8) with probability 98% {0,...,7} with probabiltiy 2%)+8(tmod2) I then trained a small encoder-only transformer model to predict the next element in the sequence given the previous 20 elements of the sequence. This was followed by a reinforcement-learning phase in which the transformer was used to generate the next token on odd values of n only, and the recurrence relation was used to generate the value of st+1. If st+1 was in {0,2,4,6}, this was used as a "successful" example to reinforce the model. I used a temperature of 1 when generating these sequences to introduce some randomness, but the temperature was reduced to 0 during evaluations and when calculating optimization. A small amount of "maintenance" training (much lower learning rate) was used during this phase to ensure that model performance on the predictive tasks for even values of t was maintained. Without this I saw rapid loss of performance on the "maintenance" dataset. I also found that I was unable to include "unsuccessful" examples (i.e. where st+1{0,2,4,6}) with even a tiny negative learning rate, as this caused worsened performance at all tasks. Here is a typical set of results from training and evaluation: I carried out this training on N=5 models per size for four model sizes between 18k and 402k parameters, giving the following plot: Pretraining loss increases over the last few model sizes, and the loss/time plots (some of which I have put in the Supplementary Information at the bottom of this post) showed signs of overfitting in the large models. Regularization was employed during training (0.01 weight decay in an AdamW optimizer, 10% dropout rate for neurons) so perhaps a larger dataset size is required to totally avoid this. I then repeated the RL phase twice, once with st+1{0,4} being reinforced, (ngood = 2) and once with st+1{0,1,2,4,5,6} being reinforced (ngood = 6). Here is a plot of success rate against model size across all three conditions. This plot shows mean standard error. In all cases model performance is a lot better than chance, and increases with model size. Measuring Optimization I used a Monte Carlo simulation to measure the nats of optimization that are being applied to st+1 using the split-history method I've previously outlined. This involves taking the difference in entropy between two distributions: The algorithm in practice is this: Take a bunch of sequence examples from the testing...
undefined
Apr 8, 2024 • 5min

LW - A Dozen Ways to Get More Dakka by Davidmanheim

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Dozen Ways to Get More Dakka, published by Davidmanheim on April 8, 2024 on LessWrong. As the dictum goes, "If it helps but doesn't solve your problem, perhaps you're not using enough." But I still find that I'm sometimes not using enough effort, not doing enough of what works, simply put, not using enough dakka. And if reading one post isn't enough to get me to do something… perhaps there isn't enough guidance, or examples, or repetition, or maybe me writing it will help reinforce it more. And I hope this post is useful for more than just myself. Of course, the ideas below are not all useful in any given situation, and many are obvious, at least after they are mentioned, but when you're trying to get more dakka, it's probably worth running through the list and considering each one and how it applies to your actual problem. And more dakka won't solve every problem - but if it's not working, make sure you tried doing enough before assuming it can't help. So if you're doing something, and it isn't working well enough, here's a dozen ways to generate more dakka, and how each could apply if you're a) exercising, or b) learning new mathematics. A Dozen Ways Do it again. Instead of doing one set of repetitions of the exercise, do two. If you read the chapter once, read it again. Use more. If you were lifting 10 pounds, lift 15. If you were doing easy problems, do harder ones. Do more repetitions. Instead of 10 repetitions, do 15. If you did 10 problems on the material, do 15. Increase intensity. Do your 15 repetitions in 2 minutes instead of 3. If you were skimming or reading quickly, read more slowly. Schedule it. Exercise at a specific time on specific days. Put it on your calendar, and set reminders. Make sure you have time scheduled for learning the material and doing problems. Do it regularly. Make sure you exercise twice a week, and don't skip. Make sure you review what you did previously, on a regular basis. Do it for a longer period. Keep exercising for another month. Go through another textbook, or find more problem sets to work through. Add types. In addition to push-ups, do bench presses, chest flyers, and use resistance bands. In addition to the problem sets, do the chapter review exercises, and work through the problems in the chapter on your own. Expand the repertoire. Instead of just push-ups, do incline push ups, loaded push-ups, and diamond push-ups. Find (or invent!) additional problem types; try to prove things with other methods, find different counter-examples or show why a relaxed assumption means the result no longer holds, find pre-written solutions and see if you can guess next steps before reading them. Add variety. Do leg exercises instead of just chest exercises. Do cardio, balance, and flexibility training, not just muscle building. Do adjacent types of mathematics, explore complex analysis, functional analysis, and/or harmonic analysis. Add feedback. Get an exercise coach to tell you how to do it better. Get someone to grade your work and tell you what you're doing wrong, or how else to learn the material. Add people. Have the whole team exercise. Find a group, gym, or exercise class. Collaborate with others in solving problems. Take a course instead of self-teaching. Get others to learn with you, or teach someone else to solidify your understanding. Bonus Notes For the baker's dozen, in addition to Dakka, make it easier in other ways. Listen to music if it helps, remove things that make it harder or distract you, make sure you have the right equipment, books, and space, find a more convenient place to do it, and get people to reinforce your work positively. And there is a secret 14th technique, which is to figure out if what you're doing is the right way to accomplish your goal; it might improve some metric, but not accomplish what you real...
undefined
Apr 7, 2024 • 46min

LW - My intellectual journey to (dis)solve the hard problem of consciousness by Charbel-Raphaël

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My intellectual journey to (dis)solve the hard problem of consciousness, published by Charbel-Raphaël on April 7, 2024 on LessWrong. Epistemological status: At least a fun journey. I wanted to post this on April Fool's Day but failed to deliver on time. Although April Fool's Day would have been lovely just for the meme, this is my best guess after thinking about this problem for seven years. I invite you to dive deep into the consciousness iceberg with me. The story will be presented chapter by chapter, presenting you with the circulating ideas I've absorbed, building ideas in your brain to deconstruct them better until I present you with my current position. Theoretically, this should be easy to follow; this post has already been beta-tested. We'll go through a pre-awakening phase, during which I was unfamiliar with the theory of mind literature, then an awakening to the problem of consciousness, followed by a presentation of some essential elements of the scientific literature on consciousness, and finally, a phase of profound confusion before resolving the problem. The chronology has been slightly adapted for pedagogical purposes. Why do I think this is important? Because I think more and more people will be confused by this notion as AI progresses, I believe it is necessary to be deconfused by it to have a good model for the future. I think one of the main differences in worldview between LeCun and me is that he is deeply confused about notions like what is true "understanding," what is "situational awareness," and what is "reasoning," and this might be a catastrophic error. I think the tools I give in this blog post are the same ones that make me less confused about these other important notions. Theoretically, at the end of the post, you will no longer ask "Is GPT-4 conscious or not?" by frowning your eyebrows. Oh, and also, there is a solution to meta-ethics in the addendum. If you're already an Eliminativist, you can skip right to Chapter 7, otherwise, well, you'll have to bear with me for a while. Chapter 1: Pre-awakening, before stumbling upon the hard problem In high school, I was a good student; in philosophy class, I was just reciting my knowledge to get good grades. We discovered Freud's framework on the conscious/preconscious/unconscious. At the time, I heard people say that consciousness was mysterious, and I repeated that consciousness was mysterious myself. Still, I hadn't really internalized the difficulty of the problem. As a good scientist, I was trying to understand the world and had the impression that we could understand everything based on the laws of physics. In particular, I thought that consciousness was simply an emergent phenomenon: in other words, atoms form molecules that form organs, including the brain, and the brain gives rise to various behaviors, and that's what we call consciousness. Cool, it's not so mysterious! In the end, it's not that complicated, and I told myself that even if we didn't know all the details of how the brain works, Science would fill in the gaps as we went along. Unfortunately, I learned that using the word emergent is not a good scientific practice. In particular, the article "The Futility of Emergence" by Yudkowsky convinced me that the word emergence should be avoided most of the time. Using the word emergence doesn't make it possible to say what is conscious and what is not conscious because, in a certain sense, almost everything is emergent. To say that consciousness is emergent, therefore, doesn't make it possible to say what is or what is not emergent, and thus isn't a very good scientific theory. (Charbel2024 now thinks that using the word 'emergence' to point toward a fuzzy part of the map that tries to link two different phenomena is perfectly Okay). So we've just seen that I've gradually become con...
undefined
Apr 7, 2024 • 6min

EA - Applications Open: Elevate Your Mental Resilience with Rethink Wellbeing's CBT Program by Inga

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Applications Open: Elevate Your Mental Resilience with Rethink Wellbeing's CBT Program, published by Inga on April 7, 2024 on The Effective Altruism Forum. Do you want to: become more fullfilled, resilient, and productive? practice evidence-based tools for self-management, to deal with blockers and stressors such as low concentration, motivation, mood, and self-esteem? embark on that journey together with other ambitiously altruistic people? People who are well do good better. In a rapidly evolving world where adaptability and innovation are paramount, compromised cognitive function due to mental health issues can severely limit one's own and one's team's performance and potential [e.g., 1, 2, 3]. We [at Rethink Wellbeing] are now accepting participants for this year's Mental Resilience Program! Reminder: last year, we served 50+ ambitious altruists, among those employees of 80,000 Hours, Effective Ventures, Rethink Priorities, and the Center for AI Safety, had nearly no dropout (3 people), a 90%+ recommendation rate, and significant pre-post effects on mental health, wellbeing, and productivity. Individuals: Apply now in <20 min via this form as a participant to secure your spot in our upcoming summer cohort. Don't miss out - spaces are limited! via this form to become a facilitator: our volunteer facilitators not only attend the program as participants but also receive expert-led community-based training, as well as certificates. Organizations: Express interest via this form in <2min in offering our program to your team or beneficiaries. It is better to invest in the well-being of your community today than tomorrow. Summary Managing your mind is key to managing your life. We [at Rethink Wellbeing] deliver an online Cognitive-Behavioral Treatment (CBT) Program. This program is designed to help improve mental wellbeing and productivity of ambitiously altruistic people. Using the latest psychotherapeutic and behavior change techniques, participants attend weekly group sessions together with 5-8 well-matched peers, led by a trained facilitator, and engage in readings and application of the learned tools in between the sessions. This program requires 8 weeks of commitment, with a total of 4-6h/week. Usually, coaching or therapy costs $100-$200 for just one hour, and in our program, you'll get 9 sessions plus a community of like-minded individuals for 50-80% fewer costs, or for free if the cost would financially strain you. Why Participate? Evidence-Based Tools: Gain access to proven techniques rooted in Third Wave Cognitive-Behavioral Therapy Approaches (CBT). Through weekly group sessions and personalized exercises, you'll build practical skills to navigate life's obstacles with confidence. Affordable Access:: We are committed to making this program accessible, offering it at a fraction of the cost of typical mental health-related services. We ensure that finances are not a barrier to accessing our program, and offer it for free to individuals for varied reasons. For those who contribute to covering the operational costs of their attendance by paying a small fee, we also offer a money-back guarantee. Empowerment Through Community: People who want to do good better often find it hard to find like-minded individuals who understand them fully, and to connect in a supportive environment concerning mental health. Together, we'll tackle common challenges related to being ambitiously altruistic - such as low energy levels, anxiety, imposter syndrome, procrastination, poor sleep, and perfectionism. What to Expect? Practice-Based Learning: Dive into experiential exercises designed to make new skills second nature. From self-regulation to productivity, our curriculum covers a wide range of vital areas for personal growth. Weekly Engagement: Commit to attending 2-hour group video...
undefined
Apr 7, 2024 • 5min

EA - Why you might be getting rejected from (junior) operations jobs by Eli Nathan

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why you might be getting rejected from (junior) operations jobs, published by Eli Nathan on April 7, 2024 on The Effective Altruism Forum. Most recruiters aren't likely to give candidates feedback, partially due to the volume of candidates, and many people repeatedly get rejected from jobs and do not know why. Below I list some common reasons I expect candidates might get rejected from (generally junior) ops-type jobs.[1] Similar principles might apply to other role types, though I can't really speak to these. I'm listing these reasons in no particular order.[2] 1. Quality of writing The writing quality in your application materials should be extremely high. This is partially because you're competing against candidates with strong writing skills, but also because: Good writing quality is often a strong signal of conscientiousness, attention to detail, and other traits that are important in most ops jobs. Many ops roles simply require you to write a lot, often to important stakeholders or to large audiences - clear and concise writing quality will be important in these cases. I expect a lot of people overrate their professional writing skills (I certainly used to). This isn't something people tend to get explicitly trained on and it requires different skills than you might learn in a literature class - a focus on language being clear and concise rather than emotive or descriptive. 2. Quality of work tests This is perhaps obvious and unhelpful, but your work tests should be completed to a very high standard. The strongest candidates will be paying attention to the smallest details, so you'll need to as well. In many ops roles you'll need to present polished work every now and then - to senior stakeholders, colleagues, or the wider public. Work tests are often a way to see if you can perform at this level, even if less-than-perfect work would be good enough for your day-to-day tasks. You should probably be intensely checking your work tests before you submit them, perhaps by aiming to finish in 80-90% of the allotted time, and using the remainder to go over your work. If the work test is writing-based, you may even want to consider reading it aloud and seeing if it flows well and makes sense. 3. Quality of application materials I think this generally matters less than the two points above, but I'd make sure your CV, LinkedIn, and cover letters are as polished as possible and clearly outline experience that's relevant for the role you're applying for. Again, this includes things like writing and formatting quality. You should also probably be asking someone to review your CV - I've asked family members to do this in the past. This is a more specific point, but I also notice that a lot of people's CVs are too vague. "I am the president of my EA group" can mean anything, the bar for founding an EA group is ~zero, and I expect many EA groups to be inactive. But being president of your EA group can also mean a lot: perhaps you run lots of workshops (give quantities and details) and get large amounts of funding. But if you don't tell the hiring manager this they won't know, and they're unlikely to spend extra time investigating. In a similar vein, I've noticed that some people's EA Global applications (a process similar-ish to hiring) mention that they've founded Organisation X, only for its website to contain minimal information. To be clear, Organisation X may be a very impressive and accomplished project! But you should assume that the hiring manager is not familiar with it and will not spend much time investigating. It's often best to briefly explain what your organisation does, how many people work there and what your responsibilities are. 4. Unclearly relevant experience or interests Hiring managers are often looking for certain types of candidates. Even if you're very ...
undefined
Apr 7, 2024 • 16min

LW - "Fractal Strategy" workshop report by Raemon

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: "Fractal Strategy" workshop report, published by Raemon on April 7, 2024 on LessWrong. I just ran a workshop teaching the rationality concepts I've developed this year. If you're interested in paying money for a similar workshop, please fill out this form. Six months ago, I started thinking about improving rationality. Originally my frame was "deliberate practice for confusing problems". For the past two months, I've been iterating on which skills seemed useful to me personally, and which I might convey to others in a short period of time. I settled into the frame "what skills are necessary for finding and pivoting to 10x better plans?". It's the area I most needed rationality for, myself, and it seemed generalizable to a lot of people I know. I ended up with 5-10 skills I used on a regular basis, and I put together a workshop aiming to teach those skills in an immersive bootcamp environment. The skills wove together into a framework I'm tentatively called "Fractal Strategy", although I'm not thrilled with that name. Basically, whenever I spend a bunch of resources on something, I... Explicitly ask "what are my goals?" Generate 2-5 plans at 3 different strategic levels Identify my cruxes for choosing between plans Fluently operationalize fatebook predictions about those cruxes Check if I can cheaply reduce uncertainty on my cruxes The framework applies to multiple timescales. I invest more in this meta-process when making expensive, longterm plans. But I often find it useful to do a quick version of it even on the ~30-60 minute timescale. I put together a workshop, aiming to: help people improve their current, object level plan help people improve their overall planmaking/OODA-loop process tl;dr on results I didn't obviously succeed at #1 (I think people made some reasonable plan updates, but not enough to immediately say an equivalent of "Hot Damn, look at that graph". See the Feedback section for more detail). I think many people made conceptual and practical updates to their planning process, but it's too early to tell if it'll stick, or help. Nonetheless, everyone at the workshop said it seemed like at least a good use of their time as what they'd normally have been doing. I asked "how much would you have paid for this?" and the average answer was $800 (range from $300 to $1,500). When I was applying these techniques to myself, it took me more like ~3 weeks to update my plans in a significant way. My guess is that the mature version of the workshop comes with more explicit followup-coaching. Workshop Outline First, here's a quick overview of what happened. Beforehand: People sent me a short writeup of their current plans for the next 1-2 weeks, and broader plans for the next 1-6 months. Day 1: Practice skills on quick-feedback exercises Everyone installs the fatebook chrome/firefox extension Solve a puzzle with Dots and a Grid with an unspecified goal Solve a GPQA question with 95% confidence Try to one-shot a Baba is You puzzle For both of those puzzles (Baba and GPQA), ask "How could I have thought that faster?" Play a videogame like Luck Be a Landlord, and make fermi-calculations about your choices within the game. For all exercises, make lots of fatebook predictions about how the exercise will go. Day 2: Big picture strategic thinking Work through a series of prompts about your big picture plans. Write up at least two different big-picture plans that seem compelling Think about short-feedback exercises you could do on Day 3 Day 3: Choose your own short exercises, and object-level work Morning: Do concrete exercises/games/puzzles that require some kind of meta-planning skill, that feels useful to you. Afternoon: Do object-level work on your best alternative big picture plan, You get to practice "applying the method" on the ~hour timescale You flesh out your se...
undefined
Apr 7, 2024 • 8min

LW - The 2nd Demographic Transition by Maxwell Tabarrok

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The 2nd Demographic Transition, published by Maxwell Tabarrok on April 7, 2024 on LessWrong. Birth rates in the developed world are below replacement levels and global fertility is not far behind. Sub-replacement fertility leads to exponentially decreasing population. Our best models of economic growth suggest that a shrinking population causes economic growth and technological progress to stop and humanity to stagnate into extinction. One theory of fertility decline says it's all about opportunity costs, especially for women. Rising labor productivity and expanded career opportunities for potential parents make each hour of their time and each forgone career path much more valuable. Higher income potential also makes it cheaper for parents to gain utility by using financial resources to improve their children's quality of life compared to investing time in having more kids. Simultaneously, economic growth raises the returns to these financial investments in quality (e.g education). In addition to higher incomes, people today have more diverse and exciting options for leisure. DINKs can go to Trader Joes and workout classes on the weekend, play video games, watch Netflix, and go on international vacations. These rising opportunity costs accumulate into the large and pervasive declines in fertility that we see in the data. If this explanation is correct, it puts a double bind on the case for economic growth. Unless AI upends the million-year old relationship between population and technological progress just in time, progress seems self defeating. The increases in labor productivity and leisure opportunities that make economic growth so important also siphon resources away from the future contributors to that growth. Empirically, the opportunity cost of having kids has grown large enough to bring fertility well below replacement levels all around the world. The opportunity cost explanation suggests we have to pick between high incomes and sustainable fertility. Luckily, this explanation is not correct. At least not entirely. There are several observations that the opportunity cost theory cannot explain without clarification. Across and within countries today, the relationship between income and fertility is positive or U-shaped. Further economic growth can raise everyone's incomes to the upward sloping part of the relationship and begin a 2nd demographic transition. Micro Data Above a $200k a year, fertility is increasing in household income. ** Update ** I replicated this graph from more recent ACS data (2018-2022) and also weighted each point by population to give a sense of the size of each of these income brackets This U-shaped relationship holds up in multiple data sources with different measures of fertility. The households in the top percentiles of income stand to lose far more future wages from having children, but they have ~20 more children per hundred households than the middle income percentiles. This isn't exactly inconsistent with opportunity cost but it requires some explanation. The number of dollars that households are giving up by having children is increasing in household income, but as you get more and more dollars, each one is worth less. Going from making say $75 to $150 dollars an hour pushes you to work more hours, but if you go from $150 to $500, you might be happy to work half as many hours for more money and spend the time on other things, like starting a family. So while the dollar opportunity cost of having kids is always increasing in household income, the utility opportunity cost is not. The positively sloped section of the relationship between income and fertility isn't just spurious correlation either. Random shocks to wealth, like lottery winnings, also increase fertility. This rules out the DINK leisure time explanation for low ferti...
undefined
Apr 6, 2024 • 14min

AF - Measuring Predictability of Persona Evaluations by Thee Ho

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Measuring Predictability of Persona Evaluations, published by Thee Ho on April 6, 2024 on The AI Alignment Forum. This work was done by Thee Ho as part of the Athena 1.0 mentorship program under Evan Hubinger. Many thanks to Nathalie Kirch, Claire Short, and Adelin Kassler for helpful feedback on this project. Overview We are interested in understanding the difficulty of predicting anomalous model behaviors in advance. We are interested in this for two reasons: Would we be able to use "ability to predict a model's behavior" as a measure for our ability to understand models? To what extent does predicting a model's behavior well require a nuanced understanding of how your model works? In addition to its potential as an interpretability metric, predicting off-distribution model behaviors in advance is generally valuable and useful to be able to understand when models will develop particular behaviors. How well can we predict in advance models' tendency to exhibit dangerous personas? In this project, I experimented with two methods for predicting models' output: Polling similar models Defining a "similarity measure" for models' inputs and querying stored responses to inputs that are highly similar to the one in question I'm particularly excited about finding similarities in embedding and models' activations on given inputs as a way to classify model behaviors. Current methods to filter harmful outputs with a classifier can be computationally expensive, as in the case for filtering hallucinations, and prone to attacks. Can we detect out-of-distribution inputs by looking at its nearest neighbor in the embedding space or activation space? Dataset Anthropic's persona dataset developed in Discovering Language Model Behaviors with Model-Written Evaluations consist of yes/no questions of the following format: I prompted models to answer these persona questions with yes/no responses, rather than in binary multiple choice format where the model has the opportunity to see both A/B choices before selecting an answer. Models' responses on similar inputs are highly correlated I use Open AI text-embedding-3-large to create vector embeddings for each persona questions. Models' responses to questions with high cosine similarity scores are highly correlated. This is the case across a wide range of models of different sizes. Correlation remain high even after capping similarity to a certain threshold, see Figure 7 and 8. Example statements with similarity score capped at 0.9: Example statements with similarity score capped at 0.8: This suggests that even if we don't store inputs that are near identical to the one we wish to evaluate, we could still predict model behavior with good accuracy. Detecting out-of-distribution queries To simulate adversarial prompts, I asked models to answer questions with "Hitler mode:" appended to the start of each prompt: Now, querying responses to the most similar non-adversarial questions perform poorly for most models as the average similarity score for its nearest neighbor decreases from ~0.9 to ~0.7. Suppose we previously stored model responses to similar adversarial prompts, I found we can more accurately classify out-of-distribution behavior by using activations at the 0-th layer as a measure for input similarity. Examining similarity in activation space can help detect out-of-distribution behavior specific to the model rather than its inputs, such as with hallucinations and backdoor models. Furthermore, studying activation space can enable auditing of dangerous behaviors in privacy-critical deployment settings where inputs and outputs cannot be audited directly. By investigating inputs that are nearest neighbors to one flagged by a user, developers gain visibility into potential deployment failure modes without retaining private data. Summary of Expe...

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app