LessWrong (Curated & Popular)

LessWrong
Mar 7, 2024 • 40min

Tips for Empirical Alignment Research

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. TLDR: I’ve collected some tips for research that I’ve given to other people and/or used myself, which have sped things up and helped put people in the right general mindset for empirical AI alignment research. Some of these are opinionated takes, also around what has helped me. Researchers can be successful in different ways, but I still stand by the tips here as a reasonable default. What success generally looks like: Here, I’ve included specific criteria that strong collaborators of mine tend to meet, with rough weightings on the importance, as a rough north star for people who collaborate with me (especially if you’re new to research). These criteria are for the specific kind of research I do (highly experimental LLM alignment research, excluding interpretability); some examples of research areas where this applies are e.g. scalable oversight [...] --- First published: February 29th, 2024 Source: https://www.lesswrong.com/posts/dZFpEdKyb9Bf4xYn7/tips-for-empirical-alignment-research --- Narrated by TYPE III AUDIO.
Feb 29, 2024 • 12min

Timaeus’s First Four Months

Timaeus was announced in late October 2023, with the mission of making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. This is our first progress update. In service of the mission, our first priority has been to support and contribute to ongoing work in Singular Learning Theory (SLT) and developmental interpretability, with the aim of laying theoretical and empirical foundations for a science of deep learning and neural network interpretability. Our main uncertainties in this research were: Is SLT useful in deep learning? While SLT is mathematically established, it was not clear whether the central quantities of SLT could be estimated at sufficient scale, and whether SLT's predictions actually held for realistic models (esp. language models). Does structure in neural networks form in phase transitions? The idea of developmental interpretability was to view phase transitions as a core primitive in the [...] The original text contained 1 footnote which was omitted from this narration. --- First published: February 28th, 2024 Source: https://www.lesswrong.com/posts/Quht2AY6A5KNeZFEA/timaeus-s-first-four-months --- Narrated by TYPE III AUDIO.
Feb 23, 2024 • 8min

Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”

This is a linkpost for https://bayesshammai.substack.com/p/contra-ngo-et-al-every-every-bay With thanks to Scott Alexander for the inspiration, Jeffrey Ladish, Philip Parker, Avital Morris, and Drake Thomas for masterful cohosting, and Richard Ngo for his investigative journalism. Last summer, I threw an Every Bay Area House Party themed party. I don’t live in the Bay, but I was there for a construction-work-slash-webforum-moderation-and-UI-design-slash-grantmaking gig, so I took the opportunity to impose myself on the ever generous Jeffrey Ladish and host a party in his home. Fortunately, the inside of his house is already optimized to look like a parody of a Bay Area house party house, so not much extra decorating was needed, but when has that ever stopped me? Attendees could look through the window for an outside view. Richard Ngo recently covered the event, with only very minor embellishments. I’ve heard rumors that some people are doubting whether the party described truly happened, so [...] --- First published: February 22nd, 2024 Source: https://www.lesswrong.com/posts/mmYFF4dyi8Kg6pWGC/contra-ngo-et-al-every-every-bay-area-house-party-bay-area Linkpost URL: https://bayesshammai.substack.com/p/contra-ngo-et-al-every-every-bay --- Narrated by TYPE III AUDIO.
Feb 20, 2024 • 25min

[HUMAN VOICE] "Updatelessness doesn't solve most problems" by Martín Soto

The podcast explores the complexities of updatelessness in decision-making, highlighting limitations and strategic implications. It delves into decision strategies in uncertain environments, cooperation with counterfactual selves, and the trade-offs in decision-making under uncertainty. The discussions cover scenarios like counterfactual mugging and the game of chicken, emphasizing the challenges of committing to a strategy without being exploited. It also explores the impact of updateful and updateless agents in diverse AI training scenarios, highlighting the complexities of super-intelligent interactions.
Feb 20, 2024 • 22min

[HUMAN VOICE] "And All the Shoggoths Merely Players" by Zack_M_Davis

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/8yCXeafJo67tYe5L4/and-all-the-shoggoths-merely-players Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
Feb 19, 2024 • 7min

Every “Every Bay Area House Party” Bay Area House Party

Inspired by a house party inspired by Scott Alexander. By the time you arrive in Berkeley, the party is already in full swing. You’ve come late because your reading of the polycule graph indicated that the first half would be inauspicious. But now you’ve finally made it to the social event of the season: the Every Bay Area House Party-themed house party. The first order of the evening is to get a color-coded flirting wristband, so that you don’t incur any accidental micromarriages. You scan the menu of options near the door. There's the wristband for people who aren’t interested in flirting; the wristband for those who want to be flirted with, but will never flirt back; the wristband for those who only want to flirt with people who have different-colored wristbands; and of course the one for people who want to glomarize disclosure of their flirting preferences. Finally you [...] --- First published: February 16th, 2024 Source: https://www.lesswrong.com/posts/g5q4JiG5dzafkdyEN/every-every-bay-area-house-party-bay-area-house-party --- Narrated by TYPE III AUDIO.
Feb 19, 2024 • 1h 53min

2023 Survey Results

The Data. 0. Population: There were 558 responses over 32 days. The spacing and timing of the responses had hills and valleys because of an experiment I was performing where I'd get the survey advertised in a different place, then watch how many new responses happened in the day or two after that. Previous surveys have been run over the last decade or so. 2009: 166; 2011: 1090; 2012: 1195; 2013: 1636; 2014: 1503; 2016: 3083; 2017: "About 300"; 2020: 61; 2022: 186; 2023: 558. Last year when I got a hundred and eighty six responses, I said that the cheerfully optimistic interpretation was "cool! I got about as many as Scott did on his first try!" This time I got around half of what Scott did on his second try. A thousand responses feels pretty firmly achievable. This is also the tenth such [...] --- First published: February 16th, 2024 Source: https://www.lesswrong.com/posts/WRaq4SzxhunLoFKCs/2023-survey-results --- Narrated by TYPE III AUDIO.
Feb 18, 2024 • 8min

Raising children on the eve of AI

Cross-posted with light edits from Otherwise. I think of us in some kind of twilight world as transformative AI looks more likely: things are about to change, and I don’t know if it's about to get a lot darker or a lot brighter. Increasingly this makes me wonder how I should be raising my kids differently. What might the world look like: Most of my imaginings about my children's lives have them in pretty normal futures, where they go to college and have jobs and do normal human stuff, but with better phones. It's hard for me to imagine the other versions: a lot of us are killed or incapacitated by AI; more war, pandemics, and general chaos; post-scarcity utopia, possibly with people living as uploads; some other weird outcome I haven’t imagined. Even in the world where change is slower, more like the speed [...] --- First published: February 15th, 2024 Source: https://www.lesswrong.com/posts/cyqrvE3dk5apg54Sk/raising-children-on-the-eve-of-ai --- Narrated by TYPE III AUDIO.
Feb 18, 2024 • 15min

“No-one in my org puts money in their pension”

This is a linkpost for https://seekingtobejolly.substack.com/p/no-one-in-my-org-puts-money-in-their Epistemic status: the stories here are all as true as possible from memory, but my memory is so so. An AI made this. This is going to be big: It's late Summer 2017. I am on a walk in the Mendip Hills. It's warm and sunny and the air feels fresh. With me are around 20 other people from the Effective Altruism London community. We’ve travelled west for a retreat to discuss how to help others more effectively with our donations and careers. As we cross cow field after cow field, I get talking to one of the people from the group I don’t know yet. He seems smart, and cheerful. He tells me that he is an AI researcher at Google DeepMind. He explains how he is thinking about how to make sure that any powerful AI system actually does what we want it [...] --- First published: February 16th, 2024 Source: https://www.lesswrong.com/posts/dLXdCjxbJMGtDBWTH/no-one-in-my-org-puts-money-in-their-pension Linkpost URL: https://seekingtobejolly.substack.com/p/no-one-in-my-org-puts-money-in-their --- Narrated by TYPE III AUDIO.
Feb 16, 2024 • 8min

Masterpiece

This is a linkpost for https://www.narrativeark.xyz/p/masterpiece A sequel to qntm's Lena. Reading Lena first is helpful but not necessary. We’re excited to announce the fourth annual MMindscaping competition! Over the last few years, interest in the art of mindscaping has continued to grow rapidly. We expect this year's competition to be our biggest yet, and we’ve expanded the prize pool to match. The theme for the competition is “Weird and Wonderful”—we want your wackiest ideas and most off-the-wall creations! Competition rules: As in previous competitions, the starting point is a base MMAcevedo mind upload. All entries must consist of a single modified version of MMAcevedo, along with a written or recorded description of the sequence of transformations or edits which produced it. For more guidance on which mind-editing techniques can be used, see the Technique section below. Your entry must have been created in the last 12 months, and cannot [...] --- First published: February 13th, 2024 Source: https://www.lesswrong.com/posts/Fruv7Mmk3X5EekbgB/masterpiece Linkpost URL: https://www.narrativeark.xyz/p/masterpiece --- Narrated by TYPE III AUDIO.