LessWrong (Curated & Popular)

Aug 9, 2023 • 17min

"When can we trust model evaluations?" bu evhub

In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to require mechanistic, understanding-based evaluations rather than solely behavioral evaluations. However, I also mentioned in a footnote why I thought behavioral evaluations would likely be fine in the case of evaluating capabilities rather than evaluating alignment:However, while I like the sorts of behavioral evaluations discussed in the GPT-4 System Card (e.g. ARC's autonomous replication evaluation) as a way of assessing model capabilities, I have a pretty fundamental concern with these sorts of techniques as a mechanism for eventually assessing alignment.That's because while I think it would be quite tricky for a deceptively aligned AI to sandbag its capabilities when explicitly fine-tuned on some capabilities task (that probably requires pretty advanced gradient hacking), it should be quite easy for such a model to pretend to look aligned.In this post, I want to try to expand a bit on this point and explain exactly what assumptions I think are necessary for various different evaluations to be reliable and trustworthy. For that purpose, I'm going to talk about four different categories of evaluations and what assumptions I think are needed to make each one go through.Source:https://www.lesswrong.com/posts/dBmfb76zx6wjPsBC7/when-can-we-trust-model-evaluationsNarrated for LessWrong by TYPE III AUDIO.Share feedback on this narration.[Curated Post] ✓
Aug 9, 2023 • 36min

"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez

This episode discusses the importance of researching model organisms of misalignment to understand the causes of alignment failures in AI systems. It explores different strategies for model training and deployment, such as input tagging and evaluating outputs with a preference model. The risks associated with using model organisms in research, including deceptive alignment, are also covered.
Aug 4, 2023 • 7min

"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long

AI expert Adam David Long discusses the three-sided debate about AI, involving AI pragmatists, doomers, and boosters. He argues that this framework clarifies the public debate and helps policymakers navigate the complexities of AI's impact on society.
Aug 4, 2023 • 8min

"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes

Explore ARC Evals' report on evaluating language-model agents' abilities in acquiring resources, replicating, and adapting to challenges. Learn about the impact of fine-tuning on GPT-4's performance in autonomous tasks, emphasizing the need for continuous improvement and scaffolding to enhance autonomous replication and adaptation (ARA) capabilities.
Aug 4, 2023 • 10min

"My current LK99 questions" by Eliezer Yudkowsky

So this morning I thought to myself, "Okay, now I will actually try to study the LK99 question, instead of betting based on nontechnical priors and market sentiment reckoning." (My initial entry into the affray, having been driven by people online presenting as confidently YES when the prediction markets were not confidently YES.) And then I thought to myself, "This LK99 issue seems complicated enough that it'd be worth doing an actual Bayesian calculation on it"--a rare thought; I don't think I've done an actual explicit numerical Bayesian update in at least a year.

In the process of trying to set up an explicit calculation, I realized I felt very unsure about some critically important quantities, to the point where it no longer seemed worth trying to do the calculation with numbers. This is the System Working As Intended.

Source: https://www.lesswrong.com/posts/EzSH9698DhBsXAcYY/my-current-lk99-questions

Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration.

[125+ Karma Post] ✓ [Curated Post] ✓
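The "explicit numerical Bayesian update" the post refers to can be written, in odds form, as a prior times one likelihood ratio per piece of evidence. This is a generic sketch, not the post's actual calculation, and the product form assumes the pieces of evidence are conditionally independent given the hypothesis:

\[
\frac{P(H \mid E_1, \dots, E_n)}{P(\neg H \mid E_1, \dots, E_n)}
= \frac{P(H)}{P(\neg H)} \times \prod_{i=1}^{n} \frac{P(E_i \mid H)}{P(E_i \mid \neg H)}
\]

Each factor \(P(E_i \mid H)/P(E_i \mid \neg H)\) is exactly the kind of "critically important quantity" that, if too uncertain to estimate, makes the numerical exercise not worth finishing.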
Aug 2, 2023 • 20min

"Thoughts on sharing information about language model capabilities" by paulfchristiano

I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduces risks from powerful AI—despite the fact that such information may increase the amount or quality of investment in ML generally (or in LM agents in particular).

Concretely, I mean to include information like: tasks and evaluation frameworks for LM agents, the results of evaluations of particular agents, discussions of the qualitative strengths and weaknesses of agents, and information about agent design that may represent small improvements over the state of the art (insofar as that information is hard to decouple from evaluation results).

Source: https://www.lesswrong.com/posts/fRSj2W4Fjje8rQWm9/thoughts-on-sharing-information-about-language-model

Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration.

[125+ Karma Post] ✓ [Curated Post] ✓
Jul 31, 2023 • 12min

"Yes, It's Subjective, But Why All The Crabs?" by johnswentworth

Some early biologist, equipped with knowledge of evolution but not much else, might see all these crabs and expect a common ancestral lineage. That’s the obvious explanation of the similarity, after all: if the crabs descended from a common ancestor, then of course we’d expect them to be pretty similar.

… but then our hypothetical biologist might start to notice surprisingly deep differences between all these crabs. The smoking gun, of course, would come with genetic sequencing: if the crabs’ physiological similarity is achieved by totally different genetic means, or if functionally-irrelevant mutations differ across crab-species by more than mutational noise would induce over the hypothesized evolutionary timescale, then we’d have to conclude that the crabs had different lineages. (In fact, historically, people apparently figured out that crabs have different lineages long before sequencing came along.)

Now, having accepted that the crabs have very different lineages, the differences are basically explained. If the crabs all descended from very different lineages, then of course we’d expect them to be very different.

… but then our hypothetical biologist returns to the original empirical fact: all these crabs sure are very similar in form. If the crabs all descended from totally different lineages, then the convergent form is a huge empirical surprise! The differences between the crabs have ceased to be an interesting puzzle - they’re explained - but now the similarities are the interesting puzzle. What caused the convergence?

To summarize: if we imagine that the crabs are all closely related, then any deep differences are a surprising empirical fact, and are the main remaining thing our model needs to explain. But once we accept that the crabs are not closely related, then any convergence/similarity is a surprising empirical fact, and is the main remaining thing our model needs to explain.

Source: https://www.lesswrong.com/posts/qsRvpEwmgDBNwPHyP/yes-it-s-subjective-but-why-all-the-crabs

Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration.

[125+ Karma Post] ✓
Jul 31, 2023 • 25min

"Cultivating a state of mind where new ideas are born" by Henrik Karlsson

In the early 2010s, a popular idea was to provide coworking spaces and shared living to people who were building startups. That way the founders would have a thriving social scene of peers to percolate ideas with as they figured out how to build and scale a venture. This was attempted thousands of times by different startup incubators. There are no famous success stories.

In 2015, Sam Altman, who was at the time the president of Y Combinator, a startup accelerator that has helped scale startups collectively worth $600 billion, tweeted in reaction that “not [providing coworking spaces] is part of what makes YC work.” Later, in a 2019 interview with Tyler Cowen, Altman was asked to explain why.

Source: https://www.lesswrong.com/posts/R5yL6oZxqJfmqnuje/cultivating-a-state-of-mind-where-new-ideas-are-born

Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration.

[Curated] ✓
Jul 31, 2023 • 8min

"Self-driving car bets" by paulfchristiano

This month I lost a bunch of bets.

Back in early 2016 I bet at even odds that self-driving ride sharing would be available in 10 US cities by July 2023. Then I made similar bets a dozen times because everyone disagreed with me.

Source: https://www.lesswrong.com/posts/ZRrYsZ626KSEgHv8s/self-driving-car-bets

Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration.

[125+ Karma Post] ✓ [Curated Post] ✓
Jul 28, 2023 • 13min

"Brain Efficiency Cannell Prize Contest Award Ceremony" by Alexander Gietelink Oldenziel

Previously, Jacob Cannell wrote the post "Brain Efficiency," which makes several radical claims: that the brain is at the Pareto frontier of speed, energy efficiency, and memory bandwidth, and that this represents a fundamental physical frontier.

Here's an AI-generated summary:

The article “Brain Efficiency: Much More than You Wanted to Know” on LessWrong discusses the efficiency of physical learning machines. The article explains that there are several interconnected key measures of efficiency for physical learning machines: energy efficiency in ops/J, spatial efficiency in ops/mm^2 or ops/mm^3, speed efficiency in time/delay for key learned tasks, circuit/compute efficiency in size and steps for key low-level algorithmic tasks, and learning/data efficiency in samples/observations/bits required to achieve a level of circuit efficiency, or per unit thereof. The article also explains why brain efficiency matters a great deal for AGI timelines and takeoff speeds, as AGI is implicitly/explicitly defined in terms of brain parity. The article predicts that AGI will consume compute & data in predictable brain-like ways and suggests that AGI will be far more like human simulations/emulations than you’d otherwise expect and will require training/education/raising vaguely like humans.

Jake has further argued that this has implications for FOOM and DOOM. Considering the intense technical mastery of nanoelectronics, thermodynamics, and neuroscience required to assess the arguments here, I concluded that a public debate between experts was called for. This was the start of the Brain Efficiency Prize contest, which attracted over 100 in-depth, technically informed comments.

Now for the winners! Please note that the criterion for winning the contest was bringing in novel and substantive technical arguments, as assessed by me. In contrast, general arguments about the likelihood of FOOM or DOOM, while no doubt interesting, did not factor into the judgement.

And the winners of the Jake Cannell Brain Efficiency Prize contest are: Ege Erdil, DaemonicSigil, spxtr... and Steven Byrnes!

Source: https://www.lesswrong.com/posts/fm88c8SvXvemk3BhW/brain-efficiency-cannell-prize-contest-award-ceremony

Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration.

[125+ Karma Post] ✓
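A quick note on the units in that summary (the figures below are purely illustrative, not numbers from the post): energy efficiency in ops/J is just throughput divided by power draw,

\[
\text{ops/J} = \frac{\text{ops/s}}{\text{W}},
\]

so a hypothetical device performing \(10^{14}\) ops/s at \(100\,\text{W}\) would achieve \(10^{12}\) ops/J. Comparisons on this metric are only meaningful when "op" is defined the same way for both systems being compared.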
