
AI Safety Fundamentals: Alignment

Latest episodes

May 13, 2023 • 1h 11min

Biological Anchors: A Trick That Might Or Might Not Work

I've been trying to review and summarize Eliezer Yudkowsky's recent dialogues on AI safety. Previously in sequence: Yudkowsky Contra Ngo On Agents. Now we're up to Yudkowsky contra Cotra on biological anchors, but before we get there we need to figure out what Cotra's talking about and what's going on.

The Open Philanthropy Project ("Open Phil") is a big effective altruist foundation interested in funding AI safety. It's got $20 billion, probably the majority of money in the field, so its decisions matter a lot and it's very invested in getting things right. In 2020, it asked senior researcher Ajeya Cotra to produce a report on when human-level AI would arrive. It says the resulting document is "informal" - but it's 169 pages long and likely to affect millions of dollars in funding, which some might describe as making it kind of formal. The report finds a 10% chance of "transformative AI" by 2031, a 50% chance by 2052, and an almost 80% chance by 2100.

Eliezer rejects their methodology and expects AI earlier (he doesn't offer many numbers, but here he gives Bryan Caplan 50-50 odds on 2030, albeit not totally seriously). He made the case in his own very long essay, Biology-Inspired AGI Timelines: The Trick That Never Works, sparking a bunch of arguments and counterarguments and even more long essays.

Source: https://astralcodexten.substack.com/p/biological-anchors-a-trick-that-might

Crossposted from the Astral Codex Ten podcast.

---

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
May 13, 2023 • 16min

On the Opportunities and Risks of Foundation Models

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

Original article: https://arxiv.org/abs/2108.07258

Authors: Bommasani et al.

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
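To make "adaptable to a wide range of downstream tasks" concrete, here is a minimal sketch of adapting a pretrained foundation model to a toy sentiment-classification task with the Hugging Face transformers library. It is not drawn from the report; the checkpoint name, example data, and hyperparameters are illustrative assumptions only.

```python
# Minimal sketch (not from the paper): adapting a pretrained encoder to a
# downstream task. The checkpoint, data and learning rate are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A small task-specific classification head is added on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labelled examples standing in for a downstream dataset.
texts = ["I loved this film", "Utterly disappointing"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy on the task head
loss.backward()
optimizer.step()
```

In practice this loop would run over a full labelled dataset; the point is only that the same pretrained model can be adapted, with a small head and brief fine-tuning, to many different downstream tasks, which is the "homogenization" the report describes.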
May 13, 2023 • 18min

A Short Introduction to Machine Learning

Despite the current popularity of machine learning, I haven't found any short introductions to it which quite match the way I prefer to introduce people to the field. So here's my own. Compared with other introductions, I've focused less on explaining each concept in detail, and more on explaining how they relate to other important concepts in AI, especially in diagram form. If you're new to machine learning, you shouldn't expect to fully understand most of the concepts explained here just after reading this post - the goal is instead to provide a broad framework which will contextualise more detailed explanations you'll receive from elsewhere. I'm aware that high-level taxonomies can be controversial, and also that it's easy to fall into the illusion of transparency when trying to introduce a field; so suggestions for improvements are very welcome!

The key ideas are contained in a summary diagram (reproduced in the original post). First, some quick clarifications:

- None of the boxes are meant to be comprehensive; we could add more items to any of them. So you should picture each list ending with "and others".
- The distinction between tasks and techniques is not a firm or standard categorisation; it's just the best way I've found so far to lay things out.
- The summary is explicitly from an AI-centric perspective. For example, statistical modeling and optimization are fields in their own right; but for our current purposes we can think of them as machine learning techniques.

Original text: https://www.alignmentforum.org/posts/qE73pqxAZmeACsAdF/a-short-introduction-to-machine-learning

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
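The task/technique distinction the post draws can be illustrated in a few lines of code. The sketch below is mine, not the author's: the task is supervised prediction of y from x, and the technique is optimisation, here plain gradient descent on a squared-error loss; the data and learning rate are made up for illustration.

```python
# Minimal sketch (not from the post): the *task* is supervised prediction from
# (x, y) pairs; the *technique* is optimisation via plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)  # the unknown "true" relationship

w, b = 0.0, 0.0   # parameters of the model y_hat = w*x + b
lr = 0.1          # learning rate
for _ in range(500):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)  # d(mean squared error)/dw
    grad_b = 2 * np.mean(y_hat - y)        # d(mean squared error)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach w ≈ 3.0, b ≈ 0.5
```

The same task could be attacked with other techniques (for example a closed-form least-squares solve), which is exactly the sense in which the post treats optimisation as one machine learning technique among several.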
May 13, 2023 • 19min

Intelligence Explosion: Evidence and Import

It seems unlikely that humans are near the ceiling of possible intelligences, rather than simply being the first such intelligence that happened to evolve. Computers far outperform humans in many narrow niches (e.g. arithmetic, chess, memory size), and there is reason to believe that similar large improvements over human performance are possible for general reasoning, technology design, and other tasks of interest. As occasional AI critic Jack Schwartz (1987) wrote:

"If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man's near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history."

Why might AI "lead swiftly" to machine superintelligence? Below we consider some reasons.

Original article: https://drive.google.com/file/d/1QxMuScnYvyq-XmxYeqBRHKz7cZoOosHr/view

Authors: Luke Muehlhauser, Anna Salamon

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
May 13, 2023 • 42min

Visualizing the Deep Learning Revolution

The podcast explores the rapid progress of AI fueled by deep learning techniques, highlighting advancements in vision, games, language tasks, and science. It discusses the evolution of AI image and text generation, the split in the United Methodist Church, advancements in large language models like ChatGPT, and challenges in AI progress including concerns about AI algorithms for generating chemical weapons.
May 13, 2023 • 13min

Future ML Systems Will Be Qualitatively Different

In 1972, the Nobel prize-winning physicist Philip Anderson wrote the essay "More Is Different". In it, he argues that quantitative changes can lead to qualitatively different and unexpected phenomena. While he focused on physics, one can find many examples of More is Different in other domains as well, including biology, economics, and computer science. Some examples of More is Different include:

- Uranium. With a bit of uranium, nothing special happens; with a large amount of uranium packed densely enough, you get a nuclear reaction.
- DNA. Given only small molecules such as calcium, you can't meaningfully encode useful information; given larger molecules such as DNA, you can encode a genome.
- Water. Individual water molecules aren't wet. Wetness only occurs due to the interaction forces between many water molecules interspersed throughout a fabric (or other material).

Original text: https://bounded-regret.ghost.io/future-ml-systems-will-be-qualitatively-different/

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
May 13, 2023 • 7min

More Is Different for AI

Machine learning is touching increasingly many aspects of our society, and its effect will only continue to grow. Given this, I and many others care about risks from future ML systems and how to mitigate them. When thinking about safety risks from ML, there are two common approaches, which I'll call the Engineering approach and the Philosophy approach:

- The Engineering approach tends to be empirically-driven, drawing experience from existing or past ML systems and looking at issues that either: (1) are already major problems, or (2) are minor problems, but can be expected to get worse in the future. Engineering tends to be bottom-up and tends to be both in touch with and anchored on current state-of-the-art systems.
- The Philosophy approach tends to think more about the limit of very advanced systems. It is willing to entertain thought experiments that would be implausible with current state-of-the-art systems (such as Nick Bostrom's paperclip maximizer) and is open to considering abstractions without knowing many details. It often sounds more "sci-fi like" and more like philosophy than like computer science. It draws some inspiration from current ML systems, but often only in broad strokes.

I'll discuss these approaches mainly in the context of ML safety, but the same distinction applies in other areas. For instance, an Engineering approach to AI + Law might focus on how to regulate self-driving cars, while Philosophy might ask whether using AI in judicial decision-making could undermine liberal democracy.

Original text: https://bounded-regret.ghost.io/more-is-different-for-ai/

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
May 13, 2023 • 15min

Four Background Claims

MIRI's mission is to ensure that the creation of smarter-than-human artificial intelligence has a positive impact. Why is this mission important, and why do we think that there's work we can do today to help ensure any such thing? In this post and my next one, I'll try to answer those questions. This post will lay out what I see as the four most important premises underlying our mission. Related posts include Eliezer Yudkowsky's "Five Theses" and Luke Muehlhauser's "Why MIRI?"; this is my attempt to make explicit the claims that are in the background whenever I assert that our mission is of critical importance.

Claim #1: Humans have a very general ability to solve problems and achieve goals across diverse domains.

We call this ability "intelligence," or "general intelligence." This isn't a formal definition — if we knew exactly what general intelligence was, we'd be better able to program it into a computer — but we do think that there's a real phenomenon of general intelligence that we cannot yet replicate in code.

Alternative view: There is no such thing as general intelligence. Instead, humans have a collection of disparate special-purpose modules. Computers will keep getting better at narrowly defined tasks such as chess or driving, but at no point will they acquire "generality" and become significantly more useful, because there is no generality to acquire.

Source: https://intelligence.org/2015/07/24/four-background-claims/

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
May 13, 2023 • 13min

AGI Safety From First Principles

This report explores the core case for why the development of artificial general intelligence (AGI) might pose an existential threat to humanity. It stems from my dissatisfaction with existing arguments on this topic: early work is less relevant in the context of modern machine learning, while more recent work is scattered and brief. This report aims to fill that gap by providing a detailed investigation into the potential risk from AGI misbehaviour, grounded by our current knowledge of machine learning, and highlighting important uncertainties. It identifies four key premises, evaluates existing arguments about them, and outlines some novel considerations for each.

Source: https://drive.google.com/file/d/1uK7NhdSKprQKZnRjU58X7NLA1auXlWHt/view

Narrated for AI Safety Fundamentals by TYPE III AUDIO.

---

A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.
May 13, 2023 • 34min

The Alignment Problem From a Deep Learning Perspective

Guests Richard Ngo, Lawrence Chan, and Sören Mindermann discuss the dangers of artificial general intelligence pursuing undesirable goals. They explore topics such as reward hacking, situational awareness in policies, internally represented goals in deep learning models, the inner alignment problem, deceptive alignment in AI systems, and the risks of AGIs gaining power. They highlight the need for preventative measures to ensure human control over AGI.
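As a cartoon illustration of the reward hacking the guests discuss, here is a toy sketch (mine, not an example from the paper): an agent that greedily maximises the reward it measures will prefer the action that corrupts the measurement over the action the designers actually wanted, once the proxy and the true goal come apart. The action names and numbers are made up.

```python
# Toy sketch (not from the paper) of reward hacking: the agent optimises its
# *measured* reward, which diverges from the designer's true objective.
actions = {
    # action name            (measured reward, true value to the designer)
    "complete the task":      (1.0,  1.0),
    "idle":                   (0.0,  0.0),
    "tamper with the sensor": (10.0, 0.0),  # proxy and goal come apart here
}

chosen = max(actions, key=lambda a: actions[a][0])  # optimise the proxy only
measured, true_value = actions[chosen]
print(f"agent chooses {chosen!r}: measured reward {measured}, true value {true_value}")
```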
