AI Safety Fundamentals

BlueDot Impact

Listen to resources from the AI Safety Fundamentals courses!https://aisafetyfundamentals.com/

Episodes

Mentioned books

Jan 4, 2025 • 29min

Why AI Alignment Could Be Hard With Modern Deep Learning

Why would we program AI that wants to harm us? Because we might not know how to do otherwise.Source:https://www.cold-takes.com/why-ai-alignment-could-be-hard-with-modern-deep-learning/Crossposted from the Cold Takes Audio podcast.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 8min

Thought Experiments Provide a Third Anchor

Previously, I argued that we should expect future ML systems to often exhibit "emergent" behavior, where they acquire new capabilities that were not explicitly designed or intended, simply as a result of scaling. This was a special case of a general phenomenon in the physical sciences called More Is Different. I care about this because I think AI will have a huge impact on society, and I want to forecast what future systems will be like so that I can steer things to be better. To that end, I find More Is Different to be troubling and disorienting. I’m inclined to forecast the future by looking at existing trends and asking what will happen if they continue, but we should instead expect new qualitative behaviors to arise all the time that are not an extrapolation of previous trends. Given this, how can we predict what future systems will look like? For this, I find it helpful to think in terms of "anchors"---reference classes that are broadly analogous to future ML systems, which we can then use to make predictions. The most obvious reference class for future ML systems is current ML systemsA podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 34min

The Alignment Problem From a Deep Learning Perspective

The discussion dives into the potential dangers of artificial general intelligence surpassing human abilities. Key topics include how misaligned goals can lead AGIs to pursue undesirable outcomes and the risks of reward hacking. The episode explores how AGIs might adopt deceptive strategies to maximize rewards while undermining human control. Additionally, it examines the implications of internally represented goals and outlines research directions to mitigate these risks. The conversation paints a sobering picture of the future of AGI and the urgent need for alignment safety.

Jan 4, 2025 • 1h 2min

Yudkowsky Contra Christiano on AI Takeoff Speeds

In 2008, thousands of blog readers - including yours truly, who had discovered the rationality community just a few months before - watched Robin Hanson debate Eliezer Yudkowsky on the future of AI.Robin thought the AI revolution would be a gradual affair, like the Agricultural or Industrial Revolutions. Various people invent and improve various technologies over the course of decades or centuries. Each new technology provides another jumping-off point for people to use when inventing other technologies: mechanical gears → steam engine → railroad and so on. Over the course of a few decades, you’ve invented lots of stuff and the world is changed, but there’s no single moment when “industrialization happened”.Eliezer thought it would be lightning-fast. Once researchers started building human-like AIs, some combination of adding more compute, and the new capabilities provided by the AIs themselves, would quickly catapult AI to unimaginably superintelligent levels. The whole process could take between a few hours and a few years, depending on what point you measured from, but it wouldn’t take decades.You can imagine the graph above as being GDP over time, except that Eliezer thinks AI will probably destroy the world, which might be bad for GDP in some sense. If you come up with some way to measure (in dollars) whatever kind of crazy technologies AIs create for their own purposes after wiping out humanity, then the GDP framing will probably work fine.Crossposted from the Astral Codex Ten Podcast.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 1h 2min

AGI Ruin: A List of Lethalities

I have several times failed to write up a well-organized list of reasons why AGI will kill you. People come in with different ideas about why AGI would be survivable, and want to hear different obviously key points addressed first. Some fraction of those people are loudly upset with me if the obviously most important points aren't addressed immediately, and I address different points first instead.Having failed to solve this problem in any good way, I now give up and solve it poorly with a poorly organized list of individual rants. I'm not particularly happy with this list; the alternative was publishing nothing, and publishing this seems marginally more dignified.Crossposted from the LessWrong Curated Podcast by TYPE III AUDIO.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 13min

Specification Gaming: The Flip Side of AI Ingenuity

Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome. We have all had experiences with specification gaming, even if not by this name. Readers may have heard the myth of King Midas and the golden touch, in which the king asks that anything he touches be turned to gold - but soon finds that even food and drink turn to metal in his hands. In the real world, when rewarded for doing well on a homework assignment, a student might copy another student to get the right answers, rather than learning the material - and thus exploit a loophole in the task specification.Original article:https://www.deepmind.com/blog/specification-gaming-the-flip-side-of-ai-ingenuityAuthors:Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, Shane LeggA podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 18min

What Failure Looks Like

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I’ll tell the story in two parts:Part I: machine learning will increase our ability to “get what we can measure,” which could cause a slow-rolling catastrophe. ("Going out with a whimper.")Part II: ML training, like competitive economies or natural ecosystems, can give rise to “greedy” patterns that try to expand their own influence. Such patterns can ultimately dominate the behavior of a system and cause sudden breakdowns. ("Going out with a bang," an instance of optimization daemons.) I think these are the most important problems if we fail to solve intent alignment.In practice these problems will interact with each other, and with other disruptions/instability caused by rapid progress. These problems are worse in worlds where progress is relatively fast, and fast takeoff can be a key risk factor, but I’m scared even if we have several years.Crossposted from the LessWrong Curated Podcast by TYPE III AUDIO.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 13min

AGI Safety From First Principles

This report explores the core case for why the development of artificial general intelligence (AGI) might pose an existential threat to humanity. It stems from my dissatisfaction with existing arguments on this topic: early work is less relevant in the context of modern machine learning, while more recent work is scattered and brief. This report aims to fill that gap by providing a detailed investigation into the potential risk from AGI misbehaviour, grounded by our current knowledge of machine learning, and highlighting important uncertain ties. It identifies four key premises, evaluates existing arguments about them, and outlines some novel considerations for each.Source:https://drive.google.com/file/d/1uK7NhdSKprQKZnRjU58X7NLA1auXlWHt/viewNarrated for AI Safety Fundamentals by TYPE III AUDIO.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 15min

Four Background Claims

MIRI’s mission is to ensure that the creation of smarter-than-human artificial intelligence has a positive impact. Why is this mission important, and why do we think that there’s work we can do today to help ensure any such thing? In this post and my next one, I’ll try to answer those questions. This post will lay out what I see as the four most important premises underlying our mission. Related posts include Eliezer Yudkowsky’s “Five Theses” and Luke Muehlhauser’s “Why MIRI?”; this is my attempt to make explicit the claims that are in the background whenever I assert that our mission is of critical importance. #### Claim #1: Humans have a very general ability to solve problems and achieve goals across diverse domains. We call this ability “intelligence,” or “general intelligence.” This isn’t a formal definition — if we knew exactly what general intelligence was, we’d be better able to program it into a computer — but we do think that there’s a real phenomenon of general intelligence that we cannot yet replicate in code. Alternative view: There is no such thing as general intelligence. Instead, humans have a collection of disparate special-purpose modules. Computers will keep getting better at narrowly defined tasks such as chess or driving, but at no point will they acquire “generality” and become significantly more useful, because there is no generality to acquire.Source:https://intelligence.org/2015/07/24/four-background-claims/Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.---A podcast by BlueDot Impact.Learn more on the AI Safety Fundamentals website.

Jan 4, 2025 • 18min

A Short Introduction to Machine Learning

Dive into an engaging overview of machine learning, exploring its key concepts and their connections. Discover the distinction between symbolic AI and learning, while unraveling the mysteries of neural networks and optimization techniques. Learn about supervised and reinforcement learning, including challenges like credit assignment. The discussion highlights the journey from training models to their deployment in the real world, emphasizing the implications for AI safety. This framework provides a refreshing lens for both newcomers and seasoned AI enthusiasts.

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

App store banner

Play store banner