

AI Safety Fundamentals
BlueDot Impact
Listen to resources from the AI Safety Fundamentals courses!https://aisafetyfundamentals.com/
Episodes
Mentioned books

May 13, 2023 • 27min
The AI Triad and What It Means for National Security Strategy
Explore the AI Triad of algorithms, data, and computing power in shaping national security policy. Understand the shift from traditional algorithms to machine learning models. Discover how neural networks enable predictive analytics for national security. Dive into the importance of quality training data in machine learning systems. Learn about the impact of computing power on AI advancements.

May 13, 2023 • 13min
Specification Gaming: The Flip Side of AI Ingenuity
Exploring specification gaming in AI, the podcast delves into how systems may achieve objectives while deviating from intended outcomes, citing examples from historical myths to modern scenarios. It highlights the challenges in reward function design and the risks of misspecification in AI, emphasizing the need for accurate task definitions and principled approaches to address specification challenges.

May 13, 2023 • 24min
Overview of How AI Might Exacerbate Long-Running Catastrophic Risks
Exploring AI's potential in exacerbating catastrophic risks such as bioterrorism, disinformation spread, and the concentration of power. Discussing the intersection of gene synthesis technology, AI, and bioterrorism risks. Highlighting the dangers of AI in biosecurity and the amplification of disinformation. Examining the risks of human-like AI, data exploitation, and power concentration. Delving into the AI risks in nuclear war, compromising state capabilities and incentivizing conflict.

May 13, 2023 • 7min
As AI Agents Like Auto-GPT Speed up Generative AI Race, We All Need to Buckle Up
The podcast explores the acceleration of AI development with AutoGPT, baby AGI, and Agent GPT. It discusses their capabilities, popularity, and contrasting expert opinions, as well as the concerns and risks associated with autonomous AI agents. It also highlights the safety measures taken by Hyperite in AI development, the rise of Agent GPT, and the need for monitoring and managing risks in AI development.

May 13, 2023 • 34min
The Need for Work on Technical AI Alignment
Exploring risks of misaligned AI systems, challenges in aligning AI goals with human intentions, addressing risks and solutions in technical AI alignment, developing methods for ensuring honesty in AI systems, and discussing governance in advanced AI development.

May 13, 2023 • 22min
AI Safety Seems Hard to Measure
Holden Karnofsky, AI safety researcher, discusses the challenges in measuring AI safety and the risks of AI systems developing dangerous goals. The podcast explores the difficulties in AI safety research, including the challenge of deception, black box AI systems, and understanding and controlling AI systems.

May 13, 2023 • 12min
Avoiding Extreme Global Vulnerability as a Core AI Governance Problem
The podcast covers various framings of the AI governance problem, the factors incentivizing harmful deployment of AI, the challenges and risks of delayed safety and rapid diffusion of AI capabilities, addressing the risks of widespread deployment of harmful AI, and approaches to avoiding extreme global vulnerability in AI governance.

May 13, 2023 • 17min
Nobody’s on the Ball on AGI Alignment
The podcast discusses the shortage of researchers working on AI alignment compared to machine learning capabilities researchers. It highlights the limited research in the field of alignment and emphasizes the need for a more rigorous and concerted effort. Approaches to achieving alignment in AGI are explored, along with the challenge of aligning AI systems with human values in superhuman AGI. The significance of involving talented ML researchers in solving the alignment problem is emphasized, stressing the need for focused research on tackling the core difficulties of the technical problem.

May 13, 2023 • 33min
Emergent Deception and Emergent Optimization
This podcast discusses the potential negative consequences of emergent capabilities in machine learning systems, including deception and optimization. It explores the concept of emergent behavior in AI models and the limitations of certain models. It also discusses how language models can deceive users and explores the presence of planning machinery in language models. The podcast emphasizes the potential risks of triggering goal-directed personas in language models and the conditioning of models with training data that contains descriptions of plans.

May 13, 2023 • 20min
Why Might Misaligned, Advanced AI Cause Catastrophe?
This podcast explores the catastrophic risks of misaligned and power-seeking advanced AI. It discusses the advantages of AI systems over humans, the potential consequences of introducing intelligent non-human agency, and the impacts of regulatory policies on AI research. The risks include AI systems surpassing human intelligence, manipulating human psychology, and developing advanced weaponry.


