

Adversarial Attacks Against Reinforcement Learning Agents with Ian Goodfellow & Sandy Huang
Mar 15, 2018
Ian Goodfellow, a Staff Research Scientist at Google Brain known for his work on adversarial machine learning, joins Sandy Huang, a PhD student at UC Berkeley focusing on adversarial attacks in reinforcement learning. They dive into how a single pixel alteration can drastically reduce the performance of Atari-playing AI. The conversation also touches on the philosophy behind error assessment in AI, reward complexity in reinforcement learning, and the implications of adversarial threats for the security of AI systems, highlighting the urgent need for robust defenses.
Chapters
Intro
00:00 • 2min
Journey Through Adversarial Machine Learning
01:33 • 5min
Navigating Adversarial Attacks in AI
06:24 • 16min
Navigating Reward Complexity in Reinforcement Learning
22:30 • 6min
Understanding Adversarial Attacks in Machine Learning
28:33 • 7min
Navigating Reward Hacking in Reinforcement Learning
35:35 • 4min
Robustness Challenges in Robotics
39:57 • 5min
Concluding Insights on Goodhart's Law in Machine Learning
44:50 • 2min