

Adversarial Examples Are Not Bugs, They Are Features with Aleksander Madry - #369
Apr 27, 2020
Aleksander Madry, a faculty member at MIT specializing in machine learning robustness, dives into the intriguing world of adversarial examples. He argues these examples shouldn't be seen as mere bugs but as inherent features of AI systems. The conversation highlights the mismatch between human expectations and machine perception, stressing the need for new methodologies to improve interpretability. Madry also shares insights on using robust classifiers for image synthesis and navigates the dual nature of AI technologies, urging a deeper understanding of their implications.
AI Snips
Adversarial Examples
- Adversarial examples are inputs that expose glitches in machine learning systems; they are typically created by adding tiny, carefully chosen perturbations (noise) to images.
- This noise leads to misclassification, revealing the brittleness of machine learning predictions (see the sketch after this list).
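As a concrete illustration of the mechanism described above, here is a minimal sketch of the fast gradient sign method (FGSM), one common way to craft such perturbations. It assumes PyTorch and torchvision are available; the untrained ResNet-18 and the random image tensor are hypothetical stand-ins for a real pretrained classifier and input photo, and the episode itself does not prescribe this particular attack.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

def fgsm_attack(model, image, label, epsilon=0.03):
    """Perturb `image` with gradient-aligned noise of magnitude `epsilon` (FGSM)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that most increases the loss, then keep pixels in [0, 1].
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Hypothetical usage: an (untrained) ResNet-18 and a random tensor stand in
# for a real pretrained classifier and a real image.
model = models.resnet18().eval()
image = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    label = model(image).argmax(dim=1)   # the model's original prediction
adv = fgsm_attack(model, image, label)
print("prediction before:", label.item(), "| after:", model(adv).argmax(dim=1).item())
```

The key point the snip makes is visible in the code: the perturbation is small and structured (a signed gradient step, not random static), which is why the resulting images look unchanged to humans while the model's prediction can flip.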
Adversarial Examples: Features, Not Bugs
- Adversarial examples are often viewed as bugs that need fixing in machine learning systems.
- However, they arise because models perform their tasks too well, exploiting patterns that humans don't recognize.
Pattern Recognition vs. Concept Learning
- Machine learning models don't learn concepts like humans do; they identify patterns correlating with labels.
- These patterns can be brittle and easily manipulated, unlike human-recognized features.