
Model Explainability Forum - #401
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Unpacking Vulnerabilities in Machine Learning Explanations
This chapter explores the risks of post hoc explanation techniques in machine learning, highlighting their susceptibility to adversarial attacks that can distort the interpretations they produce. Through a user study with law students, it shows how the visual presentation of explanations can affect trust in biased classifiers, particularly when sensitive attributes like race and gender are omitted. The speakers advocate for a nuanced understanding of model explainability and emphasize the need for robust resources to navigate the complexities of interpretability in real-world deployments.
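
The episode doesn't walk through code, but a minimal sketch can make this kind of attack concrete. The toy example below assumes a LIME-style perturbation-based explainer (the summary doesn't name a specific technique); all model names, thresholds, and the off-manifold heuristic are illustrative, not from the episode. The idea: a "scaffolded" model detects the continuous, jittered inputs that perturbation-based explainers generate, answers those with an innocuous model, and reserves its biased behavior for real inputs, so the explanation hides the bias.

```python
import numpy as np

rng = np.random.default_rng(0)

def biased_model(X):
    # Deployed behavior: the decision depends entirely on feature 0,
    # standing in for a sensitive attribute (e.g., a race proxy).
    return (X[:, 0] > 0.5).astype(float)

def innocuous_model(X):
    # Cover behavior: the decision depends only on feature 1,
    # a legitimate signal.
    return (X[:, 1] > 0.5).astype(float)

def looks_off_manifold(X):
    # Real inputs in this toy setup are exactly binary; perturbation-based
    # explainers query continuous, jittered points. A crude detector suffices.
    return np.any(np.abs(X - np.round(X)) > 1e-6, axis=1)

def scaffolded_model(X):
    # Route explainer-looking queries to the innocuous model,
    # everything else to the biased one.
    out = biased_model(X)
    off = looks_off_manifold(X)
    out[off] = innocuous_model(X[off])
    return out

def lime_style_attribution(model, x, n=500, scale=0.5):
    # Fit a local linear surrogate on Gaussian perturbations around x
    # and read off its per-feature weights as attributions.
    Z = x + rng.normal(0.0, scale, size=(n, x.size))
    y = model(Z)
    coef, *_ = np.linalg.lstsq(np.c_[Z, np.ones(n)], y, rcond=None)
    return coef[:-1]

x = np.array([1.0, 0.0])
print("prediction on real input:", scaffolded_model(x[None, :])[0])
print("attribution [sensitive, legitimate]:",
      lime_style_attribution(scaffolded_model, x))
```

Running this, the model's prediction on the real input is driven by the sensitive feature, yet the surrogate's attribution mass lands almost entirely on the legitimate feature, since nearly every perturbed query is off-manifold and answered by the innocuous model. This is the distortion of interpretations the chapter warns about.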