
AI for High-Stakes Decision Making with Hima Lakkaraju - #387
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Exploring Weaknesses in AI Explanation Methods
This chapter focuses on the shortcomings of local and global explanation techniques in AI, particularly perturbation-based methods like LIME. It delves into how adversarial classifiers can mislead interpretability tools, making it difficult to identify biases in AI models, especially when race influences predictions. The discussion emphasizes ongoing research aimed at developing more robust explanation methods to improve fairness and transparency in machine learning.
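The adversarial-classifier idea described above can be sketched in a few lines. This is a hedged toy illustration (not the guests' actual implementation): a "scaffolded" model applies a biased rule to real inputs but an innocuous rule to off-distribution inputs, so a LIME-style perturbation-based explainer, which queries the model mostly on noisy copies of the data, only ever observes the benign behavior. All feature names, thresholds, and the out-of-distribution heuristic are illustrative assumptions.

```python
import random

random.seed(0)

def is_perturbation(x):
    # Toy out-of-distribution detector: real rows in this example have
    # integer-valued features; LIME-style perturbations (noisy copies) do not.
    return any(abs(v - round(v)) > 1e-9 for v in x)

def biased_model(x):
    # Discriminatory rule: the decision depends on the sensitive feature x[0].
    return 1 if x[0] > 0.5 else 0

def innocuous_model(x):
    # Benign-looking rule: depends only on the non-sensitive feature x[1].
    return 1 if x[1] > 5 else 0

def adversarial_classifier(x):
    # Scaffolding: behave innocuously whenever the input looks perturbed,
    # so a perturbation-based explainer only ever sees the benign rule.
    if is_perturbation(x):
        return innocuous_model(x)
    return biased_model(x)

# Real data: integer feature vectors [sensitive, non_sensitive].
data = [[1, 3], [0, 7], [1, 8], [0, 2]]

# On real inputs the model follows the biased rule ...
real_preds = [adversarial_classifier(x) for x in data]

# ... but LIME-style perturbations (added continuous noise) trigger the
# innocuous branch, hiding the bias from the explainer.
perturbed = [[v + random.gauss(0, 0.1) for v in x] for x in data]
pert_preds = [adversarial_classifier(x) for x in perturbed]
```

The mismatch between `real_preds` (driven by the sensitive feature) and `pert_preds` (driven by the innocuous feature) is what makes the deception hard to catch with perturbation-based explanations alone, motivating the more robust methods discussed in the episode.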