The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Beyond Accuracy: Behavioral Testing of NLP Models with Sameer Singh - #406

Sep 3, 2020
Sameer Singh, an assistant professor at UC Irvine, specializes in interpretable machine learning for NLP. He discusses the groundbreaking CheckList tool for robust behavioral testing of NLP models, stressing the importance of understanding model limitations beyond mere accuracy. Sameer reflects on the evolving landscape of AI, the relevance of his co-authored LIME paper in model explainability, and the potential of embodied AI in enhancing our understanding of complex machine learning systems. It's a thoughtful dive into the future of AI evaluation methods.
ANECDOTE

Deep Learning Surprise

  • Sameer Singh initially focused on specific NLP tasks and missed the deep learning wave.
  • When he eventually adopted deep learning, it excelled at his tasks without needing any of the specialized techniques he had developed.
INSIGHT

Explainability to Evaluation

  • Deep learning models delivered high performance, yet their internal workings remained poorly understood.
  • This gap led Singh to explore explainability methods like LIME (see the sketch below), shifting his focus to evaluation and debugging.
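A minimal sketch of how LIME explains a single prediction, using the open-source lime package that accompanies the paper. The sentiment classifier below is a made-up stand-in, not Singh's model; any function mapping a list of strings to class probabilities can be wrapped the same way.

```python
# Sketch: explaining one text prediction with LIME (pip install lime).
# Assumption: predict_proba is a hypothetical stand-in classifier, not
# a model from the episode.
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    # Stand-in classifier: scores sentiment by counting cue words.
    probs = []
    for t in texts:
        pos = sum(w in t.lower() for w in ["good", "great", "love"])
        neg = sum(w in t.lower() for w in ["bad", "awful", "not"])
        p_pos = (1 + pos) / (2 + pos + neg)
        probs.append([1 - p_pos, p_pos])
    return np.array(probs)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance(
    "The movie was not good", predict_proba, num_features=4
)
# Each (word, weight) pair estimates how much that word pushed the
# prediction toward "positive" (negative weights push toward "negative").
print(exp.as_list())
```

LIME perturbs the input (here, by dropping words) and fits a simple local model to the classifier's responses, which is what lets it attribute the prediction to individual words without access to the model's internals.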
INSIGHT

CheckList and Behavioral Testing

  • CheckList helps create tests for NLP models, much like behavioral testing in software engineering.
  • It goes beyond aggregate accuracy metrics by testing specific capabilities, such as negation handling and robustness to paraphrasing (see the sketch below).
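A minimal sketch of a CheckList Minimum Functionality Test (MFT) for the negation capability, following the workflow of the open-source checklist package. The classifier is again a hypothetical stand-in, deliberately naive so the test exposes a failure; it is not a model discussed in the episode.

```python
# Sketch: a CheckList MFT for negation (pip install checklist).
# Assumption: predict_proba is a made-up stand-in classifier that
# ignores negation, so the behavioral test should catch it failing.
import numpy as np
from checklist.editor import Editor
from checklist.test_types import MFT
from checklist.pred_wrapper import PredictorWrapper

def predict_proba(texts):
    # Naive cue-word matching: any positive cue word -> "positive".
    probs = []
    for t in texts:
        p_pos = 0.9 if any(w in t.lower() for w in ["good", "great"]) else 0.1
        probs.append([1 - p_pos, p_pos])
    return np.array(probs)

# Fill a template to generate negated-positive sentences, all of which
# should be predicted negative (label 0).
editor = Editor()
ret = editor.template(
    "This {thing} is not {pos_adj}.",
    thing=["movie", "book", "show"],
    pos_adj=["good", "great"],
    labels=0,
)

test = MFT(ret.data, labels=ret.labels,
           name="negated positive -> negative",
           capability="Negation")
test.run(PredictorWrapper.wrap_softmax(predict_proba))
test.summary()  # prints the failure rate for this capability
```

Because the stand-in model keys only on cue words, every generated example is misclassified as positive, so the summary reports a high failure rate on the Negation capability, exactly the kind of behavioral failure that a single aggregate accuracy number can hide.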