

Beyond Accuracy: Behavioral Testing of NLP Models with Sameer Singh - #406
Sep 3, 2020
Sameer Singh, an assistant professor at UC Irvine, specializes in interpretable machine learning for NLP. He discusses the groundbreaking CheckList tool for robust behavioral testing of NLP models, stressing the importance of understanding model limitations beyond mere accuracy. Sameer reflects on the evolving landscape of AI, the relevance of his co-authored LIME paper in model explainability, and the potential of embodied AI in enhancing our understanding of complex machine learning systems. It's a thoughtful dive into the future of AI evaluation methods.
Deep Learning Surprise
- Sameer Singh initially focused on specific NLP tasks and missed the deep learning wave.
- When he adopted deep learning, it excelled at his tasks but didn't utilize his specialized techniques.
Explainability to Evaluation
- Deep learning models performed well, yet offered little insight into how they arrive at their predictions.
- This gap led Singh to explainability methods such as LIME, and eventually shifted his focus toward evaluation and debugging (a minimal LIME sketch follows below).
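
For readers unfamiliar with LIME, here is a minimal sketch of explaining a single text prediction with the `lime` package. The toy scikit-learn classifier and its training data are illustrative stand-ins, not anything from the episode; LIME only requires a function that maps a list of strings to class probabilities.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy sentiment classifier; the training data here is purely illustrative.
train_texts = ["a good movie", "a great book", "a bad movie", "a terrible book"]
train_labels = [1, 1, 0, 0]
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance(
    "not a good movie",    # instance to explain
    model.predict_proba,   # maps list[str] -> (n, 2) probability array
    num_features=4,        # report the top contributing tokens
)
print(exp.as_list())       # [(token, weight), ...] local explanation
```

LIME perturbs the input text (dropping words) and fits a simple local model to the classifier's responses, so the weights show which tokens drove this one prediction.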
CheckList and Behavioral Testing
- CheckList helps create targeted tests for NLP models, analogous to behavioral testing in software engineering.
- It goes beyond aggregate accuracy by testing specific capabilities, such as negation handling and robustness to paraphrasing (see the sketch after this list).
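
Since the episode only describes the idea, here is a from-scratch sketch of two CheckList-style tests: a Minimum Functionality Test (MFT) for negation and an invariance test for small typos. The `predict` function and the naive keyword classifier are hypothetical placeholders; the actual CheckList library from the paper's authors generates such cases at scale from templates and perturbations.

```python
# CheckList-style behavioral tests, sketched without the library.
# `predict` is any hypothetical function: list[str] -> list of "pos"/"neg".

def mft_negation(predict):
    """Minimum Functionality Test: negated positives should come out negative."""
    templates = ["This is not a good {}.", "I don't think this {} is great."]
    nouns = ["movie", "book", "service"]
    cases = [t.format(n) for t in templates for n in nouns]
    preds = predict(cases)
    failures = [c for c, p in zip(cases, preds) if p != "neg"]
    return 1 - len(failures) / len(cases), failures

def inv_typo(predict):
    """Invariance test: a small typo should not flip the prediction."""
    originals = ["The plot was wonderful.", "An awful waste of time."]
    perturbed = ["The plot was wonderfull.", "An awfull waste of time."]
    base, pert = predict(originals), predict(perturbed)
    failures = [o for o, b, p in zip(originals, base, pert) if b != p]
    return 1 - len(failures) / len(originals), failures

# Example with a deliberately flawed keyword classifier:
def naive_predict(texts):
    return ["pos" if "good" in t or "great" in t or "wonderful" in t.lower()
            else "neg" for t in texts]

print(mft_negation(naive_predict))  # pass rate 0.0: "not good" still reads as "pos"
print(inv_typo(naive_predict))
```

The point of the MFT is exactly what accuracy hides: a model can score well on a test set while failing every negation case, and a capability-level pass rate surfaces that immediately.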