
117 - Interpreting NLP Model Predictions, with Sameer Singh
NLP Highlights
The Different Types of Gradients
The gradient-based methods are pretty interesting, but they take the gradient at a single input instance, and we know from things like adversarial attacks that the local region around the prediction may not be as flat as one imagines. There has been some really interesting work on integrated gradients, where they look at accumulated gradients over a whole path through the input space, from a baseline to the actual input. With NLP, we've sort of decided the all-zeros embedding is the way to start, but it's not clear that's the right choice, because if that embedding has never been seen as input during training, it may not be a very meaningful baseline.
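To make the idea concrete, here is a minimal sketch of integrated gradients, not any particular reference implementation: gradients are averaged along the straight-line path from a baseline to the input and scaled by the input-baseline difference. The toy function `f` and the all-zeros baseline are illustrative assumptions.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate integrated gradients with a Riemann (midpoint) sum:
    average grad_f along the path from baseline to x, then scale by
    (x - baseline)."""
    total = np.zeros_like(x, dtype=float)
    for k in range(steps):
        alpha = (k + 0.5) / steps           # midpoint of the k-th segment
        point = baseline + alpha * (x - baseline)
        total += grad_f(point)              # accumulate gradients along the path
    avg_grad = total / steps
    return (x - baseline) * avg_grad

# Toy differentiable function: f(x) = sum(x^2), so grad f(x) = 2x.
f = lambda x: float(np.sum(x ** 2))
grad_f = lambda x: 2.0 * x

x = np.array([1.0, 2.0])
baseline = np.zeros_like(x)  # the all-zeros baseline discussed above

attr = integrated_gradients(grad_f, x, baseline)
# Completeness: attributions sum (approximately) to f(x) - f(baseline).
```

A useful sanity check is the completeness property: the attributions should sum to the difference in the function's output between the input and the baseline, which is exactly why the choice of baseline matters so much.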