

NLP Highlights
Allen Institute for Artificial Intelligence
**The podcast is currently on hiatus. For more active NLP content, check out the Holistic Intelligence Podcast linked below.**
Welcome to the NLP highlights podcast, where we invite researchers to talk about their work in various areas in natural language processing. All views expressed belong to the hosts/guests, and do not represent their employers.
Episodes

Mar 26, 2018 • 36min
54 - Simulating Action Dynamics with Neural Process Networks, with Antoine Bosselut
ICLR 2018 paper, by Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi.
This is not your standard NLP task: the goal is to predict which entities change state over the course of a recipe (e.g., ingredients get combined into a batter, so entities merge, and then the batter gets baked, changing location, temperature, and "cookedness"). We talk to Antoine about the work, getting into details about how the data was collected, how the model works, and what some possible future directions are.
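To make the flavor of the task concrete, here is a hand-rolled toy of the kind of entity state tracking involved; it is only an illustration of the task, not the Neural Process Network model itself.

```python
# Toy illustration of entity state tracking in a recipe (not the paper's model):
# each recipe step updates the states of the entities it acts on.
states = {
    "flour": {"location": "bowl", "composition": "separate", "cooked": False},
    "eggs":  {"location": "bowl", "composition": "separate", "cooked": False},
}

def mix(entities, new_name):
    # Mixing merges several entities into a single new one.
    for e in entities:
        del states[e]
    states[new_name] = {"location": "bowl", "composition": "mixed", "cooked": False}

def bake(entity):
    # Baking changes location, temperature, and "cookedness".
    states[entity].update({"location": "oven", "temperature": "hot", "cooked": True})

mix(["flour", "eggs"], "batter")
bake("batter")
print(states)
```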
https://www.semanticscholar.org/paper/Simulating-Action-Dynamics-with-Neural-Process-Bosselut-Levy/dc01c9401d1caab7f5e6d2f1280f5815f6919977

Mar 21, 2018 • 27min
53 - Classical Structured Prediction Losses for Sequence to Sequence Learning, with Sergey and Myle
NAACL 2018 paper, by Sergey Edunov, Myle Ott, Michael Auli, David Grangier, and Marc'Aurelio Ranzato, from Facebook AI Research
In this episode we continue our theme from last episode on structured prediction, talking with Sergey and Myle about their paper. They did a comprehensive set of experiments comparing many prior structured learning losses, applied to neural seq2seq models. We talk about the motivation for their work, what turned out to work well, and some details about some of their loss functions. They introduce the notion of a "pseudo reference", replacing the target output sequence with the highest-scoring output on the beam during decoding, and we talk about some of the implications there. It also turns out that minimizing expected risk was the best overall training procedure that they found for these structured models.
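For readers unfamiliar with sequence-level expected risk, here is a rough sketch of the objective; the candidate set, the cost function (e.g., 1 minus sentence-BLEU), and any scaling are illustrative assumptions, not the paper's exact setup.

```python
import torch

def expected_risk_loss(candidate_scores, candidate_costs):
    """Sequence-level expected risk, sketched.

    candidate_scores: model log-scores for each candidate sequence (e.g., the beam), shape (k,)
    candidate_costs:  task cost for each candidate, e.g. 1 - sentence-BLEU, shape (k,)
    """
    # Renormalize the model's scores over the candidate set only.
    probs = torch.softmax(candidate_scores, dim=0)
    # Expected cost under this (restricted) distribution; minimizing it pushes
    # probability mass toward low-cost candidates.
    return (probs * candidate_costs).sum()
```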
https://www.semanticscholar.org/paper/Classical-Structured-Prediction-Losses-for-Sequence-Edunov-Ott/20ae11c08c6b0cd567c486ba20f44bc677f2ed23

Mar 15, 2018 • 23min
52 - Sequence-to-Sequence Learning as Beam-Search Optimization, with Sam Wiseman
EMNLP 2016 paper by Sam Wiseman and Sasha Rush.
In this episode we talk with Sam about a paper from a couple of years ago on bringing some ideas from structured prediction back into neural seq2seq models. We talk about the classic structured prediction problems of exposure bias, label bias, and locally normalized models, how people used to solve these problems, and how we can apply those solutions to modern neural seq2seq architectures using a technique that Sam and Sasha call Beam Search Optimization.
(Note: while we said in the episode that BSO with beam size of 2 is equivalent to a token-level hinge loss, that's not quite accurate; it's close, but there are some subtle differences.)
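The core training signal in Beam Search Optimization is a margin between the gold sequence and hypotheses on the beam; the sketch below is a simplified, hedged version of that idea (the paper's full procedure, including restarting the beam from the gold prefix after a violation, is omitted here).

```python
import torch

def bso_style_margin_loss(gold_prefix_scores, kth_beam_scores, margin=1.0):
    """Simplified beam-search-optimization-style loss.

    gold_prefix_scores: score of the gold prefix at each timestep, shape (T,)
    kth_beam_scores:    score of the lowest-ranked (k-th) beam hypothesis at each
                        timestep, shape (T,)
    Penalizes timesteps where the gold prefix does not beat the k-th beam entry
    by the margin, i.e. where the gold sequence (nearly) falls off the beam.
    """
    violations = torch.clamp(margin - (gold_prefix_scores - kth_beam_scores), min=0.0)
    return violations.sum()
```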
https://www.semanticscholar.org/paper/Sequence-to-Sequence-Learning-as-Beam-Search-Optim-Wiseman-Rush/28703eef8fe505e8bd592ced3ce52a597097b031

Mar 12, 2018 • 17min
51 - A Regularized Framework for Sparse and Structured Neural Attention, with Vlad Niculae
NIPS 2017 paper by Vlad Niculae and Mathieu Blondel.
Vlad comes on to tell us about his paper. Attention is often computed in neural networks using a softmax operator, which maps scalar scores from a model onto a probability distribution over latent variables (e.g., which input positions to attend to). There are lots of cases where this is not optimal, however, such as when you really want to encourage sparse attention over your inputs, or when you have additional structural biases that could inform the model. Vlad and Mathieu have developed a theoretical framework for analyzing the options in this space, and in this episode we talk about that framework, some concrete instantiations of attention mechanisms from the framework, and how well these work.
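As a point of reference, sparsemax (Martins & Astudillo, 2016) is one sparse alternative to softmax of the kind this framework covers; below is a rough numpy sketch (our illustration, not code from the paper).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Euclidean projection of the scores onto the probability simplex,
    which yields exactly sparse attention weights."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum          # coordinates that stay nonzero
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z
    return np.maximum(z - tau, 0.0)

scores = np.array([2.0, 1.0, 0.1, -1.0])
print(softmax(scores))    # dense: every input gets some weight
print(sparsemax(scores))  # sparse: low-scoring inputs get exactly zero
```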

Feb 14, 2018 • 27min
50 - Cardinal Virtues: Extracting Relation Cardinalities from Text, with Paramita Mirza
ACL 2017 paper, by Paramita Mirza, Simon Razniewski, Fariz Darari, and Gerhard Weikum.
There's not a whole lot of work on numbers in NLP, and getting good information out of numbers expressed in text can be challenging. In this episode, Paramita comes on to tell us about her efforts to use distant supervision to learn models that extract relation cardinalities from text. That is, given an entity and a relation in a knowledge base, like "Barack Obama" and "has child", the goal is to extract _how many_ related entities there are (in this case, two). There are a lot of challenges in getting this to work well, and Paramita describes some of those, and how she solved them.
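To see where the distant-supervision signal comes from, here is a toy illustration (ours, not the paper's pipeline): the noisy "gold" cardinality for an entity-relation pair is just the number of objects the knowledge base happens to list, which a real system must treat as a lower bound because KBs are incomplete.

```python
from collections import defaultdict

# Toy knowledge base of (subject, relation, object) triples.
kb = [
    ("Barack Obama", "has child", "Malia Obama"),
    ("Barack Obama", "has child", "Sasha Obama"),
]

def cardinality_labels(triples):
    """Distant-supervision labels: count how many objects the KB lists
    for each (subject, relation) pair."""
    counts = defaultdict(int)
    for subj, rel, _obj in triples:
        counts[(subj, rel)] += 1
    return dict(counts)

print(cardinality_labels(kb))  # {('Barack Obama', 'has child'): 2}
```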
https://www.semanticscholar.org/paper/Cardinal-Virtues-Extracting-Relation-Cardinalities-Mirza-Razniewski/01afba9f40e0df06446b9cd3d5ea8725c4ba1342

Feb 5, 2018 • 27min
49 - A Joint Sequential and Relational Model for Frame-Semantic Parsing, with Bishan Yang
EMNLP 2017 paper by Bishan Yang and Tom Mitchell.
Bishan tells us about her experiments on frame-semantic parsing / semantic role labeling, which aim to recover the predicate-argument structure of natural language sentences and, in the case of frame-semantic parsing, to categorize those structures into a pre-defined event schema. Bishan had two interesting ideas here: (1) use a technique similar to model distillation to combine two different model structures (her "sequential" and "relational" models), and (2) use constraints on arguments across frames in the same sentence to get a more coherent global labeling of the sentence. We talk about these contributions, and also touch on "open" versus "closed" semantics, in both predicate-argument structure and information extraction.
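For readers who haven't seen model distillation, the generic objective looks roughly like the sketch below: fit the gold labels while also matching another model's soft predictions. This is the general flavor of the technique referenced, not the paper's exact formulation for combining the sequential and relational models.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_labels, alpha=0.5, temperature=1.0):
    """Generic distillation objective: weighted sum of the usual supervised loss
    and a KL term that matches the 'teacher' model's softened predictions."""
    hard = F.cross_entropy(student_logits, gold_labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    )
    return alpha * hard + (1 - alpha) * soft
```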
https://www.semanticscholar.org/paper/A-Joint-Sequential-and-Relational-Model-for-Frame-Yang-Mitchell/a1deb609e3758519cbe3f1a542bdf1ea52b6f224

Jan 29, 2018 • 28min
48 - Incidental Supervision: Moving Beyond Supervised Learning, with Dan Roth
AAAI 2017 paper, by Dan Roth.
In this episode we have a conversation with Dan about what he means by "incidental supervision", and how it's related to ideas in reinforcement learning and representation learning. For many tasks, there are signals you can get from seemingly unrelated data that will help you in making predictions. Leveraging the international news cycle to learn transliteration models for named entities is one example of this, as is the current trend in NLP of using language models or other multi-task signals to do better representation learning for your end task. Dan argues that we need to be thinking about this more explicitly in our research, instead of learning everything "end-to-end", as we will never have enough data to learn complex tasks directly from annotations alone.
https://www.semanticscholar.org/paper/Incidental-Supervision-Moving-beyond-Supervised-Le-Roth/2997dcfc6d5ffc262d57d0a26f74d091de096573

Jan 24, 2018 • 36min
47 - Dynamic integration of background knowledge in neural NLU systems, with Dirk Weißenborn
Paper by Dirk Weißenborn, Tomáš Kočiský, and Chris Dyer.
How should you incorporate background knowledge into a neural net? A lot of people have been thinking about this problem, and Dirk comes on to tell us about his work in this area.
https://arxiv.org/abs/1706.02596

Jan 8, 2018 • 39min
46 - Parsing with Traces, with Jonathan Kummerfeld
TACL 2017 paper by Jonathan K. Kummerfeld and Dan Klein.
Jonathan tells us about his work on parsing algorithms that capture traces and null elements in sentence structure. We spend the first third of the conversation talking about what these are and why they are interesting - if you want to correctly handle wh-movement, or coordinating structures, or control structures, or many other phenomena that we commonly see in language, you really want to handle traces and null elements, but most current parsers totally ignore these phenomena. The second third of the conversation is about how the parser works, and we conclude by talking about some of the implications of the work, and where to go next - should we really be pushing harder on capturing linguistic structure when everyone seems to be going towards end-to-end learning on some higher-level task?
https://www.semanticscholar.org/paper/Parsing-with-Traces-An-O-n-4-Algorithm-and-a-Struc-Kummerfeld-Klein/af89e56b3d9b720d43cae9f4971928c5cb95cbe3
Jonathan also blogs about papers that he's reading; check out his paper summaries at http://jkk.name/

Jan 2, 2018 • 38min
45 - Build It, Break It workshop, with Allyson Ettinger and Sudha Rao
How robust is your NLP system? High numbers on common datasets can be misleading, as most systems are easily fooled by small modifications that would not be hard for humans to understand.
Allyson Ettinger, Sudha Rao, Hal Daumé III, and Emily Bender organized a workshop trying to characterize this issue, inviting participants to either build robust systems, or try to break them with targeted examples. Allyson and Sudha come on the podcast to talk about the workshop. We cover the motivation of the workshop, what a "minimal pair" is, what tasks the workshop focused on and why, and what the main takeaways of the workshop were.
https://www.semanticscholar.org/paper/Towards-Linguistically-Generalizable-NLP-Systems-A-Ettinger-Rao/8472e999f723a9ccaffc6089b7be1865d8a1b863