

NLP Highlights
Allen Institute for Artificial Intelligence
**The podcast is currently on hiatus. For more active NLP content, check out the Holistic Intelligence Podcast linked below.**
Welcome to the NLP highlights podcast, where we invite researchers to talk about their work in various areas in natural language processing. All views expressed belong to the hosts/guests, and do not represent their employers.
Episodes

Sep 30, 2019 • 28min
94 - Decompositional Semantics, with Aaron White
In this episode, Aaron White tells us about the decompositional semantics initiative (Decomp), an attempt to re-think the prototypical approach to semantic representation and annotation. The basic idea is to decompose complex semantic classes such as ‘agent’ and ‘patient’ into simpler semantic properties such as ‘causation’ and ‘volition’, while embracing the uncertainty inherent in language by allowing annotators to choose answers such as ‘probably’ or ‘probably not’. In order to scale the collection of labeled data, each property is annotated by asking crowd workers intuitive questions about phrases in a given sentence.
Aaron White's homepage: http://aaronstevenwhite.io/
Decomp initiative page: http://decomp.io/
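To make the episode's core idea concrete, here is a minimal sketch of how a decompositional annotation might be represented: instead of a single role label such as 'agent' or 'patient', each argument carries graded judgments on simple properties. The property names, response scale, and example sentence below are assumptions for illustration, not the actual Decomp schema.
```python
# Hypothetical sketch of a decompositional annotation: each argument carries
# graded judgments on simple properties such as volition or causation rather
# than one coarse role label. Property names and the 1-5 response scale here
# are illustrative only.
annotation = {
    "sentence": "The chef melted the butter.",
    "predicate": "melted",
    "arguments": [
        {
            "span": "The chef",
            "properties": {
                # crowd responses mapped to a graded scale,
                # e.g. 1 = "definitely not" ... 5 = "definitely"
                "volition": 5,
                "causation": 5,
                "change_of_state": 1,
            },
        },
        {
            "span": "the butter",
            "properties": {
                "volition": 1,
                "causation": 1,
                "change_of_state": 5,
            },
        },
    ],
}

# A classic "patient" role then falls out of the property pattern rather than
# being a primitive label.
for arg in annotation["arguments"]:
    props = arg["properties"]
    if props["change_of_state"] >= 4 and props["volition"] <= 2:
        print(f"{arg['span']!r} looks patient-like")
```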

Jul 22, 2019 • 37min
93 - NLP/ML for clinical data, with Alistair Johnson
In this episode, we invite Alistair Johnson to discuss the main challenge in applying NLP/ML to clinical domains: the lack of data. We discuss privacy concerns, de-identification, synthesizing records, legal liabilities and data heterogeneity. We also discuss how the MIMIC dataset evolved over the years, how it is being used, and some of the under-explored ways in which it can be used.
Alistair’s homepage: http://alistairewj.github.io/
MIMIC dataset: https://mimic.physionet.org/

Jul 5, 2019 • 34min
92 - Computational Humanities, with David Bamman
In this episode, we invite David Bamman to give an overview of computational humanities. We discuss examples of questions studied in computational humanities (e.g., characterizing fictionality, assessing novelty, measuring the attention given to male vs. female characters in literature). We talk about the role NLP plays in addressing these questions and how the accuracy and biases of NLP models can influence the results. We also discuss understudied NLP tasks which can help us answer more questions in this domain, such as literary scene coreference resolution and constructing a map of literary geography.
David Bamman's homepage: http://people.ischool.berkeley.edu/~dbamman/
LitBank dataset: https://github.com/dbamman/litbank

Jun 26, 2019 • 42min
91 - (Executable) Semantic Parsing, with Jonathan Berant
In this episode, we invite Jonathan Berant to talk about executable semantic parsing. We discuss what executable semantic parsing is and how it differs from related tasks such as semantic dependency parsing and abstract meaning representation (AMR) parsing. We talk about the main components of a semantic parser, how the formal language affects design choices in the parser, and end with a discussion of some exciting open problems in this space.
Jonathan Berant's homepage: http://www.cs.tau.ac.il/~joberant/
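As a toy illustration of what "executable" means here: the parser maps a question to a formal meaning representation that can be run against a knowledge source to produce the answer. The mini logical form and table below are invented for illustration and do not come from the episode.
```python
# Toy executable semantic parsing example: a question is parsed into a small
# logical form, which is then executed against a table to get the answer.
table = [
    {"city": "Seattle", "population": 737_015},
    {"city": "Tel Aviv", "population": 460_613},
]

question = "Which city has the largest population?"

# A semantic parser would map the question to something like:
logical_form = ("argmax", "city", "population")

def execute(lf, rows):
    """Execute the (operator, select_column, by_column) logical form."""
    op, select_col, by_col = lf
    assert op == "argmax"
    return max(rows, key=lambda r: r[by_col])[select_col]

print(execute(logical_form, table))  # -> "Seattle"
```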

May 31, 2019 • 55min
90 - Research in Academia versus Industry, with Philip Resnik and Jason Baldridge
What is it like to do research in academia vs. industry? In this episode, we invite Jason Baldridge (UT Austin => Google) and Philip Resnik (Sun Microsystems => UMD) to discuss some of the aspects one may want to consider when planning a research career, including flexibility, security and intellectual freedom. Perhaps most importantly, we discuss how the career choices we make influence and are influenced by the relationships we forge. Check out the Careers in NLP Panel at NAACL'19 on Monday, June 3, 2019 for further discussion.
Careers in NLP panel @ NAACL'19: https://naacl2019.org/blog/careers-panel-survey/
Jason Baldridge's homepage: http://www.jasonbaldridge.com/
Philip Resnik's homepage: http://users.umiacs.umd.edu/~resnik/

May 31, 2019 • 37min
89 - Dialog Systems, with Zhou Yu
In this episode, we invite Zhou Yu to give an overview of dialogue systems. We discuss different types of dialogue systems (task-oriented vs. non-task-oriented), the main building blocks and how they relate to other research areas in NLP, how to transfer models across domains, and the different ways used to evaluate these systems. Zhou also shares her thoughts on exciting future directions such as developing dialogue methods for non-cooperative environments (e.g., to negotiate prices) and multimodal dialogue systems (e.g., using video as well as audio/text).
Zhou Yu's homepage: http://zhouyu.cs.ucdavis.edu/

May 7, 2019 • 41min
88 - A Structural Probe for Finding Syntax in Word Representations, with John Hewitt
In this episode, we invite John Hewitt to discuss his take on how to probe word embeddings for syntactic information. The basic idea is to project word embeddings to a vector space where the L2 distance between a pair of words in a sentence approximates the number of hops between them in the dependency tree. The proposed method shows that ELMo and BERT representations, trained with no syntactic supervision, embed many of the unlabeled, undirected dependency attachments between words in the same sentence.
Paper: https://nlp.stanford.edu/pubs/hewitt2019structural.pdf
GitHub repository: https://github.com/john-hewitt/structural-probes
Blog post: https://nlp.stanford.edu/~johnhew/structural-probe.html
Twitter thread: https://twitter.com/johnhewtt/status/1114252302141886464
John's homepage: https://nlp.stanford.edu/~johnhew/
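A minimal sketch of the probe idea discussed in the episode, assuming PyTorch and toy data in place of real contextual embeddings and gold parses; see the paper and repository above for the actual formulation and training details.
```python
# Sketch of a structural probe: learn a linear map B so that the squared L2
# distance between projected word vectors approximates the number of hops
# between the words in the dependency tree. Shapes and the toy training step
# are assumptions for illustration.
import torch

emb_dim, probe_rank = 768, 64          # e.g. BERT-base hidden size, low-rank probe
B = torch.randn(emb_dim, probe_rank, requires_grad=True)
optimizer = torch.optim.Adam([B], lr=1e-3)

def probe_distances(H):
    """H: (num_words, emb_dim) contextual embeddings for one sentence.
    Returns (num_words, num_words) predicted squared distances."""
    P = H @ B                           # project into the probe space
    diff = P.unsqueeze(0) - P.unsqueeze(1)
    return (diff ** 2).sum(-1)

def probe_loss(H, tree_dist):
    """tree_dist: (num_words, num_words) gold hop counts in the parse tree."""
    pred = probe_distances(H)
    return (pred - tree_dist).abs().mean()

# toy step with random data standing in for real embeddings and parses
H = torch.randn(7, emb_dim)
tree_dist = torch.randint(0, 6, (7, 7)).float()
loss = probe_loss(H, tree_dist)
loss.backward()
optimizer.step()
```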

Apr 25, 2019 • 33min
87 - Pathologies of Neural Models Make Interpretation Difficult, with Shi Feng
In this episode, Shi Feng joins us to discuss his recent work on identifying pathological behaviors of neural models for NLP tasks. Shi uses input word gradients to identify the least important word for a model's prediction, and iteratively removes that word until the model prediction changes. The reduced inputs tend to be significantly smaller than the original inputs, e.g., 2.3 words on average instead of the original 11.5 for SQuAD. We discuss possible interpretations of these results, and a proposed method for mitigating these pathologies.
Shi Feng's homepage: http://users.umiacs.umd.edu/~shifeng/
Paper: https://www.semanticscholar.org/paper/Pathologies-of-Neural-Models-Make-Interpretation-Feng-Wallace/8e141b5cb01c88b315c9a94dc97e50738cc7370d
Joint work with Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez and Jordan Boyd-Graber
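A rough sketch of the input-reduction loop described above. Here `predict_label` and `word_importances` are hypothetical stand-ins for a real model's prediction function and its gradient-based word-importance scores.
```python
# Sketch of input reduction: repeatedly drop the word the model deems least
# important (by gradient-based importance) as long as the predicted label does
# not change. `predict_label` and `word_importances` are hypothetical helpers.

def reduce_input(words, predict_label, word_importances):
    """Return a much shorter input that still gets the original prediction."""
    original_label = predict_label(words)
    reduced = list(words)
    while len(reduced) > 1:
        # importance of each remaining word, e.g. L2 norm of the embedding gradient
        scores = word_importances(reduced)        # list of floats, one per word
        least_important = scores.index(min(scores))
        candidate = reduced[:least_important] + reduced[least_important + 1:]
        if predict_label(candidate) != original_label:
            break                                  # removing more would flip the prediction
        reduced = candidate
    return reduced
```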

Apr 15, 2019 • 32min
86 - NLP for Evidence-based Medicine, with Byron Wallace
In this episode, Byron Wallace tells us about interdisciplinary work between evidence-based medicine and natural language processing. We discuss extracting PICO frames from articles describing clinical trials and the data available for direct and weak supervision. We also discuss automating the assessment of risk of bias (e.g., in random sequence generation, allocation concealment and outcome assessment), which helps domain experts who need to review hundreds of articles.
Byron Wallace's homepage: http://www.byronwallace.com/
EBM-NLP dataset: https://ebm-nlp.herokuapp.com/
MIMIC dataset: https://mimic.physionet.org/
Cochrane database of systematic reviews: https://www.cochranelibrary.com/cdsr/about-cdsr
The bioNLP workshop at ACL'19 (submission due date was extended to May 10): https://aclweb.org/aclwiki/BioNLP_Workshop
The workshop on health text mining and information analysis at EMNLP'19: https://louhi2019.fbk.eu/
Machine learning for healthcare conference: https://www.mlforhc.org/

Mar 29, 2019 • 37min
85 - Stress in Research, with Charles Sutton
In this episode, Charles Sutton walks us through common sources of stress for researchers and suggests coping strategies to maintain your sanity. We talk about how pursuing a research career is similar to participating in a life-long international tournament, conflating research worth and self-worth, and how freedom can be both a blessing and a curse, among other stressors one may encounter in a research career.
Charles Sutton's homepage: https://homepages.inf.ed.ac.uk/csutton/
A series of blog posts Charles wrote on this topic: http://www.theexclusive.org/tag/stress%20in%20research/