

NLP Highlights
Allen Institute for Artificial Intelligence
**The podcast is currently on hiatus. For more active NLP content, check out the Holistic Intelligence Podcast linked below.**
Welcome to the NLP highlights podcast, where we invite researchers to talk about their work in various areas in natural language processing. All views expressed belong to the hosts/guests, and do not represent their employers.
Episodes

Feb 3, 2020 • 31min
104 - Model Distillation, with Victor Sanh and Thomas Wolf
In this episode we talked with Victor Sanh and Thomas Wolf from HuggingFace about model distillation, with DistilBERT as one example of distillation. The idea behind model distillation is to compress a large model by training a smaller model, with far fewer parameters, to approximate the output distribution of the original model, typically for increased efficiency. We discussed how model distillation was typically done previously, and then focused on the specifics of DistilBERT, including the training objective, empirical results, and ablations. We finally discussed what kinds of information you might lose when doing model distillation.
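As a concrete illustration of the objective discussed here, below is a minimal sketch of a temperature-scaled distillation loss in PyTorch. The function name, temperature, and mixing weight are illustrative assumptions; DistilBERT's actual objective combines a similar soft-target term with a masked language modeling loss and a cosine embedding loss.

```python
# Minimal sketch of a temperature-scaled distillation loss (illustrative, not DistilBERT's exact objective).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Mix a soft-target KL term (teacher -> student) with the usual hard-label cross-entropy."""
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student; the T^2 factor keeps gradient
    # magnitudes comparable across temperatures.
    soft_loss = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the gold labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example usage with random tensors standing in for model outputs.
student_logits, teacher_logits = torch.randn(8, 10), torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```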

Jan 27, 2020 • 43min
103 - Processing Language in Social Media, with Brendan O'Connor
We talked to Brendan O’Connor for this episode about processing language in social media. Brendan started off by telling us about his projects that studied the linguistic and geographical patterns of African American English (AAE), and how obtaining data from Twitter made these projects possible. We then talked about how many tools built for standard English perform very poorly on AAE, and why collecting dialect-specific data is important. For the rest of the conversation, we discussed the issues involved in scraping data from social media, including ethical considerations and the biases that the data comes with.
Brendan O’Connor is an Assistant Professor at the University of Massachusetts, Amherst.
Warning: This episode contains explicit language (one swear word).

Jan 20, 2020 • 37min
102 - Biomedical NLP research at the National Institutes of Health with Dina Demner-Fushman
What exciting NLP research problems are involved in processing biomedical and clinical data? In this episode, we spoke with Dina Demner-Fushman, who leads NLP and IR research at the Lister Hill National Center for Biomedical Communications, part of the National Library of Medicine. We talked about processing biomedical scientific literature, understanding clinical notes, and answering consumer health questions, and the challenges involved in each of these applications. Dina listed some specific tasks and relevant data sources for NLP researchers interested in such applications, and concluded with some pointers to getting started in this field.

Jan 14, 2020 • 41min
101 - The lottery ticket hypothesis, with Jonathan Frankle
In this episode, Jonathan Frankle describes the lottery ticket hypothesis, a popular explanation of how over-parameterization helps in training neural networks. We discuss pruning methods used to uncover subnetworks (winning tickets) which were initialized in a particularly effective way. We also discuss patterns observed in pruned networks, stability of networks pruned at different time steps and transferring uncovered subnetworks across tasks, among other topics.
A recent paper on the topic by Frankle and Carbin, ICLR 2019: https://arxiv.org/abs/1803.03635
Jonathan Frankle’s homepage: http://www.jfrankle.com/
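For a concrete picture of the procedure discussed in this episode, here is a minimal sketch of iterative magnitude pruning with rewinding to the original initialization. The `train` argument is a hypothetical placeholder for a routine that trains the model while keeping masked weights at zero; the number of rounds and the per-round pruning fraction are illustrative.

```python
# Minimal sketch of iterative magnitude pruning with weight rewinding (illustrative).
import copy
import torch

def prune_by_magnitude(model, mask, fraction):
    """Zero out the smallest-magnitude `fraction` of currently surviving weights, per layer."""
    for name, param in model.named_parameters():
        if name not in mask:
            continue
        magnitudes = (param.data * mask[name]).abs()
        alive = magnitudes[mask[name] == 1]
        if alive.numel() == 0:
            continue
        threshold = torch.quantile(alive, fraction)
        mask[name] = torch.where(magnitudes <= threshold,
                                 torch.zeros_like(mask[name]), mask[name])
    return mask

def find_winning_ticket(model, train, rounds=5, fraction=0.2):
    """`train(model, mask)` is a hypothetical routine that trains while keeping masked weights at zero."""
    init_state = copy.deepcopy(model.state_dict())        # remember the original initialization
    mask = {n: torch.ones_like(p) for n, p in model.named_parameters() if "weight" in n}
    for _ in range(rounds):
        train(model, mask)                                # train the masked network
        mask = prune_by_magnitude(model, mask, fraction)  # prune a fraction of surviving weights
        model.load_state_dict(init_state)                 # rewind survivors to their initial values
    return mask                                           # the mask identifies the candidate "winning ticket"
```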

Jan 8, 2020 • 31min
100 - NLP Startups, with Oren Etzioni
For our 100th episode, we invite AI2 CEO Oren Etzioni to talk to us about NLP startups. Oren has founded several successful startups, is himself an investor in startups, and helps with AI2's startup incubator.
Some of our discussion topics include: What's the similarity between being a researcher and an entrepreneur? How do you transition from being a researcher to doing a startup? How do you evaluate early-stage startups? What advice would you give to a researcher who's thinking about a startup? What are some typical mistakes that you've seen startups make? Along the way, Oren predicts that we'll see a whole generation of startup companies based on the technology underlying ELMo, BERT, etc.

Dec 16, 2019 • 45min
99 - Evaluating Protein Transfer Learning, With Roshan Rao And Neil Thomas
For this episode, we chatted with Neil Thomas and Roshan Rao about modeling protein sequences and evaluating transfer learning methods on a set of five protein modeling tasks. Learning representations with self-supervised pretraining objectives has shown promising transfer to downstream tasks in protein sequence modeling, just as it has in NLP. We started off by discussing the similarities and differences between language and protein sequence data, and how contextual embedding techniques also apply to protein sequences. Neil and Roshan then described TAPE, a set of five benchmark tasks for assessing the quality of protein embeddings, particularly in terms of how well they capture the structural, functional, and evolutionary aspects of proteins. The results from their experiments with various model architectures indicated that no single model performed best across all tasks, and that there is a lot of room for future work in protein sequence modeling.
Neil Thomas and Roshan Rao are PhD students at UC Berkeley.
Paper: https://www.biorxiv.org/content/10.1101/676825v1
Blog post: https://bair.berkeley.edu/blog/2019/11/04/proteins/
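As a rough illustration of the kind of self-supervised pretraining discussed here, below is a minimal sketch of BERT-style masked-token prediction over amino acid sequences. The vocabulary handling, model sizes, and masking rate are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of masked-token pretraining on amino acid sequences (illustrative).
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK = 20, 21                       # special token ids appended after the 20 residues
VOCAB_SIZE = 22

class ProteinEncoder(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, VOCAB_SIZE)

    def forward(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        hidden = self.encoder(self.embed(tokens) + self.pos(positions))
        return self.lm_head(hidden)      # per-position logits over the residue vocabulary

def mask_tokens(tokens, mask_prob=0.15):
    """Replace a random subset of residues with [MASK]; only those positions are scored."""
    labels = tokens.clone()
    masked = torch.rand(tokens.shape) < mask_prob
    labels[~masked] = -100               # ignored by cross_entropy
    corrupted = tokens.clone()
    corrupted[masked] = MASK
    return corrupted, labels

# One illustrative pretraining step on a random batch standing in for real sequences.
model = ProteinEncoder()
tokens = torch.randint(0, 20, (4, 100))  # 4 sequences of 100 residues
inputs, labels = mask_tokens(tokens)
logits = model(inputs)
loss = nn.functional.cross_entropy(logits.view(-1, VOCAB_SIZE), labels.view(-1), ignore_index=-100)
loss.backward()
```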

Dec 9, 2019 • 37min
98 - Analyzing Information Flow In Transformers, With Elena Voita
What function do the different attention heads serve in multi-headed attention models? In this episode, Lena describes how to use attribution methods to assess the importance and contribution of different heads in several tasks, and describes a gating mechanism, trained with an auxiliary loss, that prunes the number of heads actually used. Then, we discuss Lena’s work on studying how the representations of individual tokens evolve across the layers of Transformer models.
Lena’s homepage:
https://lena-voita.github.io/
Blog posts:
https://lena-voita.github.io/posts/acl19_heads.html
https://lena-voita.github.io/posts/emnlp19_evolution.html
Papers:
https://arxiv.org/abs/1905.09418
https://arxiv.org/abs/1909.01380
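To make the gating idea concrete, here is a simplified sketch of multi-head attention with learnable per-head gates and an L1 sparsity penalty added to the task loss. This swaps in a plain L1 penalty for the stochastic relaxation of the L0 penalty (hard concrete gates) used in the work discussed in the episode, and all sizes are illustrative.

```python
# Simplified sketch of gated multi-head attention with a sparsity penalty on the gates (illustrative).
import torch
import torch.nn as nn

class GatedMultiheadAttention(nn.Module):
    def __init__(self, d_model=256, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.gates = nn.Parameter(torch.ones(num_heads))    # one learnable gate per head

    def forward(self, x):
        batch, seq, d_model = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim).
        shape = (batch, seq, self.num_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        heads = attn @ v                                     # (batch, heads, seq, head_dim)
        # Scale each head's output by its gate; a gate driven to zero disables that head.
        heads = heads * self.gates.view(1, -1, 1, 1)
        heads = heads.transpose(1, 2).reshape(batch, seq, d_model)
        return self.out(heads)

    def sparsity_penalty(self):
        return self.gates.abs().sum()                        # encourages gates to go to zero

# Usage: total loss = task loss + lambda * gate penalty.
layer = GatedMultiheadAttention()
x = torch.randn(2, 10, 256)
output = layer(x)
penalty = 0.01 * layer.sparsity_penalty()
```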

Nov 27, 2019 • 44min
97 - Automated Analysis Of Historical Printed Documents, With Taylor Berg-Kirkpatrick
In this episode, we talk to Taylor Berg-Kirkpatrick about optical character recognition (OCR) on historical documents. Taylor starts off by describing some practical issues related to the old scanning processes used for these documents that make OCR on them a difficult problem. Then he explains how one can build latent variable models for this data using unsupervised methods, the relative importance of various modeling choices, and how well the resulting models do. We then take a higher-level view of historical OCR as a machine learning problem, and discuss how it differs from other ML problems in terms of the tradeoff between learning from data and imposing constraints based on prior knowledge of the underlying process. Finally, Taylor talks about the applications of this research, and how these predictions can be of interest to historians studying the original texts.

Nov 12, 2019 • 30min
96 - Question Answering as an Annotation Format, with Luke Zettlemoyer
In this episode, we chat with Luke Zettlemoyer about Question Answering as a format for crowdsourcing annotations of various semantic phenomena in text. We start by talking about QA-SRL and QAMR, two datasets that use QA pairs to annotate predicate-argument relations at the sentence level. Luke describes how this annotation scheme makes it possible to obtain annotations from non-experts, and discusses the tradeoffs involved in choosing this scheme. Then we talk about the challenges involved in using QA-based annotations for more complex phenomena like coreference. Finally, we briefly discuss the value of crowd-labeled datasets given the recent developments in pretraining large language models. Luke is an associate professor at the University of Washington and a Research Scientist at Facebook AI Research.
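To make the annotation format concrete, here is a hypothetical QA-SRL-style annotation for a single predicate, written as a small Python structure; the exact question templates and fields in the released datasets differ in detail.

```python
# Hypothetical example of question-answer pairs standing in for predicate-argument structure.
annotation = {
    "sentence": "The committee approved the proposal on Tuesday.",
    "predicate": "approved",
    "qa_pairs": [
        {"question": "Who approved something?", "answer": "The committee"},
        {"question": "What did someone approve?", "answer": "the proposal"},
        {"question": "When did someone approve something?", "answer": "on Tuesday"},
    ],
}
```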

Oct 7, 2019 • 35min
95 - Common sense reasoning, with Yejin Choi
In this episode, we invite Yejin Choi to talk about common sense knowledge and reasoning, a growing area in NLP. We start by discussing a working definition of “common sense” and the practical utility of studying it. We then talk about some of the datasets and resources focused on studying different aspects of common sense (e.g., ReCoRD, CommonsenseQA, ATOMIC) and contrast implicit vs. explicit modeling of common sense, and what it means for downstream applications. To conclude, Yejin shares her thoughts on some of the open problems in this area and where it is headed in the future.
Yejin Choi’s homepage: https://homes.cs.washington.edu/~yejin/
ATOMIC: https://homes.cs.washington.edu/~msap/atomic/
ReCoRD: https://sheng-z.github.io/ReCoRD-explorer/
CommonsenseQA: https://www.tau-nlp.org/commonsenseqa