TAUR Lab - EMNLP 2020

EMNLP 2020 Main Conference

Calibration of Pre-trained Transformers
Shrey Desai and Greg Durrett. Zoom Session 2A. Talk page

When a pre-trained Transformer model like RoBERTa makes a prediction, how calibrated is it? That is, if the model gives a 70% confidence in a prediction, is it right about that prediction 70% of the time? Generally, yes, but you can improve this even further both in-domain and out-of-domain!
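
The "70% confidence, right 70% of the time" idea can be made concrete with expected calibration error (ECE), one common way to measure calibration. The following is a minimal sketch, not necessarily the exact evaluation setup used in the paper; the function name, the equal-width binning, and the default of 10 bins are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average, over equal-width confidence bins, of
    |accuracy in bin - mean confidence in bin|.

    confidences: predicted-class probabilities in [0, 1]
    correct:     booleans, True where the prediction was right
    n_bins:      number of bins (an illustrative default)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    # Assign each prediction to a bin by its confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(accuracy - avg_conf)
    return ece
```

A model that says 70% and is right on 7 of 10 such predictions gets an ECE near zero; a model that says 90% and is always wrong gets an ECE near 0.9.
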
Sketch-Driven Regular Expression Generation from Natural Language and Examples (TACL)
Xi Ye, Qiaochu Chen, Xinyu Wang, Isil Dillig, and Greg Durrett. Zoom Session 2D. Talk page

We look at complex real-world regex synthesis problems. We show a system that first generates a sketch, then uses positive and negative examples of strings the regex should/shouldn't match to complete the sketch. This work follows on ideas from our earlier work in PLDI 2020 and a dataset we presented in ACL 2020.
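
As a toy illustration of completing a sketch from examples, the snippet below fills the holes in a regex sketch by brute-force enumeration and keeps the first candidate that matches every positive example and no negative one. This is only a sketch of the general idea: the system described above generates the sketch from natural language and completes it with a far more capable synthesizer. The hole marker `<?>`, the function name, and the candidate list are all hypothetical.

```python
import re
from itertools import product

HOLE = "<?>"  # hypothetical marker for an unfilled part of the sketch


def complete_sketch(sketch, hole_candidates, positives, negatives):
    """Return the first completion of `sketch` consistent with the examples,
    or None if no combination of candidates works."""
    parts = sketch.split(HOLE)
    n_holes = len(parts) - 1
    for fills in product(hole_candidates, repeat=n_holes):
        # Interleave the fixed parts of the sketch with the chosen fills.
        candidate = parts[0] + "".join(
            fill + part for fill, part in zip(fills, parts[1:])
        )
        try:
            pattern = re.compile(candidate)
        except re.error:
            continue  # skip completions that are not valid regexes
        matches_all_pos = all(pattern.fullmatch(s) for s in positives)
        matches_any_neg = any(pattern.fullmatch(s) for s in negatives)
        if matches_all_pos and not matches_any_neg:
            return candidate
    return None


# Sketch: "some kind of character, repeated, then a dash, then another kind, repeated"
result = complete_sketch(
    sketch="<?>+-<?>+",
    hole_candidates=[r"\d", "[a-z]", "[A-Z]"],
    positives=["abc-123", "x-9"],
    negatives=["123-abc", "abc-def"],
)
```

The negative examples matter here: without `abc-def`, several completions of the second hole would remain consistent with the positives alone.
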
Understanding Neural Abstractive Summarization Models via Uncertainty
Jiacheng Xu, Shrey Desai and Greg Durrett. Gather Session 4J. Talk page

We can analyze what's going on in text generation models by looking at their uncertainty, or the entropy of the decisions they make. We show what high-entropy and low-entropy decisions look like in summarization, and connect these to phenomena like copying from the input and performing content selection.
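
For reference, the entropy of a single generation decision can be computed directly from the model's next-token distribution. A minimal sketch, assuming the distribution is available as a plain list of probabilities (the function name and the example distributions are illustrative, not taken from the paper):

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution.

    Zero-probability outcomes contribute nothing, so they are skipped.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


# A peaked distribution: the model is nearly certain of the next token,
# as might happen when it is copying a span from the input.
low = entropy([0.97, 0.01, 0.01, 0.01])

# A flat distribution: the model is choosing among many plausible
# continuations, as might happen when it is deciding what content to include.
high = entropy([0.25, 0.25, 0.25, 0.25])
```

Here `low` is much smaller than `high`, and a uniform distribution over n tokens reaches the maximum possible value, log(n).
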
Compressive Summarization with Plausibility and Salience Modeling
Shrey Desai, Jiacheng Xu and Greg Durrett. Gather Session 4J. Talk page

We revisit the idea of compressive summarization: extracting sentences and explicitly compressing them to build a summary. We show how to build a simple but effective system from BERT-based components, specifically modeling two aspects of the compression process: plausibility (is this chunk of text okay to delete with respect to grammaticality, acceptability, and factuality?) and salience (what chunks should we delete to make the best summary?).
Inquisitive Question Generation for High Level Text Comprehension
Wei-Jen Ko, Te-Yuan Chen, Yiyan Huang, Greg Durrett and Junyi Jessy Li. Gather Session 4H. Talk page

What questions come to a reader's mind as they're reading through an article? We present a new dataset of such questions, which ask for more information about spans of text in the article that may need background information or clarification to be fully understandable.

Findings

Evaluating Factuality in Generation with Dependency-level Entailment
Tanya Goyal and Greg Durrett. To be presented at BlackboxNLP

Many factuality errors in generated paraphrases or summaries aren't caused by the model hallucinating totally new content, but by the model mixing up things like predicate-argument structures with existing entities and events. We present a method that assesses the factuality of individual dependency arcs to identify and localize these sorts of errors.
Interpretable Entity Representations through Large-Scale Typing
Yasumasa Onoe and Greg Durrett. To be presented at BlackboxNLP

We explore ways of representing entities in text with vectors of fine-grained entity types (see our AAAI 2020 work). These vectors are interpretable and can be useful for tasks like coreference resolution and named entity disambiguation even without fine-tuning on the specific task/dataset.
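
To illustrate why such vectors are interpretable, the sketch below treats each entity mention as a vector of per-type probabilities, reads off its strongest types by name, and compares mentions by cosine similarity. The tiny type inventory, the probability values, and the function names are all made up for illustration; the actual type set and model are described in the paper.

```python
import math

# Hypothetical, tiny type inventory; real inventories are far larger.
TYPES = ["person", "politician", "athlete", "location", "city"]


def top_types(vec, k=2):
    """Names of the k highest-scoring types: each dimension is readable."""
    order = sorted(range(len(vec)), key=lambda i: -vec[i])
    return [TYPES[i] for i in order[:k]]


def cosine(u, v):
    """Cosine similarity between two type vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up type probabilities for three mentions.
mention_a = [0.90, 0.80, 0.10, 0.00, 0.00]  # looks like a politician
mention_b = [0.95, 0.70, 0.05, 0.00, 0.00]  # also looks like a politician
mention_c = [0.00, 0.00, 0.00, 0.90, 0.80]  # looks like a city

same_kind = cosine(mention_a, mention_b)
different_kind = cosine(mention_a, mention_c)
```

Because similarity is computed directly on type probabilities, mentions can be compared for tasks like coreference without any task-specific training, and a high or low score can be explained by pointing at the types the two mentions share.
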
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Kaj Bostrom and Greg Durrett.

BPE is commonly used for tokenization, but it produces segmentations that don't look morphologically plausible. Unigram LM tokenization is a much better technique in this respect, and it actually works better than BPE in pre-trained LMs across two languages (English and Japanese).