Ph.D. Research Proficiency Exam: Qing Wang

Event
Speaker: Qing Wang
Thursday, August 12, 2021 - 3:00pm

CONJR: Conjunctive Sentence Splitter without Parsing

Conjunction is a common syntactic phenomenon in Natural Language Processing (NLP) corpora. By our count, 39.4% of the sentences in OntoNotes Release 5.0 contain at least one conjunction. These frequent conjunctive sentences pose challenges for many NLP tasks. For example, in Named Entity Recognition (NER), conjunctions can cause discontinuous spans. In Open Information Extraction (Open IE), handling conjunctive sentences poorly causes systems to lose substantial yield. We observe and address the challenges of splitting conjunctive sentences around each group of conjuncts. Most existing methods rely on parsers to identify the conjuncts in a sentence and detect the coordination boundaries. However, state-of-the-art syntactic parsers are slow and error-prone, especially on long and complicated sentences. To address these problems, we formulate coordination boundary detection as a sequence tagging task and propose a specialized model, CONJR, that does not rely on a syntactic parser. We introduce both semantic and syntactic features, including BERT contextualized token embeddings, Part-of-Speech embeddings, similarity features, and a suffix feature, to capture the symmetry among the potential conjuncts and improve model performance. We show improvements on datasets from both general and biomedical domains.
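To illustrate what framing coordination boundary detection as sequence tagging can look like, here is a minimal Python sketch. It is an illustration only, not the CONJR model or its tag set: the B/I/O scheme, the function names, and the assumption of a single coordination group per sentence are all hypothetical choices. In a real system the tags would be predicted by a trained tagger; here they are derived from hand-written spans.

```python
from typing import List, Tuple

# Hypothetical representation: a conjunct is a token span [start, end), end exclusive.
Span = Tuple[int, int]


def spans_to_tags(n_tokens: int, conjuncts: List[Span]) -> List[str]:
    """Encode conjunct spans as one B/I/O tag per token (assumed scheme)."""
    tags = ["O"] * n_tokens
    for start, end in conjuncts:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def tags_to_spans(tags: List[str]) -> List[Span]:
    """Decode B/I/O tags back into conjunct spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        # Any tag other than "I" closes the span that is currently open.
        if tag != "I" and start is not None:
            spans.append((start, i))
            start = None
        if tag == "B":
            start = i
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def split_sentence(tokens: List[str], tags: List[str]) -> List[str]:
    """Build one simpler sentence per conjunct.

    Assumes a single coordination group: everything between the first and
    last conjunct (other conjuncts, commas, the conjunction word) is dropped.
    """
    spans = tags_to_spans(tags)
    if not spans:
        return [" ".join(tokens)]
    prefix = tokens[: spans[0][0]]
    suffix = tokens[spans[-1][1]:]
    return [" ".join(prefix + tokens[s:e] + suffix) for s, e in spans]


tokens = "She bought apples and oranges yesterday".split()
tags = spans_to_tags(len(tokens), [(2, 3), (4, 5)])
sentences = split_sentence(tokens, tags)
```

With the example above, `tags` marks "apples" and "oranges" as conjuncts, and `sentences` holds one sentence per conjunct. No parse tree is needed at any step, which is the appeal of the tagging formulation.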

Committee: Qi Li (major professor), Ying Cai, Hongyang Gao, Jia Liu, and Zhu Zhang.

Join on Zoom: https://iastate.zoom.us/j/97191530207?pwd=YjAvOEpzM2dNZ0t2cVREd0M5akx6Zz09 Or, go to https://iastate.zoom.us/join and enter meeting ID: 971 9153 0207 and password: 247414