Information Extraction with Weak Supervision
In this research, we introduce weak supervision techniques that address key challenges in three information extraction tasks: Named Entity Recognition (NER), Relation Extraction (RE), and Entity Linking (EL). Traditional methods for these tasks rely heavily on manual annotations, which are expensive and time-consuming to produce. To reduce this dependence, we propose three approaches that significantly reduce the need for human labeling while maintaining strong performance.
First, we present Confidence-Based Multi-Class Positive and Unlabeled (Conf-MPU) learning, which improves distantly supervised NER by incorporating confidence scores to handle incomplete labeling. Second, we introduce DSRE-NLI, a method that leverages indirect supervision through a Natural Language Inference engine to enhance Relation Extraction, improving accuracy with minimal human input. Finally, we propose GenDecider, a re-ranking approach for Zero-Shot Entity Linking that incorporates "None of the Candidates" judgments to increase accuracy and reliability, particularly when the correct entity is absent from the retrieved candidates.
Together, these advances reduce reliance on manual labeling, making information extraction systems more scalable, more robust, and applicable across domains.
Committee: Qi Li (major professor), Hongyang Gao, Mengdi Huai, Ying Cai and Kevin Liu.