Enhancing Document-Level Information Extraction via Span-Based and Sequence-Aware Approaches
Document-level information extraction is crucial for natural language processing, requiring models to capture long-range dependencies and maintain consistency across text spans. This preliminary exam presents two complementary approaches: (1) ScdNER, a span-based, consistency-aware document-level named entity recognition (NER) model that enhances global feature fusion while mitigating noise from token-level inconsistencies, and (2) SagDRE, a sequence-aware, graph-based document-level relation extraction (RE) model that integrates sentence-level directional edges and token-level sequential paths to improve reasoning over entity relations. ScdNER employs a two-stage process that leverages a span-based key-value memory for adaptive global feature integration, while SagDRE introduces an adaptive margin loss to address multi-label imbalance in RE. Experimental results on scientific, biomedical, and general-domain datasets demonstrate that both models improve accuracy and consistency for document-level NER and RE.
Committee: Qi Li (major professor), Mengdi Huai, Wei Le, Ali Jannesari, and Forrest Bao