Ph.D. Final Oral Exam: Nasim Sabetpour

Apr 4, 2022 - 3:00 PM

Speaker: Nasim Sabetpour

Toward Complex Data Structure Aggregation and Truth Discovery

In real-world applications, an object is often described by multiple sources, and having different sources for the same object inevitably brings information conflicts. A key challenge is identifying the true information among these conflicting sources. To tackle this challenge, truth discovery, which integrates multi-source noisy information by estimating the reliability of each source, has emerged as an active research topic. Multiple truth discovery methods have been proposed for various scenarios and successfully applied in diverse domains. However, the existing literature on annotation aggregation focuses mostly on binary and multiple-choice problems and pays little attention to sequential labels. Likewise, truth discovery concentrates on categorical and continuous data types, while complex data structures such as trees or text are rarely studied. We focus on aggregating complex data structures in truth discovery, such as sequence labels, constituency parse trees, and textual data, across different domains. We first adapt truth discovery to aggregate sequential labels on crowdsourcing platforms in the absence of ground truth, proposing an optimization-based sequential label aggregation method (AggSLC) that infers the best set of aggregated annotations from the labels provided by workers. Next, truth discovery is adapted to aggregate constituency parse trees, a common data structure for parsing sentences in many Natural Language Processing applications. The goal is to aggregate constituency parse trees from different parsers by estimating the parsers' reliability in the absence of ground truth, and we present an optimization-based method (CPTAM) for this purpose. In our last work, we focus on aggregating textual data (e.g., Wikipedia articles) to verify the credibility of a claim in fact checking. The goal is to aggregate a textual corpus to validate a given statement. We propose a Multi-Instance Learning (MIL) based model (FVMIL) that jointly performs the evidence classification and claim verification sub-tasks.
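
For readers unfamiliar with truth discovery, the short Python sketch below illustrates the general iterative idea underlying these aggregation methods: alternate between estimating the truths by weighted voting and updating each source's reliability weight from its agreement with the current estimates. This is a simplified illustration over plain categorical labels, not the AggSLC or CPTAM objectives; the function name and the particular weight-update rule are illustrative assumptions.

import numpy as np

def truth_discovery(votes, n_iters=20):
    """Minimal iterative truth discovery over categorical labels.

    votes: array of shape (n_sources, n_items) with integer label ids.
    Alternates between (1) weighted majority voting to estimate truths
    and (2) updating each source's weight from its agreement with the
    current truth estimates. Illustrative sketch only, not the thesis's
    AggSLC/CPTAM formulations.
    """
    n_sources, n_items = votes.shape
    n_labels = votes.max() + 1
    weights = np.ones(n_sources)  # start with equal source reliability

    for _ in range(n_iters):
        # Step 1: weighted voting per item
        scores = np.zeros((n_items, n_labels))
        for s in range(n_sources):
            scores[np.arange(n_items), votes[s]] += weights[s]
        truths = scores.argmax(axis=1)

        # Step 2: update source weights from disagreement rates
        errors = (votes != truths).mean(axis=1)   # fraction of items each source gets wrong
        weights = np.clip(-np.log(errors + 1e-6), 1e-6, None)  # reliable sources get larger weight

    return truths, weights

The same alternating structure carries over to structured outputs, where the voting step is replaced by an aggregation tailored to sequences or parse trees.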

Fact verification aims to identify the veracity of a given claim using reliable resources and to retrieve evidence sentences from a reliable corpus to support the verdict. Specifically, the Fact Extraction and VERification (FEVER) task provides a publicly available dataset for verification against textual sources, where the reliable corpus is Wikipedia and claims are labeled Supported, Refuted, or Not Enough Information (NEI). Many existing approaches consist of three sequential sub-tasks: document retrieval, evidence retrieval, and claim verification. These pipeline-based methods suffer from error propagation because a decision made by one sub-task cannot be revised by another. Joint models have been introduced to address the error propagation problem in pipeline-based models; however, one of their major problems is the severe imbalance between evidence and non-evidence sentences, which leads to a class imbalance problem. In this work, we present Fact Verification with Multi-Instance Learning (FVMIL) to tackle these challenges. FVMIL is a Multi-Task Learning (MTL) based framework that jointly combines the evidence classification and claim verification sub-tasks, resulting in strong connections between modules. To overcome the class imbalance problem, we propose a Multi-Instance Learning (MIL) module that classifies bags of sentences instead of individual sentences, combining the document retrieval and evidence sentence retrieval steps. Experiments on the FEVER dataset show that the proposed model, FVMIL, outperforms state-of-the-art baselines in terms of FEVER Score and Label Accuracy.
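
The bag-level classification idea can be illustrated with a short PyTorch sketch: score every candidate sentence for a claim and max-pool the instance scores, so supervision is only needed at the bag (claim) level. This is a generic MIL formulation under assumed embedding dimensions and module names, not the actual FVMIL architecture.

import torch
import torch.nn as nn

class MILBagClassifier(nn.Module):
    """Sketch of a multi-instance learning head: score each sentence in a
    bag, then max-pool the instance scores so only a bag-level label is
    required during training. Hypothetical dimensions and layer choices."""

    def __init__(self, sent_dim=768, hidden=256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(sent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, bag):
        # bag: (n_sentences, sent_dim) sentence embeddings for one claim
        instance_logits = self.scorer(bag).squeeze(-1)  # (n_sentences,)
        bag_logit, best = instance_logits.max(dim=0)    # bag prediction from strongest instance
        return bag_logit, best                          # 'best' indexes the likely evidence sentence

Max pooling is one common MIL aggregator; attention-based pooling over the instance scores is another typical choice for combining sentence evidence into a claim-level decision.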

Committee members: Dr. Qi Li (major professor), Dr. Kris De Brabanter, Dr. Oliver Eulenstein, Dr. Zhu Zhang, Dr. Jia Liu

Join on Zoom: Click this URL to start or join. https://iastate.zoom.us/j/94494572886?pwd=WFFNQ1BjWFczelA1M2pyaDZGTklZQT09 Or, go to https://iastate.zoom.us/join and enter meeting ID: 944 9457 2886 and password: 983093