Ph.D. Final Oral Exam: Lei Qi

Speaker: Lei Qi
Monday, November 1, 2021 - 9:00am
Location: 1230 Communications Building

Quantification Learning with Deep Neural Networks

Quantification learning has been explored in several disciplines, resulting in different terminologies such as prior probability shift, prevalence estimation, and class ratio estimation. In many subject domains, it is important to estimate and track prevalence over time, for instance, the prevalence of diseases, customer complaint issues, trending topics in customer demands, support for political candidates in blog posts, and popular policy areas of interest to federal and state governments. Quantification learning is the task: “Given a labeled training set, induce a quantifier that takes an unlabeled test set as input and returns its best estimate of the class distribution.” However, quantification learning has received far less attention than classification learning in computer science, mainly because of the mistaken belief that quantification is easily solved by a straightforward approach: classifying individual documents and computing the ratio of instances in each predicted class to the total number of instances. Although both quantification and classification belong to the family of supervised learning, there is a significant difference between the two tasks. Quantification learning does not require training data and test data to be independent and identically distributed, and unlike a classifier, a quantifier makes a prediction for a group of instances, not a single instance.
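
As a concrete illustration of that straightforward approach (commonly called classify and count), here is a minimal sketch; the feature extractor and classifier are illustrative assumptions, not choices from the talk:

```python
# Minimal sketch of the "classify and count" baseline: train a classifier,
# predict labels on the unlabeled set, and report the fraction of
# predictions per class as the prevalence estimate.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def classify_and_count(train_docs, train_labels, test_docs):
    """Estimate class prevalence on test_docs via classify-and-count."""
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_docs)
    clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    preds = clf.predict(vectorizer.transform(test_docs))
    classes, counts = np.unique(preds, return_counts=True)
    return dict(zip(classes, counts / len(test_docs)))

# When the test set's class distribution shifts away from the training
# set's (prior probability shift), classifier errors are no longer
# balanced across classes, and these ratios become biased estimates.
```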

The main contributions of the dissertation are summarized as follows.

1) We propose a new data augmentation method, called Context-Aware-AUG, to reduce the quantification errors of classification-based quantification learning methods. A classifier that performs fairly across classes gives a better estimate of class ratios. To obtain such a classifier, Context-Aware-AUG generates additional training data for the classes with far fewer instances. The generated text documents remain readable by humans, which is achieved by keeping the order of the words as in the original documents and replacing important words with their most similar words, using knowledge from three external data sources (a toy sketch of this replacement idea appears after this list).

2) We address an under-explored problem in quantification learning by proposing the first end-to-end Deep Quantification Neural Network (DQN) framework. DQN jointly learns effective feature representations and class distribution estimates. We formulate quantification learning as a maximum likelihood problem: DQN can be seen as learning a mapping whose input is a set of instances and whose output is the class ratios (see the sketch after this list). We introduce two strategies for selecting a set of instances (termed a tuplet) for training, to investigate how well the induced quantifier generalizes to test sets with class distributions different from that of the training set. We demonstrate the effectiveness of DQN on text quantification tasks over four public datasets with 2, 4, and 20 classes, and we perform a sensitivity analysis of DQN performance by varying tuplet sizes and training dataset sizes. Compared to classification-based quantification learning, DQN performance is less affected by the size of the training dataset; in other words, DQN is a promising method when the manual labeling budget is limited.

3) We evaluate whether DQN is also effective for binary and multi-class quantification on other types of data, such as political text documents and image data, including nature images, medical images, animal images, and agricultural images. DQN achieves the best results on almost all datasets. Compared to the best existing method in our study, DQN reduces the mean absolute error by 55% on two datasets of political documents and by 38.34% on six image datasets. We also found that transfer learning helps improve the performance of classification-based quantification methods.
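
The word-replacement idea behind Context-Aware-AUG (item 1) can be illustrated with a toy sketch. The word-importance test and the similarity table below are hypothetical stand-ins for the dissertation's actual scoring and its three external knowledge sources:

```python
# Toy sketch of order-preserving word-replacement augmentation:
# keep every word in place, but swap selected words for similar ones.
import random

SIMILAR = {  # hypothetical nearest-neighbor table (stand-in for real sources)
    "terrible": ["awful", "dreadful"],
    "slow": ["sluggish", "laggy"],
    "refund": ["reimbursement"],
}

def augment(doc: str, replace_prob: float = 0.5, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = []
    for word in doc.split():
        candidates = SIMILAR.get(word.lower())
        if candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))  # similar word, same position
        else:
            out.append(word)
    return " ".join(out)

print(augment("the app is terrible and slow and I want a refund"))
```

Because word order is untouched and each substitute is a near neighbor of the original word, the generated document stays human-readable while adding variety to the minority class.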
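The set-to-distribution idea behind DQN (item 2) can likewise be sketched. The architecture, mean pooling, and uniform tuplet sampling below are simplified assumptions for illustration, not the dissertation's exact design:

```python
# Sketch: encode each instance in a sampled "tuplet", pool, and predict
# the tuplet's class ratios, training by maximum likelihood
# (cross-entropy against the tuplet's empirical class distribution).
import torch
import torch.nn as nn

class TupletQuantifier(nn.Module):
    def __init__(self, in_dim: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, tuplet: torch.Tensor) -> torch.Tensor:
        # tuplet: (tuplet_size, in_dim) -> one distribution over classes
        pooled = self.encoder(tuplet).mean(dim=0)  # order-invariant pooling
        return torch.softmax(self.head(pooled), dim=-1)

def train_step(model, optimizer, X, y, n_classes, tuplet_size=32):
    # Sample a tuplet uniformly; its target is the empirical class ratio.
    idx = torch.randint(len(X), (tuplet_size,))
    target = torch.bincount(y[idx], minlength=n_classes).float() / tuplet_size
    pred = model(X[idx])
    loss = -(target * torch.log(pred + 1e-8)).sum()  # negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def mae(true_ratios: torch.Tensor, est_ratios: torch.Tensor) -> float:
    # Evaluation: mean absolute error between true and estimated ratios.
    return (true_ratios - est_ratios).abs().mean().item()
```

Changing how tuplets are sampled, for example uniformly versus with deliberately shifted class ratios, is one way to probe how well the quantifier generalizes to test sets whose class distributions differ from the training set's.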

Committee: Wallapak Tavanapong (major professor), Adisak Sukul, Jin Tian, Johnny Wong, and David Peterson

Join on Zoom: https://iastate.zoom.us/my/tavanapo    Passcode: JOY