PhD Final Oral Exam: Azeez Idris

Sep 9, 2025 - 12:00 PM

Unified Confusion-Derived Learning Framework for Image Classification

The performance of supervised deep learning image classifiers has advanced considerably due to the availability of large-scale labeled datasets and increased computational resources. However, acquiring large, labeled image datasets in specialized domains like medical imaging remains both costly and logistically challenging. This dissertation addresses the fundamental challenge of enhancing model performance under limited labeled data conditions by leveraging confusion across various phases of developing deep learning models - before, during, and after training. Confusion is defined as the condition in which an image of one class is incorrectly predicted as belonging to another class.

The first major contribution of this dissertation is the Synthesized Image Training Technique (SIT2), a novel confusion-based training framework that systematically harnesses inter-class confusion to improve model robustness. Specifically, SIT2 identifies pairs of classes that exhibit high confusion and synthesizes "not-sure" images from these pairs, thereby incorporating confusion directly during the training process. A not-sure image is defined as a synthetically generated image that incorporates features from two distinct classes, created by blending or combining samples from highly confused class pairs in approximately equal proportions. The purpose of generating these images is to embed controlled ambiguity, ensuring that the model does not prematurely converge on a single class assignment. A prediction made with excessive confidence toward one class results in the exclusion of features indicative of alternative classes that may coexist within the image. Consequently, it is crucial that the model be encouraged to attend to and preserve features from all relevant classes to maintain a more comprehensive representation. We develop three new training strategies utilizing these synthesized images: (1) the not-sure training strategy, which pretrains a model using not-sure images together with the original training images; (2) the sure-or-not strategy, which pretrains with synthesized sure or not-sure images; and (3) the multi-label strategy, which pretrains with synthesized images but predicts the original class(es) of the synthesized images. Extensive evaluation on five medical and non-medical datasets demonstrates statistically significant performance gains, with accuracy improvements of up to 7.8% on certain datasets.
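The core synthesis step described above can be sketched as follows. This is a minimal illustration, assuming a mixup-style pixel blend with a mixing ratio near 0.5; the dissertation's exact synthesis procedure (e.g., whether blending is pixel-wise or patch-based) may differ, and both function names are hypothetical.

```python
import numpy as np

def make_not_sure_image(img_a, img_b, rng=None, low=0.45, high=0.55):
    """Blend two images drawn from a highly confused class pair in
    roughly equal proportions, yielding a 'not-sure' training sample.

    `low`/`high` bound the mixing ratio near 0.5 so neither class
    dominates the synthesized image.
    """
    rng = rng or np.random.default_rng()
    lam = rng.uniform(low, high)              # mixing ratio near 0.5
    blended = lam * img_a + (1.0 - lam) * img_b
    return blended.astype(img_a.dtype), lam

def not_sure_label(class_a, class_b, num_classes):
    """Soft target for the multi-label strategy: both source classes
    of the synthesized image remain active in the label vector."""
    y = np.zeros(num_classes, dtype=np.float32)
    y[class_a] = y[class_b] = 0.5
    return y
```

A pretraining loop would interleave such synthesized samples with the original labeled images, as in the not-sure training strategy.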

The second contribution of this dissertation is ActiveConfusion, an efficient cold-start active learning framework that leverages pretext task confusion to identify the most informative samples for labeling before model training. By exploiting confusion patterns derived from self-supervised pretext tasks, ActiveConfusion addresses the cold-start problem in active learning, a scenario in which no labeled data are initially available. Experimental results show that ActiveConfusion matches or surpasses state-of-the-art cold-start methods while reducing pretext task training time by up to 13×. Across five public datasets covering both medical and non-medical domains, the method achieves accuracy improvements of up to 11.8% on balanced data and up to 11.4% on imbalanced medical data.
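The selection idea can be sketched generically. The snippet below assumes an entropy-based measure of confusion over pretext-task predictions (e.g., softmax outputs of a rotation-prediction model); ActiveConfusion's actual selection criterion is defined in the dissertation and may differ, and the function name is hypothetical.

```python
import numpy as np

def select_for_labeling(pretext_probs: np.ndarray, budget: int) -> np.ndarray:
    """Rank unlabeled samples by how confused a self-supervised pretext
    model is about them, and return indices of the `budget` most
    confused samples as candidates for labeling.

    `pretext_probs` has shape (num_samples, num_pretext_classes).
    Prediction entropy serves here as a generic confusion score:
    uniform predictions (maximal confusion) score highest.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -(pretext_probs * np.log(pretext_probs + eps)).sum(axis=1)
    return np.argsort(-entropy)[:budget]
```

Because the pretext model never needs labels, this style of selection can run before any supervised training, which is what makes it suitable for the cold-start setting.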

The third contribution addresses the problem of limited access to specialized Large Multimodal Models (LMMs), conceptualized here as an accessibility gap. This gap is defined as the lack of access to specialized models arising from factors such as high computational costs, restrictive licensing policies, or the limited availability of domain-specific resources. To mitigate this challenge, this work proposes BIRD - Binary Inference & Resolution for Decisions, a framework for reducing the complexity of tasks assigned to general-purpose LMMs after model training. Specifically, BIRD reformulates multi-class classification problems into a series of one-versus-rest binary subproblems, one per class, so that the model only needs to distinguish between two outcomes at a time. This decomposition is expected to reduce confusion during initial predictions, since the model is not required to simultaneously discriminate among many competing classes.
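The decomposition can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: `binary_query` is a hypothetical callable standing in for a yes/no prompt to a general-purpose LMM, and the final max-score resolution step is an assumed placeholder for BIRD's actual resolution rules.

```python
from typing import Callable, Dict, List

def bird_style_classify(image: object, classes: List[str],
                        binary_query: Callable[[object, str], float]) -> str:
    """One-versus-rest decomposition in the spirit of BIRD.

    `binary_query(image, cls)` asks the model a single binary question
    ("Does this image belong to class <cls>?") and returns its
    confidence in "yes". Each subproblem thus involves only two
    outcomes: the class versus everything else.
    """
    scores: Dict[str, float] = {}
    for cls in classes:
        scores[cls] = binary_query(image, cls)
    # Resolution step (assumed): pick the most strongly affirmed class.
    return max(scores, key=scores.get)
```

An n-class problem therefore becomes n independent binary queries, each of which is simpler for a general-purpose model than discriminating among all n classes at once.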

Overall, this dissertation advances data-efficient deep learning by introducing novel strategies that improve model performance under limited labeled data conditions at multiple stages of deep learning model development. The proposed contributions (SIT2, ActiveConfusion, and BIRD) demonstrate effectiveness across diverse domains, including medical imaging, general computer vision, and multimodal learning. More broadly, the confusion-based paradigm introduced in this work establishes a new perspective for designing methods that reduce errors, enhance generalization, and expand the accessibility of deep learning in resource-constrained settings at various phases of model development.

Committee: Ying Cai (major professor), Wallapak Tavanapong, Soumik Sarkar, Wensheng Zhang, and Sigurdur Olafsson