Ph.D. Final Oral Exam: Giang Nguyen
Speaker: Giang Nguyen
Optimizing and Reasoning About Fairness in Machine Learning Pipeline
Machine learning (ML) is increasingly used in critical decision-making software, but incidents have raised questions about the fairness of ML predictions. Addressing this issue requires new tools and methods to mitigate bias and test fairness in ML-based software. Existing methods, however, lack generalizability, performing well only in specific scenarios, and they fall short of identifying and explaining the underlying causes of bias. This dissertation introduces novel techniques for optimizing and reasoning about fairness within machine learning pipelines, addressing the limitations of existing bias mitigation and testing approaches.

First, we introduce Fair-AutoML, a novel technique that uses AutoML to fix fairness bugs in machine learning models. Unlike existing bias mitigation techniques, Fair-AutoML performs an efficient, fairness-aware Bayesian search to repair unfair models, making it effective across a wide range of datasets, models, and fairness metrics. We evaluated the approach on four fairness problems and 16 ML models, showing significant improvement over baselines and existing bias mitigation techniques: Fair-AutoML repaired 60 out of 64 buggy cases, compared to 44 out of 64 by existing methods.

Second, we propose a preventive measure called a "Fairness Contract," which incorporates design-by-contract (DbC) principles for algorithmic fairness into the ML pipeline. Fairness Contract detects fairness violations as soon as they occur during the execution of an ML program, and its modularity allows us to pinpoint the exact location of fairness violations in ML software. In this work, we designed 24 contracts targeting fairness bugs across the ML pipeline and used them to evaluate Fairness Contract on four fairness tasks, 45 buggy code samples, and 24 correct ones. Fairness Contract localized fairness bugs at runtime, which existing techniques cannot do, and outperformed them by identifying 40 out of 45 buggy cases, compared to 35 by existing methods.

Third, we propose a novel approach that leverages concentration inequalities to detect and explain fairness bugs. The method computes feature fairness scores to quantify the impact of preprocessing steps and uses concentration inequalities to detect fairness bugs in the ML pipeline. We also developed a programming abstraction to enforce modular fairness during pipeline development. Evaluated on four datasets, two fairness metrics, and 13 ML algorithms, this approach detected 42 out of 45 bugs, outperforming existing methods, which detected 35 out of 45.

Lastly, this dissertation presents an in-depth analysis and assessment of the proposed techniques, highlighting their effectiveness, efficiency, and limitations, and offers directions for future research in the evolving field of Software Engineering for trustworthy AI.
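
As a rough illustration of the kind of fairness-aware Bayesian search described for Fair-AutoML, the sketch below uses Optuna's default TPE-based optimizer to tune a classifier under an objective that trades accuracy against a statistical parity gap. The search space, penalty weight, and metric are illustrative assumptions, not Fair-AutoML's actual design or backend.

```python
# Hypothetical sketch of fairness-aware Bayesian hyperparameter search,
# written with Optuna rather than the dissertation's actual AutoML backend.
import numpy as np
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def parity_gap(y_pred, protected):
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    return abs(y_pred[protected == 1].mean() - y_pred[protected == 0].mean())

def make_objective(X, y, protected):
    X_tr, X_te, y_tr, y_te, p_tr, p_te = train_test_split(
        X, y, protected, test_size=0.3, random_state=0)

    def objective(trial):
        # Search space is an illustrative assumption.
        model = RandomForestClassifier(
            n_estimators=trial.suggest_int("n_estimators", 50, 300),
            max_depth=trial.suggest_int("max_depth", 2, 16),
            min_samples_leaf=trial.suggest_int("min_samples_leaf", 1, 20),
            random_state=0)
        model.fit(X_tr, y_tr)
        y_pred = model.predict(X_te)
        accuracy = (np.asarray(y_pred) == np.asarray(y_te)).mean()
        # Penalize unfair configurations so the sequential model-based search
        # is steered toward models that are both accurate and fair.
        return accuracy - 2.0 * parity_gap(y_pred, np.asarray(p_te))

    return objective

# Usage (X, y, protected are the user's arrays):
#   study = optuna.create_study(direction="maximize")
#   study.optimize(make_objective(X, y, protected), n_trials=50)
```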
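
To illustrate the design-by-contract idea behind Fairness Contract, here is a minimal, hypothetical sketch of a runtime fairness postcondition attached to a pipeline stage. The decorator name, the statistical parity metric, and the 0.1 threshold are assumptions for illustration, not the dissertation's actual 24 contracts.

```python
# Hypothetical DbC-style fairness postcondition: the check runs immediately
# after a pipeline stage executes and fails fast on a violation.
import functools
import numpy as np
from sklearn.linear_model import LogisticRegression

def statistical_parity_difference(y_pred, protected):
    """Gap in positive-prediction rates between protected groups coded 0/1."""
    y_pred, protected = np.asarray(y_pred), np.asarray(protected)
    return abs(y_pred[protected == 1].mean() - y_pred[protected == 0].mean())

def fairness_postcondition(max_spd=0.1):
    """Contract: the stage's predictions must satisfy the parity bound."""
    def decorator(stage):
        @functools.wraps(stage)
        def wrapper(X, y, protected, *args, **kwargs):
            y_pred = stage(X, y, protected, *args, **kwargs)
            spd = statistical_parity_difference(y_pred, protected)
            if spd > max_spd:
                # Violation is reported at the exact stage where it occurred.
                raise AssertionError(
                    f"Fairness contract violated in '{stage.__name__}': "
                    f"statistical parity difference {spd:.3f} > {max_spd}")
            return y_pred
        return wrapper
    return decorator

@fairness_postcondition(max_spd=0.1)
def train_and_predict(X, y, protected):
    # Placeholder stage: a real pipeline would fit its own model here.
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return model.predict(X)
```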
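
As a sketch of how a concentration inequality can flag a preprocessing step that shifts a feature disproportionately across protected groups, the code below compares the change in a group-gap "feature fairness score" against a Hoeffding-style noise radius. The score definition, the [0, 1] scaling assumption, and the bound are illustrative choices rather than the dissertation's exact formulation.

```python
# Hypothetical concentration-inequality check for a preprocessing step.
import numpy as np

def hoeffding_radius(n, delta):
    """Deviation radius for the mean of n samples bounded in [0, 1],
    holding with probability at least 1 - delta (Hoeffding's inequality)."""
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

def feature_fairness_score(feature, protected):
    """Absolute gap between group means of a [0, 1]-scaled feature."""
    f, g = np.asarray(feature, dtype=float), np.asarray(protected)
    return abs(f[g == 1].mean() - f[g == 0].mean())

def flags_fairness_bug(before, after, protected, delta=0.05):
    """Flag a preprocessing step whose change in the group gap exceeds
    what sampling noise alone could explain under the Hoeffding bound."""
    g = np.asarray(protected)
    n0, n1 = (g == 0).sum(), (g == 1).sum()
    noise = hoeffding_radius(n0, delta) + hoeffding_radius(n1, delta)
    drift = abs(feature_fairness_score(after, protected)
                - feature_fairness_score(before, protected))
    return drift > noise
```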