Ph.D. Final Oral Exam: Shibbir Ahmed

Shibbir Ahmed
Thursday, June 20, 2024 - 10:00am

Design by Contract for Deep Learning APIs and Models

Deep Learning (DL) is increasingly used in critical software, but these systems are often black boxes. When a DL model predicts an output for a given input, it is unclear whether that output can be trusted, especially if the input comes from an unknown region that the model may not have been trained on. Trained DL models therefore make it difficult to ensure trustworthy predictions during deployment. Furthermore, DL systems are prone to bugs, many of which can be prevented by specifying contracts. Because of the complexity of DL model architectures and their expected outcomes, existing methods for specifying traditional software are insufficient for specifying DL software. This dissertation presents three techniques for ensuring the reliability of DL models' output.
The first approach, DeepInfer, introduces data preconditions. Data preconditions are derived from a trained DL model's weights and biases, and they determine the trustworthiness of the model's predictions during deployment by checking whether an unseen input violates them. We extensively evaluated DeepInfer on 29 real-world DNN models using four different datasets, demonstrating its utility, effectiveness, and performance improvements; it significantly outperforms SelfChecker, the state of the art in this area.
Second, we propose a feature debugging technique that determines feature importance by inferring the trained model's assumptions about the input features. This technique enhances the explainability of the DL model without retraining or modifying it, allowing developers to debug models by removing less important features. In an evaluation using real-world data from prior work, the technique successfully identifies important features in both classification and regression models, achieving high recall and running significantly faster than comparable state-of-the-art techniques.
Third, to ensure the reliability of DL software, we have introduced a contract layer into the deep learning library that intercepts API calls and checks contracts on them. This makes it possible to detect and fix bugs in deep learning programs that cause unreliable output and are not detected by DL libraries. Our approach efficiently identified 259 performance bugs out of 272 real-world buggy programs, with less overhead than existing tools, and user surveys demonstrated its effectiveness for debugging DL applications. Lastly, this dissertation provides a comprehensive investigation and evaluation of the proposed techniques, emphasizing their effectiveness, efficiency, and limitations, and suggests future directions in the field of SE for trustworthy AI.
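
As a rough illustration of the contract-layer idea (intercepting an API call and checking a contract before the call proceeds), the following is a minimal Python sketch. It is not the dissertation's implementation. The names contract, ContractViolation, and compile_model are hypothetical, compile_model is only a stand-in for a library API, and the rule it checks is an illustrative assumption.

```python
import functools


class ContractViolation(Exception):
    """Raised when a precondition on an intercepted call does not hold."""


def contract(precondition, message):
    """Wrap a function so that precondition(*args, **kwargs) is checked first."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Intercept the call: check the contract before running the API.
            if not precondition(*args, **kwargs):
                raise ContractViolation(f"{func.__name__}: {message}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical stand-in for a DL library API; not a real Keras or PyTorch call.
# The rule below is an illustrative assumption, not a contract from the dissertation.
@contract(
    precondition=lambda last_activation, loss: not (
        loss == "categorical_crossentropy" and last_activation != "softmax"
    ),
    message="categorical_crossentropy is expected to follow a softmax output layer",
)
def compile_model(last_activation, loss):
    # Return the configuration unchanged when the contract holds.
    return {"last_activation": last_activation, "loss": loss}
```

In this sketch, a call that satisfies the precondition runs normally, while a violating call raises ContractViolation with a message naming the intercepted function, so the problem is reported at the API boundary rather than surfacing later as unreliable output.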

Committee: Hridesh Rajan (major professor), Wallapak Tavanapong, Pavan Aduri, Hongyang Gao, and Qi Li

Join on Zoom: