MS Final Oral Exam: Adele Haghighat Hoseini

Apr 17, 2025 - 4:00 PM

Sub-Mitochondrial Protein Localization Prediction Using Protein Language Models

Accurately determining the sub-cellular localization of proteins is fundamental to understanding their biological functions and plays a vital role in drug discovery, systems biology, and proteomics research. Experimental identification of protein localization, while reliable, is often costly, labor intensive, and impractical at large scale. As a result, computational approaches have been developed to automate this task. However, many existing methods still face challenges in achieving high accuracy, particularly in predicting localization within sub-cellular compartments such as those in mitochondria. 

In this research, a computational framework is proposed for predicting sub-mitochondrial protein localization by integrating sequence-based embeddings with structural information. Protein sequences are first encoded using advanced protein language models (PLMs)—including ESM2, Ankh, and SeqVec—to generate informative, high-dimensional representations. In parallel, structural data are obtained from AlphaFold2 to incorporate spatial features of each protein. 

Based on this information, each protein is modeled as a graph, where nodes represent amino acids enriched with PLM-derived features, and edges are defined by Cα–Cα distances of less than 20 Å. The resulting graph representations are used to train both classical machine learning models (such as Logistic Regression, Random Forest, SVM, Gradient Boosting, and XGBoost) and deep learning architectures (including Feedforward Neural Networks, Convolutional Neural Networks, and Bidirectional LSTMs). Experimental results demonstrate that the proposed approach achieves high predictive performance across multiple evaluation metrics, including accuracy, F1-score, and Matthews correlation coefficient (MCC), highlighting the effectiveness of combining language model-derived embeddings with structural information for sub-mitochondrial protein localization.
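
The edge rule described above (two residues are connected when their Cα atoms are less than 20 Å apart) can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `contact_edges`, the plain list-of-coordinates input, and the toy data are all assumptions made for illustration. In the described framework the coordinates would come from AlphaFold2 structures and each node would additionally carry a PLM-derived feature vector, which is omitted here.

```python
from itertools import combinations
from math import dist


def contact_edges(ca_coords, threshold=20.0):
    """Return undirected edges (i, j), i < j, between residues whose
    C-alpha atoms are closer than `threshold` angstroms.

    `ca_coords` is a sequence of (x, y, z) positions, one per residue.
    Hypothetical helper for illustration only.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(ca_coords), 2)
        if dist(a, b) < threshold  # strict "<", matching "less than 20 Å"
    ]


# Toy input (made up): four residues on a line, 10 angstroms apart.
toy_coords = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (30, 0, 0)]
edges = contact_edges(toy_coords)
print(edges)
```

With this toy input only neighboring residues are linked, because residues two positions apart sit at exactly 20 Å and the cutoff is strict. For real proteins with hundreds of residues, a vectorized or spatial-index approach would likely be preferable to this quadratic loop.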

Committee: Dr. Xiaoqiu Huang (major professor), Dr. Hongyang Gao, and Dr. Wei Le