M.S. Final Oral Exam: Kumara Sri Harsha Vajjhala

Apr 11, 2022 - 2:00 PM

Speaker: Kumara Sri Harsha Vajjhala

Enhancing MetaOmGraph - The Open-Source workbench to Interactively Explore Omics Data, and Using Machine Learning Techniques to Study Novel Genes in Human Breast Cancer

Exploratory analysis is an important part of scientific study, used for identifying interesting patterns, visualizing data, detecting anomalies, and testing hypotheses. MetaOmGraph is an open-source software package that supports interactive exploratory analysis of high-dimensional datasets, with a focus on omics data. In the first part of this thesis work, I upgrade MetaOmGraph to a new version that handles large datasets efficiently, builds reproducibility of experiments into the software, and has an enhanced user interface. In the new version, MetaOmGraph 1.9, I introduced an efficient JSON-based project structure that is loaded into memory with a stream-based parser, which substantially improved project load time and performance on larger projects. I also built a logging and playback framework into the software and applied it to all the major statistical analyses and visualizations. This feature makes experiments reproducible and results easy to share across platforms. In addition, I developed a window management system based on a taskbar, redesigned the metadata upload component, and improved other user interface components.
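The logging and playback idea described above can be illustrated with a minimal Python sketch. It is not MetaOmGraph's implementation: the action names (`scale`, `mean`), the `ActionLog` class, and the JSON record layout are all hypothetical and chosen only to show how recording each analysis step with its parameters allows the same steps to be replayed later.

```python
import json

# Hypothetical registry of replayable actions (illustrative names only).
ACTIONS = {}


def action(name):
    """Register a function under a name so it can be looked up during playback."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


@action("scale")
def scale(values, factor):
    return [v * factor for v in values]


@action("mean")
def mean(values):
    return sum(values) / len(values)


class ActionLog:
    """Runs registered actions and records each call as one JSON line."""

    def __init__(self):
        self.lines = []

    def run(self, name, **params):
        result = ACTIONS[name](**params)
        record = {"action": name, "params": params}
        self.lines.append(json.dumps(record, sort_keys=True))
        return result


def replay(lines):
    """Re-run every logged action with its recorded parameters."""
    results = []
    for line in lines:
        entry = json.loads(line)
        results.append(ACTIONS[entry["action"]](**entry["params"]))
    return results


log = ActionLog()
first = [
    log.run("scale", values=[1.0, 2.0, 3.0], factor=2.0),
    log.run("mean", values=[2.0, 4.0, 6.0]),
]
replayed = replay(log.lines)
```

Because each record is plain JSON text, the log could be saved to a file and replayed on another machine, which is the property that makes sharing results across platforms straightforward.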

In the second part of my thesis, I address the research question of classifying breast tissue samples in a human cancer RNA-seq dataset as tumor or non-tumor and identifying the biologically important annotated and novel genes. I use a random forest machine learning model to classify the tissues and compare the feature importances obtained from the model with the up-regulated and down-regulated genes obtained through differential expression analysis. My results suggest a batch effect due to technical variation in data generation, which leads to an almost perfect classification. I also identify genes that may be biologically interesting because their differential expression is strong, exceeding the confounding effect, and their feature importances are high.
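The final selection step described above, keeping genes whose differential expression exceeds the confounding effect and whose feature importance is high, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `select_candidates`, the gene labels, every numeric value, and the use of a single log2 fold-change threshold to represent the confounding effect are hypothetical and do not come from the thesis.

```python
def select_candidates(log2_fc, importance, confound_log2_fc, top_n):
    """Return genes that are among the top_n by feature importance and whose
    absolute log2 fold change exceeds the assumed confounding magnitude."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    top = set(ranked[:top_n])
    return sorted(
        gene for gene in log2_fc
        if gene in top and abs(log2_fc[gene]) > confound_log2_fc
    )


# Made-up placeholder values, for illustration only.
log2_fc = {"gene_A": 3.2, "gene_B": -2.8, "gene_C": 0.4, "gene_D": 2.9}
importance = {"gene_A": 0.30, "gene_B": 0.25, "gene_C": 0.35, "gene_D": 0.10}

candidates = select_candidates(log2_fc, importance, confound_log2_fc=1.5, top_n=3)
print(candidates)  # ['gene_A', 'gene_B']
```

In this toy input, gene_C has high importance but weak differential expression, and gene_D has strong differential expression but low importance, so neither passes both criteria.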

Committee: Oliver Eulenstein (co-major professor), Eve Wurtele (co-major professor), and Pavan Aduri

Join on WebEx: https://iastate.webex.com/meet/harshavk