Iowa State University

College of Liberal Arts and Sciences

Department of Computer Science

Ph.D. Preliminary Exam - Ankit Agrawal


Date: 14 Aug, 2008
Time: 11:00 AM
Location: 223 Atanasoff Hall
Topic: Sequence-Specific Sequence Comparison Using Pairwise Statistical Significance
Major Professor(s): Xiaoqiu Huang


Abstract:

Sequence comparison is one of the most fundamental computational problems in bioinformatics, and many approaches to it have been and are still being developed. In particular, pairwise sequence alignment forms the crux of both DNA and protein sequence comparison techniques, which in turn form the basis of many other applications in bioinformatics. Pairwise sequence alignment methods align two sequences using a substitution matrix (such as BLOSUM62) that assigns a score to each pair of aligned residues, and report an alignment score for the given sequence pair. Biologists routinely use such pairwise alignment programs to identify similar or, more specifically, related sequences (those having a common ancestor). It is widely accepted that the relatedness of two sequences is better judged by the statistical significance of the alignment score than by the alignment score alone. This research investigates the problem of accurately estimating the statistical significance of a pairwise alignment for the purpose of identifying related sequences, by making the sequence comparison process more sequence-specific. Currently, most popular alignment programs report the statistical significance of an alignment in the context of a database search, which depends on the size and composition of the database. Instead of using such database statistical significance, this work proposes to explore the use of pairwise statistical significance, which is specific to the pair of sequences being aligned and to the alignment parameters. Preliminary results using pairwise statistical significance indicate good potential of the proposed sequence-specific approach for homology detection (identifying related sequences). Thus, this research proposes to make sequence comparison progressively more specific to the sequence pair being aligned, using sequence-specific strategies for both alignment and statistical significance estimation.
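
To make the idea of a significance estimate that depends only on the sequence pair more concrete, the following is a minimal illustrative sketch and not the method used in this research. It assumes a simple match/mismatch scoring scheme with a linear gap penalty in place of a substitution matrix such as BLOSUM62, and it estimates significance empirically by shuffling one sequence, which is only one possible way to obtain a pair-specific null distribution. The function names, parameter values, and example sequences are hypothetical.

```python
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local (Smith-Waterman) alignment score of a and b.

    Assumption: simple match/mismatch scores and a linear gap penalty
    stand in for a real substitution matrix and affine gap costs.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_p_value(a, b, shuffles=200, seed=0):
    """Empirical p-value of the score of (a, b) against shuffles of b.

    The null distribution is built from this pair alone (no database),
    so the estimate depends only on the two sequences and the scoring
    parameters. Shuffling preserves the residue composition of b.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    residues = list(b)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(residues)
        if local_alignment_score(a, "".join(residues)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (shuffles + 1)


if __name__ == "__main__":
    score, p = pairwise_p_value("HEAGAWGHEE", "PAWHEAE")
    print(f"score={score}, empirical p-value={p:.3f}")
```

With this kind of estimate, a smaller p-value suggests that the observed score is less likely to arise by chance for that particular pair. The resolution is limited by the number of shuffles, so very small p-values would require many more shuffles or a fitted score distribution.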