AToL: Collaborative Research: A Phylogenomic Toolbox for Assembling the Tree of Life Molecular Sequence Databases



Iowa State University has been awarded a grant to develop new methods and software tools to help construct the genealogical "tree of life" of all biological species. The focus of these efforts will be on extracting information from the vast molecular sequence databases to build collections of smaller trees that can be assembled into larger, more comprehensive pictures of the tree of life. The scale of the data input is large: GenBank, for example, contains tens of millions of sequences sampled from over 100,000 species. Whereas extensive research has focused on the problem of building a tree from a single data set, relatively little is known about extracting these data sets en masse from sequence databases and then assembling a synthesis. The proposal is to study a set of novel computational problems that are as challenging as the basic tree-building problem itself. These occur in three broad areas: (1) assessment of the potential information in sequence databases of various kinds; (2) optimal extraction of data from databases to bring the best information to bear on individual tree reconstruction; and (3) integration of these smaller trees into "supertrees" (larger trees assembled from smaller ones that share species), especially by identifying target sets of new sequences needed to construct optimal supertrees. Theoretical results will be evaluated by analysis of three diverse databases that pose a range of computational challenges (a subset of GenBank, SWISS-PROT, and the TIGR EGO database). This work will characterize the phylogenetic information content of these sequence sets, identify maximal sets of combinable sequence information, construct nonredundant partitions of the database to permit estimation of collections of trees, and assemble supertrees from these collections.
The interdisciplinary team includes phylogenetic biologists and computer scientists with experience in phylogenetic theory, data analysis, and algorithm development and implementation. The project is also collaborating with three existing Tree-of-Life projects (each aimed at reconstructing particular portions of the tree) to provide tests of the sequence targeting algorithms.

2003-11-01 to 2009-09-30
Award Amount: 
Award Number: