Iowa State computer scientists to construct genealogical time line for all biological species.

November 3, 2003

One of the most profound ideas to emerge in modern science is Charles Darwin's concept that all of life, from the smallest microorganism to the largest vertebrate, is connected through genetic relatedness in a vast genealogy, the "Tree of Life." 

Yet many "branches" of this tree are still poorly known. Up to now, only small parts have been analyzed, typically by single investigators or small teams focusing on phylogenetic groups of modest size. 

David Fernandez-Baca, professor of computer science, and Oliver Eulenstein, assistant professor of computer science, are part of an ambitious effort to construct a much larger phylogenetic framework for all 1.7 million species. 

They have been awarded a five-year, $975,000 National Science Foundation (NSF) grant as part of the NSF's "Tree of Life" collaborative research effort to develop novel methods and software tools that will help construct this framework. 

"The focus of these efforts will be to extract information from the vast molecular sequence databases to build collections of smaller trees that can be reliably assembled into a larger, more comprehensive picture of the tree of life," Eulenstein said. 

The project is ambitious because of the sheer volume of information currently available. For instance, GenBank contains tens of millions of sequences sampled from over 100,000 species. While extensive research has focused on the problem of building a tree from a single data set, relatively little is known about extracting these data sets en masse from sequence databases and then assembling a synthesis. 

"This project is an outgrowth of previous research we've done together," Fernandez-Baca said. "We're trying to identify data to use to build a reliable evolutionary tree. 

"Some of this information already exists on-line," he continued. "There have been partial efforts for plants and various other subsets, but assembling it into one large 'tree of life' in some sort of meaningful way is the ultimate goal." 

Eulenstein and Fernandez-Baca plan to study a set of novel computational problems that include the assessment of the potential information in different sequence databases, optimal extraction of data from databases to accurately construct small trees, and the building of "super trees" that display several small trees at once. 

The pair has assembled an interdisciplinary team that includes evolutionary biologists at the University of California-Davis and the University of Pennsylvania. While Iowa State's portion of the NSF grant is almost $1 million, the total NSF funding for the three institutions exceeds $2.4 million. 

Collaborations have also been established with three other existing NSF Tree of Life projects.