A phylogenetic tree represents the genealogy of a set of species. One of the grand challenges in computational biology is to build the Tree of Life, the phylogenetic tree describing the evolutionary history of all extant species. A major obstacle to assembling this or indeed any large tree reliably is the sparseness of the coverage provided by the available genomic data. That is, the same genetic information may not be at our disposal for all species. For example, a gene may not have been sequenced for a species of interest or the gene may simply be absent from that species. One approach to address this problem is to find a collection of overlapping subsets of the species, each of which has the property that similar genetic data is available for every species in the subset. A phylogenetic tree is built for each subset and then the trees are combined into a single supertree for the entire set of species. Building supertrees is nontrivial, as it involves reconciling conflicts among the various input trees with regard to the placement of certain species.
The proposed research will address the fundamental algorithmic problems that arise in reconciling multiple conflicting phylogenetic trees into a single supertree that represents them all. The reconciliation problem will be treated as one of finding a supertree whose total dissimilarity with the input trees is minimum. The research will study the mathematical and computational properties of tree reconciliation under a variety of measures. Some of these will be based purely on tree structure; for example, on what groupings a tree implies on its species set. The PIs will also study what are sometimes referred to as gene tree parsimony problems. Here, the dissimilarity measures attempt to identify the possible occurrence of complex evolutionary events such as gene duplication and subsequent loss, and horizontal gene transfer.
The research aims to advance the theory of tree reconciliation and to use the results as the foundation for novel algorithmic methods. The work will build on significant recent theoretical developments in the field. Techniques will include graph theory, fast local search, axiomatic characterizations of tree reconciliation methods, impossibility proofs, compact integer linear programming formulations, and fixed parameter tractability. Specific questions to be addressed include developing methods to handle non-binary trees, extending distance measures to account for missing data, understanding which properties can or cannot be satisfied by a particular method or by any method, and resolving a number of open problems in gene tree parsimony.
The research will be informed by the needs of the evolutionary biology community, with which both PIs have an extensive and longstanding collaboration. The results will be widely disseminated within that community. It is anticipated that several of the algorithmic techniques developed in this project will be of direct utility in phylogenetic tree databases.