Ph.D. Final Oral Exam: Lei Liu

Lei Liu
Thursday, November 11, 2021 - 3:00pm
Event Type: 

Supertree Construction with Display Graphs and Dynamic Graph Connectivity

Supertree construction problems synthesize source phylogenetic trees with overlapping taxon sets into one single supertree that contains all information from the source trees. Naturally, there exist conflicts or data shortages among the source trees. These obstacles make it difficult for a method to construct a high quality supertree efficiently. Two supertree construction problems are studied first, namely compatibility testing problem and agreement testing problem. The compatibility testing algorithm takes as an input a profile P = {T1 , T2 , . . . , Tk }, and it targets at finding a tree T which takes the union of the taxon sets of the source trees, and each source tree Ti can be obtained from restriction of T to taxon set of Ti through proper edge contractions. We conducted an empirical experimental studies with the state-of-the-art compatibility testing algorithm and demonstrated that the algorithm was quite efficient with the support of auxiliary display graphs and dynamic graph connectivity data structure. The agreement testing algorithm adds more requirements to compatibility testing problem. Given a profile P = {T1 , T2 , . . . , Tk }, the agreement testing problem asks whether an agreement tree T exists, such that the taxon sets of the agreement tree T is the union of taxon sets of the source trees such that the restriction of T to the taxon set of Ti is isomorphic to Ti . We propose the first agreement testing algorithm that solved the agreement testing problem of profiles P containing internal labels. Our algorithm runs in O(nk( i∈[k] di + log2 (nk))) time, where n is the total number of distinct taxa in P, k is the number of trees in P, and di is the maximum number of children of a node in Ti . In practice, profiles that are compatible or agree are rare, and neither compatibility testing algorithms nor agreement testing algorithms can return supertrees. Based on our previous studies, we propose a new divide-and-conquer algorithm that can build up supertrees in O(kn3 log2 (nk)) time, where n is the total number of taxa and k is the total number of source trees. In order to improve the quality of the constructed supertrees, we also proposed several heuristics to achieve the goals. Our empirical experimental studies show that our method is competitive with other methods on a wide range of datasets, and indeed outperforms other
methods on most of the real biological datasets we tested.

Committee: David Fernandez-Baca (major professor), Oliver Eulenstein, Xiaoqiu Huang, Pavan Aduri and Ryan Martin

Join on Zoom:

meeting ID: 977 1138 7982 and password: 911016