Integer Linear Programming Formulation for the Unified Duplication-loss-coalescence Model
The classical Duplication-Loss-Coalescence parsimony model (DLC-model) is a powerful tool when studying the complex evolutionary scenarios of simultaneous duplication-loss and deep coalescence events in evolutionary histories of gene families. However, inferring such scenarios is an intrinsically difficult problem and, therefore, prohibitive for larger gene families typically occurring in practice. The parsimony duplication-loss-coalescence model (DLCParsimony model) appeared as an effective way to study commonly appearing gene families, whose evolutionary histories had duplication-loss events appearing alongside the incomplete lineage sorting events. This model allows practitioners to accurately infer ortholog/paralog relationships among extant genes and to study the relationship between the gene family evolution (gene tree) and the host species evolution (species tree). Unfortunately, the existing enumeration algorithm, DLCPar, which finds the optimum reconciliation between a gene tree and a species tree under the DLCParsimony model, is very limited in practice due to its advanced complexity.
To overcome this stringent limitation, we first describe a non-trivial and flexible Integer Linear Programming (ILP) formulation for inferring DLC evolutionary scenarios. A comparative scalability study demonstrates that our ILP formulation commonly outperforms the standard enumeration algorithm DLCPar for inferring scenarios under the DLC-model.
To make the DLC-model more practical, we then introduce two sensibly constrained versions of the model and describe two respectively modified versions of our ILP formulation reflecting these constraints. Using a simulation study, we showcase that our constrained ILP formulation computes evolutionary scenarios that are substantially larger than the scenarios computable under our original ILP formulation and DLCPar. Further, scenarios computed under our constrained DLC-model are overall remarkably accurate when compared to corresponding scenarios under the original DLC-model.
Committee: Oliver Eulenstein (major professor), Qi Li, Lizhi Wang
Join on Zoom: Meeting ID 932 0462 0151