Site-based quartet-based estimation of species trees
Presenter
November 19, 2024
Abstract
We present a new approach for estimating species trees directly from site patterns. Like similar methods in the literature, our method is based on a statistically consistent score defined for each quartet, which is then summed over all quartets of the tree. Unlike those methods, however, our new score is designed so that its sum over all quartets can be computed without evaluating the score for each quartet individually. This property enables the method, called CASTER, to employ dynamic programming algorithms similar to those of ASTRAL to infer the species tree in polynomial time. Moreover, the score is defined per site, with the final score being the sum across the genome. This per-site definition enables the method to reveal interesting patterns of evolution across the genome, including the level of incomplete lineage sorting (ILS). Simulations show the method to be accurate and much more scalable than alternatives. Applications to real data show its power in finding unexpected patterns. This talk focuses on the theory of the method: how the scores are defined, what is known in theory about their consistency, and what is yet to be understood. We hope the open problems will spur further research on this approach.