Monday, February 20, 2017

Producing admixture graphs


I have written before about admixture graphs, which are phylogenetic networks that represent reticulations due to introgression.
To date, these graphs have not really been incorporated into the mainstream network literature. Part of the problem has been the rather disparate nature of the admixture literature itself. A paper has recently appeared as a preprint in Bioinformatics that provides a brief introduction to this situation:
  • Kalle Leppälä, Svend Vendelbo Nielsen, Thomas Mailund (2017) admixturegraph: an R package for admixture graph manipulation and fitting. Bioinformatics

There are currently several quite different programs for producing admixture graphs:
  • qpgraph (Castelo and Roverato 2006)
  • TreeMix (Pickrell and Pritchard 2012)
  • AdmixTools (Patterson et al. 2012)
  • MixMapper (Lipson et al. 2013)
  • admixturegraph (see above)
These programs summarize the genetic data in different ways based on genetic drift (e.g. the covariance matrix versus so-called f statistics), and construct the graphs in different ways (e.g. sequential heuristic building versus a user-specified graph). There are also different ways to evaluate the graphs (e.g. fitting the graph parameters by likelihood) and to compare them (e.g. the bootstrap, the jackknife, and MCMC).
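To make the f statistics a little more concrete, here is a minimal Python sketch of the usual definitions of f2, f3 and f4 as averages, over loci, of products of allele-frequency differences. This is not how any of the programs listed above implements them: they work from sampled genotypes and apply bias corrections that are ignored here. The function names and the toy frequencies are invented purely for illustration.

```python
def f2(a, b):
    """f2(A, B): mean squared allele-frequency difference between A and B."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def f3(c, a, b):
    """f3(C; A, B): mean of (c - a) * (c - b) over loci."""
    return sum((z - x) * (z - y) for z, x, y in zip(c, a, b)) / len(c)


def f4(a, b, c, d):
    """f4(A, B; C, D): mean of (a - b) * (c - d) over loci."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


# Hypothetical allele frequencies at four loci in four populations.
pop_a = [0.1, 0.5, 0.9, 0.3]
pop_b = [0.2, 0.4, 0.8, 0.3]
pop_c = [0.7, 0.1, 0.2, 0.6]
pop_d = [0.6, 0.2, 0.2, 0.7]

print("f2(A, B)       =", f2(pop_a, pop_b))
print("f3(C; A, B)    =", f3(pop_c, pop_a, pop_b))
print("f4(A, B; C, D) =", f4(pop_a, pop_b, pop_c, pop_d))
```

With these plug-in definitions, f2 and f3 are special cases of f4: f2(A, B) = f4(A, B; A, B) and f3(C; A, B) = f4(C, A; C, B).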

This diversity is not ideal. Another problem is that the graphs are often constructed by hand, and some programs require such a graph as input. However, the biggest limitation is that there are currently no algorithms for inferring the optimal graph topology. This is, of course, the basic problem that needs to be solved for all network construction. To quote the authors with regard to their own R package:
"The set of all possible graphs, even when limited to one or two admixture events, grows super-exponentially in the number of leaves, and it is generally not computationally feasible to explore this set exhaustively. Still, we give graph libraries for searching through all possible topologies with not too many leaves and admixture events.
"For larger graphs we provide functions for exploring all possible graphs that can be reached from a given graph by adding one extra admixture event or by adding one additional leaf. However, the best fitting admixture graphs are not necessarily extensions of best fitting smaller graphs, so we recommend that users not only expand the best smaller graph but a selected few best of them."
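The growth that the authors describe can be illustrated even without any admixture events. The short Python sketch below counts only rooted binary tree topologies on n labelled leaves, using the standard double-factorial formula (2n - 3)!!. It therefore gives a lower bound on the number of admixture graphs with the same leaves, not a count of the graphs themselves, and the function name is my own.

```python
def rooted_binary_trees(n_leaves):
    """Number of rooted binary tree topologies on n labelled leaves: (2n - 3)!!."""
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (an empty product for n <= 2).
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


# Trees only, i.e. graphs with zero admixture events.
for n in range(2, 11):
    print(n, rooted_binary_trees(n))
```

Ten leaves already allow more than 34 million trees, before a single admixture edge has been added.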
The world of graph-edge rearrangements (NNI, SPR) does not yet seem to have encountered the world of admixture graphs.