Duquesne University

Department of Mathematics and Computer Science

Technical Report #98-02, October 1998

Calculating the likelihood of observed DNA sequence data at the leaves of a tree is the computational bottleneck for phylogenetic analysis by Bayesian methods or by the method of maximum likelihood. Because analysis of even moderately sized data sets can require hours of computational time on fast desktop computers, algorithmic changes that substantially increase the speed of the basic likelihood calculation are significant. It has long been recognized that the contribution to the likelihood at sites with identical patterns is the same and need only be computed once for each unique pattern. We note that sites whose patterns are not identical on the entire tree may be identical on subtrees, and hence partial likelihood calculations made for one site may be stored and used for calculations at another. The bookkeeping and memory requirements are large, but not too excessive for current desktop computers. Timed calculations on many genuine data sets indicate that the computational algorithm we present in this paper for likelihood calculations on trees result in decreases in running time by a factor ranging from 1.1 to 5.2 for data sets considered. There is reason to believe similar increases in speed would be more generally realized on alternative trees and data sets.

Revised .