BAMBE

Bayesian Analysis in Molecular Biology and Evolution

Version 2.03 beta, January 2001

© Copyright 2000, 2001, Donald Simon & Bret Larget, Department of Mathematics and Computer Science, Duquesne University.


MCMC algorithms

There are two main tree proposal algorithms in BAMBE, each with clock and non-clock versions. There are different algorithms for updating model parameters. Complete descriptions are in the manuscript

Larget, B. and D. Simon (1999). Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Molecular Biology and Evolution 16:750-759.

GLOBAL with a molecular clock

Both versions of GLOBAL manipulate the tree through direct manipulation of the tree representation described in Larget and Simon (1999). For any given collection of left/right orientations for subtrees at each internal node, a graph of distance from the root in a depth first traversal of the tree results in a graph with peaks and valleys. With a molecular clock the peaks are all the same height above the root level. The peaks correspond to leaves (taxa) and the valleys to internal nodes. The internal nodes are some distance below the level of the peaks and are ordered by the traversal ordering. The tree is parameterized by a permutation of the taxa as read from left to right in the representation and the valley depths from left to right. A tree with s leaves has 2s-1 different possible representations.

GLOBAL with a molecular clock does these steps:

  1. Choose a tree representation at random by choosing a left/right orientation for the subtrees at each internal node with equal probability.
  2. Perturb each valley depth independently by a small uniform amount, reflecting excess back into range if a depth would rise above the leaf level.
  3. Evaluate the resultant tree and accept or reject by Metropolis-Hastings.
Every branch length is modified by this proposal, and the tree topology may change.

GLOBAL without a molecular clock

When there is no molecular clock, the peaks may all be different distances from the root in the representation. Each valley has two depths, the depths to its left and right peaks. The tree is now parameterized by a permutation of the taxa and an array of 2(s-1) valley depths. Because the likelihood models do not not distinguish between different rootings of the same rooted tree, we occasionally change the rooting of our rooted tree. GLOBAL without a molecular clock is described as:
  1. Choose to reroot with probability p or update tree otherwise.
  2. If reroot tree,
    1. Choose a point in the unrooted tree for the new root uniformly at random.
  3. Otherwise
    1. Perturb each valley depth (two for each valley) independently by a small uniform amount, reflecting excess back into range if a depth becomes negative.
    2. Evaluate the resultant tree and accept or reject by Metropolis-Hastings.
Every branch length is modified by this proposal, and the tree topology may change.

LOCAL with a molecular clock

LOCAL with a molecular clock acts on a rooted tree in only a small neighborhood.

  1. Pick uniformly at random an internal branch of the tree.
  2. If this branch does not adjoin the root:
    1. Define the local neighborhood to be the two nodes of the selected branch, u and v, and nodes adjacent to these. The children of u are a and b, the children of v are u and c, and v's parent is w.
    2. The heights of a, b, and c above w are sorted and labeled h1 < h2 < h3.
    3. Choose two positions above w independently, one uniformly at random between 0 and h1 and the other uniformly at random between 0 and h2. If the option use-beta is true, these two positions are selected according to a scaled Beta distribution whose mean is the current position.
    4. The position farther from w is the proposed position of u. The position closer to w is the proposed position of v.
    5. If the proposed position of u is a distance less than h1 from w:
      1. Randomly choose one of the three children nodes to connect to v and connect the others to u.
    6. Otherwise the tree topology is forced and the lowest child is connected to v.
  3. Else, if the randomly chosen branch does adjoin the root:
    1. Define the local neighborhood to be the two nodes of the selected branch, u and v where v is the root of the tree and nodes adjacent to these. The children of u are a and b, and the children of v are u and c.
    2. The heights of a, b, and c above v are sorted and labeled h1 < h2 < h3.
    3. The proposed distance between the root v and the nearest child among a, b, and c is a random multiple of h1.
    4. The proposed location of u is a height above the proposed location of v chosen randomly between 0 and h2. If use-beta is true, the location is selected according to a scaled Beta distribution with mean at the current location.
    5. If the proposed position of u is closer to the proposed location of v
      1. Randomly choose one of the three children nodes to connect to v and connect the others to u.
    6. Otherwise the tree topology is forced and the lowest child is connected to v.
  4. Accept or reject the tree by Metropolis-Hastings.

LOCAL without a molecular clock

LOCAL without a molecular clock acts on the unrooted tree.
  1. Pick uniformly at random an edge of the unrooted tree.
  2. Define the local neighborhood to be the two nodes of the selected branch, u and v, and nodes adjacent to these. The neighbors of u are randomly labeled a and b with equal probability and independently the neighbors of v are randomly labeled c and d with equal probability.
  3. The length of the path from a to c is multiplied by a random factor.
  4. One node from u and v is randomly chosen with equal probability and moved with its corresponding subtree to a location on the path from a to c chosen uniformly at random.
  5. Accept or reject the tree by Metropolis-Hastings.

Parameter updating

In all models, the parameters theta, pi_A, pi_G, pi_C, and pi_T may be updated within each category. In HKY85 or F84, there is an additional parameter kappa within each category. In TN93, there are two additional parameters, ttp and gamma, within each category. We restrict the kappa, ttp, and gamma values to be positive, the weighted average of the theta values (weighted by the number of sites in the category) to be 1, and the sum of the pi values within each category to be 1.

Unconstrained parameters kappa, ttp, and gamma are updated by adding a random uniform perturbation, reflecting the excess back into range should the proposed value become negative.

Parameters whose sum or weighted sum is fixed are proposed from a Dirichlet distribution with expected values at the current values.


Back to the table of contents.


This page was most recently updated on January 19, 2001.

bambe@mathcs.duq.edu