Using the Evolution Simulator
Quick Guide: the Simulator frame takes the first sequence in the Bonsai frame and lets you duplicate and diverge that sequence more or less at will. You can then build alignments and trees to test Bonsai with sequences of known evolutionary origin. It is also a good way to learn about tree and alignment algorithms, because you know what the correct result should look like.
Initialize button: click to use the first sequence in the Bonsai main window as the seed for a new evolution. The Bonsai main window sequence list will be cleared and then filled with two copies of the original sequence. In addition, various parameters for the coming evolution are set. After initializing the sequences and before starting the simulation run, choose whether to include conserved blocks in your evolution ("Settings : No Markov Blocks"). Since real sequences generally contain conserved blocks, the default is to include them. To specify details of the conservation pattern, use "Settings : Markov Blocks Settings".
Running the simulation: there are various alternatives to explore, but a typical useful simulation run goes like this. Enter the number of mutations you want to permit in the Run to Density text field (100 is the default). Click the Run to Density button one or more times to make all current sequences accumulate mutations according to their evolutionary model. Click Duplicate all Sequences to make an exact copy of every diverged sequence at the current moment in evolution. Use Run to Density again on all of these sequences. Repeat as often as you like. Then compute a phylogenetic tree or multiple alignment for all or a subset of the evolved sequences. At the end of an evolutionary run, you may want to click Add Original to add one copy of the sequence from which all the evolved sequences arose, or Add Outgroup to add a fully randomized variant of these sequences to serve as an outgroup in a phylogenetic tree. NOTE: with real alignments the original (ancestral) sequence is never available, so including it is possible only in a simulation.
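The run described above can be sketched in a few lines of Python. This is only an illustration of the workflow, not Bonsai's code: the function names are hypothetical, the mutation count is assumed to apply to each sequence separately, every mutation is assumed to be a uniform random substitution, and the outgroup is assumed to be a shuffled copy of the seed.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def run_to_density(seqs, n_mutations, rng):
    """Apply n_mutations random substitutions to each sequence (assumed model)."""
    evolved = []
    for seq in seqs:
        residues = list(seq)
        for _ in range(n_mutations):
            pos = rng.randrange(len(residues))
            # Replace the residue with a different, randomly chosen one.
            choices = [a for a in AMINO_ACIDS if a != residues[pos]]
            residues[pos] = rng.choice(choices)
        evolved.append("".join(residues))
    return evolved

def duplicate_all(seqs):
    """Return the list with every sequence followed by an exact copy."""
    return [copy for seq in seqs for copy in (seq, seq)]

def add_outgroup(seed, rng):
    """Assumed outgroup: the seed's residues in random order."""
    return "".join(rng.sample(seed, len(seed)))

def simulate(seed, tiers, n_mutations, rng):
    """Initialize with two copies, then alternate divergence and duplication."""
    seqs = [seed, seed]
    for tier in range(tiers):
        seqs = run_to_density(seqs, n_mutations, rng)
        if tier < tiers - 1:
            seqs = duplicate_all(seqs)
    return seqs

if __name__ == "__main__":
    rng = random.Random(2002)
    seed = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    leaves = simulate(seed, tiers=3, n_mutations=5, rng=rng)
    for s in leaves + [add_outgroup(seed, rng)]:
        print(s)
```

With three tiers the sketch ends with 8 sequences arranged as a symmetric tree, because every sequence is duplicated at every tier and receives the same number of mutations per tier.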
Other comments: the sequence evolution is performed by running a series of "cycles" on each sequence, permitting mutations to arise stochastically at each cycle. The mutation rate per cycle is arbitrarily set for you, so it is best to use the Run to Density button to drive the evolution. If you wish, you may use the Run to Cycle button instead, which will run the indicated number of cycles.
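The relationship between cycles and mutation density can be illustrated with a minimal sketch. It assumes that in each cycle every residue mutates independently with a fixed per-cycle probability, and that a density run simply repeats cycles until the requested number of mutations has accumulated. The names, the rate value, and the substitution model are all hypothetical; the source does not say how Bonsai implements either button.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def run_cycle(residues, rate, rng):
    """One cycle: each position mutates with probability `rate`.

    Modifies `residues` in place and returns the number of mutations made.
    """
    hits = 0
    for i, current in enumerate(residues):
        if rng.random() < rate:
            residues[i] = rng.choice([a for a in AMINO_ACIDS if a != current])
            hits += 1
    return hits

def run_to_cycle(seq, n_cycles, rate, rng):
    """Run a fixed number of cycles; the mutation count is whatever arises."""
    residues = list(seq)
    total = sum(run_cycle(residues, rate, rng) for _ in range(n_cycles))
    return "".join(residues), total

def run_to_density(seq, target, rate, rng):
    """Run cycles until at least `target` mutations have accumulated."""
    if rate <= 0:
        raise ValueError("rate must be positive, or the target is never reached")
    residues = list(seq)
    total = cycles = 0
    while total < target:
        total += run_cycle(residues, rate, rng)
        cycles += 1
    return "".join(residues), total, cycles
```

Under these assumptions the number of cycles needed for a given density depends on the arbitrary per-cycle rate, which is why driving the run by mutation count is the more predictable choice.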
In addition to duplicating all of the sequences, you can duplicate individual sequences using the Duplicate button next to each sequence in the lower panel. Doing so produces more "realistic" trees, but because they are irregular they are much harder to interpret visually when testing tree building and sequence alignment quality.
The selection of conserved blocks during an evolutionary run is mediated by a Markov Chain, which amounts to a flexible algorithm for permitting stochastic changes in conservation state along the protein chain. Once assigned, these states are preserved throughout the evolutionary run and cannot be changed. The default (and currently only) Markov Chain has three conservation states with relative mutation rates of 1.0, 0.2, and 0.05 respectively. The rapidly mutating blocks (rate 1.0) predominate and are interspersed with short stretches of intermediate (0.2) or high (0.05) conservation. If you want only a single state (all residues equally conserved), turn Markov Blocks off using the menu ("Settings : No Markov Blocks") before running the simulation.
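A minimal sketch of how such a chain could assign conservation states along a protein is given below. Only the three relative rates come from the description above. The transition probabilities, the starting state, and the function names are invented for illustration and are not Bonsai's actual settings; they were chosen so that the fast state tends to form long runs and the conserved states short ones.

```python
import random

# Relative mutation rates of the three conservation states (from the text).
RELATIVE_RATES = (1.0, 0.2, 0.05)

# Hypothetical transition probabilities: row = current state, column = next.
TRANSITIONS = (
    (0.95, 0.03, 0.02),  # fast state mostly persists
    (0.20, 0.75, 0.05),  # intermediate blocks are short
    (0.20, 0.05, 0.75),  # highly conserved blocks are short
)

def assign_states(length, rng, transitions=TRANSITIONS, start=0):
    """Walk the chain once along the sequence, one state per residue."""
    states = []
    state = start
    for _ in range(length):
        states.append(state)
        weights = transitions[state]
        state = rng.choices(range(len(weights)), weights=weights)[0]
    return states

def site_rates(states, base_rate):
    """Per-residue mutation rate: base rate scaled by the state's relative rate."""
    return [base_rate * RELATIVE_RATES[s] for s in states]

if __name__ == "__main__":
    rng = random.Random(7)
    states = assign_states(60, rng)
    # Print the block pattern: 0 = fast, 1 = intermediate, 2 = conserved.
    print("".join(str(s) for s in states))
```

Because the states are assigned once and then held fixed, the same residues stay conserved in every descendant sequence, which is what makes the blocks visible in the final alignment.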
Advanced Use: Create a relatively deep symmetric tree (average of 1 to 3 mutations per residue with 8 or 16 final sequences, with the same mutation number per tree tier). Create a multiple alignment to see what it looks like. On the Guide Tree panel (or for the tree made from the multiple alignment), activate the distance corrector (menu Show : Distance Corrector). Vary the parameters for the correction equation to try to produce a tree that most accurately represents deep divergence times. [NOTE: it does not seem to be possible to produce one "correct" distance correction function, because it depends somewhat on the distribution of conservation along the length of the proteins. The Bonsai default is to use corrected scores (which is probably superior to the more usual corrected percent identities) and a correction equation with alpha = 1.0, beta = 4.0, and gamma = 1.0. Use the Distance Corrector frame to get a feel for this.]
James H. Thomas, Department of Genome Sciences, University of Washington
5/18/2002