Bonsai
The guiding principle of Bonsai is to assist and educate the user in sequence alignment and evolutionary tree-building methods. It may not produce better alignments or trees than other available programs (although the goal is to do at least as well for the methods used). Instead, Bonsai focuses on providing the user with credible information about the basis for the computations and their interpretation in a mathematical and biological framework. It is my belief that this understanding is critically important for the biologist trying to use these methods to address what are ultimately biological problems.
Bonsai necessarily makes compromises in its output and documentation. A full mathematical understanding of the principles of sequence alignment and tree building is well beyond the interest of most biologists. Little attempt is made to provide such an understanding, though a list of useful resources is provided below for those who are interested. What Bonsai does provide is a simple conceptual understanding of these principles, together with access to much of the quantitative information that lies behind them for a specific alignment or phylogenetic tree. The hope is that users will be forced to confront the simpler aspects of this quantitative information, and ideally will be able to delve as deeply as they want beyond this simplest level. Such access to alignment and tree-building information is difficult or impossible to come by in the other programs I am aware of. For example, the popular multiple alignment program Clustal, run with default parameters, will produce a multiple alignment nearly identical to the one produced by Bonsai with default parameters, but little of the underlying alignment information and scoring is accessible or explained by Clustal. The phylogenetic trees that Clustal produces are intended only to guide alignments and are misleading when used for any other purpose, so the trees in Bonsai are much preferable. Because of slight improvements in some methods, the Bonsai alignment may be slightly more accurate as well.
A set of sample sequences is provided together with a tutorial for working with these sequences and understanding the program output.
- extensive documentation, from window-specific details and tutorial to interpretation guides and algorithm descriptions.
- easy sequence loading, pasting, and direct editing.
- order of sequences in sequence lists and alignments directly editable and sortable.
- sequence, sequence list, and alignment text comments of any length.
- alignment outputs with color-coded alignment quality keys (plus detailed score information easily accessible).
- optional dot matrix display for pair alignments, with settable display parameters.
- multiple alignment gap positions directly editable.
- properly computed phylogenetic trees with distance correction.
- intuitive and flexible phylogenetic tree arrangement and labeling.
- dynamic distance correction functions for phylogenetic trees.
- up-to-date protein score matrices, including a transmembrane matrix (PHAT).
- alignment speeds that benchmark on par with ClustalW and ClustalX.
- sophisticated gap position heuristics for multiple alignments, together with advice about what they are and how to use them.
- direct access to position-specific score matrix (PSSM) data for a multiple alignment, with direct editing of specific match scores and column weighting.
- a simulation package that permits you to investigate the alignment and tree-building results from a known pattern of evolution.
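As an illustration of what "distance correction" in the list above refers to, here is a minimal Python sketch. It is not taken from Bonsai and makes no claim about which corrections Bonsai implements; the Poisson and Jukes-Cantor formulas are standard textbook choices used here as assumptions, and the function names are hypothetical. The idea is that the observed fraction of differing sites underestimates the true amount of change, because repeated substitutions at one site are counted at most once, so the observed value is converted to an estimate of substitutions per site.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def poisson_corrected(p):
    """Poisson correction, often used for proteins: d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1) for the Poisson correction")
    return -math.log(1.0 - p)


def jukes_cantor(p):
    """Jukes-Cantor correction for nucleotides: d = -(3/4) ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy aligned nucleotide sequences, invented for this example.
    p = p_distance("ACGT-A", "ACGA-T")
    print("observed p-distance:", p)
    print("Jukes-Cantor distance:", jukes_cantor(p))
```

The corrected distance is always at least as large as the observed one, and the gap between them grows as the sequences diverge. This is why trees built from uncorrected distances tend to understate the length of long branches.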
James H. Thomas, Department of Genome Sciences, University of Washington