Bonsai Tutorial

This tutorial guides you through some of the simpler aspects of using Bonsai, particularly those likely to be useful in producing publishable figures with some confidence that you understand what those figures mean. It uses the sample sequences and profiles supplied with Bonsai, so that what you see on screen matches the instructions. At any point in this tutorial, you can consult the Help files for more detail.

Generating a Phylogenetic Tree and Multiple Alignment


1) Open the second set of sample sequences under the "Help" menu in the Bonsai main window.

2) A set of 6 related sequences will appear in the split pane of the opening window, with names on the left and sequences on the right, along with some guide numbering to orient you. Try sliding the split-pane divider, scrolling the sequence window, and resizing the window to get a feel for viewing sequences.

3) Select some sequences by clicking in the name panel (only the name will appear highlighted) and use "Show : Selected Sequence Properties" to view the sequences and edit some of their properties if you wish. Shift-click to select a range of sequences, Ctrl-click to add individual sequences.

4) Select any pair of sequences and use "Align : Pair Align" to compute and display a basic pair alignment. Briefly view the Score-Shaded Pair Alignment window (we will come back to the equivalent window for a multiple alignment below).

5) For all alignments, related results will appear together as a set of internal windows inside a superframe, which makes it easier to keep track of which results belong together. In this case there should be a score-shaded alignment window and a dot-matrix window. From the main Bonsai window you can use "Settings : Pair Align Settings" and "Settings : Multiple Align Settings" to tailor which results from the alignment are displayed.

6) View the Pair Align Dot Matrix and use the buttons and menu items to change the display to your liking. Use "Display : Current Path" to view the path the shaded alignment takes through the dot matrix. Ctrl-click and drag to select a region of the dot matrix and use "Align : Align Selected Region" to generate a new pair alignment including only this region.

7) Close or iconify the enclosing Pair Alignment superframe (or leave it open if you prefer - any number of such analyses can be viewed simultaneously).

8) In the opening window, use "Align : Multiple Align All" to compute and display a full pairwise guide tree and multiple alignment of all the sequences. (If you wish, you can first delete selected sequences from the list using the Edit menu, or you can select a subset of sequences and align only those with "Align : Multiple Align Selected".) As with Pair Align, a superframe will be created that holds all the related alignment results as windows within it.

9) View the Alignment Guide Tree window. A conventional evolutionary tree is displayed (horizontally so that names are easily displayed). See Tree Graph Help for details. This is a provisional tree used to guide the multiple alignment. Iconify it for now.

10) View the Score-shaded Multiple Alignment window, and use "Calculate : N-J Tree From Multiple Alignment" to generate a phylogenetic tree from the multiply aligned sequences. Notice that, in this case, this tree is slightly different from the guide tree. The method used to compute the tree is called Neighbor-Joining.

11) In the Phylogenetic Tree from Multiple Alignment window, click the "Rotators" button and click on the rotators that appear at the tree nodes. These can be used to arrange the tree in the way you want. NOTE: these rotations have NO effect on the underlying structure or interpretation of the tree, but they can help make the tree more intuitive to view.

12) Click the "Rerooters" button and click on the ovals that appear on tree branches to "reroot" the tree. NOTE: these reroots DO change the structure of the tree. Use with caution! Bonsai selects a default root position that makes the leaves appear most even, as would be expected for current day sequences with an approximately constant molecular clock. In some situations this assumption is misleading. You can always return to the default tree by clicking the appropriate root position.

13) Click the "Distances" button to display numbers that correspond to the branch lengths on the tree.

14) Use the menu "Tree : Compute Bootstrap" and accept the default of 1,000 repeats to compute numbers that reflect the confidence of the tree branch patterns. Depending on which version of Bonsai you have, you may need to switch to the Score Uncorrected tree to compute the bootstrap values.

15) Once the tree appears the way you want it, use "File : Save as GIF" to save the tree as a figure.

16) Switch back to the Score-shaded Multiple Alignment window. Like the tree window, this window is complex. Read the Help file for much more information.

17) Use "Edit : Column Selection Mode" to switch to being able to select regions of the alignment. Click and drag on the alignment or use Shift-click to select a block of contiguous alignment. Use "Align : Align Selected Columns" to produce a new multiple alignment restricted to this region only. This is particularly useful to clean off unwanted ragged ends on your sequences, or more importantly to generate trees from appropriately selected sub-segments of the sequence set.

18) Use "Show : Text Alignment" to show a simple text alignment similar to those produced by ClustalW. You can save this as a text file, or select and copy it to the clipboard for use in the Boxshade program to generate simple publishable figures. A Boxshade server is available at http://www.ch.embnet.org/software/BOX_form.html. Bonsai also provides a simple Boxshade option under "Show : Boxshade Format".

19) Open the Multiple Alignment Guide Tree window and compare it to the Phylogenetic Tree from Multiple Alignment window. The two trees will usually be slightly different. In most cases the Phylogenetic Tree generated from the Multiple Alignment will be more accurate. However, in cases where some sequences are highly diverged, the multiple alignment may have difficulties producing a valid alignment of all the sequences at once, in which case this tree will be misleading. In such cases, constructing a good tree is challenging but the Guide Tree may be closer to the truth.
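
For readers curious how a Neighbor-Joining tree such as the one in step 10 is built from pairwise distances, the following is a minimal sketch of the textbook algorithm. It is not Bonsai's implementation: the function name, the input format (a list of names plus a square distance matrix), the made-up internal node labels ("node0", "node1", ...), and the small example matrix are all assumptions made for illustration. Bonsai derives its distances from the multiple alignment, which is not shown here.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbor-joining on a symmetric distance matrix.

    names: list of sequence labels; dist: square matrix (list of lists)
    in the same order. Returns a list of (node, neighbor, branch_length)
    edges describing an unrooted tree. Internal node labels are
    hypothetical and exist only for this sketch.
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all the others.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimizes the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the joined pair to their new parent node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"node{len(edges) // 2}"
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # branch connecting the last two nodes
    return edges


# Illustrative distances for five hypothetical sequences (not Bonsai data).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for node, neighbor, length in neighbor_joining(names, dist):
    print(node, neighbor, length)
```

In this small example the input distances are exactly tree-like, so path lengths through the returned edges reproduce them (for instance, a to b is 2 + 3 = 5). Note that the result is unrooted; choosing where to draw the root, as in step 12, is a separate decision.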


Interpreting, editing, and using a Profile

1) In the main Bonsai window use "Help : Load Sample Profile" to open the sample alignment profile.

2) A multiple alignment window will open and display an alignment of 5 related protein kinases.

3) Select column 5 by clicking it and use "View : Column Alignment Values" to see the values associated with this column.

4) At the top of the Profile Column window you will see an editable column weight. This number is always 1.0 when a profile is first constructed, indicating that all columns are weighted equally when this profile is used to align other sequences or profiles. You can edit this value up or down if you have reason to believe this column is more or less important than others. For example, in a kinase alignment you might want to increase the values for a few strictly-conserved residues in the ATP-binding fold and the catalytic site. Don't overdo this - even a 2-fold increase will usually lock a new sequence into aligning a good match to this column. Note that a strictly conserved residue will already have a strong tendency to drive a new alignment at that column, even without an increased column weight, because the score for a correct match is high (see steps 5 and 6). Values for column weight between 0.1 and 10.0 are allowed.

5) In the Scores and Counts panel, you will see editable match scores for each amino acid and un-editable weighted counts for them. The weighted counts are included for information only - their values were already used to compute the scores at the left. The weighted counts will always total about 1,000. See Multiple Alignment Methods for more information on sequence weights.

6) The match scores are the values that will be assigned when this profile is used in an alignment. This is simplest to understand by considering alignment of this profile to a single sequence: when a sequence residue is aligned to this profile column, it will receive the indicated score (alignment to another profile is a simple extension of this principle). Remember that the profile column scores are summations of scores for all the individual residues and their weights, so the alignment is driven by all the available profile information.

7) Click on various columns in the multiple alignment and watch the Profile Column window, noticing how scores change depending on the specific residues in the column and how good the residue consensus is. Column 5 is particularly instructive - note that Y receives the best score but that F is not far below it, even though only a single F is present in this column. The high score for F comes from a combination of two facts. First, the BLOSUM matrix (used to make the multiple alignment) treats F and Y as evolutionarily similar (they are also chemically similar, of course). Second, the sequence in which the F appears (hCaMKIV) has a relatively high weighted count in this profile, so an F-F match was counted correspondingly heavily during score computation.

8) If you edit the profile column weights or scores, click save before moving to a new column.

9) If you wish, find another protein kinase from the NCBI site or elsewhere, load it in the Bonsai opening window, and use "Align : Align Sequences to this Profile" to see how the profile guides alignment of a new protein kinase.
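
The idea in steps 6 and 7, that each match score is a weighted summation over the residues in the column, can be sketched as follows. This is an illustration only, not Bonsai's actual formula: the function names, the normalization by the total weighted count, the way the column weight is applied as a multiplier, and the example counts (700 for Y, 300 for F) are all assumptions. The substitution scores are BLOSUM62 entries for three aromatic residues; the tutorial does not say which BLOSUM matrix Bonsai uses.

```python
# BLOSUM62 entries for three aromatic residues (the matrix is symmetric).
BLOSUM62_SUBSET = {
    ("F", "F"): 6, ("F", "Y"): 3, ("F", "W"): 1,
    ("Y", "Y"): 7, ("Y", "W"): 2, ("W", "W"): 11,
}


def sub_score(a, b):
    """Look up a substitution score regardless of residue order."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def column_scores(weighted_counts, column_weight=1.0, alphabet="FYW"):
    """Score each candidate residue against one profile column.

    weighted_counts maps residue -> weighted count in the column.
    The score for residue x is the count-weighted average of its
    substitution scores against the column residues, multiplied by the
    column weight (an assumed formula, for illustration only).
    """
    if not 0.1 <= column_weight <= 10.0:
        raise ValueError("column weight must be between 0.1 and 10.0")
    total = sum(weighted_counts.values())
    return {
        x: column_weight
        * sum(count * sub_score(res, x) for res, count in weighted_counts.items())
        / total
        for x in alphabet
    }


# Hypothetical column: mostly Y, with one heavily weighted sequence carrying F.
scores = column_scores({"Y": 700, "F": 300})
for residue, score in scores.items():
    print(residue, round(score, 2))
```

With these made-up counts, Y scores highest and F comes second, well ahead of W. This mirrors the pattern described for column 5: F benefits both from its similarity to Y in the matrix and from the weight of the sequence that carries it.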


Many other features are available through menus and settings. See the full set of Help files for more information.

Other features include:
- easy reordering of aligned and unaligned sequences
- easy and intuitive tree rearrangement
- pairwise distance tree construction using various types of distance measures
- averaged Kyte-Doolittle hydropathy plots for multiple alignments
- platform-independent figure saving and printing
- easy viewing of intermediate steps from a progressive multiple alignment
- progressive refinement of a multiple alignment
- simulation of protein evolution using a reasonably realistic protein model (handy for testing the properties and limits of various methods)
- interactively settable distance corrections for phylogenetic trees
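
As an illustration of the "averaged Kyte-Doolittle hydropathy" item in the list above, here is one plausible way to compute such a profile for a multiple alignment: take the mean Kyte-Doolittle value of the non-gap residues in each column, then smooth with a sliding window. How Bonsai actually averages, handles gaps, and chooses a window size is not stated in this tutorial, so the function names, the gap handling, and the window default below are assumptions.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment):
    """Mean hydropathy per alignment column, ignoring gaps.

    alignment: list of equal-length aligned sequences (strings).
    Columns with no scorable residue yield None.
    """
    means = []
    for column in zip(*alignment):
        values = [KD[res] for res in column if res in KD]
        means.append(sum(values) / len(values) if values else None)
    return means


def smooth(values, window=7):
    """Sliding-window average, shrinking the window at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - half): i + half + 1]
                 if v is not None]
        smoothed.append(sum(chunk) / len(chunk) if chunk else None)
    return smoothed


# Tiny hypothetical alignment, with "-" marking a gap.
alignment = ["ILV", "I-A"]
print(smooth(column_hydropathy(alignment), window=3))
```

Averaging across the aligned sequences before smoothing means that a hydrophobic stretch shared by the whole family stands out more clearly than it would in any single sequence.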


James H. Thomas, Department of Genome Sciences, University of Washington
5/18/2002