Bonsai Main Window Help

The opening Bonsai window is organized to load and display unaligned sequences for analysis, to initiate a variety of alignment and tree building methods, and to make various general, alignment, and tree settings. Loaded sequences are shown in the large central panel and some general information is shown in the bottom panels. All actions are menu or keystroke driven.


File Menu:

Load Sequences: Load a set of sequences from a sequence file in any of several formats (including Fasta and Bonsai serialized format) and prepare them for use in various methods. This replaces any currently loaded sequences. You can also load

Append Sequences: Append additional sequences from a sequence file in any of several formats (including Fasta and Bonsai format).

Save Fasta Sequences: Save the current set of sequences in a text file with Fasta format. No file extension is added.

Save Sequences: Save the current set of sequences in Bonsai file format. (Note: this format uses the Java serialization methods and is not readily readable outside of Bonsai; it may also be unstable from release to release of Bonsai.) The files are automatically given a ".bnss" extension.

Load Alignment: Load a multiple alignment profile from a Bonsai file (.bnsp).

Load Fasta Alignment: Load a multiple alignment from a text file in the Fasta format. This format is similar to the Fasta sequence format except that dashes have been added at gap positions in each sequence.
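
To make the Fasta alignment format concrete, here is a minimal Python sketch of reading such a file. It is not Bonsai's own reader (Bonsai is a Java program); the function names parse_fasta_alignment and is_aligned are hypothetical, and the equal-row-length check is the usual convention for aligned Fasta rather than a documented Bonsai rule.

```python
def parse_fasta_alignment(text):
    """Parse Fasta-format text into a list of (name, sequence) pairs."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def is_aligned(records):
    """True when every row has the same length, gap dashes included."""
    return len({len(seq) for _, seq in records}) <= 1


example = """>seqA
MKT-LLV
>seqB
MKTALLV
"""
rows = parse_fasta_alignment(example)
```

Removing the dashes from a row (for example rows[0][1].replace("-", "")) gives back the unaligned sequence.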

Load Interchange Alignment: Load a multiple alignment profile from a text file that conforms to the Bonsai interchange format. This format records essential profile data in a text format.

Load Other Formats: Load sequences or multiple alignments in other formats. This will open a self-explanatory choice panel for the type of file you want to open. These format interpreters are new and may have bugs. The Nexus/PAUP interpreter in particular does NOT handle all the complexities of this format (but it does work with the basic multiple alignment).

Exit: Exit Bonsai.


Edit Menu:

Copy to Clipboard: Copy the selected sequences (either in Fasta format or just the sequence names) to the system clipboard. The names version will be in text format with one name on each line, without the ">" Fasta character. This is useful for other programs and web pages that require name lists, and for the Bonsai menu item Select From Name Set on Clipboard (see below).

Paste Fasta Sequences from Clipboard: Add sequences from the system clipboard that are in Fasta format. If no sequences are currently loaded, these will form a new sequence list, otherwise they will be appended to the end of the current sequence list.

Add Sequence from Web: Add a sequence from the GenBank sequence repository based on its accession number or its "GI" number. This will bring up dialogs that let you enter the accession number and choose the sequence to add to the current sequence list. Some GenBank accession entries have more than one associated sequence, in which case you will be able to choose among them based on preview information. This feature is new and may be buggy.

Delete Selected Sequences: Delete the sequences that are currently user-selected.

Delete All Sequences: Delete all currently loaded sequences. Note: loading a new sequence file will also delete current sequences.

Find Name Matches: Brings up a dialog that lets you find and select sequences with names that match your specification. For each of the Find or Select functions, you can find the inverse set of sequences by running Find and then Invert Selected Set.

Find Sequence String Matches: Brings up a dialog that lets you find and select sequences with sequence residues that match your specification.

Find N Matches to Selected Sequence: [best or worst] Brings up a dialog that lets you specify the number of best or worst matches to find for a selected sequence. The currently selected sequence will be aligned to each of the other sequences using the current pair align settings, and the specified number of best (or worst) alignment matches will be added to the current sequence selection. Very useful for aligning related subsets of a longer list.
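
The ranking step described above can be sketched in Python as follows. This is only an illustration: Bonsai scores each pair with a full pair alignment using the current settings, whereas the identity_score function here is a simple stand-in that counts identical residues at the same position. Both function names are hypothetical.

```python
def identity_score(a, b):
    """Stand-in score: count positions where the two strings agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def find_n_matches(query_index, sequences, n, best=True, score=identity_score):
    """Return indices of the n best (or worst) matches to the query."""
    query = sequences[query_index]
    others = [i for i in range(len(sequences)) if i != query_index]
    # Highest scores first for "best", lowest scores first for "worst".
    ranked = sorted(others, key=lambda i: score(query, sequences[i]),
                    reverse=best)
    return ranked[:n]


seqs = ["MKTLLV", "MKTLLI", "MRTALV", "GGGGGG"]
best_two = find_n_matches(0, seqs, 2, best=True)
worst_one = find_n_matches(0, seqs, 1, best=False)
```

In Bonsai the returned matches are added to the current selection, so the query sequence itself stays selected.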

Select From Name Set on Clipboard: If you have pasted one or more sequence names to the system clipboard, this uses them to search through all the currently loaded sequences and select matches. The clipboard contents must be a set of text lines, one name per line, separated by new lines (carriage returns). You can either demand exact matches (the complete name must be identical) or you can find sequences whose names contain a query within them.
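
The difference between the exact and the "contains" matching modes can be sketched in Python. This is an assumption-laden illustration, not Bonsai code: the function name select_by_names is hypothetical, and case-sensitive comparison is assumed because the help text does not say how case is handled.

```python
def select_by_names(sequence_names, clipboard_text, exact=True):
    """Return the indices of loaded sequences matched by clipboard names."""
    # One query per non-empty clipboard line.
    queries = [line.strip() for line in clipboard_text.splitlines()
               if line.strip()]
    selected = []
    for index, name in enumerate(sequence_names):
        if exact:
            hit = name in queries                 # whole name must be identical
        else:
            hit = any(q in name for q in queries)  # query found inside name
        if hit:
            selected.append(index)
    return selected


names = ["kcnq1_human", "kcnq1_mouse", "shaker"]
clipboard = "shaker\nkcnq1\n"
exact_hits = select_by_names(names, clipboard, exact=True)
partial_hits = select_by_names(names, clipboard, exact=False)
```

With these sample names, exact matching selects only "shaker", while the contains mode also picks up both names that include "kcnq1".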

Select N Random Sequences: Select a set of random sequences from the current sequence list. Useful for significance testing and pruning huge data sets to representative subsets.
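
As a rough illustration of drawing a random subset without repeats, here is a short Python sketch. The function name select_random and the optional seed parameter are assumptions for the example; the help text does not describe how Bonsai draws its random set.

```python
import random


def select_random(count_loaded, n, seed=None):
    """Pick n distinct row indices out of count_loaded sequences."""
    rng = random.Random(seed)       # a fixed seed makes the draw repeatable
    n = min(n, count_loaded)        # cannot select more than are loaded
    return sorted(rng.sample(range(count_loaded), n))


picks = select_random(45, 5, seed=1)
```

Repeating the draw with different seeds gives independent subsets, which is the kind of resampling that significance testing relies on.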

Sort Sequences: Sort the list of displayed sequences by various keys. Useful for managing large sequence lists. The sorts are ascending and self-explanatory except for one: "Sort by Relatedness" starts with the sequence currently in the first row, puts its closest relative in the second row, puts the closest relative of the second row in the third row, and so on until the list is complete. Relatedness is determined by a full pair alignment using the current settings.
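
The "Sort by Relatedness" ordering is a greedy nearest-neighbor chain, which can be sketched in Python as below. Two assumptions are made: the closest relative is chosen only among sequences not yet placed, and the distance is a simple stand-in (mismatch_distance, a hypothetical helper) rather than the full pair alignment that Bonsai uses.

```python
def mismatch_distance(a, b):
    """Stand-in distance: mismatched positions plus the length difference."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))


def sort_by_relatedness(sequences, distance=mismatch_distance):
    """Chain the sequences so each row is followed by its nearest unplaced relative."""
    if not sequences:
        return []
    ordered = [sequences[0]]          # the first row stays where it is
    remaining = list(sequences[1:])
    while remaining:
        current = ordered[-1]
        nearest = min(remaining, key=lambda s: distance(current, s))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered


seqs = ["AAAA", "TTTT", "AAAT", "ATTT"]
chain = sort_by_relatedness(seqs)
```

Because each step needs a comparison against every remaining sequence, the number of pair comparisons grows with the square of the list length, so this sort can be slow on very large lists.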

Number Sequences: Add a number and underscore in front of each sequence name in their current order (e.g. "seqA" will become "1_seqA").

Truncate Sequence Names: This will bring up a self-explanatory dialog that lets you trim all the current sequence names in various ways as a convenience. You can also change individual names by hand using the menu Show : (various) Properties.

Select All: Select all the sequences.

Invert Selected Set: Select the unselected sequences and deselect the selected ones.

Clear Selections: Clear current selections.

Move Selections Up/Down List: Move the selected sequences one step up or down in the list of sequences. This has no effect on alignments, but allows you to arrange sequence lists to your liking.

Validate Sequence List: An unlocated bug occasionally creates an internal inconsistency in a sequence list (noticeable as unexpected error messages). This work-around fixes it.


Align Menu:

Pair Align: Run a pair alignment on the first two selected sequences in the sequence list (or the first two sequences if none are selected). The parameters for the pair alignment can be changed in the Settings menu.

Multiple Align All: Make a full multiple alignment, which will include all pairwise alignments used to construct a pairwise-distance "guide tree", followed by a progressive multiple alignment. Parameters can be set in the Settings menu.

Multiple Align Selected: Make a full multiple alignment only with the currently selected sequences. See "Multiple Align All".

Pairwise Tree All: Measure pairwise distances among all currently loaded sequences. Methods for making the multiple pairwise comparisons include full pair alignment and "Turbo Words". Full pair alignment is preferred unless a very large number of sequences makes speed critical. Parameters for the alignments can be changed in the various Settings menus, but only the pairwise tree will be created and displayed.
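
To show what "measuring pairwise distances" produces, here is a small Python sketch that fills a symmetric distance matrix and finds the closest pair, which is the natural first join when a tree is built from distances. It is not Bonsai's method: the distance here is a stand-in based on the similarity ratio from Python's difflib module, not a full pair alignment or "Turbo Words", and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def stand_in_distance(a, b):
    """Return 1 minus the difflib similarity ratio; 0.0 means identical."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def pairwise_distances(sequences, distance=stand_in_distance):
    """Build a symmetric matrix of distances between all sequence pairs."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = distance(sequences[i], sequences[j])
        matrix[i][j] = d
        matrix[j][i] = d
    return matrix


def closest_pair(matrix):
    """Return the index pair with the smallest distance."""
    n = len(matrix)
    return min(combinations(range(n), 2), key=lambda p: matrix[p[0]][p[1]])


seqs = ["MKTLLV", "MKTLLV", "GGGGGG"]
dist = pairwise_distances(seqs)
first_join = closest_pair(dist)
```

For N sequences this takes N(N-1)/2 comparisons, which is why a faster comparison method matters once the sequence list becomes very large.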

Pairwise Tree Selected: Measure pairwise distances among all currently selected sequences. See "Pairwise Tree All".

Find Motif (Gibbs Sampler): Brings up a controller for Gibbs sampling of the sequences. This is thus far incompletely implemented and documented, but it is included because it is very useful for analyzing highly divergent sequence sets for short shared segments, something that progressive multiple alignment does poorly.

Align Sequences to Alignment: Align the currently loaded sequences to a previously calculated multiple alignment, which must be in the Bonsai serialized format. You will be prompted for a file from which to load the multiple alignment.


Show Menu:

Sequence List Properties: Show a summary list of basic properties of all the sequences currently loaded. The list property window can also be used to activate detailed properties for any specific sequence.

All Sequence Properties: Show detailed sequence properties for all sequences. The sequence name and sequence itself are two of these properties and they can be edited in the property window. Other properties are not editable.

Selected Sequence Properties: Show detailed sequence properties for all currently selected sequences. The sequence name, the sequence itself, and comments can be edited in the property window. Other properties are not editable.

Tree Window: Show an empty tree display window. You can load tree data from there.

Protein Evolution Simulator: Show a window that permits you to use the first of the loaded sequences as the seed for a simulated evolution.

Close All Other Windows: Close all open Bonsai windows other than this main window.


Settings Menu:

Pair Alignment Settings: Show a window where various parameters can be set for pair alignments. These include settings used during the alignment process and pair alignment display options.

Multiple Alignment Settings: Show a window where various parameters can be set for multiple alignments. These include settings used during the alignment process and multiple alignment display options.

Tree Settings: Show a window where tree construction parameters can be set.

Turbo Words Settings: Show a window where turbo words parameters can be set.

Set Sequence Type: In the event that Bonsai doesn't figure out what sequence type you loaded, or gets it wrong, you can change this manually here.


Help Menu:

Main Window Help: This help.

Task Oriented Help: A list of common tasks (e.g. align sequences to a profile) and a brief overview of how to perform them.

Tutorial: A tutorial guide for use of the simpler features of Bonsai.

Guiding Principles: A general description of the philosophy and purpose of Bonsai.

Index of Topics: An index of various Bonsai help files.

Load Sample Sequences 1: Load a sample set of very short similar (imaginary) sequences to test Bonsai features.

Load Sample Sequences 2: Load a set of longer sequences for testing. This is a real set of 6 related potassium channels from C. elegans, Drosophila, and mammals.

Load Sample Sequences 3: Load a large set of sequences for testing. This is a set of 45 putative odorant receptor proteins from C. elegans. The set pushes Bonsai methods in some ways because the set is so large and divergent. If you produce a multiple alignment, notice that the conservation is highly dispersed and subtle. I have run full multiple alignments of up to 243 sequences of this length with 120 MBytes of RAM assigned to Bonsai and had no obvious memory problems.

Load DNA Sample Sequences: Load a small set of sample DNA sequences. The DNA alignment has not been tuned and will not work as well as Clustal. It is also strictly limited in sequence length. This will be improved with time.

Load Sample Alignment: Load a sample multiple alignment to see how the alignment window works. The sample is an alignment of the kinase domains of diverse CaM Kinases. The NJ tree from this alignment is a nice example of the ancient origin of kinases (compare the divergence time of the three CaMK subfamilies to the divergence time for the metazoans as reflected in the CaMKII group).

About Bonsai: Credits, build date, and web address for downloading Bonsai.


James H. Thomas, Department of Genome Sciences, University of Washington, 9/25/2003