Inside Bonsai

Bonsai is written in Java and currently requires JDK 1.3 or later. This help page will not provide most of the detail about how the Bonsai code is structured, but I will endeavor to provide a taste of it.

To the best of my limited knowledge and skill, Bonsai takes full advantage of Java object orientation and the rich set of foundation classes. It consists of a very large number of Bonsai-specific classes and makes extensive use of inheritance. For example, all alignments are managed by subclasses (or subclasses of subclasses) of the abstract Align class, which defines the core properties and methods that all Align objects use. Parts of the code are clumsy because I was learning at the same time as writing Bonsai. I'm fitfully returning to these parts and writing cleaner code.

For developers, I am trying to maintain a reasonably complete set of Javadocs and scattered methods that are designed to output text, html, or other interchange formats from alignments etc. Feel free to request additional such methods or documentation. I am not yet ready to release source code.

Sequence class: a sequence is represented internally as a short[ ] (an array of primitive "shorts", which are essentially small integers). Each sequence object also has: a code property that can translate this short[ ] into a character String, a type property (PROTEIN, DNA, or RNA), a long that is an object-independent unique identifier for the sequence, and various other properties to store things such as the sequence name, comments, etc. The residue sequence of a Sequence object is immutable, with the exception of objects of the EvoSequence subclass, which adds mutability among other things. When the residue sequence of a Sequence object is modified, a new Sequence is created to represent the modified sequence.

SequenceList class: usually Sequence objects are managed as part of a list, which is represented by the SequenceList class. A SequenceList is mutable in the sense that immutable Sequence objects can be added and removed at will. Internally, the list is maintained as a LinkedList, which facilitates addition and removal of Sequence objects. Since access time is not a speed issue this works fine.

PhylogeneticTree class: this class may change substantially in the future to make it more flexible, but for now it is a custom implementation of a simple tree data structure. Because of the importance of node distances in phylogenetic trees, they do not match standard tree structures well (indeed they are closer to highly constrained sparse digraph data structures). A Bonsai tree always has a root (though this root is ignored for some purposes) and every internal node has exactly two children and one parent. Every node is a PTreeNode object. A leaf node will typically have a Sequence property and the PhylogeneticTree object will usually have a DistanceSet object and a 2-dimensional array of PairwiseAlignmentResult objects that together represent all of the pairwise comparison data. If the tree is displayed, each tree node will have a transient DisplayTreeNode object that manages display-specific details about the node.

Profile Class: [add description]

Serializing: currently Bonsai uses standard Java serializing mechanisms to save files in "Bonsai format". Given the developmental state of the project this is not ideal but it is easy. I'm thinking about this - anybody have any ideas? I've tried to plan the serialized classes, but it seems likely some will change in the future, creating backward compatibility problems.

Alignment Display: The main Bonsai innovation is a much-improved user interface. This is a very complex area and it is under development, so I won't describe much of it but here's a little bit. Score-shaded sequence alignment displays are actually large Java JTable objects, with a custom table cell renderer and other specializations. This makes them a bit slow for display of large data sets, but provides major advantages in flexibility and user selection.

Tree Display: tree display is achieved by representing graphical aspects of every tree node in a DisplayTreeNode object. This object knows how to draw itself completely, so drawing a complete tree is simply a matter of having each DisplayTreeNode draw itself. The trick is to pass each DisplayTreeNode several key values: its phylogenetic distance from the root, the (vertical) spacing between adjacent leaves, the number of leaves above and below it, and the nodes that are it's children. Essentially, it draws a branch of the correct length positioned with respect to the root and it draws a duplication bar between the left ends of its children's branches. By working up the tree from the leaves, every node knows exactly how to draw itself. A few extra rules get added at the leaves and root (e.g. a leaf doesn't draw a duplication bar). This approach is so simple and fast that when anything is changed about the tree (node rotation, distance correction etc.) it is changed in the underlying PhylogeneticTree and the complete set of DisplayTreeNode objects is recomputed and drawn. It works like a charm.


James H. Thomas, Department of Genome Sciences, University of Washington
11/8/2002