Markov Block Settings

Introduction

The purpose of this aspect of Bonsai sequence evolution modeling is to approximate realistic domains of conservation over evolutionary time. Alignment of real proteins invariably shows that some segments of the primary sequence remain more conserved than others. The pattern of conserved domains varies substantially among protein families, both in terms of the positions of the domains and their relative lengths. Some protein families have one or two relatively long segments of high conservation separated by long regions of varying but much lower conservation or none at all. Others have multiple short regions of conservation (single amino acids at the extreme) interspersed with short regions of lower conservation. All of these features can be simulated using a Markov chain method that assigns degrees of conservation to protein regions according to probabilistic rules.

You can use the Markov Chain Settings in a simple manner by selecting one of four preset options: "Fine", "Grainy", "Chunky", and "Blocky". These produce conservation patterns that range from finely dispersed short regions of conservation to long blocks of conservation that are more widely spaced. The Markov Chain parameters for each preset will be displayed as you select it.

To understand what these settings mean and how to make other conservation patterns yourself, you have to understand Markov chains. The current Bonsai Markov chain has four states. Each position along the starting protein sequence will be assigned to one of these four states, and that state will remain fixed throughout an evolutionary simulation. Each state determines the likelihood of mutations occurring at that position during evolution. The Markov chain is used to assign the state at the outset of the evolution.

The Markov chain starts at the N-terminus of the protein and marches down the protein to the end, one amino acid residue at a time. At each residue, the Markov chain first determines what the state of the residue will be (based on the state of the previous residue), and then assigns the state-associated mutation probability to that position. Each state has assigned probabilities that the next residue will remain in the same state or that it will adopt one of the other three states. The table at the top of this settings window displays these probabilities. Since each of the four states can have different probabilities of staying in the same state or changing to each of the other states, the pattern of transitions from state to state is highly tunable. Since all of these transitions are probabilistic (Bonsai uses a random number generator to choose among them at each step), the exact pattern of states assigned will be different each time the chain is run, but it will follow characteristic general patterns. The result will be a "look and feel" to the alignment generated after extensive evolution. To get a sense of this you might try the most extreme default states, run simulations with each, and look at the alignments.
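The walk described above can be sketched in a few lines of Python. This is only an illustration, not Bonsai's actual code: the function name assign_states, the example table EXAMPLE_TRANSITIONS, and the choice to start the first residue in state 0 are all assumptions, since this page does not say how the first residue's state is chosen.

```python
import random

# Hypothetical 4x4 transition table (NOT one of the Bonsai presets).
# Row = current state, column = next state; each row sums to 1.0.
EXAMPLE_TRANSITIONS = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.10, 0.70],
]

def assign_states(length, transitions, start_state=0, rng=None):
    """Assign one Markov state to each residue, from N-terminus to C-terminus.

    Assumption: the first residue is placed in start_state.
    """
    rng = rng or random.Random()
    n_states = len(transitions)
    states = []
    current = start_state
    for _ in range(length):
        states.append(current)
        # Pick the next residue's state using the current state's row.
        current = rng.choices(range(n_states), weights=transitions[current])[0]
    return states

if __name__ == "__main__":
    # The exact pattern differs from run to run, as described above.
    print(assign_states(40, EXAMPLE_TRANSITIONS))
```

Running this several times gives a different list of states each time, but with the same general character, which is the "look and feel" effect described above.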

It is easy to get a feel for how the presets "Fine", "Grainy" etc. are reflected in the transition probabilities: click on various default options and observe the transition probability table. Fine patterns result when states have a high probability of changing to other states, so that the protein chatters rapidly from state to state as you move along it. Blocky patterns result when the states have a high probability of remaining the same from step to step (high values along the top-left to bottom-right diagonal). Under these conditions, the protein has longer blocks of each state that change less often. In the extreme, you can introduce a "trapping" state - one that cannot change back to another state once it is reached (set all the P-values for transition out of this state to zero). During a Markov chain run, once this trapping state is reached, all subsequent residues will be assigned the trapping state.
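To put a rough number on "fine" versus "blocky": in any Markov chain, if a state stays put with probability p at each step, the length of an unbroken run of that state follows a geometric distribution with mean 1/(1 - p). The short Python sketch below computes this; the function name expected_block_length is made up for illustration and is not part of Bonsai.

```python
def expected_block_length(p_stay):
    """Mean number of consecutive residues in a state whose
    stay-in-same-state probability (the diagonal value) is p_stay."""
    if not 0.0 <= p_stay <= 1.0:
        raise ValueError("p_stay must be between 0 and 1")
    if p_stay == 1.0:
        # A trapping state: once entered, it is never left.
        return float("inf")
    return 1.0 / (1.0 - p_stay)

if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75, 1.0):
        print(p, expected_block_length(p))
```

For example, a diagonal value of 0.5 gives blocks averaging 2 residues, while 0.75 gives blocks averaging 4 residues, so small increases near 1.0 lengthen the blocks quickly.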


Changing Settings

In the State Transition Probabilities table, the top row and left column label the four Markov States. The transition probability from state X (row) to state Y (column) is given at their intersection in the table. You can edit any cell in the table to change the value. Be sure to press "enter" or select another table cell after your edit (I'll fix this one of these days). When done, also be sure to click Save Settings before closing the window. The state transition probabilities in each row must (by definition) sum to 1.0. To ensure this, when you click Save Settings, the values in each row are adjusted while maintaining their relative values. You can use this to your advantage: change one value, and the others in that row are adjusted to compensate when you save.
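One simple way to adjust a row "while maintaining relative values" is to divide each entry by the row's total. The Python sketch below shows that idea; it is an assumption about how the adjustment might work, not a description of Bonsai's actual code, and the name normalize_rows is invented for the example.

```python
def normalize_rows(table):
    """Rescale each row so it sums to 1.0, keeping the ratios between entries."""
    normalized = []
    for row in table:
        total = sum(row)
        if total <= 0:
            raise ValueError("each row needs at least one positive value")
        normalized.append([value / total for value in row])
    return normalized

if __name__ == "__main__":
    # The first row was edited so that it no longer sums to 1.0.
    edited = [
        [2.0, 1.0, 1.0, 0.0],
        [0.25, 0.25, 0.25, 0.25],
    ]
    for row in normalize_rows(edited):
        print(row)
```

Under this scheme, a row entered as 2, 1, 1, 0 would be saved as 0.5, 0.25, 0.25, 0, so only the ratios you type matter.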

In the Mutation Probabilities panel, enter values between 0 and 1.0 for each state as you wish. Each state will suffer mutations in proportion to this value. [Behind the scenes, what actually happens during Bonsai protein evolution is that a protein residue is chosen and this residue position is queried for "permission" to allow a residue change. The permission probability depends on the residue state and the setting in this mutation probabilities panel. This corresponds approximately to random mutation followed by selection against changes at some sites more than others.]
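The "permission" step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Bonsai's implementation: the function name attempt_mutation is hypothetical, the position is chosen uniformly at random, and the replacement residue is drawn uniformly from the other 19 amino acids (this page does not say how Bonsai picks the position or the replacement).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def attempt_mutation(sequence, states, mutation_probs, rng):
    """Try one mutation on `sequence` (a list of one-letter residues).

    states[i] is the Markov state of position i; mutation_probs[s] is the
    permission probability for state s. Returns True if a change was made.
    """
    position = rng.randrange(len(sequence))
    permission = mutation_probs[states[position]]
    # rng.random() is in [0, 1), so 0.0 never permits and 1.0 always permits.
    if rng.random() < permission:
        # Assumption: uniform choice among the other residues.
        choices = [aa for aa in AMINO_ACIDS if aa != sequence[position]]
        sequence[position] = rng.choice(choices)
        return True
    return False

if __name__ == "__main__":
    rng = random.Random()
    seq = list("MKTAYIAKQR")
    states = [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
    probs = [0.0, 0.1, 0.5, 1.0]  # hypothetical values for the four states
    accepted = sum(attempt_mutation(seq, states, probs, rng) for _ in range(100))
    print("".join(seq), accepted)
```

With these example values, positions in state 0 never change, while positions in state 3 change every time they are chosen.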

To return to one of the standard default models, select one of the preset radio buttons. The values in the other display fields will change to reflect the new default model.


For those who know about Markov Chains, the Bonsai model at this time is very simple. In particular, it permits only a single emission from each state, namely the mutation probability. Though the number of states is currently fixed at four, you can of course effectively reduce that number by making some states unreachable. I may increase the number of possible states, but the value of this is questionable since four states already can produce protein evolutions with a natural look and feel.


James H. Thomas, Department of Genome Sciences, University of Washington. 8/1/2002