Daggett Research Group | Research | Bioinformatics

University of Washington - College of Engineering - School of Medicine - Department of Bioengineering

Selected Projects

  • SQL data analysis
  • Property space analysis
  • Denatured state structural analysis
  • Native flexibility analysis
  • Wavelet analysis

Bioinformatics

The immense amount of data generated by Dynameomics and other projects cannot be stored, organized, retrieved, and examined using the traditional methods available in biology. To handle the dozens of terabytes of data that we have calculated efficiently, we have undertaken a number of bioinformatics initiatives. Traditional database techniques are not designed for data as dense and as unusually structured as ours, yet research will undoubtedly require simultaneous access to disparate pieces of our simulations. Meeting both needs requires a firm, coherent structure and organization.

Our Dynameomics database (available at www.dynameomics.org) is implemented in Microsoft SQL Server 2008. The database is split over several servers, each of which hosts a subset of the data; the servers are joined via a single unified directory. To simplify data access, the database makes extensive use of views. Although traditional SQL row-sets are a natural structure for some data types, we have also been experimenting with multidimensional OLAP cubes as a way to store high-dimensional data such as coordinates more efficiently. OLAP, which is accessible via the MDX query language, shows promise as a means of simplifying complex multi-simulation queries. Throughout the database, a strict organization is enforced so that extension and access remain easy as we transition our data to the public domain.
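The idea of hiding a split data set behind a view can be illustrated with a minimal sketch. It is not the Dynameomics schema: the table names, the column names, and the view name all_coords are hypothetical, and an in-memory SQLite database stands in for the separate SQL Server machines, with one table per "server".

```python
import sqlite3

# Hypothetical layout: each table stands in for the subset of data hosted on
# one server. None of these names come from the Dynameomics database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coords_server_a (
        sim_id INTEGER, step INTEGER, atom INTEGER, x REAL, y REAL, z REAL);
    CREATE TABLE coords_server_b (
        sim_id INTEGER, step INTEGER, atom INTEGER, x REAL, y REAL, z REAL);

    -- The view presents both subsets as one logical table, so queries do not
    -- need to know where a given simulation is stored.
    CREATE VIEW all_coords AS
        SELECT * FROM coords_server_a
        UNION ALL
        SELECT * FROM coords_server_b;
""")

# Made-up coordinates for two simulations, one per "server".
conn.executemany(
    "INSERT INTO coords_server_a VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 0, 1, 0.0, 0.0, 0.0), (1, 1, 1, 1.0, 0.0, 0.0)],
)
conn.executemany(
    "INSERT INTO coords_server_b VALUES (?, ?, ?, ?, ?, ?)",
    [(2, 0, 1, 0.0, 2.0, 0.0), (2, 1, 1, 0.0, 4.0, 0.0)],
)


def mean_x_by_simulation(connection):
    """Return (sim_id, row count, mean x) per simulation, read through the view."""
    query = (
        "SELECT sim_id, COUNT(*), AVG(x) FROM all_coords "
        "GROUP BY sim_id ORDER BY sim_id"
    )
    return connection.execute(query).fetchall()


summary = mean_x_by_simulation(conn)
print(summary)
```

The query names only the view, so data could be moved between the underlying tables without changing any analysis code. That is the kind of simplification views are meant to provide here.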

Additionally, traditional analysis techniques for simulations cannot be used for all research projects. Extracting information from such a large database requires new high-throughput analysis techniques that can quickly pinpoint items of interest or similarity and rapidly summarize entire simulations. To this end, we have been developing several new techniques. One of these is the analysis of property space, which can be used to rapidly differentiate between native-state and denatured structures, enabling high-throughput discovery of transition states. Another is flexibility analysis, which summarizes the primary modes of an entire simulation in a single structure. In contrast, wavelet analysis examines fine-detail oscillations of atoms and has shown promise in pinpointing notable events in a simulation, such as the rearrangement of a helix.
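As a rough illustration of how a wavelet can pinpoint an event in a trajectory, the sketch below slides a Haar wavelet along a one-dimensional trace and reports where the response is strongest. It is a toy example under stated assumptions, not the analysis used in our work: the function names, the choice of the Haar wavelet, the half_width parameter, and the synthetic trace are all hypothetical.

```python
import math


def haar_coefficients(signal, half_width):
    """Undecimated Haar wavelet coefficients at a single scale.

    For every window of 2 * half_width samples, the coefficient is the sum of
    the first half minus the sum of the second half, scaled by
    1 / sqrt(2 * half_width).
    """
    if half_width < 1 or len(signal) < 2 * half_width:
        raise ValueError("signal is too short for the requested half_width")
    norm = math.sqrt(2 * half_width)
    coeffs = []
    for start in range(len(signal) - 2 * half_width + 1):
        first = sum(signal[start:start + half_width])
        second = sum(signal[start + half_width:start + 2 * half_width])
        coeffs.append((first - second) / norm)
    return coeffs


def locate_event(signal, half_width):
    """Return the index of the first sample after the sharpest change."""
    coeffs = haar_coefficients(signal, half_width)
    best_start = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    return best_start + half_width


# Synthetic trace (made-up numbers): a small oscillation around 0 that shifts
# to oscillating around 2 at index 12, loosely mimicking a rearrangement.
trace = [0.1 * (-1) ** i for i in range(12)]
trace += [2.0 + 0.1 * (-1) ** i for i in range(12, 24)]

event_index = locate_event(trace, half_width=4)
print(event_index)
```

The fast oscillation averages out inside each half-window, so the coefficients stay near zero until the window straddles the shift. Repeating the scan with several values of half_width would probe changes on different time scales.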

Relevant Publications