CF Research Translation Center and Research Development Program

University of Washington
UW Health Sciences, K-140
Genome Sciences, Box 357710
Seattle, WA 98195

Pilot 4: Computational Tools For Identifying Compositional Shifts In The CF Gut Microbiome

P.I.: Elhanan Borenstein, PhD
Associate Professor of Genome Sciences
Adjunct Associate Professor of Computer Science
External Professor, The Santa Fe Institute

Abstract: Cystic fibrosis (CF) is often associated with intestinal disease, which can lead to malnutrition and poor growth. This growth failure often persists despite nutrient and enzyme replacement therapy, suggesting that factors other than inadequate nutrient intake or malabsorption may contribute to malnutrition in children with CF. One proposed factor is the CF gut microbiome, which may influence growth and clinical outcomes through its effects on host metabolism, nutrition, and immune function.

To examine this hypothesis, the Cystic Fibrosis Research Translation Center at UW is currently characterizing the gut microbiomes of children with and without CF using massively parallel next-generation sequencing, and additional efforts are underway. These initiatives are generating metagenomic data that map, for the first time, the composition of the CF gut microbiome. However, because many factors affect gut microbiome composition, and because functional profiles tend to be broadly similar across samples, standard comparative analysis may fail to detect subtle but significant patterns. Advanced computational methods are therefore required to sift through these ultra-high-throughput data and pinpoint functional capacities of the microbiome that may be associated with CF and with clinical outcomes.

In this pilot project we will develop a suite of novel computational methods for identifying associations between the composition of the microbiome and specific host phenotypes, such as CF status and clinical parameters. These methods are specifically tailored to detect subtle differences in high-dimensional data with relatively small sample sizes, a common setting in comparative metagenomic analysis. We will focus on three complementary methods. First, we will develop a gene-set-based method, inspired by approaches used in microarray analysis, for improved identification of over- and under-represented functional categories in the microbiome. Second, we will develop a computational framework for co-occurrence-based grouping of genes in the microbiome and for dimension reduction, offering a more natural alternative to pathway-based grouping that accounts for inter-gene dependencies. Finally, we will develop a computational framework for analyzing species and gene abundances simultaneously and for harnessing these two data sources in an integrated manner. Each of these methods aims to address specific weaknesses of standard comparative metagenomic analysis and to increase the statistical power of such studies. We will make the methods developed in this pilot project available to the research community both as open-source software and as a web application.

Applying these tools to the microbiomes of children with and without CF will provide novel insights into functional shifts in the CF microbiome and their role in CF-related growth failure. The methods developed in this pilot study, and their application to CF, will lay the foundation for a large-scale study aimed at developing a comprehensive computational framework for improved comparative metagenomic analysis of the microbiome’s contribution to CF and to other human diseases.