Sultanah Alshammari, Jessica Counihan, Long Feng
Rachel Gittelman, Rayna Harris, Jesse Hoff
Timothy Jones, Fei Wang, Joshua Welch
Wei Xie, Lisheng Zhou

Sultanah Alshammari

Computer Science
University of North Texas

Poster 88

X-team 2

Whitepaper:
Modeling Disease Spread at Global Mass Gatherings

The spread of infectious diseases at global mass gatherings can pose health threats to both the hosting country and the participants’ countries of origin. The travel patterns at the end of these international events can initiate a global epidemic within a short period of time. Advanced surveillance systems and computational models are essential tools to estimate, study, and control epidemics at mass gatherings. In this paper, we present our ongoing efforts to model disease spread during the Hajj. We discuss the different aspects of modeling infectious diseases and those related to the Hajj season.

Bio:

I am an international PhD student at the University of North Texas in the Department of Computer Science and Engineering. My research area is computational epidemiology. My current focus is modeling the spread of infectious diseases during mass gathering (MG) events. At present we are studying the Hajj, the pilgrimage to Makkah, Saudi Arabia, which is one of the largest annual MG events in the world. The Hajj is characterized by limited space and time and by a heterogeneous gathering of over two million Muslim pilgrims from over 189 countries. While some of the infectious diseases to be modeled may not be endemic in the host country, they might be endemic in some visitors’ countries. In our study, we aim to highlight the important aspects of modeling disease spread during MGs.

Interest areas:
Computational Epidemiology, Mass Gatherings, Bioinformatics

Jessica Counihan

Metabolic Biology
UC Berkeley

Poster 51

X-team 2

Whitepaper:
Use of big data platforms and the future directions of proteomic analysis and visualization

Proteomics, the study of protein products expressed by the genome, has become one of the leading high-throughput technologies in biology due to an increased interest in system-wide analyses of proteins. Mapping complex proteomics data to biological processes has become impossible by manual means, and computer-aided data analysis is essential for further progress of the field.

Bio:

I am entering my third year of graduate school at UC Berkeley in the Nomura Research Group. Our group is focused on discovering new therapeutic strategies to treat complex human diseases. We develop and apply innovative chemoproteomic and metabolic platforms to map metabolic and chemical drivers of disease and to develop next-generation pharmacological tools to understand biological processes and eradicate pathological conditions, including cancer. One major chemoproteomic platform used in our lab is activity-based protein profiling (ABPP). ABPP relies on the selective targeting of a single enzyme, or a functionally related family of proteins, using a chemical probe composed of a protein-reactive electrophile and a reporter group. I am currently using a cysteine-reactive probe to look for enzymes that are upregulated in aggressive cancer cell lines relative to their nonaggressive counterparts.

Interest areas:
Cancer Metabolism

Long Feng

Department of Statistics and Biostatistics
Rutgers University

Poster 90

X-team 2

Whitepaper:
Methodological Issues of using graphical models for econometrics

Over the past decades, graphical models have become an increasingly popular big data technique for finding the conditional dependence structure among millions of target random variables. In this paper, we consider the issue of using graphical models for economic datasets. Since economic datasets usually change over time, we discuss the problem of combining graphical models with time series models.

Bio:

I am currently a third-year Ph.D. student in statistics at Rutgers University. My research focuses on two topics: high-dimensional data and nonparametric Bayesian analysis. Both topics involve big data.

Interest areas:
High dimensional data, Bayesian Analysis

Rachel Gittelman

Genome Sciences
University of Washington

Poster 82

X-team 2

Whitepaper:
The challenges of high-dimensional expression quantitative trait locus inference

This paper discusses ways of increasing power to detect associations in high-dimensional genotype-expression data sets that span multiple tissue types. Although methods exist that effectively combine data across tissues for joint analysis, they are limited in scope and not widely used currently.

Bio:

I am a PhD student in Josh Akey's lab. I'm primarily interested in the evolution of gene regulation in humans and non-human primates, and how gene expression variation contributes to human phenotypes.

Interest areas:
genomics

Rayna Harris

Integrative Biology, Center for Computational Biology and Bioinformatics
The University of Texas at Austin

Poster 12

X-team 2

Whitepaper:
White Paper: Integrative Neuroscience

The brain is a fascinatingly complex organ that has been the subject of intense study for centuries, but many mysteries about the brain still remain. Current brain initiatives call for multi-scale integration of the activity and structure of the brain in order to elucidate neural circuit dynamics and link them to brain function. As cutting-edge technologies are being developed, neuroscience critically depends on developing data repositories, analysis tools, and theories for integrating real-time genomic, connectomic, optogenetic, electrophysiological, and behavioral data.

Bio:

I am a graduate research assistant and Ph.D. candidate at The University of Texas at Austin. I collaborate with Dr. Hans Hofmann to study functional variability of single neurons, to enhance transdisciplinary training in data-intensive approaches, and to promote the sharing of ideas and expertise across typical disciplinary boundaries. My published research spans a variety of topics and utilizes multiple approaches to better understand biological phenomena. My thesis research aims to identify single neuron gene networks that respond to spatial and social learning and memory. My educational efforts aim to transform graduate training in computational and molecular approaches to understand biological phenomena. I love bringing people together to talk science, especially when I see that these interactions foster new relationships and projects. Working with the Center for Computational Biology and Bioinformatics, I have many opportunities to build a community of Big Data users in the natural sciences.

Interest areas:
Neuroscience, Genomics, Transdisciplinary education

Jesse Hoff

Animal Science/Genetics
University of Missouri

Poster 46

X-team 2

Whitepaper:
Incorporation of Genomic Data in US Cattle Breeding and Production

We are analyzing appropriate practices for routine incorporation of genetic information in both the selection and care of US livestock. The rapid decrease in the cost of genetic data from chips or short-read sequencing has led to the accumulation of large, profitable data sets that provide dense quantification of the genetic component of livestock production. Unlike human or model organism genetics, the regulatory environment surrounding livestock genomics has enabled immediate application of cutting-edge genomic technology in a commercial setting. Low-cost genomic data informs high-leverage decision making at farms ranging in size and technical expertise, enabled by well-shared central genomic data repositories that inform genomic breeding models. We seek to develop tools that lower costs and allow genomic information to provide value across the whole livestock production cycle, from reproduction to immunity, ecological sustainability, and carcass quality.

Bio:

I am a PhD student studying genomic prediction and population genetics in cattle.

Interest areas:
Genomics, Animal Breeding

Timothy Jones

Biological Sciences
Louisiana State University

Poster 22

X-team 2

Whitepaper:
Seeing is believing with data visualization

This whitepaper outlines state-of-the-art biodiversity identification practices and new visual methods, using Kingdom Plantae and its children as models.

Bio:

My expertise is working on computational identification issues of large plant genera and families. For my doctoral research at Louisiana State University, I focused on visual identification methods of vascular plants using only off-the-shelf technologies to marry both old and new data. The first published work used Microsoft’s .NET framework and the Silverlight plug-in and featured the super-genus Carex, or sedges. The second work concentrated on all grass genera of Louisiana and incorporated the textual resources of both grass volumes from the ongoing Flora of North America project. The third iteration of this method examined Kingdom Plantae as a whole. All three projects now exclusively use JavaScript without the need for plug-ins. My latest publication examines the impact of my web-based works plus other online works from like-minded collaborators such as Tropicos, eFloras, FloraBase, SEINet, and the Global Biodiversity Information Facility.

Interest areas:
Biodiversity Informatics, Data visualization, Natural User Interface

Fei Wang

Waksman Institute
Rutgers University

Poster 83

X-team 2

Whitepaper:
Personalized Disease Networks: A New Approach for Understanding and Predicting Complex Processes with Applications to Cardiovascular Diseases

We propose to build personalized patient networks, which represent the disease evolution of patients across subsequent hospitalizations. These networks are individual-based and represent the evolutionary steps of various cardiovascular conditions, diseases, and procedures. We use the word network to mean a data-based construction in the form of a connected graph, from which we derive the disease evolution pathway of each individual patient. A dataset of networks will be obtained once we build each individual network. We will then cluster the personalized networks to 1) summarize more general networks within each cluster and 2) predict the cardiovascular mortality for each cluster. This network algorithm can also be adapted to other personalized pattern analyses.

Bio:

Genomic data analysis, cardiovascular disease data analysis

Interest areas:
Biological data science

Joshua Welch

Computer Science
The University of North Carolina at Chapel Hill

Poster 67

X-team 2

Whitepaper:
Finding Life in High-Dimensional Space: Identifying Cell Types from Single Cell Gene Expression Data

Recent technological advances have enabled measurements of the genes that individual cells use. Data from these experiments provide a treasure trove of information about the functions of individual genes in specifying the properties of different types of cells, but computational methods for interpreting these large, high-dimensional datasets are lacking. In this paper, I describe three challenges in identification of cell types from single cell gene expression data and identify three corresponding datasets for developing and benchmarking approaches to address these challenges.

Bio:

My research focuses on developing algorithmic and statistical methods for studying posttranscriptional gene regulation using RNA sequencing data. I have worked on projects requiring the analysis of many different types of RNA sequencing data, including RNA-seq, small RNA-seq, single cell RNA-seq, HITS-CLIP, PAR-CLIP, GRO-seq, TAIL-seq, and EnD-seq. My work has already resulted in 2 first author papers, 4 co-author papers, and 2 first-author papers in preparation. The recent meteoric rise of single cell RNA-seq technologies (named “Method of the Year 2013” by Nature Methods) opens a number of exciting new directions for research. Single cell resolution allows characterization of rare or unknown cell types, enables dissection of differentiation processes, and aids in decoding regulatory networks responsible for healthy and diseased states of cells. However, current single cell RNA-seq studies are limited by crucial gaps in existing computational methods. For my latest research project, I am developing a computational method for identifying sequential changes through which cells progress during a cellular process. These methods promise to enable deeper understanding of human gene regulation and the biology of disease.

Interest areas:
Computational Biology, High-Dimensional Data Analysis, Machine Learning

Wei Xie

Department of Electrical Engineering & Computer Science
Vanderbilt University

Poster 39

X-team 2

Whitepaper:
Collaborative data science without violating privacy: a case study from genome research

Data privacy is an important issue for many data science disciplines involving human subjects. Here we take genome research as a representative case study to illustrate the privacy concerns and countermeasures. We develop a novel cryptography-based method to enable collaborative studies via meta-analysis without violating privacy. We also show the relevance of this method to the wider research community of data science.

Bio:

I have been pursuing a PhD in computer science since 2011. I work on genome privacy and privacy-preserving machine learning, proposing novel cryptography-based solutions for preserving privacy in machine learning and statistical algorithms, with applications to genetics and biomedicine.

Interest areas:
Data privacy, Genetics, Machine learning

Lisheng Zhou

Department of Genetics
Rutgers, the State University of New Jersey

Poster 56

X-team 2

Whitepaper:
A tool of genetic marker discovery for general biologists

This white paper describes a challenge most biologists face: they lack the computational and statistical skills needed to identify genetic markers from the growing volume of data made available by advanced sequencing technologies.

Bio:

I am a doctoral student at Rutgers University. I am interested in the integration of computational tools to solve biological questions. I am passionate about data mining, big data applications, and software development. My current research focuses on developing a statistical test of association between single nucleotide polymorphisms (SNPs) and disease traits.

Interest areas:
Computational Biology, Big Data, Statistical Genetics