|Sultanah Alshammari||Jessica Counihan||Long Feng|
|Rachel Gittelman||Rayna Harris||Jesse Hoff|
|Timothy Jones||Fei Wang||Joshua Welch|
|Wei Xie||Lisheng Zhou|
Modeling Disease Spread at Global Mass Gatherings
The spread of infectious diseases at global mass gatherings can pose health threats to both the host country and the participants’ countries of origin. Travel patterns at the end of these international events can initiate a global epidemic within a short period of time. Advanced surveillance systems and computational models are essential tools for estimating, studying, and controlling epidemics at mass gatherings. In this paper, we present our ongoing efforts to model disease spread during the Hajj. We discuss the different aspects of modeling infectious diseases as they relate to the Hajj season.
I am an international PhD student in the Department of Computer Science and Engineering at the University of North Texas. My research area is computational epidemiology. My current focus is modeling the spread of infectious diseases during mass gathering (MG) events. At present we are studying the Hajj, the pilgrimage to Makkah, Saudi Arabia, which is one of the largest annual MG events in the world. The Hajj is characterized by limited space and time and by a heterogeneous gathering of over two million Muslim pilgrims from over 189 countries. Some of the infectious diseases to be modeled may not be endemic in the host country but may be endemic in some visitors’ countries. In our study, we aim to highlight the important aspects of modeling disease spread during MGs.
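To make "computational models" concrete, the sketch below integrates the textbook SIR (susceptible, infected, recovered) compartmental model with forward Euler steps. It is only an illustration under simple assumptions: one well-mixed population, constant transmission rate `beta` and recovery rate `gamma`, and no travel. It is not the authors' Hajj model, and all parameter values in the usage note are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the classic SIR equations.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Assumes a single, well-mixed population with constant rates.
    Returns one (S, I, R) tuple per day, including day 0.
    """
    n = float(population)
    s = n - initial_infected
    i = float(initial_infected)
    r = 0.0
    steps_per_day = int(round(1 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical values: 2 million pilgrims, 10 index cases, R0 = beta / gamma = 2.5
    trajectory = simulate_sir(2_000_000, 10, beta=0.5, gamma=0.2, days=30)
    for day in (0, 10, 20, 30):
        s, i, r = trajectory[day]
        print(f"day {day:2d}: S={s:,.0f} I={i:,.0f} R={r:,.0f}")
```

Capturing the post-event travel patterns described above would require extending this to a metapopulation model with one compartment set per country of origin; the single-population version only shows the core dynamics.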
|Computational Epidemiology||Mass Gatherings||Bioinformatics|
Use of big data platforms and the future directions of proteomic analysis and visualization
Proteomics, the study of the protein products expressed by the genome, has become one of the leading high-throughput technologies in biology owing to increased interest in system-wide analyses of proteins. Mapping complex proteomics data to biological processes is no longer feasible by manual means, and computer-aided data analysis is essential for further progress in the field.
I am entering my third year of graduate school at UC Berkeley in the Nomura Research Group. Our group is focused on discovering new therapeutic strategies to treat complex human diseases. We develop and apply innovative chemoproteomic and metabolic platforms to map metabolic and chemical drivers of disease and to develop next-generation pharmacological tools to understand biological processes and eradicate pathological conditions, including cancer. One major chemoproteomic platform used in our lab is activity-based protein profiling (ABPP). ABPP relies on the selective targeting of a single enzyme, or a functionally related family of proteins, using a chemical probe composed of a protein-reactive electrophile and a reporter group. I am currently using a cysteine-reactive probe to look for enzymes that are upregulated in aggressive cancer cell lines relative to their nonaggressive counterparts.
Department of Statistics and Biostatistics
Methodological Issues in Using Graphical Models for Econometrics
Over the past decades, graphical models have become an increasingly popular big data technique for finding the conditional dependence structure among millions of target random variables. In this paper, we consider the issue of using graphical models for economic datasets. Since economic datasets usually change over time, we also discuss the problem of combining graphical models with time series models.
I am currently a third-year Ph.D. student in statistics at Rutgers University. My research focuses on two topics, high-dimensional data and nonparametric Bayesian analysis, both of which involve big data.
|High dimensional data||Bayesian Analysis|
The challenges of high-dimensional expression quantitative trait locus inference
This paper discusses ways of increasing power to detect associations in high-dimensional genotype-expression data sets that span multiple tissue types. Although methods exist that effectively combine data across tissues for joint analysis, they are limited in scope and are not yet widely used.
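As a simple illustration of why combining tissues can increase power, the sketch below applies a weighted Stouffer's Z meta-analysis to per-tissue association z-scores for one SNP-gene pair. This is a generic textbook technique chosen for illustration, not necessarily one of the joint-analysis methods the paper discusses. It assumes the per-tissue tests are independent, an assumption that rarely holds when several tissues are sampled from the same donors.

```python
import math


def stouffer_z(z_scores, weights=None):
    """Combine per-tissue z-scores for one SNP-gene pair (weighted Stouffer's method).

    Z = sum(w_k * z_k) / sqrt(sum(w_k ** 2))

    Weights are often set to sqrt(sample size) for each tissue.
    Assumes the per-tissue tests are independent.
    """
    if weights is None:
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("need exactly one weight per tissue")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical z-scores from three tissues, each only modestly significant alone
    per_tissue = [1.8, 2.1, 1.5]
    combined = stouffer_z(per_tissue)
    print(f"combined Z = {combined:.3f}, two-sided p = {two_sided_p(combined):.2e}")
```

Here three individually modest signals combine to a Z of roughly 3.12, which is the intuition behind pooling evidence across tissues.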
I am a PhD student in Josh Akey's lab. I'm primarily interested in the evolution of gene regulation in humans and non-human primates, and how gene expression variation contributes to human phenotypes.
Integrative Biology, Center for Computational Biology and Bioinformatics
White Paper: Integrative Neuroscience
The brain is a fascinatingly complex organ that has been the subject of intense study for centuries, but many of its mysteries remain. Current brain initiatives call for multi-scale integration of brain activity and structure in order to elucidate neural circuit dynamics and link them to brain function. As cutting-edge technologies are developed, neuroscience critically depends on data repositories, analysis tools, and theories for integrating real-time genomic, connectomic, optogenetic, electrophysiological, and behavioral data.
I am a graduate research assistant and Ph.D. candidate at The University of Texas at Austin. I collaborate with Dr. Hans Hofmann to study the functional variability of single neurons, to enhance transdisciplinary training in data-intensive approaches, and to promote the sharing of ideas and expertise across typical disciplinary boundaries. My published research spans a variety of topics and uses multiple approaches to better understand biological phenomena. My thesis research aims to identify single-neuron gene networks that respond to spatial and social learning and memory. My educational efforts aim to transform graduate training in computational and molecular approaches to understanding biological phenomena. I love bringing people together to talk science, especially when these interactions foster new relationships and projects. Working with the Center for Computational Biology and Bioinformatics, I have many opportunities to build a community of Big Data users in the natural sciences.
Incorporation of Genomic Data in US Cattle Breeding and Production
We are analyzing appropriate practices for routinely incorporating genetic information into both the selection and the care of US livestock. The rapid decrease in the cost of genetic data from genotyping chips or short-read sequencing has led to the accumulation of large, profitable data sets that provide dense quantification of the genetic component of livestock production. Unlike in human or model-organism genetics, the regulatory environment surrounding livestock genomics has enabled immediate application of cutting-edge genomic technology in a commercial setting. Low-cost genomic data inform high-leverage decision making at farms of varying size and technical expertise, enabled by widely shared central genomic data repositories that inform genomic breeding models. We seek to develop tools that lower costs and allow genomic information to provide value across the whole livestock production cycle, from reproduction to immunity, ecological sustainability, and carcass quality.
I am a PhD student studying genomic prediction and population genetics in cattle.
Seeing is believing with data visualization
This white paper outlines state-of-the-art biodiversity identification practices and new visual methods, using Kingdom Plantae and its subordinate taxa as models.
|Biodiversity Informatics||Data visualization||Natural User Interface|
Personalized Disease Networks: A New Approach for Understanding and Predicting Complex Processes with Applications to Cardiovascular Diseases
We propose to build personalized patient networks that represent the disease evolution of patients across subsequent hospitalizations. These networks are individual-based and represent the evolutionary steps of various cardiovascular conditions, diseases, and procedures. We use the word "network" to denote a data-based construction in the form of a connected graph, from which we derive the disease evolution pathway of each individual patient. Building a network for each patient yields a dataset of networks. We then cluster the personalized networks in order to (1) summarize a more general network for each cluster and (2) predict cardiovascular mortality for each cluster. This network algorithm can also be adapted to other personalized pattern analyses.
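The pipeline above (one network per patient, then clustering, then a per-cluster summary and mortality estimate) can be sketched as follows. Every concrete choice here is an assumption for illustration, not the authors' algorithm: each hospitalization is a set of condition codes, edges link every condition at one stay to every condition at the next, network similarity is the Jaccard index of edge sets, clustering is single-linkage with a similarity threshold, and the "more general" network is a consensus of edges shared by a fraction of the cluster. The condition labels in the usage example are hypothetical.

```python
from itertools import combinations


def build_patient_network(hospitalizations):
    """Directed edge set linking each condition at one stay to each condition at the next.

    `hospitalizations` is a chronologically ordered list of sets of condition codes.
    """
    edges = set()
    for earlier, later in zip(hospitalizations, hospitalizations[1:]):
        for a in earlier:
            for b in later:
                edges.add((a, b))
    return frozenset(edges)


def jaccard(a, b):
    """Jaccard similarity of two edge sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_networks(networks, threshold):
    """Single-linkage clustering: link two patients if their similarity >= threshold."""
    parent = list(range(len(networks)))

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(networks)), 2):
        if jaccard(networks[i], networks[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(networks)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def consensus_network(networks, members, min_fraction=0.5):
    """Summary network: edges present in at least `min_fraction` of the cluster's members."""
    counts = {}
    for m in members:
        for edge in networks[m]:
            counts[edge] = counts.get(edge, 0) + 1
    return {edge for edge, c in counts.items() if c / len(members) >= min_fraction}


def cluster_mortality(clusters, died):
    """Crude per-cluster mortality rate from a list of per-patient outcome flags."""
    return [sum(died[m] for m in members) / len(members) for members in clusters]


if __name__ == "__main__":
    patients = [
        [{"HTN"}, {"CAD"}, {"HF"}],
        [{"HTN"}, {"CAD"}, {"HF"}, {"HF"}],
        [{"AF"}, {"STROKE"}],
    ]
    nets = [build_patient_network(p) for p in patients]
    groups = cluster_networks(nets, threshold=0.5)
    print(groups)
    print(cluster_mortality(groups, died=[True, False, False]))
```

A real analysis would replace the crude mortality rate with a proper predictive model, but the sketch shows how individual networks become comparable objects that can be clustered.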
Genomic data analysis, cardiovascular disease data analysis
|Biological data science|
Finding Life in High-Dimensional Space: Identifying Cell Types from Single Cell Gene Expression Data
Recent technological advances have enabled measurement of the genes expressed by individual cells. Data from these experiments provide a treasure trove of information about the functions of individual genes in specifying the properties of different cell types, but computational methods for interpreting these large, high-dimensional datasets are lacking. In this paper, I describe three challenges in identifying cell types from single-cell gene expression data and identify three corresponding datasets for developing and benchmarking approaches to address these challenges.
My research focuses on developing algorithmic and statistical methods for studying posttranscriptional gene regulation using RNA sequencing data. I have worked on projects requiring the analysis of many different types of RNA sequencing data, including RNA-seq, small RNA-seq, single-cell RNA-seq, HITS-CLIP, PAR-CLIP, GRO-seq, TAIL-seq, and EnD-seq. My work has already resulted in two first-author papers and four co-author papers, with two more first-author papers in preparation. The recent meteoric rise of single-cell RNA-seq technologies (named “Method of the Year 2013” by Nature Methods) opens a number of exciting new research directions. Single-cell resolution allows characterization of rare or unknown cell types, enables dissection of differentiation processes, and aids in decoding the regulatory networks responsible for healthy and diseased cell states. However, current single-cell RNA-seq studies are limited by crucial gaps in existing computational methods. For my latest research project, I am developing a computational method for identifying the sequential changes through which cells progress during a cellular process. Such methods promise to enable a deeper understanding of human gene regulation and the biology of disease.
|Computational Biology||High-Dimensional Data Analysis||Machine Learning|
Department of Electrical Engineering & Computer Science
Collaborative data science without violating privacy: a case study from genome research
Data privacy is an important issue for many data science disciplines involving human subjects. Here we take genome research as a representative case study to illustrate the privacy concerns and countermeasures. We develop a novel cryptography-based method to enable collaborative studies via meta-analysis without violating privacy. We also show the relevance of this method to the wider research community of data science.
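The abstract does not describe the novel cryptographic method itself, so the sketch below illustrates a different, standard building block for privacy-preserving meta-analysis: additive secret sharing, which lets several sites compute the sum of their allele counts without any site revealing its own count. Everything here is an assumption for illustration (the prime modulus, semi-honest parties that follow the protocol and do not collude, and counts smaller than the modulus); it is not the author's method. It uses only Python's standard `secrets` module.

```python
import secrets

MODULUS = 2**61 - 1  # a large prime; assumed to exceed any count being summed


def share(value, n_parties, modulus=MODULUS):
    """Split a non-negative integer into n additive shares that sum to value mod modulus.

    Any n - 1 of the shares are uniformly random and reveal nothing about `value`.
    """
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(site_values, modulus=MODULUS):
    """Simulate n sites jointly computing sum(site_values) via additive secret sharing.

    Site i splits its value and sends share j to party j. Each party adds up the
    shares it received and publishes only that partial sum; the partial sums add
    up to the true total. Assumes semi-honest, non-colluding parties.
    """
    n = len(site_values)
    # all_shares[i][j] is the share of site i's value held by party j
    all_shares = [share(v, n, modulus) for v in site_values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)]
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    # Hypothetical alternate-allele counts among cases at three collaborating sites
    case_alt_counts = [120, 95, 210]
    print("pooled case alt-allele count:", secure_sum(case_alt_counts))
```

Pooled counts obtained this way could feed a standard association statistic, so that the collaborators learn only the meta-analysis totals and never the per-site data.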
I have been pursuing a PhD in computer science since 2011. I work on genome privacy and privacy-preserving machine learning, proposing novel cryptography-based solutions for preserving privacy in machine learning and statistical algorithms, with applications to genetics and biomedicine.
|Data privacy||Genetics||Machine learning|
Department of Genetics
A tool for genetic marker discovery for general biologists
This white paper describes a challenge facing most biologists: advanced sequencing technologies make ever more data available for identifying genetic markers, but many biologists lack the computational and statistical skills needed to do so.
I am a doctoral student at Rutgers University. I am interested in integrating computational tools to solve biological questions, and I am passionate about data mining, big data applications, and software development. My current research focuses on developing a statistical test of association between single nucleotide polymorphisms (SNPs) and disease traits.
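For readers new to SNP association testing, the sketch below shows the classic allelic test: a Pearson chi-square test with one degree of freedom on a 2x2 table of allele counts in cases versus controls. This is a standard baseline for illustration, not the new test under development. It assumes counts of alleles (two per individual) and Hardy-Weinberg equilibrium; genotype-based alternatives such as the Cochran-Armitage trend test relax the latter. The counts in the usage example are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows: cases, controls. Columns: alternate allele, reference allele.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][c] + table[1][c] for c in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column of the table needs a nonzero total")

    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected

    # With 1 df, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 case alleles vs 10 of 100 control alleles are alternate
    stat, p = allelic_chi_square(30, 70, 10, 90)
    print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

In a genome-wide setting this test is repeated for every SNP, so the resulting p-values must be corrected for multiple testing.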
|Computational Biology||Big Data||Statistical Genetics|