Yazhong Wang

Physics
Rutgers

Poster 1

X-team 10

Whitepaper:
Cosmological experiments in condensed matter system

We study topological defects in hexagonal manganites to help understand the evolution of our Universe through the Kibble-Zurek mechanism. The work requires measuring defect density over large areas and recording the coordinates of each vortex core in optical images, an analysis that currently takes several months. We are seeking a more efficient way to do this work, which would be of great benefit to future research.
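As an illustration of the kind of automation being sought, the sketch below finds candidate vortex cores as bright blobs using scikit-image and reports their coordinates and areal density. The synthetic image, blob size range, and threshold are assumptions for illustration; real domain images would be loaded with skimage.io.imread instead.

```python
# Minimal sketch (assumed approach): detect candidate vortex cores as bright
# blobs and report coordinates and density. All parameters are illustrative.
import numpy as np
from skimage.feature import blob_log

# Synthetic stand-in for an optical image: a few Gaussian "cores" on noise.
rng = np.random.default_rng(0)
image = 0.05 * rng.random((256, 256))
yy, xx = np.mgrid[:256, :256]
for cy, cx in [(60, 80), (150, 40), (200, 200), (90, 180)]:
    image += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 4.0 ** 2))

# Laplacian-of-Gaussian blob detection; the sigma range sets the expected core size.
blobs = blob_log(image, min_sigma=2, max_sigma=8, threshold=0.1)
coords = blobs[:, :2]                       # (row, col) of each candidate core
density = len(coords) / image.size          # cores per pixel (illustrative)
print(len(coords), "candidate cores; density =", density)
```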

Julie van der Hoop

Joint Program in Oceanography (Biology)
Massachusetts Institute of Technology - Woods Hole Oceanographic Institution

Poster 2

X-team 1

Whitepaper:
Integrating animal sensing systems

The next breakthroughs in wearable technology, for humans or animals, require integrated sensing systems.

Fred Morstatter

School of Computing, Informatics, and Decision Systems Engineering
Arizona State University

Poster 3

X-team 1

Whitepaper:
Discovering Bias in Big Social Media Data

One fundamental problem with social media mining is getting access to representative, reliable data. While companies like Facebook have massive amounts of data, they do not share this data with the research community at large. The few sites that do share their data do so through APIs that give researchers access to only a portion of the overall data generated on the site. Twitter, one example of a social media site that shares its data, allows researchers access to at most 1% of all posts generated on the site each day through its API; Twitter is perhaps the most lenient when it comes to sharing data with the research community. While Twitter’s APIs come as a welcome relief to those in the area of social media mining, their ability to represent the true activity on the social media site has become a concern to researchers in recent years. Finding representative samples of social media data is a widely acknowledged problem that researchers must address in order to ensure the veracity of their research results. Herein we define the problem and outline two state-of-the-art solutions.

Austin Arrington

Environmental Science / Ecosystem Restoration
SUNY College of Environmental Science and Forestry

Poster 4

X-team 4

Whitepaper:
Color Analysis of Crowdsourced Images for Ecological Monitoring

Remote sensing technology, such as satellite imagery, is a powerful tool for studying spatial ecology. However, understanding spatial ecology often requires finer scales than satellite imagery affords, and the need for “ground-truthing” still exists. Leveraging “Big Data,” or more specifically, geo-tagged and time-stamped images provided through open source online networks, may offer a solution to help better understand scale and pattern in ecological systems.

Chris Cacciapaglia

Biological sciences
PhD student

Poster 5

X-team 4

Whitepaper:
Climate change refuges in the oceans

Identifying coral reef refugia in the Pacific, Indian, and Atlantic Oceans under differing climate change scenarios, using climate-envelope models combined with high-resolution environmental data at a global scale.

James Morton

Computer Science
University of California San Diego

Poster 6

X-team 9

Whitepaper:
Uncovering the Unknown: A New Approach in Analyzing Microbiome Data

In microbiome studies, the process of normalizing samples is still the subject of intense debate. We argue that the most straightforward approach for normalizing samples is to calculate the proportions of species in each sample. In this paper we introduce a novel statistic for estimating multinomial proportions when the total number of possible species is unknown. We show why observed species abundances are poor estimators of the true proportions and how coverage estimators can improve the accuracy of true proportion estimates.
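To illustrate the gap between naive plug-in proportions and coverage-adjusted ones, here is a minimal sketch using a standard Good-Turing coverage correction. This particular estimator is shown only for illustration and is not necessarily the statistic the paper introduces.

```python
# Sketch: naive proportions vs. a Good-Turing coverage-adjusted estimate.
# counts[i] = observed reads for species i in one sample (toy data).
import numpy as np

counts = np.array([120, 45, 30, 8, 3, 1, 1, 1])   # toy abundances
N = counts.sum()

# Naive plug-in estimate: ignores species never observed in this sample.
p_naive = counts / N

# Good-Turing sample coverage: fraction of the community the sample "saw".
f1 = np.sum(counts == 1)          # number of singletons
coverage = 1.0 - f1 / N

# Coverage-adjusted proportions leave (1 - coverage) of probability mass
# for unseen species instead of forcing observed proportions to sum to 1.
p_adjusted = coverage * counts / N

print("naive sums to:", p_naive.sum())             # 1.0
print("adjusted sums to:", p_adjusted.sum())       # coverage < 1.0
print("mass left for unseen species:", 1 - coverage)
```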

Arif Khan

Computer Science
Purdue University

Poster 7

X-team 5

Whitepaper:
Large Scale Adaptive Anonymity via Parallel Approximate b-Matching

Data privacy is a necessary feature for data science applications. We discuss the potential of k-Anonymity, a privacy algorithm, in the context of big data. We show some of the limitations of k-Anonymity and propose a heuristic solution to address them. We also present the applicability of k-Anonymity to different domains of data science.
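For readers unfamiliar with the property itself, the small sketch below checks whether a table satisfies k-anonymity over a set of quasi-identifiers (every combination must be shared by at least k records). The records and attribute names are invented, and the generalization / b-matching algorithm that achieves anonymity is not shown.

```python
# Sketch of the k-anonymity property: every combination of quasi-identifier
# values must be shared by at least k records. Toy data for illustration.
import pandas as pd

records = pd.DataFrame({
    "zip":  ["08901", "08901", "08901", "08902", "08902"],
    "age":  ["30-39", "30-39", "30-39", "40-49", "40-49"],
    "diagnosis": ["flu", "cold", "flu", "flu", "cold"],   # sensitive attribute
})

def is_k_anonymous(df, quasi_identifiers, k):
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

print(is_k_anonymous(records, ["zip", "age"], k=2))   # True
print(is_k_anonymous(records, ["zip", "age"], k=3))   # False
```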

Alexandra Munoz

Environmental Medicine
New York University

Poster 8

X-team 9

Whitepaper:
Re-evaluating the paradigmatic presuppositions of molecular biology in the context of big data

Every piece of information that is extracted in data analysis also assumes a model – without the model the data would not tell you anything – there would be no context through which to relate the variables and the magnitude of the values would be meaningless. The molecular landscape is modeled in a DNA-centric manner that prioritizes certain types of information (singularities) over others (dynamic processes) and in turn constructs a system in which certain avenues of causality are not being fully integrated into the model. In turn, this paper critiques the current model and points to a direction for alternative exploration. The motivation for this work is to model the complexity of cancer in a new way, in an effort to expand the search area for the solution to cancer.

Colin Raffel

Electrical Engineering
Columbia University

Poster 9

X-team 8

Whitepaper:
Learning Efficient Representations for Sequence Retrieval

We explore the problem of matching sequences of high-dimensional vectors to entries in very large sequence databases. When utilizing dynamic time warping distance to compare sequences, the local distance calculations can be prohibitively expensive when the data's dimensionality and intrinsic sampling rate are high. We therefore motivate the need for methods which can learn efficient representations for sequence comparison and discuss potential applications of these techniques.
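A minimal sketch of the dynamic time warping distance referenced above shows why its cost grows with both sequence length and feature dimensionality; production systems use heavily optimized implementations rather than this naive version.

```python
# Sketch: naive dynamic time warping (DTW) between two sequences of
# D-dimensional vectors. Cost is O(len(x) * len(y)) local distance
# computations, each O(D) -- the expense the paper seeks to avoid.
import numpy as np

def dtw_distance(x, y):
    """x: (n, D) array, y: (m, D) array; returns the DTW distance."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

rng = np.random.default_rng(0)
a, b = rng.normal(size=(100, 12)), rng.normal(size=(120, 12))
print(dtw_distance(a, b))
```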

Ashlynn Daughton

Systems Analysis and Surveillance
Los Alamos National Lab

Poster 10

X-team 6

Whitepaper:
Use of Historic Disease Data to Facilitate Awareness and Inform Control Measures

Infectious diseases have recognizable patterns that have been documented for decades, but have not been fully exploited. We have developed an application that seeks to use similarities in historic infectious disease outbreak data to inform situational awareness of current outbreaks for a wide number of infectious diseases, even in contexts where minimal data is available.

Benjamin Weinstein

Ecology and Evolution
Stony Brook University

Poster 11

X-team 10

Whitepaper:
A pipeline for combining crowd-sourced images and computer vision to monitor plant flowering

Using images gathered from the Flickr photo-sharing site to collect data on the timing of flowering plants in Mt. Rainier National Park.

Rayna Harris

Integrative Biology, Center for Computational Biology and Bioinformatics
The University of Texas at Austin

Poster 12

X-team 2

Whitepaper:
White Paper: Integrative Neuroscience

The brain is a fascinatingly complex organ that has been the subject of intense study for centuries, but many mysteries about the brain still remain. Current brain initiatives call for multi-scale integration of the activity and structure of the brain in order to elucidate neural circuit dynamics and link them to brain function. As cutting-edge technologies are developed, neuroscience critically depends on developing data repositories, analysis tools, and theories for integrating real-time genomic, connectomic, optogenetic, electrophysiological, and behavioral data.

Raghava Mutharaju

Computer Science
Wright State University

Poster 13

X-team 6

Whitepaper:
Distributed Reasoning over Ontology Streams and Large Knowledge Base

With the rapid increase in the velocity and the volume of data, it is becoming increasingly difficult to effectively analyze the data so as to extract knowledge from it. Use of background knowledge (domain knowledge captured in the form of ontologies) and reasoning (to correlate and infer facts) can prove to be useful in tackling the Big Data monster. But existing reasoning approaches are not scalable. In this paper, we present a distributed reasoning solution that can scale with the data.

Dorian Rosen

Materials Science Engineering
University of Utah

Poster 14

X-team 7

Whitepaper:
Data Mining and Machine Learning to Guide Novel Thermoelectric Development

This white paper describes the possible uses of thermoelectric materials and addresses the problems associated with conducting high-risk studies to synthesize novel compounds from chemical white space. By data mining the ever-increasing number of materials science publications, a comprehensive database is being constructed. Newly developed machine-learning systems are being used to predict the thermoelectric properties of hypothetical materials and bridge the gap between computational tools and experimental needs.

Dennis Linders

iSchool
University of Maryland, College Park

Poster 15

X-team 1

Whitepaper:
The Smart City as a Platform for Collaboration on Climate Change

Cities are at the forefront of the fight against climate change, because their concentration of resources provides the most environmentally-friendly way of delivering a high quality of life. Sustainable cities combine this advantage with a society-wide commitment to a low-carbon lifestyle. Yet the traditional tools of public administration are poorly equipped to facilitate this collaborative approach. Fortunately, advancements in Information and Communication Technologies (ICT) hold tremendous potential to address these shortcomings. Most promisingly, innovative urban leaders have begun to reshape both government and governance around a vision of a “Smart City” that collects vast amounts of data on the state and performance of its communities and then translates this data into actionable insights. Yet the adoption of these “smart city” innovations remains best described as experimental, as blind aspirations continue to far exceed validated best practices or proven implementation strategies. To bridge this gap, the proposed research project will conduct holistic case studies of three pioneering "smart cities" to identify effective business models for using "smart" infrastructure, data science, and connected citizens to promote community-wide action on climate change.

Charmgil Hong

Department of Computer Science
University of Pittsburgh

Poster 16

X-team 5

Whitepaper:
Multivariate Conditional Outlier Detection and Its Clinical Application

This paper summarizes our research that aims at developing automated methods of multivariate conditional outlier detection, and applying the methods to support clinical decision making. In particular, we are interested in identifying statistically unusual patient care patterns corresponding to medical errors based on data stored in electronic medical record (EMR) systems. We describe the problems and objectives of the research, and outline our model-based outlier detection approach. We also discuss the future directions and expected impacts of the research.
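As a generic illustration of model-based conditional outlier detection (fit a model of the action given the patient context, then flag records whose observed action receives very low probability), here is a minimal sketch. The features, model choice, and threshold are placeholders and do not represent the authors' actual method or data.

```python
# Generic sketch of model-based conditional outlier detection on toy data:
# fit P(action | context) and flag records whose observed action is assigned
# very low probability under the fitted model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))        # patient context (labs, vitals, ...)
y = rng.integers(0, 2, size=500)      # action taken (e.g., drug ordered or not)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Probability the model assigns to the action that was actually taken.
proba = model.predict_proba(X)
p_observed = proba[np.arange(len(y)), y]

outliers = np.where(p_observed < 0.05)[0]   # unusually improbable actions
print(f"{len(outliers)} candidate conditional outliers")
```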

Haroon Raja

Electrical and Computer Engineering Department
Rutgers, The State University of New Jersey

Poster 17

X-team 5

Whitepaper:
Cloud K-SVD: A Dictionary Learning Algorithm for Big, Distributed Data

This paper studies the problem of data-adaptive representations for big, distributed data. It is assumed that a number of geographically-distributed, interconnected sites have massive local data and they are interested in collaboratively learning a low-dimensional geometric structure (dictionary) underlying these data.

Bharathi Asokarajan

Data Science and Analytics
University of Oklahoma

Poster 18

X-team 10

Whitepaper:
Pixel oriented visualization – An aid to analyze large-scale text data

Classics scholars work with text data that is not just Big, but also Interesting and Complex. We are developing a new pixel-based text visualization technique that displays the hierarchical structure of primary texts with their rich apparatus metadata in an accessible and comparable fashion. As part of this, we are investigating new ways to support focus+context interactions across multiple scales of text. These visualization designs will help scholars engage effectively and efficiently with the long and deep provenance of knowledge that surrounds some of humanity's most important historical works. We anticipate that the successful application of new interactive text-visualization techniques to a complex domain like classics will provide a clear direction for applications to scholarship and learning on text, language, and communication in a wide variety of domains.

William Kearney

Earth and Environment
Boston University

Poster 19

X-team 8

Whitepaper:
Deriving process knowledge from data in coastal ecohydrology

Scientists interested in developing robust predictive models should aim for a synthetic modeling approach which combines the predictive power of empirical models with the process-driven understanding of physical models. I examine how this synthetic approach can improve the representation of processes in empirical models of salt marsh hydrology.

Saurabh Jha

Computer Science
University of Illinois at Urbana Champaign

Poster 20

X-team 6

Whitepaper:
LASE: Log Analysis and Storage Engine for Resiliency Study

There is a need to build exascale computers to further the progress of scientific studies and meet the ever-growing demand for computing power. One of the critical problems – if not the most critical problem – in reaching the exascale computing goal by the end of the decade is “designing fault tolerant applications and systems that can reach sustained petaflops to exaflops of performance”. Because of the high number of errors and failures in such complex systems, it has become important to understand their causes and take proactive action to contain errors in support of exascale computation. Logs serve as an important source of information for such studies. However, due to the nature and scale of these logs, it has become difficult if not impossible to process them and extract meaningful information. LASE is a log analysis and storage engine that brings various techniques together in a unified framework that can handle petabytes of data and assist in building models for failure diagnosis, prediction, and anomaly detection.

Victoria Villar

Astronomy and Astrophysics
Harvard

Poster 21

X-team 3

Whitepaper:
Classification of Intermediate-Luminosity Astronomical Transients

Stars materialize, live and die following a lifecycle that depends on both intrinsic properties and environmental factors. Their transient outbursts, interactions and deaths all encode important information about stellar evolution. Future large surveys, such as LSST, will produce 30+ TB of data daily which astronomers can use to study these transients. This paper describes possible classification techniques for analyzing the LSST dataset of intermediate-luminosity transients.

Timothy Jones

Biological Sciences
Louisiana State University

Poster 22

X-team 2

Whitepaper:
Seeing is believing with data visualization

This whitepaper outlines state of the art biodiversity identification practices and new visual methods using Kingdom Plantae and its children as models.

Daniel Cook

Molecular Biosciences
Northwestern University

Poster 23

X-team 1

Whitepaper:
Improving data-management and integration within resequencing-pipelines

A powerful strategy for identifying the genetic basis of phenotypes is to perform genome-wide association (GWA) analysis. GWA studies that utilize massively parallel sequencing rely on population resequencing pipelines to identify genetic variants. Resequencing pipelines require precise data handling and integration of data generated across a large series of steps from multiple programs to identify issues, confounding factors in analysis, or to identify interesting associations. However, integrating this data is challenging as it requires extensive file parsing, manipulation, and merging. Here, I propose the development of a database schema resembling an entity-attribute-value (EAV) model for storage of summary data generated at different steps within resequencing pipelines and a set of tools enabling integration of this system. This system improves data-handling within resequencing pipelines and facilitates comparison of variables across tools, samples, and between pipeline configurations.
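A minimal sketch of what an entity-attribute-value table for pipeline summary statistics could look like is shown below, using SQLite from Python. The table and column names are hypothetical and are not the author's proposed schema.

```python
# Minimal sketch of an entity-attribute-value (EAV) table for summary
# statistics emitted at different resequencing-pipeline steps.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pipeline_eav (
        entity    TEXT,   -- e.g. sample or run identifier
        step      TEXT,   -- pipeline step that produced the value
        attribute TEXT,   -- e.g. 'mapped_reads', 'mean_coverage'
        value     TEXT    -- stored as text; cast on retrieval
    )
""")
rows = [
    ("sampleA", "alignment", "mapped_reads", "18234567"),
    ("sampleA", "variant_calling", "snv_count", "41235"),
    ("sampleB", "alignment", "mapped_reads", "17012311"),
]
conn.executemany("INSERT INTO pipeline_eav VALUES (?, ?, ?, ?)", rows)

# Compare one attribute across samples without reparsing tool output files.
for row in conn.execute(
    "SELECT entity, value FROM pipeline_eav WHERE attribute = 'mapped_reads'"
):
    print(row)
```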

Xin Yang

Computational Science
Middle Tennessee State University

Poster 24

X-team 10

Whitepaper:
Spatial Regularization for Multitask Learning and Application in fMRI Data Analysis

fMRI data has an extremely complicated structure, so efficient and accurate models that incorporate spatial and spectral information are needed to detect neuronal activity reliably. In this paper, we formulate the fMRI data with a General Linear Model in which each voxel is treated as a task, and propose a class of spatial Multi-task Learning models that incorporate spatial information from each task's neighborhood. Simulation and real-data results show satisfactory performance for the spatial Multi-task Learning algorithms.
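A minimal sketch of the voxelwise General Linear Model setup follows (ordinary least squares per voxel on synthetic data); the spatial multi-task regularization that couples neighboring voxels, which is the paper's contribution, is not shown.

```python
# Sketch of the voxelwise General Linear Model: Y = X @ beta + noise,
# fit independently per voxel by least squares on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
T, P, V = 200, 3, 1000              # time points, regressors, voxels
X = rng.normal(size=(T, P))         # design matrix (task regressors, drift, ...)
true_beta = rng.normal(size=(P, V))
Y = X @ true_beta + 0.5 * rng.normal(size=(T, V))   # synthetic BOLD signals

# Per-voxel OLS estimates, all voxels at once.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat.shape)               # (P, V): one coefficient vector per voxel
```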

Sepideh Pourazarm

Systems Engineering
Boston University

Poster 25

X-team 7

Whitepaper:
Improving Traffic Management Using Big Data

We study the routing problem for vehicle flows through a road network that includes both battery-powered Electric Vehicles (EVs) and Non-Electric Vehicles (NEVs). We seek to optimize a system-centric (as opposed to user-centric) objective aiming to minimize the total elapsed time for all vehicles to reach their destinations considering both traveling times and recharging times for EVs when the latter do not have adequate energy for the entire journey. We are validating the efficiency of our algorithm using real traffic data in terms of “average speed” on the road segments in Eastern Massachusetts provided by the City of Boston.

Susmit Shannigrahi

Computer Sc. Dept
Colorado State University

Poster 26

X-team 7

Whitepaper:
Named Data Networking for Large Scientific Data Management

This paper discusses how using Named Data Networking (NDN) reduces the complexities of large scientific data management. Scientific data collections require safe archiving and easy retrieval while maintaining data provenance and integrity. The large size and distributed nature of these datasets complicates an already challenging data management task. NDN, an NSF project investigating future Internet architectures, replaces IP endpoints with hierarchical content names. NDN implicitly overcomes many of the challenges associated with managing scientific data. We describe a framework developed with NDN to reduce these challenges.

Erika Helgeson

Department of Biostatistics
University of North Carolina

Poster 27

X-team 1

Whitepaper:
Nonparametric Cluster Significance Testing

We describe a proposed method for testing the statistical significance of putative clusters. Cluster analysis is an unsupervised learning strategy that can be used to identify groups of observations in data sets of unknown structure. Few methods are available that can assess the strength of clusters identified in a data set. The methods that are available often rely on distributional assumptions or are not optimized for high dimensional settings. We propose a novel non-parametric method for testing the null hypothesis that no clusters are present in a given data set which can be used in both high and low dimensional settings with optimal accuracy.

Qi Song

School of Electrical Engineering and Computer Science
Washington State University

Poster 28

X-team 6

Whitepaper:
Knowledge Search Made Easy: Effective Knowledge Graph Summarization and Applications

The rising Big Data tide requires powerful techniques to effectively search for useful knowledge in information systems such as knowledge bases and knowledge graphs. Accessing and searching complex knowledge graphs is difficult for end users due to query ambiguity, data heterogeneity, and the large scale of the data. We propose to develop effective knowledge summarization techniques to make the knowledge search process easy for end users. The knowledge graph summaries not only help users understand complex knowledge data and search results, but can also suggest reasonable queries and support fast knowledge search. Our research will benefit a number of knowledge discovery applications including web and scientific search, social network analysis, cyber security, and health informatics.

Chengrui Li

Department of Statistics and Biostatistics
Rutgers University

Poster 29

X-team 3

Whitepaper:
A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data

The task of analyzing massive spatial data is extremely challenging. In this paper we propose a sequential split-conquer-combine (SSCC) approach for the analysis of dependent big data, illustrate it using a Gaussian process model, and provide theoretical support. The SSCC approach can substantially reduce computing time and computer memory requirements. We also show that the SSCC approach is oracle in the sense that the result obtained using the approach is asymptotically equivalent to the one obtained by performing the analysis on the entire data set on a super-super computer. The methodology is illustrated numerically using both simulation and a real data example of a computer experiment on modeling room temperatures.
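To convey the split-and-conquer idea on a Gaussian process model, here is an illustrative sketch with scikit-learn: the data are split into blocks, a GP is fit on each block, and the block predictions are combined. The plain averaging used for the combine step is a placeholder assumption, not the paper's SSCC combination rule.

```python
# Illustrative split/conquer/combine sketch with independent GP fits per block.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(3000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=3000)

n_blocks = 6
X_test = np.linspace(0, 10, 50).reshape(-1, 1)
preds = []
for Xb, yb in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks)):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.01)
    gp.fit(Xb, yb)                      # conquer: fit each block separately
    preds.append(gp.predict(X_test))

combined = np.mean(preds, axis=0)       # combine: placeholder averaging
print(combined[:5])
```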

Jianbo Ye

Information Sciences and Technology
The Pennsylvania State University

Poster 30

X-team 10

Whitepaper:
Clustering Distributions at Scale: A New Tool for Data Sciences

We introduce a fast and parallel tool for clustering large-scale discrete distributions under the optimal transport distance. The significant computational cost of optimal transport has left machine learning on such unstructured data largely untouched until now. Our proposed optimization method resolves the scalability bottleneck of previous methods and is therefore readily applicable to analyzing large distributional datasets without first specifying a parametric form that the data distributions must follow.
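For intuition only, the sketch below clusters one-dimensional empirical distributions by their pairwise Wasserstein-1 distances using brute-force pairwise computation plus hierarchical clustering. This brute-force route is exactly what does not scale; it is a stand-in for the idea, not the scalable method the paper proposes.

```python
# Brute-force illustration: cluster 1-D discrete distributions by their
# pairwise optimal transport (Wasserstein-1) distances.
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Each "distribution" is an empirical sample; two clusters by construction.
dists = [rng.normal(0, 1, 200) for _ in range(5)] + \
        [rng.normal(4, 1, 200) for _ in range(5)]

n = len(dists)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = wasserstein_distance(dists[i], dists[j])

labels = fcluster(linkage(squareform(D), method="average"),
                  t=2, criterion="maxclust")
print(labels)   # the two groups of distributions separate cleanly
```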

Berk Ustun

Electrical Engineering and Computer Science
MIT

Poster 31

X-team 7

Whitepaper:
Learning Tailored Risk Scores from Large-Scale Datasets

Risk scores are simple models that let users assess risk by adding, subtracting and multiplying a few small numbers. These models are widely used in medicine and crime prediction but difficult to learn from data because they need to be accurate, sparse, and use integer coefficients. We formulate the risk score problem as a mixed integer non-linear programming problem, and present a cutting-plane algorithm to solve it for datasets with large sample sizes.
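To show what a learned risk score looks like in use (small integer points summed and mapped to a predicted risk), here is a tiny sketch. The feature names, point values, and logistic link are invented for illustration, and the cutting-plane training procedure itself is not shown.

```python
# Sketch of a risk score in use: integer points are added up and the total
# is mapped to a predicted risk. All values are hypothetical placeholders.
import math

POINTS = {                      # hypothetical learned integer coefficients
    "age_ge_60": 2,
    "prior_event": 3,
    "smoker": 1,
    "bp_controlled": -1,
}
INTERCEPT = -4                  # hypothetical offset

def predicted_risk(patient):
    score = INTERCEPT + sum(POINTS[f] for f, on in patient.items() if on)
    return 1.0 / (1.0 + math.exp(-score))   # logistic link from score to risk

patient = {"age_ge_60": True, "prior_event": True, "smoker": False,
           "bp_controlled": True}
print(f"score-based risk: {predicted_risk(patient):.2f}")
```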

Mahdi Ahmadi

Mechanical and Energy Engineering
University of North Texas

Poster 32

X-team 1

Whitepaper:
Big Data Mining Methods for Accurate Spatial Interpolation of Ozone Pollution

This paper explains the importance of using big data methods to extract accurate spatial interpolation functions for ozone pollution prediction.

Sayamindu Dasgupta

Media Arts and Sciences
Massachusetts Institute of Technology

Poster 33

X-team 3

Whitepaper:
Large-scale analysis of novice programmer trajectories in an open-ended programming community

This white paper outlines some of the opportunities and challenges in analyzing trajectories of young novice programmers as they create, share, and remix media-rich programming projects, as well as participate socially in the Scratch online community (https://scratch.mit.edu). Scratch is open-ended by design: anyone with a web browser can create a wide variety of programming projects, ranging from games to science simulations and from interactive stories to computational music programs. This open-ended context poses a number of challenges for the large-scale analysis and measurement of learning outcomes. Addressing these challenges holds promise not just for understanding the use of Scratch as a learning environment; as the learn-to-code movement in the United States and elsewhere gathers momentum, the methods and strategies formulated for Scratch data research also have the potential to be useful for research on other similar tools and environments that teach young people programming.

Abhijit Bendale

Computer Science
University of Colorado at Colorado Springs

Poster 34

X-team 10

Whitepaper:
Towards Open World Recognition

With the advent of rich classification models and high computational power, visual recognition systems have found many operational applications. Recognition in the real world poses multiple challenges that are not apparent in controlled lab environments. The datasets are dynamic and novel categories must be continuously detected and then added. At prediction time, a trained system has to deal with myriad unseen categories. Operational systems require minimal downtime, even to learn. To handle these operational issues, we present the problem of Open World Recognition and formally define it. We prove that thresholding sums of monotonically decreasing functions of distances in a linearly transformed feature space can balance “open space risk” and empirical risk. Our theory extends existing algorithms for open world recognition.

Justin Brandenburg

Computational Social Science
George Mason University

Poster 35

X-team 3

Whitepaper:
Replicating Cyber-attack Patterns of Behavior using Bipartite Network Analysis and Agent-Based Modeling

Introducing a method of evaluating cyber traffic behavior via bipartite graph analysis and implementing agent-based modeling to simulate and test network capability.

Kevin Keys

Biomathematics
UCLA

Poster 36

X-team 7

Whitepaper:
Parsimonious model selection in genome-wide association studies

This white paper sketches an issue with model selection in multiple regression analysis of genome-wide association studies. Based on our current research, we suggest a remedy to perform these large analyses on a desktop machine.

Yang Wang

System and Information Engineering
University of Virginia

Poster 37

X-team 8

Whitepaper:
Maintained Individual Data Distributed Likelihood Estimation (MIDDLE)

The Maintained Individual Data Distributed Likelihood Estimation (MIDDLE) paradigm will construct and validate a revolutionary model for accomplishing health-science human-subject research with networked devices. Under the MIDDLE paradigm, data can be privately maintained by participants on their personal devices and never revealed to researchers, while statistical models are fit and scientific hypotheses are tested.

Ethan Rudd

Computer Science
University of Colorado at Colorado Springs

Poster 38

X-team 10

Whitepaper:
The Extreme Value Machine

This paper describes a scalable, non-linear model called the Extreme Value Machine (EVM), an analog to the Support Vector Machine (SVM) derived from statistical Extreme Value Theory. The EVM is far more scalable than a kernelized SVM, exhibits comparable accuracy on closed set datasets (where all classes are known at test time), and avoids the need for a parameter grid search. This allows the EVM model to scale to large datasets that are computationally infeasible for non-linear SVMs. Moreover, unlike SVMs, our EVM model performs well in the open-set regime (when unknown classes are present at test time), achieving state-of-the-art results on open-set datasets.

Wei Xie

Department of Electrical Engineering & Computer Science
Vanderbilt University

Poster 39

X-team 2

Whitepaper:
Collaborative data science without violating privacy: a case study from genome research

Data privacy is an important issue for many data science disciplines involving human subjects. Here we take genome research as a representative case study to illustrate the privacy concerns and countermeasures. We develop a novel cryptography-based method to enable collaborative studies via meta-analysis without violating privacy. We also show the relevance of this method to the wider research community of data science.

Marina Kogan

Computer Science
University of Colorado Boulder

Poster 40

X-team 5

Whitepaper:
Why Data Science Needs to Attend to Contextual Behavior: The Case of Crisis Informatics

Crisis informatics is the study of how people converge, spread information, and cooperate on social media around the tasks they deem important during crises. The socio-behavioral focus of crisis informatics requires a research methodology that accounts for the social context of users’ activity. On the other hand, the volume of social media data requires data science approaches, which in their current form often decontextualize the social activity. I propose several methodological innovations that would move big data methods toward attending to the highly situated and contextual nature of social activity in crises.

Taisuke Imai

Division of the Humanities and Social Sciences
California Institute of Technology

Poster 41

X-team 3

Whitepaper:
Detecting Habitual Behavior in Natural Consumer Choice Data

Habit is a process by which a stimulus automatically generates an impulse toward action, based on learned association between stimulus and response. In this project we seek to identify habitual choices and shifts from habit to model-directed behavior using big and broad data sets of natural consumer decision making such as online shopping, online stock trading, and commuter route choice.

Lincoln Sheets

Informatics Institute
University of Missouri

Poster 42

X-team 3

Whitepaper:
Data Mining to Predict Healthcare Utilization in Managed Care Patients

Systematic association mining of clinical attributes from the electronic health records of adult primary care patients to discover predictors of high healthcare utilization.
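As a toy illustration of what association mining for utilization predictors can look like, the sketch below uses the mlxtend library on invented, binarized clinical attributes. The attributes, thresholds, and the treatment of "high utilization" as an item are placeholders, not the study's actual variables or method.

```python
# Toy sketch of association-rule mining on binarized clinical attributes.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One row per patient; True/False columns for binarized clinical attributes.
records = pd.DataFrame({
    "diabetes":         [True, True, False, True, False, True],
    "hypertension":     [True, False, True, True, False, True],
    "depression":       [False, True, False, True, False, True],
    "high_utilization": [True, True, False, True, False, True],
})

itemsets = apriori(records, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)

# Keep rules that predict high utilization from other attributes.
mask = rules["consequents"].apply(lambda c: c == frozenset({"high_utilization"}))
print(rules[mask][["antecedents", "confidence", "lift"]])
```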

Nancy (Xin Ru) Wang

Computer Science and Engineering
University of Washington

Poster 43

X-team 6

Whitepaper:
Decoding neural signals with natural multimodal data

This paper outlines our project that will use deep and unsupervised techniques to analyze a large multimodal natural (non-experimental) dataset, including simultaneous video, audio and ECoG/EEG signals for computational neuroscience and brain-computer interface applications. This project combines techniques from multiple fields in order to fully leverage the multimodality of the dataset.

Kin Gwn Lore

Mechanical Engineering
Iowa State University

Poster 44

X-team 1

Whitepaper:
Pattern Discovery from Large-scale Computational Fluid Dynamic Data using Deep Learning

This paper outlines our research in solving an inverse fluid dynamics design problem using large-scale simulation data. The forward problem of sculpting fluid flow by placing a set of pillars in a fluid channel has been simulated and experimentally validated. We now explore the applicability of machine learning models (specifically deep learning) in the inverse problem to serve as a map between user-defined flow shapes and the corresponding sequence of pillars in the design of microfluidic devices.

Anuj Karpatne

Computer Science and Engineering
University of Minnesota

Poster 45

X-team 4

Whitepaper:
Global Monitoring of Inland Water Dynamics: A Data-driven Approach

Freshwater, which is only available in inland water bodies such as lakes, reservoirs, and rivers, is increasingly becoming scarce across the world and this scarcity is posing a global threat to human sustainability. A global monitoring of inland water bodies is necessary for policy-makers and the scientific community to address this problem. The promise of data-driven approaches coupled with availability of remote sensing data presents opportunities as well as challenges for global monitoring of inland water bodies. My research aims at developing predictive models that address the challenges in analyzing remote sensing data for creating the first global monitoring system of inland water dynamics.

Jesse Hoff

Animal Science/Genetics
University of Missouri

Poster 46

X-team 2

Whitepaper:
Incorporation of Genomic Data in US Cattle Breeding and Production

We are analyzing appropriate practices for the routine incorporation of genetic information in both the selection and care of US livestock. The rapid decrease in the cost of genetic data from chips or short-read sequencing has led to the accumulation of large, profitable data sets that provide dense quantification of the genetic component of livestock production. Unlike in human or model-organism genetics, the regulatory environment surrounding livestock genomics has enabled immediate application of cutting-edge genomic technology in a commercial setting. Low-cost genomic data informs high-leverage decision making at farms ranging in size and technical expertise, enabled by well-shared central genomic data repositories that inform genomic breeding models. We seek to develop tools that lower costs and allow genomic information to provide value across the whole livestock production cycle, from reproduction to immunity, ecological sustainability, and carcass quality.

Zachary Foster

Botany and Plant Pathology
Oregon State University

Poster 47

X-team 9

Whitepaper:
Automated website generation for reproducible and shareable data science

This white paper describes the potential benefits and challenges of using literate programming for embedded documentation in data science projects and introduces a new R package under development that generates website representations of project folders. The package uses the names of files and folders, together with options specified in configuration files, to infer a menu hierarchy and organize the content of files. Literate programming documents are executed and their output is integrated into the website along with PDF files, images, and other HTML files in the project.

Daqing Yun

Department of Computer Science
University of Memphis

Poster 48

X-team 8

Whitepaper:
An Integrated Transport Solution to Big Data Movement in High-performance Networks

We propose and develop an integrated transport solution to big data movement in high-performance networks in support of data- and network-intensive scientific applications.

Kelly Spendlove

Mathematics
Rutgers University

Poster 49

X-team 10

Whitepaper:
Determining Periodicity In Data

In the last few years high-throughput technologies have enabled the efficient and inexpensive collection of massive amounts of data. In many cases the data are high dimensional and being generated by some nonlinear system. In such a situation one is interested in both the geometry of the data and the action of the unknown nonlinear system. One of the most fundamental problems in analyzing nonlinear systems is determining whether the system is periodic. However, the past few decades of dynamical systems theory have shown that nonlinear systems can exhibit extremely complex behavior with respect to both system variables and parameters. Such complex behavior, proven in theoretical work, must be contrasted with what is feasible in applications; in the case of modeling multiscale processes, for instance, measurements may be of limited precision, parameters are rarely known exactly, and nonlinearities are often not derived from first principles. This contrast suggests that extracting a robust characterization of the periodic behavior is of greater importance than a detailed understanding of the fine structure. For such a characterization, we propose an approach which incorporates Takens’ embedding theorem, persistent homology and diffusion maps.
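A minimal sketch of the first step in such a pipeline, a Takens-style delay embedding of a scalar time series, appears below; the delay and dimension are illustrative, and the persistent-homology and diffusion-map stages are not shown.

```python
# Sketch of a Takens-style delay embedding: lift a scalar time series into
# R^d using lagged copies, producing a point cloud whose loops reflect
# periodicity in the underlying signal.
import numpy as np

def delay_embed(x, dim=3, delay=5):
    """Return an (N, dim) point cloud of delay vectors from series x."""
    n = len(x) - (dim - 1) * delay
    return np.column_stack([x[i * delay : i * delay + n] for i in range(dim)])

t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)

cloud = delay_embed(x, dim=3, delay=25)
print(cloud.shape)    # (1950, 3): loops in this cloud indicate periodicity
```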

Can Hu

Department of Statistics and Biostatistics
Rutgers, The State University of New Jersey

Poster 50

X-team 3

Whitepaper:
Advanced Data Analytics of Railroad Infrastructure Degradation to Improve Transportation Safety

This white paper introduces some possible models to capture track geometry degradation.