Jessica Counihan

Metabolic Biology
UC Berkeley

Poster 51

X-team 2

Whitepaper:
Use of big data platforms and the future directions of proteomic analysis and visualization

Proteomics, the study of protein products expressed by the genome, has become one of the leading high-throughput technologies in biology due to an increased interest in system-wide analyses of proteins. The mapping of complex proteomics data to biological processes has become impossible by manual means, and computer-aided data analysis is essential for further progress in the field.

Lynn Waterhouse

Biological Oceanography
Scripps Institution of Oceanography, University of California at San Diego

Poster 52

X-team 4

Whitepaper:
Assessing small-scale fisheries effort using satellite imagery data

High-resolution remote sensing data provide a unique opportunity to assess small-scale fisheries, specifically fishing effort, at the global level. UCSD has been given access, through a partnership with the DigitalGlobe Foundation, to a large repository of fine-scale remote sensing data. We propose using a search algorithm to identify possible boats in the data and a crowdsourcing platform to verify the algorithm's output.

Charles Iaconangelo

Graduate School of Education
Rutgers, The State University of New Jersey

Poster 53

X-team 3

Whitepaper:
Optimizing the Use of Assessment Data to Support Educational Inferences

This paper proposes the application of methods traditionally used in Big Data to the modeling of student assessment data. The additional information extracted from item responses will be used to support the more ambitious inferences about student learning demanded by current educational policy.

Alex Georges

Physics/Mathematics
University of California, San Diego

Poster 54

X-team 5

Whitepaper:
Persistent Homology: A New Statistics

Topological data analysis is a new approach to analyzing the structure of high dimensional datasets. Persistent homology, specifically, generalizes hierarchical clustering methods to identify significant higher dimensional properties that are out of reach of any other approach.
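As a concrete illustration of the clustering connection: in dimension 0, persistent homology coincides with single-linkage hierarchical clustering, and its barcode can be computed by a union-find sweep over edges sorted by length. The sketch below is our illustration on toy data, with names of our own choosing, not the author's code.

```python
# A minimal sketch: 0-dimensional persistent homology of a point cloud.
# In dimension 0, components are born at filtration value 0 and die when
# two components merge, which is exactly single-linkage clustering.
import numpy as np

def zero_dim_barcode(points):
    """Return (birth, death) pairs for connected components of the
    Vietoris-Rips filtration of `points` (death = inf for the last one)."""
    n = len(points)
    # pairwise distances = edge filtration values
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n))

    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for w, i, j in edges:              # Kruskal-style sweep over edges
        ri, rj = find(i), find(j)
        if ri != rj:                   # two components merge: one bar dies at w
            parent[ri] = rj
            bars.append((0.0, w))
    bars.append((0.0, np.inf))         # the final component never dies
    return bars

pts = np.random.rand(20, 2)            # toy point cloud
for birth, death in zero_dim_barcode(pts):
    print(f"[{birth:.2f}, {death:.2f})")
```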

Joshua Little

Computer Science & Engineering
Washington University in St. Louis

Poster 55

X-team 10

Whitepaper:
Large Scale Understanding of Activities in Public Spaces

Obesity is one of the greatest public health problems currently facing the United States. In order to effectively encourage human activity and help fight obesity, it is important to understand how people's use of public spaces is affected by different factors, including the built environment, the current weather, and recent public health awareness campaigns. By exploiting large-scale online image archives (such as the Archive of Many Outdoor Scenes, a data set archiving imagery from 30,000 public, outdoor webcams over the last 8 years), it is possible to obtain millions of images of people's use of public space. However, webcams are usually of low enough resolution that existing methods for detecting and classifying people in images perform poorly. Data-driven methods are required to learn how to effectively use image context to solve this problem, and therefore to turn large image archives into useful resources for understanding human behavior.

Lisheng Zhou

Department of Genetics
Rutgers, the State University of New Jersey

Poster 56

X-team 2

Whitepaper:
A tool for genetic marker discovery for general biologists

This white paper describes a challenge most biologists face: a lack of computational and statistical skills stands between them and their goal of identifying genetic markers from the ever-increasing data made available by advanced sequencing technologies.

Jennings Anderson

Computer Science
University of Colorado Boulder

Poster 57

X-team 4

Whitepaper:
Big Human Data: For Humans, by Humans

Proposed here is the notion of “big human data” analysis. As today’s “big data” mostly consists of human-generated data such as social media or volunteered geographic information (VGI), our analysis techniques must evolve to account for the socio-behavioral aspects of data production. I present this argument and a new tool for localized analysis of the massive OpenStreetMap editing history, which aims to better understand the VGI practices surrounding the rapid digital convergence of online mappers in the wake of a disaster.

Shelly Trigg

Biological Sciences
University of California, San Diego

Poster 58

X-team 9

Whitepaper:
Finding biological relevance in large-scale protein network studies

Proteome-wide protein-protein interaction screening has recently been made possible by coupling pooling strategies with next-generation sequencing. The unprecedented quantity of data being generated makes it infeasible to efficiently retest every interaction found for biological relevance. This work discusses strategies for prioritizing interactions found in interactome screens based on their projected phenotypic influence.

Shiree Hughes

Institute for Sensing and Embedded Network Systems Engineering
Florida Atlantic University

Poster 59

X-team 7

Whitepaper:
Automated Detection of Anomalies in Streaming Sensing Systems

Our society is rapidly moving to large-scale sensor networks for everything from smart buildings to monitoring the environment. As individual systems grow from just a few devices to tens or hundreds of millions of devices, not only does the amount of data generated increase, but the probability of transmission error and sensor malfunction also increases. Techniques must be devised to provide an easy yet efficient method for monitoring such systems for abnormalities, damaged sensors, or other network malfunctions.

Natalia Diaz Rodriguez

Computer Science Department
University of California Santa Cruz

Poster 60

X-team 7

Whitepaper:
Hybrid Possibilistic and Probabilistic Semantic Modelling of Uncertainty for Scalable Human Activity Recognition

Human activity recognition in smart environments is a challenging but crucial task in Ambient Intelligence and Ambient Assisted Living. Promising results have been obtained using knowledge engineering methods such as semantic modelling with fuzzy description logics. However, we identified aspects of the knowledge representation and learning methodology that can still be improved and further automated. We propose the use of Probabilistic Soft Logic (PSL) as an extension of fuzzy ontologies to deal with problems such as temporal constraints and variations, discovery of new patterns and anomalies, and uncertainty treatment in collective inference, as well as scaling the method to large knowledge bases so that they can accommodate the model's evolution over time.

Robert Tunney

Computational Biology
University of California, Berkeley

Poster 61

X-team 9

Whitepaper:
Accurate Site Assignment of Ribosome Footprint Data

This paper proposes a method for accurate A-site assignment of complete ribosome footprint data. This task is performed in order to analyze codon-level regulatory features of translation. Current methods use heuristic A-site assignment rules based on the canonical length of ribosome footprints and the position of the E/P/A sites in those canonical-size footprints. This paper proposes an expectation-maximization algorithm to learn the parameters that govern how footprint data are generated around an A site. It increases the amount of usable data by performing maximum-likelihood site assignment for ambiguous reads, which were previously discarded.
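A simplified sketch of the kind of EM involved, under assumptions of ours (a single transcript window, a small set of candidate offsets, and no explicit E/P-site structure; this is not the author's implementation): the latent variable is each read's true A-site offset, and the algorithm alternates between soft offset assignment and re-estimating per-length offset probabilities and per-position densities.

```python
# Simplified EM for A-site offset assignment (illustrative only).
# Each read is (five_prime_pos, length); its A site sits at five_prime_pos + o
# for an unknown offset o. We jointly learn pi[L] (offset probabilities per
# read length) and theta (relative ribosome density per position).
import numpy as np

def em_a_site(reads, n_pos, offsets, n_iter=50):
    lengths = sorted({L for _, L in reads})
    pi = {L: np.ones(len(offsets)) / len(offsets) for L in lengths}
    theta = np.ones(n_pos) / n_pos
    for _ in range(n_iter):
        counts = np.zeros(n_pos)
        new_pi = {L: np.zeros(len(offsets)) for L in lengths}
        for p, L in reads:
            # E-step: posterior over candidate offsets for this read
            post = np.array([pi[L][k] * theta[p + o] if p + o < n_pos else 0.0
                             for k, o in enumerate(offsets)])
            if post.sum() == 0:
                continue
            post /= post.sum()
            for k, o in enumerate(offsets):
                if p + o < n_pos:
                    counts[p + o] += post[k]    # expected A-site coverage
                    new_pi[L][k] += post[k]     # expected offset usage
        # M-step: renormalize expected counts into probabilities
        theta = counts / counts.sum()
        pi = {L: v / v.sum() for L, v in new_pi.items()}
    return pi, theta

# toy usage: ambiguous reads receive soft, maximum-likelihood assignments
reads = [(10, 28), (11, 29), (10, 30), (40, 28), (41, 29)]
pi, theta = em_a_site(reads, n_pos=100, offsets=[14, 15, 16])
print({L: p.round(2) for L, p in pi.items()})
```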

Ryan Lee

Department of Civil and Environmental Engineering
Villanova University

Poster 62

X-team 6

Whitepaper:
Data Challenges in Stormwater Research: Extracting Event-Based Datasets from Hydrologic Monitoring Databases

Environmental monitoring involving large amounts of time series data is increasingly used in research to improve stormwater green infrastructure. The data easily exceeds the capacity of standard engineering desktop software, and many engineers and researchers are left unable to properly utilize the data being generated. Tools or methods are needed that combine time series analysis and/or functional processing languages with database queries to make the proper working dataset accessible to the engineering community.

Janani Balaji

Department of Computer Science
Georgia State University

Poster 63

X-team 6

Whitepaper:
Challenges in Massive Graph Databases

Graph databases offer an efficient way to store and access inter-connected data. However, as the graph size grows, querying the entire graph requires multiple trips to the storage device to filter and gather data based on the query. I/O accesses are expensive operations that immensely slow down query response time and prevent us from fully exploiting the graph-specific benefits that graph databases offer. There are a few shortcomings unique to graphs that prevent us from developing a high-performance graph database without compromising on scalability. This white paper reviews those challenges and suggests some solutions for overcoming them.

Jin Tao

Electrical Engineering and Computer Science
Washington State University, Pullman

Poster 64

X-team 8

Whitepaper:
Toward Understanding and Engineering Ecological Processes for Sustainability

How can we combine large datasets and computation to solve sustainability problems? Toward this overarching goal, my research program focuses on leveraging machine learning techniques to understand ecological processes from large datasets and convert that understanding into policy decisions for sustainability.

Pierre Bhoorasingh

Chemical Engineering
Northeastern University

Poster 65

X-team 6

Whitepaper:
Species identification of detailed kinetic models for direct comparison

The typical publication format of chemical kinetic models does not contain species connectivity information, preventing efficient model comparison. A semi-automated tool has been developed to assist a user in determining the species connectivities in a kinetic model. The tool helps identify large kinetic and thermodynamic discrepancies between models, as well as duplicate species within the same kinetic model.

Nima Salehi Sadghiani

Department of Industrial and Operations Engineering
University of Michigan

Poster 66

X-team 7

Whitepaper:
Retail Chain Network Design under Mixed Uncertainties

Retailing is one of the main business sectors in urban areas, and its business continuity is crucial, especially in emergencies. Unexpected disruptions, such as interruptions of supplies' incoming flows to stores due to natural disasters, may impose long-lasting detrimental effects on the continuity of retail networks. In these situations, it is critical for retail managers to be able to distribute supplies rapidly, efficiently, and effectively from their unaffected supply facilities to undisrupted retail stores, especially those in affected areas. Moreover, uncertainties in parameters (demands, costs, and time) and possible partial or complete disruption of the network's facilities, vehicles, etc. constantly threaten the optimality and feasibility of the developed plans. Retail Chain Network (RCN) design involves several strategic decisions, such as the number, location, and capacity of the facilities required to provide requested supplies to given customer zones in a timely and efficient manner. Designing an RCN requires anticipating production, warehousing, distribution, transportation, and demand-management decisions, along with the associated costs, revenues, and service levels.

Joshua Welch

Computer Science
The University of North Carolina at Chapel Hill

Poster 67

X-team 2

Whitepaper:
Finding Life in High-Dimensional Space: Identifying Cell Types from Single Cell Gene Expression Data

Recent technological advances have enabled measurements of the genes that individual cells use. Data from these experiments provide a treasure trove of information about the functions of individual genes in specifying the properties of different types of cells, but computational methods for interpreting these large, high-dimensional datasets are lacking. In this paper, I describe three challenges in identification of cell types from single cell gene expression data and identify three corresponding datasets for developing and benchmarking approaches to address these challenges.

Jiangxiao Qiu

Department of Zoology
University of Wisconsin-Madison

Poster 68

X-team 4

Whitepaper:
Assessing Future Benefits of Ecosystems: Multidimensional Spatial-Temporal Data

Understanding the future prospects of ecosystems and the benefits they provide to society (termed "ecosystem services") is critical yet remains challenging in the context of unpredictable global changes. Such research is further complicated by Big Data challenges, in particular those pertaining to multidimensional spatial-temporal data. This study seeks to develop approaches to analyze this large and complex dataset for detecting long-term trends, thresholds, interactions, and spatial-temporal dynamics of ecosystem services, as well as to translate that understanding into decision-making to achieve a sustainability transition into the future.

Nina Cesare

Sociology
University of Washington

Poster 69

X-team 3

Whitepaper:
Exploring Demographic Dimensions of Big Social Media Data

Despite growing use of social media data to analyze society, these data are not fully utilized in the social sciences. One reason for this is the fact that demographic information about individuals within social media spaces is not readily available via profile content. This paper discusses existing tools for extracting demographic information embedded within social media profiles, as well as ways in which this process may be rendered more efficient, scalable and accessible. It concludes by addressing the ways in which adding demographic dimensions to social media data may expand opportunities for social science research.

Nilothpal Talukder

Computer Science
Rensselaer Polytechnic Institute

Poster 70

X-team 5

Whitepaper:
Mining Graph Patterns in Massive Networks

Graphs are widely used to represent relationships among entities, such as friendships in social networks, interactions in biological networks, and so on. Mining commonly occurring subgraph patterns from a massive social graph or citation network can help discover similar groups/behaviors, which may be of interest to social scientists. Likewise, a bioinformatics researcher may be interested in finding the common sub-structures within gene/protein networks. In the literature, this task is known as frequent subgraph mining (FSM). Although the problem has a great deal of importance, it is unfortunately computationally hard for two major reasons: 1) the search space for enumerating the subgraph patterns is exponential, and 2) it requires subgraph isomorphism checking, which belongs to the class of NP-complete problems. The task has become highly challenging as the size of graphs (e.g., social networks) has grown very large. For instance, the popular social network Facebook currently has 1.4 billion monthly active users. There is no extant FSM algorithm in the literature that can handle a graph that large. In our research, we developed a scalable and distributed approach for mining frequent subgraph patterns from a single, massive network. Our algorithm can efficiently scale to billion-node/edge networks. To the best of our knowledge, this is the largest network considered, to date, for subgraph pattern mining.
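The NP-complete step in 2) can be made concrete. The sketch below (our toy illustration, not the authors' distributed algorithm; the minimum-image-based (MNI) support measure is a standard choice for single-graph FSM and is our assumption here) counts one candidate pattern's support in a single graph with networkx's VF2 matcher. A full miner repeats this over an exponential space of candidate patterns, which is precisely what makes billion-edge scaling hard.

```python
# Counting MNI support of one candidate pattern in a single large graph.
import networkx as nx
from networkx.algorithms import isomorphism

def mni_support(G, pattern):
    """MNI support: for each pattern node, count the distinct G-nodes that
    can play its role across all embeddings, then take the minimum."""
    images = {v: set() for v in pattern.nodes}
    gm = isomorphism.GraphMatcher(G, pattern)
    for mapping in gm.subgraph_isomorphisms_iter():   # maps G-node -> pattern-node
        for g_node, p_node in mapping.items():
            images[p_node].add(g_node)
    return min(len(s) for s in images.values())

G = nx.karate_club_graph()          # tiny stand-in for a massive network
triangle = nx.cycle_graph(3)        # one candidate pattern
print("MNI support of a triangle:", mni_support(G, triangle))
```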

Pardis Miri

Computer Science and Psychology
UCSC and Stanford

Poster 71

X-team 5

Whitepaper:
Improving Affective Communication through Technology

Preventing emotional imbalances through haptic feedback is a high-dimensional problem with numerous possible combinations of factors (such as tactile sensor types, sizes, vibrotactile patterns, location on the skin, etc.). Consequently, it seems infeasible to launch a study evaluating every haptic composition, because doing so would be resource-hungry in terms of time, cost, and the necessity of a large sample size. To answer the question "what haptic composition can elicit or communicate a distinct emotion?", we propose a biofeedback SmartWatch application that sends haptic compositions and gathers physiological data from both the sender and the receiver, which is currently feasible. Once the application achieves mass adoption, it will be possible to gather a large amount of data, and finding patterns in such a large data set is a classical Big Data problem.

Yirui Hu

Statistics
Rutgers

Poster 72

X-team 8

Whitepaper:
Detecting Network Anomalies: A Novel Latent Probabilistic Solution

This paper investigates several machine learning techniques for detecting network anomalies. Different anomalies present different behaviors in wireless networks, and not all anomalies are known to the network in advance. Unsupervised algorithms are therefore desirable to automatically characterize the nature of traffic behavior and separate anomalies from normal behavior. Essentially all anomaly detection techniques learn a model of the normal patterns in a training data set, and then determine the anomaly score of a given test data point based on its deviation from the learned patterns. Multi-cluster analyses are valuable because they can capture insights into human behavior and learn similar patterns in temporal traffic data. This paper leverages co-occurrence data that combine traffic data with the entities generating it, and compares a Gaussian probabilistic latent semantic analysis (GPLSA) model to a Gaussian Mixture Model (GMM) on temporal network data. A novel quantitative “Donut” algorithm for anomaly detection, based on model log-likelihood, is also proposed.
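The shared skeleton of these approaches lends itself to a compact illustration. Below is a minimal sketch of the GMM half of the pipeline (our stand-in with synthetic data; the GPLSA model and the paper's exact "Donut" threshold rule are not reproduced, and the percentile cutoff is our placeholder): fit a mixture on normal traffic, score test points by log-likelihood, and flag the lowest-scoring points.

```python
# Log-likelihood-based anomaly scoring with a Gaussian Mixture Model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(1000, 2))           # placeholder "normal" traffic
X_test = np.vstack([rng.normal(0, 1, size=(95, 2)),
                    rng.normal(6, 1, size=(5, 2))])  # a few injected anomalies

gmm = GaussianMixture(n_components=3, random_state=0).fit(X_train)
scores = gmm.score_samples(X_test)                   # per-point log-likelihood
threshold = np.percentile(gmm.score_samples(X_train), 1)  # bottom 1% of normal
anomalies = np.where(scores < threshold)[0]
print("flagged test points:", anomalies)
```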

Sushant Mahajan

Department of Physics & Astronomy
Georgia State University

Poster 73

X-team 1

Whitepaper:
Automatic Detection and Characterization of Solar Filaments

The Solar Dynamics Observatory sends 1.5 TB of data back to Earth every day. In this data lie observations of various solar phenomena. Observations of filaments are crucial for space weather. We are trying to improve an automated code that detects and analyzes solar filaments so that it can be adapted for the next solar observatory in Hawaii, which will record 5 TB of data every day.

Jeff Ratzloff

Physics and Astronomy
University of North Carolina at Chapel Hill

Poster 74

X-team 7

Whitepaper:
The Data Challenge of the First Gigapixel Full-sky Telescope

We have built and recently deployed a new class of telescope that addresses the challenge of observing rare, short-timescale objects. In doing so, a significant data set is created that presents new processing, storage, and computational challenges.

Matthew Long

Department of Chemical and Biological Engineering
University of Wisconsin-Madison

Poster 75

X-team 5

Whitepaper:
Novel Pathway Prediction and Gene Identification

Current methods for novel pathway prediction have largely focused on utilizing a manually curated list of reaction operators and developing pathways for the production of a desired metabolite. We propose to automatically generate reaction rules in a clustered, hierarchical format which would allow for the inclusion of catalytically promiscuous enzymes as well as the potential usage of gene sequence data.

Subrina Farah

Department of Family Medicine Research
University of Rochester

Poster 76

X-team 8

Whitepaper:
Full Questionnaire versus One Question Response: A Big Data Approach to Predict Patient Satisfaction Efficiently

Patient satisfaction has gained recognition as a valid measure of quality of care in healthcare services over the past decades. From July to November 2013, patient satisfaction survey data were obtained from approximately 1,700 admitted patients during their stay at a JCI-accredited multi-national hospital located in South Asia. There were 31 survey questions grouped into 9 service areas, and at the end of the questionnaire there was a question about recommending the hospital to others. Interestingly, using only the one question about overall experience correctly predicts that recommendation almost 98% of the time, which is 1% higher than considering all 29 service questions. If the survey were reduced to the single question on overall experience/overall satisfaction, it would be enough (with 98% accuracy) to predict patients' recommendations. This points to a way to obtain the same information with far less data to collect, enter, manage, and analyze than using all 29 questions.
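The comparison described above can be sketched as follows. Everything in this snippet is placeholder: the synthetic responses stand in for the actual survey data (which we do not have), and the modeling choice of cross-validated logistic regression is our assumption, not necessarily the study's method.

```python
# Single-item vs. full-battery prediction of patient recommendation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1700
overall = rng.integers(1, 6, size=n)                            # 1-5 overall experience
service = overall[:, None] + rng.normal(0, 1.5, size=(n, 29))   # 29 service items
recommend = (overall + rng.normal(0, 0.5, size=n) >= 3).astype(int)

X_one = overall.reshape(-1, 1)                 # just the overall-experience item
X_all = np.hstack([X_one, service])            # the full questionnaire

for name, X in [("one question", X_one), ("all questions", X_all)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, recommend, cv=5)
    print(f"{name}: {acc.mean():.3f} cross-validated accuracy")
```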

Mayank Kejriwal

Computer Science
University of Texas at Austin

Poster 77

X-team 9

Whitepaper:
Unsupervised Instance Matching on Schema-free Linked Data

Linked Data is a global effort that has resulted in the publication of knowledge bases such as Freebase and DBpedia. This white paper describes a longstanding Artificial Intelligence problem called instance matching, and its emergence as a Big Data problem in the Linked Data community. It also provides a high-level outline of an architectural solution being developed by the author as part of his research efforts.

Jia Li

College of Science and Engineering
University of Minnesota

Poster 78

X-team 4

Whitepaper:
Spatio-Temporal Analysis of Hospital Admission Data for Improved Population Health in New York City

There has been an increased focus on Precision Medicine -- customized healthcare delivery based on individual patient information (e.g. family history, genetic information, etc.) -- highlighted by President Obama’s $215M “Precision Medicine Initiative”. While this route is promising, there are several knowledge gaps (especially in uncertainty quantification) that limit Precision Medicine’s applicability. An alternative approach to creating a healthy society is to focus on how large-scale environmental factors -- natural environment, social environment, built environment, demographics, etc. -- as opposed to individual genetic information (such as that used in Precision Medicine) can inform community or population health. This work focuses on applying a spatio-temporal analysis to hospital admissions data to understand how various environmental factors influence the access to and type of care that patients from different communities might need, with the goal of improving population health for communities in New York City (NYC).

Krishna Karthik Gadiraju

Computer Science
North Carolina State University

Poster 79

X-team 4

Whitepaper:
Detecting Extreme Events in Global Gridded Climate Data using Gaussian Processes

Extreme weather events refer to events such as droughts, heat waves, floods, cyclones, wildfires, etc. Several extreme weather events, such as the heat waves in Europe in 2003 and Russia in 2010 and the California drought, have increased in frequency in the past few decades. In order to predict these extreme events and understand their impact on the economy, there is an increasing need to study them. Climate and weather data are spatiotemporal in nature; therefore, methods to identify these events should take into account spatial and temporal autocorrelations. Techniques for finding extreme events are often referred to as anomaly and/or outlier detection. Gaussian Process (GP) learning is one such method, which can model spatial and temporal correlations. In this paper, we briefly explain how GPs can be applied to anomaly detection, and the problems GPs face when dealing with big climate data.
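To make the approach concrete, here is a minimal sketch of GP-based anomaly detection on spatio-temporal inputs (our illustration with synthetic data; the kernel choice and the 3-sigma cutoff are our assumptions, not the paper's settings): fit a GP, then flag observations that fall far outside its predictive distribution.

```python
# GP regression residuals as anomaly scores on spatio-temporal data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))          # (space-like, time-like) coordinates
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(0, 0.1, 200)
y[::40] += 2.5                                 # inject a few "extreme events"

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.1)).fit(X, y)
mu, sd = gp.predict(X, return_std=True)
# flag points more than 3 predictive standard deviations from the GP mean
anomalies = np.where(np.abs(y - mu) > 3 * sd)[0]
print("candidate extreme events at rows:", anomalies)
```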

Hazan Marti

Systems Engineering
George Washington University

Poster 80

X-team 8

Whitepaper:
Measuring Emotional Dimensions on Twitter: A Case Study in Tracking Fear of Vaccination

In this paper, we present a computational text analysis technique for measuring dimensions of emotion on Twitter, a common social media platform. Latent Semantic Analysis is integrated with the Profile of Mood States (POMS) psychometric instrument in order to extract emotional loads from any context by applying text mining methods.
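The pipeline can be illustrated compactly. In the sketch below (our illustration; the mood seed words are stand-ins, not the actual POMS items), documents and mood-dimension seed texts are embedded in a shared LSA space, and each document is scored against each mood dimension by cosine similarity.

```python
# LSA embedding plus mood-dimension scoring by cosine similarity.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

tweets = ["vaccines scare me, I read about side effects",
          "got my flu shot today, feeling great",
          "so worried about the new vaccine schedule"]
moods = {"fear": "afraid scared worried anxious",          # illustrative seeds
         "vigor": "lively energetic cheerful great"}

vec = TfidfVectorizer()
X = vec.fit_transform(tweets + list(moods.values()))       # docs + mood anchors
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

doc_vecs, mood_vecs = lsa[:len(tweets)], lsa[len(tweets):]
scores = cosine_similarity(doc_vecs, mood_vecs)             # doc x mood loads
for tweet, row in zip(tweets, scores):
    print(dict(zip(moods, row.round(2))), "<-", tweet)
```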

Vanessa Gray

Genome Sciences
University of Washington

Poster 81

X-team 9

Whitepaper:
Harnessing large-scale mutagenesis data to improve protein engineering

This proposal aims to couple machine learning with large-scale mutagenesis datasets to accurately predict quantitative mutational effects. The resulting computational tool will have the capability to predict function-enhancing mutations, and will facilitate protein engineering innovation.

Rachel Gittelman

Genome Sciences
University of Washington

Poster 82

X-team 2

Whitepaper:
The challenges of high-dimensional expression quantitative trait locus inference

This paper discusses ways of increasing power to detect associations in high-dimensional genotype-expression data sets that span multiple tissue types. Although methods exist that effectively combine data across tissues for joint analysis, they are limited in scope and not widely used currently.

Fei Wang

Waksman Institute
Rutgers University

Poster 83

X-team 2

Whitepaper:
Personalized Disease Networks: A New Approach for Understanding and Predicting Complex Processes with Applications to Cardiovascular Diseases

We propose to build personalized patient networks, which represent the disease evolution of patients across subsequent hospitalizations. These networks are individual-based and represent the evolutionary steps of various cardiovascular conditions, diseases, and procedures. We use the word "network" to mean a data-based construction, in the form of a connected graph, that captures the disease evolution pathway of each individual patient. Once each individual network is built, we obtain a dataset of networks. We will then cluster the personalized networks to 1) summarize more general networks within each cluster and 2) predict cardiovascular mortality for each cluster. This network algorithm can also be adapted to other personalized pattern analyses.
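A minimal sketch of how one such personalized network might be constructed (our illustration; the node labels, edge-weight convention, and toy history are ours, not the paper's): conditions recorded in consecutive hospitalizations are linked by weighted directed edges.

```python
# Building one personalized disease network from a hospitalization history.
import networkx as nx

# placeholder hospitalization history for one patient (ordered in time)
admissions = [["hypertension"],
              ["hypertension", "atrial fibrillation"],
              ["heart failure", "atrial fibrillation"],
              ["heart failure"]]

G = nx.DiGraph()
for prev, curr in zip(admissions, admissions[1:]):
    for a in prev:
        for b in curr:
            # weight counts how often the transition a -> b recurs
            w = G.get_edge_data(a, b, {"weight": 0})["weight"]
            G.add_edge(a, b, weight=w + 1)

print(sorted(G.edges(data=True)))
```

A dataset of such graphs, one per patient, could then be clustered with any graph-similarity measure, which is the step the paragraph above describes.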

Zane Goodwin

Department of Medicine
Washington University in St. Louis

Poster 84

X-team 9

Whitepaper:
Identifying Antibiotic Resistance Genes In Hospital-Acquired Bacterial Infections

Hospital-acquired bacterial infections span a wide range of infections that are difficult to treat; without effective antibiotic treatment they can be fatal, and they therefore pose a significant threat to the health and recovery of hospitalized individuals. Identifying antibiotic resistance genes (ARGs) in the genome sequences of hospital-acquired infections can reveal the genes driving antibiotic resistance. Straightforward phylogenetic methods have been employed to solve this problem, but the computational complexity of phylogenetic methods scales exponentially with the number of genomes being compared. Compositional methods have been introduced to help reduce the search space for ARGs in bacterial genome sequences, but they cannot pinpoint the exact location of ARGs in a genome. Hence, a combination of phylogenetic and compositional methods is needed in order to identify ARGs with a high degree of specificity. Solving the big data problem of how best to identify ARGs in large bacterial genomics datasets will help answer larger biological questions about how ARGs appear and evolve in bacterial populations, and provide practical insights that can help prevent outbreaks of antibiotic-resistant bacteria in hospitals.

Etienne Fluet-Chouinard

Center for Limnology
University of Wisconsin-Madison

Poster 85

X-team 4

Whitepaper:
Estimating global inland fish catch from case study extrapolation

The unreliability of nationally reported statistics on inland fish catch leaves large uncertainty around the status of inland fisheries and limits responsible management of the fish resource. Alternative approaches for inventorying the status and trends of inland fisheries must be explored; however, because of their simplicity, existing global-scale yield models do not provide a credible alternative to nationally reported statistics. I propose to generate global predictions of inland fish yield using a machine learning approach. The proposed approach presents the challenges of integrating data from multiple sources, including large data layers such as a high-resolution wetland map for distinguishing water body types, as well as of adequate model selection among machine learning methods. This analysis will provide a new point of comparison for assessing the quality of national reporting and FAO's own confidence level in reporting.

Alireza Borhani

Department of Construction Management
University of Washington

Poster 86

X-team 1

Whitepaper:
Building User Audit: Capturing Behavior, Energy, and Culture

Data analytics is emerging as an important management tool for the built environment. Utilizing data gathering and processing techniques can result in dramatic improvements in building performance. However, studies show a significant discrepancy between predicted and actual performance, mainly because of a gap in the analysis of occupants' impacts. Therefore, the main objective of this interdisciplinary research is to provide an audit tool that characterizes building user behaviors and determines their influence on energy consumption. This audit tool (named the Building User Audit Procedure, BUAP) introduces a procedure for optimized data collection and analysis. Proper implementation of the BUAP leads to greater sustainability by improving energy efficiency and minimizing environmental impacts in the building industry.

Pramod Anantharam

Computer Science and Engineering
Wright State University

Poster 87

X-team 7

Whitepaper:
mHealth Based Approach for Asthma Management

The increasing availability of sensors and mobile devices has created unprecedented opportunities across many domains. Continuous access by doctors and patients to physiological, physical, and environmental observations will have profound implications for healthcare. We address the crucial problem of asthma management in children by utilizing sensors and mobile devices. We pose questions deemed useful by doctors in the context of asthma management. After preliminary data analysis, we propose a patient health score and a vulnerability score for informed decision making by doctors and patients. With personalized action recommendations, we aspire to reduce asthma attacks in children.

Sultanah Alshammari

Computer Science
University of North Texas

Poster 88

X-team 2

Whitepaper:
Modeling Disease Spread at Global Mass Gatherings

The spread of infectious diseases at global mass gatherings can pose health threats to both the hosting country and the participants’ countries of origin. Travel patterns at the end of these international events can initiate a global epidemic within a short period of time. Advanced surveillance systems and computational models are essential tools to estimate, study, and control epidemics at mass gatherings. In this paper, we present our ongoing efforts to model disease spread during the Hajj. We discuss the different aspects of modeling infectious diseases and the considerations specific to the Hajj season.

Yisha Yao

Biochemistry
Rutgers, the State University of New Jersey

Poster 89

X-team 9

Whitepaper:
Combining spectral learning with an advanced force field for protein structure prediction

This whitepaper describes a method that combines machine learning with an advanced force field to improve protein structure prediction.

Long Feng

Department of Statistics and Biostatistics
Rutgers University

Poster 90

X-team 2

Whitepaper:
Methodological Issues of using graphical models for econometrics

Over the past decades, graphical models have become an increasingly popular Big Data technique for finding the conditional dependence structure among millions of target random variables. In this paper, we consider the issues that arise when using graphical models for economic datasets. Since economic datasets usually change over time, we discuss the problem of combining graphical models with time series models.
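One simple way to make the combination concrete (our stand-in, not the estimator studied in the paper) is to re-fit a sparse Gaussian graphical model on a rolling window, letting the estimated conditional dependence graph change over time.

```python
# Rolling-window sparse graphical model for time-varying dependence.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
T, p, window = 300, 5, 100
returns = rng.normal(size=(T, p))        # placeholder for economic time series

for start in range(0, T - window + 1, 100):
    X = returns[start:start + window]
    model = GraphicalLasso(alpha=0.1).fit(X)
    # nonzero off-diagonal precision entries = conditional dependence edges
    edges = np.argwhere(np.triu(np.abs(model.precision_) > 1e-4, k=1))
    print(f"window {start}-{start + window}: edges {edges.tolist()}")
```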

Timothy Goodrich

Computer Science
North Carolina State University

Poster 91

X-team 5

Whitepaper:
Extracting Key Structural Properties from Graph-Based Models

Motivated by recent work in quantum computing, we propose a new data science problem at the intersection of graph theory and data mining. Specifically, given an ensemble of graph models containing domain-specific metadata, we want to identify crucial graph structural properties for predicting model quality scores. Constructing a method for solving this problem leads to direct advances in quantum computing and high performance computing, with potential applications in other domains.

Mohammadreza Esfandiari

Computer Science
NJIT

Poster 92

X-team 10

Whitepaper:
BSP: An iterated local search heuristic for minimizing 0-1 loss

We consider the problem of minimizing the number of misclassified instances (known to be NP-hard) plus the hinge loss. We implement an iterated local search algorithm and study it on 33 randomly selected real datasets from the UCI machine learning repository. We consider the effect of optimizing the error and margin terms separately and together, with a parameter to balance the two terms. We show the effect of search parameters on our algorithm, as well as the correlation between training and test error of hyperplanes as the search progresses. Our program averages 12.4% error, matching the cross-validated liblinear program at 12.4%. With bootstrapping, our error reduces to 12.3%, and the same holds for liblinear. We also discuss running time, and we provide a freely available beta version of our implementation.
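The general scheme is easy to sketch. The following is our reconstruction of a generic iterated local search for this objective, not the authors' BSP code; the step size, perturbation scale, balance parameter, and toy data are all our assumptions.

```python
# Iterated local search on a hyperplane for 0-1 loss + C * hinge loss.
import numpy as np

def objective(w, X, y, C=0.01):
    margins = y * (X @ w)
    # 0-1 loss (misclassifications) plus scaled hinge loss
    return np.sum(margins <= 0) + C * np.sum(np.maximum(0, 1 - margins))

def local_search(w, X, y, step=0.1, iters=200):
    best = objective(w, X, y)
    for _ in range(iters):
        i = np.random.randint(len(w))          # perturb one coordinate at a time
        for delta in (+step, -step):
            cand = w.copy()
            cand[i] += delta
            val = objective(cand, X, y)
            if val < best:
                w, best = cand, val
    return w, best

def iterated_local_search(X, y, restarts=20):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    w, best = local_search(w, X, y)
    for _ in range(restarts):
        kicked = w + rng.normal(scale=0.5, size=w.shape)   # perturbation ("kick")
        cand, val = local_search(kicked, X, y)
        if val < best:                                     # accept only improvements
            w, best = cand, val
    return w, best

# toy usage: the last column of X serves as a bias feature
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(100, 2)), np.ones((100, 1))])
y = np.sign(X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=100))
w, best = iterated_local_search(X, y)
print("0-1 + hinge objective:", best)
```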

Josh Melander

Electrical Engineering
Kansas State University

Poster 93

X-team 1

Whitepaper:
From Data to Knowledge: Temporal Network Analysis towards Implementing Robust Large-Scale Societal Changes

The past century has seen an unparalleled increase in technology; from the invention of the transistor to the innumerable services provided by the Internet, it has become obvious that technology has a great effect on our lives. Some of the consequences of technological progress are obvious (increased connectivity, automation, new industries, etc.), while others are more subtle, and their connections and implications may go unnoticed. Being aware of these changes depends on our ability to understand and model the innumerable relations present in all aspects of society. Ultimately, understanding the large-scale implications (social, economic, environmental, etc.) of our actions, be they technological, legislative, or political, is going to take a shift in how we approach problems. It is not sufficient to take a reductionist point of view in understanding the world around us; we need to focus on the interactions between the various entities and how they give rise to large-scale, emergent behavior.

Solomon Vimal

Institute for the Environment, Department of Geography
UNC Chapel Hill

Poster 94

X-team 4

Whitepaper:
Global Flood Risk Management

Can we build better data warehouses and geo-servers to share flood (or disaster) risk maps and related data and models for interoperability and rapid update at global scale? Can we enable seamless visualization of risk maps by overlaying them on existing map services?

Babak Farmanesh

Industrial Engineering and Management
Oklahoma State University

Poster 95

X-team 8

Whitepaper:
Sparse Pseudo-input Local Kriging for Large Non-stationary Spatial Datasets with Exogenous Variables

Gaussian process (GP) regression is a powerful tool for building predictive models of spatial systems. However, it does not scale efficiently to large datasets. In particular, for high-dimensional spatial datasets, the performance of GP regression deteriorates further. We propose a method that approximates the full GP for large spatial datasets with exogenous variables.
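A compact way to see the kind of approximation involved: the classic subset-of-regressors/pseudo-input predictive mean lets m inducing points stand in for n data points, reducing the O(n^3) GP solve to O(n m^2). The numpy sketch below is our illustration of that generic idea; the paper's local kriging variant, its treatment of non-stationarity, and its handling of exogenous variables are not reproduced here.

```python
# Subset-of-regressors (pseudo-input) GP predictive mean, from scratch.
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(0)
n, m = 2000, 30                                   # data points, inducing points
X = rng.uniform(0, 10, size=(n, 2))               # spatial coordinates
y = np.sin(X[:, 0]) + np.cos(X[:, 1]) + rng.normal(0, 0.1, n)
Z = X[rng.choice(n, m, replace=False)]            # pseudo-inputs (here: a subsample)

noise = 0.1 ** 2
Kmm = rbf(Z, Z)
Kmn = rbf(Z, X)
# Sigma = (Kmm + Kmn Knm / noise)^-1 ; predictive mean at X* is K*m Sigma Kmn y / noise
Sigma = np.linalg.inv(Kmm + Kmn @ Kmn.T / noise + 1e-8 * np.eye(m))
alpha = Sigma @ Kmn @ y / noise

X_star = rng.uniform(0, 10, size=(5, 2))
mu = rbf(X_star, Z) @ alpha                       # O(n m^2) instead of O(n^3)
print(mu)
```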

Colin White

Computer Science
Carnegie Mellon University

Poster 96

X-team 3

Whitepaper:
Clustering under Natural Stability Assumptions

Clustering is a well-studied problem in machine learning, with a wide variety of applications across many disciplines. Research has focused on finding approximation algorithms. However, with the recent explosion of big data, traditional approximation algorithms may not scale well, in either time complexity or approximation guarantees. In a recent line of work, we add natural stability assumptions about the input data, which allow us to devise simpler algorithms with better guarantees than approximation algorithms. We explain the results in this area as well as open questions.

Fazle Faisal

Computer Science and Engineering
University of Notre Dame

Poster 97

X-team 5

Whitepaper:
Novel Algorithmic Directions for Analyzing Large-scale Biological Network Data

Network science has become an indispensable research area that spans many domains, including computational biology. Biomolecules such as genes or their protein products interact with each other to carry out biological functions, and this is exactly what biological networks model. Thus, the analysis of biological networks is essential for understanding complex biological processes and diseases, as well as for designing effective drugs. In this context, this paper demonstrates novel algorithmic directions for analyzing large-scale biological network data, which are essential to reveal novel insights into complex biological processes and diseases, and to eventually lead us towards personalized medicine and therapeutics.

Sowmya Sridhar

Computer Science
New York University, School of Engineering

Poster 98

X-team 6

Whitepaper:
Weather Data Characterization Tool

The use of weather data in data analytics has become widely prevalent for a variety of applications, which aid in strategic business decision making for disciplines ranging from healthcare, transportation and planning, to economics, and in research investigations in the physical sciences. This has increased the demand for access to weather data in the form of daily, monthly, and annual summaries which contain values indicating the temperature (min, max, and average), precipitation, wind speed, snowfall and other parameters in record format. Consumers of the open weather dataset provided by the National Centers for Environmental Information [1] face a major challenge: they often lack the requisite domain knowledge to interpret the detailed meteorological data available. Hence, data scientists and other consumers of the data must write their own versions of tooling to translate the values into meaningful descriptions. The objective of this paper is to advocate for open data set publishers to supply data set interpretation tools (DSITs) to facilitate consumption of the released data. Specifically, we describe a DSIT for characterizing the weather on a particular day. The tool produces a data set, which is consumable by anyone without requiring them to possess expertise in the meteorological domain. DSITs are themselves analytics in that they are encoded representations of expert knowledge, which provide actionable insight.
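As an illustration of what such a DSIT might look like (the thresholds and category names below are our placeholders, not official NCEI definitions), here is a small translator from record-format daily summary values to plain-language descriptions:

```python
# A toy data set interpretation tool (DSIT) for daily weather summaries.
def characterize_day(tmax_f, tmin_f, precip_in, snow_in, wind_mph):
    parts = []
    # temperature characterization (placeholder thresholds in Fahrenheit)
    if tmax_f >= 90:   parts.append("hot")
    elif tmax_f >= 70: parts.append("warm")
    elif tmax_f >= 50: parts.append("mild")
    elif tmin_f <= 32: parts.append("freezing")
    else:              parts.append("cool")
    # precipitation characterization (placeholder thresholds in inches)
    if snow_in > 0:        parts.append(f"{snow_in:.1f} in of snow")
    elif precip_in >= 0.5: parts.append("rainy")
    elif precip_in > 0:    parts.append("light rain")
    else:                  parts.append("dry")
    parts.append("windy" if wind_mph >= 20 else "calm winds")
    return ", ".join(parts)

# example: one station-day from a record-format daily summary
print(characterize_day(tmax_f=88, tmin_f=71, precip_in=0.7, snow_in=0, wind_mph=12))
```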

Ishwarya Rajendrababu

Computer Science & Engineering
New York University School of Engineering

Poster 98

X-team 4

Whitepaper:
Weather Data Characterization Tool

The use of weather data in data analytics has become widely prevalent for a variety of applications, which aid in strategic business decision making for disciplines ranging from healthcare, transportation and planning, to economics, and in research investigations in the physical sciences. This has increased the demand for access to weather data in the form of daily, monthly, and annual summaries which contain values indicating the temperature (min, max, and average), precipitation, wind speed, snowfall and other parameters in record format. Consumers of the open weather dataset provided by the National Centers for Environmental Information [1] face a major challenge: they often lack the requisite domain knowledge to interpret the detailed meteorological data available. Hence, data scientists and other consumers of the data must write their own versions of tooling to translate the values into meaningful descriptions. The objective of this paper is to advocate for open data set publishers to supply data set interpretation tools (DSITs) to facilitate consumption of the released data. Specifically, we describe a DSIT for characterizing the weather on a particular day. The tool produces a data set, which is consumable by anyone without requiring them to possess expertise in the meteorological domain. DSITs are themselves analytics in that they are encoded representations of expert knowledge, which provide actionable insight.

Xinli Geng

Department of Civil and Environmental Engineering
University of Nevada, Reno

Poster 99

X-team 6

Whitepaper:
Vehicle to Pedestrian Communication Based on Client-Server Architecture

In this white paper, I present an architecture for a Vehicle-to-Pedestrian system, which would improve the safety of pedestrians and make a significant contribution to the development of connected vehicles. However, several challenges remain before the full system can be completed.

Bradford Eilering

Fine Arts - Art Studio
Southern Illinois University, Edwardsville

Poster 100

X-team 1

Whitepaper:
Wave Pool: an environmental art installation

I would like to address the challenges of our time and to assume the role of global citizen through avenues of public engagement with environmental artworks. Wave Pool is a visual aid with the substance to engage the public on many levels, including the artistic, the scientific and the environmentally minded.