Fazle Faisal, Alex Georges, Timothy Goodrich,
Charmgil Hong, Arif Khan, Marina Kogan,
Matthew Long, Pardis Miri, Haroon Raja,
Nilothpal Talukder

Fazle Faisal

Computer Science and Engineering
University of Notre Dame

Poster 97

X-team 5

Whitepaper:
Novel Algorithmic Directions for Analyzing Large-scale Biological Network Data

Network science has become an indispensable research area spanning many domains, including computational biology. Biomolecules such as genes and their protein products interact with each other to carry out biological functions, and biological networks model exactly these interactions. The analysis of biological networks is therefore essential for understanding complex biological processes and diseases, as well as for designing effective drugs. In this context, this paper presents novel algorithmic directions for analyzing large-scale biological network data, with the goal of revealing new insights into complex biological processes and diseases and, eventually, advancing personalized medicine and therapeutics.

Bio:

I am a fourth year Ph.D. student in the Department of Computer Science and Engineering at the University of Notre Dame. I work in the Complex Networks Lab, advised by Prof. Tijana Milenkovic. My research focuses on designing and developing algorithms and computational methods for integrative, dynamic, and comparative network analysis, and on applying these methods to real-world problems with important biological applications, such as understanding the complex molecular mechanisms underlying cancer, aging, and protein folding.

Interest areas:
Complex networks, Data mining, High performance computing

Alex Georges

Physics/Mathematics
University of California, San Diego

Poster 54

X-team 5

Whitepaper:
Persistent Homology: A New Statistics

Topological data analysis is a new approach to analyzing the structure of high-dimensional datasets. Persistent homology, specifically, generalizes hierarchical clustering methods to identify significant higher-dimensional properties that conventional approaches cannot capture.
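
The link to hierarchical clustering can be made concrete in dimension zero. The sketch below is a minimal illustration, assuming a small point cloud in Euclidean space and a Vietoris-Rips filtration (our choice; the abstract does not specify one). It computes 0-dimensional persistence intervals with union-find, and the resulting death times coincide with single-linkage merge heights. The function name `h0_persistence` is hypothetical, and higher-dimensional features such as loops and voids would require a full persistent homology library.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence intervals of a Vietoris-Rips filtration.

    Every point is born at scale 0. When two components merge at edge
    length r, one of them dies at r (all births are 0, so which one is
    arbitrary). One component never dies (death = inf).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise edges sorted by length: the edge order of the Rips filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    intervals = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                      # edge joins two components
            parent[ri] = rj
            intervals.append((0.0, r))    # one component dies at scale r
    intervals.append((0.0, math.inf))     # the component that survives
    return intervals

# Two tight pairs of points, 10 units apart.
print(h0_persistence([(0, 0), (0, 1), (10, 0), (10, 1)]))
# [(0.0, 1.0), (0.0, 1.0), (0.0, 10.0), (0.0, inf)]
```

The long-lived interval ending at 10.0 signals two well-separated clusters, which is the same information a single-linkage dendrogram would show.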

Bio:

I recently completed the second year of the PhD program in physics. I have been working in both the mathematics and physics departments, often on cross-disciplinary collaborative projects. In the mathematics department I investigate topological data analysis, a novel field of data analytics, and I have been using this recently developed field to uncover fundamental structure and patterns in big data.

Interest areas:
Data Analytics, High Energy Particle Theory, Quantum Gravity

Timothy Goodrich

Computer Science
North Carolina State University

Poster 91

X-team 5

Whitepaper:
Extracting Key Structural Properties from Graph-Based Models

Motivated by recent work in quantum computing, we propose a new data science problem at the intersection of graph theory and data mining. Specifically, given an ensemble of graph models containing domain-specific metadata, we want to identify the graph structural properties that are crucial for predicting model quality scores. A method for solving this problem would lead to direct advances in quantum computing and high performance computing, with potential applications in other domains.

Bio:

I am a second year student in the Theory in Practice lab at NC State. Using tools from both mathematics and computer science, I work directly with domain scientists to advance the state of the art in their fields. Currently I am using structural graph theory to develop faster methods for compiling adiabatic quantum computing programs. I am always looking for new problems and researchers to collaborate with. Email me at tdgoodri@ncsu.edu if you'd like to talk.

Interest areas:
Data-Driven Science, Graph Theory, Quantum Computing

Charmgil Hong

Department of Computer Science
University of Pittsburgh

Poster 16

X-team 5

Whitepaper:
Multivariate Conditional Outlier Detection and Its Clinical Application

This paper summarizes our research, which aims to develop automated methods for multivariate conditional outlier detection and to apply them to support clinical decision making. In particular, we are interested in identifying statistically unusual patient care patterns that correspond to medical errors, based on data stored in electronic medical record (EMR) systems. We describe the problems and objectives of the research and outline our model-based outlier detection approach. We also discuss future directions and the expected impact of the research.
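
As a rough illustration of the conditional idea (a record is unusual only relative to its context), the sketch below scores each binary output label of a record by a Laplace-smoothed estimate of P(label value | context) and flags low-probability values. Everything here is an assumption for illustration: the function name, the discrete context, the smoothing constant `alpha`, the `threshold`, and especially the treatment of labels as independent, which ignores the correlations among outputs that a model-based multivariate approach would capture.

```python
from collections import defaultdict

def conditional_outliers(records, alpha=1.0, threshold=0.05):
    """Flag (record index, label index, probability) for label values
    that are improbable given the record's context.

    records: list of (context, labels) where labels is a tuple of 0/1
    values, e.g. (patient condition, which orders were given).
    Simplifying assumption: each label is scored independently.
    """
    # counts[(context, j)] = [number of 0s, number of 1s] for label j
    counts = defaultdict(lambda: [0, 0])
    for ctx, labels in records:
        for j, v in enumerate(labels):
            counts[(ctx, j)][v] += 1

    flagged = []
    for idx, (ctx, labels) in enumerate(records):
        for j, v in enumerate(labels):
            c = counts[(ctx, j)]
            # Laplace-smoothed P(label j == v | context); counts include
            # the record itself, so this is a lenient estimate.
            p = (c[v] + alpha) / (c[0] + c[1] + 2 * alpha)
            if p < threshold:
                flagged.append((idx, j, p))
    return flagged

# 49 patients with the same condition received order 0; one did not.
records = [("fever", (1, 0))] * 49 + [("fever", (0, 0))]
print(conditional_outliers(records))  # flags (49, 0, p) with p = 2/52
```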

Bio:

I am a fifth year Ph.D. student in the Department of Computer Science at the University of Pittsburgh, studying under the supervision of Dr. Milos Hauskrecht. My primary research areas are Data Mining and Machine Learning, focusing on stochastic data modeling, discovering the correlation structures in data, and their applications to clinical anomaly detection.

Interest areas:
Machine learning, Structured prediction, Outlier detection

Arif Khan

Computer Science
Purdue University

Poster 7

X-team 5

Whitepaper:
Large Scale Adaptive Anonymity via Parallel Approximate b-Matching

Data privacy is a necessary feature of data science applications. We discuss the potential of k-Anonymity, a privacy model, in the context of big data. We show some of the limitations of k-Anonymity and propose a heuristic solution to these problems. We also discuss the applicability of k-Anonymity to different domains of data science.
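
To make the k-Anonymity requirement concrete: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below checks this property and applies one hypothetical generalization (masking ZIP digits and bucketing ages by decade). The column names, records, and generalization rules are invented for illustration and are not the heuristic proposed in the paper.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k rows (the k-anonymity condition)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())

def generalize(row):
    """Hypothetical generalization: keep the first three ZIP digits and
    replace the exact age with a ten-year range."""
    out = dict(row)
    out["zip"] = row["zip"][:3] + "**"
    low = row["age"] // 10 * 10
    out["age"] = f"{low}-{low + 9}"
    return out

# Illustrative records; "diagnosis" is the sensitive attribute.
rows = [
    {"zip": "47906", "age": 23, "diagnosis": "flu"},
    {"zip": "47907", "age": 27, "diagnosis": "cold"},
    {"zip": "47901", "age": 35, "diagnosis": "flu"},
    {"zip": "47904", "age": 31, "diagnosis": "asthma"},
]
QI = ["zip", "age"]
print(is_k_anonymous(rows, QI, 2))                           # False
print(is_k_anonymous([generalize(r) for r in rows], QI, 2))  # True
```

Generalization trades data utility for privacy; choosing how far to generalize each record is where the hard optimization problem lies.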

Bio:

My research interests include high performance computing. Currently, I am developing a highly scalable algorithm for b-matching.
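
In b-matching, each vertex v may be matched to at most b(v) incident edges, and the goal is usually to maximize total edge weight. A simple baseline is the sequential greedy algorithm, which is known to be a 1/2-approximation. The sketch below shows it under the assumption of a simple undirected graph given as `(weight, u, v)` tuples; it is a swapped-in illustration, not the scalable parallel algorithm mentioned in the bio, and the function name is hypothetical.

```python
def greedy_b_matching(edges, b):
    """Greedy 1/2-approximate maximum-weight b-matching.

    edges: list of (weight, u, v) for a simple undirected graph (no self-loops).
    b: dict mapping each vertex to its capacity b(v).
    Returns the chosen edges as (u, v, weight).
    """
    remaining = dict(b)          # capacity left at each vertex
    matching = []
    # Scan edges from heaviest to lightest; take an edge if both
    # endpoints still have spare capacity.
    for w, u, v in sorted(edges, reverse=True):
        if remaining[u] > 0 and remaining[v] > 0:
            matching.append((u, v, w))
            remaining[u] -= 1
            remaining[v] -= 1
    return matching

triangle = [(3, "a", "b"), (2, "b", "c"), (1, "a", "c")]
print(greedy_b_matching(triangle, {"a": 1, "b": 1, "c": 1}))  # [('a', 'b', 3)]
print(greedy_b_matching(triangle, {"a": 2, "b": 2, "c": 2}))  # all three edges
```

The global sort makes this inherently sequential; scalable variants replace it with local decisions that vertices can make in parallel.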

Interest areas:
High Performance Computing, Bioinformatics, Graph Algorithms

Marina Kogan

Computer Science
University of Colorado Boulder

Poster 40

X-team 5

Whitepaper:
Why Data Science Needs to Attend to Contextual Behavior: The Case of Crisis Informatics

Crisis informatics is the study of how people converge, spread information, and cooperate on social media around the tasks they deem important during a crisis. The socio-behavioral focus of crisis informatics requires research methods that account for the social context of users' activity. On the other hand, the volume of social media data requires data science approaches, which in their current form often decontextualize social activity. I propose several methodological innovations that would push big data methods to attend to the highly situated and contextual nature of social activity during crises.

Bio:

With a background in sociology and computer science, I am a computational social scientist interested in collective action. More specifically, my research focuses on social group formation, group dynamics, and cooperative work. Currently, I focus on cooperative activities on social media during disasters, both natural hazards and political crises.

Interest areas:
Computational Social Science, Network Science, Crisis Informatics

Matthew Long

Department of Chemical and Biological Engineering
University of Wisconsin-Madison

Poster 75

X-team 5

Whitepaper:
Novel Pathway Prediction and Gene Identification

Current methods for novel pathway prediction have largely relied on a manually curated list of reaction operators to develop pathways for the production of a desired metabolite. We propose to automatically generate reaction rules in a clustered, hierarchical format, which would allow the inclusion of catalytically promiscuous enzymes as well as the potential use of gene sequence data.

Bio:

My current research is focused on the incorporation of '-omics' data into constraint-based optimization of genome-scale metabolic models.

Interest areas:
Systems Biology

Pardis Miri

Computer Science and Psychology
UCSC and Stanford

Poster 71

X-team 5

Whitepaper:
Improving Affective Communication through Technology

Preventing emotional imbalances through haptic feedback is a high-dimensional problem with numerous possible combinations of factors (such as tactile sensor type, size, vibrotactile pattern, and location on the skin). Consequently, it is infeasible to run a study evaluating each haptic composition, because doing so is resource-intensive in time and cost and requires a large sample size. To answer the question "what haptic description can elicit or communicate a distinct emotion?", we propose implementing a biofeedback smartwatch application, which is currently feasible, that sends a haptic composition and gathers physiological data from both the sender and the receiver. Once the application is widely adopted, it will be possible to gather a large amount of data, and finding patterns in such a large dataset is a classic big data problem.

Bio:

My research focuses on how to improve affective communication through technology.

Interest areas:
Improving Affective Communication

Haroon Raja

Electrical and Computer Engineering Department
Rutgers, The State University of New Jersey

Poster 17

X-team 5

Whitepaper:
Cloud K-SVD: A Dictionary Learning Algorithm for Big, Distributed Data

This paper studies the problem of data-adaptive representations for big, distributed data. It is assumed that a number of geographically-distributed, interconnected sites have massive local data and they are interested in collaboratively learning a low-dimensional geometric structure (dictionary) underlying these data.
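
One way to see why dictionary learning over distributed sites is feasible: K-SVD updates each dictionary atom from the dominant left singular vector of an error matrix, and when that matrix's columns are split across sites, the vector is the top eigenvector of a sum of local matrices E_i E_i^T. The sketch below runs a power method in which each site computes only its own product, assuming exact summation as a stand-in for the network-wide consensus averaging a real deployment would need. The site data, function names, and iteration count are hypothetical, and this is not the full Cloud K-SVD algorithm.

```python
import math

def local_product(E_i, v):
    """Compute E_i @ (E_i^T @ v) at one site.

    E_i is a list of columns (signals), each a list of length d.
    """
    d = len(v)
    out = [0.0] * d
    for col in E_i:
        c = sum(a * b for a, b in zip(col, v))  # this column's entry of E_i^T v
        for k in range(d):
            out[k] += c * col[k]
    return out

def distributed_power_method(sites, d, iters=100):
    """Dominant eigenvector of sum_i E_i E_i^T, i.e. the top left singular
    vector of the column-concatenated data [E_1, ..., E_N].

    Assumes the all-ones start vector is not orthogonal to the answer.
    Summing local products replaces consensus averaging in this sketch.
    """
    v = [1.0] * d
    for _ in range(iters):
        partials = [local_product(E_i, v) for E_i in sites]  # done at each site
        w = [sum(p[k] for p in partials) for k in range(d)]  # aggregation step
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Two sites whose signals lie mostly along the first axis.
sites = [[[3.0, 0.1], [2.9, -0.1]], [[3.1, 0.0]]]
print(distributed_power_method(sites, d=2))  # close to [1.0, 0.0]
```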

Bio:

I am a third year PhD student in the ECE Department at Rutgers. My research interests are in the area of distributed signal processing. Currently I am interested in solving the dictionary learning problem in distributed data settings.

Interest areas:
Signal processing, Distributed data processing

Nilothpal Talukder

Computer Science
Rensselaer Polytechnic Institute

Poster 70

X-team 5

Whitepaper:
Mining Graph Patterns in Massive Networks

Graphs are widely used to represent relationships among entities, such as friendships in social networks and interactions in biological networks. Mining commonly occurring subgraph patterns from a massive social graph or citation network can help discover similar groups or behaviors, which may be of interest to social scientists. Likewise, a bioinformatics researcher may be interested in finding common substructures within gene or protein networks. In the literature, this task is known as frequent subgraph mining (FSM). Although the problem is important, it is computationally hard for two main reasons: (1) the search space for enumerating subgraph patterns is exponential, and (2) it requires subgraph isomorphism checking, which is NP-complete. The task has become even more challenging as graphs (e.g., social networks) have grown very large. For instance, the popular social network Facebook currently has 1.4 billion monthly active users. No existing FSM algorithm in the literature can handle a graph that large. In our research we developed a scalable, distributed approach for mining frequent subgraph patterns from a single, massive network. Our algorithm scales efficiently to billion-node/edge networks. To the best of our knowledge, this is the largest network considered to date for subgraph pattern mining.
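
Counting how often a pattern occurs in a single large graph is subtle, because overlapping embeddings make raw embedding counts non-monotone. A support measure commonly used in single-graph FSM is minimum image-based (MNI) support: for each pattern node, count the distinct graph nodes it maps to across all embeddings, and take the minimum. The brute-force sketch below, assuming small labeled undirected graphs stored as adjacency sets, enumerates embeddings by backtracking (an exponential-time subgraph isomorphism search, illustrating the hardness discussed above) and computes MNI support. It is not the scalable distributed algorithm of the paper; the function names and graph encoding are hypothetical.

```python
def find_embeddings(p_labels, p_adj, labels, adj):
    """Yield every injective map from pattern nodes to graph nodes that
    preserves node labels and pattern edges (exponential in the worst case)."""
    order = list(p_labels)
    mapping = {}

    def extend(i):
        if i == len(order):
            yield dict(mapping)
            return
        u = order[i]
        for x in labels:
            if labels[x] != p_labels[u] or x in mapping.values():
                continue  # wrong label, or graph node already used
            # Every already-mapped pattern neighbor of u must map to a neighbor of x.
            if all(mapping[w] in adj[x] for w in p_adj[u] if w in mapping):
                mapping[u] = x
                yield from extend(i + 1)
                del mapping[u]

    yield from extend(0)

def mni_support(p_labels, p_adj, labels, adj):
    """Minimum image-based support: the smallest number of distinct graph
    nodes that any single pattern node is mapped to."""
    images = {u: set() for u in p_labels}
    for emb in find_embeddings(p_labels, p_adj, labels, adj):
        for u, x in emb.items():
            images[u].add(x)
    return min(len(s) for s in images.values())

# Star graph: one "A" node joined to three "B" nodes; pattern is an A-B edge.
labels = {1: "A", 2: "B", 3: "B", 4: "B"}
adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
p_labels = {"p": "A", "q": "B"}
p_adj = {"p": {"q"}, "q": {"p"}}
print(mni_support(p_labels, p_adj, labels, adj))  # 1
```

The pattern has three embeddings in the star, but they all share the same "A" node, so its MNI support is 1. Because MNI support is anti-monotone, a miner can prune every extension of a pattern whose support falls below the threshold.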

Bio:

My research interest is in data mining, more specifically, frequent subgraph mining (FSM) from large and typically sparse networks. FSM has numerous applications in areas such as computational chemistry, bioinformatics, and social networks. To solve the FSM task, one has to generate all possible subgraph patterns and check for subgraph isomorphisms in the input graph. The number of subgraph patterns can be exponentially large, even for a graph of moderate size. Furthermore, subgraph isomorphism is an NP-complete problem. Finally, the size of real-world networks, such as social networks with billions of users, poses a special challenge to this classic problem. I am interested in designing and developing scalable algorithms for this problem. Besides working on graph mining problems, I have also done research on data quality and data privacy in the recent past.

Interest areas:
Data mining, Parallel computing