Subrina Farah, Babak Farmanesh, Yirui Hu
William Kearney, Hazan Marti, Colin Raffel
Jin Tao, Yang Wang, Daqing Yun

Subrina Farah

Department of Family Medicine Research
University of Rochester

Poster 76

X-team 8

Whitepaper:
Full Questionnaire versus One Question Response: A Big Data Approach to Predict Patient Satisfaction Efficiently

Patient satisfaction has gained recognition as a valid measure of the quality of care in healthcare services over the past decades [1]. From July to November 2013, patient satisfaction survey data were obtained from approximately 1,700 admitted patients during their stay at a JCI-accredited multinational hospital in South Asia. The survey contained 31 questions grouped into 9 service areas, and the questionnaire ended with a question asking whether the patient would recommend the hospital to others. Interestingly, the single question about overall experience correctly predicts patients' recommendation almost 98% of the time, roughly 1 percentage point higher than using all 29 service questions. If the survey were reduced to this one question on overall experience/overall satisfaction, it would still predict patients' recommendation with about 98% accuracy. This offers a way to obtain the same information while collecting, entering, managing and analyzing far less data than the 29-question approach requires.
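To make the one-question comparison concrete, the sketch below scores a simple rule that predicts a recommendation whenever the overall-experience rating reaches a cut-off, and reports accuracy as the share of patients whose actual answer matches the prediction. The 1-5 rating scale, the threshold rule, the function names and the toy records are illustrative assumptions; the abstract does not state which model produced the 98% figure.

```python
from typing import Iterable, Sequence


def threshold_accuracy(overall: Sequence[int], recommend: Sequence[bool], cutoff: int) -> float:
    """Share of patients whose recommendation matches the rule `overall >= cutoff`."""
    if len(overall) != len(recommend) or not overall:
        raise ValueError("expected non-empty sequences of equal length")
    hits = sum((score >= cutoff) == rec for score, rec in zip(overall, recommend))
    return hits / len(overall)


def best_cutoff(overall: Sequence[int], recommend: Sequence[bool],
                scale: Iterable[int] = range(1, 6)) -> int:
    """Cut-off on an assumed 1-5 rating scale that maximizes in-sample accuracy."""
    return max(scale, key=lambda c: threshold_accuracy(overall, recommend, c))


# Toy records for illustration only (not study data).
overall = [5, 4, 4, 3, 2, 5, 1, 4]
recommend = [True, True, True, False, False, True, False, False]
cutoff = best_cutoff(overall, recommend)
print(cutoff, threshold_accuracy(overall, recommend, cutoff))  # 4 0.875
```

Because `best_cutoff` picks the threshold on the same records it scores, its accuracy is optimistic. A fair comparison with a model built on all 29 service questions would evaluate both on held-out patients.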

Bio:

Research statistician and graduate data scientist for studies on improving the quality of care for patients with hypertension and people living with HIV through mobile apps and human-interactive devices. Passionate about big data management to quantify patient satisfaction efficiently from minimal survey questionnaires.

Interest areas:
Big Data & Biostatistics, Healthcare Research, Public Health Policy Research

Babak Farmanesh

Industrial Engineering and Management
Oklahoma State University

Poster 95

X-team 8

Whitepaper:
Sparse Pseudo-input Local Kriging for Large Non-stationary Spatial Datasets with Exogenous Variables

Gaussian process (GP) regression is a powerful tool for building predictive models for spatial systems. However, it does not scale efficiently to large datasets, and its performance deteriorates further for high-dimensional spatial datasets. We propose a method that approximates the full GP for large spatial datasets with exogenous variables.
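The scaling problem can be illustrated with a generic sparse approximation. The sketch below contrasts the exact GP posterior mean, which solves an n-by-n system (O(n^3) time), with the deterministic training conditional (DTC, also called subset of regressors) mean built on m pseudo-inputs, which costs O(n m^2). This is a standard textbook approximation chosen for illustration, not the proposed local kriging method; the squared-exponential kernel, the noise level and the evenly spaced pseudo-inputs are assumptions.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n x d) and b (m x d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def exact_gp_mean(X, y, Xs, noise=1e-2, **kw):
    """Full GP posterior mean K_*n (K_nn + noise I)^-1 y; O(n^3) in the data size."""
    K = rbf(X, X, **kw) + noise * np.eye(len(X))
    return rbf(Xs, X, **kw) @ np.linalg.solve(K, y)


def sparse_gp_mean(X, y, Xs, Z, noise=1e-2, **kw):
    """DTC posterior mean with m pseudo-inputs Z; only an m x m system is solved.

    mean = K_*m (noise K_mm + K_mn K_nm)^-1 K_mn y
    """
    Kmm = rbf(Z, Z, **kw)
    Kmn = rbf(Z, X, **kw)
    A = noise * Kmm + Kmn @ Kmn.T
    return rbf(Xs, Z, **kw) @ np.linalg.solve(A, Kmn @ y)


# Illustrative 1-D data: 200 training points summarized by 15 pseudo-inputs.
X = np.linspace(0, 10, 200)[:, None]
y = np.sin(X[:, 0])
Z = np.linspace(0, 10, 15)[:, None]
Xs = np.linspace(0.5, 9.5, 50)[:, None]
print(np.max(np.abs(sparse_gp_mean(X, y, Xs, Z) - exact_gp_mean(X, y, Xs))))
```

Local kriging and exogenous covariates, as in the whitepaper's title, would change how pseudo-inputs are chosen and what enters the kernel; neither is modeled here.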

Bio:

Statistical machine learning

Interest areas:
322 Engineering North

Yirui Hu

Statistics
Rutgers

Poster 72

X-team 8

Whitepaper:
Detecting Network Anomaly: A Novel Latent Probabilistic Solution

This paper investigates several machine learning techniques for detecting network anomalies. Different anomalies exhibit different behaviors in wireless networks, and not all anomalies are known in advance, so unsupervised algorithms are desirable for automatically characterizing traffic behavior and separating anomalies from normal behavior. Essentially all anomaly detection techniques learn a model of the normal patterns in a training data set and then assign each test data point an anomaly score based on its deviation from the learned patterns. Multi-cluster analysis is valuable because it can yield insight into human behavior and learn similar patterns in temporal traffic data. This paper leverages co-occurrence data that combine traffic data with the entities that generate it, and then compares a Gaussian probabilistic latent semantic analysis (GPLSA) model to a Gaussian Mixture Model (GMM) on temporal network data. We also propose a novel quantitative “Donut” algorithm for anomaly detection based on model log-likelihood.
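As a point of reference for the GMM baseline, the sketch below fits a diagonal-covariance Gaussian mixture by expectation-maximization and flags test points whose log-likelihood falls below a low quantile of the training log-likelihoods. It assumes numeric traffic feature vectors, two components and an illustrative 1% threshold. It does not implement GPLSA or the proposed “Donut” algorithm, whose details are not given here.

```python
import numpy as np


def _log_gauss_diag(X, means, variances):
    """log N(x_i | mean_k, diag(var_k)) for every point i and component k; shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff**2 / variances[None], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])


def _logsumexp(a):
    """Numerically stable log(sum(exp(a))) over the component axis (axis 1)."""
    m = np.max(a, axis=1, keepdims=True)
    return m[:, 0] + np.log(np.sum(np.exp(a - m), axis=1))


def fit_gmm(X, k, iters=100, seed=0, reg=1e-6):
    """Fit a diagonal-covariance Gaussian mixture with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, k, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        log_joint = np.log(weights) + _log_gauss_diag(X, means, variances)
        resp = np.exp(log_joint - _logsumexp(log_joint)[:, None])
        # M-step: re-estimate mixing weights, means and per-dimension variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 0.0) + reg
    return weights, means, variances


def log_likelihood(X, weights, means, variances):
    """Mixture log-likelihood log p(x) for each row of X."""
    return _logsumexp(np.log(weights) + _log_gauss_diag(X, means, variances))


def flag_anomalies(train, test, k=2, quantile=0.01):
    """True for test points less likely than the lowest `quantile` of training points."""
    params = fit_gmm(train, k)
    threshold = np.quantile(log_likelihood(train, *params), quantile)
    return log_likelihood(test, *params) < threshold
```

In practice the number of components and the threshold quantile would need tuning on real traffic data.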

Bio:

My research focuses on big data analytics and statistical modeling, in particular anomaly detection based on multi-cluster techniques and latent models in a Bayesian framework.

Interest areas:
Big data analytics and statistical modeling, Anomaly Detection

William Kearney

Earth and Environment
Boston University

Poster 19

X-team 8

Whitepaper:
Deriving process knowledge from data in coastal ecohydrology

Scientists interested in developing robust predictive models should aim for a synthetic modeling approach which combines the predictive power of empirical models with the process-driven understanding of physical models. I examine how this synthetic approach can improve the representation of processes in empirical models of salt marsh hydrology.

Bio:

I broadly study the evolution of coastal landscapes. I aim to derive a physical understanding of the processes which control that evolution from in situ and remote sensing data.

Interest areas:
Hydrology, Geomorphology

Hazan Marti

Systems Engineering
George Washington University

Poster 80

X-team 8

Whitepaper:
Measuring Emotional Dimensions on Twitter: A Case Study in Tracking Fear of Vaccination

In this paper, we present a computational text analysis technique for measuring dimensions of emotion on Twitter, a common social media platform. Latent Semantic Analysis is integrated with the Profile of Mood States psychometric instrument in order to extract emotional loads from any context by applying text mining methods.
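A minimal LSA sketch of the similarity step: build a term-by-document count matrix, keep the top-k left singular vectors, project both a tweet and a set of mood keywords into that space, and compare them by cosine similarity. Raw counts (rather than tf-idf or log-entropy weighting), the toy corpus and the keyword lists are assumptions for illustration. They are not the Profile of Mood States items or the paper's corpus.

```python
import re

import numpy as np


def tokenize(text):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_lsa(docs, k):
    """Return the term index and the top-k left singular vectors of the count matrix."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in tokenize(doc):
            counts[index[w], j] += 1
    U, _, _ = np.linalg.svd(counts, full_matrices=False)
    return index, U[:, :k]


def embed(text, index, Uk):
    """Project a text's term counts into the k-dimensional latent space (unknown words ignored)."""
    q = np.zeros(Uk.shape[0])
    for w in tokenize(text):
        if w in index:
            q[index[w]] += 1
    return Uk.T @ q


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))


# Toy corpus and hypothetical keyword sets, for illustration only.
docs = ["vaccine fear scared"] * 3 + ["happy sunny beach"] * 2
index, Uk = build_lsa(docs, k=2)
tweet = embed("So scared of the vaccine", index, Uk)
print(cosine(tweet, embed("fear scared", index, Uk)),
      cosine(tweet, embed("happy sunny", index, Uk)))
```

With a realistic corpus, k would be chosen well below the vocabulary size so that words that co-occur, such as different expressions of fear, collapse onto shared latent dimensions.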

Bio:

In this paper, we present a computational text analysis technique for measuring dimensions of emotion on the social media platform Twitter. In light of the recent Disneyland measles outbreak and the associated threat of vaccine refusal in the U.S., we focus our study on messages relevant to vaccination. Given the fast dissemination of opinion across social networking and blogging platforms, Twitter enables the detection of public opinion in myriad contexts. We use Latent Semantic Analysis to measure the semantic similarity between emotions expressed on Twitter and keywords taken from the Profile of Mood States psychometric instrument. We highlight how even short messages can express emotional valence.

Interest areas:
Data mining applications on public health, Latent Semantic Analysis with social media corpus, Natural Language Processing

Colin Raffel

Electrical Engineering
Columbia University

Poster 9

X-team 8

Whitepaper:
Learning Efficient Representations for Sequence Retrieval

We explore the problem of matching sequences of high-dimensional vectors to entries in very large sequence databases. When dynamic time warping distance is used to compare sequences, the local distance calculations can be prohibitively expensive when the data's dimensionality and intrinsic sampling rate are high. We therefore motivate the need for methods that can learn efficient representations for sequence comparison and discuss potential applications of these techniques.
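The cost described above is visible in a plain dynamic time warping implementation: every cell of the n-by-m alignment grid needs a d-dimensional local distance before the cumulative-cost recursion can run. The sketch assumes Euclidean local distances and the standard three-predecessor step pattern; it is a baseline, not the learned representation the abstract motivates.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping cost between sequences a (n x d) and b (m x d).

    1-D inputs are treated as sequences of scalars. The local-distance matrix
    costs O(n * m * d), which dominates when d and the sampling rate are high.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a[:, None] if a.ndim == 1 else a
    b = b[:, None] if b.ndim == 1 else b
    # Euclidean distance between every frame of a and every frame of b
    local = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    n, m = local.shape
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # extend the cheapest of the three admissible predecessor alignments
            cost[i, j] = local[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]
            )
    return float(cost[n, m])


print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))  # 0.0
```

A learned mapping to a lower-dimensional (and possibly downsampled) representation would shrink d, and potentially n and m, before this computation runs.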

Bio:

I am currently a PhD student and IGERT Fellow at Columbia University in the Laboratory for the Recognition and Organization of Speech and Audio, studying under Dan Ellis. My research focuses on machine learning methods for sequential data, especially audio signals. More specifically, my work focuses on machine listening, i.e., enabling computers to process, generate, and understand audio signals the way humans do.

Interest areas:
Machine Learning, Signal Processing

Jin Tao

Electrical Engineering and Computer Science
Washington State University, Pullman

Poster 64

X-team 8

Whitepaper:
Toward Understanding and Engineering Ecological Processes for Sustainability

How can we combine large datasets and computation to solve sustainability problems? Toward this overarching goal, my research program focuses on leveraging machine learning techniques to understand ecological processes from large datasets and convert that understanding into policy decisions for sustainability.

Bio:

I have a strong background in mathematics and computational algorithms, and I deeply care about solving important real-world problems in the area of data science using those skills. My general research interests are in machine learning and data-driven science, and the current focus of my research is on problems at the intersection of ecological science, environmental policies, and computational sustainability. I'm also a strong supporter of Women in Computer Science / Machine Learning / Data Science, and during and after my PhD training, I hope to motivate other female students to pursue similar careers by telling them how fun it is to work on all sorts of interesting problems!

Interest areas:
Ecological Data Science, Computational Sustainability, Machine Learning

Yang Wang

Systems and Information Engineering
University of Virginia

Poster 37

X-team 8

Whitepaper:
Maintained Individual Data Distributed Likelihood Estimation (MIDDLE)

The Maintained Individual Data Distributed Likelihood Estimation (MIDDLE) paradigm will construct and validate a revolutionary model for conducting health-science human-subject research with networked devices. The core idea of MIDDLE is that data can be privately maintained by participants on their personal devices and never revealed to researchers, while statistical models are still fit and scientific hypotheses are still tested.
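One way to picture the idea is distributed maximum-likelihood fitting in which each device reports only the gradient of its own log-likelihood, never the raw records. The sketch below does this for a linear model with Gaussian errors (unit variance assumed), so the fitted coefficients coincide with pooled least squares. The `Participant` class, its `score` method and `fit_distributed` are hypothetical names; the abstract does not describe MIDDLE's actual protocol, and shared gradients can still leak information, so this is not a privacy guarantee.

```python
import numpy as np


class Participant:
    """Hypothetical device that keeps one participant's data locally."""

    def __init__(self, X, y):
        self._X = np.asarray(X, dtype=float)
        self._y = np.asarray(y, dtype=float)

    def score(self, beta):
        """Gradient of the local Gaussian log-likelihood at beta, plus the local count.

        Only this aggregate leaves the device; X and y are never returned.
        """
        residual = self._y - self._X @ beta
        return self._X.T @ residual, len(self._y)


def fit_distributed(participants, dim, lr=0.5, iters=500):
    """Gradient ascent on the summed log-likelihood using per-device gradients."""
    beta = np.zeros(dim)
    for _ in range(iters):
        grads, counts = zip(*(p.score(beta) for p in participants))
        # average gradient over all observations keeps the step size scale-free
        beta = beta + lr * np.sum(grads, axis=0) / sum(counts)
    return beta
```

The step size assumes roughly standardized covariates; with poorly scaled data a smaller `lr` (or a second-order update) would be needed for convergence.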

Bio:

I received my B.Sc. in physics and mathematics from Hong Kong University of Science and Technology in 2009, and my M.Phil. in systems engineering and engineering management from the Chinese University of Hong Kong in 2011. I joined the Predictive Technology Laboratory at the University of Virginia as a Ph.D. student in 2014. My research interests include data mining, machine learning, distributed estimation and statistical software development. I am currently collaborating with faculty and graduate students in the psychology department on the project "Maintained Individual Data Distributed Likelihood Estimation".

Interest areas:
Data mining

Daqing Yun

Department of Computer Science
University of Memphis

Poster 48

X-team 8

Whitepaper:
An Integrated Transport Solution to Big Data Movement in High-performance Networks

We propose and develop an integrated transport solution to big data movement in high-performance networks in support of data- and network-intensive scientific applications.

Bio:

Daqing Yun received the B.E. degree in software engineering and the M.E. degree in computer software and theory from Xidian University in 2009 and 2012, respectively. He is currently a doctoral student in the Department of Computer Science at the University of Memphis, and works in the High-Performance Networking and Computing Group. His research interests include high-performance networking, workflow optimization, and parallel and distributed computing.

Interest areas:
High-performance Networking, Parallel and Distributed Computing