Justin Brandenburg
Nina Cesare
Sayamindu Dasgupta
Can Hu
Charles Iaconangelo
Taisuke Imai
Chengrui Li
Lincoln Sheets
Victoria Villar
Colin White

Justin Brandenburg

Computational Social Science
George Mason University

Poster 35

X-team 3

Whitepaper:
Replicating Cyber-attack Patterns of Behavior using Bipartite Network Analysis and Agent-Based Modeling

Introducing a method of evaluating cyber traffic behavior via bipartite graph analysis and implementing agent-based modeling to simulate and test network capability.

Bio:

Data Scientist at L-3 Data Tactics. Pursuing an MA in Computational Social Science at George Mason University in Fairfax, VA. Received an MS in Applied Economics from Johns Hopkins and a BS in Economics from Virginia Tech.

Interest areas:
Cyber

Nina Cesare

Sociology
University of Washington

Poster 69

X-team 3

Whitepaper:
Exploring Demographic Dimensions of Big Social Media Data

Despite growing use of social media data to analyze society, these data are not fully utilized in the social sciences. One reason for this is the fact that demographic information about individuals within social media spaces is not readily available via profile content. This paper discusses existing tools for extracting demographic information embedded within social media profiles, as well as ways in which this process may be rendered more efficient, scalable and accessible. It concludes by addressing the ways in which adding demographic dimensions to social media data may expand opportunities for social science research.

Bio:

I explore ways of using social media data to answer questions of interest to social scientists - such as how communities form and how identity is expressed - within social media contexts. Of particular interest to me is how these patterns may vary along demographic lines within these spaces.

Interest areas:
Social media, community, identity

Sayamindu Dasgupta

Media Arts and Sciences
Massachusetts Institute of Technology

Poster 33

X-team 3

Whitepaper:
Large-scale analysis of novice programmer trajectories in an open-ended programming community

This white paper outlines some of the opportunities and challenges in analyzing trajectories of young novice programmers as they create, share, and remix media-rich programming projects, as well as participate socially in the Scratch online community (https://scratch.mit.edu). Scratch is open-ended by design: anyone with a web browser can create a wide variety of programming projects, ranging from games to science simulations, from interactive stories to computational music programs. This open-ended context poses a number of challenges for the large-scale analysis and measurement of learning outcomes. Addressing these challenges holds promise not just for understanding the use of Scratch as a learning environment. As the learn-to-code movement in the United States and elsewhere gathers momentum, methods and strategies formulated for Scratch data research also have the potential to be useful for research on other similar tools and environments that teach young people programming.

Bio:

I'm a graduate student in the Lifelong Kindergarten group at the MIT Media Lab. I am a part of the team behind Scratch, a visual, block-based programming language and environment and community designed for young people. The first part of my research focusses on designing and building systems that enable children to program with data. The other part of my research consists of understanding how and what children learn as they make projects using these systems.

Interest areas:
Education, Computer Science Education

Can Hu

Department of Statistics and Biostatistics
Rutgers, The State University of New Jersey

Poster 50

X-team 3

Whitepaper:
Advanced Data Analytics of Railroad Infrastructure Degradation to Improve Transportation Safety

This white paper introduces several candidate models for capturing track geometry degradation.

Bio:

A first-year master's candidate in statistics. Previously graduated from a graduate program in analytical finance. Now working as a student research assistant on data analysis for civil engineering, especially railway safety topics such as train collisions and track deterioration.

Interest areas:
Statistical Modeling, Data Mining

Charles Iaconangelo

Graduate School of Education
Rutgers, The State University of New Jersey

Poster 53

X-team 3

Whitepaper:
Optimizing the Use of Assessment Data to Support Educational Inferences

This paper proposes the application of methods traditionally used in Big Data to the modeling of student assessment data. The additional information extracted from item responses will be used to support the more ambitious inferences about student learning demanded by current educational policy.

Bio:

I'm a PhD student at Rutgers investigating new methods of assessing students and modeling student achievement to facilitate education reform policy.

Interest areas:
Psychometrics, Latent Variable Modeling, Validity and Assessment

Taisuke Imai

Division of the Humanities and Social Sciences
California Institute of Technology

Poster 41

X-team 3

Whitepaper:
Detecting Habitual Behavior in Natural Consumer Choice Data

Habit is a process by which a stimulus automatically generates an impulse toward action, based on learned association between stimulus and response. In this project we seek to identify habitual choices and shifts from habit to model-directed behavior using big and broad data sets of natural consumer decision making such as online shopping, online stock trading, and commuter route choice.

Bio:

My general research interest is in behavioral economics and neuroeconomics. In particular, I am interested in how psychological factors, such as attention, influence individual decision making. I have worked with data from laboratory experiments, often with techniques such as eyetracking and mousetracking, to uncover decision making processes.

Interest areas:
consumer choice, experimental economics, neuroeconomics

Chengrui Li

Department of Statistics and Biostatistics
Rutgers University

Poster 29

X-team 3

Whitepaper:
A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data

The task of analyzing massive spatial data is extremely challenging. In this paper we propose a sequential split-conquer-combine (SSCC) approach for analysis of dependent big data and illustrate it using a Gaussian process model, along with theoretical support. This SSCC approach can substantially reduce computing time and computer memory requirements. We also show that the SSCC approach is oracle in the sense that the result obtained using the approach is asymptotically equivalent to the one obtained by analyzing the entire data set on a super-super computer. The methodology is illustrated numerically using both simulation and a real data example of a computer experiment on modeling room temperatures.

Bio:

My research interests are in meta-analysis and data mining, especially text mining.

Interest areas:
Meta Analysis, Text mining

Lincoln Sheets

Informatics Institute
University of Missouri

Poster 42

X-team 3

Whitepaper:
Data Mining to Predict Healthcare Utilization in Managed Care Patients

Systematic association mining of clinical attributes from the electronic health records of adult primary care patients to discover predictors of high healthcare utilization.

Bio:

I have over twenty years of industry experience in software design, development, testing, and project management. I studied medicine and am studying medical informatics, with a focus on clinical decision support in primary care.

Interest areas:
Medical Informatics, Clinical Decision Support

Victoria Villar

Astronomy and Astrophysics
Harvard

Poster 21

X-team 3

Whitepaper:
Classification of Intermediate-Luminosity Astronomical Transients

Stars materialize, live and die following a lifecycle that depends on both intrinsic properties and environmental factors. Their transient outbursts, interactions and deaths all encode important information about stellar evolution. Future large surveys, such as LSST, will produce 30+ TB of data daily, which astronomers can use to study these transients. This paper describes possible classification techniques for analyzing the LSST dataset of intermediate-luminosity transients.

Bio:

I study optical transients such as supernovae and supernova impostors at the Harvard-Smithsonian Center for Astrophysics. I am specifically interested in understanding the final, violent years of a massive star's life before it explodes as a supernova.

Interest areas:
Astrophysics, High energy transients, Machine Learning

Colin White

Computer Science
Carnegie Mellon University

Poster 96

X-team 3

Whitepaper:
Clustering under Natural Stability Assumptions

Clustering is a well-studied problem in machine learning, with a wide variety of applications across many disciplines. The research has focused on finding approximation algorithms. However, with the recent explosion of big data, traditional approximation algorithms may not scale well in either time complexity or approximation guarantees. In a recent line of work, we add natural stability assumptions about the input data, which allow us to devise simpler algorithms that have better guarantees than approximation algorithms. We explain the results in this area as well as open questions.

Bio:

I am about to complete my first year as a PhD student in the computer science department at Carnegie Mellon University. I am fortunate to be advised by Nina Balcan. I received my BA in mathematics and computer science from Amherst College. I am interested in the design and analysis of algorithms and learning theory. My recent projects have been about clustering under natural stability assumptions and fault-tolerant clustering.

Interest areas:
Learning Theory