|Justin Brandenburg||Nina Cesare||Sayamindu Dasgupta|
|Can Hu||Charles Iaconangelo||Taisuke Imai|
|Chengrui Li||Lincoln Sheets||Victoria Villar|
Computational Social Science
Replicating Cyber-attack Patterns of Behavior using Bipartite Network Analysis and Agent-Based Modeling
Introducing a method of evaluating cyber traffic behavior via bipartite graph analysis and implementing agent-based modeling to simulate and test network capability.
Data Scientist at L-3 Data Tactics. Pursuing an MA in Computational Social Science at George Mason University in Fairfax, VA. Received an MS in Applied Economics from Johns Hopkins and a BS in Economics from Virginia Tech.
Exploring Demographic Dimensions of Big Social Media Data
Despite growing use of social media data to analyze society, these data are not fully utilized in the social sciences. One reason is that demographic information about individuals within social media spaces is not readily available via profile content. This paper discusses existing tools for extracting demographic information embedded within social media profiles, as well as ways in which this process may be rendered more efficient, scalable and accessible. It concludes by addressing the ways in which adding demographic dimensions to social media data may expand opportunities for social science research.
I explore ways of using social media data to answer questions of interest to social scientists, such as how communities form and how identity is expressed, within social media contexts. Of particular interest to me is how these patterns may vary along demographic lines within these spaces.
Media Arts and Sciences
Large-scale analysis of novice programmer trajectories in an open-ended programming community
This white paper outlines some of the opportunities and challenges in analyzing trajectories of young novice programmers as they create, share, and remix media-rich programming projects, as well as participate socially in the Scratch online community (https://scratch.mit.edu). Scratch is open-ended by design: anyone with a web browser can create a wide variety of programming projects, ranging from games to science simulations, from interactive stories to computational music programs. This open-ended context poses a number of challenges for the large-scale analysis and measurement of learning outcomes. Addressing these challenges holds promise not just for understanding the use of Scratch as a learning environment. As the learn-to-code movement in the United States and elsewhere gathers momentum, methods and strategies formulated for Scratch data research also have the potential to be useful for research on similar tools and environments that teach young people programming.
I'm a graduate student in the Lifelong Kindergarten group at the MIT Media Lab. I am a part of the team behind Scratch, a visual, block-based programming language, environment, and community designed for young people. The first part of my research focusses on designing and building systems that enable children to program with data. The other part of my research consists of understanding how and what children learn as they make projects using these systems.
|Education||Computer Science Education|
Department of Statistics and Biostatistics
Advanced Data Analytics of Railroad Infrastructure Degradation to Improve Transportation Safety
This white paper introduces possible models for capturing track geometry degradation.
A first-year master's candidate in statistics. Previously graduated from another graduate program in analytical finance. Now working as a student research assistant on data analysis for civil engineering, especially railway safety topics such as train collisions and track deterioration.
|Statistical Modeling||Data Mining|
Graduate School of Education
Optimizing the Use of Assessment Data to Support Educational Inferences
This paper proposes the application of methods traditionally used in Big Data to the modeling of student assessment data. The additional information extracted from item responses will be used to support the more ambitious inferences about student learning demanded by current educational policy.
I'm a PhD student at Rutgers investigating new methods of assessing students and modeling student achievement to facilitate education reform policy.
|Psychometrics||Latent Variable Modeling||Validity and Assessment|
Division of the Humanities and Social Sciences
Detecting Habitual Behavior in Natural Consumer Choice Data
Habit is a process by which a stimulus automatically generates an impulse toward action, based on learned association between stimulus and response. In this project we seek to identify habitual choices and shifts from habit to model-directed behavior using big and broad data sets of natural consumer decision making such as online shopping, online stock trading, and commuter route choice.
My general research interest is in behavioral economics and neuroeconomics. In particular, I am interested in how psychological factors, such as attention, influence individual decision making. I have worked with data from laboratory experiments, often with techniques such as eyetracking and mousetracking, to uncover decision making processes.
|consumer choice||experimental economics||neuroeconomics|
Department of Statistics and Biostatistics
A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data
The task of analyzing massive spatial data is extremely challenging. In this paper we propose a sequential split-conquer-combine (SSCC) approach for analysis of dependent big data and illustrate it using a Gaussian process model, along with theoretical support. This SSCC approach can substantially reduce computing time and computer memory requirements. We also show that the SSCC approach is oracle in the sense that the result obtained using the approach is asymptotically equivalent to the one obtained from performing the analysis on the entire data set on a super-supercomputer. The methodology is illustrated numerically using both simulation and a real data example of a computer experiment on modeling room temperatures.
My research interests are in meta-analysis and data mining, especially text mining.
|Meta Analysis||Text mining|
Data Mining to Predict Healthcare Utilization in Managed Care Patients
Systematic association mining of clinical attributes from the electronic health records of adult primary care patients to discover predictors of high healthcare utilization.
I have over twenty years of industry experience in software design, development, testing, and project management. I studied medicine and am studying medical informatics, with a focus on clinical decision support in primary care.
|Medical Informatics||Clinical Decision Support|
Astronomy and Astrophysics
Classification of Intermediate-Luminosity Astronomical Transients
Stars materialize, live and die following a lifecycle that depends on both intrinsic properties and environmental factors. Their transient outbursts, interactions and deaths all encode important information about stellar evolution. Future large surveys, such as LSST, will produce 30+ TB of data daily, which astronomers can use to study these transients. This paper describes possible classification techniques for analyzing the LSST dataset of intermediate-luminosity transients.
I study optical transients such as supernovae and supernova impostors at the Harvard-Smithsonian Center for Astrophysics. I am specifically interested in understanding the final, violent years of a massive star's life before it explodes as a supernova.
|Astrophysics||High energy transients||Machine Learning|
Clustering under Natural Stability Assumptions
Clustering is a well-studied problem in machine learning, with a wide variety of applications across many disciplines. Research has focused on finding approximation algorithms. However, with the recent explosion of big data, traditional approximation algorithms may not scale well in either time complexity or approximation guarantees. In a recent line of work, we add natural stability assumptions about the input data, which allow us to devise simpler algorithms that have better guarantees than approximation algorithms. We explain the results in this area as well as open questions.
I am about to complete my first year as a PhD student in the computer science department at Carnegie Mellon University. I am fortunate to be advised by Nina Balcan. I received my BA in mathematics and computer science from Amherst College. I am interested in the design and analysis of algorithms and learning theory. My recent projects have been about clustering under natural stability assumptions and fault-tolerant clustering.