Data Science Ethnography

Our team of ethnographers is studying how researchers work with large and complex datasets, and the institutions, programs, and communities that support data-intensive research. We embed ourselves in places where people use and learn data science methods in order to understand how different communities make sense of and value data, and what is organizationally required to support data-intensive practices and collaborations.

Traffigram: A Design Methodology for Distance Cartograms

Mapping technologies have the potential to help people better understand their transportation options and habits by conveying information about travel time, carbon dioxide emissions, expense, calories burned, and other metrics associated with transportation. The purpose of this research is to develop a new methodology for designing user-centered distance cartograms, and to build a platform that users, designers, researchers, and practitioners around the world can readily adopt.

Collaborative games for science and engineering

We are creating an online, multi-player collaborative educational game that incorporates bioinformatics and cyberinfrastructure (CI) concepts aimed at high school students.

We are interested in the uptake of concepts of cyber problem solving specifically among young underrepresented minorities and women, and in better understanding the larger relationships between people, educational games, and infrastructural computational technologies. Collaboration and creative strategies are encouraged and integrated into the gameplay mechanics.

Emotion and affect in distributed collaboration

Distributed collaborative teams increasingly rely on online tools for interaction and communication for both social and task-oriented goals. We are expanding recent work linking emotion and affect to collaboration and creativity in order to model how this collaborative communication takes place by examining real-world examples, specifically from the chat logs of an international astrophysics collaboration.

Through qualitative analysis of the text-based communications created by these online collaborative projects, we aim to understand how team members express affect and emotion in this medium and the impact that these expressions have on group dynamics, creativity, and problem solving.

Visual analytics for chat and social media

Social media are producing vast quantities of valuable data every day. However, currently available techniques for extracting knowledge from these data sets are limited.

We are developing and evaluating interactive visual analytics tools and techniques for large, temporal, text communication data sets with many participants. Our work focuses on enabling efficient qualitative analysis of social media data sets by maintaining context, providing examples, and supporting group discussions around the data.

Scaling qualitative analysis of emotion with natural language processing and machine learning

Social media leave traces that enable rich qualitative analyses of how communication media shape collaboration. However, the volume of these traces makes qualitative coding of entire data sets impractical.

Using emotion labels manually assigned to a subset of messages, we have been able to train machine learning classifiers to label the rest of the data set. These labels are fine-grained, including fear, anticipation, and interest, and several can be applied to a single message, which makes this a challenging formulation of the sentiment analysis problem. Through the use of diverse feature sets and sliding-window feature extraction, we have been able to create relatively accurate SVM classifiers. We are currently extending our approach by using graphical models to incorporate message context.
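To make the idea of sliding-window feature extraction concrete, the following is a minimal sketch in Python. It is not the project's implementation: the bag-of-words features, the window size, the "cur:" and "prev:" prefixes, the function names, and the sample chat messages are all assumptions chosen for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Return lowercased whitespace-token counts for one message."""
    return Counter(text.lower().split())


def sliding_window_features(messages, window=2):
    """Build one feature dictionary per message.

    Each dictionary holds the token counts of the message itself
    (prefixed "cur:") plus the summed token counts of up to `window`
    preceding messages (prefixed "prev:"), so a classifier can tell
    the current text apart from its conversational context.
    """
    features = []
    for i, msg in enumerate(messages):
        feats = Counter(
            {f"cur:{tok}": n for tok, n in bag_of_words(msg).items()}
        )
        # Only look backward: the window covers earlier messages.
        for prev in messages[max(0, i - window):i]:
            for tok, n in bag_of_words(prev).items():
                feats[f"prev:{tok}"] += n
        features.append(dict(feats))
    return features


# Hypothetical chat excerpt, invented for this example.
chat = ["the fit failed again", "oh no", "wait it converged now"]
feats = sliding_window_features(chat, window=1)
```

Feature dictionaries of this kind could then be vectorized and passed to one binary SVM per emotion label, which is one common way to let several labels apply to a single message (for example, scikit-learn's DictVectorizer with OneVsRestClassifier wrapping LinearSVC). Whether the project used that particular setup is not stated here.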

Previous projects

Human-centered biometric security

In the design of security systems, technical and security requirements usually drive decisions, but in practice, security depends on a complex network of human and technical resources. Insufficient attention to human issues ranging from usability to organizational context can make even the most carefully engineered security system worse than useless.

We are studying security technology from a human-centered perspective. Our work includes usability evaluation of emerging security technologies, studies of people’s day-to-day password-related practices, and evaluation to determine the key interface design factors affecting user acceptance of biometric security systems.

Collaborative Creativity

Contrary to the popular belief in a single “aha” moment of insight, recent work has indicated that creativity is often a series of incremental steps to discovery. As an idea is developed, it is amplified over time in its social context.

Dr. Aragon and her colleagues are developing and evaluating a dynamical systems theory of collaborative creativity based on distributed affect and interfaces that facilitate socio-emotional communication.

Thermostat Usability

Residential thermostats control about 10% of national energy use. In 2008, Energy Star concluded that homes with programmable thermostats were using more energy than homes with manual thermostats. As a result, Energy Star terminated the thermostat endorsement program in 2009 and decided that any future endorsement program must include specifications for minimum levels of usability.

Dr. Aragon and her colleagues and students performed multiple lab and field studies of thermostats and developed an innovative usability metric for thermostats to facilitate energy-saving behavior. This metric is currently being evaluated in Energy Star’s draft specifications for programmable thermostats.


Sunfall

Sunfall is a visual analytics software toolset developed for the Nearby Supernova Factory (SNfactory), an international astrophysics experiment and the largest data volume supernova search currently in operation. Sunfall combines novel image processing algorithms and machine learning with highly interactive visual interfaces to enable collaborative, user-driven scientific exploration of supernova image and spectral data.

Sunfall is the first visual analytics system in production use at a major astrophysics project. The system resulted in labor savings of nearly 90%; the number of erroneous images was reduced by 70%; the innovative visual interfaces increased human efficiency by a factor of four.

Airflow Hazard Visualization

Dr. Aragon and her colleagues developed an airflow hazard visualization system for helicopter pilots. In a flight simulation usability study, the system significantly reduced the simulated crash rate (from 19% to 6.3%) among experienced pilots flying a high-fidelity, aerodynamically realistic fixed-base rotorcraft flight simulator into hazardous conditions.

This work highlights the importance of understanding principles of human visual perception and cognitive ability, and applying this knowledge to the design and implementation of appropriate visualizations in an operationally stressful environment.