Human-Centered Data Science Lab

Project Overview: Researchers working with social media and online communication data often apply mixed methods, combining quantitative and qualitative analysis. Statistics and computational modeling can reveal general patterns across large datasets, while qualitative analysis can generate rich descriptions and theory; combining the two gives researchers the strengths of both. However, qualitative analysis, and coding in particular, requires manual human interpretation and is labor-intensive. Visual analytics can support this process and facilitate richer insights.

Aeonium – Visual Analytics for Qualitative Coding: Qualitative coding is often used by social scientists to explore and analyze their datasets. As the scale of the data grows, coding the whole dataset may not be feasible. This project focuses on identifying design requirements for tools that support social scientists' coding process, and on leveraging machine learning and visualization to highlight subsets of data that are ambiguous or coded inconsistently by different coders.
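To make the idea of "inconsistent between coders" concrete, the following is a minimal sketch that flags items on which coders assigned different codes. It is not Aeonium's method, which relies on machine learning and visualization; the function names, item ids, and code labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def flag_disagreements(codes_by_item):
    """Return the ids of items whose coders did not all assign the same code."""
    flagged = []
    for item_id, codes in codes_by_item.items():
        if len(set(codes)) > 1:
            flagged.append(item_id)
    return flagged


def agreement_ratio(codes):
    """Fraction of coders who chose the most common code for one item."""
    counts = Counter(codes)
    return counts.most_common(1)[0][1] / len(codes)


# Hypothetical data: three coders each assign one code to every message.
codes_by_item = {
    "msg-1": ["frustration", "frustration", "frustration"],
    "msg-2": ["amusement", "frustration", "amusement"],
    "msg-3": ["interest", "interest", "interest"],
}

for item_id in flag_disagreements(codes_by_item):
    ratio = agreement_ratio(codes_by_item[item_id])
    print(f"{item_id}: coders disagree (agreement {ratio:.2f})")
```

A tool could use a list like this to decide which messages to surface for discussion first, rather than asking coders to re-read the whole dataset.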

Text Prizm – Collaborative Coding for Chat Data: Qualitative coding is laborious, and online social media and communication datasets are large. Spreadsheets are commonly used for coding this type of data, but they support the task awkwardly at best. Traditional qualitative data analysis (QDA) tools like ATLAS.ti and NVivo are hard to apply to datasets with thousands of short text-based messages. As part of our research analyzing the role of emotion in a collaborative scientific chat room, we designed and developed Text Prizm, a web application to help analyze social media and online communication content (such as chat logs or Twitter data). Text Prizm supports simultaneous collaborative coding with efficient keyboard interaction.

ALOE – Classifying Emotion in Text-based Chat: Machine learning technology has the potential to support qualitative analysis by learning models from smaller, manually coded datasets. These models can then be applied to analyze much larger amounts of data. We developed the machine learning tool ALOE to train and test machine learning classifiers for automatically labeling chat messages with different emotion or affect categories. The software takes as input a CSV file containing timestamped chat messages with labels for training and produces a trained Support Vector Machine classifier. We published a paper describing the development and evaluation of ALOE at CSCW 2013, and have released ALOE on GitHub.
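The CSV-in, classifier-out workflow described above can be sketched roughly as follows. This is not ALOE's code or its feature set: it is a minimal, standard-library-only illustration that assumes a hypothetical CSV layout (timestamp, message, label columns), uses plain bag-of-words features, ignores the timestamp column, and trains a binary linear SVM with a Pegasos-style stochastic subgradient method. All names and sample messages are invented.

```python
import csv
import io
import random
from collections import defaultdict

# Hypothetical input: label 1 = message expresses the target affect, -1 = not.
SAMPLE_CSV = """timestamp,message,label
2012-01-01T10:00:00,this is great news,1
2012-01-01T10:00:05,wow awesome result,1
2012-01-01T10:00:09,great work everyone,1
2012-01-01T10:01:00,restarting the telescope now,-1
2012-01-01T10:01:30,the exposure just started,-1
2012-01-01T10:02:00,loading the next image,-1
"""


def load_messages(csv_text):
    """Parse CSV text into (tokens, label) pairs; the timestamp is not used."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(row["message"].lower().split(), int(row["label"])) for row in rows]


def train_linear_svm(examples, epochs=50, lam=0.01, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos-style SGD."""
    rng = random.Random(seed)
    weights = defaultdict(float)
    step = 0
    for _ in range(epochs):
        order = examples[:]
        rng.shuffle(order)
        for tokens, label in order:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            margin = label * sum(weights[t] for t in tokens)
            # Shrink all weights (gradient of the L2 regularizer).
            for t in list(weights):
                weights[t] *= 1.0 - eta * lam
            # Hinge-loss subgradient: update only when the margin is violated.
            if margin < 1.0:
                for t in tokens:
                    weights[t] += eta * label
    return weights


def predict(weights, message):
    """Label a new message by the sign of its bag-of-words score."""
    score = sum(weights.get(t, 0.0) for t in message.lower().split())
    return 1 if score > 0 else -1


examples = load_messages(SAMPLE_CSV)
model = train_linear_svm(examples)
print(predict(model, "great awesome news"))
print(predict(model, "loading the telescope"))
```

Once trained on a small hand-coded sample, a model of this kind can be applied to the remaining uncoded messages, which is the scaling benefit the paragraph describes. A real system would need multi-class labels, richer features, and held-out evaluation.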

Researchers

Nan-Chen Chen, UW, HCDE, PhD Candidate

Rafal Kocielnik, UW, HCDE, PhD Student

Meg Drouhard, UW, HCDE, PhD Student

Jina Suh, UW, HCDE, MS Student

Past Contributors:

Michael Brooks, UW, HCDE, PhD

John Robinson, UW, HCDE, PhD Student

Katie Kuksenok, UW, CSE, PhD

Vanessa Peña-Araya, University of Chile, CS, PhD Student

Keting Cen, UW, HCDE, BS

Xiangyi Zheng, UW, HCDE, BS Student

Publications

Drouhard, M., Chen, N.-C., Suh, J., Kocielnik, R., Peña-Araya, V., Cen, K., Zheng, X., and Aragon, C.R. Aeonium: Visual Analytics to Support Collaborative Qualitative Coding. The 10th IEEE Pacific Visualization Symposium (PacificVis) (2017).

Chen, N.-C., Kocielnik, R., Drouhard, M., Peña-Araya, V., Suh, J., Cen, K., Zheng, X., and Aragon, C.R. Challenges of Applying Machine Learning to Qualitative Coding. CHI 2016 Workshop on Human Centred Machine Learning (HCML 2016) (2016).

Brooks, M., Kuksenok, K., Torkildson, M.K., Perry, D., Robinson, J.J., Anicello, O., Scott, T.J., Zukowski, A., Harris, P., and Aragon, C.R. Statistical affect detection in collaborative chat. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, ACM Press (2013), 317-328. doi: 10.1145/2441776.2441813

Scott, T.J., Kuksenok, K., Perry, D., Brooks, M., Anicello, O., and Aragon, C. Adapting grounded theory to construct a taxonomy of affect in collaborative online chat. In Proceedings of the 30th ACM International Conference on Design of Communication, ACM Press (2012), 197-204. doi: 10.1145/2379057.2379096

Tools for Qualitative Analysis of Online Communication Data
