Statistical Affect Detection in Collaborative Chat (CSCW 2013)

Posted by Katie Kuksenok on December 22, 2012

SCCL work will be presented at the Computer-Supported Cooperative Work (CSCW) 2013 conference [1]. It is the result of work by a diverse, interdisciplinary group of people seeking to understand the role of affect expression in distributed scientific collaboration.

[Figure: Our affect taxonomy contains 44 codes [2]]

We have been working with a tremendous, rich dataset: half a million chat messages sent over four years by several dozen scientists, some in the US and some in France, while sharing a remote telescope. To understand the role of affect, we first developed a method for manually labeling the presence of affect in chat messages [2]. We are interested in high-granularity labels, such as "excitement" and "frustration."
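
For illustration, a manually labeled message might be represented as a record like the following; the field names and values here are hypothetical stand-ins, not the actual coding scheme, which is described in [2]:

```python
# Hypothetical record for one manually labeled chat message; field
# names and values are illustrative only, not the schema from [2].
labeled_message = {
    "timestamp": "2009-03-14 02:17:55",    # hypothetical
    "speaker": "astronomer_A",             # participant (anonymized)
    "text": "WHAAAAAAAT the guider just died again",
    "codes": ["frustration", "surprise"],  # zero or more affect codes
}
```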

But there are far more chat messages (half a million!) than can reasonably be labeled manually, so we set out to automate the identification of affect expression; our CSCW 2013 paper gives a detailed description of our approach, including the trade-offs of various decisions in the machine learning pipeline [1]. The automation is not expected or intended to replace the human process of interpretation, but to provide an analytic lens that makes a large dataset accessible for analysis of the role of affect. Automated labels of affect can enable large-scale analysis of social media data, including but not limited to chat logs [3].
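
As a rough, minimal sketch of the kind of per-code classification this involves (not the actual ALOE pipeline from [1]; the toy data and the binary "frustration" label are assumptions for illustration), one could write, in scikit-learn:

```python
# Minimal sketch of a per-code binary classifier on toy data; this
# illustrates the general approach, not the actual ALOE pipeline [1].
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for training data; 1 = message expresses frustration.
messages = [
    "ugh the guider died AGAIN",
    "this is so annoying",
    "why won't the dome open",
    "beautiful spectrum tonight",
    "focus looks good, starting exposure",
    "thanks, see you tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

# One classifier per affect code, with word counts as baseline features.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(messages, labels)

print(clf.predict(["ugh, the camera died again"]))  # likely [1], given the shared vocabulary
```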

[Figure: Different codes benefit from different kinds of features [1]]

Our experiments [1] yielded three lessons for applying machine learning to affect detection in chat:

  1. Use specialized features: augment word counts with features particular to the medium (in chat, after all, it pays to distinguish "what." from "WHAAAAAAAT") or particular to the context (such as acronyms or conversation participant names, where known).
  2. Different features benefit different codes: we were inclusive in adding features to the set, and trained a separate classifier for each code, since features vary in effectiveness across codes (e.g., swear words for "frustration").
  3. Use an interpretable classifier: this helped us improve the feature sets by reasoning about which features the classifier deemed important (see the sketch after this list).
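
As a concrete, deliberately simplified illustration of all three lessons, here is a sketch in scikit-learn. The elongation and all-caps rules, the acronym list, and the toy data are assumptions for illustration, not the feature set reported in [1]:

```python
# Sketch of chat-specific features plus an interpretable classifier;
# the feature rules below are illustrative assumptions, not the
# feature set from the CSCW 2013 paper [1].
import re
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def chat_features(msg):
    # Lesson 1: features beyond word counts, tuned to the chat medium.
    return [
        1.0 if re.search(r"(.)\1{2,}", msg) else 0.0,  # elongation: "WHAAAAAAAT"
        1.0 if re.search(r"[A-Z]{3,}", msg) else 0.0,  # shouting in all caps
        float(msg.count("!")),                         # exclamation marks
        1.0 if re.search(r"\b(lol|brb|omg)\b", msg, re.I) else 0.0,  # chat acronyms
    ]

def featurize(messages, vectorizer, fit=False):
    counts = vectorizer.fit_transform(messages) if fit else vectorizer.transform(messages)
    extra = csr_matrix([chat_features(m) for m in messages])
    return hstack([counts, extra])

# Toy data; Lesson 2: a separate 0/1 label vector (and classifier) per code.
messages = [
    "WHAAAAAAAT the guider died",
    "omg that nebula is gorgeous!!!",
    "starting calibration frames",
    "seeing is stable at 1.2 arcsec",
]
frustration = [1, 0, 0, 0]

vec = CountVectorizer()
X = featurize(messages, vec, fit=True)
clf = LogisticRegression().fit(X, frustration)  # Lesson 3: interpretable weights

# Inspect which features the classifier leans on for this code.
names = list(vec.get_feature_names_out()) + ["elongation", "all_caps", "exclaims", "acronym"]
for i in np.argsort(clf.coef_[0])[::-1][:5]:
    print(names[i], round(clf.coef_[0][i], 3))
```

Inspecting the top-weighted features per code, as in the last loop, is what lets an analyst notice when a feature is doing unexpected work and revise the feature set accordingly.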

Our resulting pipeline is available on GitHub as a command-line tool, ALOE. In ongoing work, we are incorporating automation provided by ALOE into a web-based tool for large-scale analysis of social media data, TextPrizm [3].

[1] M. Brooks, K. Kuksenok, M. K. Torkildson, D. Perry, J. J. Robinson, T. J. Scott, O. Anicello, A. Zukowski, P. Harris, and C. Aragon. Statistical Affect Detection in Collaborative Chat. CSCW 2013.

[2] T. J. Scott, K. Kuksenok, D. Perry, M. Brooks, O. Anicello, and C. Aragon. Adapting Grounded Theory to Construct a Taxonomy of Affect in Collaborative Online Chat. SIGDOC 2012.

[3] K. Kuksenok, M. Brooks, J. J. Robinson, D. Perry, M. K. Torkildson, and C. Aragon. Automating Large-Scale Annotation for Analysis of Social Media Content. Poster at the 2nd Workshop on Interactive Visual Text Analytics, IEEE VisWeek 2012.