Work Life Balance

Posted by Katie Kuksenok on October 13, 2014
Human-Centered Data Science Lab (HDSL) Blog / Comments Off on Work Life Balance

During the first meeting of the new quarter, each member of the lab talked about what they did this summer, be it a professional achievement or a personal one. We laughed together. We ate pizza and a root vegetable medley made by one of the students, continuing last year’s tradition of sharing food during meetings, which our overwhelming schedules forced into mealtimes. We applauded especially noteworthy remarks, such as: making a plan to graduate soon (2x), submitting a paper to the most recent Big Deal Conference deadline (4x), getting married (1x), and managing to have an actual honest-to-goodness vacation (3x). In our meetings over the last few years, we have allowed the unrelated to seep in, and I think it has improved both the variety and the caliber of our work. Instead of seeing these asides as distractions, we engaged with each other about a huge variety of research topics, as well as human topics.

In my own multi-year struggle with work-life balance (aka “four years of grad school”), I have found it useful to hold one core assumption. Even though I work on a million seemingly unrelated projects, they are necessarily and fundamentally related: they are mine, and they are built on the same body of knowledge. In this sense, every intellectually stimulating conversation that grabs my attention is, by definition, relevant. It is relevant to my perception of the world, and I take note of it. Incidentally, when I began to pursue this sense of “wholeness,” it helped ease the dreaded (and all-too-common) “impostor syndrome,” the haunting sense of being found out as far less competent than I appear. On the one hand, yes, for anything that I do, there are many people in the world who are much better at that thing than I am. But none of them are me; they do not have the combined idiosyncratic background I bring to the table: the whole has more creative variety to draw from than the sum of its parts. So I can feel both more secure in myself and relieved that there is always someone to save me from excruciating (and boring) intellectual solitude with advice, feedback, and debate.

“How did you get over anxiety when giving talks?” one of the students asked Cecilia in an aside in a meeting a few years ago. “Well, when you’ve flown a plane straight at the ground at 250 mph at an airshow with hundreds of thousands of people watching, it’s difficult to be too stressed out about other things.” Professor Aragon leads our lab, teaches classes, and occasionally shares what she learned from her time as an aerobatic champion. Instead of viewing “work life balance” as a separation between our “work” selves and our “life” selves, we are building empathy within the group, and sharing with one another our wonderful variety of experiences and lessons.

Oberlin Winter Term 2013

Posted by Katie Kuksenok on April 02, 2013
Human-Centered Data Science Lab (HDSL) Blog / No Comments

For the month of January, three Oberlin College undergraduates – Dan Barella, Sayer Rippey, and Eli Rose – joined SCCL to extend ALOE, our command-line tool for affect detection using machine learning. The Winter Term internship was initially conceived by Katie Kuksenok, one of the two Oberlin alumni in SCCL; the other, Michael Brooks, also helped mentor the students while they were on campus.


Each of the visiting Obies contributed a new piece of functionality and compared its performance to that reported in our CSCW paper: Dan implemented a novel segmentation algorithm, Sayer extended feature extraction to process French chat messages as well as English, and Eli worked on HMM classification. Back at Oberlin, Sayer continues to analyze the French portions of the dataset as an independent research project, collaborating over distance.

It has been an incredible month. Besides being blown away by the Seattle public transit system, I got to learn so much about machine learning, language, and grad school, and I got to meet a lot of smart, passionate, inspiring people.
The work I did applying the ALOE pipeline to French was completely fascinating. It was great because I got to be doing something very practical, trying to get the labeler working for the rest of the pipeline, but it also brought up some really interesting differences between French and English.

– Sayer Rippey

So, here I am at the end of Winter Term. I’m already nostalgic! This project was really enrapturing, and the whole experience thoroughly enjoyable. … I will say that, I’m proud of the work I’ve done. There are some places where I know there’s room for improvement, but to believe otherwise would perhaps be worse. I can’t claim that it’s all perfect, but I can claim that I did nearly all that I set out to do, and then some that I hadn’t expected to do. I didn’t expect I’d have to put together a profiling script to test my project, and yet this turned out to be one of the most invaluable tools I’ve had for code analysis (hopefully for others as well). I didn’t expect to find such a subtle tradeoff between a small tweaking of time variables, and yet this became a central issue of the last two weeks of my project. I didn’t think comparing pipeline statistics would be so nuanced, but now I’m beginning to see all the ways that a visualization can change the way people perceive information. I could go on, but what I’m really trying to say is: I learned so many new things!

But the most exciting parts of this Winter Term were not the components of my project. They were the incredible people at the SCCL, who brought me to lectures and talks on the nature of artificial intelligence and information visualization, who always provided novel viewpoints and provoking discussions, who were dedicated to sharing their unbelievable experience in so many topics. I was honored to work with Eli, Sayer, Katie, Michael, Megan, Cecilia, and the rest of this great team. They’ve humbled and challenged me, and for that I thank all of them; as this term comes to a close, I hope only that I should be so lucky in pursuit of future endeavors as I was in finding this one. So to everyone at the SCCL, so long, and thanks for all the fish!

– Dan Barella

Statistical Affect Detection in Collaborative Chat (CSCW 2013)

Posted by Katie Kuksenok on December 22, 2012
News / No Comments

SCCL work will be presented at the Computer-Supported Cooperative Work (CSCW) 2013 conference [1]. It is the result of work by a diverse, interdisciplinary group working to understand the role of affect expression in distributed scientific collaboration.


Our affect taxonomy contains 44 codes [2]

We have been working with a tremendous, rich dataset: half a million chat messages sent over four years by several dozen scientists, some in the US and some in France, while sharing a remote telescope. To understand the role of affect, we first developed a method for manually labeling the presence of affect in chat messages [2]. We are interested in high-granularity labels, such as “excitement” and “frustration.”

But there are many more chat messages (half a million!) than can reasonably be labeled manually, so we set out to automate the identification of affect expression; our CSCW 2013 paper gives a detailed description of our approach, including the trade-offs of various decisions in the machine learning pipeline [1]. The automation is not expected or intended to replace the human process of interpretation, but to provide an analytic lens that makes a large dataset accessible for analysis of the role of affect. Automated affect labels can enable large-scale analysis of social media data, including but not limited to chat logs [3].

Different codes benefit from different kinds of features [1]

Our experiments yielded three lessons for applying machine learning to affect detection in chat [1]:

  1. Use specialized features: augment word counts with features particular to the medium (in chat, after all, it pays to distinguish “what.” from “WHAAAAAAAT”) or to the context (such as acronyms or conversation participant names, where known)
  2. Different features benefit different codes: we were inclusive in adding features to the set, and trained a separate classifier for each code, because a feature’s effectiveness varies across codes (e.g., swear words and “frustration”)
  3. Use an interpretable classifier: reasoning about which features the classifier deemed important helped us improve the feature sets
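The three lessons above can be sketched in a few lines of Python. This is an illustrative sketch only, not ALOE’s actual implementation: the function names (`chat_features`, `score`), the specific features, and the per-code weights are all made up for the example. It shows chat-specific features layered on top of word counts, and a separate, trivially interpretable weighted scorer per code whose weights can be read off directly.

```python
import re
from collections import Counter

def chat_features(message):
    # Word counts as the base feature set.
    tokens = re.findall(r"[A-Za-z']+", message.lower())
    features = Counter(f"word={t}" for t in tokens)
    # Medium-specific features: "WHAAAAAAAT" should differ from "what."
    if re.search(r"([A-Za-z])\1{2,}", message):   # a letter repeated 3+ times
        features["has_elongation"] = 1
    letters = [c for c in message if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.8:
        features["mostly_caps"] = 1
    features["exclamations"] = message.count("!")
    return features

# One classifier per code; with linear weights, "interpretable" means you can
# simply inspect which features drive each code. (Weights here are invented.)
weights = {
    "frustration": {"word=argh": 2.0, "exclamations": 0.5, "mostly_caps": 0.8},
    "excitement":  {"word=wow": 2.0, "exclamations": 0.7, "has_elongation": 0.6},
}

def score(code, features):
    return sum(w * features.get(f, 0) for f, w in weights[code].items())

f = chat_features("WHAAAAAAAT!!")
```

In a real pipeline the weights would of course be learned per code from the labeled data rather than set by hand, which is exactly where lesson 3 pays off: learned weights of an interpretable model point at which features to add or drop.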

Our resulting pipeline is available on GitHub as a command-line tool, ALOE. In ongoing work, we are incorporating automation provided by ALOE into a web-based tool for large-scale analysis of social media data, TextPrizm [3].

[1] M. Brooks, K. Kuksenok, M. K. Torkildson, D. Perry, J. J. Robinson, T. J. Scott, O. Anicello, A. Zukowski, P. Harris, C. Aragon. Statistical Affect Detection in Collaborative Chat. CSCW 2013.

[2] T. J. Scott, K. Kuksenok, D. Perry, M. Brooks, O. Anicello, C. Aragon. Adapting Grounded Theory to Construct a Taxonomy of Affect in Collaborative Online Chat. SIGDOC 2012.

[3] K. Kuksenok, M. Brooks, J. J. Robinson, D. Perry, M. K. Torkildson, C. Aragon. Automating Large-Scale Annotation for Analysis of Social Media Content. Poster at the 2nd Workshop on Interactive Visual Text Analytics, IEEE VisWeek 2012.

Trends in Crowdsourcing

Posted by Katie Kuksenok on April 09, 2012
Human-Centered Data Science Lab (HDSL) Blog / No Comments

Recent years have seen the rise of crowdsourcing as an exciting new tool for getting things done. For many, it was a way to get tedious tasks done quickly, such as transcribing audio. For many others, it was a way to get data: labeled image data, transcription correction data, and so on. But there is also a layer of meta-inquiry: what constitutes crowdsourcing? Who is in the crowd, and why? What can they accomplish, and how might the software that supports crowdsourcing be designed to help them accomplish more?

Each of the last two conferences I attended, CSCW 2012 and UIST 2011, had a “crowdsourcing session” spanning a range of crowdsourcing-related research. Yet only a short while before that, the far bigger CHI conference contained only one or two “crowdsourcing papers.” So what happened in the last few years?

At some point in the last decade, crowdsourcing emerged both as a method for getting lots of tedious work done cheaply and as a field of inquiry that resonated with human-computer interaction researchers. Arguably, this point historically coincided with the unveiling of the Amazon Mechanical Turk platform, which allowed employers, or “requesters,” to list small, low-paid tasks, or “human intelligence tasks” (HITs), for anonymous online contractors, or “workers,” to complete. In Amazon’s words, this enabled “artificial artificial intelligence”: the capacity to cheaply get answers to questions that cannot be automated.


Colors in Visualization

Posted by Katie Kuksenok on April 03, 2012
Human-Centered Data Science Lab (HDSL) Blog / No Comments

In Russian, there is a name for light blue and a name for dark blue, but no single name for what English speakers call blue. Indeed, language shapes our understanding of color differences and categories.

Although it can be entertaining to think about how other factors, like gender, affect categorization of color, the survey deployed and analyzed by xkcd’s Randall Munroe showed that chromosomal gender mostly doesn’t matter (and that nobody can spell fuchsia).

Color is used widely in scientific visualization. The influence of language and culture on color perception affects the interpretation of such visualizations to some extent, but other measures can be taken to improve the use of color in visualization.
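One concrete measure of this kind is checking that a sequential palette varies monotonically in lightness, so that the ordering of values survives grayscale printing and most forms of color-vision deficiency. The sketch below, in plain Python, illustrates the check; the hex values are illustrative samples in the style of a perceptually uniform palette, not an official colormap, and the simple Rec. 709 weighted sum is only an approximation of perceived lightness.

```python
# Illustrative samples of a dark-to-light sequential palette.
palette = ["#440154", "#31688e", "#35b779", "#fde725"]

def luminance(hex_color):
    """Approximate luminance of an sRGB hex color using Rec. 709 weights."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

lums = [luminance(c) for c in palette]
# A sequential palette should get strictly lighter from low to high values.
is_sequential = all(a < b for a, b in zip(lums, lums[1:]))
```

The same check flags the classic rainbow colormap as problematic: its luminance rises and falls along the scale, so two very different data values can read as equally bright.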

Written by lab member, student, and deliciousness enthusiast Katie Kuksenok. Read more of her posts here on the SCCL blog, or on her own research blog, interactive everything.