BeamCoffer dataset

Videos of professionals at work are a rich data source for microanalysis research techniques such as ethnomethodology, conversation analysis, and interaction analysis. These techniques are manually intensive: researchers may spend hours analyzing just a few minutes of interaction. However, such datasets are extremely rare.

In February 2014 my research team used nine GoPro cameras and six high-quality Zoom H2n audio recorders running concurrently to collect a 6-terabyte dataset of video, audio, photos, and screen capture of professional software developers at work in a highly collaborative organization. These 11 days of data include video (380 hours), audio, photographs (thousands), time-lapse images, screen capture (292 hours), field notes, and interviews. The dataset supports a wide range of research questions, a variety of analysis methods, and different units of analysis (e.g., individual, pair, location, task). The richness of the data provides opportunities for microanalysis research techniques such as ethnomethodology, conversation analysis, and interaction analysis, which are manually intensive activities in which researchers may spend hours analyzing just a few minutes of interaction. The dataset also provides opportunities for data mining algorithms, such as algorithms that identify key events of interest for researchers doing interaction analysis. For more details about the dataset, see the following technical report (which is periodically updated with new information):

Socha, D. (2015). BeamCoffer Dataset: Professional Software Developers Collaborating in the Wild. University of Washington Bothell CSS Technical Report, TR-09-15. Bothell, WA.

This dataset presents several challenges. It is large (multiple terabytes). It is multi-modal (e.g., video, audio, photos). It is multi-channel (multiple recorders running concurrently). Its data is largely unstructured, contextually rich, and qualitative. These qualities make the dataset difficult to understand, search, and analyze, which in turn provides an opportunity to build software tools to aid these tasks.

These same challenges also provide an opportunity, since we have subject consent and IRB permission to share this data with other researchers who agree to abide by our human subjects agreement. By intentionally “over collecting” we have enough data for communities of researchers to individually or collectively analyze this dataset from a variety of disciplinary and theoretical perspectives, as was done in workshops like the 2014 Design Thinking Research Symposium.

If you wish to access the BeamCoffer dataset for your own research, email David Socha.

Publications directly related to or based upon data from the BeamCoffer dataset:

The following two publications foreshadowed the BeamCoffer dataset and described initial results from our first data collection in that organization:
