Videos of professionals at work are a rich data source for microanalysis research techniques such as ethnomethodology, conversation analysis, and interaction analysis. These techniques are manually intensive: researchers may spend hours analyzing just a few minutes of interaction. However, such datasets are extremely rare.
In February 2014 my research team used nine GoPro cameras and six high-quality Zoom H2n audio recorders running concurrently to collect a 6-terabyte dataset of video, audio, photos, and screen capture of professional software developers at work in a highly collaborative organization. These 11 days of data include video (380 hours), audio, photographs (thousands), time-lapse images, screen capture (292 hours), field notes, and interviews. The dataset supports a wide range of research questions, a variety of analysis methods, and different units of analysis (e.g., individual, pair, location, task). The richness of the data provides opportunities for microanalysis research techniques such as ethnomethodology, conversation analysis, and interaction analysis, which are manually intensive activities in which researchers may spend hours analyzing just a few minutes of interaction. The dataset also provides opportunities for data mining, such as algorithms that identify key events of interest for researchers doing interaction analysis. For more details about the dataset, see the following technical report (which is periodically updated with new information):
Socha, D. (2015). BeamCoffer Dataset: Professional Software Developers Collaborating in the Wild. University of Washington Bothell CSS Technical Report, TR-09-15. Bothell, WA.
This dataset presents several challenges. It is large (multiple terabytes). It is multi-modal (e.g., video, audio, photos). It is multi-channel (multiple recorders running concurrently). Its data is largely unstructured, contextually rich, and qualitative. These qualities make the dataset difficult to understand, search, and analyze, which in turn provides an opportunity to build software tools to aid these tasks.
These same challenges also provide an opportunity, since we have subject consent and IRB permission to share this data with other researchers who agree to abide by our human subjects agreement. By intentionally “over collecting” we have enough data for communities of researchers to analyze this dataset, individually or collectively, from a variety of disciplinary and theoretical perspectives, as is done in workshops such as the 2014 Design Thinking Research Symposium.
If you wish to access the BeamCoffer dataset for your own research, email David Socha.
Publications directly related to or based upon data from the BeamCoffer dataset:
- Roth, W.-M. (in press). The gap between instruction (plan) and situated action—A challenge to semiotics? Semiotica.
- Roth, W.-M., & Jornet, A. (2017). From Object-Oriented to Fluid Ontology: a Case Study of the Materiality of Design Work in Agile Software Development. Computer Supported Cooperative Work (CSCW). http://doi.org/10.1007/s10606-017-9297-6
- Arisoy, A. (2016). Exploratory Wayfinding in Wide Field Ethnography (Master’s thesis).
- Ramachandra, S., Chen, M., & Socha, D. (2016). Laughter Detection Using Data Mining and Human Feedback. Proceedings of the 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS 2016).
- Socha, D., Adams, R., Franznick, K., Roth, W.-M., Sullivan, K., Tenenberg, J., & Walter, S. (2016). Wide-Field Ethnography: Studying Software Engineering in 2025 and Beyond. Proceedings of the 38th International Conference on Software Engineering (ICSE).
- Socha, D., Jornet, A., Adams, R. (2016). Wide Field Ethnography and Exploratory Analysis of Large Ethnographic Datasets. CSCW 2016 Workshop: Developing a Research Agenda for Human-Centered Data Science.
- Socha, D. (2015). BeamCoffer Dataset: Professional Software Developers Collaborating in the Wild. University of Washington Bothell CSS Technical Report, TR-09-15. Bothell, WA.
- Socha, D., & Sutanto, K. (2015). The “Pair” as a Problematic Unit of Analysis for Pair Programming. Proceedings of the 2015 International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE ’15). ACM.
- Socha, D., & Tenenberg, J. (2015). Sketching and Conceptions of Software Design. Proceedings of the 2015 International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE ’15). ACM.
- Tenenberg, J., Roth, W.-M., & Socha, D. (2015). From I-Awareness to We-Awareness in CSCW. Computer Supported Cooperative Work (CSCW).
- Socha, D., Frever, T., & Zhang, C. (2015). Using a Large Whiteboard Wall to Support Software Development Teams. Proceedings of the 48th Annual Hawaii International Conference on System Sciences.
The following two publications foreshadowed the BeamCoffer dataset and described initial results from our first data collection in that organization:
- Socha, D., & Tenenberg, J. (2013). Sketching Software in the Wild. 35th International Conference on Software Engineering (ICSE 2013). San Francisco, USA. Our plan to collect a dataset like BeamCoffer.
- Socha, D., & Tenenberg, J. (2013). Navigating Constraints: The Design Work of Professional Software Developers. ACM SIGCHI Conference on Human Factors in Computing Systems. Paris, France. (and poster). Analysis of initial results from the organization from which the BeamCoffer dataset was later collected.