Human-Centered Data Science Lab

Lab Director Cecilia Aragon Receives Award for Data Management Project

Posted by Daniel Perry on November 11, 2014
News

SCC Lab Director and HCDE Professor Cecilia Aragon and collaborators at Lawrence Berkeley National Lab recently received a $300K award from the US Department of Energy to research and develop tools for improving scientific workflows and the use of massive supercomputers. The project, Usable Data Abstractions for Next-Generation Scientific Workflows, will use user-centered design processes and an ethnographic approach to study the hardware and software tools used to harness large-scale scientific data. The project will initially focus on the climate sciences and combustion physics.

A further description of the project can be found on the Lawrence Berkeley National Lab website.

Highlights of the First Annual CHI Play Conference

Posted by Daniel Perry on November 11, 2014
Human-Centered Data Science Lab (HDSL) Blog

The Participatory Design Workshop at CHI Play

Posted by PhD student Daniel Perry

In late October I attended the first annual conference on Computer-Human Interaction in Play (CHI Play 2014) in Toronto. The conference brought together some 150 researchers, academics, and game designers across numerous areas of serious game research and HCI. Sessions at the conference covered a variety of topic areas, including collaboration and communication in serious games, games for health, gamification and education, and game analytics. Research I conducted on co-design with high school youth for the bioinformatics educational game MAX5 was one of several accepted abstracts in a workshop titled Participatory Design for Serious Game Design.

The Participatory Design for Serious Game Design workshop marked one of the highlights of the conference for me, as it brought together participants from all over the world (Malta, Belgium, Germany, Taiwan, Canada, and the U.S., to name just a few) to explore the philosophical and methodological challenges of integrating participatory design and serious games. Participatory design (PD) originated in Scandinavia some forty years ago as a way to empower workers and involve them more directly in the software design process. PD techniques have been adapted and used in a variety of fields and contexts with communities ranging from toddlers to the elderly. While PD often works hand-in-hand with user-centered design approaches, PD’s focus on directly integrating participant design concepts takes a more democratic stance on the design process. In the workshop, we worked in small groups mapping the field of participatory design as we saw it in our own research on games. Topics that emerged included: the use of PD to design games, as well as the use of games as a methodological way to design other types of software systems; the challenge of conveying domain knowledge to participants for learning games (in my own research, this involved getting high school youth up to speed with biology and computer science topics in our game); and deciding what design outcomes should be integrated into the final game. While no easy answers came out of the workshop, the importance of transparency in design and of providing a tangible sense of participant contribution came up as issues to address. It was exciting to feel that in many ways we were putting forth a new global agenda for the future of PD in serious games research.

Another highlight of the conference was a keynote by Mike Ambinder, an experimental psychologist at Valve Software (the Seattle-area company behind a host of game favorites including Portal and Left 4 Dead). In his talk, Mike discussed the current state of the art in gathering game data, and the frequent biases and challenges inherent in the process. He encouraged the audience to imagine the tools and methods that would fill in the data gaps in an ideal research world. I was left with the impression that if a game powerhouse like Valve is facing a daunting data landscape, there is much to gain from further discussions between industry veterans and the academic research community. I’m looking forward to attending CHI Play next year.

Undercurrents at the DSE Summit

Posted by Cecilia Aragon on November 05, 2014
Uncategorized

The Data Science Environment (DSE) Summit took place in beautiful Monterey, CA at the Asilomar Conference Center. The Summit brought together over a hundred participants across three universities (UW, UC Berkeley and NYU) involved in the Moore and Sloan Foundations’ Data Science Environment grant.

As a data science ethnographer, I typically take on the role of participant-observer of various data science events, but at the DSE Summit I ended up being more of a participant than an observer. The high degree of participation made it challenging at times to listen as closely as I would have wanted to for underlying rhythms and patterns across the group. However, by participating in the discussion sessions and interactions, I identified some important undercurrents. I draw out these undercurrents into two main themes that I discuss in this post.

image of Monterey coastline

Photo credit: Kevin Koy

Imagining a Data Science Environment

I participated in particular sessions, including teaching and curriculum development, data science ethnography, and the ethnography and evaluation working group, which were all, to some degree, imagining a data science environment. Underlying these discussions were questions of where and what exactly is data science? Where is it located and when does it matter? What are the origins and the goals? These questions bubbled up in conversations imagining a curriculum for data science, career paths in data science, and the very structuring of a data science community within academia. As these various concepts were discussed and imagined, there was a fracturing and multiplication of perspectives around these questions that sparked a bit of confusion.

There was some clear discomfort with the uncertainty and messiness around these issues. Many people seemed to be craving concrete definitions and the move towards formalization of data science, while others seemed content to not know their specific position or where the ship was headed exactly, opening themselves up to being influenced by the experimentation yet to come. Allowing ourselves as a community to sit with the uncertainty and messiness while we try things out and wrap our heads around the implications is very much part of the goal of these five years. As an ethnographer, I am drawn to messiness, so I was feeling quite comfortable in these conversations.

I am going to attempt to disentangle a few strands of conversation in the unconference session on data science ethnography which focused on different approaches to data science. This discussion drew on a previous conversation about how different ways of approaching data science can emphasize a more individual-oriented view versus a community-oriented view. This conversation helped shift the focus from data science as residing within an individual to data science enacted at the community level.

This included thinking through the implications of different metaphors, such as T to pi (Π) to gamma (Γ) for characterizing the shape of data scientists of the future. The shape of a pi-shaped scientist implies there is an expectation of individuals having expert-level depth of knowledge in two domains, whereas a gamma-shaped scientist would have expert-level depth in one domain and be versed and proficient in another domain. The addition of other metaphors, such as gamma-shaped individuals, expands our imagination for what this data science environment might look like. It also has implications for how people may or may not identify as data scientists or as playing various roles within the data science community. This gets at the heart of the question, “What are we building?”

This question of “What are we building?” opens up other fractures in perspectives around how applied or theoretical data science is and “should” be. As the term data science has accumulated many meanings across industry and academia, there are distinctions many people wanted to make around data science in a research sense, data science in an applied sense, and data science in a professional sense. Was data science going to be its own discipline, its own department, or was it simply a new dimension of the work of all other domains? Is data science always applied? What would its body of theory look like? What are the political implications of these different imaginations of data science for issues like status and careers?

A related strand of conversation emerges around the question, “Where are we building it?” A strong current across the discussions imagined data science coming out of statistics and/or computer science, which alienated some people within the group who did not see it that way. Others wanted to frame data science as an integral part of all domain sciences in this data-intensive age. These different imaginations require a language and infrastructure we don’t yet have and must build. What would it mean to be neither and both of these in the institutional context of the university?

Another strand emerges around the characterization of how and when is data science. This strand of conversation was first dominated by talk about “data producers” and “data consumers”. This characterization implied ways that the work of data science was being divided up. But by the end of the conversation, these oversimplified categories fell short of describing a more complex ecosystem of data science. First, this is because there are those individuals and practices that embody both consumption and production. Second, these categories don’t encompass the mediator roles and mediation practices that are integral to the data science environment. These roles and practices involve the work of translating, connecting, and often innovating in the interstices.

The conversation around community-level data science and its relation to T, pi, and gamma appropriately ended with a move to focus on the “horizontal line”: the connections and intersections among these various disciplines, and the mediator roles and practices that support research translation. What emerged was that the conversation about the character and future of the “horizontal line” is perhaps as important as, or more important than, a conversation about the number of legs or their length (pi versus gamma). This focuses us on the translation work and the supporting infrastructures that function to forge and maintain connections across legs. Part of the important impact I think the ethnography group can have is in making visible the character of different horizontal lines, and in better understanding how they function and their implications for developing a data science environment.

Collaboration in Context

Throughout the Summit, collaboration was often referred to and leveraged as an abstract principle that reigned over all of the activities of the grant. People talked at high levels about the goal of collaboration mostly ungrounded in what, when, where, how, and why. The overbroad terms in which collaboration was talked about potentially obfuscated the many levels, layers, and conditions at which differently configured collaborations may occur. Collaboration is multiple things, and importantly, it is negotiated within a multitude of circumstances and values.

The spaghetti and marshmallow hack (the community building activity we did on Tuesday morning) aimed to have the group experientially engage with the inextricable relationship of collaboration and the performance of tasks at hand.

Marshmallow and spaghetti sculpture

Photo credit: Gina Neff

For example, in many cases there is a fragile balance between attending to the work of collaboration and getting things done. Collaboration does not exist within a vacuum without constraints or consequences. We hope that the goals of the collaboration are aligned with measures of performance, but this is not always the case. Further, as anyone who has ever collaborated knows, collaboration requires time and energy beyond the task itself. Yes, we want to incentivize more collaboration across domains and groups, but most importantly we want to learn about how we configure effective collaborations, what different roles are important, and how different forms of value can be strategically distributed across participants.

Collaboration as a goal may make a lot of sense at the level of an individual’s specific research question when this question requires multiple types of expertise to answer, but at other levels such as collaborating across institutions to support a data science environment, the value and the incentives of collaboration may be harder to assess and determine for individuals. What our three universities together with Moore/Sloan are trying to learn about and develop on an institutional level shifts everyone’s focus beyond the particular research at hand to the hard work of building the infrastructures and cultivating the relationships, cultural norms, and values that are necessary for supporting a thriving data science environment.

Interestingly, the concrete work of building collaboration (the how, where, when, and why, one might say) didn’t get discussed until the Wednesday morning unconference, when many had left, and many who were still there were “raptured” in important high-level meetings. The group that was left was made up of a mix of graduate students, postdocs and research scientists, but no faculty. There were about 12 of us discussing exactly how connection and communication would continue after the Summit. What infrastructure would support the exchange of ideas and the conversations we had begun to have here?

We discussed the role of a chat room and the needs of different working groups for sharing and connecting across the campuses. This is work that needs to happen to ground any kind of collaboration. This was what the 12 people left on the final morning thought was most important to discuss over any other data science topic. These people didn’t just discuss these questions; they generated ideas, innovated around these ideas, and executed these ideas. There is now an MS-DSE chat room set up and multiple ongoing conversations about how to connect and communicate within and across campuses. I felt inspired by this session as I got the sense that these interactions represented the movement taking root and beginning to grow!

SCCL Members Attend Data Science Environment Summit

Posted by Daniel Perry on November 01, 2014
News

The Data Science Environment (DSE) Summit took place in Monterey, California, October 5 – 8, and was attended by SCCL Director Cecilia Aragon and lab members Brittany Fiore-Silfvast (postdoctoral fellow) and PhD students Michael Brooks and Katie Kuksenok. The inaugural summit brought together over a hundred participants involved in the Moore and Sloan Foundations’ Data Science Environment grant across the three partner institutions: the UW, UC Berkeley, and NYU. Aragon is the PI for the Ethnography and Evaluation Working Group for the grant. Sessions at the summit attended by lab members included topics on curriculum development and education for data science and data science ethnography.
