Data Science Workshop, 2015

Mentors, Observers, Ethnographers & Organizers

Anthony Arendt
Ginger Armbrust
Magda Balazinska
Chaitanya Baru
Dave Beck
Rahul Biswas
Alvin Cheung
Andy Connolly
Oren Etzioni
Rob Fatland
Brittany Fiore-Gartland
Will Gagne-Maynard
Jeff Gardner
Andrew Gartland
Joseph Hellerstein
Tim Hesterberg
Laura Norén
Earnestine Psalmonds
Raghu Ramakrishnan
Renata Rawlings-Goss
Darwin Schweitzer
Valentina Staneva
Alejandro Suarez
Anissa Tanweer
Kristin Tolle
David Williams
Jennifer Worrell
Fen Zhao

Anthony Arendt


Polar Science Center
University of Washington

Mentor

Anthony Arendt is a Senior Research Scientist with the Polar Science Center at the Applied Physics Laboratory, where he conducts research on the response of glaciers and ice sheets to changing climate. Anthony joined the eScience Institute in June 2015 and provides expertise in relational databases, geospatial data analytics, and development of lightweight cloud computing solutions for scientific research. Anthony is researching new ways to integrate satellite observations with large scale hydrological models. He is developing APIs to interface with these models and provide web-accessible decision support tools to environmental stakeholders. Anthony was previously a Research Professor at the University of Alaska Fairbanks, where he conducted field studies of Alaska glaciers and taught classes in Geographic Information Systems. He has worked at NASA’s Goddard Space Flight Center and was a Visiting Researcher at Microsoft Research.

Ginger Armbrust


Oceanography
University of Washington

Organizer

E. Virginia Armbrust is a Professor in the School of Oceanography at the University of Washington and Director of the School of Oceanography. She received her A.B. from Stanford University in 1980 and her PhD from Massachusetts Institute of Technology and Woods Hole Oceanographic Institution in 1990. She carried out postdoctoral research training at Washington University before joining the faculty at the University of Washington in 1996. Dr. Armbrust’s research focuses on marine phytoplankton, particularly marine diatoms, which are responsible for about 20% of global photosynthesis. She has pioneered the use of environmental genomics and transcriptomics, combined with metabolomics, to understand how natural diatom communities are shaped by the environment and by their interactions with other microbes. Most recently, she has identified chemical signals that form the basis of cross-kingdom communication. Her group developed ship-board instrumentation that now permits the fine-scale continuous mapping of distributions, growth rates and loss rates of different groups of phytoplankton. Armbrust is a Fellow of the American Academy of Microbiology and the American Association for the Advancement of Science, and a member of the Washington State Academy of Science.

Magda Balazinska


Computer Science & Engineering
University of Washington

Organizer

Magdalena Balazinska is an Associate Professor in the department of Computer Science and Engineering at the University of Washington and the Jean Loup Baer Professor of Computer Science and Engineering. She’s the director of the IGERT PhD Program in Big Data and Data Science. She’s also a Senior Data Science Fellow of the University of Washington eScience Institute. Magdalena’s research interests are in the field of database management systems. Her current research focuses on big data management, scientific data management, and cloud computing. Magdalena holds a Ph.D. from the Massachusetts Institute of Technology (2006). She is a Microsoft Research New Faculty Fellow (2007), received an NSF CAREER Award (2009), a 10-year most influential paper award (2010), an HP Labs Research Innovation Award (2009 and 2010), a Rogel Faculty Support Award (2006), a Microsoft Research Graduate Fellowship (2003-2005), and multiple best-paper awards.

Chaitanya Baru


Computer and Information Science and Engineering
National Science Foundation

Observer

Chaitan Baru is Senior Advisor for Data Science in the Computer and Information Science and Engineering Directorate at the National Science Foundation. He also co-chairs the federal inter-agency Big Data Senior Steering Group, consisting of 18 federal R&D agencies, which was formed to help coordinate Big Data R&D activities across agencies, as part of the White House Big Data R&D Initiative launched in 2012. Baru leads the BIGDATA research program at NSF and is also involved in the Big Data Regional Innovation Hubs, a new NSF initiative. He is on assignment from the San Diego Supercomputer Center, UC San Diego, where he is a Distinguished Scientist and Associate Director of Data Initiatives. He is interested in applied and applications-oriented research in scientific data management, big data, and database systems. Information about his research activities is available at http://acid.sdsc.edu/users/chaitan-baru.

Dave Beck


Chemical Engineering
University of Washington

Organizer

David Beck is an Assistant Research Professor in the department of Chemical Engineering at the University of Washington (UW). He is a participating faculty member in the IGERT PhD Program in Big Data & Data Science and a Senior Data Science Fellow of the eScience Institute at UW. Dr. Beck holds a Ph.D. from UW (2006) in Biomedical Structure & Design / Medicinal Chemistry and a BS in Computer Science from Drexel University. His research interests center on using computing & Data Science methods in systems, synthetic and structural biology for problems in biogeochemical cycling, wastewater + environment, the microbiome of built environments, and human health.

Rahul Biswas


Astronomy
University of Washington

Mentor

My main scientific interests are in the area of cosmology, concentrated around quantifying the phenomenology of the late-time accelerated expansion of the universe, often described in terms of properties of “dark energy,” the substance assumed to drive this acceleration. Mostly, I use observations of Type Ia supernovae from large astronomical surveys for studying cosmology, or work on large scale structure studies that are most relevant for clusters of galaxies. Two of the key challenges for cosmological studies of this type are: (a) the data tend to be large and low signal-to-noise, and (b) knowledge of the astrophysical objects used to draw cosmological inferences is incomplete (these objects are obviously rich research frontiers in themselves). To address such issues, my research involves methods of drawing statistical inferences from such datasets, improving the models of astrophysical objects used in such inferences by using simulations or data, and accounting for the observational system. At the University of Washington, I am closely associated with the upcoming LSST project, where a current focus is building OSS tools for the survey. I am also working on analyses to forecast the performance of components of LSST for particular survey strategies as a step towards optimizing the survey strategy for scientific output.

Alvin Cheung


Computer Science & Engineering
University of Washington

Mentor

I am an assistant professor in the Department of Computer Science & Engineering at the University of Washington, affiliated with the database and programming systems research groups. I am also an affiliate member of the eScience Institute. My research interests include program analysis, improving database application performance, and building big systems in general. Some current research themes include: query processing across heterogeneous systems, helping end users build database applications, and improving application performance across heterogeneous architectures. Before that, I was a graduate student in the MIT database group and the computer-aided programming group, working with Professors Sam Madden and Armando Solar-Lezama. I worked on tools that make use of programming language techniques to improve application performance.

Andy Connolly


Astronomy
University of Washington

Organizer

Professor Andy Connolly studies cosmology and the formation of structure within our universe using large astronomical surveys such as the Sloan Digital Sky Survey and the Large Synoptic Survey Telescope (LSST). He currently works on helping develop algorithms and software for analyzing the petabytes of imaging data that will come out of the LSST. He also runs the LSST simulation group which generates high-fidelity simulations of the universe that are used to scale the scientific algorithms that will be used in the LSST. Beyond his scientific research he is interested in using technology to increase access to scientific data and to improve the educational experiences of students.

Oren Etzioni



Allen Institute for Artificial Intelligence

Keynote Speaker

Dr. Oren Etzioni is Chief Executive Officer of the Allen Institute for Artificial Intelligence. He has been a Professor at the University of Washington’s Computer Science department since 1991, receiving several awards including Seattle’s Geek of the Year (2013), the Robert Engelmore Memorial Award (2007), the IJCAI Distinguished Paper Award (2005), AAAI Fellow (2003), and a National Young Investigator Award (1993). He was also the founder or co-founder of several companies including Farecast (sold to Microsoft in 2008) and Decide (sold to eBay in 2013), and the author of over 100 technical papers that have garnered over 23,000 citations. The goal of Oren’s research is to solve fundamental problems in AI, particularly the automatic learning of knowledge from text. Oren received his Ph.D. from Carnegie Mellon University in 1991, and his B.A. from Harvard in 1986.

Rob Fatland


Microsoft Research
Microsoft

Mentor

Dr. Rob Fatland works as a senior research program manager at Microsoft Research, specifically on the use of both cloud and traditional technology in service to environmental science. His work currently includes adaptation of cloud + Maker/IOT technology to marine science as well as data sharing to enhance global carbon cycle research. He is actively working with the data science community to articulate best practices in ‘long tail’ lightweight data management. Dr. Fatland received a B.S. in Physics from the California Institute of Technology in 1987 and a Ph.D. in Geophysics from the University of Alaska Fairbanks in 1998. He worked for 6 years at NASA-JPL in radar remote sensing. More recent work at Vexcel Corporation and Microsoft has included 4+D data visualization, deploying sensor networks in harsh environments and geospatial information challenges from remote sensing. In his spare time he engages in freelance STEAM outreach.

Brittany Fiore-Gartland


Human-Centered Design and Engineering
University of Washington

Ethnographer

Brittany Fiore-Gartland is a Postdoctoral Fellow in the eScience Institute and the Department of Human-Centered Design and Engineering. Her research is concerned with the social and organizational dimensions of the data-intensive transformations occurring across many sectors of work. This research agenda includes studying how communities make sense of and value data and what is organizationally required to support emerging data intensive practices and collaborations. She is part of a team of researchers at UW, NYU, and UC Berkeley conducting ethnographic research on the Data Science Environment funded by the Gordon & Betty Moore and Alfred P. Sloan Foundations. She leads a team of researchers with Cecilia Aragon in the Human-Centered Data Science Lab to understand the cultural changes that are reshaping how data science work is accomplished and the implications for institutions supporting this work. As part of her ethnographic practice, she works with communities to bridge communication gaps and develop value-informed and adaptive organizational practice. She holds a Ph.D. in Communication from the University of Washington and an M.A. in Anthropology from Columbia University.

Will Gagne-Maynard


Oceanography
University of Washington

Mentor

Will works with the River Systems Research group in the School of Oceanography at UW. His research focuses on applying various -omics techniques to tropical river systems and helping to integrate results into basin-wide models.

Jeff Gardner



Google

Mentor

Jeff Gardner has a BA in Physics from Cornell University and PhD in Astronomy from UW. For his thesis, he simulated large chunks of the Universe on massively parallel supercomputers in order to study the formation of structure and the evolution of galaxies. He then moved out of pure astrophysics into computational science, working as a Research Scientist for the Pittsburgh Supercomputing Center at Carnegie Mellon. He came back to UW in 2008 as a Senior Research Scientist in Physics with an Affiliate Research Assistant Professor appointment in Astronomy. Jeff’s research focused on computational and data science as applied to the physical sciences. He worked with a number of groups in Physics, Astronomy, and Computer Science. In 2010 he moved to the Office of Research to become Assistant Director for the then-nascent eScience Institute. He is now a software engineer at Google where he works on cloud computing products that can impact scientific research.

Andrew Gartland


Vaccine and Infectious Disease Division
Fred Hutchinson Cancer Research Center

Mentor

Dr. Andrew Fiore-Gartland earned his Ph.D. studying the neural mechanisms that underlie visual processing in the retina. In his research he used computational models of molecular and cellular processes to analyze and interpret the data generated by his experiments. Upon completing his degree he sought a field in which he could be more connected to the application and social impact of his work. As a Post-Doctoral fellow in Peter Gilbert’s group he shifted his focus to the immune system and has been applying his statistical and computational skills towards understanding the mechanisms of novel HIV, TB, influenza and dengue vaccines, including extensive analysis of the immune data generated by the RV144 HIV vaccine trial. He has developed statistical methods for detecting evidence of T-cell induced viral escape in the genetic sequences of “breakthrough” viruses in vaccine trials. Now as a Staff Scientist he continues his work at the HIV Vaccine Trials Network adding capacity to incorporate novel assays into cohesive statistical analyses. He is also interested in the determinants of T-cell immunodominance and in understanding the heterogeneity of the immune response to vaccination with the specific goal of identifying innate correlates of adaptive immunogenicity and immune correlates of protection. Andrew also recently joined the eScience Institute at the University of Washington where he is focusing on the development of open-source software for computational biology research and on the training of data scientists at the University.

Joseph Hellerstein


eScience Institute
University of Washington

Mentor

Joseph L. Hellerstein is Senior Data Science Fellow in the eScience Institute and Affiliate Professor of Computer Science, both at the University of Washington, Seattle, Washington. Previously, Dr. Hellerstein managed the Computational Discovery Department at Google (2008-2014), was a Principal Architect at Microsoft Corp. in Redmond, WA (2006 to 2008), and founded/directed the Adaptive Systems Department at the IBM Thomas J. Watson Research Center in Hawthorne, NY (1984 to 2006). Dr. Hellerstein received the PhD in computer science from the University of California at Los Angeles. He has published approximately 200 peer-reviewed papers, 30 patents, and two books. He has taught at Columbia University and the University of Washington, and has served on numerous program committees and government advisory panels. Dr. Hellerstein is a recipient of the 2007 IEEE/IFIP Stokesberry Award, and is a Fellow of the IEEE.

Tim Hesterberg



Google

Mentor

Dr. Tim Hesterberg is a Senior Quantitative Analyst at Google. He previously worked at Insightful (S-PLUS), Franklin & Marshall College, and Pacific Gas & Electric Co. He received his Ph.D. in Statistics from Stanford University, under Brad Efron. He is author of the “Resample” package for R, Chihara and Hesterberg “Mathematical Statistics with Resampling and R” (2011), and “What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum”, arXiv, 2014. http://arxiv.org/abs/1411.5279

Laura Norén


Center for Data Science
New York University

Ethnographer

Laura Norén is a Moore/Sloan Postdoctoral Associate at the Center for Data Science at New York University where she also holds adjunct professorships in the Stern School of Business and the Department of Media, Culture and Communication. Norén’s research focuses on collaboration and sociotechnical systems within organizations. She is currently studying data scientists and data analysts across a range of workplaces. Her dissertation addressed creative collaboration among graphic designers, architects, electric vehicle engineers, and fine dining kitchen staff. She has also published on collaboration within communities across a range of technological/infrastructural settings including food bloggers and taxi drivers.

Earnestine Psalmonds


Division of Graduate Education
National Science Foundation

Observer

Earnestine Psalmonds Easter is a program director in the Division of Graduate Education, Directorate for Education and Human Resources, National Science Foundation. Her current responsibilities include serving as program director for the EHR Workforce Development Core Research, Historically Black Colleges and Universities Undergraduate Programs broadening participation research track, and co-coordinator for the Research Experiences for Undergraduates (REU) program. She is the lead program officer for the new directorate-wide STEM Professional Workforce Development Core area. She represents the NSF on the Office of Science and Technology Policy interagency working group on broadening participation. As senior program officer and visiting scholar in the Policy and Global Affairs Division, National Academies, she served as study director for two National Academies reports, including a congressionally mandated study focused on the underrepresentation of minorities in science and engineering. She has held numerous administrative positions in higher education and served on boards of directors for state and national organizations. She received her baccalaureate and master’s degrees from Tuskegee University and her Ph.D. from Georgia State University.

Raghu Ramakrishnan



Microsoft

Keynote Speaker

Dr. Raghu Ramakrishnan is a researcher in the areas of database and information management. He is a Technical Fellow at Microsoft. He has been a Vice President and Research Fellow for Yahoo! Inc., and a Professor of Computer Sciences at the University of Wisconsin–Madison. Ramakrishnan received a bachelor’s degree from IIT Madras in 1983, and a Ph.D. from the University of Texas at Austin in 1987. He has been selected as a Fellow of the ACM and a Packard fellow, and has done pioneering research in the areas of deductive databases, data mining, exploratory data analysis, data privacy, and web-scale data integration. The focus of his work in 2007 was community-based information management. With Johannes Gehrke, he authored the popular textbook Database Management Systems, also known as the “Cow Book”.

Renata Rawlings-Goss


AAAS S&T Policy Fellow
National Science Foundation

Observer

Dr. Renata Rawlings-Goss is a Big Data AAAS science policy fellow at the National Science Foundation working on Big Data policies and priority goals. She sits on the NITRD inter-agency Big Data Senior Steering Group with partners from NSF, NIH, DARPA, DOE, NIST, NOAA, NASA, USGS, DHS, DOD, and other agencies. Additionally, Renata participates in the implementation of NSF priority goals for increased activity and workforce in data science. She interacts with industry partners and the White House Office of Science and Technology Policy in the formation of public-private partnerships around big data, data science and the “Internet of Things”. Dr. Rawlings-Goss is a biophysicist who completed her doctoral work at the University of Michigan-Ann Arbor. She then worked with the Center for Computational Medicine, where she developed new predictive statistics for patient-monitored diabetes. Subsequently, she became a Penn-Port fellow in the department of genetics at the University of Pennsylvania, where her research interests included data-driven analysis of genetic/expression variation among worldwide populations for diseases such as cancer.

Darwin Schweitzer


Algorithms and Data Science Group
Microsoft

Mentor

Darwin is a Senior Program Manager at Microsoft focused on Data Science education where he is part of the Algorithms and Data Science group in Information Management and Machine Learning. His data experience has been gained through a number of diverse roles at Microsoft as well as at other technology companies like IBM and Business Objects. Roles have included instructor, practitioner, data architect, technical lead, consultant, and teaching assistant, and he has worked for companies in a variety of industries (technology, healthcare, financial services, insurance, pharmaceuticals, travel, education, non-profit, and utilities) including local PacWest organizations like the UW, WaMu, Expedia, and SnoPUD. The one commonality in his career has been Data and Education. Darwin is an aspiring Data Scientist and dedicated lifelong learner who contributes to continuing education as a Cloud Data Management & Analytics instructor at the University of Washington and as a teaching assistant at Henry M. Jackson High School in Mill Creek, WA, where he helps students learn Java and prepare for the AP Computer Science exam (http://tealsk12.org). In his spare time he likes to travel, hike, read (technology or books about US Presidents), listen to Blues and Jazz, and enjoy an occasional round of golf. Darwin hopes to help drive Data Science education, increase the broad adoption of Data Science products and services, and build a Data Science community.

Valentina Staneva


eScience Institute
University of Washington

Mentor

Valentina Staneva started as a data scientist at the eScience Institute in March 2015. Prior to joining the University of Washington, she was a PhD student in the Applied Mathematics & Statistics Department at Johns Hopkins University. Her research was with the Center for Imaging Science and was devoted to developing methods for tracking deforming objects in videos and statistical estimation of their dynamics. Valentina has a Bachelor’s degree in Mathematics from Concord University, and between her undergraduate and graduate studies she spent 1.5 years working at Los Alamos National Laboratory on problems in imaging, optimization and compressed sensing. She has broad interests in extracting information from different types of data, and building tools for it.

Alejandro Suarez


AAAS S&T Policy Fellow
National Science Foundation

Observer

Alejandro is a computational physicist with a background in surface science and the chemical modification of graphene. After obtaining his B.S. in applied physics from Rensselaer Polytechnic Institute (RPI), he obtained a Bunton-Waller fellowship to attend Penn State University (PSU) for his graduate studies. While at PSU, Alejandro studied the electronic properties of graphene, a two-dimensional allotrope of carbon with novel physical and mechanical properties. By better understanding how graphene could be chemically modified, Alejandro helped advance research for incorporating graphene into modern electronic devices. While at PSU, Alejandro was also awarded an NSF GK-12 fellowship. As part of the fellowship, Alejandro worked closely with a sixth grade educator in Harrisburg, PA to enhance the scientific curriculum of the classroom and develop skills in communicating science to diverse audiences. After obtaining his Ph.D., Alejandro received an NRC postdoctoral associateship with the U.S. Naval Research Laboratory to study the effect of substrates on the chemical reactivity of graphene. Alejandro is currently a AAAS S&T fellow in the Computer and Information Sciences and Engineering (CISE) directorate at NSF, where he plans to study the use and future impact of data science and software development in scientific research.

Anissa Tanweer


Communication
University of Washington

Ethnographer

Anissa Tanweer is a PhD student in the Department of Communication at the University of Washington and a research assistant in the Human-Centered Data Science Lab. She is broadly interested in social dimensions of information and communication technologies, and her work focuses on the role of big data and data science in the production of knowledge and the formation of policy. She works with a team of researchers at UW, led by Professor Cecilia Aragon, who are conducting ethnographic research on the Data Science Environment funded by the Gordon & Betty Moore and Alfred P. Sloan Foundations.

Kristin Tolle


Microsoft Research
Microsoft

Mentor

Kristin M. Tolle is the Director of the Data Science Initiative in Microsoft Research Outreach, Redmond, WA. Since joining Microsoft in 2000, Dr. Tolle has acquired numerous patents and worked for several product teams including the Natural Language Group, Visual Studio, and the Microsoft Office Excel Team. Since joining Microsoft Research’s outreach program in 2006, she has run several major initiatives from Biomedical computing and environmental science to more traditional computer and information science programs around natural user interactions and data curation. She also directed the development of the Microsoft Translator Hub and the Environmental Science Services Toolkit. She is also one of the editors and authors of one of the earliest books on data science, The Fourth Paradigm: Data Intensive Scientific Discovery. Her current focus is developing an outreach program to engage with academics on data science in general and more specifically around using data to create meaningful and useful user experiences across devices and platforms. Prior to joining Microsoft, Tolle was an Oak Ridge Science and Engineering Research Fellow for the National Library of Medicine and a Research Associate at the University of Arizona Artificial Intelligence Lab managing the group on medical information retrieval and natural language processing. She earned her Ph.D. in Management of Information Systems with a minor in Computational Linguistics. Dr. Tolle’s present research interests include global public health as related to climate change, mobile computing to enable field scientists and inform the public, sensors used to gather ecological and environmental data, and integration and interoperability of large heterogeneous environmental data sources. She collaborates with several major research groups in Microsoft Research including eScience, computational science laboratory, computational ecology and environmental science, and the sensing and energy research group.

David Williams


Biology
University of Washington

Mentor

I work on understanding the control of biological motion, scaling from the molecular dynamics of individual motor molecules to the kinematics of animal movement. I use molecular models of muscle that incorporate interesting new protein movement mechanisms to understand force regulation in single sarcomeres, the sub-cellular unit that shortens when muscle contracts. These models show us how muscle’s geometric properties control the force it generates. I validate these models with X-ray imaging and, as a WRF fellow at the University of Washington, am developing software tools to automate and introduce objectivity into this technique.

Jennifer Worrell


Computer Science & Engineering
University of Washington

Organizer

Jennifer Worrell is the Program Manager for the Data Science IGERT Program and the Database Group in the UW Computer Science and Engineering department. She has a background in psychology and anthropology, and has spent the last ten years managing multi-million dollar awards for various groups within the department.

Fen Zhao


Strategic Innovation
National Science Foundation

Observer

Fen Zhao is a Staff Associate, Strategic Innovation. Dr. Zhao focuses on building public-private partnerships around CISE’s Big Data, next generation internet, and cybersecurity R&D portfolios. Prior to joining CISE OAD, Dr. Zhao was a AAAS Fellow at the White House Office of Science and Technology Policy working on national security S&T issues. Before her work in the public sector, Dr. Zhao was an associate with McKinsey and Company’s Risk Management Practice, serving public sector clients in the mortgage and debt markets. Fen received a PhD in Applied Physics from Stanford University and her BS from MIT. Her doctoral research was conducted at the Kavli Institute for Particle Astrophysics and Cosmology at SLAC National Accelerator Laboratory, where she developed supercomputing astrophysical simulations of magnetic fields within the early universe.