Experiment Explorer: Connecting Distributed Team eScience Artifacts – Provenance and Traceability Research Group

About

Increasingly, eScience experiments incorporate multiple tools running in different environments, and these tools generate heterogeneous artifacts. Such experiments may require several fields of expertise, so the researchers on a team often have differing understandings of the experiment's data and processes. To improve understanding of distributed and varied experiment data in these settings, we have created the Experiment Explorer tool.

Experiment Explorer collects metadata for artifacts from multiple runs of a single experiment and connects them to an overview of that experiment. The intermediate data that contributed to these artifacts are produced in a loosely distributed computing environment (networked personal workstations). Dependency connections among these experiment artifacts are shared among contributors via a modified 3-tier client-server architecture: each workstation updates a central index with metadata about new experiment data items as they are produced.
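To make the reporting flow concrete, here is a minimal Python sketch of workstations pushing artifact metadata to a shared index and of grouping that metadata by experiment run. It is not the project's implementation: the `ArtifactMetadata` fields, the `CentralIndex` class, and the id scheme are hypothetical, and an in-memory dictionary stands in for the Apache Solr index.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArtifactMetadata:
    """Hypothetical metadata record a workstation reports for one artifact."""
    workstation: str
    path: str
    experiment: str
    run: str
    modified: datetime
    size_bytes: int

    @property
    def doc_id(self) -> str:
        # One entry per (workstation, path): re-reporting a modified file
        # replaces its earlier entry instead of adding a duplicate.
        return f"{self.workstation}:{self.path}"


class CentralIndex:
    """In-memory stand-in for the shared index (Apache Solr in the system described here)."""

    def __init__(self) -> None:
        self._docs: dict[str, ArtifactMetadata] = {}

    def update(self, record: ArtifactMetadata) -> None:
        # Insert a new entry or overwrite the existing one with the same id.
        self._docs[record.doc_id] = record

    def artifacts_for_run(self, experiment: str, run: str) -> list[ArtifactMetadata]:
        # Gather every artifact reported for one run, from any workstation.
        matches = [d for d in self._docs.values()
                   if d.experiment == experiment and d.run == run]
        return sorted(matches, key=lambda d: d.doc_id)


# Usage: two workstations report artifacts for the same run.
index = CentralIndex()
t = datetime(2012, 6, 1, tzinfo=timezone.utc)
index.update(ArtifactMetadata("ws-01", "/data/run1/input.csv", "exp-A", "run1", t, 1024))
index.update(ArtifactMetadata("ws-02", "/data/run1/result.xlsx", "exp-A", "run1", t, 2048))
# ws-01 modifies input.csv and reports it again.
index.update(ArtifactMetadata("ws-01", "/data/run1/input.csv", "exp-A", "run1", t, 4096))
print([d.doc_id for d in index.artifacts_for_run("exp-A", "run1")])
# → ['ws-01:/data/run1/input.csv', 'ws-02:/data/run1/result.xlsx']
```

Keying entries by workstation and path is one way to let repeated reports of a modified file update, rather than duplicate, the shared record.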

Several students developed components for Experiment Explorer from the bottom up: Brandon Authier, Thomas Helms, Andrew Hoke, Hyeon Hong, Michael Moseichuk, Jessica Oriondo, Joh Oso, Dorota Pasek, Jessica Pearson, Sadia Suhail, Michael Tran, and Tyler Wong. Thomas Helms later integrated these components into a server application with a traditional 3-tier client-server architecture, which includes a single Apache Solr instance running on the application server and a network file system on the data server to house experiment artifacts. The components include a tool that reports new or modified files within a monitored directory, a system that analyzes and indexes newly created or modified experiment files, a similar component that analyzes the relationship between a newly indexed file and previously indexed files, the GUI, the main application server connectors, and the data server configuration.
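The directory-monitoring component could be built in several ways, and the page does not say which was used. A minimal Python sketch of one approach, polling, takes periodic snapshots of a monitored directory and compares them, treating a path as modified when its modification time or size changes. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def snapshot(root: str) -> dict[str, tuple[float, int]]:
    """Map every file under root to its (modification time, size)."""
    state: dict[str, tuple[float, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # removed between listing and stat
            state[full] = (st.st_mtime, st.st_size)
    return state


def diff_snapshots(old: dict, new: dict) -> tuple[list[str], list[str]]:
    """Return (created, modified) paths, each sorted."""
    created = sorted(p for p in new if p not in old)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, modified


# Usage: one polling cycle over a temporary "monitored" directory.
with tempfile.TemporaryDirectory() as root:
    Path(root, "a.txt").write_text("v1")
    before = snapshot(root)
    Path(root, "a.txt").write_text("version 2")  # size changes, so it counts as modified
    Path(root, "b.txt").write_text("new")
    created, modified = diff_snapshots(before, snapshot(root))
    print([Path(p).name for p in created], [Path(p).name for p in modified])
# → ['b.txt'] ['a.txt']
```

Polling is simple and portable, but it can miss a change that leaves both the modification time and the size unchanged; operating-system change notifications avoid that and the polling delay, at the cost of platform-specific code.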

Component integration was the first step toward providing a relational provenance record for the distributed artifacts. However, only some experiment data finds its way to the shared data servers. Our modification to the traditional 3-tier client-server architecture, mentioned above, makes each researcher workstation a participant in the application server. To accomplish this, we will relocate the Solr data from the application server to the data server and provide a distributed façade to handle interactions with it. In the future, we also plan to improve the relationship analysis process so that it connects experiment artifacts through more accurate and thorough use of the PROV Ontology (PROV-O).
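The page does not specify how relationships between artifacts are currently analyzed. As one hedged illustration of recording such relationships in PROV-O terms, the following Python sketch emits `prov:wasDerivedFrom` statements in N-Triples form when a newly indexed file mentions a previously indexed file by name. The substring heuristic, the `file:` identifiers, and the function names are hypothetical; only the property `prov:wasDerivedFrom` and the namespace `http://www.w3.org/ns/prov#` come from the W3C PROV-O vocabulary.

```python
import posixpath
from urllib.parse import quote

PROV = "http://www.w3.org/ns/prov#"


def artifact_iri(path: str) -> str:
    # Hypothetical identifier scheme: a file: URI built from the artifact's path.
    return "file://" + quote(path)


def derivation_triples(new_path: str, new_text: str, indexed_paths: list[str]) -> list[str]:
    """Emit N-Triples linking a new artifact to earlier artifacts it names.

    Naive heuristic (an assumption): if the new file's text contains the base
    name of a previously indexed file, record prov:wasDerivedFrom.
    """
    subject = artifact_iri(new_path)
    triples = []
    for old in indexed_paths:
        if old == new_path:
            continue  # never link an artifact to itself
        name = posixpath.basename(old)
        if name and name in new_text:
            triples.append(f"<{subject}> <{PROV}wasDerivedFrom> <{artifact_iri(old)}> .")
    return triples


# Usage
indexed = ["/data/run1/input.csv", "/data/run1/params.xml"]
report = "Summary computed from input.csv with default settings."
for triple in derivation_triples("/data/run1/summary.txt", report, indexed):
    print(triple)
# → <file:///data/run1/summary.txt> <http://www.w3.org/ns/prov#wasDerivedFrom> <file:///data/run1/input.csv> .
```

The substring match is deliberately crude (it would also match `myinput.csv`); it only stands in for the more accurate analysis described as future work.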

Students interested in helping with this project should note that we use the following technologies: Apache POI, Apache Solr, Apache Tomcat, C#, Windows Agents, HTML, JavaScript, Java, Servlets, Spring, and XML Stream Parsing. For more information, contact Del Davis.

Publications

Delmar B. Davis, Hazeline U. Asuncion, Ghaleb M. Abdulla, and Christopher W. Carr. Towards Recovering Provenance with Experiment Explorer. Fifth International Conference on Information, Process, and Knowledge Management (eKNOW), February 2013.

Delmar B. Davis, Hazeline U. Asuncion, and Ghaleb Abdulla. Experiment Explorer: Lightweight Provenance Search over Metadata. USENIX Workshop on the Theory and Practice of Provenance (TaPP), June 2012.

Links

Experiment Explorer presentation at TaPP 2012

This material is based upon work supported by the US National Science Foundation under Grant No. ACI 1350724. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.