Experiments that rely on computational resources often revolve around software that is constantly under development. The results produced by each version of the software are used to inform future changes to it. Over the course of several of these evolution cycles, various simulation models will have been used to produce several sets of results. Comparing results from different software versions and simulation models is what ultimately informs the research being done. In other words, the comparison answers the question, “What parameter, or part of our technique, produced this change in the result?” However, it can be especially difficult to keep track of which model, input, or version of the simulation software produced a given result. Existing data provenance techniques in eScience are able to connect the output files produced with analysis software. Unfortunately, they are unable to connect the changes that co-occur in both the simulation software and the data that result from its use.
The BrainGrid Workbench is a project management tool for high-performance neural network simulations that allows researchers to explicitly collect simulation provenance. The source code of the simulator is part of the overall simulation model that contributed to a specific simulation output. By leveraging the Git API, we are able to determine the version of the source code used to build the simulator. Later, when simulation results are produced, they are linked to the simulator executable and, in turn, to the version of the source code used to build that executable. This provenance information is persisted as RDF in Turtle files, and relationships between data are described using PROV-O. Project information, meanwhile, is stored in XML format. By using both formats together, the BrainGrid Workbench is able to keep an informative record of simulation details for long-running simulation projects.
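The chain described above (output, executable, source version) can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Workbench's actual code: the class name `ProvenanceSketch`, the `http://example.org/workbench/` namespace, and the use of the `git rev-parse HEAD` command line (rather than whichever Git API the Workbench uses) are all hypothetical. The PROV-O terms `prov:Entity` and `prov:wasDerivedFrom` are standard.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; names and namespace are assumptions, not the Workbench's code. */
final class ProvenanceSketch {

    // Assumed placeholder namespace for the example IRIs.
    static final String BASE = "http://example.org/workbench/";

    /** Asks the git command line for the commit checked out in repoDir (an assumed approach). */
    static String currentCommit(File repoDir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("git", "rev-parse", "HEAD")
                .directory(repoDir)
                .redirectErrorStream(true)
                .start();
        String line;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            line = r.readLine();
        }
        if (p.waitFor() != 0 || line == null) {
            throw new IOException("git rev-parse failed in " + repoDir);
        }
        return line.trim();
    }

    /**
     * Builds Turtle text linking output -> executable -> source commit.
     * Inputs are assumed to be IRI-safe (no spaces or angle brackets).
     */
    static String toTurtle(String commit, String executable, String output) {
        String commitIri = "<" + BASE + "commit/" + commit + ">";
        String exeIri = "<" + BASE + "executable/" + executable + ">";
        String outIri = "<" + BASE + "output/" + output + ">";
        return "@prefix prov: <http://www.w3.org/ns/prov#> .\n\n"
                + commitIri + " a prov:Entity .\n"
                + exeIri + " a prov:Entity ;\n"
                + "    prov:wasDerivedFrom " + commitIri + " .\n"
                + outIri + " a prov:Entity ;\n"
                + "    prov:wasDerivedFrom " + exeIri + " .\n";
    }

    public static void main(String[] args) {
        // Fixed illustrative values so the sketch runs outside a Git repository.
        System.out.print(toTurtle("abc123", "simulator", "result.xml"));
    }
}
```

In a real run, the value returned by `currentCommit` would take the place of the fixed `"abc123"` hash, so every output file can be traced back to the exact source version that produced it.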
The current prototype of the BrainGrid Workbench supports individual projects up to the output phase. In the future, we would like to capture provenance for the analysis and result phases of these projects as well. This can be accomplished by integrating provenance wrappers for MATLAB and other analysis tools, such as R. The analysis phase often includes comparing various steps from a single project, and simulations may take days or weeks to run. To support analysis involving multiple stages of simulation evolution, the Workbench also needs supplementary version support built into the UI. The first step toward this goal is to keep all of the project information stored in the XML files as an iterable history, rather than overwriting it for each project version.
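One way to picture an iterable project history is an XML document in which each project version is appended as its own element instead of replacing the previous one. The sketch below uses only the standard Java DOM API; the element and attribute names (`project`, `version`, `id`, `commit`, `input`) and the class `ProjectHistorySketch` are assumptions made for illustration and do not reflect the Workbench's real XML schema.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch; the XML layout is an assumption, not the Workbench's schema. */
final class ProjectHistorySketch {

    /** Creates an empty in-memory project document with a <project> root. */
    static Document newProject(String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("project");
            root.setAttribute("name", name);
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    /** Appends a new <version> element; earlier versions are left untouched. */
    static Element addVersion(Document doc, String commit, String inputFile) {
        Element root = doc.getDocumentElement();
        int next = root.getElementsByTagName("version").getLength() + 1;
        Element version = doc.createElement("version");
        version.setAttribute("id", Integer.toString(next));
        version.setAttribute("commit", commit);
        version.setAttribute("input", inputFile);
        root.appendChild(version);
        return version;
    }

    public static void main(String[] args) {
        Document doc = newProject("example");
        addVersion(doc, "abc123", "params-v1.xml");
        addVersion(doc, "def456", "params-v2.xml");

        // Iterate over the full history instead of seeing only the latest version.
        NodeList versions = doc.getDocumentElement().getElementsByTagName("version");
        for (int i = 0; i < versions.getLength(); i++) {
            Element v = (Element) versions.item(i);
            System.out.println(v.getAttribute("id") + " "
                    + v.getAttribute("commit") + " " + v.getAttribute("input"));
        }
    }
}
```

Because every version keeps its own commit and input reference, a UI could list the stages of a project side by side when results from different stages are compared.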
From the architectural viewpoint of this research, we would also like to make the Workbench capable of evolving along with future simulation models. Since simulation models consist not only of the simulator source code but also of the simulation input files, the Workbench must be adaptable to new input models. (Aaron Conrad is working on this particular future work item.)
For students who are interested in this project, we use the following technologies: Java Swing, Bash scripting, Git, and JSch, in order to connect possibly distributed simulation files. For more information, contact Del Davis.
This material is based upon work supported by the US National Science Foundation under Grant Nos. ACI 1350724 and CCF 1218266. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.