ProvMASS: Tracking Agent Behavior in a Distributed Memory – Provenance and Traceability Research Group

About

Multi-agent modeling helps simulate physical, economic, social, biological, and many other types of systems marked by organized behavioral patterns. If a system can be broken down into autonomous individuals, it can be modeled with agents. To simulate systems of hundreds of thousands or even millions of such individuals, multi-agent modeling platforms may draw on high-performance computing (HPC) resources. The Distributed Systems Laboratory at UW Bothell (UWB) has worked toward this goal over the last decade with the Multi-Agent Spatial Simulation (MASS) libraries.

The MASS software environment coordinates agents in a distributed memory formed across a cluster of computing nodes. Parallelization and data placement are abstracted away from the simulation, relieving the developer of the coordination burden. The drawback of this machine and thread transparency is that it becomes difficult to determine the cause of logical errors and to understand the impact of newly added agent behaviors. In effect, location transparency turns the simulator into a black box, leaving the user to guess at the connections between input, output, and static source code. Logging mechanisms let developers add custom messages to a model's source code. While this helps with the guesswork, logging falls short of determining the order of events in a distributed environment. To address this, we created ProvMASS, an approach that captures causally ordered data operations in a distributed memory (i.e., concurrent operations over distributed shared resources) and supports the verification and validation requirements of inspecting multi-agent models during execution.
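This page does not describe how ProvMASS establishes causal order. Purely as an illustration of what "causally ordered data operations" means, the Java below is a minimal sketch assuming Lamport logical clocks, a standard technique that may differ from ProvMASS's actual mechanism. Each node stamps every data operation with a local counter and advances that counter past any timestamp carried by an incoming message, such as a migrating agent. `ProvRecord`, `LamportNode`, and `causalOrder` are hypothetical names, not part of the MASS or ProvMASS APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CausalProvenanceSketch {

    // Hypothetical provenance record: one data operation on a shared element,
    // stamped with the Lamport clock of the node where it happened.
    record ProvRecord(long clock, int nodeId, String agent, String op, String target) {}

    // Hypothetical per-node recorder that keeps a Lamport logical clock.
    static final class LamportNode {
        private final int nodeId;
        private final List<ProvRecord> log;
        private long clock = 0;

        LamportNode(int nodeId, List<ProvRecord> log) {
            this.nodeId = nodeId;
            this.log = log;
        }

        // Local event: advance the clock, then record the operation.
        long record(String agent, String op, String target) {
            clock++;
            log.add(new ProvRecord(clock, nodeId, agent, op, target));
            return clock;
        }

        // Event caused by a message from another node: first move the clock
        // past the sender's timestamp so the effect is ordered after its cause.
        long receive(long senderClock, String agent, String op, String target) {
            clock = Math.max(clock, senderClock);
            return record(agent, op, target);
        }
    }

    // Total order by (clock, nodeId). If event a happened before event b,
    // a has the smaller clock and sorts first; concurrent events with equal
    // clocks are tie-broken by node id.
    static List<ProvRecord> causalOrder(List<ProvRecord> log) {
        List<ProvRecord> sorted = new ArrayList<>(log);
        sorted.sort(Comparator.comparingLong(ProvRecord::clock)
                .thenComparingInt(ProvRecord::nodeId));
        return sorted;
    }

    public static void main(String[] args) {
        List<ProvRecord> log = new ArrayList<>();
        LamportNode node0 = new LamportNode(0, log);
        LamportNode node1 = new LamportNode(1, log);

        node0.record("a1", "write", "place[3]");              // node0 clock = 1
        node0.record("a1", "write", "place[4]");              // node0 clock = 2
        long sent = node0.record("a1", "migrate", "node1");   // node0 clock = 3
        node1.record("a7", "read", "place[9]");               // node1 clock = 1
        node1.receive(sent, "a1", "write", "place[9]");       // node1 clock = max(1, 3) + 1 = 4

        for (ProvRecord r : causalOrder(log)) {
            System.out.println(r);
        }
    }
}
```

Sorting by `(clock, nodeId)` guarantees that each operation appears after every operation that causally preceded it, so agent `a1`'s write on node 1 follows its migration from node 0. Lamport clocks cannot distinguish concurrent operations from causally related ones; a design that needs that distinction would use vector clocks instead.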

We are working on improvements that extend this technique to more central provenance-capture challenges in HPC systems. The growing gap between processing power and data transfer rates is a key roadblock to extending provenance support to state-of-the-art HPC. Analyses in the current approach rely on retaining the full provenance record, which forces secondary-storage access for large provenance data sets. We will work toward keeping provenance within distributed memory while maintaining current query support. In a single-host environment, queries can be stream-processed, discarding each piece of data once it has been used to formulate the results. This is a challenge, however, when analyzing interdependent operations in a distributed memory. Our future work will focus on solving these technical challenges.
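To illustrate the single-host case, here is a minimal sketch assuming a hypothetical `Op` record and a simple query that counts writes per place; it is not ProvMASS's query engine. Each record updates a running aggregate and is then dropped, so memory grows with the number of distinct places rather than with the length of the provenance stream.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class StreamingProvQuery {

    // Hypothetical minimal provenance record: an agent's operation on a place.
    record Op(String agent, String kind, String place) {}

    // Single pass over the stream: each record updates the running answer and
    // is not retained. Only one counter per written place is kept in memory.
    static Map<String, Integer> writesPerPlace(Iterator<Op> stream) {
        Map<String, Integer> counts = new HashMap<>();
        while (stream.hasNext()) {
            Op op = stream.next();
            if (op.kind().equals("write")) {
                counts.merge(op.place(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(
                new Op("a1", "write", "place[3]"),
                new Op("a2", "read", "place[3]"),
                new Op("a2", "write", "place[3]"),
                new Op("a1", "write", "place[4]"));
        System.out.println(writesPerPlace(ops.iterator()));
    }
}
```

This works because the count for one record never depends on records that have not arrived yet. A query that relates operations across nodes, such as which write a given read observed, cannot discard records this eagerly: an operation logged later on another node may still depend on one already seen.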

While the core of our future work will require deep participant involvement, several areas also need immediate improvement. For example, the ProvMASS subsystem is currently implemented only in the Java version of MASS, and Java introduces sources of overhead of its own that make it difficult to compare the performance overhead of provenance capture. We would like to further validate the approach by developing a subsystem for the C++ version of the MASS library. In addition, the current automatic source code instrumentation tool lacks some key features, so model developers must still intervene manually in ways that could be avoided.

For students interested in helping with this project, we use the following technologies (now or in planned future work): Abstract Syntax Trees, Apache Jena, Apache Jena ARQ, Apache Jena TDB, C++, Java, JSch, Linux (familiarity with bash, SSH, and FTP commands is recommended), MySQL, PROV Ontology, SQL, and QDox. This list is meant to match areas of interest, not to set proficiency requirements. A firm grasp of distributed and parallel computing is not necessary, but an interest in learning about this area is highly recommended. For more information, please contact Del Davis and Professors Hazeline Asuncion and Munehiro Fukuda.

References:

Delmar B. Davis II, Data Provenance for Multi-Agent Models in a Distributed Memory. Master’s Thesis, University of Washington, August 2017.

Delmar B. Davis, Jonathan Featherston, Munehiro Fukuda, and Hazeline U. Asuncion, Data Provenance for Multi-Agent Models. In International Conference on eScience, October 2017.

This work is based in part upon work supported by the US National Science Foundation under Grant No. ACI 1350724 and the UW Bothell CSS Graduate Research fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.