Versioned-PROV: A PROV extension to support mutable data entities

João Felipe N. Pimentel*, Paolo Missier, Leonardo Murta, Vanessa Braganholo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The PROV data model assumes that entities are immutable and all changes to an entity e are represented by the creation of a new entity e’. This is reasonable for many provenance applications but may produce verbose results once we move towards fine-grained provenance due to the possibility of multiple binds (i.e., variables, elements of data structures) referring to the same mutable data objects (e.g., lists or dictionaries in Python). Changing a data object that is referenced by multiple immutable entities requires duplicating those immutable entities to keep consistency. This imposes an overhead on the provenance storage and makes it hard to represent data-changing operations and their effect on the provenance graph. In this paper, we propose a PROV extension to represent mutable data structures. We do this by adding reference derivations and checkpoints. We evaluate our approach by comparing it to plain PROV and PROV-Dictionary. Results indicate a reduction in the storage overhead for assignments and changes in data structures from O(N) and Ω(R × N), respectively, to O(1) in both cases when compared to plain PROV (N is the number of members in the data structure and R is the number of references to the data structure).
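The aliasing situation described in the abstract can be illustrated with a short Python sketch. The first part shows two names bound to one mutable list. The second part is a toy cost model, not the paper's formalization: the function names `plain_assignment_records`, `plain_change_records`, and `reference_based_records` are hypothetical, and they only mirror the bounds quoted in the abstract (O(N), Ω(R × N), and O(1)).

```python
# Two names bound to the same mutable list: assignment copies the
# reference, not the members.
a = [1, 2, 3]
b = a          # b now refers to the same list object as a
b[0] = 99      # a change made through b is visible through a


def plain_assignment_records(n_members: int) -> int:
    """Toy model (assumption): with immutable entities, assigning a
    collection re-links each of its N members, so the cost grows as O(N)."""
    return n_members


def plain_change_records(n_members: int, n_references: int) -> int:
    """Toy model (assumption): changing a collection seen through R
    references duplicates N member links for each reference, which gives
    the R x N lower bound quoted in the abstract."""
    return n_references * n_members


def reference_based_records() -> int:
    """Toy model (assumption): a single record per assignment or change,
    independent of N and R, matching the O(1) bound in the abstract."""
    return 1


if __name__ == "__main__":
    print(a is b, a)
    print(plain_assignment_records(1000))
    print(plain_change_records(1000, 3))
    print(reference_based_records())
```

Under these assumptions, the gap between the plain and reference-based counts widens as either the number of members or the number of references grows, which is the overhead the abstract refers to.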

Original language: English
Title of host publication: Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings
Editors: Khalid Belhajjame, Ashish Gehani, Pinar Alper
Publisher: Springer Verlag
Pages: 87-100
Number of pages: 14
ISBN (Print): 9783319983783
Publication status: Published - 2018
Event: 7th International Provenance and Annotation Workshop, IPAW 2018 - London, United Kingdom
Duration: 9 Jul 2018 → 10 Jul 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11017 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Provenance and Annotation Workshop, IPAW 2018
Country/Territory: United Kingdom
City: London
Period: 9/07/18 → 10/07/18

Bibliographical note

Publisher Copyright:
© Springer Nature Switzerland AG 2018.

Keywords

  • Interoperability
  • Provenance
  • Specification

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
