The Right (Provenance) Hammer for the Job: A Comparison of Data Provenance Instrumentation

Adriane Chapman, Abhirami Sasikant, Giulia Simonelli, Paolo Missier, Riccardo Torlone

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

As data science techniques are being applied to solve societal problems, understanding what is happening within the “pipeline” is essential for establishing trust and reproducibility of the results. Provenance captures information about what happened during design and execution in order to support reasoning for trust and reproducibility. However, how and where the information is captured as provenance within the data science pipelines changes how it can be utilized. In this work, we describe three different mechanisms to capture provenance in data science pipelines: human-based, tool-based, and script-based. By using an implementation of all techniques in a standard data science toolkit, we analyze the difference in provenance generated by these methods and how its use changes.
Original language: English
Title of host publication: Provenance in Data Science
Subtitle of host publication: From Data Models to Context-Aware Knowledge Graphs
Editors: Leslie F. Sikos, Oshani W. Seneviratne, Deborah L. McGuinness
Publisher: Springer
Pages: 25-45
Number of pages: 20
Edition: 1
ISBN (Electronic): 9783030676810
ISBN (Print): 9783030676803
Publication status: Published - 27 Apr 2021

Publication series

Name: Advanced Information and Knowledge Processing
Publisher: Springer
ISSN (Print): 1610-3947
ISSN (Electronic): 2197-8441
