TY - CHAP
T1 - The Right (Provenance) Hammer for the Job
T2 - A Comparison of Data Provenance Instrumentation
AU - Chapman, Adriane
AU - Sasikant, Abhirami
AU - Simonelli, Giulia
AU - Missier, Paolo
AU - Torlone, Riccardo
PY - 2021/4/27
Y1 - 2021/4/27
N2 - As data science techniques are being applied to solve societal problems, understanding what is happening within the “pipeline” is essential for establishing trust and reproducibility of the results. Provenance captures information about what happened during design and execution in order to support reasoning for trust and reproducibility. However, how and where the information is captured as provenance within the data science pipelines changes how it can be utilized. In this work, we describe three different mechanisms to capture provenance in data science pipelines: human-based, tool-based, and script-based. By using an implementation of all techniques in a standard data science toolkit, we analyze the difference in provenance generated by these methods and how its use changes.
AB - As data science techniques are being applied to solve societal problems, understanding what is happening within the “pipeline” is essential for establishing trust and reproducibility of the results. Provenance captures information about what happened during design and execution in order to support reasoning for trust and reproducibility. However, how and where the information is captured as provenance within the data science pipelines changes how it can be utilized. In this work, we describe three different mechanisms to capture provenance in data science pipelines: human-based, tool-based, and script-based. By using an implementation of all techniques in a standard data science toolkit, we analyze the difference in provenance generated by these methods and how its use changes.
UR - https://doi.org/10.1007/978-3-030-67681-0_3
U2 - 10.1007/978-3-030-67681-0_3
DO - 10.1007/978-3-030-67681-0_3
M3 - Chapter
SN - 9783030676803
T3 - Advanced Information and Knowledge Processing
SP - 25
EP - 45
BT - Provenance in Data Science
A2 - Sikos, Leslie F.
A2 - Seneviratne, Oshani W.
A2 - McGuinness, Deborah L.
PB - Springer
ER -