Fine-grained provenance for high-quality data science

Adriane Chapman*, Paolo Missier, Giulia Simonelli, Riccardo Torlone

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this work we analyze the typical data preparation operations within a machine learning process, and provide infrastructure for generating fine-grained provenance records from them, at the level of individual elements within a dataset. Our contributions include: (i) the formal definition of a core set of preprocessing operators, (ii) the definition of provenance patterns for each of them, and (iii) a prototype implementation of an application-level provenance capture library that works alongside Python.
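As an illustration of what element-level provenance for a preprocessing operator could look like, the following is a minimal sketch and not the authors' library. The `ProvenanceLog` class, the `cell_id` naming scheme, the `impute_mean` operator, and the `generated`/`used` record fields are all hypothetical names chosen for this example. It records, for each imputed cell, which input cells the new value was derived from.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory store of element-level derivation records."""
    records: list = field(default_factory=list)

    def derive(self, activity, output_id, input_ids):
        # One record per output element: which operation produced it, from which inputs.
        self.records.append(
            {"activity": activity, "generated": output_id, "used": list(input_ids)}
        )


def cell_id(version, row, column):
    # Assumed identifier scheme: dataset version, row index, column name.
    return f"v{version}/row{row}/{column}"


def impute_mean(rows, column, version, log):
    """Fill missing values in `column` with the column mean, logging each changed cell."""
    known = [(i, r[column]) for i, r in enumerate(rows) if r[column] is not None]
    fill = mean(value for _, value in known)
    sources = [cell_id(version, i, column) for i, _ in known]

    output = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        if row[column] is None:
            new_row[column] = fill
            # The imputed cell depends on every non-missing cell of the same column.
            log.derive("impute_mean", cell_id(version + 1, i, column), sources)
        output.append(new_row)
    return output


rows = [{"age": 30}, {"age": None}, {"age": 50}]
log = ProvenanceLog()
cleaned = impute_mean(rows, "age", version=0, log=log)
```

In this sketch only cells whose values change receive a record. A fuller capture scheme would also need to account for unchanged cells carried over between dataset versions, which is omitted here for brevity.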

Original language: English
Pages (from-to): 411-418
Journal: CEUR Workshop Proceedings
Volume: 2994
Publication status: Published - 9 Sept 2021
Event: 29th Italian Symposium on Advanced Database Systems, SEBD 2021 - Pizzo Calabro, Italy
Duration: 5 Sept 2021 to 9 Sept 2021

Bibliographical note

Publisher Copyright:
© 2021 Copyright for this paper by its authors.

ASJC Scopus subject areas

  • General Computer Science
