Abstract
In this work we analyze the typical data preparation operations within a machine learning process, and provide infrastructure for generating fine-grained provenance records from them, at the level of individual elements within a dataset. Our contributions include: (i) the formal definition of a core set of preprocessing operators, (ii) the definition of provenance patterns for each of them, and (iii) a prototype implementation of an application-level provenance capture library that works alongside Python.
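To make "provenance at the level of individual elements" concrete, the following is a minimal sketch of what capturing such records for a single preprocessing operator might look like. It is not the paper's library or schema: the `Derivation` record, the `impute_mean` function, and the `(row index, column name)` cell identifier are all hypothetical names chosen for illustration, and the `derived_from` field only loosely follows the derivation idea found in W3C PROV.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical cell identifier: (row index, column name).
Cell = tuple[int, str]


@dataclass(frozen=True)
class Derivation:
    """One element-level provenance record (illustrative, not the paper's schema)."""
    generated: Cell                  # cell whose value the operator wrote
    operator: str                    # name of the preprocessing step
    derived_from: tuple[Cell, ...]   # input cells the new value depends on


def impute_mean(rows, column):
    """Fill missing values (None) in `column` with the mean of the observed ones.

    Returns a new list of rows plus one Derivation per imputed cell.
    The input rows are left unmodified.
    """
    observed = [(i, r[column]) for i, r in enumerate(rows) if r[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill = mean(value for _, value in observed)
    sources = tuple((i, column) for i, _ in observed)

    out, records = [], []
    for i, row in enumerate(rows):
        new_row = dict(row)
        if row[column] is None:
            new_row[column] = fill
            # Each imputed cell depends on every observed cell in the column.
            records.append(Derivation((i, column), "impute_mean", sources))
        out.append(new_row)
    return out, records


rows = [{"age": 30}, {"age": None}, {"age": 50}]
clean, provenance = impute_mean(rows, "age")
```

In this toy run only the second row is imputed, so `provenance` holds a single record linking cell `(1, "age")` to the two observed cells it was computed from. An operator that rewrites every cell, such as scaling, would instead emit one record per output cell, which is why the derivation pattern differs from operator to operator.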
Original language | English |
---|---|
Pages (from-to) | 411-418 |
Journal | CEUR Workshop Proceedings |
Volume | 2994 |
Publication status | Published - 9 Sept 2021 |
Event | 29th Italian Symposium on Advanced Database Systems, SEBD 2021, Pizzo Calabro, Italy (5 Sept 2021 → 9 Sept 2021) |
Bibliographical note
Publisher Copyright: © 2021 Copyright for this paper by its authors.
ASJC Scopus subject areas
- General Computer Science