Abstract
Many resource-intensive analytics processes evolve over time as new versions of the reference datasets and software dependencies they use become available. We focus on scenarios in which any version change has the potential to affect many outcomes, as is the case, for instance, in high-throughput genomics, where the same process is used to analyse large cohorts of patient genomes, or cases. As any version change is unlikely to affect the entire population, an efficient strategy for restoring the currency of the outcomes requires first identifying the scope of a change, i.e., the subset of affected data products. In this paper we describe a generic and reusable provenance-based approach to this scope discovery problem. It applies to scenarios where the process consists of complex hierarchical components, where different input cases are processed using different version configurations of each component, and where separate provenance traces are collected for the executions of each component. We show how a new data structure, called a restart tree, is computed and exploited to manage the change scope discovery problem.
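To make the notion of change scope concrete, the following is a minimal sketch, not the paper's method: it does not build a restart tree and it ignores the hierarchical structure of components. It assumes provenance has already been flattened into simple usage facts; the `Usage` record, the `change_scope` function, and all case, component, and dependency names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Usage(NamedTuple):
    """Hypothetical provenance fact: a case's run of a component used a dependency version."""
    case_id: str
    component: str
    dependency: str
    version: str


def change_scope(usages: list[Usage], dependency: str, new_version: str) -> set[str]:
    """Return the cases whose recorded runs used `dependency` at a version other than `new_version`."""
    return {
        u.case_id
        for u in usages
        if u.dependency == dependency and u.version != new_version
    }


# Illustrative records only; not data from the paper.
usages = [
    Usage("case-1", "annotate", "refdb", "v1"),
    Usage("case-2", "annotate", "refdb", "v2"),
    Usage("case-3", "align", "aligner", "0.7"),
]

scope = change_scope(usages, "refdb", "v2")
print(sorted(scope))  # ['case-1']
```

In this toy setting only `case-1` falls within the scope of the `refdb` update, so only that case would be a candidate for re-computation; the other two cases are left untouched.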
| Original language | English |
|---|---|
| Title of host publication | Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, Proceedings |
| Editors | Khalid Belhajjame, Ashish Gehani, Pinar Alper |
| Publisher | Springer Verlag |
| Pages | 3-15 |
| Number of pages | 13 |
| ISBN (Print) | 9783319983783 |
| Publication status | Published - 2018 |
| Event | 7th International Provenance and Annotation Workshop, IPAW 2018, London, United Kingdom. Duration: 9 Jul 2018 → 10 Jul 2018 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 11017 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 7th International Provenance and Annotation Workshop, IPAW 2018 |
|---|---|
| Country/Territory | United Kingdom |
| City | London |
| Period | 9/07/18 → 10/07/18 |
Bibliographical note
Publisher Copyright: © Springer Nature Switzerland AG 2018.
Keywords
- Process re-computation
- Provenance annotations
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science