Abstract
Insights generated from Big Data through analytics processes are often unstable over time and thus lose their value, as the analysis typically depends on elements that change and evolve dynamically. However, the cost of having to periodically 'redo' computationally expensive data analytics is not normally taken into account when assessing the benefits of the outcomes. The ReComp project addresses the problem of efficiently re-computing, all or in part, outcomes from complex analytical processes in response to some of the changes that occur to process dependencies. While such dependencies may include application and system libraries, as well as the deployment environment, ReComp focuses exclusively on changes to reference datasets and to the original inputs. Our hypothesis is that an efficient re-computation strategy requires the ability to (i) observe and quantify data changes, (ii) estimate the impact of those changes on a population of prior outcomes, (iii) identify the minimal process fragments that can restore the currency of the impacted outcomes, and (iv) selectively drive their refresh. In this paper we present a generic framework that addresses these requirements, and show how it can be customised to operate on two case studies from very different domains, namely genomics and geosciences. We discuss lessons learnt and outline the next steps towards the ReComp vision.
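Requirements (i), (ii) and (iv) above can be illustrated with a minimal sketch. It is not the ReComp framework or its API: every name (`DataDiff`, `Outcome`, `diff_versions`, `impacted`, `selective_refresh`) and the toy variant-classification data are hypothetical. The sketch assumes a reference dataset is a key-value mapping and that each prior outcome records, as simplified provenance, the reference keys it looked up (including keys that were missing at the time, so newly added records are caught). Impact estimation is reduced to a crude overlap test, and step (iii), finding minimal process fragments, is omitted.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDiff:
    """Hypothetical summary of the differences between two dataset versions."""
    added: frozenset
    removed: frozenset
    changed: frozenset

    def touched(self) -> frozenset:
        # Every key whose presence or value differs between the two versions.
        return self.added | self.removed | self.changed


@dataclass(frozen=True)
class Outcome:
    """A prior result plus simplified provenance: the reference keys it looked up."""
    outcome_id: str
    queried_keys: frozenset


def diff_versions(old: dict, new: dict) -> DataDiff:
    """Step (i): observe and quantify changes between two dataset versions."""
    common = old.keys() & new.keys()
    return DataDiff(
        added=frozenset(new.keys() - old.keys()),
        removed=frozenset(old.keys() - new.keys()),
        changed=frozenset(k for k in common if old[k] != new[k]),
    )


def impacted(outcomes: Iterable[Outcome], diff: DataDiff) -> list[Outcome]:
    """Step (ii), crudely: an outcome is impacted if it looked up any touched key."""
    touched = diff.touched()
    return [o for o in outcomes if o.queried_keys & touched]


def selective_refresh(
    outcomes: Iterable[Outcome], diff: DataDiff, recompute: Callable[[Outcome], object]
) -> dict:
    """Step (iv): re-run the (hypothetical) process only for impacted outcomes."""
    return {o.outcome_id: recompute(o) for o in impacted(outcomes, diff)}


# Toy reference data: two versions of a variant-classification table.
old = {"v1": "benign", "v2": "uncertain", "v3": "pathogenic"}
new = {"v1": "benign", "v2": "pathogenic", "v4": "benign"}
outcomes = [
    Outcome("p1", frozenset({"v1"})),
    Outcome("p2", frozenset({"v1", "v2"})),
    Outcome("p3", frozenset({"v3"})),
    Outcome("p4", frozenset({"v5"})),
]

diff = diff_versions(old, new)
# Stand-in "process": re-read the queried keys from the new version.
refreshed = selective_refresh(
    outcomes, diff, lambda o: {k: new.get(k) for k in sorted(o.queried_keys)}
)
print(sorted(refreshed))  # ['p2', 'p3']
```

Only `p2` (which read the reclassified `v2`) and `p3` (which read the removed `v3`) are re-run; `p1` and `p4` are left untouched. A realistic impact estimate would need to go beyond this key-overlap test, since a changed record does not necessarily alter an outcome.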
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE International Congress on Big Data, BigData Congress 2019 - Part of the 2019 IEEE World Congress on Services |
Editors | Elisa Bertino, Carl K. Chang, Peter Chen, Ernesto Damiani, Michael Goul, Katsunori Oyama |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 24-34 |
Number of pages | 11 |
ISBN (Electronic) | 9781728127729 |
DOIs | |
Publication status | Published - Jul 2019 |
Event | 8th IEEE International Congress on Big Data, BigData Congress 2019 - Milan, Italy. Duration: 8 Jul 2019 → 13 Jul 2019 |
Publication series
Name | Proceedings - 2019 IEEE International Congress on Big Data, BigData Congress 2019 - Part of the 2019 IEEE World Congress on Services |
---|---|
Conference
Conference | 8th IEEE International Congress on Big Data, BigData Congress 2019 |
---|---|
Country/Territory | Italy |
City | Milan |
Period | 8/07/19 → 13/07/19 |
Bibliographical note
Funding Information: This work has been supported by EPSRC in the UK [grant no.: EP/N01426X/1] and a grant from the Microsoft Azure for Research programme.
Publisher Copyright:
© 2019 IEEE.
Keywords
- black-box process
- data analysis
- process recomputation
- provenance
- recomputation optimisation
- workflow
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems and Management
- Control and Optimization