Efficient Re-Computation of Big Data Analytics Processes in the Presence of Changes: Computational Framework, Reference Architecture, and Applications

Paolo Missier, Jacek Cala

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

Insights generated from Big Data through analytics processes are often unstable over time and thus lose their value, as the analysis typically depends on elements that change and evolve dynamically. However, the cost of having to periodically 'redo' computationally expensive data analytics is not normally taken into account when assessing the benefits of the outcomes. The ReComp project addresses the problem of efficiently re-computing, all or in part, outcomes from complex analytical processes in response to some of the changes that occur to process dependencies. While such dependencies may include application and system libraries, as well as the deployment environment, ReComp is focused exclusively on changes to reference datasets as well as to the original inputs. Our hypothesis is that an efficient re-computation strategy requires the ability to (i) observe and quantify data changes, (ii) estimate the impact of those changes on a population of prior outcomes, (iii) identify the minimal process fragments that can restore the currency of the impacted outcomes, and (iv) selectively drive their refresh. In this paper we present a generic framework that addresses these requirements, and show how it can be customised to operate on two case studies of very diverse domains, namely genomics and geosciences. We discuss lessons learnt and outline the next steps towards the ReComp vision.
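The four requirements named in the abstract can be illustrated with a minimal sketch. This is not the ReComp implementation: the `Outcome` record, the `diff`, `impacted`, and `refresh` functions, and the toy `classify` process are all hypothetical names introduced here. The sketch assumes the reference dataset is a plain dictionary, that each prior outcome recorded which reference keys it looked up, and that the process is a black box which is re-run in full for each impacted outcome (so step (iii), finding minimal process fragments, is reduced to "the whole process, but only for impacted cases").

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Outcome:
    """A prior result plus the reference keys it looked up (hypothetical provenance record)."""
    case_id: str
    inputs: dict
    used_keys: Set[str]
    value: object


def diff(old: Dict[str, object], new: Dict[str, object]) -> Set[str]:
    """(i) Observe changes: keys added, removed, or altered between two versions."""
    return {
        k for k in old.keys() | new.keys()
        if k not in old or k not in new or old[k] != new[k]
    }


def impacted(outcomes: List[Outcome], changed: Set[str]) -> List[Outcome]:
    """(ii) Estimate impact: an outcome is a candidate if it looked up any changed key."""
    return [o for o in outcomes if o.used_keys & changed]


def refresh(
    outcomes: List[Outcome],
    old_ref: Dict[str, object],
    new_ref: Dict[str, object],
    process: Callable[[dict, Dict[str, object]], Tuple[object, Set[str]]],
) -> List[str]:
    """(iii)+(iv) Re-run the black-box process only for impacted outcomes."""
    todo = impacted(outcomes, diff(old_ref, new_ref))
    for o in todo:
        o.value, o.used_keys = process(o.inputs, new_ref)
    return [o.case_id for o in todo]


def classify(inputs: dict, ref: Dict[str, object]) -> Tuple[object, Set[str]]:
    """Toy black-box process: look up one key and report which key was consulted."""
    gene = inputs["gene"]
    return ref.get(gene, "unknown"), {gene}


old_ref = {"g1": "benign", "g2": "benign"}
new_ref = {"g1": "benign", "g2": "pathogenic", "g3": "benign"}
outcomes = [
    Outcome("p1", {"gene": "g1"}, {"g1"}, "benign"),
    Outcome("p2", {"gene": "g2"}, {"g2"}, "benign"),
    Outcome("p3", {"gene": "g3"}, {"g3"}, "unknown"),
]
refreshed = refresh(outcomes, old_ref, new_ref, classify)
```

In this toy run only the outcomes that consulted a changed or newly added key are recomputed; the outcome for `p1` is left alone. Recording looked-up keys even when the lookup misses is a design choice of the sketch: it lets an added reference entry be matched against prior outcomes that would have used it.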

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Congress on Big Data, BigData Congress 2019 - Part of the 2019 IEEE World Congress on Services
Editors: Elisa Bertino, Carl K. Chang, Peter Chen, Ernesto Damiani, Michael Goul, Katsunori Oyama
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 24-34
Number of pages: 11
ISBN (Electronic): 9781728127729
Publication status: Published - Jul 2019
Event: 8th IEEE International Congress on Big Data, BigData Congress 2019 - Milan, Italy
Duration: 8 Jul 2019 to 13 Jul 2019

Publication series

Name: Proceedings - 2019 IEEE International Congress on Big Data, BigData Congress 2019 - Part of the 2019 IEEE World Congress on Services

Conference

Conference: 8th IEEE International Congress on Big Data, BigData Congress 2019
Country/Territory: Italy
City: Milan
Period: 8/07/19 to 13/07/19

Bibliographical note

Funding Information:
This work has been supported by EPSRC in the UK [grant no.: EP/N01426X/1] and a grant from the Microsoft Azure for Research programme.

Publisher Copyright:
© 2019 IEEE.

Keywords

  • black-box process
  • data analysis
  • process recomputation
  • provenance
  • recomputation optimisation
  • workflow

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems and Management
  • Control and Optimization
