Preserving the value of large scale data analytics over time through selective re-computation

Paolo Missier*, Jacek Cała, Manisha Rathi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A pervasive problem in Data Science is that the knowledge generated by possibly expensive analytics processes is subject to decay over time, as the data and algorithms used to compute it change and as the external knowledge embodied by reference datasets evolves. Deciding when such knowledge outcomes should be refreshed, following a sequence of data change events, requires problem-specific functions to quantify their value and its decay over time, as well as models for estimating the cost of their re-computation. The challenge is to develop a decision support system for informing re-computation decisions over time that is both generic and customisable. With the help of a case study from genomics, in this paper we offer an initial formalisation of this problem, highlight research challenges, and outline a possible approach based on the analysis of metadata from a history of past computations.

Original language: English
Title of host publication: Data Analytics - 31st British International Conference on Databases, BICOD 2017, Proceedings
Editors: Andrea Calì, Peter Wood, Nigel Martin, Alexandra Poulovassilis
Publisher: Springer Verlag
Pages: 65-77
Number of pages: 13
ISBN (Print): 9783319607948
Publication status: Published - 2017
Event: 31st British International Conference on Databases, BICOD 2017 - London, United Kingdom
Duration: 10 Jul 2017 to 12 Jul 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10365 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 31st British International Conference on Databases, BICOD 2017
Country/Territory: United Kingdom
City: London
Period: 10/07/17 to 12/07/17

Bibliographical note

Publisher Copyright:
© Springer International Publishing AG 2017.

Keywords

  • Incremental computation
  • Metadata management
  • Partial re-computation
  • Provenance
  • Selective re-computation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
