Why-Diff: Explaining differences amongst similar workflow runs by exploiting scientific metadata

Priyaa Thavasimani, Jacek Cala, Paolo Missier

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The majority of workflows executed today need to process massive amounts of data, and re-executing such data-intensive scientific workflows often produces different outputs. Scientific research progresses when discoveries are reproduced and verified. However, simply re-enacting a scientific computation, such as a workflow, does not guarantee correct results, because unintentional changes may have interfered with the re-enactment. We investigate the hypothesis that the metadata of a workflow execution can be used to explain why the experimenter observes different results (cause analysis). Similarly, scientific metadata can be used to determine the impact of intentional variations that the experimenter may have injected into a new version of the workflow. We explore these two complementary cases using a simple algorithm that traverses two metadata traces in lock-step, and we illustrate it on two human genomics data analysis workflows.
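The lock-step traversal mentioned in the abstract can be illustrated with a short sketch. This is not the authors' algorithm: it assumes a simplified trace model of our own (a hypothetical `Node` type whose `inputs` link a data item to the process that generated it, and a process to the data it used), hypothetical metadata keys such as `sha1` and `version`, and a pruning rule that stops at data items that are identical in both runs. Starting from a pair of differing outputs, `why_diff` (also a hypothetical name) walks both traces upstream together and reports each node pair whose metadata differ.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical, simplified provenance trace."""
    kind: str    # "data" or "process"
    attrs: dict  # metadata, e.g. a checksum for data or a tool version for a process
    # Upstream node ids in a fixed order: the generating process for a data
    # node, or the consumed data items for a process node.
    inputs: list[str] = field(default_factory=list)


def why_diff(trace_a: dict[str, Node], trace_b: dict[str, Node],
             out_a: str, out_b: str) -> list[tuple[str, str, list[str]]]:
    """Walk two traces upstream in lock-step from a pair of differing outputs.

    Returns (id_a, id_b, changed_keys) for every visited node pair whose
    metadata differ; "<structure>" marks pairs whose shapes do not match.
    """
    findings = []
    seen = set()
    queue = deque([(out_a, out_b)])
    while queue:
        pair = queue.popleft()
        if pair in seen:
            continue
        seen.add(pair)
        a, b = trace_a[pair[0]], trace_b[pair[1]]
        if a.kind != b.kind or len(a.inputs) != len(b.inputs):
            # The traces no longer align, so the lock-step walk stops here.
            findings.append((pair[0], pair[1], ["<structure>"]))
            continue
        changed = sorted(k for k in a.attrs.keys() | b.attrs.keys()
                         if a.attrs.get(k) != b.attrs.get(k))
        if changed:
            findings.append((pair[0], pair[1], changed))
        if a.kind == "data" and not changed:
            # Identical data: nothing upstream of it can explain the divergence.
            continue
        # Pair up inputs positionally and keep walking upstream.
        queue.extend(zip(a.inputs, b.inputs))
    return findings


# Two toy runs of a two-step pipeline that differ only in the aligner version.
run1 = {
    "reads": Node("data", {"sha1": "aa11"}),
    "align": Node("process", {"tool": "aligner", "version": "1.0"}, ["reads"]),
    "bam": Node("data", {"sha1": "bb22"}, ["align"]),
    "call": Node("process", {"tool": "caller", "version": "2.1"}, ["bam"]),
    "vcf": Node("data", {"sha1": "cc33"}, ["call"]),
}
run2 = {
    "reads": Node("data", {"sha1": "aa11"}),
    "align": Node("process", {"tool": "aligner", "version": "1.1"}, ["reads"]),
    "bam": Node("data", {"sha1": "bb99"}, ["align"]),
    "call": Node("process", {"tool": "caller", "version": "2.1"}, ["bam"]),
    "vcf": Node("data", {"sha1": "cc99"}, ["call"]),
}

for finding in why_diff(run1, run2, "vcf", "vcf"):
    print(finding)
# ('vcf', 'vcf', ['sha1'])
# ('bam', 'bam', ['sha1'])
# ('align', 'align', ['version'])
```

In this toy example the report leads from the differing variant file back to a changed aligner version, which is the kind of cause the abstract describes. Pairing inputs by position is the simplest alignment rule and would need refinement when a new workflow version adds or removes steps.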

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 3031-3041
Number of pages: 11
ISBN (Electronic): 9781538627143
Publication status: Published - 1 Jul 2017
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Conference

Conference: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States
City: Boston
Period: 11/12/17 to 14/12/17

Bibliographical note

Publisher Copyright:
© 2017 IEEE.

Keywords

  • Big Data
  • eScience Central
  • Metadata
  • Provenance
  • Reproducible Research
  • Why-Diff
  • Workflow

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Control and Optimization
