Predicting the execution time of workflow activities based on their input features

Tudor Miu*, Paolo Missier

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The ability to accurately estimate the execution time of computationally expensive e-science algorithms enables better scheduling of workflows that incorporate those algorithms as their building blocks, and may give users an insight into the expected cost of workflow execution on cloud resources. When a large history of past runs can be observed, crude estimates such as the average execution time can easily be provided. We make the hypothesis that, for some algorithms, better estimates can be obtained by using the histories to learn regression models that predict execution time based on selected features of their inputs. We refer to this property as input predictability of algorithms. We are motivated by e-science workflows that involve repetitive training of multiple learning models. Thus, we verify our hypothesis on the specific case of the C4.5 decision tree builder, a well-known learning method whose training execution time is indeed sensitive to the specific input dataset, but in non- obvious ways. We use the case study to demonstrate a method for assessing input predictability. While this yields promising results, we also find that its more general applicability involves a trade off between the black-box nature of the algorithms under analysis, and the need for expert insight into relevant features of their inputs.

Original languageEnglish
Title of host publicationProceedings - 2012 SC Companion
Subtitle of host publicationHigh Performance Computing, Networking Storage and Analysis, SCC 2012
Pages64-72
Number of pages9
DOIs
Publication statusPublished - 2012
Event2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012 - Salt Lake City, UT, United States
Duration: 10 Nov 201216 Nov 2012

Publication series

NameProceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Conference

Conference2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
Country/TerritoryUnited States
CitySalt Lake City, UT
Period10/11/1216/11/12

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Fingerprint

Dive into the research topics of 'Predicting the execution time of workflow activities based on their input features'. Together they form a unique fingerprint.

Cite this