TY - GEN
T1 - Linking multiple workflow provenance traces for interoperable collaborative science
AU - Missier, Paolo
AU - Ludäscher, Bertram
AU - Bowers, Shawn
AU - Dey, Saumen
AU - Sarkar, Anandarup
AU - Shrestha, Biva
AU - Altintas, Ilkay
AU - Anand, Manish Kumar
AU - Goble, Carole
PY - 2010
Y1 - 2010
N2 - Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata which describes the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At the heart lie (i) an abstract workflow and provenance model in which (ii) data sharing becomes itself part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can "stitch together" traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability not only through often elusive workflow standards but through shared provenance information from public repositories.
UR - http://www.scopus.com/inward/record.url?scp=78751492112&partnerID=8YFLogxK
U2 - 10.1109/WORKS.2010.5671861
DO - 10.1109/WORKS.2010.5671861
M3 - Conference contribution
AN - SCOPUS:78751492112
SN - 9781424489893
T3 - 2010 5th Workshop on Workflows in Support of Large-Scale Science, WORKS 2010
BT - 2010 5th Workshop on Workflows in Support of Large-Scale Science, WORKS 2010
T2 - 2010 5th Workshop on Workflows in Support of Large-Scale Science, WORKS 2010
Y2 - 14 November 2010
ER -