Findable and reusable workflow data products: A genomic workflow case study

Gaignard, Alban; Skaf-Molli, Hala; Belhajjame, Khalid

doi:10.3233/SW-200374

Issue title: Semantic eScience: Methods, Tools and Applications

Guest editors: Daniel Garijo, Natalia Villanueva-Rosales and Tomi Kauppinen

Article type: Research Article

Authors: Gaignard, Alban^{a; *} | Skaf-Molli, Hala^b | Belhajjame, Khalid^c

Affiliations: [a] l’institut du thorax, INSERM, CNRS, University of Nantes, Nantes, France | [b] LS2N, University of Nantes, Nantes, France | [c] PSL, Université Paris-Dauphine, LAMSADE, Paris, France

Correspondence: [*] Corresponding author.

Abstract: While workflow systems have improved the repeatability of scientific experiments, the value of the processed (intermediate) data has so far been overlooked. In this paper, we argue that the intermediate data products of workflow executions should be seen as first-class objects that need to be curated and published. Not only does this save the time and resources needed to re-execute workflows but, more importantly, it improves the reuse of data products by the same or peer scientists in the context of new hypotheses and experiments. To assist curators in annotating (intermediate) workflow data, we exploit multiple sources of information, namely: (i) the provenance information captured by the workflow system, and (ii) domain annotations provided by tool registries, such as Bio.Tools. Furthermore, we show, on a concrete bioinformatics scenario, how summarisation techniques can be used to reduce the machine-generated provenance information of such data products to concise human- and machine-readable annotations.

Keywords: FAIR, Linked Data, scientific workflows, provenance, bioinformatics, data summaries

Journal: Semantic Web, vol. 11, no. 5, pp. 751-763, 2020

Published: 25 August 2020
