Experiences with workflows for automating data-intensive bioinformatics
Date
2015-08-19
Authors
Spjuth, Ola
Bongcam-Rudloff, Erik
Carrasco Hernández, Guillermo
Forer, Lucas
Giovacchini, Mario
Valls Guimera, Roman
Kallio, Aleksi
Korpelainen, Eija
Kanduła, Maciej M
Krachunov, Milko
Publisher
BioMed Central
Abstract
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a
data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out
data management and analysis tasks on a large scale. Workflow systems can be useful to simplify construction of
analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However,
workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are
often still built without them. We present the experiences with workflows and workflow systems within the
bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead.
The organizations are working on similar problems, but we have addressed them with different strategies and
solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our
experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics
workflow construction and execution.
Keywords
workflow, automation, big data, reproducibility, high-performance computing, data-intensive