Scripting for large-scale sequencing based on Hadoop

Schumacher, André; Pireddu, Luca; Kallio, Aleksi; Niemenmaa, Matti; Korpelainen, Eija; Zanetti, Gianluigi; Heljanko, Keijo

dc.contributor.author	Schumacher, André
dc.contributor.author	Pireddu, Luca
dc.contributor.author	Kallio, Aleksi
dc.contributor.author	Niemenmaa, Matti
dc.contributor.author	Korpelainen, Eija
dc.contributor.author	Zanetti, Gianluigi
dc.contributor.author	Heljanko, Keijo
dc.date.accessioned	2014-05-16T08:03:04Z
dc.date.available	2014-05-16T08:03:04Z
dc.date.issued	2013
dc.description.abstract	The large volumes of data generated by modern sequencing experiments present significant challenges for their manipulation and analysis. Traditional approaches are often difficult to scale. We describe our ongoing work on SeqPig, a tool that facilitates the use of the Pig Latin distributed scripting language to manipulate, analyze and query sequencing data, applying the advances in data-intensive activities motivated by the “big data revolution”. SeqPig provides access to popular data formats and implements a number of custom sequencing-specific functions. Most importantly, it grants users access to the scalable Hadoop platform from a high-level scripting language.	IT
dc.description.pagenumber	84-85	IT
dc.description.status	Published	IT
dc.identifier.doi	10.14806/ej.19.A.628	IT
dc.identifier.issn	2226-6089
dc.identifier.uri	http://hdl.handle.net/11050/909
dc.language.iso	en	IT
dc.relation.ispartof	EMBnet.journal. The Next NGS Challenge Conference: Data Processing and Integration 14-16 May 2013, Valencia, Spain	IT
dc.relation.ispartofseries	19;Suppl. A
dc.rights	Attribution - NonCommercial - ShareAlike 3.0 Italy	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/it/	*
dc.subject	bioinformatics	IT
dc.subject	ngs	IT
dc.subject	data analysis	IT
dc.subject	cloud computing	IT
dc.subject	high-performance computing	IT
dc.subject.een-cordis	EEN CORDIS::BIOLOGICAL SCIENCES::Genome Research::Bioinformatics	IT
dc.subject.program	Program::Biomedicine::Bioinformatics (BI)	IT
dc.title	Scripting for large-scale sequencing based on Hadoop	IT
dc.type	Article	IT

File

Original bundle

Name: 628-3761-2-PB.pdf
Size: 327.01 KB
Format: Adobe Portable Document Format
Description: Open Access article

License bundle

Name: license.txt
Size: 2.06 KB
Format: Item-specific license agreed to upon submission
Description:

Collections

CRS4 Article