Scripting for large-scale sequencing based on Hadoop
dc.contributor.author | Schumacher, André | |
dc.contributor.author | Pireddu, Luca | |
dc.contributor.author | Kallio, Aleksi | |
dc.contributor.author | Niemenmaa, Matti | |
dc.contributor.author | Korpelainen, Eija | |
dc.contributor.author | Zanetti, Gianluigi | |
dc.contributor.author | Heljanko, Keijo | |
dc.date.accessioned | 2014-05-16T08:03:04Z | |
dc.date.available | 2014-05-16T08:03:04Z | |
dc.date.issued | 2013 | |
dc.description.abstract | The large volumes of data generated by modern sequencing experiments present significant challenges for their manipulation and analysis. Traditional approaches are often difficult to scale. We describe our ongoing work on SeqPig, a tool that facilitates the use of the Pig Latin distributed scripting language to manipulate, analyze and query sequencing data, applying the advances motivated by the “big data revolution” in data-intensive activities. SeqPig provides access to popular data formats and implements a number of custom sequencing-specific functions. Most importantly, it grants users access to the scalable Hadoop platform from a high-level scripting language. | IT |
dc.description.pagenumber | 84-85 | IT |
dc.description.status | Published | IT |
dc.identifier.doi | 10.14806/ej.19.A.628 | IT |
dc.identifier.issn | 2226-6089 | |
dc.identifier.uri | http://hdl.handle.net/11050/909 | |
dc.language.iso | en | IT |
dc.relation.ispartof | EMBnet.journal. The Next NGS Challenge Conference: Data Processing and Integration 14-16 May 2013, Valencia, Spain | IT |
dc.relation.ispartofseries | 19;Suppl. A | |
dc.rights | Attribution - NonCommercial - ShareAlike 3.0 Italy | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/it/ | * |
dc.subject | bioinformatics | IT |
dc.subject | ngs | IT |
dc.subject | data analysis | IT |
dc.subject | cloud computing | IT |
dc.subject | high-performance computing | IT |
dc.subject.een-cordis | EEN CORDIS::BIOLOGICAL SCIENCES::Genome research::Bioinformatics | IT |
dc.subject.program | Program::Biomedicine::Bioinformatics (BI) | IT |
dc.title | Scripting for large-scale sequencing based on Hadoop | IT |
dc.type | Article | IT |
File
Original bundle
1 - 1 of 1
- Name:
- 628-3761-2-PB.pdf
- Size:
- 327.01 KB
- Format:
- Adobe Portable Document Format
- Description:
- Open Access article
License bundle
1 - 1 of 1
- Name:
- license.txt
- Size:
- 2.06 KB
- Format:
- Item-specific license agreed upon to submission
- Description: