    Experiences with workflows for automating data-intensive bioinformatics
    (BioMed Central, 2015-08-19) Spjuth, Ola; Bongcam-Rudloff, Erik; Carrasco Hernández, Guillermo; Forer, Lucas; Giovacchini, Mario; Valls Guimera, Roman; Kallio, Aleksi; Korpelainen, Eija; Kanduła, Maciej M; Krachunov, Milko; Kreil, David P.; Kulev, Ognyan; Łabaj, Pavel P.; Lampa, Samuel; Pireddu, Luca; Schönherr, Sebastian; Siretskiy, Alexey; Vassilev, Dimitar
    High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on a large scale. Workflow systems can simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present experiences with workflows and workflow systems from the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The participating organizations are working on similar problems but have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.
    Kinase domain-targeted isolation of defense-related receptor-like kinases (RLK/Pelle) in Platanus × acerifolia: phylogenetic and structural analysis
    (BioMed Central, 2014-12-08) Pilotti, Massimo; Brunetti, Angela; Uva, Paolo; Lumia, Valentina; Tizzani, Lorenza; Gervasi, Fabio; Iacono, Michele; Pindo, Massimo
    Background: The plant receptor-like kinase (RLK/Pelle) family regulates growth and developmental processes and interactions with pathogens and symbionts. Platanaceae is one of the earliest branches of Eudicots, temporally located before the split which gave rise to Rosids and Asterids. Thus, investigations into the RLK family in Platanus can provide information on the evolution of this gene family in land plants. Moreover, RLKs are good candidates for finding genes that are able to confer resistance to Platanus pathogens. Results: Degenerate oligonucleotide primers targeting the kinase domain of stress-related RLKs were used to isolate, for the first time, 111 RLK gene fragments in Platanus × acerifolia. Sequences were classified as candidates of the following subfamilies: CrRLK1L, LRR XII, WAK-like, and LRR X-BRI1 group. All the structural features typical of the RLK kinase domain were identified, including the non-RD motif which marks potential pathogen recognition receptors (PRRs). The LRR XII candidates, whose counterparts in Arabidopsis and rice comprise non-RD PRRs, were mostly non-RD kinases, suggesting a group of PRRs. Region-specific signatures of relaxed purifying selection in the LRR XII candidates were also found, which is novel for the plant RLK kinase domain and further supports the role of LRR XII candidates as PRRs. As we obtained CrRLK1L candidates using primers designed on Pto of tomato, we analysed the phylogenetic relationship between CrRLK1L and Pto-like genes of plant species. We thus classified all non-solanaceous Pto-like genes as CrRLK1L and highlighted for the first time the close phylogenetic vicinity between CrRLK1L and the Pto group. The origin of Pto from CrRLK1L is proposed as an evolutionary mechanism. Conclusions: The signatures of relaxed purifying selection highlight that a group of RLKs might have been involved in the expression of phenotypic plasticity and is thus a good candidate for investigations into pathogen resistance. The search for Pto-like genes in Platanus highlighted the close relationship between CrRLK1L and the Pto group. It will be exciting to verify whether sensu stricto Pto genes are present in taxonomic groups other than Solanaceae, in order to further clarify the evolutionary link with CrRLK1L. We obtained a first valuable resource for an in-depth study of stress perception systems.
    A generalizable definition of chemical similarity for read-across
    (BioMed Central, 2014-10-18) Floris, Matteo; Manganaro, Alberto; Nicolotti, Orazio; Medda, Ricardo; Mangiatordi, Giuseppe Felice; Benfenati, Emilio
    Background: Methods that provide a measure of chemical similarity are highly relevant in several fields of chemoinformatics, as they allow the molecular behavior and fate of structurally close compounds to be predicted. One common application of chemical similarity measurements, based on the principle that similar molecules have similar properties, is the read-across approach, in which a specific endpoint for a chemical is estimated using experimental data available for highly similar compounds. Results: This paper reports the comparison of multiple combinations of binary fingerprints and similarity metrics for computing chemical similarity in the context of two different applications of the read-across technique. Conclusions: Our analysis demonstrates that the classical similarity measurements can be improved with a generalizable model of similarity. The proposed approach has already been used to build similarity indices in two open-source software tools (CAESAR and VEGA) that make several QSAR models available. In these tools, the similarity index plays a key role in the assessment of the applicability domain.
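The read-across idea described in this abstract can be illustrated with a minimal sketch. It assumes the classical Tanimoto coefficient on binary fingerprints and a similarity-weighted average over the most similar compounds; it is not the generalizable similarity index proposed in the paper, and the fingerprints, endpoint values and function names below are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits that are set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Classical Tanimoto coefficient between two binary fingerprints."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def read_across(query: Fingerprint,
                reference: List[Tuple[Fingerprint, float]],
                k: int = 2) -> float:
    """Estimate an endpoint as the similarity-weighted mean of the k most similar compounds."""
    ranked = sorted(((tanimoto(query, fp), value) for fp, value in reference),
                    reverse=True)[:k]
    total = sum(sim for sim, _ in ranked)
    if total == 0:
        raise ValueError("no reference compound shares any bit with the query")
    return sum(sim * value for sim, value in ranked) / total


# Toy data (invented): three reference compounds with a measured endpoint each.
reference = [
    (frozenset({1, 2, 3, 4}), 2.0),
    (frozenset({1, 2, 3, 5}), 3.0),
    (frozenset({7, 8, 9}), 10.0),
]
query = frozenset({1, 2, 3})
estimate = read_across(query, reference, k=2)
```

Here the two nearest neighbours both have a Tanimoto similarity of 0.75 to the query, so the estimate is the plain average of their endpoint values; the dissimilar third compound does not contribute.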
    QTREDS: a Ruby on Rails-based platform for omics laboratories
    (BioMed Central, 2014-01-10) Palla, Piergiorgio; Frau, Gianfranco; Vargiu, Laura; Rodriguez-Tomé, Patricia
    Background: In recent years, the experimental aspects of laboratory activities have been growing in complexity in terms of the amount and diversity of data produced, the equipment used, and the computer-based workflows needed to process and analyze the raw data generated. To enhance the level of quality control over laboratory activities and efficiently handle the large amounts of data produced, a Laboratory Information Management System (LIMS) is highly recommended. A LIMS is a complex software platform that helps researchers to have complete knowledge of the laboratory activities at each step, encouraging them to adopt good laboratory practices. Results: We have designed and implemented Quality and TRacEability Data System - QTREDS, a software platform created to address the specific needs of the CRS4 Sequencing and Genotyping Platform (CSGP). The system, written in the Ruby programming language and developed using the Rails framework, is based on four main functional blocks: a sample handler, a workflow generator, an inventory management system and a user management system. The wizard-based sample handler allows users to manage one or multiple samples at a time, tracking the path of each sample and providing a full chain of custody. The workflow generator encapsulates a user-friendly JavaScript-based visual tool that allows users, even those without a technical background, to design customized workflows. With the inventory management system, reagents, laboratory glassware and consumables can be easily added through their barcodes, and minimum stock levels can be controlled to avoid shortages of essential laboratory supplies. QTREDS provides a system for managing privileges and authorizations to create different user roles, each with a well-defined access profile. Conclusions: Tracking and monitoring all the phases of the laboratory activities can help to identify and troubleshoot problems more quickly, reducing the risk of process failures and their related costs. QTREDS was designed to address the specific needs of the CSGP laboratory, where it has been successfully used for over a year, but thanks to its flexibility it can be easily adapted to other "omics" laboratories. The software is freely available for academic users.
    Multifractal analysis and simulation of rainfall fields in space
    (Elsevier, 1999) Deidda, Roberto
    Statistical downscaling of precipitation from the large scales of meteorological models to the characteristic response scales of small catchment basins needs to correctly preserve the anomalous scaling laws observed in real rainfall. Multifractal behaviour of precipitation in space is investigated on a set of rainfall fields obtained by a high resolution simulation with a limited area model for numerical weather prediction and on two sets of radar measurements of rainfall during the GATE campaign. Some sets of synthetic rainfall fields were generated by applying a multifractal model based on a wavelet expansion with coefficients extracted by a log-Poisson random cascade, and results of comparisons with the GATE rainfall fields are presented.
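To give a feel for what a log-Poisson random cascade does, the following is a minimal sketch of a plain discrete multiplicative cascade on a square grid. It is not the wavelet-based model of the paper: the grid construction, the function names and the parameter values c and beta are assumptions made for illustration. Each weight is eta = exp(A) * beta**y with y drawn from a Poisson distribution of mean c, and A = c * (1 - beta) so that the mean of eta is 1. Under that assumption the q-th moment of eta is 2**K(q) with K(q) = c * (q * (1 - beta) - (1 - beta**q)) / ln 2, which is nonlinear in q and is the source of the anomalous scaling.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate with Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def log_poisson_weight(rng: random.Random, c: float, beta: float) -> float:
    """Cascade weight eta = exp(c * (1 - beta)) * beta**y, y ~ Poisson(c), so that E[eta] = 1."""
    return math.exp(c * (1.0 - beta)) * beta ** poisson(rng, c)


def cascade_2d(levels: int, c: float, beta: float, seed: int = 0):
    """Build a (2**levels x 2**levels) field by repeatedly splitting each cell into 2 x 2 children."""
    rng = random.Random(seed)
    field = [[1.0]]
    for _ in range(levels):
        size = 2 * len(field)
        # Every child inherits its parent's value times an independent random weight.
        field = [[field[i // 2][j // 2] * log_poisson_weight(rng, c, beta)
                  for j in range(size)]
                 for i in range(size)]
    return field


def moment_exponent(q: float, c: float, beta: float) -> float:
    """K(q) = log2 E[eta**q] for the weight defined above."""
    return c * (q * (1.0 - beta) - (1.0 - beta ** q)) / math.log(2.0)


# Hypothetical parameter values, chosen only to make the example run.
field = cascade_2d(levels=4, c=0.9, beta=0.5)
```

A cascade built this way conserves the mean only on average over many realizations, so any single field will deviate from a mean of 1.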