BIOINFORMATICS ANALYSIS OF NEXT-GENERATION SEQUENCING DATA IN MICROBIOLOGY
D. Giosa;L. Giuffrè;M. R. Felice;G. Criseo;E. D’Alessandro;O. Romeo
2017-01-01
Abstract
Next-generation sequencing (NGS) technologies are high-throughput, automated, fast and economical tools for the in-depth investigation of whole genomes and transcriptomes. Here we present an overview of bioinformatics approaches that can be applied to NGS data analysis in microbiology, ranging from de novo and reference-guided genome assembly and the related annotation, through transcriptome assembly, differential gene expression analysis and SNP discovery, to phylogenomics. After sequencing, the short sequenced fragments (reads) are filtered to remove adapters and low-quality sequences; the remaining high-quality reads are then assembled de novo (e.g. with SOAPdenovo2 or SPAdes), mapped against a reference genome (BWA, Bowtie), or used to reconstruct a genome based on that reference (IMR-DENOM, Reconstructor). BUSCO and QUAST can be used to evaluate the assembly results. Annotation can be performed using ab initio gene predictors and tools that search for tRNAs, rRNAs and repeats, followed by a homology search (BLAST, InterProScan) to determine the function of the predicted features. For transcriptomic data, if a reference genome is not available, high-quality reads can be assembled de novo (Trinity, Trans-ABySS); otherwise, reads can be mapped to the reference genome (STAR). Transcripts can then be identified and counted for differential gene expression analysis (Cufflinks, edgeR). A homology-based search (e.g. Blast2GO) can be applied for the functional annotation of the detected features. In both genomic and transcriptomic projects, SNP discovery between samples and reference sequences can be performed using Samtools, GATK or SUPER-CAP. Molecular phylogeny based on a gene-by-gene comparison with a multi-locus sequence typing (MLST) strategy can be performed on specific genes obtained by targeted sequencing or extracted from assembled genomes.
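As an illustration of the first step described in the abstract (discarding low-quality reads before assembly or mapping), the following is a minimal Python sketch, not the procedure used by the authors. It assumes four-line FASTQ records with Phred+33 quality encoding; the function names, the thresholds and the toy reads are hypothetical, and adapter removal is deliberately left out. In practice this step would normally be delegated to a dedicated read-trimming tool.

```python
from io import StringIO
from typing import Iterable, Iterator, TextIO, Tuple

# A read is represented as (header, sequence, quality string).
Read = Tuple[str, str, str]


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream that uses exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            # End of file (or a blank line, which this sketch treats as the end).
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(
    reads: Iterable[Read],
    min_mean_quality: float = 20.0,  # hypothetical threshold
    min_length: int = 30,            # hypothetical threshold
) -> Iterator[Read]:
    """Keep reads that are long enough and whose mean quality is high enough."""
    for header, seq, qual in reads:
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_quality:
            yield header, seq, qual


# Toy input: read1 is good, read2 has low quality, read3 is too short.
TOY_FASTQ = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGTACGTAC\n+\n##########\n"
    "@read3\nACG\n+\nIII\n"
)

kept = list(
    filter_reads(parse_fastq(StringIO(TOY_FASTQ)), min_mean_quality=20.0, min_length=5)
)
kept_names = [header for header, _, _ in kept]
```

With the toy input above, only the first read passes both filters. The mean-quality criterion is one simple choice among several; sliding-window or per-base trimming are common alternatives.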