Commonly used tools in bioinformatics analyses of NGS datasets. Many of these tools can also be used for other purposes (such as microarray data, Sanger sequencing, proteomics).
This list is not exhaustive, but it represents some of the most up-to-date and most commonly used tools.
Most of these tools are open source and free.
Most of these tools are run from the command line, although some of them also provide a GUI.
Keeping up to date with sequencing platforms and costs
A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
A quality control tool for high throughput sequence data.
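As a rough illustration of what quality control on sequence data involves, the sketch below computes the mean Phred quality of each read in a FASTQ file. It is not how any particular QC tool is implemented. It assumes four-line FASTQ records and Phred+33 quality encoding; the function names `read_fastq` and `mean_phred` are hypothetical and chosen for this example only.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ.

    Assumption: records are not wrapped over multiple lines.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


# Two made-up reads, used only to exercise the functions.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
scores = {read_id: mean_phred(qual) for read_id, _, qual in read_fastq(example)}
print(scores)
```

Reads with a low mean score are typical candidates for trimming or filtering before mapping or assembly.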
-Mapping sequences (e.g. 454, Illumina) against a large reference genome, such as the human genome.
-Ultrafast, memory-efficient short read aligner.
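Short read aligners commonly write their results in SAM format. As a minimal sketch, assuming the aligner was run so that it produces plain-text SAM, the mapping rate can be checked by reading the FLAG field (column 2) of each alignment record, where bit 0x4 marks an unmapped read. The function name `mapping_summary` and the example records are hypothetical.

```python
from io import StringIO
from typing import TextIO, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def mapping_summary(handle: TextIO) -> Tuple[int, int]:
    """Count (mapped, unmapped) alignment records in SAM text.

    Counts records, not unique reads: secondary and supplementary
    alignments, if present, are each counted as mapped records.
    """
    mapped = unmapped = 0
    for line in handle:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 is the bitwise FLAG
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Made-up SAM content: one header line and three alignment records.
example_sam = StringIO(
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
    "r3\t16\tchr1\t200\t37\t4M\t*\t0\t0\tACGT\tIIII\n"
)
mapped, unmapped = mapping_summary(example_sam)
print(f"mapped={mapped} unmapped={unmapped}")
```

For real data sets, which are usually stored as compressed BAM, a dedicated tool is a better choice than parsing text by hand.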
de novo transcriptome assembly
-Efficient and robust de novo reconstruction of transcriptomes from RNA-seq data
-de novo assembly of RNA-Seq data using ABySS
-de novo transcriptome assembler based on the SOAPdenovo framework, adapted to alternative splicing and different expression levels among transcripts.
-Sequence assembler and sequence mapping for whole genome shotgun and EST / RNA-Seq sequencing data.
de novo genome assembly
Variant calling (SNPs / short indels)
-Samtools is a suite of programs for interacting with high-throughput sequencing data.
The Genome Analysis Toolkit (GATK)
A variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance.
-TASSEL is a bioinformatics software package that can analyze diversity for sequences, SNPs, or SSRs.
-Software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
-pyRAD can analyze RAD, ddRAD, GBS, paired-end ddRAD and paired-end GBS data sets.
-Aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. Useful for analysing splice variants and their expression from NGS datasets.
There are also several R packages listed here that are specifically geared towards gene expression analyses.
All-in-one proprietary software
-Comprehensive bioinformatics software platform.
CLC Genomics Workbench
-CLC Genomics Workbench, for analyzing and visualizing next generation sequencing data.
Gene Ontology (GO) analyses
-Functional annotation of (novel) sequences and the analysis of annotation data. Also has a GUI.
-Analyses of gene sets in high-throughput genomics data such as gene expression profiling studies. Also has a GUI.
-Comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.
Microbial diversity / ecology
See also this page on the wiki.
-A comprehensive bioinformatics software platform for microbial ecology (e.g. 16S rRNA gene sequence diversity).
Clustering and comparing protein or nucleotide sequences
ngs_sequence_alignment_and_variant_calling.txt · Last modified: 2014/11/26 09:58 by sebastien.renaut