Shotgun metagenomic sequencing comprehensively samples the DNA of a microbial sample. Choosing the best bioinformatics processing package can be daunting due to the wide variety of tools available. Here, we assessed publicly available shotgun metagenomics processing packages/pipelines including bioBakery, Just a Microbiology System (JAMS), Whole metaGenome Sequence Assembly V2 (WGSA2), and Woltka using 19 publicly available mock community samples and a set of five constructed pathogenic gut microbiome samples. Also included is a workflow for labelling bacterial scientific names with NCBI taxonomy identifiers for better resolution in assessing results. The Aitchison distance, a sensitivity metric, and total False Positive Relative Abundance were used for accuracy assessments for all pipelines and mock samples. Overall, bioBakery4 performed the best on most of the accuracy metrics, while JAMS and WGSA2 had the highest sensitivities. Furthermore, bioBakery is commonly used and only requires a basic knowledge of command line usage. This work provides an unbiased assessment of shotgun metagenomics packages and presents results assessing the performance of the packages using mock community sequence data.
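The three accuracy metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so the following are assumptions: the Aitchison distance is taken as the Euclidean distance between centered log-ratio (CLR) transformed profiles, with a small pseudocount replacing zeros; sensitivity is the fraction of expected taxa that were detected; and the total False Positive Relative Abundance is the share of observed abundance assigned to taxa absent from the expected community. The function names, the pseudocount value, and the example profiles (keyed by NCBI taxonomy identifier) are hypothetical and are not taken from the paper.

```python
import math


def clr(values):
    """Centered log-ratio transform of strictly positive values."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(expected, observed, pseudocount=1e-6):
    """Euclidean distance between CLR-transformed profiles.

    Profiles are dicts mapping a taxon key to its relative abundance.
    The pseudocount (an assumed value) stands in for taxa missing
    from one of the two profiles, since log(0) is undefined.
    """
    taxa = sorted(set(expected) | set(observed))
    exp_vals = [expected.get(t, 0.0) + pseudocount for t in taxa]
    obs_vals = [observed.get(t, 0.0) + pseudocount for t in taxa]
    exp_clr, obs_clr = clr(exp_vals), clr(obs_vals)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(exp_clr, obs_clr)))


def sensitivity(expected, observed):
    """Fraction of expected taxa reported with non-zero abundance."""
    expected_taxa = {t for t, a in expected.items() if a > 0}
    if not expected_taxa:
        raise ValueError("expected profile contains no taxa")
    detected = {t for t in expected_taxa if observed.get(t, 0.0) > 0}
    return len(detected) / len(expected_taxa)


def false_positive_relative_abundance(expected, observed):
    """Share of observed abundance assigned to unexpected taxa."""
    total = sum(observed.values())
    if total <= 0:
        raise ValueError("observed profile has no abundance")
    false_pos = sum(a for t, a in observed.items()
                    if expected.get(t, 0.0) == 0)
    return false_pos / total


# Illustrative profiles keyed by NCBI taxonomy ID (made-up abundances).
expected = {562: 0.50, 1280: 0.30, 1423: 0.20}
observed = {562: 0.45, 1280: 0.35, 287: 0.20}

print(aitchison_distance(expected, observed))
print(sensitivity(expected, observed))
print(false_positive_relative_abundance(expected, observed))
```

Keying profiles by taxonomy identifier rather than by scientific name mirrors the labelling workflow mentioned in the abstract: pipelines that report the same organism under different names still match on the same key.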
PubMed ID: 38233447
Database for a curated classification and nomenclature that contains the names of all organisms that are represented in the public sequence databases with at least one nucleotide or protein sequence. Data provided encompasses archaea, bacteria, eukaryota, viroids and viruses. The NCBI taxonomy database is not a primary source for taxonomic or phylogenetic information. Furthermore, the database does not follow a single taxonomic treatise but rather attempts to incorporate phylogenetic and taxonomic knowledge from a variety of sources, including the published literature, web-based databases, and the advice of sequence submitters and outside taxonomy experts. Consequently, the NCBI taxonomy database is not a phylogenetic or taxonomic authority and should not be cited as such.
A database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Users can analyze protein sequences for Pfam matches, view Pfam family annotation and alignments, see groups of related families, look at the domain organization of a protein sequence, find the domains on a PDB structure, and query Pfam by keywords. There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high quality, manually curated families that may automatically generate a supplement using the ADDA database. These automatically generated entries are called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans (collections of Pfam-A entries which are related by similarity of sequence, structure or profile-HMM).
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on February 28, 2023. Computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. It relies on unique clade-specific marker genes identified from reference genomes.
The NIH Biowulf cluster is a GNU/Linux parallel processing system designed and built at the National Institutes of Health and managed by the Helix Systems Staff. The system is designed for large numbers of simultaneous jobs common in bioinformatics as well as large-scale distributed memory tasks such as molecular dynamics. Sponsor: This work was supported by the National Institutes of Health Intramural Research Program through the Center for Information Technology and the National Institute of Neurological Disorders and Stroke, and by the Internal National Institute of Standards and Technology Research Fund. Keywords: Software, Program, Processing, System, Simultaneous, Bioinformatics, Memory, Molecular, Dynamics
Software-as-a-service for big data management offering fast, reliable, secure file transfer and sharing services to non-profit researchers. It combines state-of-the-art algorithms, data management tools, a graphical workflow environment, and an elastic computing infrastructure, making it easy to manipulate, store, and share your data, no matter how big it gets.
A collection of tools and class interfaces for the assembly of DNA reads.
Data aggregation tool that compiles results from bioinformatics analyses across multiple samples into a single report. It is written in Python.
Cloud-based platform for simplified, standardized and reproducible microbiome data analysis. Allows users to process microbiome datasets through pipelines of existing software tools.
Fast and lightweight software tool for processing sequences in FASTA or FASTQ format.