Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines.

GigaScience | 2020

Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella.
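With simulated data the true SNPs are known. One common way to score a pipeline against such a truth set is to count true positives, false positives and false negatives, then derive precision (accuracy of the calls), recall (completeness) and their harmonic mean, the F-score. The sketch below is a minimal illustration, not the exact evaluation procedure used in this study. It assumes each SNP is represented as a hypothetical (contig, position, alternate base) tuple.

```python
from typing import Iterable, Tuple

# Hypothetical SNP representation: (contig, 1-based position, alternate base).
Snp = Tuple[str, int, str]

def evaluate_calls(called: Iterable[Snp], truth: Iterable[Snp]) -> dict:
    """Compare a pipeline's SNP calls with a known truth set."""
    called_set, truth_set = set(called), set(truth)
    tp = len(called_set & truth_set)   # calls that match a true SNP
    fp = len(called_set - truth_set)   # calls absent from the truth set
    fn = len(truth_set - called_set)   # true SNPs the pipeline missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F": f_score}
```

For example, a pipeline might call three SNPs, only one of which matches a three-SNP truth set; the wrong alternate base at a true position counts as both a false positive and a false negative. Precision, recall and F-score are then all 1/3.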

Pubmed ID: 32025702

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


NCBI Genome (tool)

RRID:SCR_002474

Database that organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations in six major organism groups: Archaea, Bacteria, Eukaryotes, Viruses, Viroids, and Plasmids. Genomes of over 1,200 organisms can be found in this database, representing both completely sequenced organisms and those for which sequencing is in progress. Users can browse by organism, and view genome maps and protein clusters. Links to other prokaryotic and archaeal genome projects, as well as BLAST tools and access to the rest of the NCBI online resources are available.

RefSeq (tool)

RRID:SCR_003496

Collection of curated, non-redundant genomic DNA, transcript RNA, and protein sequences produced by NCBI. Provides a reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. Accessed through the Nucleotide and Protein databases.

FigShare (tool)

RRID:SCR_004328

Repository for data, figures, theses, publications, posters, presentations, filesets, videos, datasets and negative data, shared in a citable, shareable and discoverable manner with Digital Object Identifiers. Users can upload files in any format and view them in the browser, so that figures, datasets, media, papers, posters, presentations and filesets can be disseminated in a way that the current scholarly publishing model does not allow. Integrates with ORCID and Symplectic Elements, can import items from GitHub, and is a source tracked by Altmetric.com. Figshare gives users unlimited public space and 1 GB of private storage space for free. Data are digitally preserved by CLOCKSS. Supported by Digital Science, a division of Macmillan Publishers Limited, as a community-based, open science project that retains its autonomy.

European Molecular Biology Laboratory (tool)

RRID:SCR_004473

Intergovernmental organisation funded by public research money from its member states in Europe. Its groups and laboratories perform basic research in molecular biology and molecular medicine and provide training for scientists, students and visitors. It also develops services, new instruments and methods, data and technology in its member states.

NCBI BioProject (tool)

RRID:SCR_004801

Database of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. It is a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms. Submissions are supported by a web-based Submission Portal. The database facilitates organization and classification of project data submitted to the NCBI, EBI and DDBJ databases. It captures descriptive information about research projects that result in high-volume submissions to archival databases, ties together related data across multiple archives, and serves as a central portal that informs users of data availability. BioProject records link to corresponding data stored in archival repositories. The BioProject resource is a redesigned, expanded replacement of the NCBI Genome Project resource. The redesign adds tracking of several data elements, including more precise information about a project's scope, material, and objectives. Genome Project identifiers are retained in BioProject as the ID value for a record, and an Accession number has been added. Database content is exchanged with other members of the International Nucleotide Sequence Database Collaboration (INSDC). BioProject is accessible via FTP.

SAMtools/BCFtools (tool)

RRID:SCR_005227

Provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
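A per-position ("pileup") view turns read-centric alignments into one column per reference position, listing the read bases that cover it. This column view is what a variant caller inspects. The sketch below is a simplified illustration, not SAMtools' implementation. It assumes hypothetical ungapped alignments given as (0-based start, aligned sequence) pairs and ignores the CIGAR strings, base qualities and flags that real SAM/BAM records carry.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def pileup(alignments: List[Tuple[int, str]]) -> Dict[int, Counter]:
    """Collect the read bases covering each reference position.

    Assumes ungapped alignments as (0-based start, aligned sequence);
    insertions, deletions and clipping are not handled in this sketch.
    """
    columns: Dict[int, Counter] = defaultdict(Counter)
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return dict(columns)
```

With the reads `(0, "ACGT")` and `(2, "GTTA")`, positions 2 and 3 are each covered twice (G and T respectively), while positions 0, 1, 4 and 5 have depth one.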

Stampy (tool)

RRID:SCR_005504

A software package for mapping short reads from Illumina sequencing machines onto a reference genome. It's recommended for most workflows, including genomic resequencing, RNA-Seq and ChIP-seq. Stampy excels at mapping reads that contain sequence variation relative to the reference, in particular insertions or deletions; for instance, it can map reads from a highly divergent species to a reference genome. Stampy achieves high sensitivity and speed by using a fast hashing algorithm and a detailed statistical model. Stampy has the following features:
* Maps single, paired-end and mate pair Illumina reads to a reference genome
* Fast: about 20 Gbase per hour in hybrid mode (using BWA)
* Low memory footprint: 2.7 Gb shared memory for a 3 Gbase genome
* High sensitivity for indels and divergent reads, up to 10-15%
* Low mapping bias for reads with SNPs
* Well calibrated mapping quality scores
* Input: FASTQ and FASTA, gzipped or plain
* Output: SAM, Maq's map file
* Optionally calculates per-base alignment posteriors
* Optionally processes part of the input
* Handles reads of up to 4500 bases
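The entry does not describe Stampy's actual index or statistical model. As a generic illustration of hash-based read mapping, the sketch below stores every k-mer of a reference in a Python dict, a hash table. Exact k-mer hits from a read then vote for candidate alignment starts. The function names and the choice of k are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List

def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Hash every k-mer of the reference to the 0-based positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)

def candidate_starts(read: str, index: Dict[str, List[int]], k: int) -> List[int]:
    """Rank candidate alignment starts by the number of exact k-mer seed hits.

    A k-mer at read offset i found at reference position p votes for start p - i.
    A SNP in the read only spoils the seeds that overlap it, so the true start
    usually still collects the most votes.
    """
    votes: Dict[int, int] = defaultdict(int)
    for i in range(len(read) - k + 1):
        for p in index.get(read[i:i + k], []):
            if p - i >= 0:  # ignore starts that would fall before the reference
                votes[p - i] += 1
    return sorted(votes, key=lambda start: (-votes[start], start))
```

For the reference `TTGACCATGCAAGT` with k = 3, the read `ACCTTGCA` (one mismatch against position 3) still ranks start 3 first, because three of its six seeds match there. A real mapper would then score each candidate with a full alignment.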

COMPASS (tool)

RRID:SCR_015874

Algorithm for MATLAB and the EEGLAB toolbox that enables the automatic detection of independent components from an ICA that represent event-related brain potentials. It performs automatic Independent Component (IC) selection with respect to the contributions of the ICs to a certain ERP.

Porechop (tool)

RRID:SCR_016967

Software tool for finding and removing adapters from Oxford Nanopore reads.
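Adapter removal amounts to locating a known adapter sequence near the ends of a read and cutting it off, together with anything beyond it. The sketch below is a deliberately simplified illustration, not Porechop's algorithm. It uses exact string matching, whereas error-prone Nanopore reads would need approximate alignment. The adapter sequence in the example is hypothetical.

```python
def trim_adapters(read: str, adapter: str, end_window: int = 100) -> str:
    """Remove an exact adapter match found near the start and/or end of a read.

    Each end is searched within at most `end_window` bases, and never more
    than half the read, so a single adapter cannot be trimmed as both ends.
    """
    # Start: drop everything up to and including the adapter.
    window = min(end_window, len(read) // 2)
    hit = read[:window].find(adapter)
    if hit != -1:
        read = read[hit + len(adapter):]
    # End: drop the adapter and everything after it.
    window = min(end_window, len(read) // 2)
    hit = read.rfind(adapter, len(read) - window)
    if hit != -1:
        read = read[:hit]
    return read
```

Limiting each search to half the read avoids one failure mode: on a short read, an adapter at the 3' end could otherwise be mistaken for a 5' adapter, and the whole insert would be discarded.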

Unicycler (tool)

RRID:SCR_024380

Software assembly pipeline for bacterial genomes. Used for resolving bacterial genome assemblies from short and long sequencing reads. Can assemble Illumina-only read sets, where it functions as a SPAdes optimiser, and long-read-only sets from PacBio or Nanopore, where it runs a miniasm+Racon pipeline.

FreeBayes (software resource)

RRID:SCR_010761

A Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events smaller than the length of a short-read sequencing alignment.
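Bayesian variant detection weighs how well each candidate genotype explains the observed read bases against a prior probability of variation. FreeBayes's actual model is haplotype-based and considerably richer. The toy sketch below only shows the core Bayesian step for a single haploid site, which is a natural simplification for bacteria. The per-base error rate and prior are hypothetical defaults, and bases other than the reference and alternate allele are ignored.

```python
import math

def posterior_alt(n_ref: int, n_alt: int, error_rate: float = 0.01,
                  prior_alt: float = 0.001) -> float:
    """Posterior probability that a haploid site carries the alternate allele.

    Each read base is assumed correct with probability 1 - error_rate and to
    show the other allele otherwise (a simplified error model).
    """
    # Log prior + log likelihood for each genotype, avoiding underflow at high depth.
    log_ref = (math.log(1 - prior_alt) + n_ref * math.log(1 - error_rate)
               + n_alt * math.log(error_rate))
    log_alt = (math.log(prior_alt) + n_ref * math.log(error_rate)
               + n_alt * math.log(1 - error_rate))
    # Normalise in log space: subtract the maximum before exponentiating.
    top = max(log_ref, log_alt)
    p_ref = math.exp(log_ref - top)
    p_alt = math.exp(log_alt - top)
    return p_alt / (p_ref + p_alt)
```

When reference and alternate read counts are equal, the likelihoods cancel and the posterior falls back to the prior. Thirty reads all showing the alternate base push it close to 1.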

Canu (software resource)

RRID:SCR_015880

Software for scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Canu is a fork of the Celera Assembler and is designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).

Kraken (software resource)

RRID:SCR_005484

A set of software tools (Reaper, Tally and Sequence Imp) designed to streamline the analysis of next-generation sequencing data. Although designed with small RNA sequence analysis in mind, the tools can be used to address issues facing next-generation sequencing in general.

GATK (software resource)

RRID:SCR_001876

A software package for analyzing next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping and a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. First, it is a software library that makes writing efficient analysis tools for next-generation sequencing data very easy; second, it is a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller and a local realigner. (entry from Genetic Analysis Software)

Picard (software toolkit)

RRID:SCR_006525

Java toolset for working with next generation sequencing data in the BAM format.

SAMTOOLS (software resource)

RRID:SCR_002105

The original SAMTOOLS package has been split into three separate repositories: Samtools, BCFtools and HTSlib. Samtools is used for manipulating next-generation sequencing data: reading, writing, editing, indexing and viewing nucleotide alignments in SAM, BAM and CRAM formats. BCFtools is used for reading and writing BCF2, VCF and gVCF files and for calling, filtering and summarising SNP and short indel sequence variants. HTSlib is used for reading and writing high-throughput sequencing data.
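Filtering called variants down to confident SNPs is a typical step after calling. The sketch below illustrates the idea on plain-text VCF. It relies on the standard tab-separated data columns CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. It keeps only single-base, single-allele substitutions at or above a quality threshold; the threshold of 30 is an assumed example value. It is an illustration, not BCFtools' implementation. In practice, BCFtools' own `view` command, with its `-v snps` and `-i` expression options, is used for this kind of filtering.

```python
from typing import Iterator, List

BASES = {"A", "C", "G", "T"}

def filter_snps(vcf_lines: List[str], min_qual: float = 30.0) -> Iterator[str]:
    """Yield VCF data lines that are biallelic SNPs with QUAL >= min_qual.

    Header lines (starting with '#') are skipped; indels, multi-allelic
    sites (ALT such as "A,C") and records with missing QUAL are dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, qual = fields[3], fields[4], fields[5]
        if ref.upper() not in BASES or alt.upper() not in BASES:
            continue  # not a single-base, single-allele substitution
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low quality
        yield line
```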

MEGA Software (software resource)

RRID:SCR_000667

Integrated software tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. Used for comparative analysis of DNA and protein sequences to infer molecular evolutionary patterns of genes, genomes, and species over time. MEGA version 4 expands on existing facilities for editing DNA sequence data from autosequencers, mining Web databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses. MEGA version 6 enables inference of timetrees, as it implements the RelTime method for estimating divergence times for all branching points in a phylogeny.
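Estimating an evolutionary distance from an alignment usually starts from the proportion of differing sites, p. That proportion is then corrected for multiple substitutions at the same site. The sketch below shows one of the simplest corrections, the Jukes-Cantor (JC69) model, d = -(3/4) ln(1 - 4p/3). It illustrates the calculation only and makes no claim about how MEGA computes its distances. Skipping gap sites pairwise is an assumption of the sketch.

```python
import math

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # pairwise deletion of gapped sites
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differences / compared

def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """JC69 distance, in expected substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For two 10-base sequences that differ at one site, p = 0.1 and the corrected distance is about 0.107. The correction grows as p approaches 0.75, the value expected for unrelated random sequences.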
