Searching across hundreds of databases

Our searching services are busy right now. Your search will reload in five seconds.

X
Forgot Password

If you have forgotten your password you can enter your email here and get a temporary password sent to your email.

X
Forgot Password

If you have forgotten your password you can enter your email here and get a temporary password sent to your email.

NanoVar: accurate characterization of patients' genomic structural variants using low-depth nanopore sequencing.

Genome biology | 2020

The recent advent of third-generation sequencing technologies brings promise for better characterization of genomic structural variants by virtue of having longer reads. However, long-read applications are still constrained by their high sequencing error rates and low sequencing throughput. Here, we present NanoVar, an optimized structural variant caller utilizing low-depth (8X) whole-genome sequencing data generated by Oxford Nanopore Technologies. NanoVar exhibits higher structural variant calling accuracy when benchmarked against current tools using low-depth simulated datasets. In patient samples, we successfully validate structural variants characterized by NanoVar and uncover normal alternative sequences or alleles which are present in healthy individuals.

Pubmed ID: 32127024 RIS Download

Associated grants

  • Agency: NHLBI NIH HHS, United States
    Id: P01 HL131477
  • Agency: NCI NIH HHS, United States
    Id: R35 CA197697

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


ATCC (tool)

RRID:SCR_001672

Global nonprofit biological resource center (BRC) and research organization that provides biological products, technical services and educational programs to private industry, government and academic organizations. Its mission is to acquire, authenticate, preserve, develop and distribute biological materials, information, technology, intellectual property and standards for the advancement and application of scientific knowledge. The primary purpose of ATCC is to use its resources and experience as a BRC to become the world leader in standard biological reference materials management, intellectual property resource management and translational research as applied to biomaterial development, standardization and certification. ATCC characterizes cell lines, bacteria, viruses, fungi and protozoa, as well as develops and evaluates assays and techniques for validating research resources and preserving and distributing biological materials to the public and private sector research communities.

View all literature mentions

RSVSim (tool)

RRID:SCR_001777

A software package for the simulation of deletions, insertions, inversions, tandem duplications and translocations of various sizes in any genome available as FASTA-file or data package in R. SV breakpoints can be placed uniformly accross the whole genome, with a bias towards repeat regions and regions of high homology (for hg19) or at user-supplied coordinates.

View all literature mentions

SAMTOOLS (tool)

RRID:SCR_002105

Original SAMTOOLS package has been split into three separate repositories including Samtools, BCFtools and HTSlib. Samtools for manipulating next generation sequencing data used for reading, writing, editing, indexing,viewing nucleotide alignments in SAM,BAM,CRAM format. BCFtools used for reading, writing BCF2,VCF, gVCF files and calling, filtering, summarising SNP and short indel sequence variants. HTSlib used for reading, writing high throughput sequencing data.

View all literature mentions

scikit-learn (tool)

RRID:SCR_002577

scikit-learn: machine learning in Python

View all literature mentions

GenBank (tool)

RRID:SCR_002760

NIH genetic sequence database that provides annotated collection of all publicly available DNA sequences for almost 280 000 formally described species (Jan 2014) .These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using web-based BankIt or standalone Sequin programs, and GenBank staff assigns accession numbers upon data receipt. It is part of International Nucleotide Sequence Database Collaboration and daily data exchange with European Nucleotide Archive (ENA) and DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through NCBI Entrez retrieval system, which integrates data from major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of GenBank database are available by FTP.

View all literature mentions

DELLY (tool)

RRID:SCR_004603

Integrated structural variant prediction software that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout genome.

View all literature mentions

LAST (tool)

RRID:SCR_006119

THIS RESOURCE IS NO LONGER IN SERVICE. Documented on February 28,2023. Software tool for aligning sequences, similar to BLAST 2 sequences that colour-codes the alignments by reliability. Another useful feature of LAST is that it can compare huge (vertebrate-genome-sized) datasets. Unfortunately, this only applies to the downloadable version of LAST, not the web service. The web service can just about handle bacterial genomes, but it will take a few minutes and the output will be large. LAST can: * Handle big sequence data, e.g: ** Compare two vertebrate genomes ** Align billions of DNA reads to a genome * Indicate the reliability of each aligned column. * Use sequence quality data properly. * Compare DNA to proteins, with frameshifts. * Compare PSSMs to sequences * Calculate the likelihood of chance similarities between random sequences. LAST cannot (yet): * Do spliced alignment.

View all literature mentions

Bioconductor (tool)

RRID:SCR_006442

Software repository for R packages related to analysis and comprehension of high throughput genomic data. Uses separate set of commands for installation of packages. Software project based on R programming language that provides tools for analysis and comprehension of high throughput genomic data.

View all literature mentions

Picard (tool)

RRID:SCR_006525

Java toolset for working with next generation sequencing data in the BAM format.

View all literature mentions

BEDTools (tool)

RRID:SCR_006646

A powerful toolset for genome arithmetic allowing one to address common genomics tasks such as finding feature overlaps and computing coverage. Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.

View all literature mentions

1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data including DNA sequenced from more than 1,700 individuals that researchers can now access on AWS for use in disease research free of charge. The dataset containing the full genomic sequence of 1,700 individuals is now available to all via Amazon S3. The data can be found at: http://s3.amazonaws.com/1000genomes The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year. Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be seamlessly accessed from AWS services such Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS is storing the public data sets at no charge to the community. Researchers pay only for the additional AWS resources they need for further processing or analysis of the data. All 200 TB of the latest 1000 Genomes Project data is available in a publicly available Amazon S3 bucket. You can access the data via simple HTTP requests, or take advantage of the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to dive into this data without the usual capital investment required to work with data at this scale. AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via a bucket in Amazon S3 also means that customers can crunch the information using Hadoop via Amazon Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.

View all literature mentions

Picky (tool)

RRID:SCR_010963

A software tool for selecting optimal oligonucleotides (oligos) that allows the rapid and efficient determination of gene-specific oligos based on given gene sets, and can be used for large, complex genomes such as human, mouse, or maize.

View all literature mentions

New England Biolabs (tool)

RRID:SCR_013517

An Antibody supplier

View all literature mentions

Albacore (tool)

RRID:SCR_015897

Data processing basecaller for the Oxford Nanopore sequencer that identifies DNA sequences directly from raw data. It enhances accuracy of the single-read sequence data, contributing to high consensus accuracy for nanopore sequence data.

View all literature mentions

MCF-10A (tool)

RRID:CVCL_0598

Cell line MCF-10A is a Spontaneously immortalized cell line with a species of origin Homo sapiens

View all literature mentions

HCT 116 (tool)

RRID:CVCL_0291

Cell line HCT 116 is a Cancer cell line with a species of origin Homo sapiens (Human)

View all literature mentions