Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology.

PLoS ONE | 2012

Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to "phase 3 finished" status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long-reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly, (publicly available at https://sourceforge.net/projects/pb-jelly/) automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides "lift-over" co-ordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long-reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long-reads we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long-reads we addressed 97% of gaps and closed 66% of addressed gaps and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.
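
The idea behind the "lift-over" co-ordinate tables mentioned in the abstract can be illustrated with a minimal sketch. It is not PBJelly's implementation, and the record does not describe PBJelly's table format. The tuple layout `(gap_start, old_gap_length, new_filled_length)` and the function names `build_offsets` and `lift_over` are assumptions made for illustration. The sketch assumes 0-based positions on a single scaffold and only handles positions outside the original gaps.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical record: (gap_start_in_draft, old_gap_length, new_filled_length).
Gap = Tuple[int, int, int]


def build_offsets(gaps: List[Gap]) -> Tuple[List[int], List[int]]:
    """Return draft gap end positions and the cumulative shift after each gap."""
    ends: List[int] = []
    shifts: List[int] = []
    total = 0
    for start, old_len, new_len in sorted(gaps):
        total += new_len - old_len      # net growth or shrinkage of this gap
        ends.append(start + old_len)    # first draft position after the gap
        shifts.append(total)
    return ends, shifts


def lift_over(pos: int, ends: List[int], shifts: List[int]) -> int:
    """Map a draft position (outside any gap) to the upgraded assembly."""
    i = bisect_right(ends, pos)         # number of gaps ending at or before pos
    return pos + (shifts[i - 1] if i else 0)


# Illustrative values only, not data from the publication.
example_gaps = [(100, 50, 80), (300, 20, 5)]
example_ends, example_shifts = build_offsets(example_gaps)
```

An annotation spanning draft positions 200 to 250 would be ported by calling `lift_over` on each endpoint with `example_ends` and `example_shifts`. Positions upstream of the first gap are returned unchanged.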

PubMed ID: 23185243

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: NCRR NIH HHS, United States
    Id: S10 RR026605
  • Agency: NHGRI NIH HHS, United States
    Id: U54 HG003273
  • Agency: NCRR NIH HHS, United States
    Id: 1S10RR026605-01

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


GWAS: Catalog of Published Genome-Wide Association Studies (tool)

RRID:SCR_012745

Catalog of published genome-wide association studies (GWAS). A GWAS examines a genome-wide set of genetic variants in different individuals to see whether any variant is associated with a trait or disease. The database includes only GWAS publications that attempt to assay single nucleotide polymorphisms (SNPs). Publications are ordered from most to least recent date of publication. Studies are identified through weekly PubMed literature searches, daily NIH-distributed compilations of news and media reports, and occasional comparisons with an existing database of GWAS literature (HuGE Navigator). Works with HANCESTRO ancestry representation.


National Human Genome Research Institute (tool)

RRID:SCR_011416

One of 27 institutes and centers that make up the NIH, National Human Genome Research Institute (NHGRI) is devoted to advancing health through genome research. The Institute led NIH's contribution to the Human Genome Project, which was successfully completed in 2003 ahead of schedule and under budget. Building on the foundation laid by the sequencing of the human genome, NHGRI's work now encompasses a broad range of research aimed at expanding understanding of human biology and improving human health. The NHGRI's mission has expanded to encompass a broad range of studies aimed at understanding the structure and function of the human genome and its role in health and disease. To that end, NHGRI supports the development of resources and technology that will accelerate genome research and its application to human health. A critical part of the NHGRI mission continues to be the study of the ethical, legal and social implications (ELSI) of genome research. NHGRI also supports the training of investigators and the dissemination of genome information to the public and to health professionals.
