Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and found that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved its structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher-quality finished state.
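Percent identity can be defined in several ways, and the abstract does not state the convention behind the 99.994% figure. The following minimal Python sketch assumes one common convention (matching alignment columns divided by all alignment columns, with a gap in either row counted as a difference) and also shows the back-of-the-envelope arithmetic: at 99.994% identity, about 6 bases in every 100 kbp differ, which over 1.3 Mbp is on the order of 78 bases. That number is simple arithmetic, not a count reported by the study, and the function names are illustrative.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two equal-length gapped alignment rows.

    Assumed convention: every alignment column is in the denominator,
    and a gap ('-') in either row counts as a difference.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aln_a.upper(), aln_b.upper())
        if x == y and x != "-"  # a shared gap column is not a match
    )
    return 100.0 * matches / len(aln_a)


def expected_differences(length_bp: int, identity_pct: float) -> float:
    """Rough number of differing bases implied by an identity value."""
    return length_bp * (100.0 - identity_pct) / 100.0


if __name__ == "__main__":
    print(percent_identity("ACGT-AC", "ACGTTAC"))  # 6 of 7 columns match
    print(round(expected_differences(1_300_000, 99.994)))
```

Because the denominator convention changes the result, identity values from different tools or papers are only comparable when they use the same definition.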
PubMed ID: 24418700
NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280,000 formally described species (Jan 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon receipt of the data. GenBank is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
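GenBank records can also be retrieved programmatically through NCBI's E-utilities interface to Entrez. The following Python sketch uses only the standard library to build an `efetch` request for a nucleotide (`nuccore`) record in FASTA format. The helper names are illustrative, `EXAMPLE_ACCESSION` is a placeholder rather than a real accession, and NCBI's usage guidelines (such as request rate limits) still apply.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL for a single record."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def fetch_record(accession: str) -> str:
    """Download one record as text (performs a network request)."""
    with urlopen(efetch_url(accession), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder accession: substitute a real GenBank accession before fetching.
    print(efetch_url("EXAMPLE_ACCESSION"))
```

Building the URL separately from the download keeps the request easy to inspect and test without network access.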
A cache-oblivious algorithm designed to map short reads to reference genome assemblies quickly and with low memory use. It optimizes cache usage to achieve higher performance. Currently supported features:
* Mismatches only (no indels)
* Paired-end mapping mode
* Discordant paired-end mapping mode (for use in conjunction with Variation Hunter)
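The mapper's own cache-oblivious algorithm is not described here. As a point of reference only, the following naive Python sketch shows what "mismatches, no indels" mapping means: every ungapped placement of a read on the forward strand of the reference is scored by Hamming distance and kept if it has at most k substitutions. A second helper labels a read pair concordant or discordant by comparing its implied fragment span with an insert-size window. The function names and window bounds are hypothetical, and real paired-end mappers also check mate orientation and strand.

```python
from typing import List, Tuple


def map_read(read: str, reference: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Return (offset, mismatches) for each ungapped forward-strand placement
    of `read` on `reference` with at most `max_mismatches` substitutions.
    Indels are never considered."""
    hits: List[Tuple[int, int]] = []
    m = len(read)
    for pos in range(len(reference) - m + 1):
        mismatches = 0
        for i in range(m):
            if reference[pos + i] != read[i]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # stop early: this placement is already rejected
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


def classify_pair(start1: int, start2: int, read_len: int,
                  min_insert: int, max_insert: int) -> str:
    """Label a mapped pair by the fragment span it implies.

    Simplification (assumption): mate orientation and strand are ignored;
    only the span from the leftmost start to the end of the rightmost mate
    is compared with the hypothetical [min_insert, max_insert] window.
    """
    span = max(start1, start2) + read_len - min(start1, start2)
    return "concordant" if min_insert <= span <= max_insert else "discordant"


if __name__ == "__main__":
    print(map_read("ACGA", "ACGTACGTTACG", max_mismatches=1))
    print(classify_pair(0, 300, 100, 200, 500))
    print(classify_pair(0, 5000, 100, 200, 500))
```

For example, `map_read("ACGA", "ACGTACGTTACG", 1)` reports placements at offsets 0 and 4, each with one mismatch. This quadratic scan is deliberately simple; practical mappers avoid it with indexing and, in this tool's case, a cache-oblivious design.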
Database of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. It is a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms. Submissions are supported by a web-based Submission Portal. The database organizes and classifies project data submitted to the NCBI, EBI, and DDBJ databases: it captures descriptive information about research projects that result in high-volume submissions to archival databases, ties together related data across multiple archives, and serves as a central portal for informing users of data availability. BioProject records link to the corresponding data stored in archival repositories. The BioProject resource is a redesigned and expanded replacement for the NCBI Genome Project resource. The redesign adds tracking of several data elements, including more precise information about a project's scope, material, and objectives. Genome Project identifiers are retained in BioProject as the ID value for a record, and an accession number has been added. Database content is exchanged with other members of the International Nucleotide Sequence Database Collaboration (INSDC). BioProject is accessible via FTP.
Repository of raw sequencing data from next-generation sequencing platforms, including the Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, Complete Genomics, and Pacific Biosciences SMRT. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Data submissions are welcome. SRA is an archive of high-throughput sequencing data and part of an international partnership of archives (INSDC) at NCBI, the European Bioinformatics Institute, and the DNA Data Bank of Japan. Data submitted to any of these three organizations are shared among them.
BPRC (BACPAC Resources Center) is the distribution arm of its parent academic laboratory. It operates on a cost-recovery basis to make the resources generated in the laboratory available to the academic scientific community. While individual clones and screening services are widely available, library arrays are primarily provided to researchers with a scientific need to analyze most clones in a library. The site contains information on currently available BAC and PAC genomic DNA libraries, BAC clones, PAC clones, fosmid clones, cDNA collections, high-density colony hybridization filters, and BAC and PAC cloning vectors. Protocols used in the laboratory for hybridization-based screening of colony filters, purification of BAC and PAC DNA, and end-sequencing are also provided. BPRC does not list individual clones, for two reasons: (1) most clones have not been characterized and lack specific data; (2) all clones belong to libraries, and all clones from a particular library share common characteristics. Hence, to find out whether BPRC has a particular clone, one needs to either use Automatic Clone Validation or check whether the clone name falls within the range of clone names for the corresponding clone library. Typically (although not always), clone names are derived from the library name. BPRC uses the NCBI-recommended clone and library nomenclature. Most arrayed libraries are available in frozen microtiter dish format to academic and non-academic users, provided there is a scientific need for complete-library access (for instance, to annotate, modify, or analyze all BAC clones as part of a genome project).
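As an illustration of checking whether a clone name is compatible with a library, the following Python sketch parses names of the assumed form `<library>-<plate><row><column>` (for example, `RP11-159N11`) and tests whether the plate number falls within a given range. The pattern, the 384-well row and column limits, the `max_plate` bound, and all function names are assumptions made for this example; this is not BPRC's Automatic Clone Validation, and real library naming conventions vary.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern: <library>-<plate><row letter><column>, e.g. "RP11-159N11".
# Rows A-P and columns 1-24 assume 384-well plates.
CLONE_RE = re.compile(r"^([A-Za-z]+\d*)-(\d+)([A-Pa-p])(\d{1,2})$")


@dataclass
class CloneName:
    library: str
    plate: int
    row: str
    column: int


def parse_clone_name(name: str) -> Optional[CloneName]:
    """Split a clone name into library, plate, row, and column, or return None."""
    m = CLONE_RE.match(name.strip())
    if m is None:
        return None
    library, plate, row, column = m.groups()
    col = int(column)
    if not 1 <= col <= 24:  # outside the assumed 384-well column range
        return None
    return CloneName(library.upper(), int(plate), row.upper(), col)


def in_library_range(name: str, library: str, max_plate: int) -> bool:
    """Hypothetical check: does the name belong to `library` and a valid plate?"""
    parsed = parse_clone_name(name)
    return (
        parsed is not None
        and parsed.library == library.upper()
        and 1 <= parsed.plate <= max_plate
    )


if __name__ == "__main__":
    print(parse_clone_name("RP11-159N11"))
    print(in_library_range("RP11-159N11", "RP11", max_plate=1000))
```

A name that fails to parse, or whose plate lies outside the library's range, cannot belong to that library under these assumptions; a name that passes is only compatible, not confirmed to exist.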