The diploid genome sequence of an individual human.

PLoS Biology | Sep 4, 2007

Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels)(1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
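The 22% figure for non-SNP events can be checked from the counts quoted in the abstract. The sketch below is a reader's arithmetic check, not part of the publication. It assumes the event total is the sum of the five enumerated categories only, since the abstract gives no counts for segmental duplications or copy number variation regions. The variable names are illustrative.

```python
# Variant counts as quoted in the abstract.
variant_counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

# Assumption: only the enumerated categories are counted as events.
total_events = sum(variant_counts.values())
non_snp_events = total_events - variant_counts["SNPs"]
non_snp_share = non_snp_events / total_events

print(f"total events: {total_events:,}")
print(f"non-SNP share of events: {non_snp_share:.1%}")
```

Under that assumption the total is consistent with the "more than 4.1 million DNA variants" stated above. The 74% share of variant bases cannot be reproduced this way, because the abstract does not give per-category base counts.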

PubMed ID: 17803354

Mesh terms: Base Sequence | Chromosome Mapping | Chromosomes, Human | Chromosomes, Human, Y | Diploidy | Gene Dosage | Genome, Human | Genotype | Haplotypes | Human Genome Project | Humans | INDEL Mutation | In Situ Hybridization, Fluorescence | Male | Microarray Analysis | Middle Aged | Molecular Sequence Data | Pedigree | Phenotype | Polymorphism, Single Nucleotide | Reproducibility of Results | Sequence Analysis, DNA

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


dbSNP

Database serving as a central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms. Once discovered, these polymorphisms could be used by additional laboratories, using the sequence information around the polymorphism and the specific experimental conditions. (Note that dbSNP takes the looser "variation" definition for SNPs, so there is no requirement or assumption about minimum allele frequency.) The database accepts data submissions. dbSNP distinguishes a report of how to assay a SNP from the use of that SNP with individuals and populations. This separation simplifies some issues of data representation. However, these initial reports describing how to assay a SNP will often be accompanied by SNP experiments measuring allele occurrence in individuals and populations.

Ensembl

A collection of genome databases for vertebrates and other eukaryotic species with DNA and protein sequence search capabilities. The goal of Ensembl is to automatically annotate the genome, integrate this annotation with other available biological data and make the data publicly available via the web. The range of available data has also expanded to include comparative genomics, variation and regulatory data. Ensembl allows users to: upload and analyze data and save it to an Ensembl account; search for a DNA or protein sequence using BLAST or BLAT; fetch desired data from the public database, using the Perl API; download the databases via FTP in FASTA, MySQL and other formats; and mine Ensembl with BioMart and export sequences or tables in text, HTML, or Excel format. The DNA sequences and assemblies used in the Ensembl genebuild are provided by various projects around the world. Ensembl has entered into an agreement with UCSC and NCBI with regard to sequence identifiers in order to improve consistency between the data provided by different genome browsers. The site also links to the Ensembl blog with updates on new species and sequences as they are added to the database.

GenBank

NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280 000 formally described species. (Jan 2014) These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assigns accession numbers upon data receipt. It is part of the International Nucleotide Sequence Database Collaboration and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.

GO

A community-based bioinformatics resource consisting of three structured controlled vocabularies (ontologies) for the annotation of gene products with respect to their molecular function, cellular component, and biological role in a species-independent manner. This initiative to standardize the representation of gene and gene product attributes across species and databases is an effort to address the need for consistent descriptions of gene products in different databases. The Gene Ontology project encourages input from the community into both the content of the GO and annotation using GO. There are three separate aspects to this effort: first, they write and maintain the ontologies themselves; second, they make cross-links between the ontologies and the genes and gene products in the collaborating databases; and third, they develop tools that facilitate the creation, maintenance and use of ontologies. The controlled vocabularies are structured so that users can query them at different levels: for example, users can use GO to find all the gene products in the mouse genome that are involved in signal transduction, or they can zoom in on all the receptor tyrosine kinases. This structure also allows annotators to assign properties to gene products at different levels, depending on how much is known about a gene product.
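Querying at different levels can be illustrated with a toy ontology. This is a minimal sketch, not the GO data model or any GO tool. The term names, the parent links, and the gene names (geneA, geneB, geneC) are all hypothetical placeholders chosen to mirror the signal transduction example above.

```python
# Toy ontology: each term maps to its parent terms.
# Term names are illustrative placeholders, not real GO identifiers.
PARENTS = {
    "signal transduction": [],
    "receptor signaling": ["signal transduction"],
    "receptor tyrosine kinase signaling": ["receptor signaling"],
    "G protein-coupled receptor signaling": ["receptor signaling"],
}

# Hypothetical annotations: gene product -> most specific term known for it.
ANNOTATIONS = {
    "geneA": "receptor tyrosine kinase signaling",
    "geneB": "G protein-coupled receptor signaling",
    "geneC": "signal transduction",  # little is known, so annotated broadly
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def genes_for(term):
    """Genes annotated to `term` or to any more specific term beneath it."""
    return sorted(g for g, t in ANNOTATIONS.items() if term in ancestors(t))


broad = genes_for("signal transduction")
narrow = genes_for("receptor tyrosine kinase signaling")
print(broad)
print(narrow)
```

A query at the broad term returns every gene annotated anywhere beneath it, while a query at the specific term returns only the genes annotated at that level of detail.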

International HapMap Project

THIS RESOURCE IS NO LONGER IN SERVICE, documented August 22, 2016. A multi-country collaboration among scientists and funding agencies to develop a public resource where genetic similarities and differences in human beings are identified and catalogued. Using this information, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. All of the information generated by the Project will be released into the public domain. Their goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. Public and private organizations in six countries are participating in the International HapMap Project. Data generated by the Project can be downloaded with minimal constraints. HapMap project related data, software, and documentation include: bulk data on genotypes, frequencies, LD data, phasing data, allocated SNPs, recombination rates and hotspots, SNP assays, Perlegen amplicons, raw data, inferred genotypes, and mitochondrial and chrY haplogroups; Generic Genome Browser software; protocols and information on assay design, genotyping and other protocols used in the project; and documentation of samples/individuals and the XML format used in the project.

Cancer Genomics Project

This portal shows the current research projects within the Cancer Genomics Project. Sponsors: This project is supported by Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency. Keywords: Research, Cancer, Genomics, Genetic, Gene, Project

NetAffx

Affymetrix is a partially commercial resource that provides DNA Analysis Arrays, Expression Analysis Arrays, Gene Regulation Analysis, and Microarrays. It also provides reagents and assays, instruments, software, and services for a fee. Information is provided for rats, humans, and mice.

QIAGEN

A commercial organization which provides assay technologies to isolate DNA, RNA, and proteins from any biological sample. Assay technologies are then used to make specific target biomolecules, such as the DNA of a specific virus, visible for subsequent analysis.

RepeatMasker

A software tool that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).
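The masking step alone can be sketched in a few lines. This is not RepeatMasker's code or interface: it assumes the repeat annotation is already available as half-open (start, end) intervals, and the function name mask_repeats, the example sequence, and the interval are all hypothetical.

```python
def mask_repeats(sequence, repeat_intervals, mask_char="N"):
    """Replace each annotated half-open interval [start, end) with mask_char."""
    masked = list(sequence)
    for start, end in repeat_intervals:
        # Clamp to the sequence bounds so out-of-range annotations are harmless.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


# Hypothetical query: 8 unique bases, an 8-base simple repeat, 4 unique bases.
query = "ACGTACGT" + "T" * 8 + "ACGT"
repeats = [(8, 16)]  # assumed annotation of the repeat run

masked = mask_repeats(query, repeats)
fraction_masked = masked.count("N") / len(masked)
print(masked)
print(f"{fraction_masked:.0%} of bases masked")
```

The masked fraction computed here is the same kind of quantity as the "over 56%" figure quoted above for human genomic sequence, applied to a toy input.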

Bio-Rad

An antibody supplier.
