ENCODE whole-genome data in the UCSC genome browser (2011 update).

Nucleic acids research | 2011

The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access.

PubMed ID: 21037257

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: NHGRI NIH HHS, United States
    Id: 5P41HG002371-09
  • Agency: NHGRI NIH HHS, United States
    Id: 5U41HG004568-02
  • Agency: Howard Hughes Medical Institute, United States

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


ENCODE (tool)

RRID:SCR_006793

Encyclopedia of DNA elements consisting of a list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active. It enables the scientific and medical communities to interpret the role of the human genome in biology and disease. It provides identification of common cell types to facilitate integrative analysis and new experimental technologies based on high-throughput sequencing. It includes a genome browser containing ENCODE and Epigenomics Roadmap data. Data are available for the entire human genome.

1000 Genomes: A Deep Catalog of Human Genetic Variation (tool)

RRID:SCR_006828

International collaboration producing an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts, in an effort to provide a foundation for investigating the relationship between genotype and phenotype. The genomes of about 2500 unidentified people from about 25 populations around the world were sequenced using next-generation sequencing technologies. Redundant sequencing of the same samples on various platforms and by different groups of scientists can be compared. The results of the study are freely and publicly accessible to researchers worldwide. The consortium identified the following populations whose DNA will be sequenced: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States. The goal of the Project is to find most genetic variants that have frequencies of at least 1% in the populations studied. Sequencing is still too expensive to deeply sequence the many samples being studied for this project. However, any particular region of the genome generally contains a limited number of haplotypes, so data can be combined across many samples to allow efficient detection of most of the variants in a region. The Project currently plans to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes for each sample that were not seen directly by the light sequencing.
All samples from the 1000 Genomes Project are available as lymphoblastoid cell lines (LCLs) and LCL-derived DNA from the Coriell Cell Repository as part of the NHGRI Catalog. The sequence and alignment data generated by the 1000 Genomes Project are made available as quickly as possible via its mirrored FTP sites: ftp://ftp.1000genomes.ebi.ac.uk and ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes
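The low-coverage reasoning in the description above can be made concrete with a rough back-of-the-envelope sketch. It is not the Project's actual statistical model: it assumes that read depth at a site is Poisson distributed, that samples are diploid and independent, and that sequencing errors can be ignored. The function names are hypothetical and chosen only for illustration.

```python
import math


def prob_variant_unseen_in_carrier(mean_depth: float) -> float:
    """Chance that a heterozygous carrier yields no read of the variant allele.

    Assumption: reads are Poisson distributed and split evenly between the
    two alleles, so the variant allele is covered at half the mean depth.
    """
    return math.exp(-mean_depth / 2.0)


def expected_variant_copies(allele_freq: float, n_samples: int) -> float:
    """Expected number of variant copies among diploid samples."""
    return 2 * n_samples * allele_freq


def expected_variant_reads(allele_freq: float, n_samples: int,
                           mean_depth: float) -> float:
    """Expected variant-supporting reads pooled over the whole cohort."""
    copies = expected_variant_copies(allele_freq, n_samples)
    return copies * mean_depth / 2.0


if __name__ == "__main__":
    depth, freq, samples = 4.0, 0.01, 2500  # figures quoted in the description
    print(f"missed in one carrier: {prob_variant_unseen_in_carrier(depth):.3f}")
    print(f"expected copies:       {expected_variant_copies(freq, samples):.0f}")
    print(f"expected pooled reads: {expected_variant_reads(freq, samples, depth):.0f}")
```

Under these simplifying assumptions, a single 4X sample misses a heterozygous variant roughly 13.5% of the time, which is why individual genotypes are incomplete. A 1% variant is nevertheless expected in about 50 chromosomes and about 100 pooled reads across 2500 samples, which is why it should still be detectable in the combined data.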

Hep-G2 (tool)

RRID:CVCL_0027

Hep-G2 is a cancer cell line derived from Homo sapiens (human).

GM12878 (tool)

RRID:CVCL_7526

GM12878 is a transformed cell line derived from Homo sapiens (human).

HeLa S3 (tool)

RRID:CVCL_0058

HeLa S3 is a cancer cell line derived from Homo sapiens (human).