8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage.

PLoS genetics | 2014

Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, d1/2 = 0.25-0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequences we identify are not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (d1/2 = 2.1-5.0). From extrapolations we estimate that 8.2% (7.1-9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged. 
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.
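
The half-life figures quoted above can be made concrete with a small calculation. The sketch below assumes a simple exponential turnover model, in which the fraction of constrained sequence still shared after divergence d is 0.5^(d / d1/2). This functional form, the function names, and the example divergences are illustrative assumptions, not the fitting procedure used in the paper; only the half-life ranges are taken from the abstract.

```python
# Illustrative only: assumes simple exponential turnover, which may differ
# from the model actually fitted in the paper.

NONCODING_HALF_LIFE = (0.25, 0.31)  # d1/2 range quoted in the abstract
CODING_HALF_LIFE = (2.1, 5.0)       # d1/2 range quoted in the abstract


def fraction_retained(divergence, half_life):
    """Fraction of constrained sequence still shared after `divergence`."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    return 0.5 ** (divergence / half_life)


def retained_range(divergence, half_life_range):
    """Return (low, high) retained fractions across a half-life range."""
    shortest, longest = half_life_range
    return (fraction_retained(divergence, shortest),
            fraction_retained(divergence, longest))


if __name__ == "__main__":
    # Hypothetical divergences, in the same units as d1/2.
    for d in (0.1, 0.25, 0.5):
        nc = retained_range(d, NONCODING_HALF_LIFE)
        cd = retained_range(d, CODING_HALF_LIFE)
        print(f"d={d}: noncoding {nc[0]:.2f}-{nc[1]:.2f}, "
              f"coding {cd[0]:.2f}-{cd[1]:.2f}")
```

Under this assumed model, a divergence equal to the shorter noncoding half-life leaves exactly half of the noncoding constrained sequence shared, while coding sequence at the same divergence is retained at a much higher fraction.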

PubMed ID: 25057982

Associated grants

  • Agency: Medical Research Council, United Kingdom
    Id: G0501331
  • Agency: Medical Research Council, United Kingdom
    Id: MC_UU_12021/1
  • Agency: Medical Research Council, United Kingdom
    Id: MC_U137761446
  • Agency: Wellcome Trust, United Kingdom
    Id: 090532/Z/09/Z
  • Agency: Wellcome Trust, United Kingdom

Publication data are provided by the National Library of Medicine® and PubMed®. Data are retrieved from PubMed® on a weekly schedule. For terms and conditions, see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


GERP (tool)

RRID:SCR_000563

Software that identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. We refer to these deficits as Rejected Substitutions. Rejected substitutions are a natural measure of constraint that reflects the strength of past purifying selection on the element. GERP estimates constraint for each alignment column; elements are identified as excess aggregations of constrained columns. A false-positive rate (which is user-settable) is calculated using "shuffled" alignments in which the order of columns is randomized.
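
The notion of a rejected substitution described above can be sketched in a few lines. The example assumes that per-column expected (neutral) and observed substitution counts are already available, and it calls elements with a deliberately simplified rule: maximal runs of columns with a positive deficit whose summed score reaches a threshold. The function names, the threshold, and the input values are hypothetical; this is not GERP's implementation, and it omits the shuffled-alignment false-positive estimate.

```python
def rejected_substitutions(expected, observed):
    """Per-column deficit: expected neutral substitutions minus observed."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return [e - o for e, o in zip(expected, observed)]


def candidate_elements(rs_scores, min_total):
    """Return (start, end, total) for runs of columns with positive RS.

    `end` is exclusive. Runs whose summed RS is below `min_total` are
    dropped. This run-based rule is a simplification, not GERP's caller.
    """
    elements = []
    start = None
    total = 0.0
    for i, score in enumerate(rs_scores):
        if score > 0:
            if start is None:
                start, total = i, 0.0
            total += score
        elif start is not None:
            if total >= min_total:
                elements.append((start, i, total))
            start = None
    # Close a run that extends to the final column.
    if start is not None and total >= min_total:
        elements.append((start, len(rs_scores), total))
    return elements


if __name__ == "__main__":
    # Hypothetical per-column values, not real alignment data.
    expected = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
    observed = [2.5, 0.5, 0.0, 1.0, 3.0, 1.5]
    rs = rejected_substitutions(expected, observed)
    print(rs)
    print(candidate_elements(rs, min_total=2.0))
```

Columns with more observed than expected substitutions get a negative score and break a run, so only stretches with a sustained deficit are reported.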


UCSC Genome Browser (tool)

RRID:SCR_005780

Portal for interactively visualizing genomic data. Provides reference sequences and working draft assemblies for a collection of genomes, and access to the ENCODE and Neanderthal projects. Includes a collection of vertebrate and model organism assemblies and annotations, along with a suite of tools for viewing, analyzing and downloading data.


ENCODE (tool)

RRID:SCR_006793

Encyclopedia of DNA Elements: a list of functional elements in the human genome, including elements that act at the protein and RNA levels and regulatory elements that control the cells and circumstances in which a gene is active. Enables the scientific and medical communities to interpret the role of the human genome in biology and disease. Provides identification of common cell types to facilitate integrative analysis, and new experimental technologies based on high-throughput sequencing. Includes a Genome Browser containing ENCODE and Epigenomics Roadmap data. Data are available for the entire human genome.


RepeatMasker (tool)

RRID:SCR_012954

Software tool that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).
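
The masked output described above (annotated repeats replaced by Ns) can be illustrated with a short sketch. It assumes the repeat annotations are already available as 0-based, end-exclusive intervals; the function names and the toy sequence are hypothetical, and the code does not call RepeatMasker or reproduce its search step.

```python
def mask_repeats(sequence, repeats, mask_char="N"):
    """Return `sequence` with each (start, end) interval replaced by mask_char.

    Intervals are 0-based, end-exclusive, and may overlap.
    """
    masked = list(sequence)
    for start, end in repeats:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError(f"interval out of range: ({start}, {end})")
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


def masked_fraction(masked_sequence, mask_char="N"):
    """Fraction of positions equal to mask_char.

    Positions that were already mask_char before masking are counted too.
    """
    if not masked_sequence:
        return 0.0
    return masked_sequence.count(mask_char) / len(masked_sequence)


if __name__ == "__main__":
    # Toy sequence and hypothetical repeat annotations.
    seq = "ACGTACGTAC"
    annotated = [(2, 5), (4, 6)]
    masked = mask_repeats(seq, annotated)
    print(masked)
    print(masked_fraction(masked))
```

Overlapping annotations are handled by masking position by position, so each base is counted once when the masked fraction is computed.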


PAML (tool)

RRID:SCR_014932

Package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. PAML estimates parameters and tests hypotheses to study the evolutionary process from a phylogenetic tree.
