The Roadmap Epigenomics Consortium has published whole-genome functional annotation maps in 127 human cell types by integrating data from studies of multiple epigenetic marks. These maps have been widely used for studying gene regulation in cell type-specific contexts and predicting the functional impact of DNA mutations on disease. Here, we present a new map of functional elements produced by applying a method called IDEAS on the same data. The method has several unique advantages and outperforms existing methods, including that used by the Roadmap Epigenomics Consortium. Using five categories of independent experimental datasets, we compared the IDEAS and Roadmap Epigenomics maps. While the overall concordance between the two maps is high, the maps differ substantially in the prediction details and in their consistency of annotation of a given genomic position across cell types. The annotation from IDEAS is uniformly more accurate than the Roadmap Epigenomics annotation and the improvement is substantial based on several criteria. We further introduce a pipeline that improves the reproducibility of functional annotation maps. Thus, we provide a high-quality map of candidate functional regions across 127 human cell types and compare the quality of different annotation methods in order to facilitate biomedical research in epigenomics.
PubMed ID: 28973456
Software that identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. We refer to these deficits as Rejected Substitutions. Rejected substitutions are a natural measure of constraint that reflects the strength of past purifying selection on the element. GERP estimates constraint for each alignment column; elements are identified as excess aggregations of constrained columns. A false-positive rate (which is user-settable) is calculated using "shuffled" alignments in which the order of columns is randomized.
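The idea described above can be illustrated with a minimal Python sketch. It is not the GERP implementation: the function names, the minimum element length, and the way the false-positive rate is summarized (mean number of columns called in shuffled data divided by the number called in the real data) are all assumptions made for illustration. Per-column expected and observed substitution counts are taken as given inputs, and because the score is per column, shuffling the column scores stands in for shuffling alignment columns.

```python
import random

def rejected_substitutions(expected, observed):
    """Per-column deficit: substitutions expected under neutrality minus those observed."""
    return [e - o for e, o in zip(expected, observed)]

def find_elements(rs, min_len=3):
    """Return (start, end, total_rs) for runs of constrained columns (RS > 0).

    `end` is exclusive. `min_len` is a hypothetical cutoff, not a GERP default.
    """
    elements = []
    start = None
    # A sentinel value closes a run that reaches the last column.
    for i, score in enumerate(list(rs) + [float("-inf")]):
        if score > 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                elements.append((start, i, sum(rs[start:i])))
            start = None
    return elements

def covered_columns(elements):
    """Number of alignment columns that fall inside called elements."""
    return sum(end - start for start, end, _ in elements)

def shuffled_false_positive_rate(rs, min_len=3, n_shuffles=200, seed=0):
    """Assumed summary: mean columns called after shuffling / columns called in real data."""
    rng = random.Random(seed)
    real = covered_columns(find_elements(rs, min_len))
    if real == 0:
        return 0.0
    shuffled = list(rs)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # randomize column order
        total += covered_columns(find_elements(shuffled, min_len))
    return (total / n_shuffles) / real

# Toy data: ten columns, each expecting one substitution under neutrality.
expected = [1.0] * 10
observed = [0.2, 0.1, 0.0, 0.3, 1.5, 1.2, 1.4, 0.1, 0.2, 1.3]
rs = rejected_substitutions(expected, observed)
elements = find_elements(rs)
fpr = shuffled_false_positive_rate(rs)
```

In this toy input only the first four columns form a run long enough to be called; the two-column run near the end is dropped by the assumed length cutoff.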
Expression profiling and promoter identification software tool for transcriptional network analysis and transcriptome characterization. DeepCAGE, the combination of next-generation sequencing with next-generation expression profiling, supports expression profiling and genome annotation. CAGE is an experimental approach for linking gene expression to control regions in the genome. With the availability of next-generation sequencing methods, DNAFORM now offers DeepCAGE services. DeepCAGE libraries are prepared for direct analysis by an Illumina/Solexa Sequencer. One sequencing run using one channel on an Illumina/Solexa Sequencer can yield over 4,000,000 reads per sample. CAGE is based on our full-length cDNA library technology, where an adaptor is ligated to the 5'-end of full-length cDNAs, which introduces a recognition site for a Class IIs restriction endonuclease adjacent to the 5'-end of the cDNA. The Class IIs restriction endonuclease, here MmeI, allows short tags derived from the 5'-end of transcripts to be cloned into concatemers for high-throughput sequencing. CAGE tags are further characterized by mapping to genomic sequences, which enables the identification of transcriptional start sites. As such, CAGE can contribute to projects in Gene Discovery, Gene Expression, and Promoter Identification.
Now that genome sequencing projects have provided the genetic blueprints for many organisms, new questions arise about how to correlate observed genotypes with related phenotypes and how to understand the regulation of genetic information in time and space. The dynamics of living systems and the functional behavior of cells in multicellular organisms have thus become the subject of the emerging field of systems biology. Integrating experimental approaches and computer-aided theories at the system level will be the fundamental principle driving systems biology toward an understanding of complex regulatory networks, an ambitious goal that requires new approaches in the life sciences. For ordering and additional information, please contact us at contact_at_dnaform.jp
Statistical pipeline for detecting significant chromosomal interactions in Capture Hi-C data. CHiCAGO uses a convolution background model accounting for both random Brownian collisions between chromatin fragments and technical noise. CHiCAGO then performs a p-value weighting procedure based on the expected true positive rates at different distance ranges, with scores representing soft-thresholded -log weighted p-values.
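The scoring idea above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CHiCAGO's code or API: the Brownian component is modelled here as a negative binomial and the technical noise as a Poisson, the weighted p-value is taken to be the p-value divided by a distance-dependent weight, and the soft threshold subtracts the log of the largest weight and floors the score at zero. None of these choices are given in the description above, and all function and parameter names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k counts with mean lam (> 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def neg_binomial_pmf(k, mean, dispersion):
    """Negative binomial with variance = mean + mean**2 / dispersion."""
    p = dispersion / (dispersion + mean)
    log_pmf = (math.lgamma(k + dispersion) - math.lgamma(dispersion)
               - math.lgamma(k + 1)
               + dispersion * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def convolved_pmf(n, brownian_mean, dispersion, noise_mean):
    """P(Brownian + noise = n), computed by discrete convolution."""
    return sum(neg_binomial_pmf(j, brownian_mean, dispersion)
               * poisson_pmf(n - j, noise_mean)
               for j in range(n + 1))

def upper_tail_p(observed, brownian_mean, dispersion, noise_mean):
    """P(count >= observed) under the convolved background."""
    below = sum(convolved_pmf(n, brownian_mean, dispersion, noise_mean)
                for n in range(observed))
    # Floor avoids log(0); subtracting from 1 loses precision for tiny p-values.
    return max(1.0 - below, 1e-300)

def soft_thresholded_score(p_value, weight, max_weight):
    """Assumed form: -log(p / weight) - log(max_weight), floored at zero."""
    log_weighted_p = math.log(p_value) - math.log(weight)
    return max(0.0, -log_weighted_p - math.log(max_weight))

# Toy example: background of 3 expected Brownian reads plus 0.5 noise reads.
p_low = upper_tail_p(5, brownian_mean=3.0, dispersion=2.0, noise_mean=0.5)
p_high = upper_tail_p(20, brownian_mean=3.0, dispersion=2.0, noise_mean=0.5)
score_near = soft_thresholded_score(p_high, weight=2.0, max_weight=2.0)
score_far = soft_thresholded_score(p_high, weight=0.5, max_weight=2.0)
```

With the same p-value, the pair given the larger weight receives the higher score, which is the intended effect of weighting by expected true positive rate at a given distance.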