Mismatch repair-signature mutations activate gene enhancers across human colorectal cancer epigenomes.

eLife | 2019

Commonly-mutated genes have been found for many cancers, but less is known about mutations in cis-regulatory elements. We leverage gains in tumor-specific enhancer activity, coupled with allele-biased mutation detection from H3K27ac ChIP-seq data, to pinpoint potential enhancer-activating mutations in colorectal cancer (CRC). Analysis of a genetically-diverse cohort of CRC specimens revealed that microsatellite instable (MSI) samples have a high indel rate within active enhancers. Enhancers with indels show evidence of positive selection, increased target gene expression, and a subset is highly recurrent. The indels affect short homopolymer tracts of A/T and increase affinity for FOX transcription factors. We further demonstrate that signature mismatch-repair (MMR) mutations activate enhancers using a xenograft tumor metastasis model, where mutations are induced naturally via CRISPR/Cas9 inactivation of MLH1 prior to tumor cell injection. Our results suggest that MMR signature mutations activate enhancers in CRC tumor epigenomes to provide a selective advantage.
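The abstract notes that the enhancer indels fall in short homopolymer tracts of A/T. As a rough illustration of what locating such tracts involves, the sketch below scans a DNA string for runs of A or T. It is not the authors' pipeline: the function name `find_at_homopolymers` and the `min_len` threshold are assumptions made for this example, since the abstract does not state a tract length.

```python
import re


def find_at_homopolymers(seq, min_len=4):
    """Return (start, end, base) for each run of A or T at least min_len long.

    Coordinates are 0-based and half-open. min_len is an illustrative
    assumption, not a value taken from the publication.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [
        (m.start(), m.end(), m.group()[0])
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical sequence with one A tract and one T tract.
tracts = find_at_homopolymers("GCAAAAAGCTTTTGCA")
for start, end, base in tracts:
    print(f"{base} x{end - start} at [{start}, {end})")
```

An insertion or deletion inside one of these runs changes the tract length, which is the kind of event the abstract associates with microsatellite instable samples.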

PubMed ID: 30759065

Associated grants

  • Agency: NCATS NIH HHS, United States
    Id: TL1 TR002549
  • Agency: NCI NIH HHS, United States
    Id: R01 CA160356
  • Agency: NIGMS NIH HHS, United States
    Id: T32 GM088088
  • Agency: NIH HHS, United States
    Id: R01CA204279
  • Agency: NCI NIH HHS, United States
    Id: R01 CA193677
  • Agency: NIGMS NIH HHS, United States
    Id: T32 GM007250
  • Agency: NIH HHS, United States
    Id: R01CA143237
  • Agency: NCI NIH HHS, United States
    Id: R01 CA204279
  • Agency: NCI NIH HHS, United States
    Id: P50 CA150964
  • Agency: NIH HHS, United States
    Id: R01CA193677
  • Agency: NIH HHS, United States
    Id: R01CA160356
  • Agency: NCATS NIH HHS, United States
    Id: TL1 TR000441

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


ChIP-seq (tool)

RRID:SCR_001237

Set of software modules for performing common ChIP-seq data analysis tasks across the whole genome, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions. The tools are designed to be simple, fast and highly modular. Each program carries out a well-defined data processing procedure that can fit into a pipeline framework. ChIP-Seq is also freely available through a Web interface.

GATK (tool)

RRID:SCR_001876

A software package for analyzing next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping and a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. GATK is both a software library that makes it easy to write efficient analysis tools for next-generation sequencing data and a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth-of-coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner. (entry from Genetic Analysis Software)

Gene Set Enrichment Analysis (tool)

RRID:SCR_003199

Software package for interpreting gene expression data. It is used to interpret large-scale experiments by identifying pathways and processes.

HOCOMOCO (tool)

RRID:SCR_005409

A comprehensive collection of human transcription factor binding site (TFBS) models. DNA sequences of TF binding regions obtained by both pregenomic and high-throughput methods were collected from existing databases and other public data. The ChIPMunk software was used to construct positional weight matrices. Four motif discovery strategies were tested based on different motif shape priors, including flat and periodic priors associated with DNA helix pitch. A quality rating was manually assigned to each model based on known binding preferences. An appropriate TFBS model was selected for each TF, with similar models selected for related TFs. In any case, only one model per TF was selected unless there was additional evidence for two distinct binding models or different stable modes of dimerization. All TFBS models and the initial binding segment data used for motif discovery were mapped to UniProt IDs.
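To make the idea of a positional weight matrix concrete, here is a minimal sketch of how such a model can score candidate binding sites. The count matrix `TOY_PFM`, the pseudocount, and the uniform background are all hypothetical choices for illustration; this is not a HOCOMOCO model and not ChIPMunk's procedure. The sketch assumes sequences contain only A, C, G and T.

```python
import math


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for counts in pfm:
        total = sum(counts.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((counts.get(base, 0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    hits = [
        (sum(pwm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    ]
    return max(hits, key=lambda hit: hit[0])


# Hypothetical 4-position count matrix with consensus TAAA.
TOY_PFM = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 9, "C": 1, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]

score, start = best_hit(pfm_to_pwm(TOY_PFM), "GGTAAAGG")
print(f"best window starts at {start}, log-odds score {score:.2f}")
```

Comparing the best score for a reference sequence against the score for a mutated sequence is one simple way to ask whether a change might raise or lower predicted binding affinity.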

1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can access on AWS free of charge for use in disease research. All 200 TB of the latest 1000 Genomes Project data are hosted in a publicly available Amazon S3 bucket at http://s3.amazonaws.com/1000genomes The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year.

Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be accessed from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS stores the public data sets at no charge to the community; researchers pay only for the additional AWS resources they need for further processing or analysis of the data. You can access the data via simple HTTP requests, or use the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to work with this data without the usual capital investment required at this scale. AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Because the data sit in an Amazon S3 bucket, customers can also process them with Hadoop via Amazon Elastic MapReduce and use the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
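The entry above says the data can be reached with simple HTTP requests. The sketch below shows one way such a request could be formed and its response read, assuming the bucket answers the standard S3 REST "list objects" call with a `ListBucketResult` XML document. The helper names, the example prefix, and the sample XML are hypothetical, and whether the bucket still permits anonymous listing at this URL is not confirmed here, so the network call itself is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Bucket URL as given in the entry above.
BUCKET_URL = "http://s3.amazonaws.com/1000genomes"
# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(prefix="", max_keys=10):
    """Build a list-objects URL restricted to a key prefix."""
    query = urlencode({"prefix": prefix, "max-keys": max_keys})
    return f"{BUCKET_URL}?{query}"


def parse_keys(xml_text):
    """Extract object keys from a ListBucketResult XML document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(S3_NS + "Key")]


# Hypothetical response body, shaped like an S3 listing.
SAMPLE_XML = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Name>1000genomes</Name>"
    "<Contents><Key>a.txt</Key></Contents>"
    "<Contents><Key>b.txt</Key></Contents>"
    "</ListBucketResult>"
)

print(list_url("some/prefix/", 5))
print(parse_keys(SAMPLE_XML))
```

In practice the URL from `list_url` would be fetched with an HTTP client such as `urllib.request.urlopen` and the response body passed to `parse_keys`; the AWS SDKs mentioned above wrap the same operation.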

HCT 116 (cell line)

RRID:CVCL_0291

HCT 116 is a cancer cell line of human (Homo sapiens) origin.

H3K27ac-human (antibody)

RRID:AB_2118291

This polyclonal antibody targets H3K27ac

beta Actin antibody - Loading Control (antibody)

RRID:AB_2305186

This polyclonal antibody targets RCJMB04_4h19

MLH1 (antibody)

RRID:AB_394040

This monoclonal antibody targets MLH1

COLO 205 (cell line)

RRID:CVCL_0218

COLO 205 is a cancer cell line of human (Homo sapiens) origin.

LoVo (cell line)

RRID:CVCL_0399

LoVo is a cancer cell line of human (Homo sapiens) origin.
