Cell-type-specific eQTL of primary melanocytes facilitates identification of melanoma susceptibility genes.

Genome Research | 2018

Most expression quantitative trait locus (eQTL) studies to date have been performed in heterogeneous tissues as opposed to specific cell types. To better understand the cell-type-specific regulatory landscape of human melanocytes, which give rise to melanoma but account for <5% of typical human skin biopsies, we performed an eQTL analysis in primary melanocyte cultures from 106 newborn males. We identified 597,335 cis-eQTL SNPs prior to linkage disequilibrium (LD) pruning and 4997 eGenes (FDR < 0.05). Melanocyte eQTLs differed considerably from those identified in the 44 GTEx tissue types, including skin. Over a third of melanocyte eGenes, including key genes in melanin synthesis pathways, were unique to melanocytes compared to those of GTEx skin tissues or TCGA melanomas. The melanocyte data set also identified trans-eQTLs, including those connecting a pigmentation-associated functional SNP with four genes, likely through cis-regulation of IRF4. Melanocyte eQTLs are enriched in cis-regulatory signatures found in melanocytes as well as in melanoma-associated variants identified through genome-wide association studies. Melanocyte eQTLs also colocalized with melanoma GWAS variants in five known loci. Finally, a transcriptome-wide association study using melanocyte eQTLs uncovered four novel susceptibility loci, where imputed expression levels of five genes (ZFP90, HEBP1, MSC, CBWD1, and RP11-383H13.1) were associated with melanoma at genome-wide significant P-values. Our data highlight the utility of lineage-specific eQTL resources for annotating GWAS findings, and present a robust database for genomic research of melanoma risk and melanocyte biology.
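For readers new to eQTL mapping, the basic cis-eQTL test regresses a gene's expression level on the genotype dosage (0, 1 or 2 copies of an allele) of each nearby SNP, and the resulting P-values are then corrected for multiple testing. The Python sketch below is an illustration only, not the pipeline used in this study: it assumes plain ordinary least squares with no covariates, a normal approximation for the P-value, and Benjamini-Hochberg FDR control. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def eqtl_test(dosage: Sequence[float], expr: Sequence[float]) -> Tuple[float, float]:
    """Regress expression on genotype dosage by OLS; return (slope, two-sided P-value).

    The P-value uses a normal approximation to the t distribution (a simplifying
    assumption that is only reasonable for moderately large samples).
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expr) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expr))
    beta = sxy / sxx  # change in expression per extra allele copy
    # Residual variance with n - 2 degrees of freedom (intercept + slope).
    rss = sum((y - mean_y - beta * (x - mean_x)) ** 2 for x, y in zip(dosage, expr))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        return beta, (0.0 if beta != 0 else 1.0)
    t_stat = beta / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # two-sided normal tail
    return beta, p_value


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a genome-wide scan each gene would be tested against every SNP in its cis window, and all of the resulting P-values passed to `benjamini_hochberg` together; keeping pairs with adjusted values below 0.05 is one way to apply a threshold like the FDR < 0.05 quoted above.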

PubMed ID: 30333196

Associated grants

  • Agency: Cancer Research UK, United Kingdom
    Id: 10589
  • Agency: NCI NIH HHS, United States
    Id: P50 CA097007
  • Agency: Cancer Research UK, United Kingdom
    Id: C8216/A6129
  • Agency: NIEHS NIH HHS, United States
    Id: R01 ES011740
  • Agency: Cancer Research UK, United Kingdom
    Id: C588/A4994
  • Agency: Cancer Research UK, United Kingdom
    Id: C588/A10589
  • Agency: Cancer Research UK, United Kingdom
    Id: C490/A10124
  • Agency: Cancer Research UK, United Kingdom
    Id: C588/A19167
  • Agency: Cancer Research UK, United Kingdom
    Id: C1287/A10118
  • Agency: Medical Research Council, United Kingdom
    Id: MR/L01629X/1
  • Agency: NCI NIH HHS, United States
    Id: P50 CA093459
  • Agency: NCI NIH HHS, United States
    Id: R01 CA133996
  • Agency: NCI NIH HHS, United States
    Id: R01 CA083115

This is a list of tools and resources that we have found mentioned in this publication.


ChIP-seq (tool)

RRID:SCR_001237

Set of software modules for performing common ChIP-seq data analysis tasks across the whole genome, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions. The tools are designed to be simple, fast and highly modular. Each program carries out a well-defined data processing procedure that can potentially fit into a pipeline framework. ChIP-Seq is also freely available through a web interface.
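Peak detection, one of the tasks listed above, can be illustrated with a deliberately simple approach: count read start positions in fixed-size windows and flag windows whose count is improbably high under a uniform Poisson background. This Python sketch is a generic illustration with hypothetical function names, window size and significance cutoff; it is not the algorithm implemented by the ChIP-Seq tools themselves.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam), summed directly to avoid cancellation."""
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    i = k
    while term > 0.0 and term >= total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(total, 1.0)


def call_peaks(
    read_starts: Sequence[int], genome_length: int, window: int, alpha: float = 1e-3
) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) for windows whose read count is unusually high."""
    counts = Counter(pos // window for pos in read_starts)
    n_windows = math.ceil(genome_length / window)
    # Expected reads per window if reads were spread uniformly; a real caller would
    # estimate the background from a control sample instead.
    lam = len(read_starts) / n_windows
    peaks = []
    for idx in sorted(counts):
        if poisson_sf(counts[idx], lam) < alpha:
            start = idx * window
            peaks.append((start, min(start + window, genome_length), counts[idx]))
    return peaks
```

With one read every 100 bp across a 10 kb toy genome plus a cluster of 30 extra reads starting at position 5000, only the window covering 5000 to 5100 is reported.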

ADMIXTURE (tool)

RRID:SCR_001263

A software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm. It uses a block relaxation approach to alternately update allele frequency and ancestry fraction parameters. Each block update is handled by solving a large number of independent convex optimization problems, which are tackled using a fast sequential quadratic programming algorithm. Convergence of the algorithm is accelerated using a novel quasi-Newton acceleration method.
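The likelihood that ADMIXTURE maximizes (the same binomial model used by STRUCTURE) treats each genotype g_ij, the number of copies of a designated allele carried by individual i at SNP j, as binomial with two trials and success probability p_ij = sum over k of q_ik * f_kj, where q_ik is the ancestry fraction of individual i from population k and f_kj is the allele frequency in population k. The Python sketch below only evaluates this objective (dropping the constant binomial coefficient); it does not implement ADMIXTURE's block relaxation, sequential quadratic programming or quasi-Newton steps, and the names and toy values are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps probabilities away from 0 and 1 so log() stays finite


def admixture_loglik(
    G: Sequence[Sequence[int]],
    Q: Sequence[Sequence[float]],
    F: Sequence[Sequence[float]],
) -> float:
    """Binomial log-likelihood of genotypes, up to an additive constant.

    G[i][j]: copies (0, 1 or 2) of the counted allele in individual i at SNP j
    Q[i][k]: ancestry fraction of individual i from population k (rows sum to 1)
    F[k][j]: frequency of the counted allele at SNP j in population k
    """
    n_pops = len(F)
    ll = 0.0
    for i, genotypes in enumerate(G):
        for j, g in enumerate(genotypes):
            # Individual-specific allele frequency: ancestry-weighted mix of population frequencies.
            p = sum(Q[i][k] * F[k][j] for k in range(n_pops))
            p = min(max(p, EPS), 1.0 - EPS)
            ll += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return ll
```

Comparing this value for two candidate (Q, F) pairs on the same genotypes shows which is better supported; the block-relaxation scheme described above alternates between improving Q with F fixed and F with Q fixed.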

MEME Suite - Motif-based sequence analysis tools (tool)

RRID:SCR_001783

Suite of motif-based sequence analysis tools to discover motifs using MEME, DREME (DNA only) or GLAM2 on groups of related DNA or protein sequences; search sequence databases with motifs using MAST, FIMO, MCAST or GLAM2SCAN; compare a motif to all motifs in a database of motifs; associate motifs with Gene Ontology terms via their putative target genes, and analyze motif enrichment using SpaMo or CentriMo. Source code, binaries and a web server are freely available for noncommercial use.
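Searching sequences with a motif, as FIMO or MAST do, rests on scoring every window of a sequence against a position weight matrix of log-odds values and reporting windows above a cutoff. The Python sketch below shows only that scoring step, on the forward strand, with a made-up three-position matrix and threshold; real scanners such as FIMO also handle the reverse complement and convert scores to P-values, which this omits.

```python
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]  # one {base: log-odds score} dict per motif position

# Hypothetical three-position motif that strongly prefers "TGA".
TOY_PWM: PWM = [{b: (2.0 if b == c else -2.0) for b in "ACGT"} for c in "TGA"]


def scan(sequence: str, pwm: PWM, threshold: float) -> List[Tuple[int, str, float]]:
    """Score every window of `sequence` (forward strand only) and keep those >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        # Log-odds scores are additive across positions.
        score = sum(pwm[pos][base] for pos, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits
```

For example, `scan("CCTGACCTGT", TOY_PWM, 5.0)` keeps only the exact match at offset 2; the partial match "TGT" at offset 7 scores 2.0 and falls below the cutoff.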

AmiGO (tool)

RRID:SCR_002143

Web tool to search, sort, analyze, visualize and download data of interest. Along with details of the ontologies, gene products and annotations, it features a BLAST search, Term Enrichment and GO Slimmer tools, the GO Online SQL Environment, and a user help guide. Used at the Gene Ontology (GO) website to access the data provided by the GO Consortium. Developed and maintained by the GO Consortium.
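Term enrichment analysis asks whether a GO term annotates more genes in a study list than expected by chance. A common way to score this is a one-sided hypergeometric test, sketched below in Python; the exact statistic and multiple-testing correction used by AmiGO's Term Enrichment tool may differ, and all counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """One-sided enrichment P-value P(X >= k).

    M: genes in the background (e.g. all annotated genes)
    n: background genes annotated with the GO term
    N: genes in the study list
    k: study genes annotated with the term
    """
    upper = min(n, N)
    # math.comb returns 0 for impossible draws, so out-of-range terms vanish.
    hits = sum(comb(n, x) * comb(M - n, N - x) for x in range(max(k, 0), upper + 1))
    # Exact integer arithmetic until the final division avoids underflow in the tail.
    return hits / comb(M, N)
```

For example, `hypergeom_sf(12, 20000, 150, 200)` scores a hypothetical study list of 200 genes in which 12 carry a term that annotates 150 of 20,000 background genes, where about 1.5 would be expected by chance. When many terms are tested, the P-values still need a multiple-testing correction.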

RNA-SeQC (tool)

RRID:SCR_005120

Java software that computes a series of quality control metrics for RNA-seq data and can compare sequencing quality across different samples or experiments to evaluate different experimental parameters. The input can be one or more BAM files, and the output consists of HTML reports and tab-delimited files of metrics data.

HOCOMOCO (tool)

RRID:SCR_005409

A comprehensive collection of human transcription factor binding site models. DNA sequences of TF binding regions obtained by both pregenomic and high-throughput methods were collected from existing databases and other public data. The ChIPMunk software was used to construct position weight matrices. Four motif discovery strategies were tested, based on different motif shape priors including flat and periodic priors associated with the DNA helix pitch. A quality rating was manually assigned to each model based on known binding preferences. An appropriate TFBS model was selected for each TF, with similar models selected for related TFs. As a rule, only one model per TF was selected unless there was additional evidence for two distinct binding models or different stable modes of dimerization. All TFBS models and the initial binding segment data used for motif discovery were mapped to UniProt IDs.
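HOCOMOCO's matrices were built with ChIPMunk, which also performs the motif discovery and alignment. The Python sketch below shows only the generic final step of turning already-aligned binding sites into a log-odds position weight matrix; the pseudocount, the log base and the uniform background are illustrative assumptions rather than HOCOMOCO's settings.

```python
import math
from typing import Dict, List, Optional, Sequence

BASES = "ACGT"


def count_matrix(sites: Sequence[str]) -> List[Dict[str, int]]:
    """Per-position base counts from equal-length, pre-aligned binding sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must be aligned to the same width")
    counts = [{b: 0 for b in BASES} for _ in range(width)]
    for site in sites:
        for pos, base in enumerate(site.upper()):
            if base in BASES:  # ignore N and other ambiguity codes
                counts[pos][base] += 1
    return counts


def log_odds_pwm(
    sites: Sequence[str],
    pseudocount: float = 0.25,
    background: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Convert aligned sites into a log2-odds position weight matrix."""
    bg = background or {b: 0.25 for b in BASES}  # uniform background unless given
    pwm = []
    for column in count_matrix(sites):
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column[b] + pseudocount) / total / bg[b]) for b in BASES})
    return pwm
```

The pseudocount keeps unseen bases from receiving a score of minus infinity, which matters when a model is built from only a handful of binding segments.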

1000 Genomes: A Deep Catalog of Human Genetic Variation (tool)

RRID:SCR_006828

International collaboration producing an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts, in an effort to provide a foundation for investigating the relationship between genotype and phenotype. The genomes of about 2500 unidentified people from about 25 populations around the world were sequenced using next-generation sequencing technologies. Redundant sequencing of the same samples on various platforms and by different groups of scientists can be compared. The results of the study are freely and publicly accessible to researchers worldwide.

The consortium identified the following populations whose DNA would be sequenced: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.

The goal of the Project is to find most genetic variants that have frequencies of at least 1% in the populations studied. Sequencing is still too expensive to sequence every one of the many samples in this project deeply. However, any particular region of the genome generally contains a limited number of haplotypes, so data can be combined across many samples to allow efficient detection of most of the variants in a region. The Project currently plans to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes for each sample that were not seen directly by the light sequencing.

All samples from the 1000 Genomes Project are available as lymphoblastoid cell lines (LCLs) and LCL-derived DNA from the Coriell Cell Repository as part of the NHGRI Catalog. The sequence and alignment data generated by the 1000 Genomes Project are made available as quickly as possible via its mirrored FTP sites: ftp://ftp.1000genomes.ebi.ac.uk and ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes
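The reasoning above about 4X coverage and pooling across samples can be checked with a back-of-the-envelope calculation. The Python sketch below assumes that read depth at a site is Poisson-distributed with mean equal to the coverage (real coverage is more variable) and that each read samples one of the two haplotypes at random; the function names are hypothetical.

```python
import math


def p_alt_observed(depth: float, alt_copies: int = 1) -> float:
    """P(at least one alternative read) at one site in one diploid sample.

    Assumes read depth ~ Poisson(depth) and that each read comes from either
    haplotype with probability 1/2, so alternative reads ~ Poisson(depth * alt_copies / 2).
    """
    return 1.0 - math.exp(-depth * alt_copies / 2.0)


def expected_alt_reads(n_samples: int, allele_freq: float, depth: float) -> float:
    """Expected alternative reads pooled over all samples at a site."""
    alt_haplotypes = 2 * n_samples * allele_freq  # diploid samples
    return alt_haplotypes * depth / 2.0  # each haplotype receives half of the reads
```

With the figures quoted above, `p_alt_observed(4)` is 1 - e^-2, about 0.86, so roughly one heterozygote in seven shows no alternative read at all. Meanwhile `expected_alt_reads(2500, 0.01, 4)` gives 100 alternative reads spread over about 50 carrier haplotypes, which is why pooled calling and imputation can recover variants that individual low-coverage genomes miss.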

1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequences of 1,700 individuals, freely available for research use. The 1000 Genomes Project is an international research effort, coordinated by a consortium of 75 companies and organizations, to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can access free of charge on AWS for use in disease research. The full dataset is hosted on Amazon S3 at http://s3.amazonaws.com/1000genomes and is available to all. The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year.

Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be accessed directly from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS stores the public data sets at no charge to the community; researchers pay only for the additional AWS resources they need for further processing or analysis of the data. All 200 TB of the latest 1000 Genomes Project data is available in a public Amazon S3 bucket. You can access the data via simple HTTP requests, or through the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to dive into this data without the usual capital investment required to work with data at this scale.

AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via a bucket in Amazon S3 also means that customers can process the information using Hadoop via Amazon Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
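Because the bucket can be reached with plain HTTP requests, a listing needs nothing beyond the standard library. The Python sketch below builds an Amazon S3 ListObjectsV2 request (the `list-type=2` query) against the bucket URL given above and extracts object keys from the XML response. Whether anonymous listing of this particular bucket is still permitted, and which key prefixes exist, are assumptions to verify; the `phase3/` prefix and sample keys used with it are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

BUCKET_URL = "http://s3.amazonaws.com/1000genomes"  # bucket URL quoted in the description above
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"  # XML namespace of S3 list responses


def list_url(prefix: str = "", max_keys: int = 100) -> str:
    """Build an anonymous ListObjectsV2 request URL for keys under `prefix`."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"{BUCKET_URL}/?{query}"


def parse_keys(xml_body: bytes) -> List[str]:
    """Extract object keys from a ListBucketResult XML document."""
    root = ET.fromstring(xml_body)
    return [el.text or "" for el in root.iter(f"{S3_NS}Key")]


def list_keys(prefix: str = "", max_keys: int = 100) -> List[str]:
    """Fetch one page of keys (requires network access; pagination is not handled)."""
    with urllib.request.urlopen(list_url(prefix, max_keys), timeout=30) as response:
        return parse_keys(response.read())
```

For example, `list_keys("phase3/", 10)` would return up to ten keys under that prefix if public listing is allowed. For bulk downloads or paginated listings, the AWS SDKs mentioned above are the more practical route.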

Roadmap Epigenomics Project (tool)

RRID:SCR_008924

THIS RESOURCE IS NO LONGER IN SERVICE. Documented on July 11, 2022. Project for human epigenomic data from experimental pipelines built around next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and small RNA transcripts in stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease. The consortium expects to deliver a collection of normal epigenomes that will provide a framework or reference for comparison and integration within a broad array of future studies. The consortium is also committed to the development, standardization and dissemination of protocols, reagents and analytical tools to enable the research community to utilize, integrate and expand upon this body of data.

RepeatMasker (tool)

RRID:SCR_012954

Software tool that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).
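The masked output described above amounts to overwriting annotated repeat intervals in the query sequence. The Python sketch below applies a list of intervals either as hard masking (replacing bases with N, the default described above) or as soft masking (lowercasing bases, a common alternative). It is not RepeatMasker's code; the interval list is hypothetical and would have to be converted from whatever coordinate convention the repeat annotation uses into the 0-based, half-open intervals assumed here.

```python
from typing import Iterable, Tuple


def mask(sequence: str, intervals: Iterable[Tuple[int, int]], soft: bool = False) -> str:
    """Mask repeat intervals (0-based, half-open) with 'N' (hard) or lowercase (soft)."""
    chars = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(chars):
            raise ValueError(f"interval ({start}, {end}) out of range")
        for i in range(start, end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)
```

For example, `mask("ACGTACGTAC", [(2, 5)])` yields `"ACNNNCGTAC"`, while the soft-masked version keeps the bases as `"ACgtaCGTAC"` so downstream aligners can still see them.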

RSEM (tool)

RRID:SCR_013027

Software package for accurate quantification of gene and isoform abundances from single-end or paired-end RNA-Seq data, with or without a reference genome.
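A central difficulty in isoform quantification is that many RNA-Seq reads align equally well to several isoforms; RSEM resolves this ambiguity with an expectation-maximization (EM) algorithm. The Python toy below shows only that core idea under strong simplifications (no fragment-length, position or quality modeling, and fixed effective lengths). It is not RSEM's implementation, and the transcript names, lengths and read assignments are made up.

```python
from typing import Dict, Sequence


def em_abundance(
    reads: Sequence[Sequence[str]], lengths: Dict[str, float], n_iter: int = 200
) -> Dict[str, float]:
    """Toy EM estimate of theta, the fraction of reads originating from each transcript.

    reads: for each read, the transcripts it aligns to equally well
    lengths: effective length of each transcript (held fixed here)
    """
    names = list(lengths)
    theta = {t: 1.0 / len(names) for t in names}  # start from a uniform guess
    for _ in range(n_iter):
        expected = {t: 0.0 for t in names}
        for compatible in reads:
            # E-step: a read's origin is proportional to theta_t / length_t
            # among the transcripts it is compatible with.
            weights = {t: theta[t] / lengths[t] for t in compatible}
            z = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / z
        # M-step: theta becomes the expected share of all reads.
        theta = {t: expected[t] / len(reads) for t in names}
    return theta


def tpm(theta: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalize theta and rescale to sum to 1e6."""
    rates = {t: theta[t] / lengths[t] for t in theta}
    total = sum(rates.values())
    return {t: 1e6 * r / total for t, r in rates.items()}
```

With three reads unique to transcript A, one unique to B, four shared between them and equal lengths, the shared reads converge to a 3:1 split and theta for A approaches 0.75, rather than the 0.625 that an even split of shared reads would give.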

Encode (tool)

RRID:SCR_015482

Consortium to build a comprehensive parts list of functional elements in the human genome. This includes elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active. Data from 2012 to present.
