PRISM offers a comprehensive genomic approach to transcription factor function prediction.

Genome Research | 2013

The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
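The abstract describes searching for enrichment of a motif's conserved binding sites next to genes that share a particular function. Below is a minimal sketch of that kind of test. It assumes a simple region-based binomial model. The function names and inputs are hypothetical, and PRISM's own statistics, parameter tuning and harsh null are not reproduced. The sketch asks whether more of a motif's sites fall in the regulatory domains of a gene set than the fraction of the genome those domains cover would predict:

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    Terms are computed in log space with lgamma so that large n
    (thousands of binding sites) does not overflow a float.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 < p < 1.0:
        raise ValueError("need 0 < p < 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)  # guard against rounding just above 1


def motif_function_enrichment(
    sites_in_domains: int, total_sites: int, domain_fraction: float
) -> tuple[float, float]:
    """Hypothetical helper: fold enrichment and one-sided binomial p-value.

    sites_in_domains: conserved sites of one motif that fall inside the
        regulatory domains of genes annotated with one function
    total_sites: all conserved sites of that motif genome-wide
    domain_fraction: fraction of the genome covered by those domains
    """
    if total_sites <= 0:
        raise ValueError("need at least one site")
    expected = total_sites * domain_fraction
    fold = sites_in_domains / expected
    p_value = binomial_upper_tail(sites_in_domains, total_sites, domain_fraction)
    return fold, p_value


# Toy numbers, not from the paper: 30 of 200 sites land in domains
# covering 5% of the genome, where about 10 would be expected by chance.
fold, p_value = motif_function_enrichment(30, 200, 0.05)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.2e}")
```

Testing every motif against every annotation term produces a very large number of tests, so a scan like this needs multiple-testing control. The 16% false discovery rate reported above comes from the authors' own null, which this sketch does not model.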

PubMed ID: 23382538

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: NICHD NIH HHS, United States
    Id: R01 HD059862
  • Agency: NHGRI NIH HHS, United States
    Id: R01 HG005058

This is a list of tools and resources that we have found mentioned in this publication.


BioCyc (tool)

RRID:SCR_002298

A collection of Pathway/Genome Databases (PGDBs), each of which describes the genome and metabolic pathways of a single organism. The BioCyc collection provides an electronic reference source on the genomes and metabolic pathways of sequenced organisms. BioCyc PGDBs are generated by software that predicts the metabolic pathway complements of completely sequenced organisms from their genome sequences. They also include the results of several other computational inference procedures applied to these genomes, including predictions of which genes code for missing enzymes in metabolic pathways, and predicted operons. The BioCyc Web site provides a suite of software tools for database searching and visualization, omics data analysis, and comparative genomics and comparative pathway questions. The databases within the BioCyc collection are organized into tiers according to the amount of manual review and updating they have received. Tier 1 PGDBs have been created through intensive manual efforts and receive continuous updating. Tier 2 PGDBs were computationally generated by the PathoLogic program and have undergone moderate amounts of review and updating. Tier 3 PGDBs were computationally generated by the PathoLogic program and have undergone no review or updating; there are 967 DBs in Tier 3. The downloadable version of BioCyc that includes the Pathway Tools software is faster and more capable than the BioCyc Web site.

UniPROBE (tool)

RRID:SCR_005803

A database that hosts experimental data from universal protein binding microarray (PBM) experiments (Berger et al., 2006), with their accompanying statistical analyses, for proteins from prokaryotic and eukaryotic organisms, including malarial parasites, yeast, worms, mouse, and human. It provides a centralized resource for accessing comprehensive data on the preferences of proteins for all possible sequence variants ("words") of length k ("k-mers"), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. The database's web tools include a text-based search, a function for assessing motif similarity between user-entered data and database PWMs, and a function for locating putative binding sites along user-entered nucleotide sequences.
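UniPROBE's site-locating tool scores user sequences against position weight matrices. The general technique can be sketched as a log-odds PWM scan. This is a generic illustration, not UniPROBE's own code. It assumes a uniform background and a small pseudocount, and the toy matrix, helper names and threshold are all hypothetical:

```python
import math

BASES = "ACGT"


def toy_column(consensus: str, p_major: float = 0.85) -> dict[str, float]:
    """Hypothetical PWM column: p_major for the consensus base, rest split evenly."""
    minor = (1.0 - p_major) / 3
    return {b: (p_major if b == consensus else minor) for b in BASES}


def log_odds_matrix(probs, background=0.25, pseudocount=0.01):
    """Convert probability columns into log2(observed / background) scores."""
    matrix = []
    for column in probs:
        total = sum(column[b] + pseudocount for b in BASES)
        matrix.append(
            {b: math.log2((column[b] + pseudocount) / total / background) for b in BASES}
        )
    return matrix


def scan(sequence: str, matrix, threshold: float):
    """Return (start, score) for forward-strand windows scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(matrix, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy 4-column matrix favoring "GATA" (illustrative, not a UniPROBE entry).
pwm = log_odds_matrix([toy_column(b) for b in "GATA"])
for start, score in scan("TTGATACCGATAGG", pwm, threshold=5.0):
    print(start, round(score, 2))
```

A fuller scanner would also score the reverse complement and set the threshold relative to the matrix's maximum attainable score. Both are left out here to keep the sketch short.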

Jurkat (tool)

RRID:CVCL_0065

Jurkat is a human (Homo sapiens) cancer cell line.
