Genomic repertoires of DNA-binding transcription factors across the tree of life.

Nucleic acids research | 2010

Sequence-specific transcription factors (TFs) are important to genetic regulation in all organisms because they recognize and directly bind to regulatory regions on DNA. Here, we survey and summarize the TF resources available. We outline the organisms for which TF annotation is provided, and discuss the criteria and methods used to annotate TFs by different databases. By using genomic TF repertoires from ∼700 genomes across the tree of life, covering Bacteria, Archaea and Eukaryota, we review TF abundance with respect to the number of genes, as well as their structural complexity in diverse lineages. While typical eukaryotic TFs are longer than the average eukaryotic proteins, the inverse is true for prokaryotes. Only in eukaryotes does the same family of DNA-binding domain (DBD) occur multiple times within one polypeptide chain. This potentially increases the length and diversity of DNA-recognition sequence by reusing DBDs from the same family. We examined the increase in TF abundance with the number of genes in genomes, using the largest set of prokaryotic and eukaryotic genomes to date. As pointed out before, prokaryotic TFs increase faster than linearly. We further observe a similar relationship in eukaryotic genomes with a slower increase in TFs.
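The scaling statement at the end of the abstract can be made concrete with a log-log fit. The sketch below assumes a power-law model, TFs ≈ a · genes^b, which the abstract itself does not spell out. The gene and TF counts are synthetic placeholders, not data from the paper, and `fit_power_law` is a hypothetical helper name. Under this assumed model, an exponent b greater than 1 corresponds to a faster-than-linear increase.

```python
import math

def fit_power_law(gene_counts, tf_counts):
    """Least-squares fit of log(tf) = log(a) + b * log(genes); returns (a, b)."""
    xs = [math.log(g) for g in gene_counts]
    ys = [math.log(t) for t in tf_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line in log-log space is the scaling exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic, hypothetical genomes generated from tf = 1e-4 * genes**1.8
# (assumed values chosen only to exercise the fit).
genes = [1000, 2000, 4000, 8000]
tfs = [1e-4 * g ** 1.8 for g in genes]
a, b = fit_power_law(genes, tfs)
print(f"exponent b = {b:.2f}")
```

With real repertoires, the same fit would be run separately on prokaryotic and eukaryotic genomes so that the two exponents can be compared.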

PubMed ID: 20675356

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: Medical Research Council, United Kingdom
    Id: MC_U105161047

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


FlyTF.org (tool)

RRID:SCR_004123

A database of genomic and protein data for Drosophila site-specific transcription factors.

Gene Regulation Databases (tool)

RRID:SCR_008033

In an effort to strongly support the collaborative nature of scientific research, BIOBASE offers academic and non-profit organizations free access to reduced-functionality versions of its products. TRANSFAC Professional provides gene regulation analysis solutions, offering the most comprehensive collection of eukaryotic gene regulation data. The paid professional subscription gives customers access to up-to-date data and tools not available in the free version. The public databases currently available for academic and non-profit organizations are:

  • TRANSFAC: contains data on transcription factors, their experimentally proven binding sites, and regulated genes. Its broad compilation of binding sites allows the derivation of positional weight matrices.
  • TRANSPATH: provides data about molecules participating in signal transduction pathways and the reactions they are involved in, resulting in a complex network of interconnected signaling components. TRANSPATH focuses on signaling cascades that change the activities of transcription factors and thus alter the gene expression profile of a given cell.
  • PathoDB: a database of pathologically relevant mutated forms of transcription factors and their binding sites. It comprises numerous cases of defective transcription factors or mutated transcription factor binding sites that are known to cause pathological defects.
  • S/MARt DB: presents data on scaffold or matrix attached regions (S/MARs) of eukaryotic genomes, as well as on the proteins that bind to them. S/MARs are genomic DNA sequences through which the chromatin is tightly attached to the proteinaceous scaffold of the nucleus. The view that S/MARs organize the chromatin in the form of functionally independent loop domains has gained increasing support.
  • TRANSCompel: a database of composite regulatory elements affecting gene transcription in eukaryotes. Composite regulatory elements consist of two closely situated binding sites for distinct transcription factors, and provide cross-coupling of different signaling pathways.
  • PathoSign Public: a database that collects information about defective cell signaling molecules causing human diseases. While constituting a useful data repository in itself, PathoSign is also aimed at being a foundational part of a platform for modeling human disease processes.
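The TRANSFAC description above mentions deriving positional weight matrices from compiled binding sites. A minimal sketch of that derivation follows. The aligned sites, the pseudocount of 1, and the uniform background frequency of 0.25 are illustrative assumptions, not values taken from TRANSFAC, and both function names are hypothetical.

```python
import math

BASES = "ACGT"

def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Return one dict per alignment column mapping base -> log2 odds score."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for i in range(length):
        column = [s[i] for s in sites]
        scores = {}
        for base in BASES:
            # Pseudocounts keep unseen bases from producing log(0).
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        matrix.append(scores)
    return matrix

def score(matrix, sequence):
    """Sum per-position scores for a sequence as long as the matrix."""
    return sum(col[base] for col, base in zip(matrix, sequence))

# Hypothetical aligned binding sites (assumed, not real TRANSFAC entries).
sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
pwm = position_weight_matrix(sites)
print(round(score(pwm, "TATAAA"), 2), round(score(pwm, "GCGCGC"), 2))
```

A sequence resembling the compiled sites scores higher than an unrelated one, which is the property that makes such a matrix usable for scanning regulatory regions.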

PROSITE (tool)

RRID:SCR_003457

Database of protein families and domains that is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. ScanProsite finds matches of your protein sequences to PROSITE signatures. PROSITE currently contains patterns and profiles specific for more than a thousand protein families or domains. Each of these signatures comes with documentation providing background information on the structure and function of these proteins. The database is available via FTP.
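One simple way to match a PROSITE pattern against a sequence is to translate it into a regular expression. The sketch below is a simplified illustration, not the ScanProsite implementation: `prosite_to_regex` is a hypothetical helper that handles only the common syntax elements (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, `<`, `>`), and the test sequence is made up. The pattern used is the N-glycosylation site pattern (PS00001).

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":               # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # [..] alternatives and single residues are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "MKNGTAANPSA"  # made-up sequence for illustration
# Note: finditer reports non-overlapping matches only.
hits = [m.start() for m in re.finditer(regex, sequence)]
print(regex, hits)
```

Here the first asparagine starts a match, while the second is followed by a proline and is therefore rejected by the `{P}` exclusion.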

SUPERFAMILY (tool)

RRID:SCR_007952

SUPERFAMILY is a database of structural and functional protein annotations for all completely sequenced organisms. The SUPERFAMILY annotation is based on a collection of hidden Markov models, which represent structural protein domains at the SCOP superfamily level. A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from over 1,700 completely sequenced genomes against the hidden Markov models.
