An integrative genomic approach to uncover molecular mechanisms of prokaryotic traits.

PLoS computational biology | 2006

With the mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been made to analyze prokaryotic phenotypes, current computational technologies either lack the high-throughput capacity needed for genome-scale analysis or are limited in their ability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes, and gene functions has not been possible. Here, we developed a high-throughput computational approach and demonstrated for the first time the feasibility of integrating large quantities of prokaryotic phenotypes with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions, and cellular processes). Applying this method to 59 fully sequenced prokaryotic species, we identified the genetic basis and molecular mechanisms underlying bacterial phenotypes. We identified 3,711 significant correlations between 1,499 distinct Pfam domains and 63 phenotypes, comprising 2,650 positive correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%-42%; n = 50). We stratified the 478 most significant predictions and subjected 100 of them to manual evaluation, of which 60 were corroborated in the literature. We furthermore uncovered 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts, evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%-80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms that span multiple biological scales and are reused by related phenotypes (metaphenotypes).
We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise for facilitating a computable systems biology approach to genomic and biomedical research.
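The abstract does not state which statistic underlies its "significant correlations" or how its confidence intervals were computed. As a hypothetical sketch only, one common way to score the association between a Pfam domain and a phenotype across species is the phi coefficient (Pearson correlation on 0/1 presence/absence profiles), whose sign separates correlations from anti-correlations; a Wilson score interval is one standard way to attach a 95% confidence interval to a precision estimated from a manually evaluated sample such as n = 50. The function names, the toy profiles, and both statistical choices are assumptions for illustration, not the authors' method, and significance testing (e.g., permutation or multiple-testing correction) is omitted.

```python
import math
from typing import Sequence, Tuple


def phi_coefficient(domain_present: Sequence[int], phenotype_present: Sequence[int]) -> float:
    """Phi coefficient between two binary profiles over the same species.

    Hypothetical scoring choice: positive values suggest a correlation,
    negative values an anti-correlation. Returns 0.0 when either profile
    is constant (the coefficient is undefined in that case).
    """
    if len(domain_present) != len(phenotype_present):
        raise ValueError("profiles must cover the same species")
    # Build the 2x2 contingency table of domain x phenotype.
    n11 = n10 = n01 = n00 = 0
    for d, p in zip(domain_present, phenotype_present):
        if d and p:
            n11 += 1
        elif d:
            n10 += 1
        elif p:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Toy presence/absence profiles over six hypothetical species.
    domain = [1, 1, 1, 0, 0, 0]
    print(phi_coefficient(domain, [1, 1, 1, 0, 0, 0]))  # perfectly correlated
    print(phi_coefficient(domain, [0, 0, 0, 1, 1, 1]))  # perfectly anti-correlated
    # Precision interval if 15 of 50 sampled predictions were confirmed.
    print(wilson_interval(15, 50))
```

The Wilson interval is used here because it behaves better than the simple normal approximation at small n; it is not guaranteed to reproduce the intervals reported in the abstract.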

PubMed ID: 17112314

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: NLM NIH HHS, United States
    Id: K22 LM008308-03
  • Agency: NCI NIH HHS, United States
    Id: 1U54CA121852-01A1
  • Agency: NCI NIH HHS, United States
    Id: U54 CA121852
  • Agency: NIAID NIH HHS, United States
    Id: U54 AI057158
  • Agency: NLM NIH HHS, United States
    Id: 1K22 LM008308
  • Agency: NIAID NIH HHS, United States
    Id: 5U54 AI057158-02
  • Agency: NLM NIH HHS, United States
    Id: K22 LM008308

This is a list of tools and resources that we have found mentioned in this publication.


KEGG (tool)

RRID:SCR_012773

Integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information. In particular, gene catalogs of completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism, and the ecosystem. Analysis tools are also available. KEGG may be used as a reference knowledge base for the biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.


Kyoto Encyclopedia of Genes and Genomes Expression Database (tool)

RRID:SCR_001120

Database for mapping gene expression profiles to pathways and genomes. Repository of microarray gene expression profile data for Synechocystis PCC6803 (syn), Bacillus subtilis (bsu), Escherichia coli W3110 (ecj), Anabaena PCC7120 (ana), and other species contributed by the Japanese research community.


PlantCyc (tool)

RRID:SCR_002110

Multi-species reference database: a comprehensive plant biochemical pathway database containing curated information from the literature and from computational analyses about genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism.