Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction.

Molecular & cellular proteomics : MCP | 2017

Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this "guilt-by-association" (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies.
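The "guilt-by-association" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not describe how the networks were built or scored, so the Pearson correlation, the k-nearest-neighbor voting rule, the gene names, and the expression values below are all hypothetical, chosen only to show how coexpression can suggest a function for an unannotated gene.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def gba_score(gene, profiles, annotated, k=2):
    """Fraction of the k most coexpressed genes that carry the annotation.

    A simple neighbor-voting rule (an assumption for illustration).
    """
    others = [g for g in profiles if g != gene]
    ranked = sorted(
        others,
        key=lambda g: pearson(profiles[gene], profiles[g]),
        reverse=True,
    )
    return sum(g in annotated for g in ranked[:k]) / k

# Hypothetical expression profiles (one value per sample); not real data.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],
    "geneC": [1.2, 1.9, 3.1, 4.2, 4.8, 6.1],
    "geneD": [6.0, 1.0, 5.0, 2.0, 4.0, 3.0],
    "geneE": [5.8, 1.1, 5.2, 2.1, 3.9, 3.2],
}
# Genes assumed to be already annotated with the function of interest.
annotated = {"geneA", "geneB"}

score_c = gba_score("geneC", profiles, annotated)
score_d = gba_score("geneD", profiles, annotated)
print(score_c, score_d)
```

Here geneC tracks the two annotated genes closely, so both of its nearest neighbors carry the annotation and it scores higher than geneD, whose closest partner is the unannotated geneE. The same scoring could be run on mRNA profiles and on protein profiles to compare which data type separates annotated from unannotated genes more cleanly.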

PubMed ID: 27836980

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: NCI NIH HHS, United States
    Id: U24 CA160034
  • Agency: NCI NIH HHS, United States
    Id: U24 CA159988
  • Agency: NIAID NIH HHS, United States
    Id: U19 AI109965
  • Agency: NCI NIH HHS, United States
    Id: U24 CA160036
  • Agency: NCI NIH HHS, United States
    Id: U24 CA160019
  • Agency: NCI NIH HHS, United States
    Id: U24 CA160035

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions, see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


PlantCyc (tool)

RRID:SCR_002110

Multi species reference database. Comprehensive plant biochemical pathway database, containing curated information from literature and computational analyses about genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism.


Gene Ontology (tool)

RRID:SCR_002811

Computable knowledge regarding functions of genes and gene products. GO resources include biomedical ontologies that cover molecular domains of all life forms as well as extensive compilations of gene product annotations to these ontologies that provide largely species-neutral, comprehensive statements about what gene products do. Used to standardize representation of gene and gene product attributes across species and databases.


The Cancer Genome Atlas (tool)

RRID:SCR_003193

Project exploring the spectrum of genomic changes involved in more than 20 types of human cancer that provides a platform for researchers to search, download, and analyze data sets generated. As a pilot project it confirmed that an atlas of changes could be created for specific cancer types. It also showed that a national network of research and technology teams working on distinct but related projects could pool the results of their efforts, create an economy of scale and develop an infrastructure for making the data publicly accessible. Its success committed resources to collect and characterize more than 20 additional tumor types.

Components of the TCGA Research Network:

  • Biospecimen Core Resource (BCR): Tissue samples are carefully cataloged, processed, checked for quality and stored, complete with important medical information about the patient.
  • Genome Characterization Centers (GCCs): Several technologies will be used to analyze genomic changes involved in cancer. The genomic changes that are identified will be further studied by the Genome Sequencing Centers.
  • Genome Sequencing Centers (GSCs): High-throughput Genome Sequencing Centers will identify the changes in DNA sequences that are associated with specific types of cancer.
  • Proteome Characterization Centers (PCCs): The centers, a component of NCI's Clinical Proteomic Tumor Analysis Consortium, will ascertain and analyze the total proteomic content of a subset of TCGA samples.
  • Data Coordinating Center (DCC): The information that is generated by TCGA will be centrally managed at the DCC and entered into the TCGA Data Portal and Cancer Genomics Hub as it becomes available. Centralization of data facilitates data transfer between the network and the research community, and makes data analysis more efficient. The DCC manages the TCGA Data Portal.
  • Cancer Genomics Hub (CGHub): Lower level sequence data will be deposited into a secure repository. This database stores cancer genome sequences and alignments.
  • Genome Data Analysis Centers (GDACs): Immense amounts of data from array and second-generation sequencing technologies must be integrated across thousands of samples. These centers will provide novel informatics tools to the entire research community to facilitate broader use of TCGA data.

TCGA is actively developing a network of collaborators who are able to provide samples that are collected retrospectively (tissues that had already been collected and stored) or prospectively (tissues that will be collected in the future).


FISHER (tool)

RRID:SCR_009181

THIS RESOURCE IS NO LONGER IN SERVICE, documented on February 1st, 2022. Software application for genetic analysis of classical biometric traits like blood pressure or height that are caused by a combination of polygenic inheritance and complex environmental forces. (entry from Genetic Analysis Software)


KEGG (tool)

RRID:SCR_012773

Integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information. In particular, gene catalogs in completely sequenced genomes are linked to higher-level systemic functions of cell, organism, and ecosystem. Analysis tools are also available. KEGG may be used as reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.


biomaRt (tool)

RRID:SCR_019214

Software package that integrates BioMart data resources with data analysis software in Bioconductor. Can annotate range of gene or gene product identifiers including Entrez Gene and Affymetrix probe identifiers with information such as gene symbol, chromosomal coordinates, Gene Ontology and OMIM annotation. Enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis.
