Proteome-wide Structural Analysis of PTM Hotspots Reveals Regulatory Elements Predicted to Impact Biological Function and Disease.

Molecular & cellular proteomics : MCP | 2016

Post-translational modifications (PTMs) regulate protein behavior through modulation of protein-protein interactions, enzymatic activity, and protein stability essential in the translation of genotype to phenotype in eukaryotes. Currently, less than 4% of all eukaryotic PTMs are reported to have biological function - a statistic that continues to decrease with an increasing rate of PTM detection. Previously, we developed SAPH-ire (Structural Analysis of PTM Hotspots) - a method for the prioritization of PTM function potential that has been used effectively to reveal novel PTM regulatory elements in discrete protein families (Dewhurst et al., 2015). Here, we apply SAPH-ire to the set of eukaryotic protein families containing experimental PTM and 3D structure data - capturing 1,325 protein families with 50,839 unique PTM sites organized into 31,747 modified alignment positions (MAPs), of which 2010 (∼6%) possess known biological function. Here, we show that using an artificial neural network model (SAPH-ire NN) trained to identify MAP hotspots with biological function results in prediction outcomes that far surpass the use of single hotspot features, including nearest neighbor PTM clustering methods. We find the greatest enhancement in prediction for positions with PTM counts of five or less, which represent 98% of all MAPs in the eukaryotic proteome and 90% of all MAPs found to have biological function. Analysis of the top 1092 MAP hotspots revealed 267 of truly unknown function (containing 5443 distinct PTMs). Of these, 165 hotspots could be mapped to human KEGG pathways for normal and/or disease physiology. Many high-ranking hotspots were also found to be disease-associated pathogenic sites of amino acid substitution despite the lack of observable PTM in the human protein family member. 
Taken together, these experiments demonstrate that the functional relevance of a PTM can be predicted very effectively by neural network models, revealing a large but testable body of potential regulatory elements that impact hundreds of different biological processes important in eukaryotic biology and human health.

PubMed ID: 27697855

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: NIGMS NIH HHS, United States
    Id: R00 GM094533
  • Agency: NIGMS NIH HHS, United States
    Id: R01 GM117400

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


UniProt (tool)

RRID:SCR_002380

Collection of data of protein sequence and functional information. Resource for protein sequence and annotation data. Consortium for preservation of the UniProt databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), and UniProt Archive (UniParc), UniProt Proteomes. Collaboration between European Bioinformatics Institute (EMBL-EBI), SIB Swiss Institute of Bioinformatics and Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.

InterPro (tool)

RRID:SCR_006695

Service providing functional analysis of proteins by classifying them into families and predicting domains and important sites. It combines protein signatures from a number of member databases into a single searchable resource, capitalizing on their individual strengths to produce a powerful integrated database and diagnostic tool. This integrated database of predictive protein signatures is used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures. The data can be accessed programmatically via Web Services. The member databases use a number of approaches:

  • ProDom: provider of sequence clusters built from UniProtKB using PSI-BLAST.
  • PROSITE patterns: provider of simple regular expressions.
  • PROSITE and HAMAP profiles: providers of sequence matrices.
  • PRINTS: provider of fingerprints, which are groups of aligned, unweighted Position Specific Sequence Matrices (PSSMs).
  • PANTHER, PIRSF, Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY: providers of hidden Markov models (HMMs).

Your contributions are welcome. You are encouraged to use the "Add your annotation" button on InterPro entry pages to suggest updated or improved annotation for individual InterPro entries.

KEGG (tool)

RRID:SCR_012773

Integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information. In particular, gene catalogs in completely sequenced genomes are linked to higher-level systemic functions of cell, organism, and ecosystem. Analysis tools are also available. KEGG may be used as reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.

Human Gene Mutation Database (tool)

RRID:SCR_001621

Curated database of known (published) gene lesions responsible for human inherited disease.

ClinVar (tool)

RRID:SCR_006169

Archive of aggregated information about sequence variation and its relationship to human health. Provides reports of relationships among human variations and phenotypes along with supporting evidence. Submissions from clinical testing labs, research labs, locus-specific databases, expert panels and professional societies are welcome. Collects reports of variants found in patient samples, assertions made regarding their clinical significance, information about submitter, and other supporting data. Alleles described in submissions are mapped to reference sequences, and reported according to HGVS standard.

MUSCLE (tool)

RRID:SCR_011812

Multiple sequence alignment method with reduced time and space complexity. Multiple sequence alignment with high accuracy and high throughput. Data analysis service for multiple sequence comparison by log-expectation.
