The low complexity of minimotif patterns results in a high false-positive prediction rate, hampering protein function prediction. A multi-filter algorithm, trained and tested on a linear regression model, support vector machine model, and neural network model, using a large dataset of verified minimotifs, vastly improves minimotif prediction accuracy while generating few false positives. An optimal threshold for the best accuracy reaches an overall accuracy above 90%, while a stringent threshold for the best specificity generates less than 1% false positives or even no false positives and still produces more than 90% true positives for the linear regression and neural network models. The minimotif multi-filter with its excellent accuracy represents the state-of-the-art in minimotif prediction and is expected to be very useful to biologists investigating protein function and how missense mutations cause disease.
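The two operating points described in the abstract (a threshold chosen for best overall accuracy and a stricter one chosen to keep false positives under 1%) can be illustrated with a minimal sketch. This is not the authors' multi-filter code: the scores and labels below are made-up toy values, and every function name is hypothetical. It assumes only that each candidate minimotif receives a numeric score from some model and that a higher score means a more likely true motif.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for score, is_true_motif in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_true_motif:
            tp += 1
        elif predicted and not is_true_motif:
            fp += 1
        elif not predicted and is_true_motif:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(tp, fp, tn, fn):
    return fp / (fp + tn) if (fp + tn) else 0.0


def true_positive_rate(tp, fp, tn, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def best_accuracy_threshold(scores, labels):
    """Observed score that maximizes accuracy (lowest one wins on ties)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: accuracy(*confusion(scores, labels, t)))


def stringent_threshold(scores, labels, max_fpr=0.01):
    """Lowest observed score whose false-positive rate is at most max_fpr."""
    for t in sorted(set(scores)):
        if false_positive_rate(*confusion(scores, labels, t)) <= max_fpr:
            return t
    # No observed score is strict enough: reject everything.
    return float("inf")


# Toy data (assumed, not from the paper): 1 = verified motif, 0 = false hit.
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

t_acc = best_accuracy_threshold(scores, labels)
t_strict = stringent_threshold(scores, labels, max_fpr=0.01)

for name, t in (("best accuracy", t_acc), ("stringent", t_strict)):
    counts = confusion(scores, labels, t)
    print(name, t, accuracy(*counts), false_positive_rate(*counts),
          true_positive_rate(*counts))
```

On this toy data the stricter threshold removes the one false positive at the cost of one true positive, which is the trade-off the abstract describes between the two thresholds.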
PubMed ID: 23029121
MnM analyzes protein queries for the presence of short functional motifs that, in at least one protein, have been demonstrated to be involved in posttranslational modifications (PTM), binding to other proteins, nucleic acids, or small molecules, or protein trafficking. The low sequence complexity of motifs suggests that "false positive" motifs may occur, so any prediction made by MnM should be experimentally tested. To aid in the selection of motifs, MnM ranks motifs based on frequencies in proteomes, protein surface prediction, and evolutionary conservation. Using annotation of motifs in the Swiss-Prot database, we have found that higher scores are globally correlated with experimentally validated motifs when compared to a similar analysis using randomized motifs with the same amino acid composition. We suggest that the known biology of the protein of interest and of the motifs be used in selecting motifs for experimental study.
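To make the scanning step and the randomized control concrete, here is a minimal sketch. It is not MnM's implementation or scoring: the query sequence is invented, the function names are hypothetical, and the motif is written as a plain regular expression, using the widely cited `PxxP` consensus (`P..P`) purely as an illustration. The shuffle keeps the motif's composition and changes only the order of its positions, a simplified stand-in for the randomized motifs mentioned above.

```python
import random
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every match, including overlaps."""
    # A lookahead with a capture group lets matches overlap.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


def shuffle_motif(tokens, rng):
    """Return the same motif positions in random order (same composition)."""
    permuted = list(tokens)
    rng.shuffle(permuted)
    return permuted


# Hypothetical query sequence; not a real protein.
query = "MAPPLPPRPSS"

# Motif as a list of positions so it can be permuted; "." matches any residue.
motif_tokens = ["P", ".", ".", "P"]

hits = find_motif(query, "".join(motif_tokens))
print(hits)

# Control: count hits for a composition-preserving shuffle of the motif.
rng = random.Random(0)
control_tokens = shuffle_motif(motif_tokens, rng)
control_hits = find_motif(query, "".join(control_tokens))
print("".join(control_tokens), len(control_hits))
```

A short, degenerate pattern like this matches several overlapping sites even in an eleven-residue sequence, which is why the text recommends ranking hits and confirming them experimentally.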