Making proteomics data accessible and reusable: current state of proteomics databases and repositories.

Proteomics | 2015

Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has been recently developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we will review each of the major proteomics resources independently and some tools that enable the integration, mining and reuse of the data. We will also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.

PubMed ID: 25158685

Associated grants

  • Agency: Biotechnology and Biological Sciences Research Council, United Kingdom
    Id: BB/I000909/1
  • Agency: Biotechnology and Biological Sciences Research Council, United Kingdom
    Id: BB/I00095X/1
  • Agency: Wellcome Trust, United Kingdom
    Id: WT101477MA
  • Agency: Biotechnology and Biological Sciences Research Council, United Kingdom
    Id: BB/K01997X/1

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


UniProt (tool)

RRID:SCR_002380

Collection of data of protein sequence and functional information. Resource for protein sequence and annotation data. Consortium for preservation of the UniProt databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), and UniProt Archive (UniParc), UniProt Proteomes. Collaboration between European Bioinformatics Institute (EMBL-EBI), SIB Swiss Institute of Bioinformatics and Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.


PeptideShaker (tool)

RRID:SCR_002520

Software providing a search engine independent platform for visualization of peptide and protein identification results from multiple search engines, currently supporting X!Tandem, MS-GF+, MS Amanda, OMSSA, MyriMatch, Comet, Tide, Mascot and mzIdentML. By combining the results from multiple search engines, while re-calculating PTM localization scores and redoing the protein inference, PeptideShaker attempts to give you the best possible understanding of your proteomics data.


Human Proteinpedia (tool)

RRID:SCR_002948

A community portal for sharing and integration of human protein data that allows research laboratories to contribute and maintain protein annotations. The Human Protein Reference Database (HPRD) integrates data that is deposited along with the existing literature curated information in the context of an individual protein. Data pertaining to post-translational modifications, protein-protein interactions, tissue expression, expression in cell lines, subcellular localization and enzyme substrate relationships can be submitted.


Proteomics Identifications (PRIDE) (tool)

RRID:SCR_003411

Centralized, standards-compliant, public data repository for proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence. Originally it was developed to provide a common data exchange format and repository to support proteomics literature publications. This remit has grown with PRIDE, with the hope that PRIDE will provide a reference set of tissue-based identifications for use by the community. The future development of PRIDE has become closely linked to HUPO PSI. PRIDE encourages and welcomes direct user submissions of protein and peptide identification data to be published in peer-reviewed publications. Users may browse public datasets, use PRIDE BioMart for custom queries, or download the data directly from the FTP site. PRIDE has been developed through a collaboration of the EMBL-EBI, Ghent University in Belgium, and the University of Manchester.


ProteomeXchange (tool)

RRID:SCR_004055

A data repository for proteomic data sets. The ProteomeXchange consortium, as a whole, aims to provide a coordinated submission of MS proteomics data to the main existing proteomics repositories, as well as to encourage optimal data dissemination. ProteomeXchange provides access to a number of public databases, and users can access and submit data sets to the consortium's PRIDE database and PASSEL/PeptideAtlas.


MOPED - Model Organism Protein Expression Database (tool)

RRID:SCR_006065

An expanding multi-omics resource that enables rapid browsing of gene and protein expression information from publicly available studies on humans and model organisms. MOPED also serves the greater research community by enabling users to visualize their own expression data, compare it with existing studies, and share it with others via private accounts. MOPED uniquely provides gene and protein level expression data, meta-analysis capabilities and quantitative data from standardized analysis utilizing SPIRE (Systematic Protein Investigative Research Environment). Data can be queried for specific genes and proteins; browsed based on organism, tissue, localization and condition; and sorted by false discovery rate and expression. MOPED links to various gene, protein, and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED (MOPED 2.5, 2014) contains approximately 5 million total records including ~260 experiments and ~390 conditions.


DELSA (tool)

RRID:SCR_006231

THIS RESOURCE IS NO LONGER IN SERVICE. Documented on July 1, 2022. Organization whose mission is to build and promote a sustainable ecosystem of professional societies, funding agencies, foundations, companies, and citizens together with life science researchers and innovators in computing, infrastructure and analysis with the expressed goal of translating new discoveries into tools, resources and products.


PeptideAtlas (tool)

RRID:SCR_006783

Multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments. Mass spectrometer output files are collected for human, mouse, yeast, and several other organisms, and searched using the latest search engines and protein sequences. All results of sequence and spectral library searching are subsequently processed through the Trans-Proteomic Pipeline to derive a probability of correct identification for all results in a uniform manner to ensure a high quality database, along with false discovery rates at the whole atlas level. The raw data, search results, and full builds can be downloaded for other uses. All results of sequence searching are processed through PeptideProphet to derive a probability of correct identification for all results in a uniform manner, ensuring a high quality database. All peptides are mapped to Ensembl and can be viewed as custom tracks on the Ensembl genome browser. The long term goal of the project is full annotation of eukaryotic genomes through a thorough validation of expressed proteins. The PeptideAtlas provides a method and a framework to accommodate proteome information coming from high-throughput proteomics technologies. The online database administers experimental data in the public domain. You are encouraged to contribute to the database.


neXtProt (tool)

RRID:SCR_008911

Human protein knowledge platform. It selects and filters high-throughput data pertinent to human proteins from UniProtKB, and extends UniProtKB/Swiss-Prot annotations for human proteins to include several new data types.


Amazon Web Services (tool)

RRID:SCR_012854

IT infrastructure services for businesses in the form of web services, now commonly known as cloud computing. This highly reliable, scalable, low-cost infrastructure platform in the cloud powers hundreds of thousands of businesses. With data center locations in the U.S., Europe, Singapore, and Japan, customers across all industries are taking advantage of the following benefits: * Low cost * Agility and Instant Elasticity * Open and Flexible * Secure
