AntiFam: a tool to help identify spurious ORFs in protein annotation.

Database : the journal of biological databases and curation | 2012

As the deluge of genomic DNA sequence grows, the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories able to sequence genomes in a high-throughput manner grows, the informatics capability of those labs to accurately identify and annotate all genes within a genome may often be lacking. These issues have led to fears that transitive annotation errors are making sequence databases less reliable. During the lifetime of the Pfam protein families database, a number of protein families were built that were later identified as being composed solely of spurious open reading frames (ORFs), either on the opposite strand or in a different, overlapping reading frame with respect to the true protein-coding or non-coding RNA gene. These families were deleted and are no longer available in Pfam. However, we realized that they may serve a useful function in identifying new spurious ORFs. We have collected these families together in AntiFam, along with additional custom-made families of spurious ORFs. This resource currently contains 23 families that identified 1310 spurious proteins in UniProtKB and a further 4119 spurious proteins in a collection of metagenomic sequences. UniProt has adopted AntiFam as part of the UniProtKB quality control process and will investigate these spurious proteins for exclusion.

PubMed ID: 22434837

Associated grants

  • Agency: NHGRI NIH HHS, United States
    Id: R01 HG004881
  • Agency: Wellcome Trust, United Kingdom
    Id: WT077044/Z/05/Z

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


Pompep (tool)

RRID:SCR_010536

FTP site to access Schizosaccharomyces pombe protein data.

Wellcome Trust Sanger Institute; Hinxton; United Kingdom (tool)

RRID:SCR_011784

Nonprofit research organization focused on genome sequencing to advance understanding of the biology of humans and pathogens in order to improve human health globally. It provides data that can be translated into diagnostics, treatments, or therapies, including over 100 finished genomes that can be downloaded. Data are publicly available on a limited basis and are provided more extensively upon request.
