Improving protein structure similarity searches using domain boundaries based on conserved sequence information.

BACKGROUND: The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method used to compute related protein structures in the NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we investigated how alternative definitions of domains, derived from conserved sequence alignments in the Conserved Domain Database (CDD), would affect the domain comparisons and structure similarity search performance of VAST.

RESULTS: Alternative domains, which have significantly different secondary structure composition from those based on structurally compact units, were identified based on the alignment footprints of curated protein sequence domain families. Our analysis indicates that domain boundaries disagree on roughly 8% of protein chains in the medium redundancy subset of the Molecular Modeling Database (MMDB). These conflicting sequence-based domain boundaries perform slightly better than structure domains in structure similarity searches, and there are interesting cases where structure similarity search performance is markedly improved.

CONCLUSION: Structure similarity searches using domain boundaries based on conserved sequence information can provide an additional method for investigators to identify interesting similarities between proteins with known structures. Because of the improvement in performance of structure similarity searches using sequence domain boundaries, we are in the process of implementing their inclusion into the VAST search and MMDB resources in the NCBI Entrez system.
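
To make the notion of disagreeing domain boundaries concrete, the following is a minimal sketch of one way two domain definitions on the same chain could be compared. It is not the method used in the study: the residue-overlap (Jaccard) measure, the function names, and the 0.8 threshold are all assumptions chosen for illustration, and the abstract does not state how disagreement was actually scored.

```python
from typing import List, Set, Tuple

# A domain is described as a list of inclusive residue ranges on one chain,
# e.g. [(1, 100)] or a discontinuous domain [(1, 40), (90, 130)].
Range = Tuple[int, int]


def residues(ranges: List[Range]) -> Set[int]:
    """Expand inclusive (start, end) ranges into a set of residue positions."""
    covered: Set[int] = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return covered


def jaccard_overlap(domain_a: List[Range], domain_b: List[Range]) -> float:
    """Fraction of residues shared by both domains over residues in either."""
    res_a, res_b = residues(domain_a), residues(domain_b)
    union = res_a | res_b
    if not union:
        return 1.0  # two empty definitions trivially agree
    return len(res_a & res_b) / len(union)


def boundaries_disagree(
    structure_domain: List[Range],
    sequence_domain: List[Range],
    min_overlap: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> bool:
    """Flag a pair of domain definitions whose residue overlap is below the cutoff."""
    return jaccard_overlap(structure_domain, sequence_domain) < min_overlap


if __name__ == "__main__":
    # Hypothetical chain: a compact structural domain versus a longer
    # sequence-alignment footprint covering the same region.
    structure = [(1, 100)]
    sequence = [(1, 150)]
    print(jaccard_overlap(structure, sequence))
    print(boundaries_disagree(structure, sequence))
```

Under these assumptions, a chain would count as conflicting when the structure-based and sequence-based definitions share too few residues; a different overlap measure or cutoff would change which chains are flagged.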

Pubmed ID: 19454035

Authors

  • Thompson KE
  • Wang Y
  • Madej T
  • Bryant SH

Journal

BMC structural biology

Publication Data

June 10, 2009

Associated Grants

  • Agency: Intramural NIH HHS, Id:

Mesh Terms

  • Algorithms
  • Amino Acid Sequence
  • Computational Biology
  • Conserved Sequence
  • DNA Topoisomerases
  • Databases, Protein
  • Fibronectins
  • Humans
  • Protein Structure, Tertiary
  • Sequence Alignment
  • Sequence Analysis, Protein