EVEREST is an automatic process for identifying and classifying protein domains. Users can search for specific proteins by protein ID or name, browse protein families, and upload or download protein sequence data. EVEREST combines methods from finite metric spaces, machine learning, and statistical modeling, and achieves state-of-the-art results. The process begins by constructing a database of protein segments that emerge from an all-vs.-all pairwise sequence comparison. It then clusters these segments into putative domain families, selects the best putative families using machine learning techniques, and builds a statistical model for each selected family. This procedure is then iterated: the statistical models are used to scan all protein sequences, recreate the segment database, and cluster the segments again. Performance was evaluated by comparison with Pfam and SCOP.
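The iterative loop described above can be illustrated with a toy sketch. It is not EVEREST's implementation, and every simplification is an assumption: segments are fixed-length windows, the all-vs.-all comparison is replaced by a Hamming-distance test, clustering is single linkage in that finite metric space, the machine-learning selection is replaced by a hypothetical rule that keeps clusters spanning at least `min_proteins` proteins, and each "statistical model" is a majority-vote consensus string. All names (`W`, `max_dist`, `min_proteins`, `iterate_families`) are hypothetical.

```python
from collections import Counter

W = 5  # hypothetical fixed segment length (real domains vary in length)


def hamming(a: str, b: str) -> int:
    """Hamming distance: a metric on equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(seqs):
    """All length-W windows as (protein_id, offset, text) tuples."""
    return [(name, i, s[i:i + W])
            for name, s in seqs.items() for i in range(len(s) - W + 1)]


def initial_segments(seqs, max_dist):
    """Toy stand-in for all-vs.-all comparison: keep windows that nearly
    match a window from a different protein."""
    ws = windows(seqs)
    kept = {a for a in ws for b in ws
            if a[0] != b[0] and hamming(a[2], b[2]) <= max_dist}
    return sorted(kept)


def single_linkage(segments, max_dist):
    """Single-linkage clustering of segments via union-find."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if hamming(segments[i][2], segments[j][2]) <= max_dist:
                parent[find(i)] = find(j)
    clusters = {}
    for i, seg in enumerate(segments):
        clusters.setdefault(find(i), []).append(seg)
    return list(clusters.values())


def select(clusters, min_proteins):
    """Hypothetical stand-in for the machine-learning step: keep clusters
    that span at least min_proteins distinct proteins."""
    return [c for c in clusters if len({seg[0] for seg in c}) >= min_proteins]


def consensus(cluster):
    """Hypothetical stand-in for a statistical model: majority residue
    in each column of the cluster's segments."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*(seg[2] for seg in cluster)))


def scan(seqs, models, max_dist):
    """Rebuild the segment database from windows that match any model."""
    return sorted({w for w in windows(seqs)
                   for m in models if hamming(w[2], m) <= max_dist})


def iterate_families(seqs, max_dist=1, min_proteins=3, n_iter=5):
    """Cluster, select, model, rescan; repeat until the models stop changing."""
    segments = initial_segments(seqs, max_dist)
    models = []
    for _ in range(n_iter):
        chosen = select(single_linkage(segments, max_dist), min_proteins)
        new_models = sorted(consensus(c) for c in chosen)
        if new_models == models:  # family models are stable: stop iterating
            break
        models = new_models
        segments = scan(seqs, models, max_dist)
    return models
```

For example, on three toy sequences sharing a WHEAT-like motif, `iterate_families({"p1": "AAAWHEATAAA", "p2": "CCCWHEATCCC", "p3": "GGGWHEAVGGG"})` returns `['AWHEA', 'WHEAT']`: the cluster found in only two proteins is discarded by the selection rule, and the models are unchanged after one rescan, so the loop stops.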
Resource Type: Resource
Version: Latest Version
protein domain, protein domain classification, protein domain identification
Created 2 weeks ago by Christie Wang
Created 5 years ago by Anonymous