PhyloFacts contains pre-calculated structural and phylogenomic analyses of over 57,000 protein families and domains. The resource includes "books" for protein families across the Tree of Life. Each book includes a multiple sequence alignment, one or more phylogenetic trees, predicted subfamilies, predicted 3D protein structures, active sites and other key residues, cellular localization, and Gene Ontology (GO) annotations and evidence codes. PhyloFacts also includes hidden Markov models for classifying user-submitted sequences (DNA or protein) into protein families and subfamilies across the Tree of Life. Our primary current focus is on covering all gene families represented in the human genome and all structural domains, but we plan to expand the resource to include all proteins in all species. The protein families in this resource typically contain homologs from many species. The phylogenetic distribution of a protein family can range from highly restricted (e.g., to Hominidae or mammals) to spanning the entire Tree of Life. Gathering homologs from many divergent species enables us to take advantage of experimental investigations in different systems and allows powerful inferences of function and structure that might not otherwise be possible.