Amino acid substitution models are a key component in phylogenetic analyses of protein sequences. All commonly used amino acid models available to date are time-reversible, an assumption made for computational convenience rather than for biological realism. Another significant downside of time-reversible models is that they do not allow inference of rooted trees without outgroups. In this article, we introduce nQMaker, a maximum likelihood approach that extends the recently published QMaker method to estimate time-nonreversible amino acid substitution models and rooted phylogenetic trees from a set of protein sequence alignments. We show that the nonreversible models estimated with nQMaker fit empirical alignments much better than pre-existing reversible models across a wide range of data sets, including mammals, birds, plants, fungi, and other taxa, and that the improvements in model fit scale with the size of the data set. Notably, for the recently published plant and bird trees, these nonreversible models correctly recovered the commonly estimated root placements with very high statistical support without the need for an outgroup. We provide nQMaker as an easy-to-use feature in the IQ-TREE software (http://www.iqtree.org), allowing users to estimate nonreversible models and rooted phylogenies from their own protein data sets. The data sets and scripts used in this article are available at https://doi.org/10.5061/dryad.3tx95x6hx. [amino acid sequence analyses; amino acid substitution models; maximum likelihood model estimation; nonreversible models; phylogenetic inference; reversible models.]
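Time reversibility has a compact definition: a rate matrix Q with stationary distribution π is reversible when it satisfies detailed balance, π_i Q_ij = π_j Q_ji for every pair of states i and j. Under a reversible model the likelihood does not depend on where the root is placed, which is why such models cannot root a tree without an outgroup. The minimal sketch below checks detailed balance and stationarity in plain Python; it uses a 3-state toy alphabet in place of the 20 amino acids, and both matrices are made up for illustration (they are not nQMaker estimates).

```python
def is_stationary(Q, pi, tol=1e-12):
    """True if pi Q = 0, i.e. pi is a stationary distribution of rate matrix Q."""
    n = len(pi)
    return all(abs(sum(pi[i] * Q[i][j] for i in range(n))) <= tol for j in range(n))


def is_time_reversible(Q, pi, tol=1e-12):
    """True if Q satisfies detailed balance w.r.t. pi: pi[i]*Q[i][j] == pi[j]*Q[j][i]."""
    n = len(pi)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) > tol:
                return False
    return True


def reversible_Q(S, pi):
    """Build a reversible rate matrix Q[i][j] = S[i][j] * pi[j] from a symmetric
    exchangeability matrix S; diagonal entries make each row sum to zero."""
    n = len(pi)
    Q = [[S[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(Q[i])
    return Q


# Toy reversible model: symmetric exchangeabilities plus state frequencies.
S = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
pi = [0.2, 0.3, 0.5]
Q_rev = reversible_Q(S, pi)

# Toy nonreversible model: a cyclic flow 0 -> 1 -> 2 -> 0 with uniform frequencies.
Q_cyc = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]
pi_uniform = [1 / 3, 1 / 3, 1 / 3]
```

The cyclic matrix has a valid stationary distribution yet violates detailed balance: probability flows around the cycle in one direction only. That directional structure is the kind of signal a nonreversible model can express and a reversible one cannot.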
Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of the amino acids being exchanged. Recent developments have shown that models allowing amino acid pairs to have independent rates of substitution offer improved fit over single-rate models. However, these approaches have been limited by the need for large alignments to estimate them. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, the number of which is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution, and thus gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.
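The abstract describes the GA's search space (each amino acid pair assigned to one of several rate classes) but not its operators or fitness function. The sketch below is a generic GA over that encoding, with its assumptions stated up front: all 190 unordered pairs of the 20 amino acids are encoded (a real codon model may restrict this set), the number of classes `k` is fixed per run, and `toy_fitness` is a hypothetical stand-in for the real score, which would come from maximum likelihood fitting of the codon model (for example, an information criterion).

```python
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Simplification (assumption): every unordered pair of the 20 amino acids gets a label.
PAIRS = list(combinations(AMINO_ACIDS, 2))  # 190 pairs


def random_assignment(k, rng):
    """A candidate model: one rate-class label in range(k) for every pair."""
    return [rng.randrange(k) for _ in PAIRS]


def crossover(a, b, rng):
    """Single-point crossover of two parent assignments."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(assignment, k, rng, rate=0.02):
    """Move each pair to a random class with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else c for c in assignment]


def run_ga(fitness, k, pop_size=40, generations=60, seed=1):
    """Evolve class assignments; `fitness` scores an assignment (higher is better)."""
    rng = random.Random(seed)
    population = [random_assignment(k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]  # best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), k, rng))
        population = elite + children
    return max(population, key=fitness)


# Hypothetical toy fitness: agreement with a made-up "true" grouping of the pairs.
K = 3
TARGET = [i % K for i in range(len(PAIRS))]


def toy_fitness(assignment):
    return sum(c == t for c, t in zip(assignment, TARGET))


best = run_ga(toy_fitness, K)
```

Because the number of classes is estimated from the alignment, one plausible use is to call `run_ga` for several values of `k` and keep the assignment with the best penalized model-fit score. Elitism guarantees that the best score never decreases between generations.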
The amino acid substitution model is the core component of many protein analysis systems, such as sequence similarity search, sequence alignment, and phylogenetic inference. Although several general amino acid substitution models have been estimated from large and diverse protein databases, they remain inappropriate for analyzing specific groups of organisms, such as viruses. Emerging epidemics of influenza viruses call for comprehensive studies of these dangerous viruses. We propose an influenza-specific amino acid substitution model to enhance the understanding of the evolution of influenza viruses.
Mistranslation, the misincorporation of an amino acid not specified by the "standard" genetic code, occurs in all organisms. tRNA variants that increase mistranslation arise spontaneously, and engineered tRNAs can achieve mistranslation frequencies approaching 10% in yeast and bacteria. Interestingly, human genomes contain tRNA variants with the potential to mistranslate. Cells cope with increased mistranslation through multiple mechanisms, though high levels cause proteotoxic stress. The goal of this study was to compare the genetic interactions, and the impact on the transcriptome and cellular growth, of two tRNA variants in Saccharomyces cerevisiae that mistranslate at a similar frequency but create different amino acid substitutions. One tRNA variant inserts alanine at proline codons, whereas the other inserts serine at arginine codons. Both tRNAs decreased growth rate, with the effect being greater for arginine to serine than for proline to alanine. The tRNA that substituted serine for arginine induced a heat shock response. In contrast, the heat shock response was minimal for the proline to alanine substitution. Further demonstrating the significance of the amino acid substitution, transcriptome analysis identified unique up- and down-regulated genes in response to each mistranslating tRNA. The number and extent of negative synthetic genetic interactions also differed depending on the type of mistranslation. Based on the unique responses observed for these mistranslating tRNAs, we predict that the potential of mistranslation to exacerbate diseases caused by proteotoxic stress depends on the tRNA variant. Furthermore, based on their unique transcriptomes and genetic interactions, different naturally occurring mistranslating tRNAs have the potential to negatively influence specific diseases.
Although many amino acid substitution matrices have been developed, it has not been well understood which one is best for similarity searches, especially for remote homology detection. We therefore collected information related to existing matrices, condensed it, and derived a novel matrix that detects more remote homologs than existing matrices.
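Similarity searches score candidate alignments with a substitution matrix; SSEARCH from the FASTA package, for example, implements the Smith-Waterman local alignment algorithm. As a minimal sketch of how a matrix enters such a search, the function below computes a Smith-Waterman local alignment score with a linear gap penalty (production tools use affine gap penalties), and `toy_score` is a hypothetical two-value stand-in for a real 20-by-20 matrix such as BLOSUM62.

```python
def smith_waterman(a, b, score, gap=-4):
    """Best local alignment score of sequences a and b.

    score(x, y) returns the substitution score for residues x and y;
    gap is a linear per-residue gap penalty (a simplification).
    Only two DP rows are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,                                       # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                           # gap in b
                cur[j - 1] + gap,                        # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


def toy_score(x, y):
    """Hypothetical stand-in for a substitution matrix: +5 identity, -1 otherwise."""
    return 5 if x == y else -1


print(smith_waterman("ACDEF", "ACEF", toy_score))
```

Swapping `toy_score` for a lookup into a real matrix is the only change needed to compare matrices on the same sequences, which is essentially how matrices are benchmarked for homology detection.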
Opsins are universal photoreceptive proteins in animals and can be classified into three types based on their photoreaction properties. Upon light irradiation, vertebrate rhodopsin forms a metastable active state, which cannot revert to the original dark state via either photoreaction or thermal reaction. By contrast, after photoreception, most opsins form a stable active state that can photoconvert back to the dark state. Moreover, we recently found a novel type of opsin whose activity is regulated by photocycling. However, the molecular mechanism underlying this diversification of opsins remains unknown. In this study, we showed that vertebrate rhodopsin acquired photocyclic and photoreversible properties upon introduction of a single mutation at position 188. This revealed that the residue at position 188 contributes to the diversification of photoreaction properties of opsins by regulating the recovery from the active state to the original dark state.
Human genetic variation in coding regions is fundamental to the study of protein structure and function. Most methods for interpreting missense variants consider substitution measures derived from homologous proteins across different species. In this study, we introduce human-specific amino acid (AA) substitution matrices that are based on genetic variation in the modern human population. We analyzed the frequencies of >4.8M single nucleotide variants (SNVs) at codon and AA resolution and compiled human-centric substitution matrices that are fundamentally different from classic cross-species matrices (e.g. BLOSUM, PAM). Our matrices are asymmetric, with some AA replacements showing significant directional preference. Moreover, these AA matrices are only partly predicted by nucleotide substitution rates. We further test the utility of our matrices in exposing functional signals of experimentally validated protein annotations. A significant reduction in AA transition frequencies was observed across nine post-translational modification (PTM) types and four ion-binding sites. Our results suggest a purifying selection signal in the human proteome across a diverse set of functional protein annotations and provide an empirical baseline for interpreting human genetic variation in coding regions.
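Asymmetry arises naturally when substitutions are counted as ordered reference-to-alternative changes rather than as unordered pairs. The sketch below is not the authors' pipeline: it translates each SNV in its codon context with the standard genetic code (NCBI translation table 1), tallies directional amino acid changes while skipping synonymous ones, and reports what share of the x<->y exchanges run in the x->y direction. The SNV records are hypothetical (codon, 0-based position, alternative base) tuples invented for illustration.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codon index = 16*first + 4*second + third,
# with bases ordered T, C, A, G. '*' marks stop codons.
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon):
    i, j, k = (BASES.index(b) for b in codon)
    return AA_TABLE[16 * i + 4 * j + k]


def aa_change(codon, pos, alt):
    """(ref_aa, alt_aa) for a single-base change at 0-based position pos of codon."""
    mutant = codon[:pos] + alt + codon[pos + 1:]
    return translate(codon), translate(mutant)


def directional_counts(snvs):
    """Count ordered ref->alt amino acid changes, ignoring synonymous SNVs."""
    counts = Counter()
    for codon, pos, alt in snvs:
        ref_aa, alt_aa = aa_change(codon, pos, alt)
        if ref_aa != alt_aa:
            counts[(ref_aa, alt_aa)] += 1
    return counts


def asymmetry(counts, x, y):
    """Share of x<->y changes going x->y (0.5 means symmetric); None if unobserved."""
    forward, backward = counts[(x, y)], counts[(y, x)]
    total = forward + backward
    return forward / total if total else None


# Hypothetical SNVs: three R->C (CGN->TGN), one C->R, one synonymous L->L.
snvs = [("CGT", 0, "T"), ("CGC", 0, "T"), ("CGT", 0, "T"),
        ("TGT", 0, "C"), ("CTG", 0, "T")]
counts = directional_counts(snvs)
```

A cross-species matrix such as BLOSUM would fold these four events into one symmetric R/C cell; keeping the direction is what lets a population-derived matrix show the directional preferences the abstract reports.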
To assess the conventional treatment in evolutionary inference of alignment gaps as missing data, we propose a simple nonparametric test of the null hypothesis that the locations of alignment gaps are independent of the nucleotide substitution or amino acid replacement process. When we apply the test to 1,390 protein alignments that are informed by protein tertiary structure and use a 5% significance level, the null hypothesis of independence between amino acid replacement and gap location is rejected for ∼65% of datasets. Via simulations that include substitution and insertion-deletion, we show that the test performs well with true alignments. When we simulate according to the null hypothesis and then apply the test to optimal alignments that are inferred by each of four widely used software packages, the null hypothesis is rejected too frequently. Via further simulations and analyses, we show that the overly frequent rejections of the null hypothesis are not solely due to weaknesses of widely used software for finding optimal alignments. Instead, our evidence suggests that optimal alignments are unrepresentative of true alignments and that biased evolutionary inferences may result from relying upon individual optimal alignments.
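The abstract does not spell out its test statistic, so the sketch below is not the authors' test. It is a generic permutation test that illustrates the same kind of null hypothesis: if gap locations are independent of the replacement process, then whether a column contains a gap should be unrelated to how variable its residues are. The statistic (difference in mean count of distinct residues between gapped and ungapped columns) is an assumption chosen for simplicity; note that gapped columns contain fewer residues, so a real test would need to control for that.

```python
import random


def column_stats(alignment):
    """For each alignment column return (has_gap, number of distinct residues)."""
    stats = []
    for col in zip(*alignment):
        residues = {c for c in col if c != "-"}
        stats.append(("-" in col, len(residues)))
    return stats


def permutation_test(alignment, n_perm=999, seed=0):
    """Two-sided permutation p-value for a difference in mean residue diversity
    between gapped and ungapped columns (labels shuffled under the null)."""
    stats = column_stats(alignment)
    labels = [g for g, _ in stats]
    values = [v for _, v in stats]

    def mean_diff(lab):
        gapped = [v for v, g in zip(values, lab) if g]
        ungapped = [v for v, g in zip(values, lab) if not g]
        if not gapped or not ungapped:
            return 0.0
        return sum(gapped) / len(gapped) - sum(ungapped) / len(ungapped)

    observed = abs(mean_diff(labels))
    rng = random.Random(seed)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_diff(shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Applying such a test to a single optimal alignment inherits any bias of that alignment, which is the concern the abstract raises about relying on individual optimal alignments.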
An amino acid substitution scoring matrix encapsulates the rates at which amino acid residues in proteins are substituted by other amino acid residues over time. Database search methods use substitution scoring matrices to identify sequences with homologous relationships. However, widely used substitution scoring matrices, such as the BLOSUM series, were developed from aligned blocks that are mostly devoid of disordered regions. Hence, these matrices are largely inappropriate for homology searches involving proteins enriched in disordered regions, because disordered regions have a distinct amino acid compositional bias and are therefore expected to have undergone amino acid substitutions distinct from those in ordered regions. We therefore developed a novel series of substitution scoring matrices, referred to as EDSSMat, by exclusively considering the substitution frequencies of amino acids in the disordered regions of eukaryotic proteins. The newly developed matrices were tested for their ability to detect homologs of proteins enriched in disordered regions using the SSEARCH tool. The results unequivocally demonstrate that EDSSMat matrices detect more homologs than the widely used BLOSUM, PAM and other standard matrices, indicating their utility for homology searches of intrinsically disordered proteins.
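Turning substitution frequencies into a scoring matrix is usually done with log-odds scores, as in BLOSUM: s(a, b) = round(λ · log2(q_ab / e_ab)), where q_ab is the observed frequency of the aligned pair {a, b} and e_ab is the frequency expected by chance from background residue frequencies. The sketch below shows that standard construction in half-bit units (λ = 2) on toy ungapped blocks; EDSSMat's exact procedure may differ, and this version omits BLOSUM's sequence-clustering step and leaves unobserved pairs unscored.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(blocks, scale=2.0):
    """BLOSUM-style scores from ungapped aligned blocks (lists of equal-length
    sequences). Returns a dict mapping (a, b) and (b, a) to an integer score."""
    # Observed pair counts: every pair of sequences within every column.
    pair_counts = Counter()
    for block in blocks:
        for col in zip(*block):
            for x, y in combinations(col, 2):
                pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency p_a = q_aa + (sum over b != a of q_ab) / 2.
    p = Counter()
    for (x, y), f in q.items():
        p[x] += f / 2
        p[y] += f / 2

    scores = {}
    for (x, y), f in q.items():
        # Chance expectation: p_a^2 for identities, 2 p_a p_b for unordered pairs.
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        s = round(scale * math.log2(f / expected))
        scores[(x, y)] = scores[(y, x)] = s
    return scores


# Toy block, invented for illustration.
scores = log_odds_matrix([["ACDA", "ACDA", "ACEA"]])
```

Restricting the input blocks to disordered regions, as the abstract describes, changes both q and the background frequencies, which is why the resulting scores can differ from BLOSUM's.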
Genetic variations in circadian clock genes may serve as molecular adaptations, allowing populations to adapt to local environments. Here, we carried out a survey of genetic variation in Drosophila cryptochrome (cry), the fly's dedicated circadian photoreceptor. An initial screen of 10 European cry alleles revealed substantial variation, including seven non-synonymous changes. The SNP frequency spectra and the excessive linkage disequilibrium at this locus suggested that this variation is maintained by natural selection. We focused on a non-conservative SNP involving a leucine-histidine replacement (L232H) and found that this polymorphism is common, with both alleles at intermediate frequencies across 27 populations surveyed in Europe, irrespective of latitude. Remarkably, we were able to reproduce this natural observation in the laboratory using replicate population cages in which the minor allele frequency was initially set to 10%. Within 20 generations, the two allelic variants converged to approximately equal frequencies. Further experiments using congenic strains showed that this SNP has a phenotypic impact, with variants showing significantly different eclosion profiles. In the long term, these phase differences in eclosion may contribute to genetic differentiation among individuals and shape the evolution of wild populations.
The C4-photosynthetic carbon cycle is an elaborate addition to the classical C3-photosynthetic pathway, which improves solar conversion efficiency. The key enzyme in this pathway, phosphoenolpyruvate carboxylase, has evolved from an ancestral non-photosynthetic C3 phosphoenolpyruvate carboxylase. During evolution, C4 phosphoenolpyruvate carboxylase has increased its kinetic efficiency and reduced its sensitivity to the feedback inhibitors malate and aspartate. An open question is the molecular basis of this shift in inhibitor tolerance. Here we show that a single-point mutation is sufficient to account for the drastic differences between the inhibitor tolerances of C3 and C4 phosphoenolpyruvate carboxylases. We solved high-resolution X-ray crystal structures of a C3 phosphoenolpyruvate carboxylase and a closely related C4 phosphoenolpyruvate carboxylase. Comparison of the two structures revealed that Arg884 supports tight inhibitor binding in the C3-type enzyme. In the C4 phosphoenolpyruvate carboxylase isoform, this arginine is replaced by glycine. The substitution reduces inhibitor affinity and enables the enzyme to participate in the C4 photosynthesis pathway.
Many computational approaches exist for predicting the effects of amino acid substitutions. Here, we considered whether the class of a protein sequence position (rheostat or toggle) affects these predictions. The classes are defined as follows: experimentally evaluated effects of amino acid substitutions at toggle positions are binary, whereas rheostat positions show progressive changes. For substitutions in the LacI protein, all evaluated methods failed two key expectations: toggle neutrals were incorrectly predicted to be more non-neutral than rheostat non-neutrals, and toggle and rheostat neutrals were incorrectly predicted to be different. However, toggle non-neutrals were distinct from rheostat neutrals. Since many toggle positions are conserved and most rheostat positions are not, predictors appear to annotate position conservation better than mutational effect. This finding can explain the well-known observation that predictors assign disproportionate weight to conservation, as well as the field's inability to improve predictor performance. Thus, building reliable predictors requires distinguishing between rheostat and toggle positions.
The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and to determine the extent to which disease-associated and polymorphic SAPs differ in their interactions with other residues.
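One common way to build such a network is a residue contact graph: residues are nodes, and an edge joins two residues whose representative atoms lie within a distance cutoff. A residue's degree then reflects its local environment, and its closeness centrality reflects its position in the global structure. The sketch below assumes C-alpha coordinates and a 7 Å cutoff; both are illustrative assumptions, not parameters taken from the study, and the coordinates are a made-up toy chain.

```python
import math
from collections import deque


def contact_network(coords, cutoff=7.0):
    """Adjacency sets for residues whose coordinates lie within `cutoff` angstroms."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def closeness(adj, node):
    """Closeness centrality via BFS: reachable nodes / sum of shortest-path lengths."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Toy "protein": four residues on a line, 5 angstroms apart.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
adj = contact_network(coords)
degrees = {i: len(nbrs) for i, nbrs in adj.items()}
```

Comparing these per-residue measures between positions carrying disease-associated and polymorphic SAPs is one way to ask whether the two groups differ in how they interact with other residues.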
Intrinsically disordered proteins (IDPs) and proteins with intrinsically disordered regions (IDRs) do not have a well-defined tertiary structure, but they perform a multitude of functions, often relying on their native disorder to achieve binding flexibility by switching between alternative conformations. Intrinsic disorder is found frequently in all three domains of life and may occur in short stretches or span whole proteins. To date, most studies contrasting ordered and disordered proteins have focused on simple summary statistics. Here, we propose an evolutionary approach to study IDPs, contrasting patterns specific to ordered protein regions with those of the corresponding IDRs.
Human endostatin has an internal Asn-Gly-Arg (NGR) motif at positions 126-128, following a proline at position 125. NGR-containing peptides have been shown to target tumour vasculature and inhibit aminopeptidase N activity. We previously compared the in vitro and in vivo biological activities of native endostatin and endostatin with a proline-to-alanine mutation (P125A-endostatin). P125A-endostatin exhibited greater inhibition of both endothelial cell proliferation and human ovarian cancer growth than native endostatin. Here we further explore the effects of the P125A mutation on biological activity and show that aminopeptidase N is not involved. To determine whether the increased biological activity of the mutant was due to unmasking of the downstream NGR sequence, the effect of endostatin on aminopeptidase N activity was investigated. Neither native nor P125A-endostatin inhibited aminopeptidase N. However, synthetic peptides consisting of the S118-T131 region of endostatin did inhibit aminopeptidase N. These results suggest that the internal NGR site in native or mutant endostatin is not accessible to aminopeptidase N, and that this activity is not involved in the enhanced biological activity of the P125A form. P125A-endostatin bound to endothelial cells more efficiently than native endostatin and exhibited greater inhibition of not only proliferation but also migration of endothelial cells. P125A-endostatin also localised into tumour tissue to a higher degree than the native protein and displayed greater inhibition of colon cancer growth in athymic mice. Both proteins effectively inhibited tumour cell-induced angiogenesis. Real-time PCR analysis showed that both native and P125A-endostatin decreased expression of key proangiogenic growth factors; vascular endothelial growth factor and angiopoietin 1 were downregulated more by the mutant. These studies suggest that the region around P125 can be modified to improve the biological activity of endostatin.
Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences have diverged extensively. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database with a search algorithm. However, problems arise when sequence similarity is low. This study presents an algorithm that uses the evolutionary rate at codon sites, the dN/dS (ω) parameter, coupled to a substitution matrix as an alignment metric for detecting distantly related proteins. The algorithm, called BLOSUM-FIRE, couples a newer, improved version of the original FIRE (Functional Inference using Rates of Evolution) algorithm with an amino acid substitution matrix in a dynamic scoring function. The enigmatic hepatitis B virus X protein was used as a test case for BLOSUM-FIRE and its associated database, EvoDB.
The Varroa mite, Varroa destructor, is an important pest of honeybees and has played a prominent role in the decline in bee colony numbers over recent years. Although pyrethroids such as tau-fluvalinate and flumethrin can be highly effective in removing the mites from hives, their intensive use has led to many reports of resistance. To investigate the mechanism of resistance in UK Varroa samples, the transmembrane domain regions of the V. destructor voltage-gated sodium channel (the main target site for pyrethroids) were PCR-amplified and sequenced from pyrethroid-treated and untreated mites collected at several locations in Central and Southern England. A novel amino acid substitution, L925V, was identified that maps to a known hot spot for resistance within the domain IIS5 helix of the channel protein, a region that has also been proposed to form part of the pyrethroid binding site. Using a high-throughput diagnostic assay capable of detecting the mutation in individual mites, the L925V substitution was found to correlate well with resistance, being present in all mites that had survived tau-fluvalinate treatment but in only 8% of control, untreated samples. The potential for using this assay to detect and manage resistance in Varroa-infested hives is discussed.
Bacillus thuringiensis formulations lose activity under field conditions because of UV radiation, and melanin-based photoprotection of B. thuringiensis has attracted the attention of researchers for many years. Here, a single amino acid substitution (G272E) in homogentisate 1,2-dioxygenase was found to be responsible for pigment overproduction in B. thuringiensis BMB181, a derivative of BMB171. Disrupting the gene encoding homogentisate dioxygenase in BMB171 caused homogentisic acid to accumulate and increased pigment formation. To gain insight into homogentisate 1,2-dioxygenase in B. thuringiensis, we constructed a total of 14 mutants, each carrying a single amino acid substitution, and found that alanine substitution affected melanin production in six of the mutant proteins. This study provides a new way to construct pigment-overproducing strains of B. thuringiensis by impairing homogentisate dioxygenase with a single mutation, and the findings will facilitate a better understanding of this enzyme.
Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered in amino acid similarity measures. At present, sequence matching compares sequences based only on the similarities of single amino acids, ignoring the fact that densely packed proteins admit additional conservative substitutions in which two interacting amino acids are exchanged together, such as a small-large pair changing to a large-small pair; these are substitutions that are not individually so conservative. Here we show that including information on such pairs of substitutions yields improved sequence matches, with significant gains in the agreement between sequence alignments and structure matches of the same protein pair: sequence segments are matched where structure segments are aligned. There are gains for all 2002 collected cases in which the sequence alignments were not previously congruent with the structure matches. Our results also demonstrate a significant gain in detecting homology for "twilight zone" protein sequences. The derived amino acid substitution metrics have many other potential applications, in annotation, protein design, mutagenesis design, and empirical potential derivation.
Amino acid substitution models play an important role in inferring phylogenies from proteins. Although many amino acid substitution models have been proposed, only a few were estimated from mitochondrial protein sequences of specific taxa, such as the mtArt model for Arthropoda. The growing volume of mitochondrial genome data from a broad range of Orthoptera taxa provides an opportunity to estimate an Orthoptera-specific empirical mitochondrial amino acid substitution model.