This service exclusively searches for literature that cites resources. Please be aware that the total number of searchable documents is limited to those containing RRIDs and does not include all open-access literature.
JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials.
Protein secondary structure prediction (SSP) has a variety of applications; however, there has been relatively limited improvement in accuracy for years. With a vision of moving forward all related fields, we aimed to make a fundamental advance in SSP. There have been many admirable efforts made to improve the machine learning algorithm for SSP. This work thus took a step back by manipulating the input features. A secondary structure element-based position-specific scoring matrix (SSE-PSSM) is proposed, based on which a new set of machine learning features can be established. The feasibility of this new PSSM was evaluated by rigid independent tests with training and testing datasets sharing <25% sequence identities. In all experiments, the proposed PSSM outperformed the traditional amino acid PSSM. This new PSSM can be easily combined with the amino acid PSSM, and the improvement in accuracy was remarkable. Preliminary tests made by combining the SSE-PSSM and well-known SSP methods showed 2.0% and 5.2% average improvements in three- and eight-state SSP accuracies, respectively. If this PSSM can be integrated into state-of-the-art SSP methods, the overall accuracy of SSP may break the current restriction and eventually bring benefit to all research and applications where secondary structure prediction plays a vital role during development. To facilitate the application and integration of the SSE-PSSM with modern SSP methods, we have established a web server and standalone programs for generating SSE-PSSM available at http://10.life.nctu.edu.tw/SSE-PSSM.
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobic, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q 3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure would exhibit better performances.
The prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, as the complexity of the task makes manual design of HMMs difficult. Therefore, we have developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs).
Although most proteins conform to the classical one-structure/one-function paradigm, an increasing number of proteins with dual structures and functions have been discovered. In response to cellular stimuli, such proteins undergo structural changes sufficiently dramatic to remodel even their secondary structures and domain organization. This "fold-switching" capability fosters protein multi-functionality, enabling cells to establish tight control over various biochemical processes. Accurate predictions of fold-switching proteins could both suggest underlying mechanisms for uncharacterized biological processes and reveal potential drug targets. Recently, we developed a prediction method for fold-switching proteins using structure-based thermodynamic calculations and discrepancies between predicted and experimentally determined protein secondary structure (Porter and Looger, Proc Natl Acad Sci U S A 2018; 115:5968-5973). Here we seek to leverage the negative information found in these secondary structure prediction discrepancies. To do this, we quantified secondary structure prediction accuracies of 192 known fold-switching regions (FSRs) within solved protein structures found in the Protein Data Bank (PDB). We find that the secondary structure prediction accuracies for these FSRs vary widely. Inaccurate secondary structure predictions are strongly associated with fold-switching proteins compared to equally long segments of non-fold-switching proteins selected at random. These inaccurate predictions are enriched in helix-to-strand and strand-to-coil discrepancies. Finally, we find that most proteins with inaccurate secondary structure predictions are underrepresented in the PDB compared with their alternatively folded cognates, suggesting that unequal representation of fold-switching conformers within the PDB could be an important cause of inaccurate secondary structure predictions. These results demonstrate that inconsistent secondary structure predictions can serve as a useful preliminary marker of fold switching.
The program SSDraw generates publication-quality protein secondary structure diagrams from three-dimensional protein structures. To depict relationships between secondary structure and other protein features, diagrams can be colored by conservation score, B-factor, or custom scoring. Diagrams of homologous proteins can be registered according to an input multiple sequence alignment. Linear visualization allows the user to stack registered diagrams, facilitating comparison of secondary structure and other properties among homologous proteins. SSDraw can be used to compare secondary structures of homologous proteins with both conserved and divergent folds. It can also generate one secondary structure diagram from an input protein structure of interest. The source code can be downloaded (https://github.com/ncbi/SSDraw) and run locally for rapid structure generation, while a Google Colab notebook allows easy use.
Secondary structures provide a deep insight into the protein architecture. They can serve for comparison between individual protein family members. The most straightforward way how to deal with protein secondary structure is its visualization using 2D diagrams. Several software tools for the generation of 2D diagrams were developed. Unfortunately, they create 2D diagrams based on only a single protein. Therefore, 2D diagrams of two proteins from one family markedly differ. For this reason, we developed the 2DProts database, which contains secondary structure 2D diagrams for all domains from the CATH and all proteins from PDB databases. These 2D diagrams are generated based on a whole protein family, and they also consider information about the 3D arrangement of secondary structure elements. Moreover, 2DProts database contains multiple 2D diagrams, which provide an overview of a whole protein family's secondary structures. 2DProts is updated weekly and is integrated into CATH.
Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishing alpha-helices, beta-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification.
Understanding how protein sequences confer function remains a defining challenge in molecular biology. Two approaches have yielded enormous insight yet are often pursued separately: structure-based, where sequence-encoded structures mediate function, and disorder-based, where sequences dictate physicochemical and dynamical properties which determine function in the absence of stable structure. Here we study highly charged protein regions (>40% charged residues), which are routinely presumed to be disordered. Using recent advances in structure prediction and experimental structures, we show that roughly 40% of these regions form well-structured helices. Features often used to predict disorder-high charge density, low hydrophobicity, low sequence complexity, and evolutionarily varying length-are also compatible with solvated, variable-length helices. We show that a simple composition classifier predicts the existence of structure far better than well-established heuristics based on charge and hydropathy. We show that helical structure is more prevalent than previously appreciated in highly charged regions of diverse proteomes and characterize the conservation of highly charged regions. Our results underscore the importance of integrating, rather than choosing between, structure- and disorder-based approaches.
The program SSDraw generates publication-quality protein secondary structure diagrams from three-dimensional protein structures. To depict relationships between secondary structure and other protein features, diagrams can be colored by conservation score, B-factor, or custom scoring. Diagrams of homologous proteins can be registered according to an input multiple sequence alignment. Linear visualization allows the user to stack registered diagrams, facilitating comparison of secondary structure and other properties among homologous proteins. SSDraw can be used to compare secondary structures of homologous proteins with both conserved and divergent folds. It can also generate one secondary structure diagram from an input protein structure of interest. The source code can be downloaded (https://github.com/ethanchen1301/SSDraw) and run locally for rapid structure generation, while a Google Colab notebook allows easy use.
Residue conservation is a common observation in alignments of protein families, underscoring positions important in protein structure and function. Though many methods measure the level of conservation of particular residue positions, currently we do not have a way to study spatial oscillations occurring in protein conservation patterns. It is known that hydrophobicity shows spatial oscillations in proteins, which is characterized by computing the hydrophobic moment of the protein domains. Here, we advance the study of moments of conservation of protein families to know whether there might exist spatial asymmetry in the conservation patterns of regular secondary structures. Analogous to the hydrophobic moment, the conservation moment is defined as the modulus of the Fourier transform of the conservation function of an alignment of related protein, where the conservation function is the vector of conservation values at each column of the alignment. The profile of the conservation moment is useful in ascertaining any periodicity of conservation, which might correlate with the period of the secondary structure. To demonstrate the concept, conservation in the family of potassium ion channel proteins was analyzed using moments. It was shown that the pore helix of the potassium channel showed oscillations in the moment of conservation matching the period of the α-helix. This implied that one side of the pore helix was evolutionarily conserved in contrast to its opposite side. In addition, the method of conservation moments correctly identified the disposition of the voltage sensor of voltage-gated potassium channels to form a 310 helix in the membrane.
Multiple sequence alignment (MSA) is commonly used to analyze sets of homologous protein or DNA sequences. This has lead to the development of many methods and packages for MSA over the past 30 years. Being able to compare different methods has been problematic and has relied on gold standard benchmark datasets of 'true' alignments or on MSA simulations. A number of protein benchmark datasets have been produced which rely on a combination of manual alignment and/or automated superposition of protein structures. These are either restricted to very small MSAs with few sequences or require manual alignment which can be subjective. In both cases, it remains very difficult to properly test MSAs of more than a few dozen sequences. PREFAB and HomFam both rely on using a small subset of sequences of known structure and do not fairly test the quality of a full MSA.
Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. There have been ~300 algorithms published in the past seven decades with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; after that, it has long stayed at 81-86%. In the 1990s, the theoretical limit of three-state SSP accuracy had been estimated to be 88%. Thus, SSP is now generally considered not challenging or too challenging to improve. However, we found that the limit of three-state SSP might be underestimated. Besides, there is still much room for improving segment-based and eight-state SSPs, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4-5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84-87%. Several proposals for improving future SSP algorithms are made based on our results. We hope that these findings will help move forward the development of SSP and all its applications.
Understanding how protein sequences confer function remains a defining challenge in molecular biology. Two approaches have yielded enormous insight yet are often pursued separately: structure-based, where sequence-encoded structures mediate function, and disorder-based, where sequences dictate physicochemical and dynamical properties which determine function in the absence of stable structure. Here we study highly charged protein regions (>40% charged residues), which are routinely presumed to be disordered. Using recent advances in structure prediction and experimental structures, we show that roughly 40% of these regions form well-structured helices. Features often used to predict disorder-high charge density, low hydrophobicity, low sequence complexity, and evolutionarily varying length-are also compatible with solvated, variable-length helices. We show that a simple composition classifier predicts the existence of structure far better than well-established heuristics based on charge and hydropathy. We show that helical structure is more prevalent than previously appreciated in highly charged regions of diverse proteomes and characterize the conservation of highly charged regions. Our results underscore the importance of integrating, rather than choosing between, structure- and disorder-based approaches.
Structural information has been a major concern for biological and pharmaceutical studies for its intimate relationship to the function of a protein. Three-dimensional representation of the positions of protein atoms is utilized among many structural information repositories that have been published. The reliability of the torsional system, which represents the native processes of structural change in the structural analysis, was partially proven with previous structural alignment studies. Here, a web server providing structural information and analysis based on the backbone torsional representation of a protein structure is newly introduced. The web server offers functions of secondary structure database search, secondary structure calculation, and pair-wise protein structure comparison, based on a backbone torsion angle representation system. Application of the implementation in pair-wise structural alignment showed highly accurate results. The information derived from this web server might be further utilized in the field of ab initio protein structure modeling or protein homology-related analyses.
Determining the primary structure (i.e., amino acid sequence) of a protein has become cheaper, faster, and more accurate. Higher order protein structure provides insight into a protein's function in the cell. Understanding a protein's secondary structure is a first step towards this goal. Therefore, a number of computational prediction methods have been developed to predict secondary structure from just the primary amino acid sequence. The most successful methods use machine learning approaches that are quite accurate, but do not directly incorporate structural information. As a step towards improving secondary structure reduction given the primary structure, we propose a Bayesian model based on the knob-socket model of protein packing in secondary structure. The method considers the packing influence of residues on the secondary structure determination, including those packed close in space but distant in sequence. By performing an assessment of our method on 2 test sets we show how incorporation of multiple sequence alignment data, similarly to PSIPRED, provides balance and improves the accuracy of the predictions. Software implementing the methods is provided as a web application and a stand-alone implementation.
Protein backbone angle prediction has achieved significant accuracy improvement with the development of deep learning methods. Usually the same deep learning model is used in making prediction for all residues regardless of the categories of secondary structures they belong to. In this paper, we propose to train separate deep learning models for each category of secondary structures. Machine learning methods strive to achieve generality over the training examples and consequently loose accuracy. In this work, we explicitly exploit classification knowledge to restrict generalisation within the specific class of training examples. This is to compensate the loss of generalisation by exploiting specialisation knowledge in an informed way.
Different protein secondary structure elements have different physicochemical properties and roles in the protein, which may determine their evolutionary flexibility. However, it is not clear to what extent protein structure affects the way Darwinian selection acts at the amino acid level. Using phylogeny-based likelihood tests for positive selection, we have examined the relationship between protein secondary structure and selection across six species of Drosophila. We find that amino acids that form disordered regions, such as random coils, are far more likely to be under positive selection than expected from their proportion in the proteins, and residues in helices and beta-structures are subject to less positive selection than predicted. In addition, it appears that sites undergoing positive selection are more likely than expected to occur close to one another in the protein sequence. Finally, on a genome-wide scale, we have determined that positively selected sites are found more frequently toward the gene ends. Our results demonstrate that protein structures with a greater degree of organization and strong hydrophobicity, represented here as helices and beta-structures, are less tolerant to molecular adaptation than disordered, hydrophilic regions, across a diverse set of proteins.
Infection by the humannoroviruses (hNoV), for the vast majority of strains, requires attachment of the viral capsid to histo blood group antigens (HBGAs). The HBGA-binding pocket is formed by dimers of the protruding domain (P dimers) of the capsid protein VP1. Several studies have focused on HBGA binding to P dimers, reporting binding affinities and stoichiometries. However, nuclear magnetic resonance spectroscopy (NMR) and native mass spectrometry (MS) analyses yielded incongruent dissociation constants (KD) for the binding of HBGAs to P dimers and, in some cases, disagreed on whether glycans bind at all. We hypothesized that glycan clustering during electrospray ionization in native MS critically depends on the physicochemical properties of the protein studied. It follows that the choice of a reference protein is crucial. We analysed carbohydrate clustering using various P dimers and eight non-glycan binding proteins serving as possible references. Data from native and ion mobility MS indicate that the mass fraction of β-sheets has a strong influence on the degree of glycan clustering. Therefore, the determination of specific glycan binding affinities from native MS must be interpreted cautiously.
Recently a new class of methods for fast protein structure comparison has emerged. We call the methods in this class projection methods as they rely on a mapping of protein structure into a high-dimensional vector space. Once the mapping is done, the structure comparison is reduced to distance computation between corresponding vectors. As structural similarity is approximated by distance between projections, the success of any projection method depends on how well its mapping function is able to capture the salient features of protein structure. There is no agreement on what constitutes a good projection technique and the three currently known projection methods utilize very different approaches to the mapping construction, both in terms of what structural elements are included and how this information is integrated to produce a vector representation.
Welcome to the FDI Lab - SciCrunch.org Resources search. From here you can search through a compilation of resources used by FDI Lab - SciCrunch.org and see how data is organized within our community.
You are currently on the Community Resources tab looking through categories and sources that FDI Lab - SciCrunch.org has compiled. You can navigate through those categories from here or change to a different tab to execute your search through. Each tab gives a different perspective on data.
If you have an account on FDI Lab - SciCrunch.org then you can log in from here to get additional features in FDI Lab - SciCrunch.org such as Collections, Saved Searches, and managing Resources.
Here is the search term that is being executed, you can type in anything you want to search for. Some tips to help searching:
You can save any searches you perform for quick access to later from here.
We recognized your search term and included synonyms and inferred terms along side your term to help get the data you are looking for.
If you are logged into FDI Lab - SciCrunch.org you can add data records to your collections to create custom spreadsheets across multiple sources of data.
Here are the facets that you can filter your papers by.
From here we'll present any options for the literature, such as exporting your current results.
If you have any further questions please check out our FAQs Page to ask questions and see our tutorials. Click this button to view this tutorial again.
Year:
Count: