This service exclusively searches for literature that cites resources. Please be aware that the total number of searchable documents is limited to those containing RRIDs and does not include all open-access literature.

Page 1: showing papers 1 to 20 of 21.

De novo design of potent and selective mimics of IL-2 and IL-15.

  • Daniel-Adriano Silva‎ et al.
  • Nature‎
  • 2019‎

We describe a de novo computational approach for designing proteins that recapitulate the binding sites of natural cytokines, but are otherwise unrelated in topology or amino acid sequence. We use this strategy to design mimics of the central immune cytokine interleukin-2 (IL-2) that bind to the IL-2 receptor βγc heterodimer (IL-2Rβγc) but have no binding site for IL-2Rα (also called CD25) or IL-15Rα (also known as CD215). The designs are hyper-stable, bind human and mouse IL-2Rβγc with higher affinity than the natural cytokines, and elicit downstream cell signalling independently of IL-2Rα and IL-15Rα. Crystal structures of the optimized design neoleukin-2/15 (Neo-2/15), both alone and in complex with IL-2Rβγc, are very similar to the designed model. Neo-2/15 has superior therapeutic activity to IL-2 in mouse models of melanoma and colon cancer, with reduced toxicity and undetectable immunogenicity. Our strategy for building hyper-stable de novo mimetics could be applied generally to signalling proteins, enabling the creation of superior therapeutic candidates.


Quantitative reactivity profiling predicts functional cysteines in proteomes.

  • Eranthie Weerapana‎ et al.
  • Nature‎
  • 2010‎

Cysteine is the most intrinsically nucleophilic amino acid in proteins, where its reactivity is tuned to perform diverse biochemical functions. The absence of a consensus sequence that defines functional cysteines in proteins has hindered their discovery and characterization. Here we describe a proteomics method to profile quantitatively the intrinsic reactivity of cysteine residues en masse directly in native biological systems. Hyper-reactivity was a rare feature among cysteines and it was found to specify a wide range of activities, including nucleophilic and reductive catalysis and sites of oxidative modification. Hyper-reactive cysteines were identified in several proteins of uncharacterized function, including a residue conserved across eukaryotic phylogeny that we show is required for yeast viability and is involved in iron-sulphur protein biogenesis. We also demonstrate that quantitative reactivity profiling can form the basis for screening and functional assignment of cysteines in computationally designed proteins, where it discriminated catalytically active from inactive cysteine hydrolase designs.


Exploitation of binding energy for catalysis and design.

  • Summer B Thyme‎ et al.
  • Nature‎
  • 2009‎

Enzymes use substrate-binding energy both to promote ground-state association and to stabilize the reaction transition state selectively. The monomeric homing endonuclease I-AniI cleaves with high sequence specificity in the centre of a 20-base-pair (bp) DNA target site, with the amino (N)-terminal domain of the enzyme making extensive binding interactions with the left (-) side of the target site and the similarly structured carboxy (C)-terminal domain interacting with the right (+) side. Here we show that, despite the approximate twofold symmetry of the enzyme-DNA complex, there is almost complete segregation of interactions responsible for substrate binding to the (-) side of the interface and interactions responsible for transition-state stabilization to the (+) side. Although single base-pair substitutions throughout the entire DNA target site reduce catalytic efficiency, mutations in the (-) DNA half-site almost exclusively increase the dissociation constant (K(D)) and the Michaelis constant under single-turnover conditions (K(M)*), and those in the (+) half-site primarily decrease the turnover number (k(cat)*). The reduction of activity produced by mutations on the (-) side, but not mutations on the (+) side, can be suppressed by tethering the substrate to the endonuclease displayed on the surface of yeast. This dramatic asymmetry in the use of enzyme-substrate binding energy for catalysis has direct relevance to the redesign of endonucleases to cleave genomic target sites for gene therapy and other applications. Computationally redesigned enzymes that achieve new specificities on the (-) side do so by modulating K(M)*, whereas redesigns with altered specificities on the (+) side modulate k(cat)*. Our results illustrate how classical enzymology and modern protein design can each inform the other.


De novo protein design by citizen scientists.

  • Brian Koepnick‎ et al.
  • Nature‎
  • 2019‎

Online citizen science projects such as GalaxyZoo, Eyewire and Phylo have proven very successful for data collection, annotation and processing, but for the most part have harnessed human pattern-recognition skills rather than human creativity. An exception is the game EteRNA, in which game players learn to build new RNA structures by exploring the discrete two-dimensional space of Watson-Crick base pairing possibilities. Building new proteins, however, is a more challenging task to present in a game, as both the representation and evaluation of a protein structure are intrinsically three-dimensional. We posed the challenge of de novo protein design in the online protein-folding game Foldit. Players were presented with a fully extended peptide chain and challenged to craft a folded protein structure and an amino acid sequence encoding that structure. After many iterations of player design, analysis of the top-scoring solutions and subsequent game improvement, Foldit players can now, starting from an extended polypeptide chain, generate a diversity of protein structures and sequences that encode them in silico. One hundred forty-six Foldit player designs with sequences unrelated to naturally occurring proteins were encoded in synthetic genes; 56 were found to be expressed and soluble in Escherichia coli, and to adopt stable monomeric folded structures in solution. The diversity of these structures is unprecedented in de novo protein design, representing 20 different folds, including a new fold not observed in natural proteins. High-resolution structures were determined for four of the designs, and are nearly identical to the player models. This work makes explicit the considerable implicit knowledge that contributes to success in de novo protein design, and shows that citizen scientists can discover creative new solutions to outstanding scientific challenges such as the protein design problem.


Design of protein-binding proteins from the target structure alone.

  • Longxing Cao‎ et al.
  • Nature‎
  • 2022‎

The design of proteins that bind to a specific site on the surface of a target protein using no information other than the three-dimensional structure of the target remains a challenge. Here we describe a general solution to this problem that starts with a broad exploration of the vast space of possible binding modes to a selected region of a protein surface, and then intensifies the search in the vicinity of the most promising binding modes. We demonstrate the broad applicability of this approach through the de novo design of binding proteins to 12 diverse protein targets with different shapes and surface properties. Biophysical characterization shows that the binders, which are all smaller than 65 amino acids, are hyperstable and, following experimental optimization, bind their targets with nanomolar to picomolar affinities. We succeeded in solving crystal structures of five of the binder-target complexes, and all five closely match the corresponding computational design models. Experimental data on nearly half a million computational designs and hundreds of thousands of point mutants provide detailed feedback on the strengths and limitations of the method and of our current understanding of protein-protein interactions, and should guide improvements of both. Our approach enables the targeted design of binders to sites of interest on a wide variety of proteins for therapeutic and diagnostic applications.


Blueprinting extendable nanomaterials with standardized protein blocks.

  • Timothy F Huddy‎ et al.
  • Nature‎
  • 2024‎

A wooden house frame consists of many different lumber pieces, but because of the regularity of these building blocks, the structure can be designed using straightforward geometrical principles. The design of multicomponent protein assemblies, in comparison, has been much more complex, largely owing to the irregular shapes of protein structures. Here we describe extendable linear, curved and angled protein building blocks, as well as inter-block interactions, that conform to specified geometric standards; assemblies designed using these blocks inherit their extendability and regular interaction surfaces, enabling them to be expanded or contracted by varying the number of modules, and reinforced with secondary struts. Using X-ray crystallography and electron microscopy, we validate nanomaterial designs ranging from simple polygonal and circular oligomers that can be concentrically nested, up to large polyhedral nanocages and unbounded straight 'train track' assemblies with reconfigurable sizes and geometries that can be readily blueprinted. Because of the complexity of protein structures and sequence-structure relationships, it has not previously been possible to build up large protein assemblies by deliberate placement of protein backbones onto a blank three-dimensional canvas; the simplicity and geometric regularity of our design platform now enables construction of protein nanomaterials according to 'back of an envelope' architectural blueprints.


Exploring the repeat protein universe through computational protein design.

  • T J Brunette‎ et al.
  • Nature‎
  • 2015‎

A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes. Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications. Here we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix-loop-helix-loop structural motif. Eighty-three designs with sequences unrelated to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models with root mean square deviations ranging from 0.7 to 2.5 Å. Our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.


Predicting protein structures with a multiplayer online game.

  • Seth Cooper‎ et al.
  • Nature‎
  • 2010‎

People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.


Controlling protein assembly on inorganic crystals through designed protein interfaces.

  • Harley Pyles‎ et al.
  • Nature‎
  • 2019‎

The ability of proteins and other macromolecules to interact with inorganic surfaces is essential to biological function. The proteins involved in these interactions are highly charged and often rich in carboxylic acid side chains, but the structures of most protein-inorganic interfaces are unknown. We explored the possibility of systematically designing structured protein-mineral interfaces, guided by the example of ice-binding proteins, which present arrays of threonine residues (matched to the ice lattice) that order clathrate waters into an ice-like structure. Here we design proteins displaying arrays of up to 54 carboxylate residues geometrically matched to the potassium ion (K+) sublattice on muscovite mica (001). At low K+ concentration, individual molecules bind independently to mica in the designed orientations, whereas at high K+ concentration, the designs form two-dimensional liquid-crystal phases, which accentuate the inherent structural bias in the muscovite lattice to produce protein arrays ordered over tens of millimetres. Incorporation of designed protein-protein interactions preserving the match between the proteins and the K+ lattice led to extended self-assembled structures on mica: designed end-to-end interactions produced micrometre-long single-protein-diameter wires and a designed trimeric interface yielded extensive honeycomb arrays. The nearest-neighbour distances in these hexagonal arrays could be set digitally between 7.5 and 15.9 nanometres with 2.1-nanometre selectivity by changing the number of repeat units in the monomer. These results demonstrate that protein-inorganic lattice interactions can be systematically programmed and set the stage for designing protein-inorganic hybrid materials.


De novo design of a fluorescence-activating β-barrel.

  • Jiayi Dou‎ et al.
  • Nature‎
  • 2018‎

The regular arrangements of β-strands around a central axis in β-barrels and of α-helices in coiled coils contrast with the irregular tertiary structures of most globular proteins, and have fascinated structural biologists since they were first discovered. Simple parametric models have been used to design a wide range of α-helical coiled-coil structures, but to date there has been no success with β-barrels. Here we show that accurate de novo design of β-barrels requires considerable symmetry-breaking to achieve continuous hydrogen-bond connectivity and eliminate backbone strain. We then build ensembles of β-barrel backbone models with cavity shapes that match the fluorogenic compound DFHBI, and use a hierarchical grid-based search method to simultaneously optimize the rigid-body placement of DFHBI in these cavities and the identities of the surrounding amino acids to achieve high shape and chemical complementarity. The designs have high structural accuracy and bind and fluorescently activate DFHBI in vitro and in Escherichia coli, yeast and mammalian cells. This de novo design of small-molecule binding activity, using backbones custom-built to bind the ligand, should enable the design of increasingly sophisticated ligand-binding proteins, sensors and catalysts that are not limited by the backbone geometries available in known protein structures.


Quadrivalent influenza nanoparticle vaccines induce broad protection.

  • Seyhan Boyoglu-Barnum‎ et al.
  • Nature‎
  • 2021‎

Influenza vaccines that confer broad and durable protection against diverse viral strains would have a major effect on global health, as they would lessen the need for annual vaccine reformulation and immunization. Here we show that computationally designed, two-component nanoparticle immunogens induce potently neutralizing and broadly protective antibody responses against a wide variety of influenza viruses. The nanoparticle immunogens contain 20 haemagglutinin glycoprotein trimers in an ordered array, and their assembly in vitro enables the precisely controlled co-display of multiple distinct haemagglutinin proteins in defined ratios. Nanoparticle immunogens that co-display the four haemagglutinins of licensed quadrivalent influenza vaccines elicited antibody responses in several animal models against vaccine-matched strains that were equivalent to or better than commercial quadrivalent influenza vaccines, and simultaneously induced broadly protective antibody responses to heterologous viruses by targeting the subdominant yet conserved haemagglutinin stem. The combination of potent receptor-blocking and cross-reactive stem-directed antibodies induced by the nanoparticle immunogens makes them attractive candidates for a supraseasonal influenza vaccine candidate with the potential to replace conventional seasonal vaccines.


De novo design of bioactive protein switches.

  • Robert A Langan‎ et al.
  • Nature‎
  • 2019‎

Allosteric regulation of protein function is widespread in biology, but is challenging for de novo protein design as it requires the explicit design of multiple states with comparable free energies. Here we explore the possibility of designing switchable protein systems de novo, through the modulation of competing inter- and intramolecular interactions. We design a static, five-helix 'cage' with a single interface that can interact either intramolecularly with a terminal 'latch' helix or intermolecularly with a peptide 'key'. Encoded on the latch are functional motifs for binding, degradation or nuclear export that function only when the key displaces the latch from the cage. We describe orthogonal cage-key systems that function in vitro, in yeast and in mammalian cells with up to 40-fold activation of function by key. The ability to design switchable protein functions that are controlled by induced conformational change is a milestone for de novo protein design, and opens up new avenues for synthetic biology and cell engineering.


Cryo-EM structure of the protein-conducting ERAD channel Hrd1 in complex with Hrd3.

  • Stefan Schoebel‎ et al.
  • Nature‎
  • 2017‎

Misfolded endoplasmic reticulum proteins are retro-translocated through the membrane into the cytosol, where they are poly-ubiquitinated, extracted from the membrane, and degraded by the proteasome, a pathway termed endoplasmic reticulum-associated protein degradation (ERAD). Proteins with misfolded domains in the endoplasmic reticulum lumen or membrane are discarded through the ERAD-L and ERAD-M pathways, respectively. In Saccharomyces cerevisiae, both pathways require the ubiquitin ligase Hrd1, a multi-spanning membrane protein with a cytosolic RING finger domain. Hrd1 is the crucial membrane component for retro-translocation, but it is unclear whether it forms a protein-conducting channel. Here we present a cryo-electron microscopy structure of S. cerevisiae Hrd1 in complex with its endoplasmic reticulum luminal binding partner, Hrd3. Hrd1 forms a dimer within the membrane with one or two Hrd3 molecules associated at its luminal side. Each Hrd1 molecule has eight transmembrane segments, five of which form an aqueous cavity extending from the cytosol almost to the endoplasmic reticulum lumen, while a segment of the neighbouring Hrd1 molecule forms a lateral seal. The aqueous cavity and lateral gate are reminiscent of features of protein-conducting conduits that facilitate polypeptide movement in the opposite direction, from the cytosol into or across membranes. Our results suggest that Hrd1 forms a retro-translocation channel for the movement of misfolded polypeptides through the endoplasmic reticulum membrane.


Proof of principle for epitope-focused vaccine design.

  • Bruno E Correia‎ et al.
  • Nature‎
  • 2014‎

Vaccines prevent infectious disease largely by inducing protective neutralizing antibodies against vulnerable epitopes. Several major pathogens have resisted traditional vaccine development, although vulnerable epitopes targeted by neutralizing antibodies have been identified for several such cases. Hence, new vaccine design methods to induce epitope-specific neutralizing antibodies are needed. Here we show, with a neutralization epitope from respiratory syncytial virus, that computational protein design can generate small, thermally and conformationally stable protein scaffolds that accurately mimic the viral epitope structure and induce potent neutralizing antibodies. These scaffolds represent promising leads for the research and development of a human respiratory syncytial virus vaccine needed to protect infants, young children and the elderly. More generally, the results provide proof of principle for epitope-focused and scaffold-based vaccine design, and encourage the evaluation and further development of these strategies for a variety of other vaccine targets, including antigenically highly variable pathogens such as human immunodeficiency virus and influenza.


Surrogate Wnt agonists that phenocopy canonical Wnt and β-catenin signalling.

  • Claudia Y Janda‎ et al.
  • Nature‎
  • 2017‎

Wnt proteins modulate cell proliferation and differentiation and the self-renewal of stem cells by inducing β-catenin-dependent signalling through the Wnt receptor frizzled (FZD) and the co-receptors LRP5 and LRP6 to regulate cell fate decisions and the growth and repair of several tissues. The 19 mammalian Wnt proteins are cross-reactive with the 10 FZD receptors, and this has complicated the attribution of distinct biological functions to specific FZD and Wnt subtype interactions. Furthermore, Wnt proteins are modified post-translationally by palmitoylation, which is essential for their secretion, function and interaction with FZD receptors. As a result of their acylation, Wnt proteins are very hydrophobic and require detergents for purification, which presents major obstacles to the preparation and application of recombinant Wnt proteins. This hydrophobicity has hindered the determination of the molecular mechanisms of Wnt signalling activation and the functional importance of FZD subtypes, and the use of Wnt proteins as therapeutic agents. Here we develop surrogate Wnt agonists, water-soluble FZD-LRP5/LRP6 heterodimerizers, with FZD5/FZD8-specific and broadly FZD-reactive binding domains. Similar to WNT3A, these Wnt agonists elicit a characteristic β-catenin signalling response in a FZD-selective fashion, enhance the osteogenic lineage commitment of primary mouse and human mesenchymal stem cells, and support the growth of a broad range of primary human organoid cultures. In addition, the surrogates can be systemically expressed and exhibit Wnt activity in vivo in the mouse liver, regulating metabolic liver zonation and promoting hepatocyte proliferation, resulting in hepatomegaly. These surrogates demonstrate that canonical Wnt signalling can be activated by bi-specific ligands that induce receptor heterodimerization. Furthermore, these easily produced, non-lipidated Wnt surrogate agonists facilitate functional studies of Wnt signalling and the exploration of Wnt agonists for translational applications in regenerative medicine.


Design of biologically active binary protein 2D materials.

  • Ariel J Ben-Sasson‎ et al.
  • Nature‎
  • 2021‎

Ordered two-dimensional arrays such as S-layers and designed analogues have intrigued bioengineers, but with the exception of a single lattice formed with flexible linkers, they are constituted from just one protein component. Materials composed of two components have considerable potential advantages for modulating assembly dynamics and incorporating more complex functionality. Here we describe a computational method to generate co-assembling binary layers by designing rigid interfaces between pairs of dihedral protein building blocks, and use it to design a p6m lattice. The designed array components are soluble at millimolar concentrations, but when combined at nanomolar concentrations, they rapidly assemble into nearly crystalline micrometre-scale arrays nearly identical to the computational design model in vitro and in cells without the need for a two-dimensional support. Because the material is designed from the ground up, the components can be readily functionalized and their symmetry reconfigured, enabling formation of ligand arrays with distinguishable surfaces, which we demonstrate can drive extensive receptor clustering, downstream protein recruitment and signalling. Using atomic force microscopy on supported bilayers and quantitative microscopy on living cells, we show that arrays assembled on membranes have component stoichiometry and structure similar to arrays formed in vitro, and that our material can therefore impose order onto fundamentally disordered substrates such as cell membranes. In contrast to previously characterized cell surface receptor binding assemblies such as antibodies and nanocages, which are rapidly endocytosed, we find that large arrays assembled at the cell surface suppress endocytosis in a tunable manner, with potential therapeutic relevance for extending receptor engagement and immune evasion. Our work provides a foundation for a synthetic cell biology in which multi-protein macroscale materials are designed to modulate cell responses and reshape synthetic and living systems.


Cryo-EM structure of a type IV secretion system.

  • Kévin Macé‎ et al.
  • Nature‎
  • 2022‎

Bacterial conjugation is the fundamental process of unidirectional transfer of DNAs, often plasmid DNAs, from a donor cell to a recipient cell. It is the primary means by which antibiotic resistance genes spread among bacterial populations. In Gram-negative bacteria, conjugation is mediated by a large transport apparatus, the conjugative type IV secretion system (T4SS), produced by the donor cell and embedded in both its outer and inner membranes. The T4SS also elaborates a long extracellular filament, the conjugative pilus, that is essential for DNA transfer. Here we present a high-resolution cryo-electron microscopy (cryo-EM) structure of a 2.8 megadalton T4SS complex composed of 92 polypeptides representing 8 of the 10 essential T4SS components involved in pilus biogenesis. We added the two remaining components to the structural model using co-evolution analysis of protein interfaces, to enable the reconstitution of the entire system including the pilus. This structure describes the exceptionally large protein-protein interaction network required to assemble the many components that constitute a T4SS and provides insights on the unique mechanism by which they elaborate pili.


Structural and energetic basis of folded-protein transport by the FimD usher.

  • Sebastian Geibel‎ et al.
  • Nature‎
  • 2013‎

Type 1 pili, produced by uropathogenic Escherichia coli, are multisubunit fibres crucial in recognition of and adhesion to host tissues. During pilus biogenesis, subunits are recruited to an outer membrane assembly platform, the FimD usher, which catalyses their polymerization and mediates pilus secretion. The recent determination of the crystal structure of an initiation complex provided insight into the initiation step of pilus biogenesis resulting in pore activation, but very little is known about the elongation steps that follow. Here, to address this question, we determine the structure of an elongation complex in which the tip complex assembly composed of FimC, FimF, FimG and FimH passes through FimD. This structure demonstrates the conformational changes required to prevent backsliding of the nascent pilus through the FimD pore and also reveals unexpected properties of the usher pore. We show that the circular binding interface between the pore lumen and the folded substrate participates in transport by defining a low-energy pathway along which the nascent pilus polymer is guided during secretion.


Accurate de novo design of hyperstable constrained peptides.

  • Gaurav Bhardwaj‎ et al.
  • Nature‎
  • 2016‎

Naturally occurring, pharmacologically active peptides constrained with covalent crosslinks generally have shapes that have evolved to fit precisely into binding pockets on their targets. Such peptides can have excellent pharmaceutical properties, combining the stability and tissue penetration of small-molecule drugs with the specificity of much larger protein therapeutics. The ability to design constrained peptides with precisely specified tertiary structures would enable the design of shape-complementary inhibitors of arbitrary targets. Here we describe the development of computational methods for accurate de novo design of conformationally restricted peptides, and the use of these methods to design 18-47 residue, disulfide-crosslinked peptides, a subset of which are heterochiral and/or N-C backbone-cyclized. Both genetically encodable and non-canonical peptides are exceptionally stable to thermal and chemical denaturation, and 12 experimentally determined X-ray and NMR structures are nearly identical to the computational design models. The computational design methods and stable scaffolds presented here provide the basis for development of a new generation of peptide-based drugs.


Unraveling the functional dark matter through global metagenomics.

  • Georgios A Pavlopoulos‎ et al.
  • Nature‎
  • 2023‎

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.

