Searching across hundreds of databases

Our searching services are busy right now. Your search will reload in five seconds.

X
Forgot Password

If you have forgotten your password you can enter your email here and get a temporary password sent to your email.

X
Forgot Password

If you have forgotten your password you can enter your email here and get a temporary password sent to your email.

Phylogeny of Nitrogenase Structural and Assembly Components Reveals New Insights into the Origin and Distribution of Nitrogen Fixation across Bacteria and Archaea.

Microorganisms | 2021

The phylogeny of nitrogenase has only been analyzed using the structural proteins NifHDK. As nifHDKENB has been established as the minimum number of genes necessary for in silico prediction of diazotrophy, we present an updated phylogeny of diazotrophs using both structural (NifHDK) and cofactor assembly proteins (NifENB). Annotated Nif sequences were obtained from InterPro from 963 culture-derived genomes. Nif sequences were aligned individually and concatenated to form one NifHDKENB sequence. Phylogenies obtained using PhyML, FastTree, RapidNJ, and ASTRAL from individuals and concatenated protein sequences were compared and analyzed. All six genes were found across the Actinobacteria, Aquificae, Bacteroidetes, Chlorobi, Chloroflexi, Cyanobacteria, Deferribacteres, Firmicutes, Fusobacteria, Nitrospira, Proteobacteria, PVC group, and Spirochaetes, as well as the Euryarchaeota. The phylogenies of individual Nif proteins were very similar to the overall NifHDKENB phylogeny, indicating the assembly proteins have evolved together. Our higher resolution database upheld the three cluster phylogeny, but revealed undocumented horizontal gene transfers across phyla. Only 48% of the 325 genera containing all six nif genes are currently supported by biochemical evidence of diazotrophy. In addition, this work provides reference for any inter-phyla comparison of Nif sequences and a quality database of Nif proteins that can be used for identifying new Nif sequences.

Pubmed ID: 34442741 RIS Download

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

None

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


UniProt (tool)

RRID:SCR_002380

Collection of data of protein sequence and functional information. Resource for protein sequence and annotation data. Consortium for preservation of the UniProt databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), and UniProt Archive (UniParc), UniProt Proteomes. Collaboration between European Bioinformatics Institute (EMBL-EBI), SIB Swiss Institute of Bioinformatics and Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.

View all literature mentions

PANTHER (tool)

RRID:SCR_004869

System that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in absence of direct experimental evidence. Orthologs view is curated orthology relationships between genes for human, mouse, rat, fish, worm, and fly.

View all literature mentions

TIGRFAMS (tool)

RRID:SCR_005493

Consists curated multiple sequence alignments, Hidden Markov Models (HMMs) for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins. Starting with release 10.0, TIGRFAMs models use HMMER3, which provides excellent search speed as well as exquisite search sensitivity. See the "TIGRFAMs Complete Listing" page to review the accession, protein name, model type, and EC number (if assigned) of all models. TIGRFAMs is a member database in InterPro. The HMM libraries and supporting files are available to download and use for free from our FTP site.

View all literature mentions

PHYLIP (tool)

RRID:SCR_006244

A free package of software programs for inferring phylogenies (evolutionary trees). The source code is distributed (in C), and executables are also distributed. In particular, already-compiled executables are available for Windows (95/98/NT/2000/me/xp/Vista), Mac OS X, and Linux systems. Older executables are also available for Mac OS 8 or 9 systems.

View all literature mentions

Galaxy (tool)

RRID:SCR_006281

Open, web-based platform providing bioinformatics tools and services for data intensive genomic research. Platform may be used as a service or installed locally to perform, reproduce, and share complete analyses. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Community has created Galaxy instances in many different forms and for many different applications including Galaxy servers, cloud services that support Galaxy instances, and virtual machines and containers that can be easily deployed for your own server.The Galaxy team is a part of BX at Penn State, and the Biology and Mathematics and Computer Science departments at Emory University.Training Infrastructure as a Service (TIaaS) is a service offered by some UseGalaxy servers to specifically support training use cases.

View all literature mentions

InterPro (tool)

RRID:SCR_006695

Service providing functional analysis of proteins by classifying them into families and predicting domains and important sites. They combine protein signatures from a number of member databases into a single searchable resource, capitalizing on their individual strengths to produce a powerful integrated database and diagnostic tool. This integrated database of predictive protein signatures is used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures. You can access the data programmatically, via Web Services. The member databases use a number of approaches: # ProDom: provider of sequence-clusters built from UniProtKB using PSI-BLAST. # PROSITE patterns: provider of simple regular expressions. # PROSITE and HAMAP profiles: provide sequence matrices. # PRINTS provider of fingerprints, which are groups of aligned, un-weighted Position Specific Sequence Matrices (PSSMs). # PANTHER, PIRSF, Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY: are providers of hidden Markov models (HMMs). Your contributions are welcome. You are encouraged to use the ''''Add your annotation'''' button on InterPro entry pages to suggest updated or improved annotation for individual InterPro entries.

View all literature mentions

PhyML (tool)

RRID:SCR_014629

Web phylogeny server based on the maximum-likelihood principle.

View all literature mentions

FastTree (tool)

RRID:SCR_015501

Source code that infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. It uses the Jukes-Cantor or generalized time-reversible (GTR) models of nucleotide evolution and the JTT, WAG, or LG models of amino acid evolution.

View all literature mentions

ClustalW (tool)

RRID:SCR_017277

Web sevice of ClustalW provided by DNA data bank of Japan.

View all literature mentions

Pfam (tool)

RRID:SCR_004726

A database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Users can analyze protein sequences for Pfam matches, view Pfam family annotation and alignments, see groups of related families, look at the domain organization of a protein sequence, find the domains on a PDB structure, and query Pfam by keywords. There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high quality, manually curated families that may automatically generate a supplement using the ADDA database. These automatically generated entries are called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans (collections of Pfam-A entries which are related by similarity of sequence, structure or profile-HMM).

View all literature mentions