A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

BMC bioinformatics | 2016

Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation.

PubMed ID: 27363390

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


UniProt (tool)

RRID:SCR_002380

Collection of protein sequence and functional information. Resource for protein sequence and annotation data. Consortium for preservation of the UniProt databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), UniProt Archive (UniParc), and UniProt Proteomes. Collaboration between European Bioinformatics Institute (EMBL-EBI), SIB Swiss Institute of Bioinformatics and Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.

PANTHER (tool)

RRID:SCR_004869

System that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence. The Orthologs view presents curated orthology relationships between genes for human, mouse, rat, fish, worm, and fly.

Gene3D (tool)

RRID:SCR_007672

A large database of CATH protein domain assignments for Ensembl genomes and UniProt sequences. Gene3D is a resource for studying proteins and their component domains. Gene3D takes CATH domains from Protein Data Bank (PDB) structures and assigns them to the millions of protein sequences with no PDB structures using hidden Markov models. Assigning a CATH superfamily to a region of a protein sequence gives information on the gross 3D structure of that region of the protein. CATH superfamilies have a limited set of functions, so the domain assignment provides some functional insights. Furthermore, most proteins have several different domains in a specific order, so looking for proteins with a similar domain organization provides further functional insights. Strict confidence cut-offs are used to ensure the reliability of the domain assignments. Gene3D imports functional information from sources such as UniProt and KEGG. It also imports experimental datasets on request to help researchers integrate their data with the corpus of the literature. The website allows users to view descriptions for single proteins and genes as well as large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to expand explorations to new proteins. The Gene3D web services provide programmatic access to the CATH-Gene3D annotation resources and in-house software tools. These services include Gene3DScan for identifying structural domains within protein sequences, access to pre-calculated annotations for the major sequence databases, and linked functional annotation from UniProt, GO and KEGG.

HAMAP (tool)

RRID:SCR_007701

HAMAP is a system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies: the HAMAP families. HAMAP is based on manually created family rules and is applied to bacterial, archaeal and plastid-encoded proteins.

MetaCyc (tool)

RRID:SCR_007778

MetaCyc is a database of nonredundant, experimentally elucidated metabolic pathways. MetaCyc contains more than 1,200 pathways from more than 1,600 different organisms, and is curated from the scientific experimental literature. MetaCyc contains pathways involved in both primary and secondary metabolism, as well as associated compounds, enzymes, and genes.

Protein Information Resource (tool)

RRID:SCR_008229

Integrated public bioinformatics resource to support genomic, proteomic and systems biology research and scientific studies. Provides databases and protein sequence analysis tools to the scientific community, including the Protein Sequence Database, which grew out of the Atlas of Protein Sequence and Structure. Conducts research in biomedical text mining and ontology, computational systems biology, and bioinformatics cyberinfrastructure. In 2002 PIR, along with its international partners, EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics), was awarded a grant from NIH to create UniProt, a single worldwide database of protein sequence and function, by unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases. Currently, PIR's major activities include: i) UniProt (Universal Protein Resource) development, ii) iProClass protein data integration and ID mapping, iii) PRO protein ontology, and iv) iProLINK protein literature mining and ontology development. The FTP site provides free download for iProClass, PIRSF, and PRO.

EMBOSS (tool)

RRID:SCR_008493

Software analysis package for molecular biology community. Automatically copes with data in variety of formats and allows transparent retrieval of sequence data from web. Libraries are provided with package. Provides toolkit for creating bioinformatics applications or workflows. Provides set of sequence analysis programs. Provided programs cover areas such as sequence alignment, rapid database searching with sequence patterns, protein motif identification, nucleotide sequence pattern analysis, codon usage analysis for small genomes, rapid identification of sequence patterns in large scale sequence sets, and presentation tools for publication.

Kalign (tool)

RRID:SCR_011810

A fast and accurate multiple sequence alignment algorithm.

FragGeneScan (tool)

RRID:SCR_011929

A software application for finding fragmented genes in short reads. It may also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.

GeneMark (tool)

RRID:SCR_011930

A family of gene prediction programs developed at Georgia Institute of Technology.

Glimmer (tool)

RRID:SCR_011931

A software system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses.

Gene Ontology (tool)

RRID:SCR_002811

Computable knowledge regarding functions of genes and gene products. GO resources include biomedical ontologies that cover molecular domains of all life forms as well as extensive compilations of gene product annotations to these ontologies that provide largely species-neutral, comprehensive statements about what gene products do. Used to standardize representation of gene and gene product attributes across species and databases.

TIGRFAMS (tool)

RRID:SCR_005493

Consists of curated multiple sequence alignments, Hidden Markov Models (HMMs) for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins. Starting with release 10.0, TIGRFAMs models use HMMER3, which provides excellent search speed as well as exquisite search sensitivity. See the "TIGRFAMs Complete Listing" page to review the accession, protein name, model type, and EC number (if assigned) of all models. TIGRFAMs is a member database in InterPro. The HMM libraries and supporting files are available to download and use for free from our FTP site.

InterPro (tool)

RRID:SCR_006695

Service providing functional analysis of proteins by classifying them into families and predicting domains and important sites. It combines protein signatures from a number of member databases into a single searchable resource, capitalizing on their individual strengths to produce a powerful integrated database and diagnostic tool. This integrated database of predictive protein signatures is used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures. You can access the data programmatically, via Web Services. The member databases use a number of approaches: ProDom provides sequence clusters built from UniProtKB using PSI-BLAST; PROSITE patterns provide simple regular expressions; PROSITE and HAMAP profiles provide sequence matrices; PRINTS provides fingerprints, which are groups of aligned, un-weighted Position Specific Sequence Matrices (PSSMs); and PANTHER, PIRSF, Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY provide hidden Markov models (HMMs). Your contributions are welcome. You are encouraged to use the "Add your annotation" button on InterPro entry pages to suggest updated or improved annotation for individual InterPro entries.
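To illustrate what a "simple regular expression" signature looks like in practice, the following minimal Python sketch translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. The function name `prosite_to_regex` is hypothetical, the translation covers only the common syntax elements (`x`, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors at the ends of a pattern), and it is not the code InterPro or PROSITE use. The example pattern `N-{P}-[ST]-{P}` is the widely cited N-glycosylation site motif; the test sequences are made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (simplified sketch)."""
    parts = []
    # Elements are separated by '-'; a trailing '.' terminates the pattern.
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":  # 'x' means any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {ABC} means any residue except A, B, C
        # [ABC] and single residue letters are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# N-glycosylation motif: Asn, anything but Pro, Ser or Thr, anything but Pro.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "MKNGSA")      # made-up sequence containing the motif
miss = re.search(regex, "MKNPSA")     # Pro after Asn blocks the match
```

Here `hit.start()` gives the zero-based position of the motif in the sequence, while `miss` is `None`. Profile- and HMM-based signatures cannot be expressed this way, which is one reason the member databases complement each other.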

Protein Information Resource (tool)

RRID:SCR_002837

Integrated public bioinformatics resource to support genomic, proteomic and systems biology research and scientific studies. Provides databases and protein sequence analysis tools to the scientific community, including the Protein Sequence Database, which grew out of the Atlas of Protein Sequence and Structure. Conducts research in biomedical text mining and ontology, computational systems biology, and bioinformatics cyberinfrastructure. In 2002 PIR, along with its international partners, EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics), was awarded a grant from NIH to create UniProt, a single worldwide database of protein sequence and function, by unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases. Currently, PIR's major activities include: i) UniProt (Universal Protein Resource) development, ii) iProClass protein data integration and ID mapping, iii) PRO protein ontology, and iv) iProLINK protein literature mining and ontology development. The FTP site provides free download for iProClass, PIRSF, and PRO.
