Ultrafast clustering algorithms for metagenomic sequence analysis.

The rapid advances of high-throughput sequencing technologies dramatically prompted metagenomic studies of microbial communities that exist at various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters.

PubMed ID: 22772836

Mesh terms: Algorithms | Cluster Analysis | Metagenome | Metagenomics | Sequence Analysis, DNA

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.

Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis

THIS RESOURCE IS NO LONGER IN SERVICE, documented May 26, 2016; however, the URL provides links to associated projects and data. A suite of data query, download, upload, analysis and sharing tools serving the needs of the microbial ecology research community, and other scientists using metagenomics data.



NCBI Sequence Read Archive

Repository of raw sequencing data from the next generation of sequencing platforms, including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, Complete Genomics, and Pacific Biosciences SMRT. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Data submissions are welcome. SRA is NIH's primary archive of high-throughput sequencing data and is part of the international partnership of archives (INSDC) at the NCBI, the European Bioinformatics Institute and the DNA Database of Japan. Data submitted to any of the three organizations are shared among them. NCBI announced that, due to budget constraints, it would be discontinuing SRA, but NIH has since committed interim funding for SRA in its current form until October 1, 2011. In addition, NCBI has been working with staff from other NIH Institutes and NIH grantees to develop an approach to continue archiving a widely used subset of next generation sequencing data after October 1, 2011.




A portal to biomedical and genomic information. NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information for the better understanding of molecular processes affecting human health and disease.

