Ultrafast clustering algorithms for metagenomic sequence analysis.


The rapid advances of high-throughput sequencing technologies have dramatically spurred metagenomic studies of microbial communities that exist in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and sheer complexity of these sequence data pose tremendous challenges in data analysis. These challenges include, but are not limited to, ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. Clustering analysis also helps address the challenges in metagenomics. For example, a large redundant data set can be represented by a small non-redundant set, in which each cluster is represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering, and errors can be identified, filtered or corrected by using the consensus of the sequences within each cluster.
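To make the redundancy-reduction and consensus ideas above concrete, here is a minimal sketch of greedy incremental clustering in the general spirit of tools such as CD-HIT. It is not the authors' implementation. The function names, the k-mer overlap used as a crude stand-in for alignment identity, and the default threshold and word size are all hypothetical choices made for illustration. The consensus step assumes the cluster members are already aligned and of equal length.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a: str, b: str, k: int) -> float:
    """Fraction of the smaller k-mer set shared with the other sequence.

    Assumption: a cheap proxy for sequence identity; real tools use
    short-word filtering followed by alignment.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    smaller = min(len(ka), len(kb))
    if smaller == 0:
        return 0.0
    return len(ka & kb) / smaller


def greedy_cluster(seqs: list[str], threshold: float = 0.8, k: int = 4) -> list[list[str]]:
    """Greedy incremental clustering (hypothetical helper).

    Sequences are processed longest first. Each one joins the first cluster
    whose representative (members[0]) meets the similarity threshold;
    otherwise it founds a new cluster and becomes its representative.
    """
    clusters: list[list[str]] = []
    for s in sorted(seqs, key=len, reverse=True):
        for members in clusters:
            if kmer_similarity(s, members[0], k) >= threshold:
                members.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def consensus(members: list[str]) -> str:
    """Column-wise majority vote; assumes pre-aligned, equal-length members."""
    length = min(len(m) for m in members)
    return "".join(
        Counter(m[i] for m in members).most_common(1)[0][0] for i in range(length)
    )


if __name__ == "__main__":
    # Toy reads: two identical copies, one copy with a single substitution
    # (A -> T at position 4), and one unrelated sequence.
    reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "TTTTGGGGCC"]
    for cluster in greedy_cluster(reads, threshold=0.7, k=4):
        print(len(cluster), consensus(cluster))
```

With a threshold of 0.7, the three related reads fall into one cluster whose consensus restores the substituted base, so four reads are represented by two non-redundant entries. Processing the longest sequences first, so that representatives tend to be the most complete members, is the usual rationale for this ordering in greedy incremental schemes.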

PubMed ID: 22772836


  • Li W
  • Fu L
  • Niu B
  • Wu S
  • Wooley J


Briefings in Bioinformatics

Publication Date

November 23, 2012

Associated Grants

  • Agency: NHGRI NIH HHS, Id: R01 HG005978
  • Agency: NCRR NIH HHS, Id: R01 RR025030

Mesh Terms

  • Algorithms
  • Cluster Analysis
  • Metagenome
  • Metagenomics
  • Sequence Analysis, DNA