
Ultrafast clustering algorithms for metagenomic sequence analysis.

The rapid advances of high-throughput sequencing technologies have dramatically spurred metagenomic studies of microbial communities that exist in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations, as well as their functions and interactions. However, the massive quantity and great complexity of these sequence data pose tremendous challenges for data analysis. These challenges include, but are not limited to, ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of these fundamental questions by grouping similar sequences into families. Clustering analysis also addresses the analytical challenges of metagenomics. For example, a large redundant data set can be represented by a small non-redundant set in which each cluster is represented by a single entry or a consensus sequence. Artifacts can be rapidly detected through clustering, and errors can be identified, filtered or corrected by using the consensus of the sequences within each cluster.
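
The abstract does not commit to a specific algorithm. The following is a minimal sketch that assumes the greedy incremental strategy used by tools such as CD-HIT. It shows how grouping similar sequences lets a redundant set be represented by a single entry per cluster. Sequences are visited longest first. Each one joins the first existing representative it matches above an identity threshold; otherwise it becomes the representative of a new cluster. The `identity` measure (1 minus Levenshtein distance divided by the longer length), the function names and the example threshold are hypothetical choices for illustration. Real tools use alignment-based identity and short-word filters to skip most pairwise comparisons.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # match or substitution
            )
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Hypothetical similarity: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering (simplified sketch, not CD-HIT itself).

    Returns a mapping from each representative's id to the ids in its cluster.
    """
    # Longest sequences first; sorted() is stable, so ties keep input order.
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for sid in order:
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)  # redundant: absorbed by this representative
                break
        else:
            clusters[sid] = [sid]          # no match: sid founds a new cluster
    return clusters


if __name__ == "__main__":
    reads = {
        "r1": "ACGTACGTAC",
        "r2": "ACGTACGTAA",  # one mismatch against r1
        "r3": "TTTTGGGGCC",  # unrelated read
        "r4": "ACGTACGT",    # fragment (prefix) of r1
    }
    print(greedy_cluster(reads, threshold=0.75))
    # {'r1': ['r1', 'r2', 'r4'], 'r3': ['r3']}
```

Here the one-mismatch read `r2` and the fragment `r4` both collapse into the cluster represented by `r1`, while `r3` founds its own cluster. This naive version compares each sequence against every current representative, so its cost grows with the number of clusters. Ultrafast clustering tools aim to avoid exactly this cost.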

PubMed ID: 22772836

Authors

  • Li W
  • Fu L
  • Niu B
  • Wu S
  • Wooley J

Journal

Briefings in Bioinformatics

Publication Date

November 23, 2012

Associated Grants

  • Agency: NHGRI NIH HHS, Id: R01 HG005978
  • Agency: NCRR NIH HHS, Id: R01 RR025030

MeSH Terms

  • Algorithms
  • Cluster Analysis
  • Metagenome
  • Metagenomics
  • Sequence Analysis, DNA