DySC: software for greedy clustering of 16S rRNA reads.
Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher-quality clusters than UCLUST and CD-HIT at a comparable runtime. AVAILABILITY AND IMPLEMENTATION: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under the GNU GPL license.
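The NMI criterion mentioned above compares a predicted clustering against a reference labeling of the same reads. The abstract does not say which normalization was used, so the following is only a minimal sketch assuming the common arithmetic-mean variant, NMI = 2 I(A;B) / (H(A) + H(B)). The function names and the toy labels are illustrative and are not taken from DySC.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) )
        mi += (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
    return mi


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same items")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        # Both labelings put everything in one cluster: treat as identical.
        return 1.0
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


# Toy usage: cluster IDs differ, but the partition is the same.
reference = ["taxonA", "taxonA", "taxonB", "taxonB"]
predicted = [7, 7, 3, 3]
score = nmi(reference, predicted)
```

NMI depends only on how the items are partitioned, not on the label names, so a clustering tool's arbitrary cluster IDs can be scored directly against taxonomic labels.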
PubMed ID: 22730435
Bioinformatics (Oxford, England)
August 15, 2012
- Cluster Analysis
- RNA, Ribosomal, 16S
- Sequence Analysis, RNA