Searching across hundreds of databases

Our searching services are busy right now. Your search will reload in five seconds.

X
Forgot Password

If you have forgotten your password you can enter your email here and get a temporary password sent to your email.

X
Forgot Password

If you have forgotten your password you can enter your email here and get a temporary password sent to your email.

Estimating the composition of species in metagenomes by clustering of next-generation read sequences.

Methods (San Diego, Calif.) | 2014

Faster and cheaper sequencing technologies together with the ability to sequence uncultured microbes collected from any environment present us an opportunity to distill meaningful information from the millions of new genomic sequences from environmental samples, called metagenome. Contrary to conventional cultured microbes, however, the metagenomic data is extremely heterogeneous and noisy. Therefore the separation of the sets of sequenced genomic fragments that belong to different microbes is essential for successful assembly of microbial genomes. In this paper, we present a novel clustering method for a given metagenomic dataset. The metagenomic dataset has some distinguished features because (i) it is possible that similar sequence patterns may exist in different species and (ii) each species has different number of individuals in the given metagenomic dataset. Our method overcomes these obstacles by using the Gaussian mixture model and analysis of mixture profiles, and taking advantage of genomic signatures extracted from the metagenomic dataset. Unlike conventional clustering methods where clusters are discovered through global similarities of data instances, our method builds clusters by combining the data instances sharing local similarities captured by mixture analysis. By considering shared mixture components, our method is able to create clusters of genomic sequences although they are globally distinct each other. We applied our method to an artificial metagenomic dataset comprised of simulated 47 million reads from 25 real microbial genomes, and analyzed the resulting clusters in terms of the number of clusters, the number of participating species and dominant species in each cluster. Even though our approach cannot address all challenges in the field of metagenome sequence clustering, we believe that out method can contribute to take a step forward to achieve the goals.

Pubmed ID: 25072168 RIS Download

Research resources used in this publication

None found

Additional research tools detected in this publication

Antibodies used in this publication

None found

Associated grants

None

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


NCBI (tool)

RRID:SCR_006472

A portal to biomedical and genomic information. NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information for the better understanding of molecular processes affecting human health and disease.

View all literature mentions

Meta-IDBA (tool)

RRID:SCR_011913

Software for an iterative De Bruijn Graph De Novo short read assembler specially designed for de novo metagenomic assembly. (Please note that MetaIDBA is out of maintainance now, we recommend using IDBA-UD instead which generally performs better.)

View all literature mentions