Discovering microRNAs from deep sequencing data using miRDeep.
The capacity of highly parallel sequencing technologies to detect small RNAs at unprecedented depth suggests their value in systematically identifying microRNAs (miRNAs). However, identifying miRNAs within the large pool of transcripts sequenced in a single deep sequencing run remains a major challenge. Here, we present an algorithm, miRDeep, which uses a probabilistic model of miRNA biogenesis to score the compatibility of the position and frequency of sequenced RNA with the secondary structure of the miRNA precursor. We demonstrate its accuracy and robustness using published Caenorhabditis elegans data and data we generated by deep sequencing human and dog RNAs. In total, miRDeep reports approximately 230 previously unannotated miRNAs, of which four novel C. elegans miRNAs were validated by northern blot analysis.
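The idea of scoring how well read positions and frequencies fit a precursor hairpin can be illustrated with a minimal sketch. This is not the miRDeep scoring function: the region layout (mature, loop, star), the `slack` tolerance, the probabilities `p_signal` and `p_background`, and the names `Read`, `within`, and `toy_signature_score` are all assumptions introduced here for illustration. The sketch treats each sequenced read copy as an independent observation and sums a log-odds term depending on whether the read falls inside one of the regions expected from Dicer-like processing.

```python
import math
from typing import NamedTuple, Sequence, Tuple

Region = Tuple[int, int]  # (start, end) on the candidate precursor, end exclusive


class Read(NamedTuple):
    start: int  # 0-based inclusive start on the candidate precursor
    end: int    # 0-based exclusive end
    count: int  # how many times this sequence was observed


def within(read: Read, region: Region, slack: int = 2) -> bool:
    """True if the read lies inside the region, allowing a few nt of slack."""
    lo, hi = region
    return read.start >= lo - slack and read.end <= hi + slack


def toy_signature_score(
    reads: Sequence[Read],
    mature: Region,
    loop: Region,
    star: Region,
    p_signal: float = 0.9,      # assumed P(read fits a region | real miRNA)
    p_background: float = 0.3,  # assumed P(read fits a region | background)
) -> float:
    """Toy log-odds score: positive when reads pile up in the expected regions."""
    fit = math.log(p_signal / p_background)
    misfit = math.log((1 - p_signal) / (1 - p_background))
    score = 0.0
    for read in reads:
        consistent = any(within(read, r) for r in (mature, loop, star))
        # Weight each term by read frequency, so abundant reads dominate.
        score += read.count * (fit if consistent else misfit)
    return score


# Hypothetical 62-nt hairpin: mature arm, loop, then star arm.
MATURE, LOOP, STAR = (0, 22), (22, 40), (40, 62)

good = [Read(0, 22, 50), Read(41, 62, 3)]  # reads stacked on mature and star
bad = [Read(10, 35, 20)]                   # read straddling mature and loop

good_score = toy_signature_score(good, MATURE, LOOP, STAR)
bad_score = toy_signature_score(bad, MATURE, LOOP, STAR)
```

In this toy setup, `good_score` is positive and `bad_score` is negative, so candidates whose reads straddle region boundaries are penalized in proportion to their abundance. A realistic model would also need to account for the predicted secondary structure and its stability, which this sketch leaves out.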