Reprever: resolving low-copy duplicated sequences using template driven assembly.

Nucleic acids research | Jul 1, 2013

Genomic sequence duplication is an important mechanism for genome evolution, often resulting in large sequence variations with implications for disease progression. Although paired-end sequencing technologies are commonly used for structural variation discovery, the discovery of novel duplicated sequences remains an unmet challenge. We analyze duplicons starting from identified high-copy number variants. Given paired-end mapped reads and a candidate high-copy region, our tool, Reprever, identifies (a) the insertion breakpoints where the extra duplicons are inserted into the donor genome and (b) the actual sequence of the duplicon. Reprever resolves ambiguous mapping signatures from existing homologs, repetitive elements and sequencing errors to identify breakpoints. At each breakpoint, Reprever reconstructs the inserted sequence using profile hidden Markov model (PHMM)-based guided assembly. In a test on 1000 artificial genomes with simulated duplication, Reprever could identify novel duplicates in up to 97% of genomes within 3 bp positional and 1% sequence errors. Validation on 680 fosmid sequences identified and reconstructed eight duplicated sequences with high accuracy. We applied Reprever to reanalyze a re-sequenced data set from the African individual NA18507 and identified >800 novel duplicates, including insertions in genes and insertions with additional variation. Polymerase chain reaction followed by capillary sequencing validated both the insertion locations of the strongest predictions and their predicted sequence.
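
As a rough illustration of the paired-end signal the abstract refers to, the sketch below collects read pairs in which one mate maps inside a candidate high-copy region and the other maps elsewhere, then clusters the positions of the outside mates into candidate insertion sites. This is not Reprever's algorithm: it ignores the ambiguous mappings, repeats and sequencing errors that the abstract says Reprever resolves, and it does no PHMM-guided assembly. The names `Mate`, `candidate_breakpoints`, `window` and `min_support` are hypothetical, and the coordinates are made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Mate(NamedTuple):
    """Mapped position of one read of a pair (hypothetical minimal record)."""
    chrom: str
    pos: int


def in_region(mate, region):
    """True if the mate maps inside region = (chrom, start, end), inclusive."""
    chrom, start, end = region
    return mate.chrom == chrom and start <= mate.pos <= end


def candidate_breakpoints(pairs, region, window=500, min_support=2):
    """Cluster the outside mates of pairs that have exactly one mate in region.

    Returns a sorted list of (chrom, first_pos, last_pos, n_supporting_pairs).
    `window` and `min_support` are assumed tuning parameters.
    """
    anchors = defaultdict(list)
    for a, b in pairs:
        # Skip pairs that look concordant (same chromosome, mates close together).
        if a.chrom == b.chrom and abs(a.pos - b.pos) <= window:
            continue
        a_in, b_in = in_region(a, region), in_region(b, region)
        if a_in != b_in:  # exactly one mate falls in the candidate region
            outside = b if a_in else a
            anchors[outside.chrom].append(outside.pos)

    clusters = []
    for chrom, positions in anchors.items():
        positions.sort()
        current = [positions[0]]
        for p in positions[1:]:
            if p - current[-1] <= window:
                current.append(p)
            else:
                if len(current) >= min_support:
                    clusters.append((chrom, current[0], current[-1], len(current)))
                current = [p]
        if len(current) >= min_support:
            clusters.append((chrom, current[0], current[-1], len(current)))
    return sorted(clusters)


# Toy input: two pairs link the region to chr5, one pair to chr9.
example_pairs = [
    (Mate("chr1", 1000), Mate("chr5", 20000)),
    (Mate("chr5", 20150), Mate("chr1", 1200)),
    (Mate("chr1", 1100), Mate("chr1", 1300)),
    (Mate("chr1", 1500), Mate("chr9", 500)),
]
example_region = ("chr1", 900, 2000)
example_sites = candidate_breakpoints(example_pairs, example_region)
```

A real pipeline would read the pairs from an alignment file and would have to weigh evidence from multi-mapped reads, which this sketch leaves out entirely.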

PubMed ID: 23658221

Mesh terms: Algorithms | DNA Copy Number Variations | Genome, Human | Humans | Polymerase Chain Reaction | Sequence Analysis, DNA | Software | Templates, Genetic

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


GitHub

A web-based hosting service for software development projects that use the Git revision control system, offering powerful collaboration, code review, and code management. It offers both paid plans for private repositories and free accounts for open source projects. Large or small, every repository comes with the same powerful tools. These tools are open to the community for public projects and secure for private projects. Features include:

* Integrated issue tracking
* Collaborative code review
* Easily manage teams within organizations
* Text entry with understated power
* A growing list of programming languages and data formats
* On the desktop and in your pocket - Android app and mobile web views let you keep track of your projects on the go.

tool


NCBI

A portal to biomedical and genomic information. NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information for the better understanding of molecular processes affecting human health and disease.

tool


Pompep

FTP site to access Schizosaccharomyces pombe protein data.

tool
