The primary objective of this study was to create a genome-wide high resolution map (i.e., >100 bp) of 'rearrangement hotspots' which can facilitate the identification of regions capable of mediating de novo deletions or duplications in humans. A hierarchical method was employed to fragment segmental duplications (SDs) into multiple smaller SD units. Combining an end space free pairwise alignment algorithm with a 'seed and extend' approach, we have exhaustively searched 409 million alignments to detect complex structural rearrangements within the reference-guided assembly of the NA18507 human genome (18× coverage), including the previously identified novel 4.8 Mb sequence from de novo assembly within this genome. We have identified 1,963 rearrangement hotspots within SDs which encompass 166 genes and display an enrichment of duplicated gene nucleotide variants (DNVs). These regions are correlated with increased non-allelic homologous recombination (NAHR) event frequency which presumably represents the origin of copy number variations (CNVs) and pathogenic duplications/deletions. Analysis revealed that 20% of the detected hotspots are clustered within the proximal and distal SD breakpoints flanked by the pathogenic deletions/duplications that have been mapped for 24 NAHR-mediated genomic disorders. FISH Validation of selected complex regions revealed 94% concordance with in silico localization of the highly homologous derivatives. Other results from this study indicate that intra-chromosomal recombination is enhanced in genic compared with agenic duplicated regions, and that gene desert regions comprising SDs may represent reservoirs for creation of novel genes. The generation of genome-wide signatures of 'rearrangement hotspots', which likely serve as templates for NAHR, may provide a powerful approach towards understanding the underlying mutational mechanism(s) for development of constitutional and acquired diseases.
Pubmed ID: 22194928 RIS Download
Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.
A portal to biomedical and genomic information. NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information for the better understanding of molecular processes affecting human health and disease.
View all literature mentionsA software package for aligning genomic reads against a target genome.
View all literature mentionsSoftware designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
View all literature mentionsSystem that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in absence of direct experimental evidence. Orthologs view is curated orthology relationships between genes for human, mouse, rat, fish, worm, and fly.
View all literature mentionsSystem that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in absence of direct experimental evidence. Orthologs view is curated orthology relationships between genes for human, mouse, rat, fish, worm, and fly.
View all literature mentions