This service exclusively searches for literature that cites resources. Please be aware that the total number of searchable documents is limited to those containing RRIDs and does not include all open-access literature.
Despite the known relevance of genomic structural variants to pathogen behavior, cancer, development, and evolution, certain repeat based structural variants may evade detection by existing high-throughput techniques. Here, we present ruler arrays, a technique to detect genomic structural variants including insertions and deletions (indels), duplications, and translocations. A ruler array exploits DNA polymerase's processivity to detect physical distances between defined genomic sequences regardless of the intervening sequence. The method combines a sample preparation protocol, tiling genomic microarrays, and a new computational analysis. The analysis of ruler array data from two genomic samples enables the identification of structural variation between the samples. In an empirical test between two closely related haploid strains of yeast ruler arrays detected 78% of the structural variants larger than 100 bp.
Much has changed in the last two years at DGVa (http://www.ebi.ac.uk/dgva) and dbVar (http://www.ncbi.nlm.nih.gov/dbvar). We are now processing direct submissions rather than only curating data from the literature and our joint study catalog includes data from over 100 studies in 11 organisms. Studies from human dominate with data from control and case populations, tumor samples as well as three large curated studies derived from multiple sources. During the processing of these data, we have made improvements to our data model, submission process and data representation. Additionally, we have made significant improvements in providing access to these data via web and FTP interfaces.
Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets1-3. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing4 to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion-deletion variants (20-49 bp; n = 136,797), structural variants (50 b-50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci5, uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.
Near isogenic lines (NILs) are a critical genetic resource for the soybean research community. The ability to identify and characterize the genes driving the phenotypic differences between NILs is limited by the degree to which differential genetic introgressions can be resolved. Furthermore, the genetic heterogeneity extant among NIL sub-lines is an unaddressed research topic that might have implications for how genomic and phenotypic data from NILs are utilized. In this study, a recently developed high-resolution comparative genomic hybridization (CGH) platform was used to investigate the structure and diversity of genetic introgressions in two classical soybean NIL populations, respectively varying in protein content and iron deficiency chlorosis (IDC) susceptibility. There were three objectives: assess the capacity for CGH to resolve genomic introgressions, identify introgressions that are heterogeneous among NIL sub-lines, and associate heterogeneous introgressions with susceptibility to IDC. Using the CGH approach, introgression boundaries were refined and previously unknown introgressions were revealed. Furthermore, heterogeneous introgressions were identified within seven sub-lines of the IDC NIL "IsoClark." This included three distinct introgression haplotypes linked to the major iron susceptible locus on chromosome 03. A phenotypic assessment of the seven sub-lines did not reveal any differences in IDC susceptibility, indicating that the genetic heterogeneity among the lines does not have a significant impact on the primary NIL phenotype.
Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV .
Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe genome-wide profile of CNVs and identify common rearrangements modulating risk to RM. Genome-wide screening of Estonian RM patients and fertile controls identified excessive cumulative burden of CNVs (5.4 and 6.1 Mb per genome) in two RM cases possibly increasing their individual disease risk. Functional profiling of all rearranged genes within RM study group revealed significant enrichment of loci related to innate immunity and immunoregulatory pathways essential for immune tolerance at fetomaternal interface. As a major finding, we report a multicopy duplication (61.6 kb) at 5p13.3 conferring increased maternal risk to RM in Estonia and Denmark (meta-analysis, n = 309/205, odds ratio = 4.82, P = 0.012). Comparison to Estonian population-based cohort (total, n = 1000) confirmed the risk for Estonian female cases (P = 7.9 × 10(-4) ). Datasets of four cohorts from the Database of Genomic Variants (total, n = 5,846 subjects) exhibited similar low duplication prevalence worldwide (0.7%-1.2%) compared to RM cases of this study (6.6%-7.5%). The CNV disrupts PDZD2 and GOLPH3 genes predominantly expressed in placenta and it may represent a novel risk factor for pregnancy complications.
The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.
Although various methods have been developed to detect structural variations (SVs) in genomic sequences, few are used to validate these results. Several commonly used SV callers produce many false positive SVs, and existing validation methods are not accurate enough. Therefore, a highly efficient and accurate validation method is essential. In response, we propose SVvalidation-a new method that uses long-read sequencing data for validating SVs with higher accuracy and efficiency. Compared to existing methods, SVvalidation performs better in validating SVs in repeat regions and can determine the homozygosity or heterozygosity of an SV. Additionally, SVvalidation offers the highest recall, precision, and F1-score (improving by 7-16%) across all datasets. Moreover, SVvalidation is suitable for different types of SVs. The program is available at https://github.com/nwpuzhengyan/SVvalidation.
African cichlid fishes are an ideal system for studying explosive rates of speciation and the origin of diversity in adaptive radiation. Within the last few million years, more than 2000 species have evolved in the Great Lakes of East Africa, the largest adaptive radiation in vertebrates. These young species show spectacular diversity in their coloration, morphology and behavior. However, little is known about the genomic basis of this astonishing diversity. Recently, five African cichlid genomes were sequenced, including that of the Nile Tilapia (Oreochromis niloticus), a basal and only relatively moderately diversified lineage, and the genomes of four representative endemic species of the adaptive radiations, Neolamprologus brichardi, Astatotilapia burtoni, Metriaclima zebra, and Pundamila nyererei. Using the Tilapia genome as a reference genome, we generated a high-resolution genomic variation map, consisting of single nucleotide polymorphisms (SNPs), short insertions and deletions (indels), inversions and deletions. In total, around 18.8, 17.7, 17.0, and 17.0 million SNPs, 2.3, 2.2, 1.4, and 1.9 million indels, 262, 306, 162, and 154 inversions, and 3509, 2705, 2710, and 2634 deletions were inferred to have evolved in N. brichardi, A. burtoni, P. nyererei, and M. zebra, respectively. Many of these variations affected the annotated gene regions in the genome. Different patterns of genetic variation were detected during the adaptive radiation of African cichlid fishes. For SNPs, the highest rate of evolution was detected in the common ancestor of N. brichardi, A. burtoni, P. nyererei, and M. zebra. However, for the evolution of inversions and deletions, we found that the rates at the terminal taxa are substantially higher than the rates at the ancestral lineages. The high-resolution map provides an ideal opportunity to understand the genomic bases of the adaptive radiation of African cichlid fishes.
Powdery mildew, caused by the obligate biotrophic fungus Erysiphe necator, is an economically important disease of grapevines worldwide. Large quantities of fungicides are used for its control, accelerating the incidence of fungicide-resistance. Copy number variations (CNVs) are unbalanced changes in the structure of the genome that have been associated with complex traits. In addition to providing the first description of the large and highly repetitive genome of E. necator, this study describes the impact of genomic structural variation on fungicide resistance in Erysiphe necator.
Hybrids between divergent populations commonly show hybrid sterility; this reproductive barrier hinders hybrid breeding of the japonica and indica rice (Oryza sativa L.) subspecies. Here we show that structural changes and copy number variation at the Sc locus confer japonica-indica hybrid male sterility. The japonica allele, Sc-j, contains a pollen-essential gene encoding a DUF1618-domain protein; the indica allele, Sc-i, contains two or three tandem-duplicated ~ 28-kb segments, each carrying an Sc-j-homolog with a distinct promoter. In Sc-j/Sc-i hybrids, the high-expression of Sc-i in sporophytic cells causes suppression of Sc-j expression in pollen and selective abortion of Sc-j-pollen, leading to transmission ratio distortion. Knocking out one or two of the three Sc-i copies by CRISPR/Cas9 rescues Sc-j expression and male fertility. Our results reveal the gene dosage-dependent allelic suppression as a mechanism of hybrid incompatibility, and provide an effective approach to overcome the reproductive barrier for hybrid breeding.
Structural genomic variations play an important role in human disease and phenotypic diversity. With the rise of high-throughput sequencing tools, mate-pair/paired-end/single-read sequencing has become an important technique for the detection and exploration of structural variation. Several analysis tools exist to handle different parts and aspects of such sequencing based structural variation analyses pipelines. A comprehensive analysis platform to handle all steps, from processing the sequencing data, to the discovery and visualization of structural variants, is missing. The ViVar platform is built to handle the discovery of structural variants, from Depth Of Coverage analysis, aberrant read pair clustering to split read analysis. ViVar provides you with powerful visualization options, enables easy reporting of results and better usability and data management. The platform facilitates the processing, analysis and visualization, of structural variation based on massive parallel sequencing data, enabling the rapid identification of disease loci or genes. ViVar allows you to scale your analysis with your work load over multiple (cloud) servers, has user access control to keep your data safe and is easy expandable as analysis techniques advance. URL: https://www.cmgg.be/vivar/
Genomic structural variation (SV) is a major determinant for phenotypic variation. Although it has been extensively studied in humans, the nucleotide resolution structure of SVs within the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs comprising 8962 deletions and 916 tandem duplications in 39 lines derived from short-read DNA sequencing in a natural population (the "Drosophila melanogaster Genetic Reference Panel," DGRP). Most SVs (>90%) were inferred at nucleotide resolution, and a large fraction was genotyped across all samples. Comprehensive analyses of SV formation mechanisms using the short-read data revealed an abundance of SVs formed by mobile element and nonhomologous end-joining-mediated rearrangements, and clustering of variants into SV hotspots. We further observed a strong depletion of SVs overlapping genes, which, along with population genetics analyses, suggests that these SVs are often deleterious. We inferred several gene fusion events also highlighting the potential role of SVs in the generation of novel protein products. Expression quantitative trait locus (eQTL) mapping revealed the functional impact of our high-resolution SV map, with quantifiable effects at >100 genic loci. Our map represents a resource for population-level studies of SVs in an important model organism.
Anterior cruciate ligament (ACL) rupture is an important condition of the human knee. Second ruptures are common and societal costs are substantial. Canine cranial cruciate ligament (CCL) rupture closely models the human disease. CCL rupture is common in the Labrador Retriever (5.79% prevalence), ~100-fold more prevalent than in humans. Labrador Retriever CCL rupture is a polygenic complex disease, based on genome-wide association study (GWAS) of single nucleotide polymorphism (SNP) markers. Dissection of genetic variation in complex traits can be enhanced by studying structural variation, including copy number variants (CNVs). Dogs are an ideal model for CNV research because of reduced genetic variability within breeds and extensive phenotypic diversity across breeds. We studied the genetic etiology of CCL rupture by association analysis of CNV regions (CNVRs) using 110 case and 164 control Labrador Retrievers. CNVs were called from SNPs using three different programs (PennCNV, CNVPartition, and QuantiSNP). After quality control, CNV calls were combined to create CNVRs using ParseCNV and an association analysis was performed. We found no strong effect CNVRs but found 46 small effect (max(T) permutation P<0.05) CCL rupture associated CNVRs in 22 autosomes; 25 were deletions and 21 were duplications. Of the 46 CCL rupture associated CNVRs, we identified 39 unique regions. Thirty four were identified by a single calling algorithm, 3 were identified by two calling algorithms, and 2 were identified by all three algorithms. For 42 of the associated CNVRs, frequency in the population was <10% while 4 occurred at a frequency in the population ranging from 10-25%. Average CNVR length was 198,872bp and CNVRs covered 0.11 to 0.15% of the genome. All CNVRs were associated with case status. CNVRs did not overlap previous canine CCL rupture risk loci identified by GWAS. Associated CNVRs contained 152 annotated genes; 12 CNVRs did not have genes mapped to CanFam3.1. Using pathway analysis, a cluster of 19 homeobox domain transcript regulator genes was associated with CCL rupture (P = 6.6E-13). This gene cluster influences cranial-caudal body pattern formation during embryonic limb development. Clustered genes were found in 3 CNVRs on chromosome 14 (HoxA), 28 (NKX6-2), and 36 (HoxD). When analysis was limited to deletion CNVRs, the association was strengthened (P = 8.7E-16). This study suggests a component of the polygenic risk of CCL rupture in Labrador Retrievers is associated with small effect CNVs and may include aspects of stifle morphology regulated by homeobox domain transcript regulator genes.
Major unresolved questions in evolutionary genetics include determining the contributions of different mutational sources to the total pool of genetic variation in a species, and understanding how these different forms of genetic variation interact with natural selection. Recent work has shown that structural variants (SVs) (insertions, deletions, inversions, and transpositions) are a major source of genetic variation, often outnumbering single nucleotide variants in terms of total bases affected. Despite the near ubiquity of SVs, major questions about their interaction with natural selection remain. For example, how does the allele frequency spectrum of SVs differ when compared with single nucleotide variants? How often do SVs affect genes, and what are the consequences? To begin to address these questions, we have systematically identified and characterized a large set of submicroscopic insertion and deletion (indel) variants (between 1 and 200 kb in length) among ten inbred lines from a single natural population of the plant species Mimulus guttatus. After extensive computational filtering, we focused on a set of 4,142 high-confidence indels that showed an experimental validation rate of 73%. All but one of these indels were less than 200 kb. Although the largest were generally at lower frequencies in the population, a surprising number of large indels are at intermediate frequencies. Although indels overlapping with genes were much rarer than expected by chance, approximately 600 genes were affected by an indel. Nucleotide-binding site leucine-rich repeat (NBS-LRR) defense response genes were the most enriched among the gene families affected. Most indels associated with genes were rare and appeared to be under purifying selection, though we do find four high-frequency derived insertion alleles that show signatures of recent positive selection.
Species harbor extensive structural variation underpinning recent adaptive evolution. However, the causality between genomic features and the induction of new rearrangements is poorly established. Here, we analyze a global set of telomere-to-telomere genome assemblies of a fungal pathogen of wheat to establish a nucleotide-level map of structural variation. We show that the recent emergence of pesticide resistance has been disproportionally driven by rearrangements. We use machine learning to train a model on structural variation events based on 30 chromosomal sequence features. We show that base composition and gene density are the major determinants of structural variation. Retrotransposons explain most inversion, indel and duplication events. We apply our model to Arabidopsis thaliana and show that our approach extends to more complex genomes. Finally, we analyze complete genomes of haploid offspring in a four-generation pedigree. Meiotic crossover locations are enriched for new rearrangements consistent with crossovers being mutational hotspots. The model trained on species-wide structural variation accurately predicts the position of >74% of newly generated variants along the pedigree. The predictive power highlights causality between specific sequence features and the induction of chromosomal rearrangements. Our work demonstrates that training sequence-derived models can accurately identify regions of intrinsic DNA instability in eukaryotic genomes.
The polerovirus (family Solemoviridae, genus Polerovirus) genome consists of single-, positive-strand RNA organized in overlapping open reading frames (ORFs) that, in addition to others, code for protein 0 (P0, a gene silencing suppressor), a coat protein (CP, ORF3), and a read-through domain (ORF5) that is fused to the CP to form a CP-read-through (RT) protein. The genus Polerovirus contains twenty-six virus species that infect a wide variety of plants from cereals to cucurbits, to peppers. Poleroviruses are transmitted by a wide range of aphid species in the genera Rhopalosiphum, Stiobion, Aphis, and Myzus. Aphid transmission is mediated both by the CP and by the CP-RT. In viruses, mutational robustness and structural flexibility are necessary for maintaining functionality in genetically diverse sets of host plants and vectors. Under this scenario, within a virus genome, mutations preferentially accumulate in areas that are determinants of host adaptation or vector transmission. In this study, we profiled genomic variation in poleroviruses. Consistent with their multifunctional nature, single-nucleotide variation and selection analyses showed that ORFs coding for P0 and the read-through domain within the CP-RT are the most variable and contain the highest frequency of sites under positive selection. An order/disorder analysis showed that protein P0 is not disordered. In contrast, proteins CP-RT and virus protein genome-linked (VPg) contain areas of disorder. Disorder is a property of multifunctional proteins with multiple interaction partners. The results described here suggest that using contrasting mechanisms, P0, VPg, and CP-RT mediate adaptation to host plants and to vectors and are contributors to the broad host and vector range of poleroviruses. Profiling genetic variation across the polerovirus genome has practical applications in diagnostics, breeding for resistance, and identification of susceptibility genes and contributes to our understanding of virus interactions with their host, vectors, and environment.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants have been emerging and circulating globally since the start of the COVID-19 pandemic, of which B.1.617 lineage that was first reported in India at the end of 2020, soon became predominant. Tracing genomic variations and understanding their impact on the viral properties are the foundations for the vaccine and drug development and for the mitigation measures to be taken or lifted. In this study, 1,051 near-complete genomes and 1,559 spike (S) sequences belonging to the B.1.617 were analyzed. A genome-wide spread of single nucleotide polymorphisms (SNPs) was identified. Of the high frequency mutations identified, 61% (11/18) involved structural proteins, despite two third of the viral genome encoding nonstructural proteins. There were 22 positive selection sites, mostly distributed across the S protein, of which 16 were led by non-C to U transition and should be of a special attention. Haplotype network revealed that a large number of daughter haplotypes were continually derived throughout the pandemic, of which H177, H181 H219 and H286 from the ancestor haplotype H176 of B.1.617.2 were widely prevalent. Besides the well known substitutions of L452R, P681R and deletions of E156 and F157, as well as the potential biological significance, structural analysis in this study still indicated that new amino acid changes in B.1.617, such as E484Q and N501Y, had reshaped the viral bonding network, and increasingly sequenced N501Y mutant with a potential enhanced binding ability was detected in many other countries in the follow-up monitoring. Although we can't conclude the properties of all the mutants including N501Y thoroughly, it merits focusing on their spread epidemically and biologically.
Effective utilization of wild relatives is key to overcoming challenges in genetic improvement of cultivated tomato, which has a narrow genetic basis; however, current efforts to decipher high-quality genomes for tomato wild species are insufficient. Here, we report chromosome-scale tomato genomes from nine wild species and two cultivated accessions, representative of Solanum section Lycopersicon, the tomato clade. Together with two previously released genomes, we elucidate the phylogeny of Lycopersicon and construct a section-wide gene repertoire. We reveal the landscape of structural variants and provide entry to the genomic diversity among tomato wild relatives, enabling the discovery of a wild tomato gene with the potential to increase yields of modern cultivated tomatoes. Construction of a graph-based genome enables structural-variant-based genome-wide association studies, identifying numerous signals associated with tomato flavor-related traits and fruit metabolites. The tomato super-pangenome resources will expedite biological studies and breeding of this globally important crop.
Welcome to the FDI Lab - SciCrunch.org Resources search. From here you can search through a compilation of resources used by FDI Lab - SciCrunch.org and see how data is organized within our community.
You are currently on the Community Resources tab looking through categories and sources that FDI Lab - SciCrunch.org has compiled. You can navigate through those categories from here or change to a different tab to execute your search through. Each tab gives a different perspective on data.
If you have an account on FDI Lab - SciCrunch.org then you can log in from here to get additional features in FDI Lab - SciCrunch.org such as Collections, Saved Searches, and managing Resources.
Here is the search term that is being executed, you can type in anything you want to search for. Some tips to help searching:
You can save any searches you perform for quick access to later from here.
We recognized your search term and included synonyms and inferred terms along side your term to help get the data you are looking for.
If you are logged into FDI Lab - SciCrunch.org you can add data records to your collections to create custom spreadsheets across multiple sources of data.
Here are the facets that you can filter your papers by.
From here we'll present any options for the literature, such as exporting your current results.
If you have any further questions please check out our FAQs Page to ask questions and see our tutorials. Click this button to view this tutorial again.
Year:
Count: