L1 retrotransposon-derived sequences comprise approximately 17% of the human genome. Darwinian selective pressures alter L1 genomic distributions during evolution, confounding the ability to determine initial L1 integration preferences. Here, we generated high-confidence datasets of more than 88,000 engineered L1 insertions in human cell lines that act as proxies for cells that accommodate retrotransposition in vivo. Comparing these insertions to a null model, in which L1 endonuclease activity is the sole determinant of L1 integration preferences, demonstrated that L1 insertions are not significantly enriched in genes, transcribed regions, or open chromatin. In contrast, we provide compelling evidence that the L1 endonuclease disproportionately cleaves predominant lagging strand DNA replication templates, while lagging strand 3'-hydroxyl groups may prime endonuclease-independent L1 retrotransposition in a Fanconi anemia cell line. Thus, acquisition of an endonuclease domain, in conjunction with the ability to integrate into replicating DNA, allowed L1 to become an autonomous, interspersed retrotransposon.
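The abstract does not state which statistical test was used for the null-model comparison. As a minimal sketch only, assuming the null model can be summarized as the fraction of null-model sites that fall in a feature such as genes, one simple way to ask whether observed insertions are enriched is a binomial comparison with a normal approximation. The function names and the input numbers below are hypothetical and illustrative; they are not the authors' method or data.

```python
import math


def enrichment_z(observed_in_feature: int, total: int, null_fraction: float) -> tuple[float, float]:
    """Hypothetical helper: compare observed insertions in a feature with the
    count expected if insertions followed the null model.

    null_fraction is an assumed input: the fraction of null-model sites that
    fall in the feature. Returns (observed/expected ratio, z-score) using a
    normal approximation to the binomial distribution.
    """
    if total <= 0 or not 0.0 < null_fraction < 1.0:
        raise ValueError("total must be positive and null_fraction in (0, 1)")
    expected = total * null_fraction
    sd = math.sqrt(total * null_fraction * (1.0 - null_fraction))
    ratio = observed_in_feature / expected
    z = (observed_in_feature - expected) / sd
    return ratio, z


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Illustrative, made-up numbers (not from the study).
ratio, z = enrichment_z(observed_in_feature=39_800, total=88_000, null_fraction=0.45)
print(f"obs/exp = {ratio:.3f}, z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

A ratio near 1 with a small |z| is what "not significantly enriched" would look like under these assumptions; a real analysis would also need to account for how the null-model sites were generated.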
PubMed ID: 30955886
Collection of genome databases for vertebrates and other eukaryotic species, with DNA and protein sequence search capabilities. Used to automatically annotate genomes, integrate this annotation with other available biological data, and make the data publicly available via the web. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.
Portal to interactively visualize genomic data. Provides reference sequences and working draft assemblies for a collection of genomes, as well as access to the ENCODE and Neanderthal projects. Includes a collection of vertebrate and model organism assemblies and annotations, along with a suite of tools for viewing, analyzing and downloading data.
Software repository for R packages related to the analysis and comprehension of high-throughput genomic data. The project is based on the R programming language and uses its own set of commands for installing packages.
A powerful toolset for genome arithmetic that addresses common genomics tasks such as finding feature overlaps and computing coverage. Bedtools can intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), sophisticated analyses can be built by combining multiple bedtools operations on the UNIX command line.
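To illustrate what "intersecting two interval files" means, here is a minimal Python sketch of the core overlap logic, assuming BED-style half-open coordinates (0-based start, exclusive end). It is not bedtools code and does not reproduce its options or output format; the `Interval` type and `intersect` function are hypothetical. It reports the shared portion of each overlapping pair, which could be used, for example, to overlap insertion sites with gene annotations.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Hypothetical helper: return the overlapping portion of every pair of
    intervals from a and b that lie on the same chromosome and share >= 1 base."""
    # Group b by chromosome so each interval in a is only compared within its chromosome.
    by_chrom: dict[str, list[Interval]] = {}
    for iv in b:
        by_chrom.setdefault(iv.chrom, []).append(iv)

    out: list[Interval] = []
    for x in a:
        for y in by_chrom.get(x.chrom, []):
            start, end = max(x.start, y.start), min(x.end, y.end)
            # Half-open intervals overlap only if start < end;
            # book-ended intervals (x.end == y.start) do not overlap.
            if start < end:
                out.append(Interval(x.chrom, start, end))
    return out


a = [Interval("chr1", 100, 200), Interval("chr2", 0, 50)]
b = [Interval("chr1", 150, 300), Interval("chr1", 200, 250)]
print(intersect(a, b))
```

This naive pairwise comparison is quadratic per chromosome. It is fine for a sketch, but dedicated tools use more efficient strategies (indexing or sweeps over sorted input) for genome-scale files.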
Software for aligning sequencing reads against a large reference genome. Consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first is designed for sequence reads up to 100 bp, while the other two handle longer sequences ranging from 70 bp to 1 Mbp.
Software tool for transcriptome assembly and differential expression analysis of RNA-Seq data. Includes a script called cuffmerge that merges several Cufflinks assemblies. The script also runs Cuffcompare and automatically filters out transfrags that are likely to be artifacts. If a reference GTF file is available, it can be supplied to the script to better merge novel isoforms and maximize overall assembly quality.
This antibody (clonality unknown) targets BrdU (bromodeoxyuridine).