A hominoid-specific endogenous retrovirus may have rewired the gene regulatory network shared between primordial germ cells and naïve pluripotent cells.

PLoS genetics | 2022

Mammalian germ cells stem from primordial germ cells (PGCs). Although the gene regulatory network controlling the development of germ cells such as PGCs is critical for ensuring gamete integrity, substantial differences exist in this network among mammalian species, suggesting that this network has been modified during mammalian evolution. Here, we show that a hominoid-specific group of endogenous retroviruses, LTR5_Hs, exhibits enhancer-like signatures in human PGC-like cells (PGCLCs), which are PGCs induced in vitro. Human PGCLCs exhibit a transcriptome signature similar to that of naïve-state pluripotent cells. LTR5_Hs are epigenetically activated in both PGCLCs and naïve pluripotent cells, and the expression of genes in the vicinity of LTR5_Hs is coordinately upregulated in these cell types, contributing to the establishment of the transcriptome similarity between these cell types. LTR5_Hs are preferentially bound by transcription factors that are highly expressed in both PGCLCs and naïve pluripotent cells (KLF4, TFAP2C, NANOG, and CBFA2T2), suggesting that these transcription factors contribute to the epigenetic activation of LTR5_Hs in these cells. Comparative transcriptome analysis between humans and macaques suggests that the expression of many genes in PGCLCs and naïve pluripotent cells is upregulated by LTR5_Hs insertions in the hominoid lineage. Together, this study suggests that LTR5_Hs insertions may have fine-tuned the gene regulatory network shared between PGCLCs and naïve pluripotent cells and coordinately altered the gene expression in these cells during hominoid evolution.

PubMed ID: 35551519

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


ChIP-seq (tool)

RRID:SCR_001237

Set of software modules for performing common ChIP-seq data analysis tasks across the whole genome, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions. The tools are designed to be simple, fast and highly modular. Each program carries out a well defined data processing procedure that can potentially fit into a pipeline framework. ChIP-Seq is also freely available on a Web interface.

SAMTOOLS (tool)

RRID:SCR_002105

The original SAMTOOLS package has been split into three separate repositories: Samtools, BCFtools and HTSlib. Samtools manipulates next-generation sequencing data and is used for reading, writing, editing, indexing, and viewing nucleotide alignments in SAM, BAM, and CRAM formats. BCFtools is used for reading and writing BCF2, VCF, and gVCF files and for calling, filtering, and summarising SNP and short indel sequence variants. HTSlib is used for reading and writing high-throughput sequencing data.

Ensembl (tool)

RRID:SCR_002344

Collection of genome databases for vertebrates and other eukaryotic species with DNA and protein sequence search capabilities. Used to automatically annotate genome, integrate this annotation with other available biological data and make data publicly available via web. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.

STAR (tool)

RRID:SCR_004463

Software performing alignment of high-throughput RNA-seq data. Aligns RNA-seq reads to reference genome using uncompressed suffix arrays.

STRING (tool)

RRID:SCR_005223

Database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations and are derived from four sources: Genomic Context, High-throughput experiments, (Conserved) Coexpression, and previous knowledge. STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. The database currently covers 5,214,234 proteins from 1133 organisms (2013).

RAxML (tool)

RRID:SCR_006086

Software program for phylogenetic analyses of large datasets under maximum likelihood.

Picard (tool)

RRID:SCR_006525

Java toolset for working with next generation sequencing data in the BAM format.

InterPro (tool)

RRID:SCR_006695

Service providing functional analysis of proteins by classifying them into families and predicting domains and important sites. It combines protein signatures from a number of member databases into a single searchable resource, capitalizing on their individual strengths to produce a powerful integrated database and diagnostic tool. This integrated database of predictive protein signatures is used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures. The data can be accessed programmatically via Web Services. The member databases use a number of approaches: ProDom provides sequence clusters built from UniProtKB using PSI-BLAST; PROSITE patterns provide simple regular expressions; PROSITE and HAMAP profiles provide sequence matrices; PRINTS provides fingerprints, which are groups of aligned, unweighted Position Specific Sequence Matrices (PSSMs); and PANTHER, PIRSF, Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY provide hidden Markov models (HMMs). Contributions are welcome: users are encouraged to use the "Add your annotation" button on InterPro entry pages to suggest updated or improved annotation for individual InterPro entries.

PeproTech (tool)

RRID:SCR_006802

An antibody supplier.

MAFFT (tool)

RRID:SCR_011811

Software package for multiple alignment of amino acid or nucleotide sequences. Can align up to 500 sequences or a maximum file size of 1 MB. The first version of MAFFT used an algorithm based on progressive alignment, in which sequences were clustered with the help of the Fast Fourier Transform. Subsequent versions have added other algorithms and modes of operation, including options for faster alignment of large numbers of sequences, higher accuracy alignments, alignment of non-coding RNA sequences, and addition of new sequences to existing alignments.

Trimmomatic (tool)

RRID:SCR_011848

Java software pipeline for trimming Illumina paired-end and single-end data. A flexible, pair-aware preprocessing tool optimized for Illumina next-generation sequencing data. Includes several processing steps for read trimming and filtering. Runs on Unix/Linux, Mac OS, and Windows.

featureCounts (tool)

RRID:SCR_012919

A read summarization program, which counts mapped reads for the genomic features such as genes and exons.
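
As a rough illustration of what read summarization means, the sketch below counts reads that overlap genomic features. It is not featureCounts itself: the function name `count_reads`, the tuple layouts, the half-open coordinates, and the choice to skip reads that overlap more than one feature are all assumptions made for this example.

```python
from collections import Counter


def count_reads(features, reads):
    """Count reads per feature by coordinate overlap (illustrative only).

    features: list of (name, chrom, start, end) tuples.
    reads:    list of (chrom, start, end) tuples.
    All intervals are treated as half-open [start, end).
    """
    # Start every feature at zero so features without reads still appear.
    counts = Counter({name: 0 for name, _chrom, _start, _end in features})
    for r_chrom, r_start, r_end in reads:
        hits = [
            name
            for name, chrom, start, end in features
            if chrom == r_chrom and r_start < end and start < r_end
        ]
        # Assumption for this sketch: only unambiguous reads are counted.
        if len(hits) == 1:
            counts[hits[0]] += 1
    return dict(counts)


example_features = [("geneA", "chr1", 0, 100), ("geneB", "chr1", 50, 150)]
example_reads = [
    ("chr1", 10, 40),    # overlaps geneA only
    ("chr1", 60, 90),    # overlaps both features, so it is skipped
    ("chr1", 120, 140),  # overlaps geneB only
    ("chr2", 10, 20),    # overlaps nothing
]
print(count_reads(example_features, example_reads))
```

A real tool works from alignment files and an annotation file, and uses indexed lookups rather than the linear scan shown here.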

RepeatMasker (tool)

RRID:SCR_012954

Software tool that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).
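
The masking step described above can be pictured with a small sketch. This is not RepeatMasker code and does not detect repeats; it only shows how already-annotated intervals might be replaced by Ns. The function name `mask_repeats` and the 0-based, half-open interval convention are assumptions made for this example.

```python
def mask_repeats(sequence, repeats, mask_char="N"):
    """Return a copy of sequence with each annotated interval masked.

    repeats: iterable of (start, end) pairs, 0-based and half-open.
    Intervals extending past the sequence ends are clipped.
    """
    chars = list(sequence)
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


print(mask_repeats("ACGTACGTAC", [(2, 5)]))  # prints ACNNNCGTAC
```

The masked sequence keeps its original length, so coordinates in downstream analyses remain valid.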

ggplot2 (tool)

RRID:SCR_014601

Open source software package for the statistical programming language R that creates plots based on the grammar of graphics. Used for data visualization, breaking graphs up into semantic components such as scales and layers.

GENCODE (tool)

RRID:SCR_014966

Human and mouse genome annotation project which aims to identify all gene features in the human genome using computational analysis, manual annotation, and experimental validation.

DESeq2 (tool)

RRID:SCR_015687

Software package for differential gene expression analysis based on the negative binomial distribution. Used for analyzing RNA-seq data for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates.

Monocle2 (tool)

RRID:SCR_016339

Software package for analyzing single cell gene expression, classifying and counting cells, performing differential expression analysis between subpopulations of cells, and reconstructing cellular trajectories. Works well with very large single-cell RNA-Seq experiments containing tens of thousands of cells or more. Used in computational analysis of gene expression data in single cell gene expression studies to profile transcriptional regulation in complex biological processes and highly heterogeneous cell populations.

ComplexHeatmap (tool)

RRID:SCR_017270

Software package to arrange multiple heatmaps and support various annotation graphics. Used to visualize associations between different sources of data sets and to reveal potential patterns.
