FDI Lab - SciCrunch.org | Searching in Literature

Integrative transcriptomic analysis suggests new autoregulatory splicing events coupled with nonsense-mediated mRNA decay.

Dmitri Pervouchine et al.
Nucleic acids research
2019

Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTC). Many RNA-binding proteins (RBP) regulate their expression levels by a negative feedback loop, in which an RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. We present a bioinformatic analysis integrating three data sources (eCLIP assays for a large RBP panel, shRNA inactivation of the NMD pathway, and shRNA depletion of RBPs followed by RNA-seq) to identify novel autoregulatory feedback loops of this kind. We show that RBPs frequently bind their own pre-mRNAs, that their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in SRSF7 and U2AF1 genes and present two novel models, in which (i) SFPQ binds its mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (ii) RPS3 binding activates a poison 5'-splice site in its pre-mRNA that leads to a frame shift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in RBM39, HNRNPM, and U2AF2 genes. The results are available through a UCSC Genome Browser track hub.

Current status and new features of the Consensus Coding Sequence database.

Catherine M Farrell et al.
Nucleic acids research
2014

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes.

Iakes Ezkurdia et al.
Human molecular genetics
2014

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

ENCODE Project Consortium et al.
Nature
2007

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

GENCODE 2021.

Adam Frankish et al.
Nucleic acids research
2021

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Standardized annotation of translated open reading frames.

Jonathan M Mudge et al.
Nature biotechnology
2022

No abstract available

GENCODE reference annotation for the human and mouse genomes.

Adam Frankish et al.
Nucleic acids research
2019

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.

Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction.

Adam Frankish et al.
BMC genomics
2015

A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation, and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based.

High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.

Julien Lagarde et al.
Nature genetics
2017

Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.

Genome-wide association study: Exploring the genetic basis for responsiveness to ketogenic dietary therapies for drug-resistant epilepsy.

Natasha E Schoeler et al.
Epilepsia
2018

With the exception of specific metabolic disorders, predictors of response to ketogenic dietary therapies (KDTs) are unknown. We aimed to determine whether common variation across the genome influences the response to KDT for epilepsy.

The GENCODE pseudogene resource.

Baikang Pei et al.
Genome biology
2012

Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data.

Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates.

Zhengdong D Zhang et al.
Genome biology
2010

Unitary pseudogenes are a class of unprocessed pseudogenes without functioning counterparts in the genome. They constitute only a small fraction of annotated pseudogenes in the human genome. However, as they represent distinct functional losses over time, they shed light on the unique features of humans in primate evolution.

Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment.

Vladimir B Bajic et al.
Genome biology
2006

This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends.

Progress, Challenges, and Surprises in Annotating the Human Genome.

Daniel R Zerbino et al.
Annual review of genomics and human genetics
2020

Our understanding of the human genome has continuously expanded since its draft publication in 2001. Over the years, novel assays have allowed us to progressively overlay layers of knowledge above the raw sequence of A's, T's, G's, and C's. The reference human genome sequence is now a complex knowledge base maintained under the shared stewardship of multiple specialist communities. Its complexity stems from the fact that it is simultaneously a template for transcription, a record of evolution, a vehicle for genetics, and a functional molecule. In short, the human genome serves as a frame of reference at the intersection of a diversity of scientific fields. In recent years, the progressive fall in sequencing costs has given increasing importance to the quality of the human reference genome, as hundreds of thousands of individuals are being sequenced yearly, often for clinical applications. Also, novel sequencing-based assays shed light on novel functions of the genome, especially with respect to gene expression regulation. Keeping the human genome annotation up to date and accurate is therefore an ongoing partnership between reference annotation projects and the greater community worldwide.

Cell type-specific novel long non-coding RNA and circular RNA in the BLUEPRINT hematopoietic transcriptomes atlas.

Luigi Grassi et al.
Haematologica
2021

Transcriptional profiling of hematopoietic cell subpopulations has helped to characterize the developmental stages of the hematopoietic system and the molecular bases of malignant and non-malignant blood diseases. Previously, only the genes targeted by expression microarrays could be profiled genome-wide. High-throughput RNA sequencing, however, encompasses a broader repertoire of RNA molecules, without restriction to previously annotated genes. We analyzed the BLUEPRINT consortium RNA-sequencing data for mature hematopoietic cell types. The data comprised 90 total RNA-sequencing samples, each composed of one of 27 cell types, and 32 small RNA-sequencing samples, each composed of one of 11 cell types. We estimated gene and isoform expression levels for each cell type using existing annotations from Ensembl. We then used guided transcriptome assembly to discover unannotated transcripts. We identified hundreds of novel non-coding RNA genes and showed that the majority have cell type-dependent expression. We also characterized the expression of circular RNA and found that these are also cell type-specific. These analyses refine the active transcriptional landscape of mature hematopoietic cells, highlight abundant genes and transcriptional isoforms for each blood cell type, and provide a valuable resource for researchers of hematologic development and diseases. Finally, we made the data accessible via a web-based interface: https://blueprint.haem.cam.ac.uk/bloodatlas/.

Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome.

Cédric Howald et al.
Genome research
2012

Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions were confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient for screening gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more so than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail to sample low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.

The importance of identifying alternative splicing in vertebrate genome annotation.

Adam Frankish et al.
Database: the journal of biological databases and curation
2012

While alternative splicing (AS) can potentially expand the functional repertoire of vertebrate genomes, relatively few AS transcripts have been experimentally characterized. We describe our detailed manual annotation of vertebrate genomes, which is generating a publicly available geneset rich in AS. In order to achieve this, we have adopted a highly sensitive approach to annotating gene models supported by correctly mapped, canonically spliced transcriptional evidence combined with a highly cautious approach to adding unsupported extensions to models and making decisions on their functional potential. We use information about the predicted functional potential and structural properties of every AS transcript annotated at a protein-coding or non-coding locus to place them into one of eleven subclasses. We describe the incorporation of new sequencing and proteomics technologies into our annotation pipelines, which are used to identify and validate AS. Combining all data sources has led to the production of a rich geneset containing an average of 6.3 AS transcripts for every human multi-exon protein-coding gene. The datasets produced have proved very useful in providing context to studies investigating the functional potential of genes and the effect variation may have on gene structure and function. DATABASE URL: http://www.ensembl.org/index.html, http://vega.sanger.ac.uk/index.html.

The Vertebrate Genome Annotation browser 10 years on.

Jennifer L Harrow et al.
Nucleic acids research
2014

The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).

Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene.

Mar Gonzàlez-Porta et al.
Genome biology
2013

RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering the relative abundances of different transcripts from the same gene.

Ensembl 2018.

Daniel R Zerbino et al.
Nucleic acids research
2018

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
