FDI Lab - SciCrunch.org | Searching in Literature

Ensembl 2011.

Paul Flicek et al.
Nucleic acids research
2011

The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.

Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis.

Rami A Dalloul et al.
PLoS biology
2010

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.

Current status and new features of the Consensus Coding Sequence database.

Catherine M Farrell et al.
Nucleic acids research
2014

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes.

Iakes Ezkurdia et al.
Human molecular genetics
2014

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.

Ensembl 2015.

Fiona Cunningham et al.
Nucleic acids research
2015

Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.

Ensembl 2017.

Bronwen L Aken et al.
Nucleic acids research
2017

Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.

Ensembl 2016.

Andrew Yates et al.
Nucleic acids research
2016

The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

ENCODE Project Consortium et al.
Nature
2007

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

Integrative transcriptomic analysis suggests new autoregulatory splicing events coupled with nonsense-mediated mRNA decay.

Dmitri Pervouchine et al.
Nucleic acids research
2019

Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTC). Many RNA-binding proteins (RBP) regulate their expression levels by a negative feedback loop, in which RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. We present a bioinformatic analysis integrating three data sources, eCLIP assays for a large RBP panel, shRNA inactivation of NMD pathway, and shRNA-depletion of RBPs followed by RNA-seq, to identify novel such autoregulatory feedback loops. We show that RBPs frequently bind their own pre-mRNAs, their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in SRSF7 and U2AF1 genes and present two novel models, in which (i) SFPQ binds its mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (ii) RPS3 binding activates a poison 5'-splice site in its pre-mRNA that leads to a frame shift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in RBM39, HNRNPM, and U2AF2 genes. The results are available through a UCSC Genome Browser track hub.

GENCODE 2021.

Adam Frankish et al.
Nucleic acids research
2021

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Ensembl Genomes 2022: an expanding genome resource for non-vertebrates.

Andrew D Yates et al.
Nucleic acids research
2022

Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception creating one of the world's most comprehensive genomic resources and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.

Standardized annotation of translated open reading frames.

Jonathan M Mudge et al.
Nature biotechnology
2022

No abstract available

Venetoclax-based low intensity therapy in molecular failure of NPM1-mutated AML.

Carlos Jimenez-Chillon et al.
Blood advances
2024

Molecular failure in NPM1-mutated acute myeloid leukemia (AML) inevitably progresses to frank relapse if untreated. Recently published small case series show that venetoclax combined with low-dose cytarabine or azacitidine can reduce or eliminate measurable residual disease (MRD). Here, we report on an international multicenter cohort of 79 patients treated for molecular failure with venetoclax combinations and report an overall molecular response (≥1-log reduction in MRD) in 66 patients (84%) and MRD negativity in 56 (71%). Eighteen of 79 patients (23%) required hospitalization, and no deaths were reported during treatment. Forty-one patients were bridged to allogeneic transplant with no further therapy, and 25 of 41 were MRD negative assessed by reverse transcription quantitative polymerase chain reaction before transplant. Overall survival (OS) for the whole cohort at 2 years was 67%, event-free survival (EFS) was 45%, and in responding patients, there was no difference in survival in those who received a transplant using time-dependent analysis. Presence of FLT3-ITD mutation was associated with a lower response rate (64 vs 91%; P < .01), worse OS (hazard ratio [HR], 2.50; 95% confidence interval [CI], 1.06-5.86; P = .036), and EFS (HR, 1.87; 95% CI, 1.06-3.28; P = .03). Eighteen of 35 patients who did not undergo transplant became MRD negative and stopped treatment after a median of 10 months, with 2-year molecular relapse free survival of 62% from the end of treatment. Venetoclax-based low intensive chemotherapy is a potentially effective treatment for molecular relapse in NPM1-mutated AML, either as a bridge to transplant or as definitive therapy.

Ensembl 2012.

Paul Flicek et al.
Nucleic acids research
2012

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.

Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates.

Zhengdong D Zhang et al.
Genome biology
2010

Unitary pseudogenes are a class of unprocessed pseudogenes without functioning counterparts in the genome. They constitute only a small fraction of annotated pseudogenes in the human genome. However, as they represent distinct functional losses over time, they shed light on the unique features of humans in primate evolution.

Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction.

Adam Frankish et al.
BMC genomics
2015

A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation, and a wide range of tools is available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based.

The GENCODE pseudogene resource.

Baikang Pei et al.
Genome biology
2012

Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data.

Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment.

Vladimir B Bajic et al.
Genome biology
2006

This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends.

GENCODE reference annotation for the human and mouse genomes.

Adam Frankish et al.
Nucleic acids research
2019

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.

Progress, Challenges, and Surprises in Annotating the Human Genome.

Daniel R Zerbino et al.
Annual review of genomics and human genetics
2020

Our understanding of the human genome has continuously expanded since its draft publication in 2001. Over the years, novel assays have allowed us to progressively overlay layers of knowledge above the raw sequence of A's, T's, G's, and C's. The reference human genome sequence is now a complex knowledge base maintained under the shared stewardship of multiple specialist communities. Its complexity stems from the fact that it is simultaneously a template for transcription, a record of evolution, a vehicle for genetics, and a functional molecule. In short, the human genome serves as a frame of reference at the intersection of a diversity of scientific fields. In recent years, the progressive fall in sequencing costs has given increasing importance to the quality of the human reference genome, as hundreds of thousands of individuals are being sequenced yearly, often for clinical applications. Also, novel sequencing-based assays shed light on novel functions of the genome, especially with respect to gene expression regulation. Keeping the human genome annotation up to date and accurate is therefore an ongoing partnership between reference annotation projects and the greater community worldwide.
