FDI Lab - SciCrunch.org | Searching in Literature

The CHEMDNER corpus of chemicals and drugs and its annotation principles.

Martin Krallinger et al.
Journal of cheminformatics
2015

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.
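
The abstract above reports a percentage agreement between annotators but does not say how it was computed. The following is a minimal sketch of one common way to compute such a figure, assuming that two mentions agree only when document, character offsets and SACEM class all match, and that agreement is the shared mentions divided by all distinct mentions. The `Mention` record, the `percentage_agreement` function, the document identifier and the offsets are hypothetical illustrations, not part of the CHEMDNER release.

```python
from enum import Enum
from typing import NamedTuple


class Sacem(Enum):
    """The seven SACEM classes named in the abstract."""
    ABBREVIATION = "abbreviation"
    FAMILY = "family"
    FORMULA = "formula"
    IDENTIFIER = "identifier"
    MULTIPLE = "multiple"
    SYSTEMATIC = "systematic"
    TRIVIAL = "trivial"


class Mention(NamedTuple):
    """Hypothetical record for one annotated chemical mention."""
    doc_id: str   # abstract identifier (made up below)
    start: int    # character offset where the mention begins
    end: int      # character offset where the mention ends
    sacem: Sacem  # assigned SACEM class


def percentage_agreement(a: set[Mention], b: set[Mention]) -> float:
    """Shared mentions as a percentage of all distinct mentions.

    Assumption: exact match on document, offsets and class. The study
    described in the abstract may have used a different definition.
    """
    union = a | b
    if not union:
        return 100.0  # two empty annotation sets trivially agree
    return 100.0 * len(a & b) / len(union)


# Made-up annotations from two annotators for one abstract.
annotator_1 = {
    Mention("doc-1", 0, 7, Sacem.TRIVIAL),
    Mention("doc-1", 20, 24, Sacem.FORMULA),
    Mention("doc-1", 40, 52, Sacem.SYSTEMATIC),
}
annotator_2 = {
    Mention("doc-1", 0, 7, Sacem.TRIVIAL),
    Mention("doc-1", 20, 24, Sacem.FORMULA),
    Mention("doc-1", 60, 63, Sacem.ABBREVIATION),
}

agreement = percentage_agreement(annotator_1, annotator_2)
print(f"{agreement:.1f}%")
```

Here two of the four distinct mentions are shared, so the sketch reports 50.0%. Relaxing the match to offsets only, ignoring the class, would give a more lenient figure.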

Reviewer-coerced citation: case report, update on journal policy and suggestions for future prevention.

Jonathan D Wren et al.
Bioinformatics (Oxford, England)
2019

No abstract available

The BLUEPRINT Data Analysis Portal.

José María Fernández et al.
Cell systems
2016

The impact of large and complex epigenomic datasets on biological insights or clinical applications is limited by the lack of accessibility by easy, intuitive, and fast tools. Here, we describe an epigenomics comparative cyber-infrastructure (EPICO), an open-access reference set of libraries to develop comparative epigenomic data portals. Using EPICO, large epigenome projects can make available their rich datasets to the community without requiring specific technical skills. As a first instance of EPICO, we implemented the BLUEPRINT Data Analysis Portal (BDAP). BDAP provides a desktop for the comparative analysis of epigenomes of hematopoietic cell types based on results, such as the position of epigenetic features, from basic analysis pipelines. The BDAP interface facilitates interactive exploration of genomic regions, genes, and pathways in the context of differentiation of hematopoietic lineages. This work represents initial steps toward broadly accessible integrative analysis of epigenomic data across international consortia. EPICO can be accessed at https://github.com/inab, and BDAP can be accessed at http://blueprint-data.bsc.es.

Transcriptome characterization by RNA sequencing identifies a major molecular and clinical subdivision in chronic lymphocytic leukemia.

Pedro G Ferreira et al.
Genome research
2014

Chronic lymphocytic leukemia (CLL) has heterogeneous clinical and biological behavior. Whole-genome and -exome sequencing has contributed to the characterization of the mutational spectrum of the disease, but the underlying transcriptional profile is still poorly understood. We have performed deep RNA sequencing in different subpopulations of normal B-lymphocytes and CLL cells from a cohort of 98 patients, and characterized the CLL transcriptional landscape with unprecedented resolution. We detected thousands of transcriptional elements differentially expressed between the CLL and normal B cells, including protein-coding genes, noncoding RNAs, and pseudogenes. Transposable elements are globally derepressed in CLL cells. In addition, two thousand genes, most of which are not differentially expressed, exhibit CLL-specific splicing patterns. Genes involved in metabolic pathways showed higher expression in CLL, while genes related to spliceosome, proteasome, and ribosome were among the most down-regulated in CLL. Clustering of the CLL samples according to RNA-seq derived gene expression levels unveiled two robust molecular subgroups, C1 and C2. C1/C2 subgroups and the mutational status of the immunoglobulin heavy variable (IGHV) region were the only independent variables in predicting time to treatment in a multivariate analysis with main clinico-biological features. This subdivision was validated in an independent cohort of patients monitored through DNA microarrays. Further analysis shows that B-cell receptor (BCR) activation in the microenvironment of the lymph node may be at the origin of the C1/C2 differences.

Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes.

Iakes Ezkurdia et al.
Human molecular genetics
2014

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.

Getting personalized cancer genome analysis into the clinic: the challenges in bioinformatics.

Alfonso Valencia et al.
Genome medicine
2012

Progress in genomics has raised expectations in many fields, and particularly in personalized cancer research. The new technologies available make it possible to combine information about potential disease markers, altered function and accessible drug targets, which, coupled with pathological and medical information, will help produce more appropriate clinical decisions. The accessibility of such experimental techniques makes it all the more necessary to improve and adapt computational strategies to the new challenges. This review focuses on the critical issues associated with the standard pipeline, which includes: DNA sequencing analysis; analysis of mutations in coding regions; the study of genome rearrangements; extrapolating information on mutations to the functional and signaling level; and predicting the effects of therapies using mouse tumor models. We describe the possibilities, limitations and future challenges of current bioinformatics strategies for each of these issues. Furthermore, we emphasize the need for the collaboration between the bioinformaticians who implement the software and use the data resources, the computational biologists who develop the analytical methods, and the clinicians, the systems' end users and those ultimately responsible for taking medical decisions. Finally, the different steps in cancer genome analysis are illustrated through examples of applications in cancer genome analysis.

Genome-wide analysis of Pax8 binding provides new insights into thyroid functions.

Sergio Ruiz-Llorente et al.
BMC genomics
2012

The transcription factor Pax8 is essential for the differentiation of thyroid cells. However, there are few data on genes transcriptionally regulated by Pax8 other than thyroid-related genes. To better understand the role of Pax8 in the biology of thyroid cells, we obtained transcriptional profiles of Pax8-silenced PCCl3 thyroid cells using whole genome expression arrays and integrated these signals with global cis-regulatory sequencing studies performed by ChIP-Seq analysis.

Evaluation of BioCreAtIvE assessment of task 2.

Christian Blaschke et al.
BMC bioinformatics
2005

Molecular Biology accumulated substantial amounts of data concerning functions of genes and proteins. Information relating to functional descriptions is generally extracted manually from textual data and stored in biological databases to build up annotations for large collections of gene products. Those annotation databases are crucial for the interpretation of large scale analysis approaches using bioinformatics or experimental techniques. Due to the growing accumulation of functional descriptions in biomedical literature the need for text mining tools to facilitate the extraction of such annotations is urgent. In order to make text mining tools useable in real world scenarios, for instance to assist database curators during annotation of protein function, comparisons and evaluations of different approaches on full text articles are needed.

A sentence sliding window approach to extract protein annotations from biomedical articles.

Martin Krallinger et al.
BMC bioinformatics
2005

Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need for comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations.

The success (or not) of HUGO nomenclature.

Javier Tamames et al.
Genome biology
2006

Current usage of gene nomenclature is ambiguous and impairs the efficient handling of scientific information. Therefore it is important to propose guidelines to deal with this problem. This study attempts to evaluate the success of HUGO nomenclature for human genes. The results indicate that HUGO guidelines are not supported by the scientific community.

ACRATA: a novel electron transfer domain associated to apoptosis and cancer.

Luis Sanchez-Pulido et al.
BMC cancer
2004

Recently, several members of a vertebrate protein family containing a six trans-membrane (6TM) domain and involved in apoptosis and cancer (e.g. STEAP, STAMP1, TSAP6), have been identified in Golgi and cytoplasmic membranes. The exact function of these proteins remains unknown.

From cancer genomes to cancer models: bridging the gaps.

Anaïs Baudot et al.
EMBO reports
2009

Cancer genome projects are now being expanded in an attempt to provide complete landscapes of the mutations that exist in tumours. Although the importance of cataloguing genome variations is well recognized, there are obvious difficulties in bridging the gaps between high-throughput resequencing information and the molecular mechanisms of cancer evolution. Here, we describe the current status of the high-throughput genomic technologies, and the current limitations of the associated computational analysis and experimental validation of cancer genetic variants. We emphasize how the current cancer-evolution models will be influenced by the high-throughput approaches, in particular through efforts devoted to monitoring tumour progression, and how, in turn, the integration of data and models will be translated into mechanistic knowledge and clinical applications.

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

ENCODE Project Consortium et al.
Nature
2007

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

iHOP web services.

José M Fernández et al.
Nucleic acids research
2007

iHOP provides fast, accurate, comprehensive, and up-to-date summary information on more than 80,000 biological molecules by automatically extracting key sentences from millions of PubMed documents. Its intuitive user interface and navigation scheme have made iHOP extremely successful among biologists, counting more than 500,000 visits per month (iHOP access statistics: http://www.ihop-net.org/UniPub/iHOP/info/logs/). Here we describe a public programmatic API that enables the integration of main iHOP functionalities in bioinformatic programs and workflows.

Overview of the protein-protein interaction annotation extraction task of BioCreative II.

Martin Krallinger et al.
Genome biology
2008

The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison between the different techniques as well as with manual curation was missing.

Identifying ELIXIR Core Data Resources.

Christine Durinx et al.
F1000Research
2016

The core mission of ELIXIR is to build a stable and sustainable infrastructure for biological information across Europe. At the heart of this are the data resources, tools and services that ELIXIR offers to the life-sciences community, providing stable and sustainable access to biological data. ELIXIR aims to ensure that these resources are available long-term and that the life-cycles of these resources are managed such that they support the scientific needs of the life-sciences, including biological research. ELIXIR Core Data Resources are defined as a set of European data resources that are of fundamental importance to the wider life-science community and the long-term preservation of biological data. They are complete collections of generic value to life-science, are considered an authority in their field with respect to one or more characteristics, and show high levels of scientific quality and service. Thus, ELIXIR Core Data Resources are of wide applicability and usage. This paper describes the structures, governance and processes that support the identification and evaluation of ELIXIR Core Data Resources. It identifies key indicators which reflect the essence of the definition of an ELIXIR Core Data Resource and support the promotion of excellence in resource development and operation. It describes the specific indicators in more detail and explains their application within ELIXIR's sustainability strategy and science policy actions, and in capacity building, life-cycle management and technical actions. The identification process is currently being implemented and tested for the first time. The findings and outcome will be evaluated by the ELIXIR Scientific Advisory Board in March 2017. Establishing the portfolio of ELIXIR Core Data Resources and ELIXIR Services is a key priority for ELIXIR and publicly marks the transition towards a cohesive infrastructure.

LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes.

Andres Cañada et al.
Nucleic acids research
2017

Considerable effort has been devoted to systematically retrieving information on genes and proteins as well as the relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes, CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es.

Mortality in Persons With Autism Spectrum Disorder or Attention-Deficit/Hyperactivity Disorder: A Systematic Review and Meta-analysis.

Ferrán Catalá-López et al.
JAMA pediatrics
2022

Autism spectrum disorder (ASD) and attention-deficit/hyperactivity disorder (ADHD) are childhood-onset disorders that may persist into adulthood. Several studies have suggested that they may be associated with an increased risk of mortality; however, the results are inconsistent.

Identification of Plitidepsin as Potent Inhibitor of SARS-CoV-2-Induced Cytopathic Effect After a Drug Repurposing Screen.

Jordi Rodon et al.
Frontiers in pharmacology
2021

There is an urgent need to identify therapeutics for the treatment of Coronavirus disease 2019 (COVID-19). Although different antivirals are given for the clinical management of SARS-CoV-2 infection, their efficacy is still under evaluation. Here, we have screened existing drugs approved for human use in a variety of diseases, to compare how they counteract SARS-CoV-2-induced cytopathic effect and viral replication in vitro. Among the potential 72 antivirals tested herein that were previously proposed to inhibit SARS-CoV-2 infection, only 18% had an IC50 below 25 µM or 10² IU/ml. These included plitidepsin, novel cathepsin inhibitors, nelfinavir mesylate hydrate, interferon 2-alpha, interferon-gamma, fenofibrate, camostat along with the well-known remdesivir and chloroquine derivatives. Plitidepsin was the only clinically approved drug displaying nanomolar efficacy. Four of these families, including novel cathepsin inhibitors, blocked viral entry in a cell-type specific manner. Since the most effective antivirals usually combine therapies that tackle the virus at different steps of infection, we also assessed several drug combinations. Although no particular synergy was found, inhibitory combinations did not reduce their antiviral activity. Thus, these combinations could decrease the potential emergence of resistant viruses. Antivirals prioritized herein identify novel compounds and their mode of action, while independently replicating the activity of a reduced proportion of drugs which are mostly approved for clinical use. Combinations of these drugs should be tested in animal models to inform the design of fast track clinical trials.

Detection of SARS-CoV-2 in a cat owned by a COVID-19-affected patient in Spain.

Joaquim Segalés et al.
Proceedings of the National Academy of Sciences of the United States of America
2020

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiological agent of COVID-19, is considered a zoonotic pathogen mainly transmitted human to human. Few reports indicate that pets may be exposed to the virus. The present report describes a cat suffering from severe respiratory distress and thrombocytopenia living with a family with several members affected by COVID-19. Clinical signs of the cat prompted humane euthanasia and a detailed postmortem investigation to assess whether a COVID-19-like disease was causing the condition. Necropsy results showed the animal suffered from feline hypertrophic cardiomyopathy and severe pulmonary edema and thrombosis. SARS-CoV-2 RNA was only detected in nasal swab, nasal turbinates, and mesenteric lymph node, but no evidence of histopathological lesions compatible with a viral infection was detected. The cat seroconverted against SARS-CoV-2, further evidencing a productive infection in this animal. We conclude that the animal had a subclinical SARS-CoV-2 infection concomitant to an unrelated cardiomyopathy that led to euthanasia.
