A semantically annotated corpus of 240 MEDLINE abstracts (167 on E. coli and 73 on human), intended for training information extraction (IE) systems and resources that extract events from biomedical literature.
The corpus has been manually annotated by biologists with events relating to gene regulation. Each event is centered on either a verb (e.g. transcribe) or a nominalized verb (e.g. transcription). Annotation consists of identifying, as exhaustively as possible, the structurally related arguments of that verb or nominalized verb within the same sentence. Each event argument is then assigned the following information:
* A semantic role from a fixed set of 13 roles which are tailored to the biomedical domain.
* A biomedical concept type (where appropriate).
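The annotation scheme described above can be pictured as a small data structure. The following is only an illustrative sketch: the class names, the role labels ("Theme", "Condition") and the concept type ("Operon") are assumptions chosen for the example and are not taken from the corpus documentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    """One argument of an event, found in the same sentence as the trigger."""
    text: str
    role: str                           # one of the 13 semantic roles (labels here are placeholders)
    concept_type: Optional[str] = None  # biomedical concept type, only where appropriate


@dataclass
class Event:
    """An event centered on a verb or nominalized verb."""
    trigger: str
    arguments: List[Argument] = field(default_factory=list)

    def by_role(self, role: str) -> List[Argument]:
        """Return all arguments carrying the given semantic role."""
        return [a for a in self.arguments if a.role == role]


# Hypothetical event built by hand for illustration; not copied from the corpus.
event = Event(
    trigger="transcription",
    arguments=[
        Argument("the lac operon", role="Theme", concept_type="Operon"),
        Argument("in vitro", role="Condition"),  # no concept type assigned
    ],
)
```

Keeping the concept type optional mirrors the "where appropriate" wording: every argument has a role, but only some have a concept type.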
The corpus is available for download in two formats:
* A standoff format, based on the BioNLP'09 Shared Task format
* An XML format, based on the GENIA event annotation format
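For readers planning to load the standoff files, here is a minimal sketch of a reader for BioNLP'09 Shared Task style annotation lines, where `T` lines are tab-separated text-bound annotations and `E` lines are events written as `Type:TriggerID Role:ArgID ...`. Since this corpus's standoff format is only described as based on that format, its actual files may differ. The function name `parse_standoff`, the sample lines, and the type and role labels are all assumptions made for illustration.

```python
def parse_standoff(lines):
    """Parse BioNLP'09-style standoff lines into entity and event dicts.

    Only 'T' (text-bound) and 'E' (event) lines are handled; any other
    line type is skipped in this sketch.
    """
    entities, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("\t")
        ann_id = parts[0]
        if ann_id.startswith("T"):
            # Example: T1<TAB>Gene 0 4<TAB>lacZ
            type_, start, end = parts[1].split(" ")
            entities[ann_id] = {
                "type": type_,
                "start": int(start),
                "end": int(end),
                "text": parts[2] if len(parts) > 2 else "",
            }
        elif ann_id.startswith("E"):
            # Example: E1<TAB>Gene_expression:T2 Theme:T1
            pairs = [p.split(":", 1) for p in parts[1].split()]
            event_type, trigger_id = pairs[0]
            events[ann_id] = {
                "type": event_type,
                "trigger": trigger_id,
                "arguments": [(role, ref) for role, ref in pairs[1:]],
            }
    return entities, events


# Hypothetical sample for the sentence fragment "lacZ transcription".
SAMPLE = [
    "T1\tGene 0 4\tlacZ",
    "T2\tGene_expression 5 18\ttranscription",
    "E1\tGene_expression:T2 Theme:T1",
]

entities, events = parse_standoff(SAMPLE)
```

Because standoff annotations refer to character offsets, the original abstract text must be kept unchanged alongside the annotation file for the spans to remain valid.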
Resource Type: Resource
Version: Latest Version
annotation, information extraction, text mining, semantic role, semantic search, gene, computational linguistics, gene regulation
FORCE11, Beyond the pdf
human, escherichia coli
Gene Event Regulation Corpus
Additional Resource Types
training set, data or information resource
Creative Commons Attribution-NonCommercial-ShareAlike License, v3 Unported. For copyright of the abstracts, refer to PubMed.
Created 4 years ago by Anonymous