Ontology application and use at the ENCODE DCC.

Database : the journal of biological databases and curation | 2015

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a catalog of genomic annotations. To date, the project has generated over 4000 experiments across more than 350 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory network and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All ENCODE experimental data, metadata and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage and distribution to community resources and the scientific community. As the volume of data increases, the organization of experimental details becomes increasingly complicated and demands careful curation to identify related experiments. Here, we describe the ENCODE DCC's use of ontologies to standardize experimental metadata. We discuss how ontologies, when used to annotate metadata, provide improved searching capabilities and facilitate the ability to find connections within a set of experiments. Additionally, we provide examples of how ontologies are used to annotate ENCODE metadata and how the annotations can be identified via ontology-driven searches at the ENCODE portal. As genomic datasets grow larger and more interconnected, standardization of metadata becomes increasingly vital to allow for exploration and comparison of data between different scientific projects.
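The benefit of ontology-driven search described in the abstract can be illustrated with a minimal sketch: a query for a general term also returns experiments annotated with more specific terms beneath it. The is_a edges below are a simplified toy hierarchy, the experiment names are hypothetical, and nothing here reflects the ENCODE portal's actual implementation.

```python
# Toy is_a edges (child -> parents). Labels are a simplified illustration,
# not a copy of any real ontology.
IS_A = {
    "hepatocyte": ["epithelial cell"],
    "epithelial cell": ["cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": ["cell"],
}

# Hypothetical experiments, each annotated with one term.
EXPERIMENTS = {
    "exp1": "hepatocyte",
    "exp2": "T cell",
    "exp3": "lymphocyte",
}


def ancestors(term):
    """Return the term itself plus every term reachable through is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def search(query):
    """Return experiments annotated with the query term or any of its descendants."""
    return sorted(
        exp for exp, term in EXPERIMENTS.items() if query in ancestors(term)
    )


print(search("leukocyte"))
print(search("cell"))
```

A plain string match on "leukocyte" would find none of these experiments, whereas walking the hierarchy finds the two annotated with more specific terms.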

PubMed ID: 25776021

Associated grants

  • Agency: NHGRI NIH HHS, United States
    Id: U41 HG006992
  • Agency: NIGMS NIH HHS, United States
    Id: GM10331601
  • Agency: NHGRI NIH HHS, United States
    Id: HG006992

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


CHEBI (tool)

RRID:SCR_002088

Collection of chemical compounds and other small molecular entities that incorporates an ontological classification of chemical compounds of biological relevance, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms.

BioPortal (tool)

RRID:SCR_002713

Open repository of biomedical ontologies that provides access via Web browsers and Web services to ontologies. It supports ontologies in OBO format, OWL, RDF, Rich Release Format (RRF), Protege frames, and LexGrid XML. Functionality includes the ability to browse, search and visualize ontologies as well as to comment on, and create mappings for ontologies. Any registered user can submit an ontology. The NCBO Annotator and NCBO Resource Index can also be accessed via BioPortal. Additional features:
* Add Reviews: rate the ontology according to several criteria and describe your experience using the ontology.
* Add Mappings: submit point-to-point mappings or upload bulk mappings created with external tools. Notification of new Mappings is RSS-enabled and Mappings can be browsed via BioPortal and accessed via Web services.
* NCBO Annotator: Tool that tags free text with ontology terms. NCBO uses the Annotator to generate ontology annotations, creating an ontology index of these resources accessible via the NCBO Resource Index. The Annotator can be accessed through BioPortal or directly as a Web service. The annotation workflow is based on syntactic concept recognition (using the preferred name and synonyms for terms) and on a set of semantic expansion algorithms that leverage the ontology structure (e.g., is_a relations).
* NCBO Resource Index: The NCBO Resource Index is a system for ontology based annotation and indexing of biomedical data; the key functionality of this system is to enable users to locate biomedical data linked via ontology terms. A set of annotations is generated automatically, using the NCBO Annotator, and presented in BioPortal. This service uses a concept recognizer (developed by the National Center for Integrative Biomedical Informatics, University of Michigan) to produce a set of annotations and expand them using ontology is_a relations.
* Web services: Documentation on all Web services and example code is available at: BioPortal Web services.
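The two-step annotation workflow described for the NCBO Annotator (syntactic matching on preferred names and synonyms, followed by expansion along is_a relations) can be sketched as follows. This is only an illustration of the idea under stated assumptions: the mini-ontology, its identifiers (T1 to T3) and the function names are all hypothetical, and the code does not call or reproduce the real Annotator service.

```python
import re

# Hypothetical mini-ontology: id -> preferred name, synonyms, is_a parents.
TERMS = {
    "T1": {"name": "liver", "synonyms": ["hepatic tissue"], "is_a": ["T3"]},
    "T2": {"name": "kidney", "synonyms": [], "is_a": ["T3"]},
    "T3": {"name": "organ", "synonyms": [], "is_a": []},
}


def direct_matches(text):
    """Syntactic step: ids of terms whose name or synonym occurs as whole words."""
    found = set()
    lowered = text.lower()
    for term_id, term in TERMS.items():
        for label in [term["name"], *term["synonyms"]]:
            pattern = r"\b" + re.escape(label.lower()) + r"\b"
            if re.search(pattern, lowered):
                found.add(term_id)
    return found


def expand(term_ids):
    """Semantic step: add every is_a ancestor of the matched terms."""
    expanded = set()
    stack = list(term_ids)
    while stack:
        term_id = stack.pop()
        if term_id not in expanded:
            expanded.add(term_id)
            stack.extend(TERMS[term_id]["is_a"])
    return expanded


def annotate(text):
    """Run both steps and return the resulting term ids in sorted order."""
    return sorted(expand(direct_matches(text)))


print(annotate("RNA-seq of hepatic tissue"))
```

Matching on word boundaries keeps "organ" from firing inside "organism"; a production concept recognizer would handle tokenization, plurals and ambiguity far more carefully.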

Protege (tool)

RRID:SCR_003299

Protege is a free, open-source platform that provides a growing user community with a suite of tools to construct domain models and knowledge-based applications with ontologies. At its core, Protege implements a rich set of knowledge-modeling structures and actions that support the creation, visualization, and manipulation of ontologies in various representation formats. Protege can be customized to provide domain-friendly support for creating knowledge models and entering data. Further, Protege can be extended by way of a plug-in architecture and a Java-based Application Programming Interface (API) for building knowledge-based tools and applications.

An ontology describes the concepts and relationships that are important in a particular domain, providing a vocabulary for that domain as well as a computerized specification of the meaning of terms used in the vocabulary. Ontologies range from taxonomies and classifications, database schemas, to fully axiomatized theories. In recent years, ontologies have been adopted in many business and scientific communities as a way to share, reuse and process domain knowledge. Ontologies are now central to many applications such as scientific knowledge portals, information management and integration systems, electronic commerce, and semantic web services.

The Protege platform supports two main ways of modeling ontologies:
* The Protege-Frames editor enables users to build and populate ontologies that are frame-based, in accordance with the Open Knowledge Base Connectivity protocol (OKBC). In this model, an ontology consists of a set of classes organized in a subsumption hierarchy to represent a domain's salient concepts, a set of slots associated to classes to describe their properties and relationships, and a set of instances of those classes - individual exemplars of the concepts that hold specific values for their properties.
* The Protege-OWL editor enables users to build ontologies for the Semantic Web, in particular in the W3C's Web Ontology Language (OWL). An OWL ontology may include descriptions of classes, properties and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e. facts not literally present in the ontology, but entailed by the semantics. These entailments may be based on a single document or multiple distributed documents that have been combined using defined OWL mechanisms (see the OWL Web Ontology Language Guide).

Protege is based on Java, is extensible, and provides a plug-and-play environment that makes it a flexible base for rapid prototyping and application development.

Experimental Factor Ontology (tool)

RRID:SCR_003574

An application-focused ontology modelling the experimental factors in ArrayExpress and Gene Expression Atlas. It has been developed to increase the richness of the annotations that are currently made in the ArrayExpress repository, to promote consistent annotation, to facilitate automatic annotation and to integrate external data. The ontology describes cross-product classes from reference ontologies in areas such as disease, cell line, cell type and anatomy. The methodology employed in the development of EFO involves construction of mappings to multiple existing domain-specific ontologies, such as the Disease Ontology and Cell Type Ontology. This is achieved using a combination of automated and manual curation steps and the use of a phonetic matching algorithm. The ontology is evaluated with use cases from the ArrayExpress repository and ArrayExpress Atlas. You may also browse the EFO in the NCBO BioPortal. Term submissions are welcome.

Cell Type Ontology (tool)

RRID:SCR_004251

Ontology designed as a structured controlled vocabulary for cell types. It was constructed for use by the model organism and other bioinformatics databases. It includes cell types from prokaryotes, mammals, and fungi. The ontology is available in the formats adopted by the Open Biological Ontologies umbrella and is designed to be used in the context of model organism genome and other biological databases.

SO (tool)

RRID:SCR_004374

A collaborative ontology for the definition of sequence features used in biological sequence annotation. SO was initially developed by the Gene Ontology Consortium. Contributors to SO include the GMOD community, model organism database groups such as WormBase, FlyBase and the Mouse Genome Informatics group, and institutes such as the Sanger Institute and the EBI. Input to SO is welcomed from the sequence annotation community. The OBO revision is available here: http://sourceforge.net/p/song/svn/HEAD/tree/. SO includes different kinds of features which can be located on the sequence. Biological features are those which are defined by their disposition to be involved in a biological process. Biomaterial features are those which are intended for use in an experiment, such as aptamer and PCR_product. There are also experimental features, which are the result of an experiment. SO also provides a rich set of attributes to describe these features, such as polycistronic and maternally imprinted. The Sequence Ontologies use the OBO flat file format specification version 1.2, developed by the Gene Ontology Consortium. The ontology is also available in OWL from Open Biomedical Ontologies; the OWL version is updated nightly and may be slightly out of sync with the current OBO file. The resolvable URI for the current version of SO is http://purl.obolibrary.org/obo/so.owl.
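Because SO is distributed in the OBO flat file format, a short sketch of how [Term] stanzas in that format can be read may be useful. The sample text uses made-up identifiers (EX:0000001, EX:0000002) instead of real SO terms, and the parser is a simplification that ignores escaping, quoted strings and trailing modifiers; a dedicated OBO library would be the better choice for real files.

```python
# Made-up OBO content for illustration; these are not real SO identifiers.
SAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example_feature

[Term]
id: EX:0000002
name: example_child
is_a: EX:0000001 ! example_feature
"""


def parse_terms(text):
    """Collect each [Term] stanza as a dict mapping tag -> list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop the trailing "! comment" that may follow a value.
            value = value.split(" ! ", 1)[0].strip()
            current.setdefault(tag, []).append(value)
    return terms


for term in parse_terms(SAMPLE):
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```

Values are kept as lists because tags such as is_a can legitimately appear more than once in a stanza.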

NCBI BioSample (tool)

RRID:SCR_004854

Database containing descriptions of biological source materials used in experimental assays. Sources include GenBank, Sequence Read Archive (SRA), Coriell and ATCC. Submissions are supported by a web-based Submission Portal that guides users through a series of forms for input of rich metadata describing their samples. As the capacity and complexity of biological data sets expand, databases face new challenges in ensuring that the information is adequately organized and described. The NCBI BioSample database is being developed to help address these challenges by providing the means by which data generators can organize and describe a broad range of sample types, and link to corresponding sets of experimental data in archival databases.

BioSample Database at EBI (tool)

RRID:SCR_004856

Database that aggregates sample information for reference samples (e.g. Coriell cell lines) and samples for which data exist in one of the EBI's assay databases such as ArrayExpress, the European Nucleotide Archive (ENA) or the PRoteomics IDEntifications database (PRIDE). It provides links to assays for specific samples, and accepts direct submissions of sample information. The goals of the BioSample Database include:
* recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE;
* minimizing data entry efforts for EBI database submitters by enabling them to submit sample descriptions once and reference them later in data submissions to assay databases; and
* supporting cross-database queries by sample characteristics.

The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to biosamples (at) ebi.ac.uk. Current data sources:
* European Nucleotide Archive (424,811 samples)
* PRIDE (17,001 samples)
* ArrayExpress (1,187,884 samples)
* ENCODE cell lines (119 samples)
* CORIELL cell lines (27,002 samples)
* 1000 Genomes (2,628 samples)
* HapMap (1,417 samples)
* IMSR (248,660 samples)

Ontology for Biomedical Investigations (tool)

RRID:SCR_006266

An ontology for the description of biological and clinical investigations, built through an international, collaborative effort. The ontology represents the design of an investigation, the protocols and instrumentation used, the material used, the data generated and the type of analysis performed on it. This includes a set of universal terms that are applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain. Currently OBI is being built under the Basic Formal Ontology (BFO). This project was formerly titled the Functional Genomics Investigation Ontology (FuGO) project.

Ontology Lookup Service (tool)

RRID:SCR_006596

Interactive and programmatic interfaces to query, browse and navigate an increasing number of biomedical ontologies and controlled vocabularies. It provides a web service interface to query multiple ontologies from a single location with a unified output format. It can integrate any ontology available in the Open Biomedical Ontology (OBO) format. The database can be queried to obtain information on a single term or to browse a complete ontology using AJAX. Auto-completion provides a user-friendly search mechanism. An AJAX-based ontology viewer is available to browse a complete ontology or subsets of it. A weekly MySQL database export file can be downloaded from the EBI public FTP directory.

EDAM Ontology (tool)

RRID:SCR_006620

An ontology of bioinformatics operations (tool, application, or workflow functions), types of data including identifiers, topics (application domains), and data formats. EDAM is applied to organizing tools and data, finding suitable tools in catalogues, and integrating them into complex applications or workflows. Semantic annotations with EDAM are applicable to diverse entities such as Web services, databases, programmatic libraries, standalone tools and toolkits, interactive applications, data schemas, data sets, or publications within bioinformatics. Annotation with EDAM may also contribute to data provenance, and EDAM terms and synonyms can be used in text mining. EDAM, and in particular the EDAM Data sub-ontology, also serves as a markup vocabulary for bioinformatics data on the Semantic Web.

UBERON (tool)

RRID:SCR_010668

An integrated cross-species anatomy ontology representing a variety of entities classified according to traditional anatomical criteria such as structure, function and developmental lineage. The ontology includes comprehensive relationships to taxon-specific anatomical ontologies, allowing integration of functional, phenotype and expression data. Uberon consists of over 10,000 classes (March 2014) representing structures that are shared across a variety of metazoans. The majority of these classes are chordate-specific, and there is a large bias towards model organisms and human.
