Wikidata as a semantic framework for the Gene Wiki initiative.

Database: the journal of biological databases and curation | 2016

Open biological data are distributed over many resources making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. In order to improve the state of biological data, facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/.
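The abstract mentions a SPARQL endpoint as one way scientists can use the imported data. The following is a minimal sketch of what such a query might look like, not the authors' own code. It rests on assumptions that do not appear in the abstract: the endpoint URL `https://query.wikidata.org/sparql`, the property IDs `P351` (Entrez Gene ID) and `P703` (found in taxon), and the item `Q15978631` (Homo sapiens). The function names are hypothetical. The sketch only builds the request URL and flattens a response in the standard SPARQL JSON results layout; it performs no network call.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: public Wikidata query service URL (not given in the abstract).
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gene_query(taxon_qid, limit=10):
    """Return a SPARQL query listing genes with an Entrez Gene ID for one taxon.

    Assumptions: P351 = Entrez Gene ID, P703 = found in taxon.
    """
    return (
        "SELECT ?gene ?geneLabel ?entrez WHERE {\n"
        "  ?gene wdt:P351 ?entrez ;\n"
        f"        wdt:P703 wd:{taxon_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        f"}} LIMIT {limit}"
    )


def build_request_url(query):
    """Encode the query as a GET URL that asks for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def extract_rows(response):
    """Flatten a SPARQL JSON response into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in response["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Assumption: Q15978631 is the Wikidata item for Homo sapiens.
    print(build_request_url(build_gene_query("Q15978631", limit=5)))
```

The printed URL could be fetched with any HTTP client, and the decoded JSON passed to `extract_rows`. Keeping URL construction separate from the request makes the query easy to inspect before it is sent.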

PubMed ID: 26989148

Associated grants

  • Agency: NIDA NIH HHS, United States
    Id: DA036134
  • Agency: NIGMS NIH HHS, United States
    Id: GM114833
  • Agency: NIGMS NIH HHS, United States
    Id: R01 GM089820
  • Agency: NIDA NIH HHS, United States
    Id: U54 DA036134
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH111099
  • Agency: NIGMS NIH HHS, United States
    Id: GM089820
  • Agency: NIGMS NIH HHS, United States
    Id: GM083924

Publication data is provided by the National Library of Medicine® and PubMed®, and is retrieved from PubMed® weekly. For terms and conditions, see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


Python Programming Language (tool)

RRID:SCR_008394

Programming language for all operating systems that lets users work more quickly and integrate their systems more effectively. Often compared to Tcl, Perl, Ruby, Scheme or Java. Some of its key distinguishing features include very clear and readable syntax, strong introspection capabilities, intuitive object orientation, natural expression of procedural code, full modularity, exception-based error handling, high-level dynamic data types, extensive standard libraries and third-party modules for virtually every task, extensions and modules easily written in C or C++ (or Java for Jython, or .NET languages for IronPython), and embeddability within applications as a scripting interface.
