ODDPub
Open Data: F1: 0.74, Sensitivity: 0.73, Specificity: 0.97
Open Code: F1: 0.64, Sensitivity: 0.73, Specificity: 1.00 (Note: this performance estimate is still somewhat uncertain, as the validation dataset contained only a few cases of Open Code. An independent validation study (doi.org/10.1101/2020.10.30.361618) reported F1: 0.70, Sensitivity: 0.58, Specificity: 0.997; that sample also contained few cases of open code, so the estimates still have wide confidence intervals.)
Size of training data: ~10,000 publications (~1,500 of these were manually screened during the validation process).
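For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and F1 follow from a confusion matrix of manually screened publications. It is an illustrative calculation only: the counts are placeholders chosen so that the result lands near the Open Data figures above, not the actual validation data, and the code is not part of ODDPub.

```python
# Minimal sketch (not ODDPub's actual code): how sensitivity, specificity,
# precision, and F1 follow from a confusion matrix of screened publications.

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity (recall), specificity, precision, and F1 for a binary detector."""
    sensitivity = tp / (tp + fn)   # fraction of truly open-data papers that are detected
    specificity = tn / (tn + fp)   # fraction of papers without open data correctly rejected
    precision = tp / (tp + fp)     # fraction of flagged papers that truly share data
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": round(sensitivity, 2),
        "specificity": round(specificity, 2),
        "precision": round(precision, 2),
        "F1": round(f1, 2),
    }

# Placeholder counts, chosen only so the output is close to the Open Data figures above.
print(binary_metrics(tp=73, fp=24, fn=27, tn=876))
# -> {'sensitivity': 0.73, 'specificity': 0.97, 'precision': 0.75, 'F1': 0.74}
```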
Known issues: There are many different levels of open data (e.g. do all data underlying a study need to be made available, or is sharing of any open data sufficient?). We chose a rather low-barrier definition of Open Data here. Sharing via supplementary files is missed more often than other ways of sharing data (e.g. deposition in data repositories). There might also be new or lesser-known data repositories that are not yet covered by the algorithm.
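The repository-coverage issue can be pictured with a toy keyword search, since the linked paper describes ODDPub as a keyword-based text-mining approach. The pattern list and function below are illustrative assumptions, not ODDPub's actual keyword set or implementation; the real algorithm uses a far more extensive collection of phrases and repository names.

```python
import re

# Toy illustration (not ODDPub): flag sentences that mention known repositories or
# data-availability phrases. Any repository missing from such a list goes undetected,
# which is the coverage issue described above.
OPEN_DATA_PATTERNS = [
    r"\bzenodo\b",
    r"\bdryad\b",
    r"\bfigshare\b",
    r"\bosf\.io\b",
    r"data (are|is) available (at|from|in)",
]

def flags_open_data(sentence: str) -> bool:
    """Return True if any known open-data pattern matches the sentence."""
    text = sentence.lower()
    return any(re.search(pattern, text) for pattern in OPEN_DATA_PATTERNS)

print(flags_open_data("All raw data are available at Zenodo (doi:10.5281/...)."))  # True
print(flags_open_data("Data were deposited in a newly launched repository."))      # False: repository not in the list
```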
Performance data available at: https://datascience.codata.org/article/10.5334/dsj-2020-042/