Is useful research data usually shared? An investigation of genome-wide association study summary statistics.

PLoS One | 2020

Primary data collected during a research study is often shared and may be reused for new studies. To assess the extent of data sharing in favourable circumstances and whether data sharing checks can be automated, this article investigates summary statistics from primary human genome-wide association studies (GWAS). This type of data is highly suitable for sharing because it is a standard research output, is straightforward to use in future studies (e.g., for secondary analysis), and may be already stored in a standard format for internal sharing within multi-site research projects. Manual checks of 1799 articles from 2010 and 2017 matching a simple PubMed query for molecular epidemiology GWAS were used to identify 314 primary human GWAS papers. Of these, only 13% reported the location of a complete set of GWAS summary data, increasing from 3% in 2010 to 23% in 2017. Whilst information about whether data was shared was typically located clearly within a data availability statement, the exact nature of the shared data was usually unspecified. Thus, data sharing is the exception even in suitable research fields with relatively strong data sharing norms. Moreover, the lack of clear data descriptions within data sharing statements greatly complicates the task of automatically characterising shared data sets.

PubMed ID: 32084240

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: Medical Research Council, United Kingdom
    Id: MC_UU_00011/7

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


NCBI database of Genotypes and Phenotypes (dbGaP) (tool)

RRID:SCR_002709

Database developed to archive and distribute clinical data and results from studies that have investigated the interaction of genotype and phenotype in humans. It covers results of studies including genome-wide association studies, medical sequencing, molecular diagnostic assays, and associations between genotype and non-clinical traits.


MeSH (tool)

RRID:SCR_004750

A controlled vocabulary thesaurus that consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity. MeSH, in machine-readable form, is provided at no charge via electronic means. MeSH descriptors are arranged in both an alphabetic and a hierarchical structure. At the most general level of the hierarchical structure are very broad headings such as Anatomy or Mental Disorders. More specific headings, such as Ankle and Conduct Disorder, are found at narrower levels of the twelve-level hierarchy. There are 27,149 descriptors in 2014 MeSH. There are also over 218,000 entry terms that assist in finding the most appropriate MeSH Heading; for example, Vitamin C is an entry term to Ascorbic Acid. In addition to these headings, there are more than 219,000 headings called Supplementary Concept Records (formerly Supplementary Chemical Records) within a separate thesaurus. The MeSH thesaurus is used by NLM for indexing articles from 5,400 of the world's leading biomedical journals for the MEDLINE/PubMed database. It is also used for the NLM-produced database that includes cataloging of books, documents, and audiovisuals acquired by the Library. Each bibliographic reference is associated with a set of MeSH terms that describe the content of the item. Similarly, search queries use MeSH vocabulary to find items on a desired topic.


PubMed (tool)

RRID:SCR_004846

Public bibliographic database that provides access to citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites. PubMed citations and abstracts cover the fields of biomedicine and health, including portions of the life sciences, behavioral sciences, chemical sciences, and bioengineering. Provides access to additional relevant web sites and links to other NCBI molecular biology resources. Publishers of journals can submit their citations to NCBI and then provide access to the full text of articles at journal web sites using LinkOut.


GWAS: Catalog of Published Genome-Wide Association Studies (tool)

RRID:SCR_012745

Catalog of published genome-wide association studies. A genome-wide association study (GWAS) examines a genome-wide set of genetic variants in different individuals to see whether any variant is associated with a trait or disease. The database includes only GWAS publications that attempt to assay single nucleotide polymorphisms (SNPs). Publications are organized from most to least recent date of publication. Studies are identified through weekly PubMed literature searches, daily NIH-distributed compilations of news and media reports, and occasional comparisons with an existing database of GWAS literature (HuGE Navigator). Works with HANCESTRO ancestry representation.
