Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE.

Database : the journal of biological databases and curation | 2012

High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/
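As a rough illustration of the kind of evidence a text-mining tool could surface for curators, the sketch below scans article text for GEO accessions and PDB identifiers. It is not the method used in the paper: the regular expressions, the requirement that a PDB ID follow a "PDB" cue word, and the function name extract_accessions are all assumptions made for this example.

```python
import re

# GEO accessions: series (GSE), samples (GSM), platforms (GPL), datasets (GDS),
# each followed by digits.
GEO_PATTERN = re.compile(r"\b(?:GSE|GSM|GPL|GDS)\d+\b")

# Classic PDB IDs are four characters: a digit 1-9, then three alphanumerics.
# That shape is ambiguous in running text (a year like 2012 fits it), so this
# sketch only accepts an ID that directly follows a "PDB" cue (an assumption).
PDB_PATTERN = re.compile(
    r"\bPDB\s*(?:ID|code|entry|accession)?s?\s*[:#]?\s*([1-9][A-Za-z0-9]{3})\b",
    re.IGNORECASE,
)


def extract_accessions(text):
    """Return candidate GEO and PDB identifiers mentioned in `text`."""
    geo = sorted({m.group(0) for m in GEO_PATTERN.finditer(text)})
    pdb = sorted({m.group(1).upper() for m in PDB_PATTERN.finditer(text)})
    return {"geo": geo, "pdb": pdb}


if __name__ == "__main__":
    sample = (
        "Expression data were deposited in GEO under GSE12345 "
        "(samples GSM100 and GSM101). The structure is available as "
        "PDB ID: 1abc. Published in 2012."
    )
    print(extract_accessions(sample))
```

Candidates found this way are only suggestions for a curator to review: a pattern match does not show that the article actually describes the database entry, and the cue-word rule will miss IDs that appear without "PDB" nearby.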

PubMed ID: 22685160

Associated grants

  • Agency: NLM NIH HHS, United States
    Id: LM000002-01

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


PubChem (tool)

RRID:SCR_004284

Collection of information about chemical structures and biological properties of small molecules and siRNA reagents hosted by the National Center for Biotechnology Information (NCBI).


ISRCTN Registry (tool)

RRID:SCR_006087

A primary clinical trial registry which houses proposed, ongoing, and completed clinical research studies. An ISRCTN is a simple numeric system for the unique identification of randomized controlled trials worldwide. The registry provides content validation and curation and the unique identification number necessary for publication. Submitted studies range from cancer to urological diseases.


NCBI (tool)

RRID:SCR_006472

A portal to biomedical and genomic information. NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information for the better understanding of molecular processes affecting human health and disease.


National Library of Medicine (tool)

RRID:SCR_011446

NLM collects, organizes, and makes available biomedical science information to scientists, health professionals, and the public. The Library's Web-based databases, including PubMed/Medline and MedlinePlus, are used extensively around the world. NLM conducts and supports research in biomedical communications; creates information resources for molecular biology, biotechnology, toxicology, and environmental health; and provides grant and contract support for training, medical library resources, and biomedical informatics and communications research. Celebrating its 175th anniversary in 2011, the National Library of Medicine (NLM), in Bethesda, Maryland, is a part of the National Institutes of Health, U.S. Department of Health and Human Services (HHS). Since its founding in 1836 as the library of the U.S. Army Surgeon General, NLM has played a pivotal role in translating biomedical research into practice. It is the world's largest biomedical library and the developer of electronic information services that deliver trillions of bytes of data to millions of users every day. Scientists, health professionals, and the public in the United States and around the globe search the Library's online information resources more than 1 billion times each year. The Library is open to all and has many services and resources for scientists, health professionals, historians, and the general public. NLM has over 17 million books, journals, manuscripts, audiovisuals, and other forms of medical information on its shelves, making it the largest health-science library in the world. 
In today's increasingly digital world, NLM carries out its mission of enabling biomedical research, supporting health care and public health, and promoting healthy behavior by:

  • Acquiring, organizing, and preserving the world's scholarly biomedical literature;
  • Providing access to biomedical and health information across the country in partnership with the 5,800-member National Network of Libraries of Medicine (NN/LM);
  • Serving as a leading global resource for building, curating and providing sophisticated access to molecular biology and genomic information, including those from the Human Genome Project and NIH Common Fund;
  • Creating high-quality information services relevant to toxicology and environmental health, health services research, and public health;
  • Conducting research and development on biomedical communications systems, methods, technologies, and networks and information dissemination and utilization among health professionals, patients, and the general public;
  • Funding advanced biomedical informatics research and serving as the primary supporter of pre- and post-doctoral research training in biomedical informatics at 18 U.S. universities.


LinkHub: A Semantic Web System that facilitates cross-database queries and information retrieval in proteomics (tool)

RRID:SCR_001844

THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 23, 2022. LinkHub is a software system using Semantic Web RDF that manages the graph of identifier relationships and allows exploration with a variety of interfaces. It leverages Semantic Web standards-based integrated data to provide novel information retrieval to identifier-related documents through relational graph queries, simplifies and manages connections to major hubs such as UniProt, and provides useful interactive and query interfaces for exploring the integrated data. For efficiency, it is also provided with relational-database access and translation between the relational and RDF versions. LinkHub is practically useful in creating small, local hubs on common topics and then connecting these to major portals in a federated architecture; LinkHub was used to establish such a relationship between UniProt and the North East Structural Genomics Consortium. LinkHub also facilitates queries and access to information and documents related to identifiers spread across multiple databases, acting as connecting glue between different identifier spaces. LinkHub is available at hub.gersteinlab.org and hub.nesg.org with supplement, database models and full-source code. Sponsors: Funding for this work comes from NIH/NIGMS grant P50 GM62413-01, NIH grant K25 HG02378, and NSF grant DBI-0135442.
