The Text-mining based PubChem Bioassay neighboring analysis.

BMC Bioinformatics | Nov 8, 2010

BACKGROUND: In recent years, the number of High Throughput Screening (HTS) assays deposited in PubChem has grown quickly. As a result, the volume of both the structured information (i.e. molecular structures and bioactivities) and the unstructured information (such as descriptions of bioassay experiments) has been increasing exponentially. Consequently, it has become more demanding and challenging to assemble the bioactivity data efficiently, that is, to mine this large body of information in order to identify and interpret the relationships among the diverse bioassay experiments. In this work, we propose a text-mining based approach for bioassay neighboring analysis that uses the unstructured text descriptions contained in the PubChem BioAssay database.

RESULTS: The neighboring analysis is achieved by evaluating the cosine score of each bioassay pair and the fraction of overlaps among the human-curated neighbors. Our results from the cosine score distribution analysis and the assay neighbor clustering analysis on all PubChem bioassays suggest that strong correlations among the bioassays can be identified from their conceptual relevance. A comparison with other existing assay neighboring methods suggests that the text-mining based bioassay neighboring approach provides meaningful linkages among the PubChem bioassays, and complements the existing methods by identifying additional relationships among the bioassay entries.

CONCLUSIONS: The text-mining based bioassay neighboring analysis is efficient for correlating bioassays and studying different aspects of a biological process, which are otherwise difficult to achieve with existing neighboring procedures because of the lack of specific annotations and structured information. It is suggested that the text-mining based bioassay neighboring analysis can be used as a standalone tool or as a complement to the PubChem bioassay neighboring process, to enable efficient integration of assay results and to generate hypotheses for the discovery of bioactivities of the tested reagents.
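To make the cosine-score idea concrete, here is a minimal sketch of scoring pairs of assay descriptions and keeping those above a cutoff. It is not the authors' pipeline: the abstract does not specify tokenization, term weighting, or a threshold, so the sketch assumes plain term-frequency vectors, and the assay IDs, description texts, function names, and the 0.3 cutoff are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_score(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts.

    Assumption: unweighted term counts; the paper may use another weighting.
    """
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description has no neighbors
    return dot / (norm_a * norm_b)


def text_neighbors(descriptions, threshold):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(descriptions.items()), 2):
        score = cosine_score(text_a, text_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Hypothetical assay IDs and descriptions, invented for illustration only.
descriptions = {
    "assay_1": "Luciferase reporter assay for inhibitors of kinase activity",
    "assay_2": "Luciferase reporter counter screen for kinase inhibitors",
    "assay_3": "Cell viability assay measuring cytotoxicity in HeLa cells",
}
neighbors = text_neighbors(descriptions, threshold=0.3)  # hypothetical cutoff
for id_a, id_b, score in neighbors:
    print(f"{id_a} <-> {id_b}: {score:.3f}")
```

With these toy descriptions, only the two luciferase/kinase assays share enough vocabulary to be paired. Comparing every pair is quadratic in the number of assays, so a real run over all PubChem bioassays would likely need sparse vectors or an inverted index.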

MeSH terms: Biological Assay | Data Mining | Databases, Factual | High-Throughput Screening Assays

Publication data is provided by the National Library of Medicine® and PubMed®.

PubChem

Database of information on the biological activities of small molecules, organized as three linked databases. It includes substance information, compound structures, and BioActivity data. It also provides a chemical structure similarity search tool. PubChem services include BioActivity analysis, chemical structure search, a download facility, a score matrix service, sources, a standardization service, structure clustering, bulk data download (FTP site), the Power User Gateway, and a web-based 3D viewer. The Substance/Compound database, where possible, provides links to BioAssay descriptions, literature, references, and assay data points. The BioAssay database also includes links back to the Substance/Compound database. PubChem is integrated with Entrez and also provides compound neighboring, sub/superstructure, structure similarity, BioActivity data, and other search features. PubChem contains substance and BioAssay information from a multitude of depositors.


BLAST

A data analysis service that finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences, as well as to help identify members of gene families.


PubChem BioAssay

As one of the three primary databases of PubChem (Pcsubstance, Pccompound, and PCBioAssay), the PubChem BioAssay database contains bioactivity screens of chemical substances described in PubChem Substance. It provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to that screening procedure. A PubChem FTP site and bioassay download are available. New data are accepted into the repository. PCBioAssay contains more than 503,000 BioAssays, each with a varying number of data points (2011). PubChem provides a number of BioAssay services, including a BioAssay summary for an individual assay, BioActivity Summary, Data Table, and Structure-Activity Analysis for a selected substance/compound/bioassay set. The Data Table additionally offers data analysis through plots and selection of detailed test results.


