FDI Lab - SciCrunch.org | Searching for in Literature

Privacy-preserving search for chemical compound databases.

Kana Shimizu et al.
BMC bioinformatics
2015

Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources.

Quantum chemical benchmark databases of gold-standard dimer interaction energies.

Alexander G Donchev et al.
Scientific data
2021

Advances in computational chemistry create an ongoing need for larger and higher-quality datasets that characterize noncovalent molecular interactions. We present three benchmark collections of quantum mechanical data, covering approximately 3,700 distinct types of interacting molecule pairs. The first collection, which we refer to as DES370K, contains interaction energies for more than 370,000 dimer geometries. These were computed using the coupled-cluster method with single, double, and perturbative triple excitations [CCSD(T)], which is widely regarded as the gold-standard method in electronic structure theory. Our second benchmark collection, a core representative subset of DES370K called DES15K, is intended for more computationally demanding applications of the data. Finally, DES5M, our third collection, comprises interaction energies for nearly 5,000,000 dimer geometries; these were calculated using SNS-MP2, a machine learning approach that provides results with accuracy comparable to that of our coupled-cluster training data. These datasets may prove useful in the development of density functionals, empirically corrected wavefunction-based approaches, semi-empirical methods, force fields, and models trained using machine learning methods.

Statistical-based database fingerprint: chemical space dependent representation of compound databases.

Norberto Sánchez-Cruz et al.
Journal of cheminformatics
2018

Simplified representation of compound databases has several applications in cheminformatics. Herein, we introduce an alternative and general method to build single fingerprint representations of compound databases. The approach is inspired by the previously published modal fingerprints, which aim to capture the most significant bits of a fingerprint representation for a compound data set. The novelty of the statistical-based database fingerprint (SB-DFP) proposed here is that it is generated from binomial proportion comparisons, taking as reference the distribution of "1" bits in a large representative set of the chemical space.
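As a rough illustration of the idea in this abstract, the sketch below sets a bit in the database fingerprint when that bit is "on" significantly more often in the database than in a reference set. The abstract does not give the exact statistical test, so the one-sided z-test for a single proportion, the threshold, and all function and parameter names here are assumptions made for illustration, not the authors' implementation.

```python
import math

def sb_dfp_sketch(fingerprints, reference_freq, z_threshold=2.326):
    """Build one database fingerprint from many compound fingerprints.

    fingerprints   -- list of equal-length 0/1 lists, one per compound
    reference_freq -- per-bit frequency of "1" in a reference set (assumed given)
    z_threshold    -- one-sided cutoff; 2.326 is roughly the 0.01 level (assumption)
    """
    n = len(fingerprints)
    database_fp = []
    for j, p0 in enumerate(reference_freq):
        # Observed proportion of compounds with bit j set.
        p_hat = sum(fp[j] for fp in fingerprints) / n
        # Standard error of a proportion under the reference frequency.
        se = math.sqrt(p0 * (1.0 - p0) / n)
        if se == 0.0:
            # Degenerate reference (always 0 or always 1): compare directly.
            database_fp.append(1 if p_hat > p0 else 0)
            continue
        z = (p_hat - p0) / se
        database_fp.append(1 if z > z_threshold else 0)
    return database_fp

# Toy data: 100 compounds, 3 bits set in 90, 10 and 50 compounds respectively.
toy_fps = [[1 if i < 90 else 0, 1 if i < 10 else 0, 1 if i < 50 else 0]
           for i in range(100)]
toy_reference = [0.5, 0.5, 0.5]
toy_result = sb_dfp_sketch(toy_fps, toy_reference)
```

Only the first bit is over-represented relative to the reference frequency in this toy data, so it is the only bit kept in the database fingerprint.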

CLOOME: contrastive learning unlocks bioimaging databases for queries with chemical structures.

Ana Sanchez-Fernandez et al.
Nature communications
2023

The field of bioimage analysis is currently impacted by a profound transformation, driven by the advancements in imaging technologies and artificial intelligence. The emergence of multi-modal AI systems could allow extracting and utilizing knowledge from bioimaging databases based on information from other data modalities. We leverage the multi-modal contrastive learning paradigm, which enables the embedding of both bioimages and chemical structures into a unified space by means of bioimage and molecular structure encoders. This common embedding space unlocks the possibility of querying bioimaging databases with chemical structures that induce different phenotypic effects. Concretely, in this work we show that a retrieval system based on multi-modal contrastive learning is capable of identifying the correct bioimage corresponding to a given chemical structure from a database of ~2000 candidate images with a top-1 accuracy >70 times higher than a random baseline. Additionally, the bioimage encoder demonstrates remarkable transferability to various further prediction tasks within the domain of drug discovery, such as activity prediction, molecule classification, and mechanism of action identification. Thus, our approach not only addresses the current limitations of bioimaging databases but also paves the way towards foundation models for microscopy images.
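To put the reported comparison in perspective: with N candidate images, a random guess is correct with probability 1/N, so for about 2000 candidates the baseline is about 0.05%, and a 70-fold improvement corresponds to roughly 3.5% top-1 accuracy. The sketch below shows how top-1 retrieval accuracy could be measured in a shared embedding space using cosine similarity. The toy embeddings and function names are hypothetical and stand in for the outputs of the structure and bioimage encoders; this is not the CLOOME code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top1_accuracy(query_embs, image_embs):
    """Fraction of queries whose most similar image is the matching one.

    Assumes query i corresponds to image i (hypothetical pairing convention).
    """
    hits = 0
    for i, query in enumerate(query_embs):
        best = max(range(len(image_embs)),
                   key=lambda j: cosine(query, image_embs[j]))
        if best == i:
            hits += 1
    return hits / len(query_embs)

def random_baseline(n_candidates):
    """Expected top-1 accuracy of a uniformly random guess."""
    return 1.0 / n_candidates

# Toy embeddings for three structure/image pairs.
structure_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_embs = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
toy_accuracy = top1_accuracy(structure_embs, image_embs)
baseline_2000 = random_baseline(2000)
```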

Fast 3D shape screening of large chemical databases through alignment-recycling.

Fabien Fontaine et al.
Chemistry Central journal
2007

Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes.

Ambiguity of non-systematic chemical identifiers within and between small-molecule databases.

Saber A Akhondi et al.
Journal of cheminformatics
2015

A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these chemical databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand names, generic names), which are usually assigned to the compound at the point of registration. The correctness of non-systematic identifiers (i.e., whether an identifier matches the associated structure) can only be assessed manually, which is cumbersome, but it is possible to automatically check their ambiguity (i.e., whether an identifier matches more than one structure). In this study we have quantified the ambiguity of non-systematic identifiers within and between eight widely used chemical databases. We also studied the effect of chemical structure standardization on reducing the ambiguity of non-systematic identifiers.
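The automatic ambiguity check described here (does one identifier map to more than one structure?) can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the record layout, the case-insensitive name matching, and the placeholder structure keys are all assumptions, and in practice the structure key would be some canonical, standardized representation.

```python
from collections import defaultdict

def ambiguous_identifiers(records):
    """Return identifiers that are linked to more than one distinct structure.

    records -- iterable of (identifier, structure_key) pairs; structure_key is
               any hashable canonical representation (hypothetical here).
    """
    structures_by_name = defaultdict(set)
    for name, structure_key in records:
        # Normalizing case and whitespace is an assumption of this sketch.
        structures_by_name[name.strip().lower()].add(structure_key)
    return {name: keys
            for name, keys in structures_by_name.items()
            if len(keys) > 1}

# Placeholder records pooled from two hypothetical databases.
toy_records = [
    ("Compound A", "KEY-1"),
    ("compound a", "KEY-1"),   # same name, same structure: not ambiguous
    ("Compound B", "KEY-2"),
    ("compound b", "KEY-3"),   # same name, different structure: ambiguous
]
toy_ambiguous = ambiguous_identifiers(toy_records)
```

Running the same check on structure keys computed before and after standardization would show how much ambiguity the standardization step removes.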

Glycoproteomic and glycomic databases.

Deniz Baycin Hizal et al.
Clinical proteomics
2014

Protein glycosylation serves critical roles in the cellular and biological processes of many organisms. Aberrant glycosylation has been associated with many illnesses, including hereditary and chronic diseases such as cancer, cardiovascular diseases, neurological disorders, and immunological disorders. Emerging mass spectrometry (MS) technologies that enable the high-throughput identification of glycoproteins and glycans have accelerated the analysis and made possible the creation of dynamic and expanding databases. Although glycosylation-related databases have been established by many laboratories and institutions, they are not yet widely known in the community. Our study reviews 15 different publicly available databases and identifies their key elements so that users can identify the most applicable platform for their analytical needs. These databases include biological information on the experimentally identified glycans and glycopeptides from various cells and organisms such as human, rat, mouse, fly and zebrafish. The features of these databases (7 for glycoproteomic data, 6 for glycomic data, and 2 for glycan binding proteins) are summarized, including the enrichment techniques that are used for glycoproteome and glycan identification. Furthermore, databases recently established by our group, such as Unipep, GlycoFly, and GlycoFish, are introduced. The unique features of each database, such as the analytical methods used and the bioinformatics tools available, are summarized. This information will be a valuable resource for the glycobiology community as it presents the analytical methods and glycosylation-related databases together in one compendium. It will also represent a step towards the desired long-term goal of integrating the different databases of glycosylation in order to characterize and categorize glycoproteins and glycans better for biomedical research.

Human variation databases.

Jan Küntzer et al.
Database: the journal of biological databases and curation
2010

More than 100,000 human genetic variations have been described in various genes that are associated with a wide variety of diseases. Such data provides invaluable information for both clinical medicine and basic science. A number of locus-specific databases have been developed to exploit this huge amount of data. However, the scope, format and content of these databases differ strongly and as no standard for variation databases has yet been adopted, the way data is presented varies enormously. This review aims to give an overview of current resources for human variation data in public and commercial resources.

Human cancer databases (review).

Athanasia Pavlopoulou et al.
Oncology reports
2015

Cancer is one of the four major non‑communicable diseases (NCD), responsible for ~14.6% of all human deaths. Currently, there are >100 different known types of cancer and >500 genes involved in cancer. Ongoing research efforts have been focused on cancer etiology and therapy. As a result, there is an exponential growth of cancer‑associated data from diverse resources, such as scientific publications, genome‑wide association studies, gene expression experiments, gene‑gene or protein‑protein interaction data, enzymatic assays, epigenomics, immunomics and cytogenetics, stored in relevant repositories. These data are complex and heterogeneous, ranging from unprocessed, unstructured data in the form of raw sequences and polymorphisms to well‑annotated, structured data. Consequently, the storage, mining, retrieval and analysis of these data in an efficient and meaningful manner pose a major challenge to biomedical investigators. In the current review, we present the central, publicly accessible databases that contain data pertinent to cancer, the resources available for delivering and analyzing information from these databases, as well as databases dedicated to specific types of cancer. Examples for this wealth of cancer‑related information and bioinformatic tools have also been provided.

Biological databases for human research.

Dong Zou et al.
Genomics, proteomics & bioinformatics
2015

The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation.

"DompeKeys": a set of novel substructure-based descriptors for efficient chemical space mapping, development and structural interpretation of machine learning models, and indexing of large databases.

Candida Manelfi et al.
Journal of cheminformatics
2024

The conversion of chemical structures into computer-readable descriptors, able to capture key structural aspects, is of pivotal importance in the field of cheminformatics and computer-aided drug design. Molecular fingerprints represent a widely employed class of descriptors; however, their generation process is time-consuming for large databases and is susceptible to bias. Therefore, descriptors able to accurately detect predefined structural fragments and devoid of lengthy generation procedures would be highly desirable. To meet additional needs, such descriptors should also be interpretable by medicinal chemists, and suitable for indexing databases with trillions of compounds. To this end, we developed the DompeKeys (DK) as an integral part of EXSCALATE, Dompé's end-to-end drug discovery platform. DK is a new substructure-based descriptor set that encodes the chemical features that characterize compounds of pharmaceutical interest. DK represent an exhaustive collection of curated SMARTS strings, defining chemical features at different levels of complexity, from specific functional groups and structural patterns to simpler pharmacophoric points, corresponding to a network of hierarchically interconnected substructures. Because of their extended and hierarchical structure, DK can be used, with good performance, in different kinds of applications. In particular, we demonstrate that they are very well suited for effective mapping of chemical space, as well as substructure search and virtual screening. Notably, the incorporation of DK yields highly performing machine learning models for the prediction of both compounds' activity and metabolic reaction occurrence.
The protocol to generate the DK is freely available at https://dompekeys.exscalate.eu and is fully integrated with the Molecular Anatomy protocol for the generation and analysis of hierarchically interconnected molecular scaffolds and frameworks, thus providing a comprehensive and flexible tool for drug design applications.

New approaches in developing medicinal herbs databases.

Zahra Fathifar et al.
Database: the journal of biological databases and curation
2023

Medicinal herbs databases have become a crucial part of organizing new scientific literature generated in the medicinal herbs field, as well as new drug discoveries in the information era. The aim of this review was to track the current status of medicinal herbs databases. The search for medicinal herbs databases was carried out via Google and PubMed. PubMed was searched for papers introducing medicinal herbs databases using the adopted search strategy. Papers with an active database on the web were included in the review. Google was also searched for medicinal herbs databases. Both retrieved papers and databases were reviewed by the authors. In this review, the current status of 25 medicinal herbs databases was reviewed, and the important characteristics of the databases were described. The reviewed databases varied greatly in terms of characteristics and functions. Finally, some recommendations for the efficient development of medicinal herbs databases were suggested. Although contemporary medicinal herbs databases provide much useful information, adding some features to these databases could improve their functionality. This work may not cover all the necessary information, but we hope that our review can provide readers with fundamental concepts, perspectives and suggestions for constructing more useful databases.

Protease Inhibitors in View of Peptide Substrate Databases.

Birgit J Waldner et al.
Journal of chemical information and modeling
2016

Protease substrate profiling has nowadays almost become a routine task for experimentalists, and the knowledge on protease peptide substrates is easily accessible via the MEROPS database. We present a shape-based virtual screening workflow using vROCS that applies the information about the specificity of the proteases to find new small-molecule inhibitors. Peptide substrate sequences for three to four substrate positions of each substrate from the MEROPS database were used to build the training set. Two-dimensional substrate sequences were converted to three-dimensional conformations through mutation of a template peptide substrate. The vROCS query was built from single amino acid queries for each substrate position considering the relative frequencies of the amino acids. The peptide-substrate-based shape-based virtual screening approach gives good performance for the four proteases thrombin, factor Xa, factor VIIa, and caspase-3 with the DUD-E data set. The results show that the method works for protease targets with different specificity profiles as well as for targets with different active-site mechanisms. As no structure of the target and no information on small-molecule inhibitors are required to use our approach, the method has significant advantages in comparison with conventional structure- and ligand-based methods.

Multicenter neonatal databases: Trends in research uses.

Liza M Creel et al.
BMC research notes
2017

In the US, approximately 12.7% of all live births are preterm, 8.2% are low birth weight (LBW), and 1.5% are very low birth weight (VLBW). Although technological advances have improved mortality rates among preterm and LBW infants, improving overall rates of prematurity and LBW remains a national priority. Monitoring short- and long-term outcomes is critical for advancing medical treatment and minimizing morbidities associated with prematurity or LBW; however, studying these infants can be challenging. Several large, multi-center neonatal databases have been developed to improve research and quality improvement of treatments for and outcomes of premature and LBW infants. The purpose of this systematic review was to describe three multi-center neonatal databases.

FDA toxicity databases and real-time data entry.

Kirk B Arvidson
Toxicology and applied pharmacology
2008

Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.

Comparative analyses of plant transcription factor databases.

Silvia R Ramirez et al.
Current genomics
2009

Transcription factors (TFs) are protein complexes that bind to the promoter regions in the DNA and affect transcription initiation. Plant TFs control gene expression, and genes control many physiological processes, which in turn trigger cascades of biochemical reactions in plant cells. The databases available for plant TFs are fairly abundant, but they convey different information in different formats. Some of the publicly available plant TF databases are narrow in scope, while others are broad. For example, some of the best TF databases are specific to just one plant species, but there are also databases that cover up to 20 different plant species. In this review, plant TF databases ranging from a single species to many are assessed and described. The comparative analyses of all the databases and their advantages and disadvantages are also discussed.

Databases and web tools for cancer genomics study.

Yadong Yang et al.
Genomics, proteomics & bioinformatics
2015

Publicly accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the use of these resources in the cancer research community.

Speech databases for mental disorders: A systematic review.

Yiling Li et al.
General psychiatry
2019

The use of clinical databases in the study of mental disorders is essential to the diagnosis and treatment of patients with mental illness. While text corpora capture only limited information about content, speech corpora capture tones, emotions, rhythms and many other signals beyond content. Hence, the design and development of speech corpora for patients with mental disorders is increasingly important.

Consensus and conflict cards for metabolic pathway databases.

Miranda D Stobbe et al.
BMC systems biology
2013

The metabolic network of H. sapiens and many other organisms is described in multiple pathway databases. The level of agreement between these descriptions, however, has proven to be low. We can use these different descriptions to our advantage by identifying conflicting information and combining their knowledge into a single, more accurate, and more complete description. This task is, however, far from trivial.

KaBOB: ontology-based semantic integration of biomedical databases.

Kevin M Livingston et al.
BMC bioinformatics
2015

The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources.
