FDI Lab - SciCrunch.org | Searching for in Literature

Design of chemical space networks incorporating compound distance relationships.

Antonio de la Vega de León et al.
F1000Research
2016

Networks, in which nodes represent compounds and edges pairwise similarity relationships, are used as coordinate-free representations of chemical space. So-called chemical space networks (CSNs) provide intuitive access to structural relationships within compound data sets and can be annotated with activity information. However, in such similarity-based networks, distances between compounds are typically determined for layout purposes and clarity and have no chemical meaning. By contrast, inter-compound distances as a measure of dissimilarity can be directly obtained from coordinate-based representations of chemical space. Herein, we introduce a CSN variant that incorporates compound distance relationships and thus further increases the information content of compound networks. The design was facilitated by adapting the Kamada-Kawai algorithm. Kamada-Kawai networks are the first CSNs that are based on numerical similarity measures, but do not depend on chosen similarity threshold values.
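The idea of making layout distances reflect compound dissimilarity can be illustrated with a minimal Python sketch. It assumes fingerprints are given as sets of feature indices, takes 1 minus the Tanimoto coefficient as the target distance, and evaluates the classic Kamada-Kawai spring energy with both scaling constants set to 1. The fingerprints, layouts, and function names are hypothetical, and the specific adaptation used by the authors is not described in this abstract.

```python
import math
from itertools import combinations


def tanimoto_distance(a: set, b: set) -> float:
    """Dissimilarity 1 - Tc for two fingerprints given as sets of 'on' features."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def kamada_kawai_energy(positions: dict, target: dict) -> float:
    """Spring energy: sum over pairs of 0.5 * (layout - target)^2 / target^2."""
    energy = 0.0
    for i, j in combinations(sorted(positions), 2):
        d = target[(i, j)]
        if d == 0:
            continue  # identical fingerprints impose no spring in this sketch
        layout = math.dist(positions[i], positions[j])
        energy += 0.5 * (layout - d) ** 2 / d ** 2
    return energy


# Hypothetical toy fingerprints (assumption, not data from the paper).
fps = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {7, 8}}
target = {
    (i, j): tanimoto_distance(fps[i], fps[j])
    for i, j in combinations(sorted(fps), 2)
}

# A layout that reproduces all target distances, and a collinear one that does not.
good_layout = {"A": (0.0, 0.0), "B": (0.5, 0.0), "C": (0.25, math.sqrt(0.9375))}
line_layout = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0)}

print(kamada_kawai_energy(good_layout, target))
print(kamada_kawai_energy(line_layout, target))
```

A layout algorithm of this kind searches for node positions that minimize the energy, so the first layout, which matches every pairwise dissimilarity, scores lower than the collinear one. No similarity threshold is needed because every compound pair contributes a spring.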

Activity-relevant similarity values for fingerprints and implications for similarity searching.

Swarit Jasial et al.
F1000Research
2016

A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various attempts have been made to establish relationships between calculated fingerprint similarity values and biological activities, none of these has yielded generally applicable rules for similarity searching. In this study, we have addressed the question of molecular versus activity similarity in a more fundamental way. First, we have evaluated if activity-relevant similarity value ranges could in principle be identified for standard fingerprints and distinguished from similarity resulting from random compound comparisons. Then, we have analyzed if activity-relevant similarity values could be used to guide typical similarity search calculations aiming to identify active compounds in databases. It was found that activity-relevant similarity values can be identified as a characteristic feature of fingerprints. However, it was also shown that such values cannot be reliably used as thresholds for practical similarity search calculations. In addition, the analysis presented herein helped to rationalize differences in fingerprint search performance.

Matched molecular pair-based data sets for computer-aided medicinal chemistry.

Ye Hu et al.
F1000Research
2014

Matched molecular pairs (MMPs) are widely used in medicinal chemistry to study changes in compound properties including biological activity, which are associated with well-defined structural modifications. Herein we describe up-to-date versions of three MMP-based data sets that have originated from in-house research projects. These data sets include activity cliffs, structure-activity relationship (SAR) transfer series, and second generation MMPs based upon retrosynthetic rules. The data sets have in common that they have been derived from compounds included in the ChEMBL database (release 17) for which high-confidence activity data are available. Thus, the activity data associated with MMP-based activity cliffs, SAR transfer series, and retrosynthetic MMPs cover the entire spectrum of current pharmaceutical targets. Our data sets are made freely available to the scientific community.

Computational Assessment of Chemical Saturation of Analogue Series under Varying Conditions.

Dimitar Yonchev et al.
ACS Omega
2018

Assessing the degree to which analogue series are chemically saturated is of major relevance in compound optimization. Decisions to continue or discontinue series are typically made on the basis of subjective judgment. Currently, only very few methods are available to aid in decision making. We further investigate and extend a computational concept to quantitatively assess the progression and chemical saturation of a series. To these ends, existing analogues and virtual candidates are compared in chemical space and compound neighborhoods are systematically analyzed. A large number of analogue series from different sources are studied, and alternative chemical space representations and virtual analogues of different designs are explored. Furthermore, evolving analogue series are distinguished computationally according to different saturation levels. Taken together, our findings provide a basis for practical applications of computational saturation analysis in compound optimization.

Determining the Degree of Promiscuity of Extensively Assayed Compounds.

Swarit Jasial et al.
PLOS ONE
2016

In the context of polypharmacology, an emerging concept in drug discovery, promiscuity is rationalized as the ability of compounds to specifically interact with multiple targets. Promiscuity of drugs and bioactive compounds has thus far been analyzed computationally on the basis of activity annotations, without taking assay frequencies or inactivity records into account. Most recent estimates have indicated that bioactive compounds interact on average with only one to two targets, whereas drugs interact with six or more. In this study, we have further extended promiscuity analysis by identifying the most extensively assayed public domain compounds and systematically determining their promiscuity. These compounds were tested in hundreds of assays against hundreds of targets. In our analysis, assay promiscuity was distinguished from target promiscuity and separately analyzed for primary and confirmatory assays. Differences between the degrees of assay and target promiscuity were surprisingly small, and average and median degrees of target promiscuity of 2.6 to 3.4 and 2.0, respectively, were determined. Thus, target promiscuity remained at a low level even for the most extensively tested active compounds. These findings provide further evidence that bioactive compounds are less promiscuous than drugs and have implications for pharmaceutical research. In addition to a possible explanation that drugs are more extensively tested for additional targets, the results would also support a "promiscuity enrichment model" according to which promiscuous compounds might be preferentially selected for therapeutic efficacy during clinical evaluation to ultimately become drugs.
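The distinction between assay and target promiscuity can be sketched in a few lines of Python. The record layout (compound, assay, target, active flag), the toy data, and the function name are assumptions made for illustration; they do not reproduce the authors' data or workflow. Inactivity records are kept so that tested but inactive compounds receive a degree of zero.

```python
from statistics import mean, median


def promiscuity_degrees(records):
    """Return (assay_pd, target_pd): per-compound counts of active assays and
    of distinct targets with at least one active record."""
    assays, targets = {}, {}
    for compound, assay, target, active in records:
        # Register every tested compound, even if it is never active.
        assays.setdefault(compound, set())
        targets.setdefault(compound, set())
        if active:
            assays[compound].add(assay)
            targets[compound].add(target)
    assay_pd = {c: len(a) for c, a in assays.items()}
    target_pd = {c: len(t) for c, t in targets.items()}
    return assay_pd, target_pd


# Hypothetical screening records: (compound, assay, target, active?)
records = [
    ("cpd1", "a1", "T1", True),
    ("cpd1", "a2", "T1", True),   # second assay against the same target
    ("cpd1", "a3", "T2", False),
    ("cpd2", "a1", "T1", True),
    ("cpd2", "a3", "T2", True),
    ("cpd2", "a4", "T3", True),
    ("cpd3", "a1", "T1", False),
]

assay_pd, target_pd = promiscuity_degrees(records)
print("assay promiscuity:", assay_pd)
print("target promiscuity:", target_pd)
print("mean / median target promiscuity:",
      mean(target_pd.values()), median(target_pd.values()))
```

In this toy set, cpd1 is active in two assays but against only one target, which is the kind of difference between assay and target promiscuity the analysis separates.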

Prediction of Compound Profiling Matrices Using Machine Learning.

Raquel Rodríguez-Pérez et al.
ACS Omega
2018

Screening of compound libraries against panels of targets yields profiling matrices. Such matrices typically contain structurally diverse screening compounds, large numbers of inactives, and small numbers of hits per assay. As such, they represent interesting and challenging test cases for computational screening and activity predictions. In this work, modeling of large compound profiling matrices extracted from publicly available screening data was attempted. Different machine learning methods, including deep learning, were compared and different prediction strategies explored. Prediction accuracy varied for assays with different numbers of active compounds, and alternative machine learning approaches often produced comparable results. Deep learning did not further increase the prediction accuracy of standard methods such as random forests or support vector machines. Target-based random forest models were prioritized and yielded successful predictions of active compounds for many assays.

Prediction of Promiscuity Cliffs Using Machine Learning.

Thomas Blaschke et al.
Molecular Informatics
2021

Compounds with the ability to interact with multiple targets, also called promiscuous compounds, provide the basis for polypharmacological drug discovery. In recent years, a plethora of structural analogs with different promiscuity has been identified. Nevertheless, the molecular origins of promiscuity remain to be elucidated. In this study, we systematically extracted different structural analogs with varying promiscuity using the matched molecular pair (MMP) formalism from public biological screening and medicinal chemistry data. Care was taken to eliminate all compounds with potential false-positive activity annotations from the analysis. Promiscuity predictions were then attempted at the level of compound pairs representing promiscuity cliffs (PCs; formed by analogs with large promiscuity differences) and corresponding non-PC MMPs (analog pairs without significant promiscuity differences). To address this prediction task, different machine learning models were generated and the results were compared with single compound predictions. PCs encoding promiscuity differences were found to contain more structure-promiscuity relationship information than sets of individual promiscuous compounds. In addition, feature analysis was carried out revealing key contributions to the correct prediction of PCs and non-PC MMPs via machine learning.

Application of Generative Autoencoder in De Novo Molecular Design.

Thomas Blaschke et al.
Molecular Informatics
2018

A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecular structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified.

Advancing the activity cliff concept.

Ye Hu et al.
F1000Research
2013

The activity cliff concept has experienced increasing interest in medicinal chemistry and chemoinformatics. Activity cliffs were originally defined as pairs of structurally similar compounds that are active against the same target but have a large difference in potency. Activity cliffs are relevant for structure-activity relationship (SAR) analysis and compound optimization because small chemical modifications can be deduced from cliffs that result in large-magnitude changes in potency. In addition to studying activity cliffs on the basis of individual compound series, they can be systematically identified through mining of compound activity data. This commentary aims to provide a concise yet detailed picture of our current understanding of activity cliffs. It is also meant to introduce the further refined activity cliff concept to a general audience in drug development.
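The original definition lends itself to a short Python sketch: flag pairs of compounds against one target that are similar by some measure yet differ strongly in potency. The set-based fingerprints, the Tanimoto threshold of 0.55, the potency difference of 2 log units, and all names below are illustrative assumptions, not criteria stated in this commentary.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient for fingerprints given as sets of 'on' features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(compounds, min_similarity=0.55, min_delta=2.0):
    """Return name pairs that are similar but differ strongly in potency.

    compounds: list of (name, fingerprint_set, potency_in_log_units),
    all assumed to be measured against the same target.
    """
    cliffs = []
    for (n1, fp1, p1), (n2, fp2, p2) in combinations(compounds, 2):
        similar = tanimoto(fp1, fp2) >= min_similarity
        large_gap = abs(p1 - p2) >= min_delta
        if similar and large_gap:
            cliffs.append((n1, n2))
    return cliffs


# Hypothetical compounds with toy fingerprints and pKi-like potency values.
compounds = [
    ("A", {1, 2, 3, 4}, 9.0),
    ("B", {1, 2, 3, 5}, 6.5),     # similar to A, much weaker
    ("C", {1, 2, 3, 4, 6}, 8.8),  # similar to A, comparable potency
    ("D", {10, 11}, 5.0),         # structurally unrelated
]

cliffs = find_activity_cliffs(compounds)
print(cliffs)
```

Only the A/B pair qualifies here: A and C are similar but nearly equipotent, and D differs in potency but shares no features with the others.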

Towards a systematic assessment of assay interference: Identification of extensively tested compounds with high assay promiscuity.

Erik Gilberg et al.
F1000Research
2017

A large-scale statistical analysis of hit rates of extensively assayed compounds is presented to provide a basis for a further assessment of assay interference potential and multi-target activities. A special feature of this investigation has been the inclusion of compound series information in activity analysis and the characterization of analog series using different parameters derived from assay statistics. No prior knowledge of compounds or targets was taken into consideration in the data-driven study of analog series. It was anticipated that taking large volumes of activity data, assay frequency, and assay overlap information into account would lead to statistically sound and chemically meaningful results. More than 6000 unique series of analogs with high hit rates were identified, more than 5000 of which did not contain known interference candidates, hence providing ample opportunities for follow-up analyses from a medicinal chemistry perspective.

Identifying Promiscuous Compounds with Activity against Different Target Classes.

Christian Feldmann et al.
Molecules (Basel, Switzerland)
2019

Compounds with multitarget activity are of high interest for polypharmacological drug discovery. Such promiscuous compounds might be active against closely related target proteins from the same family or against distantly related or unrelated targets. Compounds with activity against distinct targets are not only of interest for polypharmacology but also to better understand how small molecules might form specific interactions in different binding site environments. We have aimed to identify compounds with activity against drug targets from different classes. To these ends, a systematic analysis of public biological screening data was carried out. Care was taken to exclude compounds from further consideration that were prone to experimental artifacts and false positive activity readouts. Extensively assayed compounds were identified and found to contain molecules that were consistently inactive in all assays, active against a single target, or promiscuous. The latter included more than 1000 compounds that were active against 10 or more targets from different classes. These multiclass ligands were further analyzed and exemplary compounds were found in X-ray structures of complexes with distinct targets. Our collection of multiclass ligands should be of interest for pharmaceutical applications and further exploration of binding characteristics at the molecular level. Therefore, these highly promiscuous compounds are made publicly available.

ccbmlib - a Python package for modeling Tanimoto similarity value distributions.

Martin Vogt et al.
F1000Research
2020

The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
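The notion of a significance measure for a Tanimoto coefficient can be illustrated without the package itself. The sketch below does not use the ccbmlib API and does not fit a statistical model; as a stated simplification, it estimates a p-value by brute force as the fraction of reference compound pairs whose Tanimoto coefficient is at least as large as the observed one. The fingerprints and function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient for fingerprints given as sets of 'on' features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def empirical_p_value(observed: float, reference_fps) -> float:
    """Fraction of reference pairs with Tc >= observed (brute-force estimate)."""
    values = [tanimoto(a, b) for a, b in combinations(reference_fps, 2)]
    if not values:
        raise ValueError("need at least two reference fingerprints")
    return sum(v >= observed for v in values) / len(values)


# Hypothetical reference set of toy fingerprints (assumption for illustration).
reference = [{1, 2}, {2, 3}, {3, 4}, {1, 2, 3}]

p = empirical_p_value(0.5, reference)
print(p)
```

A small p-value means few random reference pairs reach the observed similarity. Expressing scores this way puts fingerprints with very different value ranges on a common scale, which is the comparison the abstract describes.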

X-ray-Structure-Based Identification of Compounds with Activity against Targets from Different Families and Generation of Templates for Multitarget Ligand Design.

Erik Gilberg et al.
ACS Omega
2018

Compounds with multitarget activity (promiscuity) are increasingly sought in drug discovery. However, promiscuous compounds are often viewed controversially in light of potential assay artifacts that may give rise to false-positive activity annotations. We have reasoned that the strongest evidence for true multitarget activity of small molecules would be provided by experimentally determined structures of ligand-target complexes. Therefore, we have carried out a systematic search of currently available X-ray structures for compounds forming complexes with different targets. Rather unexpectedly, 1418 such crystallographic ligands were identified, including 702 that formed complexes with targets from different protein families (multifamily ligands). About half of these multifamily ligands originated from the medicinal chemistry literature, making it possible to consider additional target annotations and search for analogues. From 168 distinct series of analogues containing one or more multifamily ligands, 133 unique analogue-series-based scaffolds were isolated that can serve as templates for the design of new compounds with multitarget activity. As a part of our study, all of the multifamily ligands we have identified and the analogue-series-based scaffolds are made freely available.

Identifying relationships between unrelated pharmaceutical target proteins on the basis of shared active compounds.

Filip Miljković et al.
Future Science OA
2017

Computational exploration of small-molecule-based relationships between target proteins from different families.

Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction.

Raquel Rodríguez-Pérez et al.
ACS Omega
2017

In computational chemistry and chemoinformatics, the support vector machine (SVM) algorithm is among the most widely used machine learning methods for the identification of new active compounds. In addition, support vector regression (SVR) has become a preferred approach for modeling nonlinear structure-activity relationships and predicting compound potency values. For the closely related SVM and SVR methods, fingerprints (i.e., bit string or feature set representations of chemical structure and properties) are generally preferred descriptors. Herein, we have compared SVM and SVR calculations for the same compound data sets to evaluate which features are responsible for predictions. On the basis of systematic feature weight analysis, rather surprising results were obtained. Fingerprint features were frequently identified that contributed differently to the corresponding SVM and SVR models. The overlap between feature sets determining the predictive performance of SVM and SVR was very small. Furthermore, features were identified that had opposite effects on SVM and SVR predictions. Feature weight analysis in combination with feature mapping also made it possible to interpret individual predictions, thus balancing the black box character of SVM/SVR modeling.

Series of screening compounds with high hit rates for the exploration of multi-target activities and assay interference.

Dagmar Stumpfe et al.
Future Science OA
2018

Generation of a database of analog series (ASs) with high assay hit rates for the exploration of assay interference and multi-target activities of compounds.

Predicting Isoform-Selective Carbonic Anhydrase Inhibitors via Machine Learning and Rationalizing Structural Features Important for Selectivity.

Salvatore Galati et al.
ACS Omega
2021

Carbonic anhydrases (CAs) catalyze the physiological hydration of carbon dioxide and are among the most intensely studied pharmaceutical target enzymes. A hallmark of CA inhibition is the complexation of the catalytic zinc cation in the active site. Human (h) CA isoforms belonging to different families are implicated in a wide range of diseases and are of very high interest for therapeutic intervention. Given the conserved catalytic mechanisms and high similarity of many hCA isoforms, a major challenge for CA-based therapy is achieving inhibitor selectivity for hCA isoforms that are associated with specific pathologies over other widely distributed isoforms, such as hCA I or hCA II, that are of critical relevance for the integrity of many physiological processes. To address this challenge, we have attempted to predict compounds that are selective for isoform hCA IX, which is a tumor-associated protein and implicated in metastasis, over hCA II on the basis of a carefully curated data set of selective and nonselective inhibitors. Machine learning achieved surprisingly high accuracy in predicting hCA IX-selective inhibitors. The results were further investigated, and compound features determining successful predictions were identified. These features were then studied on the basis of X-ray structures of hCA isoform-inhibitor complexes and found to include substructures that explain compound selectivity. Our findings lend credence to selectivity predictions and indicate that the machine learning models derived herein have considerable potential to aid in the identification of new hCA IX-selective compounds.

Computational design of new molecular scaffolds for medicinal chemistry, part II: generalization of analog series-based scaffolds.

Dilyana Dimova et al.
Future Science OA
2018

Extending and generalizing the computational concept of analog series-based (ASB) scaffolds.

Promiscuity progression of bioactive compounds over time.

Ye Hu et al.
F1000Research
2015

In the context of polypharmacology, compound promiscuity is rationalized as the ability of small molecules to specifically interact with multiple targets. To study promiscuity progression of bioactive compounds in detail, nearly 1 million compounds and more than 5.2 million activity records were analyzed. Compound sets were assembled by applying different data confidence criteria and selecting compounds with activity histories over many years. On the basis of release dates, compounds and activity records were organized on a time course, which ultimately enabled monitoring data growth and promiscuity progression over nearly 40 years, beginning in 1976. Surprisingly low degrees of promiscuity were consistently detected for all compound sets and there were only small increases in promiscuity over time. In fact, most compounds had a constant degree of promiscuity, including compounds with an activity history of 10 or 20 years. Moreover, during periods of massive data growth, beginning in 2007, promiscuity degrees also remained constant or displayed only minor increases, depending on the activity data confidence levels. Considering high-confidence data, bioactive compounds currently interact with 1.5 targets on average, regardless of their origins, and display essentially constant degrees of promiscuity over time. Taken together, our findings provide expectation values for promiscuity progression and magnitudes among bioactive compounds as activity data further grow.

Systematic identification of target set-dependent activity cliffs.

Huabin Hu et al.
Future Science OA
2019

Generating a knowledge base of new activity cliffs (ACs) defined on the basis of compound set-dependent potency distributions, also taking confirmed inactive compounds into account.
