Searching across hundreds of databases


This service searches only literature that cites research resources. Please be aware that the total number of searchable documents is limited to those containing RRIDs and does not include all open-access literature.


On page 1, showing papers 1–14 of 14.

DeepSynergy: predicting anti-cancer drug synergy with Deep Learning.

  • Kristina Preuer et al.
  • Bioinformatics (Oxford, England)
  • 2018

While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of the combinatorial space. However, computational approaches have emerged as a time- and cost-efficient way to prioritize combinations to test, based on recently available large-scale combination screening data. Recently, Deep Learning has had an impact in many research areas by achieving new state-of-the-art model performance. However, Deep Learning has not yet been applied to drug synergy prediction; here we present such an approach, termed DeepSynergy. DeepSynergy uses chemical and genomic information as input, a normalization strategy to account for input data heterogeneity, and conical layers to model drug synergies.
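The "conical layers" mentioned above are feed-forward layers whose widths shrink toward the output. A minimal NumPy sketch of such an architecture, with all input sizes and layer widths chosen hypothetically (the paper's actual dimensions, normalization, and training procedure are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical input sizes: chemical descriptors for two drugs plus
# genomic features of the cell line, concatenated into one vector.
n_chem, n_genomic = 256, 512
x = rng.normal(size=2 * n_chem + n_genomic)

# Normalize the input (the paper uses a normalization strategy to handle
# heterogeneous inputs; here, simple z-scoring of the single example).
x = (x - x.mean()) / x.std()

# "Conical" layers: widths shrink from the input toward the output.
widths = [2 * n_chem + n_genomic, 512, 256, 128, 1]
weights = [rng.normal(scale=np.sqrt(2 / m), size=(m, n))
           for m, n in zip(widths[:-1], widths[1:])]

h = x
for W in weights[:-1]:
    h = relu(h @ W)          # hidden layers with ReLU activations
score = (h @ weights[-1]).item()  # predicted synergy score
```

Random vectors stand in for the chemical and genomic features; in the published model these come from compound descriptors and cell-line gene expression.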


Genome-wide chromatin remodeling identified at GC-rich long nucleosome-free regions.

  • Karin Schwarzbauer et al.
  • PloS one
  • 2012

To gain deeper insights into principles of cell biology, it is essential to understand how cells reorganize their genomes by chromatin remodeling. We analyzed chromatin remodeling on next-generation sequencing data from resting and activated T cells to determine a whole-genome chromatin remodeling landscape. We consider chromatin remodeling in terms of nucleosome repositioning, which can be observed most robustly in long nucleosome-free regions (LNFRs) that are occupied by nucleosomes in another cell state. We found that LNFR sequences are either AT-rich or GC-rich, and that nucleosome repositioning was observed much more prominently in GC-rich LNFRs, a considerable proportion of them outside promoter regions. Using support vector machines with string kernels, we identified a GC-rich DNA sequence pattern indicating loci of nucleosome repositioning in resting T cells. This pattern also appears to be typical for CpG islands. We found that nucleosome repositioning in GC-rich LNFRs is indeed associated with CpG islands and with binding sites of the CpG-island-binding ZF-CXXC proteins KDM2A and CFP1. That this association occurs prominently both inside and outside of promoter regions hints at a mechanism governing nucleosome repositioning that acts on a whole-genome scale.
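The "support vector machines with string kernels" step can be illustrated with the simplest string kernel, the spectrum kernel, which compares DNA sequences by their shared k-mer counts. A toy sketch (the sequences and k are invented for illustration; the study's actual kernel and data are not reproduced):

```python
from collections import Counter

def spectrum_features(seq, k=3):
    """Count all overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Spectrum kernel: inner product of the two k-mer count vectors."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    return sum(fa[m] * fb[m] for m in fa)

gc_rich = "GCGCGGCGCCGCGGGC"
at_rich = "ATATAATATTATAAAT"

# GC-rich sequences share many k-mers with each other but none with
# AT-rich sequences, so the kernel separates the two classes.
print(spectrum_kernel(gc_rich, gc_rich))  # large self-similarity
print(spectrum_kernel(gc_rich, at_rich))  # no shared 3-mers -> 0
```

An SVM trained with such a kernel then weights k-mers, which is how sequence patterns like the GC-rich one above can be extracted.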


HapFABIA: identification of very short segments of identity by descent characterized by rare variants in large sequencing data.

  • Sepp Hochreiter
  • Nucleic acids research
  • 2013

Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority (152 000 IBD segments) are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD.


Rectified factor networks for biclustering of omics data.

  • Djork-Arné Clevert et al.
  • Bioinformatics (Oxford, England)
  • 2017

Biclustering has become a major tool for analyzing large datasets given as a matrix of samples times features and has been successfully applied in life sciences and e-commerce for drug design and recommender systems, respectively. Factor Analysis for Bicluster Acquisition (FABIA), one of the most successful biclustering methods, is a generative model that represents each bicluster by two sparse membership vectors: one for the samples and one for the features. However, FABIA is restricted to about 20 code units because of the high computational complexity of computing the posterior. Furthermore, code units are sometimes insufficiently decorrelated and sample membership is difficult to determine. We propose to use the recently introduced unsupervised Deep Learning approach Rectified Factor Networks (RFNs) to overcome the drawbacks of existing biclustering methods. RFNs efficiently construct very sparse, non-linear, high-dimensional representations of the input via their posterior means. RFN learning is a generalized alternating minimization algorithm based on the posterior regularization method, which enforces non-negative and normalized posterior means. Each code unit represents a bicluster, where samples for which the code unit is active belong to the bicluster, as do features that have activating weights to that code unit.


FABIA: factor analysis for bicluster acquisition.

  • Sepp Hochreiter et al.
  • Bioinformatics (Oxford, England)
  • 2010

Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a novel generative approach for biclustering called 'FABIA: Factor Analysis for Bicluster Acquisition'. FABIA is based on a multiplicative model, which accounts for linear dependencies between gene expression and conditions, and also captures heavy-tailed distributions as observed in real-world transcriptomic data. The generative framework allows us to utilize well-founded model selection methods and to apply Bayesian techniques.
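FABIA's multiplicative model can be sketched generatively: the data matrix is a sum of outer products of sparse loading (gene) and factor (sample) vectors plus heavy-tailed noise. A minimal NumPy illustration with hypothetical sizes; FABIA itself estimates these vectors from the data, which is not shown here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples, n_biclusters = 100, 40, 2

# Heavy-tailed (Laplacian) background noise, as observed in real data.
X = rng.laplace(scale=0.1, size=(n_genes, n_samples))

for _ in range(n_biclusters):
    # Sparse loading (genes) and factor (samples) vectors: mostly zero,
    # so each outer product affects only a small submatrix (a bicluster).
    lam = np.zeros(n_genes)
    lam[rng.choice(n_genes, 10, replace=False)] = rng.normal(2.0, 0.5, 10)
    z = np.zeros(n_samples)
    z[rng.choice(n_samples, 8, replace=False)] = rng.normal(2.0, 0.5, 8)
    X += np.outer(lam, z)  # multiplicative (outer-product) bicluster signal
```

The genes with nonzero entries in `lam` and the samples with nonzero entries in `z` are exactly the members of one bicluster; the inference task is recovering them from `X` alone.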


Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery.

  • Jaak Simm et al.
  • Cell chemical biology
  • 2018

In both academia and the pharmaceutical industry, large-scale assays for drug discovery are expensive and often impractical, particularly for the increasingly important physiologically relevant model systems that require primary cells, organoids, whole organisms, or expensive or rare reagents. We hypothesized that data from a single high-throughput imaging assay can be repurposed to predict the biological activity of compounds in other assays, even those targeting alternate pathways or biological processes. Indeed, quantitative information extracted from a three-channel microscopy-based screen for glucocorticoid receptor translocation was able to predict assay-specific biological activity in two ongoing drug discovery projects. In these projects, repurposing increased hit rates by 50- to 250-fold over that of the initial project assays while increasing the chemical structure diversity of the hits. Our results suggest that data from high-content screens are a rich source of information that can be used to predict and replace customized biological assays.


cn.FARMS: a latent variable model to detect copy number variations in microarray data with a low false discovery rate.

  • Djork-Arné Clevert et al.
  • Nucleic acids research
  • 2011

Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique for measuring DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected; these are not associated with a disease in a clinical study, yet correction for multiple testing must take them into account, thereby decreasing the study's discovery power. For controlling the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can only deviate through strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.
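The "information gain of the posterior over the prior" can be read as a divergence between the posterior copy-number distribution and the prior peaked at copy number 2. A toy sketch using the KL divergence (both distributions are invented for illustration; cn.FARMS computes its posterior from array intensities):

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# Distributions over copy numbers 0..4; the prior encodes the null
# hypothesis of copy number 2 for all samples.
prior = [0.05, 0.05, 0.8, 0.05, 0.05]
flat_posterior = [0.06, 0.05, 0.78, 0.06, 0.05]  # weak, inconsistent signal
cnv_posterior = [0.01, 0.01, 0.10, 0.86, 0.02]   # strong, consistent signal

print(kl(flat_posterior, prior))  # near zero -> call no CNV
print(kl(cnv_posterior, prior))   # large information gain -> call a CNV
```

Only posteriors that deviate strongly and consistently from the copy-number-2 prior produce a large information gain, which is how the FDR is kept low.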


IBD Sharing between Africans, Neandertals, and Denisovans.

  • Gundula Povysil et al.
  • Genome biology and evolution
  • 2016

Interbreeding between ancestors of humans and other hominins outside of Africa has been studied intensively, while their common history within Africa still lacks proper attention. However, shedding light on human evolution in this time period, about which little is known, is essential for understanding subsequent events outside of Africa. We investigate the genetic relationships of humans, Neandertals, and Denisovans by identifying very short DNA segments in the 1000 Genomes Phase 3 data that these hominins share identical by descent (IBD). By focusing on low-frequency and rare variants, we identify very short IBD segments with high confidence. These segments reveal events from a very distant past, because shorter IBD segments are presumably older than longer ones. We extracted two types of very old IBD segments that are not only shared among humans, but also with Neandertals and/or Denisovans. The first type contains longer segments that are found primarily in Asians and Europeans, with more segments found in South Asians than in East Asians for both Neandertal and Denisovan. These longer segments indicate complex admixture events outside of Africa. The second type consists of shorter segments that are shared mainly by Africans and therefore may indicate events involving ancestors of humans and other ancient hominins within Africa. Our results from the autosomes are further supported by an analysis of chromosome X, on which segments that are shared by Africans and match the Neandertal and/or Denisovan genome were even more prominent. Our results indicate that interbreeding with other hominins was a common feature of human evolution, starting already long before ancestors of modern humans left Africa.


panelcn.MOPS: Copy-number detection in targeted NGS panel data for clinical diagnostics.

  • Gundula Povysil et al.
  • Human mutation
  • 2017

Targeted next-generation sequencing (NGS) panels have largely replaced Sanger sequencing in clinical diagnostics. They allow for the detection of copy-number variations (CNVs) in addition to single-nucleotide variants and small insertions/deletions. However, existing computational CNV detection methods have shortcomings regarding accuracy, quality control (QC), incidental findings, and user-friendliness. We developed panelcn.MOPS, a novel pipeline for detecting CNVs in targeted NGS panel data. Using data from 180 samples, we compared panelcn.MOPS with five state-of-the-art methods. With panelcn.MOPS leading the field, most methods achieved comparably high accuracy. panelcn.MOPS reliably detected CNVs ranging in size from part of a region of interest (ROI) to whole genes, which may comprise all ROIs investigated in a given sample. The latter is enabled by analyzing reads from all ROIs of the panel, but presenting results exclusively for user-selected genes, thus avoiding incidental findings. Additionally, panelcn.MOPS offers QC criteria not only for samples, but also for individual ROIs within a sample, which increases the confidence in called CNVs. panelcn.MOPS is freely available both as an R package and as standalone software with a graphical user interface that is easy to use for clinical geneticists without any programming experience. panelcn.MOPS combines high sensitivity and specificity with user-friendliness, rendering it highly suitable for routine clinical diagnostics.


DEXUS: identifying differential expression in RNA-Seq studies with unknown conditions.

  • Günter Klambauer et al.
  • Nucleic acids research
  • 2013

Detection of differential expression in RNA-Seq data is currently limited to studies in which two or more sample conditions are known a priori. However, these biological conditions are typically unknown in cohort, cross-sectional and nonrandomized controlled studies such as the HapMap, the ENCODE or the 1000 Genomes project. We present DEXUS for detecting differential expression in RNA-Seq data for which the sample conditions are unknown. DEXUS models read counts as a finite mixture of negative binomial distributions in which each mixture component corresponds to a condition. A transcript is considered differentially expressed if modeling of its read counts requires more than one condition. DEXUS decomposes read count variation into variation due to noise and variation due to differential expression. Evidence of differential expression is measured by the informative/noninformative (I/NI) value, which allows differentially expressed transcripts to be extracted at a desired specificity (significance level) or sensitivity (power). DEXUS performed excellently in identifying differentially expressed transcripts in data with unknown conditions. On 2400 simulated data sets, I/NI value thresholds of 0.025, 0.05 and 0.1 yielded average specificities of 92, 97 and 99% at sensitivities of 76, 61 and 38%, respectively. On real-world data sets, DEXUS was able to detect differentially expressed transcripts related to sex, species, tissue, structural variants or quantitative trait loci. The DEXUS R package is publicly available from Bioconductor and the scripts for all experiments are available at http://www.bioinf.jku.at/software/dexus/.
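The core idea above, that a transcript is differentially expressed if modeling its read counts requires more than one mixture component, can be sketched with fixed parameters (no EM fitting; the means, dispersion, and mixture weights below are hypothetical, whereas DEXUS estimates all of them from the data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Read counts for one transcript across 60 samples with two hidden
# conditions (means 20 and 200), unknown to the method.
counts = np.concatenate([
    stats.nbinom.rvs(n=10, p=10 / (10 + 20), size=30, random_state=rng),
    stats.nbinom.rvs(n=10, p=10 / (10 + 200), size=30, random_state=rng),
])

def nb_logpmf(x, mean, size=10.0):
    # scipy's nbinom(n, p) has mean n*(1-p)/p; p = size/(size+mean).
    return stats.nbinom.logpmf(x, n=size, p=size / (size + mean))

# One-condition model: a single negative binomial at the overall mean.
ll_one = nb_logpmf(counts, counts.mean()).sum()

# Two-condition mixture with equal weights and fixed component means.
ll_mix = np.logaddexp(nb_logpmf(counts, 20) + np.log(0.5),
                      nb_logpmf(counts, 200) + np.log(0.5)).sum()

print(ll_mix > ll_one)  # the mixture explains the counts far better
```

When the mixture fits markedly better than a single component, the transcript shows evidence of differential expression even though the conditions were never given.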


Complex networks govern coiled-coil oligomerization--predicting and profiling by means of a machine learning approach.

  • Carsten C Mahrenholz et al.
  • Molecular & cellular proteomics : MCP
  • 2011

Understanding the relationship between protein sequence and structure is one of the great challenges in biology. In the case of the ubiquitous coiled-coil motif, structure and occurrence have been described in extensive detail, but there is a lack of insight into the rules that govern oligomerization, i.e. how many α-helices form a given coiled coil. To shed new light on the formation of two- and three-stranded coiled coils, we developed a machine learning approach to identify rules in the form of weighted amino acid patterns. These rules form the basis of our classification tool, PrOCoil, which also visualizes the contribution of each individual amino acid to the overall oligomeric tendency of a given coiled-coil sequence. We discovered that sequence positions previously thought irrelevant to direct coiled-coil interaction have an undeniable impact on stoichiometry. Our rules also demystify the oligomerization behavior of the yeast transcription factor GCN4, which can now be described as a hybrid--part dimer and part trimer--with both theoretical and experimental justification.
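The "weighted amino acid patterns" can be sketched as a linear scoring rule: sum the weights of every pattern matching the sequence and classify by the sign. The patterns and weights below are entirely invented for illustration; PrOCoil's real rules are learned by an SVM from coiled-coil data and account for heptad register:

```python
# Hypothetical pattern weights: positive -> trimer-like, negative -> dimer-like.
weights = {"LQ": 0.8, "IK": -0.6, "EL": -0.4, "VN": 0.5}

def oligomer_score(seq, k=2):
    """Sum the weights of all matching k-mer patterns in the sequence."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))

def classify(seq):
    return "trimer" if oligomer_score(seq) > 0 else "dimer"

# Matching patterns LQ, VN, LQ contribute 0.8 + 0.5 + 0.8 = 2.1 > 0.
print(classify("LQVNLQ"))
```

Because the score is a sum of per-position pattern contributions, the contribution of each amino acid can be visualized directly, which is how borderline cases like GCN4 can be read as part dimer, part trimer.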


cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate.

  • Günter Klambauer et al.
  • Nucleic acids research
  • 2012

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out high-noise detections that are likely to be false positives. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, and (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
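The per-position model can be sketched as follows: read counts across samples are Poisson-distributed with rates proportional to integer copy numbers, and a prior peaked at copy number 2 encodes the null hypothesis. A minimal sketch with invented counts and a fixed base rate (cn.MOPS additionally estimates the rate and a noise component across positions, which is omitted here):

```python
import numpy as np
from scipy import stats

# Read counts for six samples at one genomic segment (hypothetical):
# most samples are diploid, one sample carries a duplication.
counts = np.array([98, 105, 101, 97, 152, 99])
base_rate = 100.0  # expected read count for copy number 2 here

copy_numbers = np.arange(0, 9)
# Poisson rate scales linearly with copy number (cn 0 gets a small floor).
rates = np.maximum(copy_numbers / 2.0, 0.025) * base_rate

# Prior peaked at cn = 2: the null hypothesis of no CNV.
log_prior = np.where(copy_numbers == 2, np.log(0.9), np.log(0.1 / 8))

# Per-sample posterior over integer copy numbers.
log_post = stats.poisson.logpmf(counts[:, None], rates[None, :]) + log_prior
post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)

print(post.argmax(axis=1))  # most samples -> 2, the outlier -> 3
```

Only data that consistently overwhelm the copy-number-2 prior shift a sample's posterior, which is what keeps the FDR low.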


Furby: fuzzy force-directed bicluster visualization.

  • Marc Streit et al.
  • BMC bioinformatics
  • 2014

Cluster analysis is widely used to discover patterns in multi-dimensional data. Clustered heatmaps are the standard technique for visualizing one-way and two-way clustering results. In clustered heatmaps, rows and/or columns are reordered, resulting in a representation that shows the clusters as contiguous blocks. However, for biclustering results, where clusters can overlap, it is not possible to reorder the matrix in this way without duplicating rows and/or columns.


CLOOME: contrastive learning unlocks bioimaging databases for queries with chemical structures.

  • Ana Sanchez-Fernandez et al.
  • Nature communications
  • 2023

The field of bioimage analysis is currently undergoing a profound transformation, driven by advances in imaging technologies and artificial intelligence. The emergence of multi-modal AI systems could allow knowledge to be extracted from and utilized across bioimaging databases based on information from other data modalities. We leverage the multi-modal contrastive learning paradigm, which enables the embedding of both bioimages and chemical structures into a unified space by means of bioimage and molecular structure encoders. This common embedding space unlocks the possibility of querying bioimaging databases with chemical structures that induce different phenotypic effects. Concretely, in this work we show that a retrieval system based on multi-modal contrastive learning is capable of identifying the correct bioimage corresponding to a given chemical structure from a database of ~2000 candidate images with a top-1 accuracy >70 times higher than a random baseline. Additionally, the bioimage encoder demonstrates remarkable transferability to various further prediction tasks within the domain of drug discovery, such as activity prediction, molecule classification, and mechanism-of-action identification. Thus, our approach not only addresses the current limitations of bioimaging databases but also paves the way towards foundation models for microscopy images.
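Contrastive retrieval of this kind reduces, at query time, to nearest-neighbor search by cosine similarity between the two encoders' embeddings. A self-contained sketch with simulated embeddings standing in for the trained encoders (the ~2000-image database size mirrors the abstract; the embedding dimension and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 64  # database size and embedding dimension (hypothetical)

# Stand-ins for encoder outputs: image embeddings, and for each image the
# embedding of the chemical structure that produced it. Contrastive
# training aligns the pairs; here that alignment is simulated with noise.
img = rng.normal(size=(n, d))
mol = img + 0.5 * rng.normal(size=(n, d))

# L2-normalize so a dot product equals cosine similarity.
img /= np.linalg.norm(img, axis=1, keepdims=True)
mol /= np.linalg.norm(mol, axis=1, keepdims=True)

# Query the image database with each molecule; top-1 hit = highest cosine.
sims = mol @ img.T            # (n_queries, n_images)
top1 = sims.argmax(axis=1)
accuracy = (top1 == np.arange(n)).mean()

print(accuracy, "vs random baseline", 1 / n)
```

With well-aligned embeddings the top-1 accuracy is far above the 1/n random baseline, which is the effect the abstract quantifies for real image and molecule encoders.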


