This service exclusively searches for literature that cites resources. Please be aware that the total number of searchable documents is limited to those containing RRIDs and does not include all open-access literature.

On page 1 showing 1 ~ 20 papers out of 1,287 papers

Multiple-Choice Item Distractor Development Using Topic Modeling Approaches.

  • Jinnie Shin‎ et al.
  • Frontiers in psychology‎
  • 2019‎

Writing a high-quality, multiple-choice test item is a complex process. Creating plausible but incorrect options for each item poses significant challenges for the content specialist because this task is often undertaken without implementing a systematic method. In the current study, we describe and demonstrate a systematic method for creating plausible but incorrect options, also called distractors, based on students' misconceptions. These misconceptions are extracted from the labeled written responses. One thousand five hundred and fifteen written responses from an existing constructed-response item in Biology from Grade 10 students were used to demonstrate the method. Using a topic modeling procedure commonly used with machine learning and natural language processing called latent dirichlet allocation, 22 plausible misconceptions from students' written responses were identified and used to produce a list of plausible distractors based on students' responses. These distractors, in turn, were used as part of new multiple-choice items. Implications for item development are discussed.


Chemical Similarity Enrichment Analysis (ChemRICH) as alternative to biochemical pathway mapping for metabolomic datasets.

  • Dinesh Kumar Barupal‎ et al.
  • Scientific reports‎
  • 2017‎

Metabolomics answers a fundamental question in biology: How does metabolism respond to genetic, environmental or phenotypic perturbations? Combining several metabolomics assays can yield datasets for more than 800 structurally identified metabolites. However, biological interpretations of metabolic regulation in these datasets are hindered by inherent limits of pathway enrichment statistics. We have developed ChemRICH, a statistical enrichment approach that is based on chemical similarity rather than sparse biochemical knowledge annotations. ChemRICH utilizes structure similarity and chemical ontologies to map all known metabolites and name metabolic modules. Unlike pathway mapping, this strategy yields study-specific, non-overlapping sets of all identified metabolites. Subsequent enrichment statistics is superior to pathway enrichments because ChemRICH sets have a self-contained size where p-values do not rely on the size of a background database. We demonstrate ChemRICH's efficiency on a public metabolomics data set discerning the development of type 1 diabetes in a non-obese diabetic mouse model. ChemRICH is available at www.chemrich.fiehnlab.ucdavis.edu.


Using topic modeling to detect cellular crosstalk in scRNA-seq.

  • Alexandrina Pancheva‎ et al.
  • PLoS computational biology‎
  • 2022‎

Cell-cell interactions are vital for numerous biological processes including development, differentiation, and response to inflammation. Currently, most methods for studying interactions on scRNA-seq level are based on curated databases of ligands and receptors. While those methods are useful, they are limited to our current biological knowledge. Recent advances in single cell protocols have allowed for physically interacting cells to be captured, and as such we have the potential to study interactions in a complementary way without relying on prior knowledge. We introduce a new method based on Latent Dirichlet Allocation (LDA) for detecting genes that change as a result of interaction. We apply our method to synthetic datasets to demonstrate its ability to detect genes that change in an interacting population compared to a reference population. Next, we apply our approach to two datasets of physically interacting cells to identify the genes that change as a result of interaction; examples include adhesion and co-stimulatory molecules, which confirm physical interaction between cells. For each dataset we produce a ranking of genes that are changing in subpopulations of the interacting cells. In addition to the genes discussed in the original publications, we highlight further candidates for interaction in the top 100 and 300 ranked genes. Lastly, we apply our method to a dataset generated by a standard droplet-based protocol not designed to capture interacting cells, and discuss its suitability for analysing interactions. We present a method that streamlines detection of interactions and does not require prior clustering and generation of synthetic reference profiles to detect changes in expression.


Text mining in a literature review of urothelial cancer using topic model.

  • Hsuan-Jen Lin‎ et al.
  • BMC cancer‎
  • 2020‎

Urothelial cancer (UC) includes carcinomas of the bladder, ureters, and renal pelvis. New treatments and biomarkers of UC emerged in this decade. Identifying the key information in a vast amount of literature can be challenging. In this study, we use text mining to explore UC publications to identify important information that may lead to new research directions.


Looking at the Full Picture: Utilizing Topic Modeling to Determine Disease-Associated Microbiome Communities.

  • Rachel L Shrode‎ et al.
  • bioRxiv : the preprint server for biology‎
  • 2023‎

The microbiome is a complex micro-ecosystem that provides the host with pathogen defense, food metabolism, and other vital processes. Alterations of the microbiome (dysbiosis) have been linked with a number of diseases such as cancers, multiple sclerosis (MS), Alzheimer's disease, etc. Generally, differential abundance testing between the healthy and patient groups is performed to identify important bacteria (enriched or depleted in one group). However, simply providing a singular species of bacteria to an individual lacking that species for health improvement has not been as successful as fecal matter transplant (FMT) therapy. Interestingly, FMT therapy transfers the entire gut microbiome of a healthy (or mixture of) individual to an individual with a disease. FMTs do, however, have limited success, possibly due to concerns that not all bacteria in the community may be responsible for the healthy phenotype. Therefore, it is important to identify the community of microorganisms linked to the health as well as the disease state of the host. Here we applied topic modeling, a natural language processing tool, to assess latent interactions occurring among microbes; thus, providing a representation of the community of bacteria relevant to healthy vs. disease state. Specifically, we utilized our previously published data that studied the gut microbiome of patients with relapsing-remitting MS (RRMS), a neurodegenerative autoimmune disease that has been linked to a variety of factors, including a dysbiotic gut microbiome. With topic modeling we identified communities of bacteria associated with RRMS, including genera previously discovered, but also other taxa that would have been overlooked simply with differential abundance testing. Our work shows that topic modeling can be a useful tool for analyzing the microbiome in dysbiosis and that it could be considered along with the commonly utilized differential abundance tests to better understand the role of the gut microbiome in health and disease.


Sugar Signaling and Post-transcriptional Regulation in Plants: An Overlooked or an Emerging Topic?

  • Ming Wang‎ et al.
  • Frontiers in plant science‎
  • 2020‎

Plants are autotrophic organisms that self-produce sugars through photosynthesis. These sugars serve as an energy source, carbon skeletons, and signaling entities throughout plants' life. Post-transcriptional regulation of gene expression plays an important role in various sugar-related processes. In cells, it is regulated by many factors, such as RNA-binding proteins (RBPs), microRNAs, the spliceosome, etc. To date, most of the investigations into sugar-related gene expression have been focused on the transcriptional level in plants, while only a few studies have been conducted on post-transcriptional mechanisms. The present review provides an overview of the relationships between sugar and post-transcriptional regulation in plants. It addresses the relationships between sugar signaling and RBPs, microRNAs, and mRNA stability. These new insights will help to reach a comprehensive understanding of the diversity of sugar signaling regulatory networks, and open onto new investigations into the relevance of these regulations for plant growth and development.


Age-dependent topic modeling of comorbidities in UK Biobank identifies disease subtypes with differential genetic risk.

  • Xilin Jiang‎ et al.
  • Nature genetics‎
  • 2023‎

The analysis of longitudinal data from electronic health records (EHRs) has the potential to improve clinical diagnoses and enable personalized medicine, motivating efforts to identify disease subtypes from patient comorbidity information. Here we introduce an age-dependent topic modeling (ATM) method that provides a low-rank representation of longitudinal records of hundreds of distinct diseases in large EHR datasets. We applied ATM to 282,957 UK Biobank samples, identifying 52 diseases with heterogeneous comorbidity profiles; analyses of 211,908 All of Us samples produced concordant results. We defined subtypes of the 52 heterogeneous diseases based on their comorbidity profiles and compared genetic risk across disease subtypes using polygenic risk scores (PRSs), identifying 18 disease subtypes whose PRS differed significantly from other subtypes of the same disease. We further identified specific genetic variants with subtype-dependent effects on disease risk. In conclusion, ATM identifies disease subtypes with differential genome-wide and locus-specific genetic risk profiles.


Topic modeling for multi-omic integration in the human gut microbiome and implications for Autism.

  • Christine Tataru‎ et al.
  • Scientific reports‎
  • 2023‎

While healthy gut microbiomes are critical to human health, pertinent microbial processes remain largely undefined, partially due to differential bias among profiling techniques. By simultaneously integrating multiple profiling methods, multi-omic analysis can define generalizable microbial processes, and is especially useful in understanding complex conditions such as Autism. Challenges with integrating heterogeneous data produced by multiple profiling methods can be overcome using Latent Dirichlet Allocation (LDA), a promising natural language processing technique that identifies topics in heterogeneous documents. In this study, we apply LDA to multi-omic microbial data (16S rRNA amplicon, shotgun metagenomic, shotgun metatranscriptomic, and untargeted metabolomic profiling) from the stool of 81 children with and without Autism. We identify topics, or microbial processes, that summarize complex phenomena occurring within gut microbial communities. We then subset stool samples by topic distribution, and identify metabolites, specifically neurotransmitter precursors and fatty acid derivatives, that differ significantly between children with and without Autism. We identify clusters of topics, deemed "cross-omic topics", which we hypothesize are representative of generalizable microbial processes observable regardless of profiling method. Interpreting topics, we find each represents a particular diet, and we heuristically label each cross-omic topic as: healthy/general function, age-associated function, transcriptional regulation, and opportunistic pathogenesis.


Single-cell multi-omic topic embedding reveals cell-type-specific and COVID-19 severity-related immune signatures.

  • Manqi Zhou‎ et al.
  • bioRxiv : the preprint server for biology‎
  • 2023‎

The advent of single-cell multi-omics sequencing technology makes it possible for researchers to leverage multiple modalities for individual cells and explore cell heterogeneity. However, the high dimensional, discrete, and sparse nature of the data makes the downstream analysis particularly challenging. Most of the existing computational methods for single-cell data analysis are either limited to single modality or lack flexibility and interpretability. In this study, we propose an interpretable deep learning method called multi-omic embedded topic model (moETM) to effectively perform integrative analysis of high-dimensional single-cell multimodal data. moETM integrates multiple omics data via a product-of-experts in the encoder for efficient variational inference and then employs multiple linear decoders to learn the multi-omic signatures of the gene regulatory programs. Through comprehensive experiments on public single-cell transcriptome and chromatin accessibility data (i.e., scRNA+scATAC), as well as scRNA and proteomic data (i.e., CITE-seq), moETM demonstrates superior performance compared with six state-of-the-art single-cell data analysis methods on seven publicly available datasets. By applying moETM to the scRNA+scATAC data in human peripheral blood mononuclear cells (PBMCs), we identified sequence motifs corresponding to the transcription factors that regulate immune gene signatures. Applying moETM analysis to CITE-seq data from the COVID-19 patients revealed not only known immune cell-type-specific signatures but also composite multi-omic biomarkers of critical conditions due to COVID-19, thus providing insights from both biological and clinical perspectives.


Capturing cell type-specific chromatin compartment patterns by applying topic modeling to single-cell Hi-C data.

  • Hyeon-Jin Kim‎ et al.
  • PLoS computational biology‎
  • 2020‎

Single-cell Hi-C (scHi-C) interrogates genome-wide chromatin interaction in individual cells, allowing us to gain insights into 3D genome organization. However, the extremely sparse nature of scHi-C data poses a significant barrier to analysis, limiting our ability to tease out hidden biological information. In this work, we approach this problem by applying topic modeling to scHi-C data. Topic modeling is well-suited for discovering latent topics in a collection of discrete data. For our analysis, we generate nine different single-cell combinatorial indexed Hi-C (sci-Hi-C) libraries from five human cell lines (GM12878, H1Esc, HFF, IMR90, and HAP1), consisting of over 19,000 cells. We demonstrate that topic modeling is able to successfully capture cell type differences from sci-Hi-C data in the form of "chromatin topics." We further show enrichment of particular compartment structures associated with locus pairs in these topics.


Determining similarity of scientific entities in annotation datasets.

  • Guillermo Palma‎ et al.
  • Database : the journal of biological databases and curation‎
  • 2015‎

Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/
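The abstract above describes AnnSim as a 1-1 maximum weight bipartite match between two annotation sets. The following is a minimal sketch of that idea, not the authors' implementation: the brute-force search stands in for the "existing solvers" the abstract mentions and is only practical for very small sets, the term-level similarity table `toy_sim` is invented for illustration, and the Dice-style normalization in `annotation_similarity` is an assumption that the abstract does not state.

```python
from itertools import permutations


def max_weight_matching(left, right, sim):
    """Brute-force 1-1 maximum weight bipartite matching (tiny inputs only).

    `sim(a, b)` is assumed to return a non-negative similarity for a term
    `a` from `left` and a term `b` from `right`.
    """
    # Iterate over the smaller side so each of its items gets a distinct partner.
    swapped = len(left) > len(right)
    small, large = (right, left) if swapped else (left, right)
    best_weight, best_pairs = 0.0, []
    for perm in permutations(large, len(small)):
        # Restore (left_term, right_term) order before scoring.
        pairs = [(p, s) if swapped else (s, p) for s, p in zip(small, perm)]
        weight = sum(sim(a, b) for a, b in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs
    return best_weight, best_pairs


def annotation_similarity(ann_a, ann_b, sim):
    """Matching weight normalized to [0, 1] (normalization is an assumption)."""
    if not ann_a or not ann_b:
        return 0.0
    weight, _ = max_weight_matching(ann_a, ann_b, sim)
    return 2.0 * weight / (len(ann_a) + len(ann_b))


# Hypothetical term-to-term similarities; unknown pairs score 0.0.
_TABLE = {("t1", "u1"): 1.0, ("t1", "u2"): 0.75, ("t2", "u1"): 0.75}


def toy_sim(a, b):
    return _TABLE.get((a, b), 0.0)


# A greedy pick of t1-u1 (1.0) would leave t2-u2 (0.0); the optimal
# matching pairs t1-u2 and t2-u1 for a total weight of 1.5.
print(annotation_similarity(["t1", "t2"], ["u1", "u2"], toy_sim))  # 0.75
```

For realistically sized annotation sets, a polynomial-time assignment solver would replace the permutation loop; the example only illustrates why an optimal 1-1 match can differ from a greedy pairing.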


Single-cell multi-omics topic embedding reveals cell-type-specific and COVID-19 severity-related immune signatures.

  • Manqi Zhou‎ et al.
  • Cell reports methods‎
  • 2023‎

The advent of single-cell multi-omics sequencing technology makes it possible for researchers to leverage multiple modalities for individual cells and explore cell heterogeneity. However, the high-dimensional, discrete, and sparse nature of the data makes the downstream analysis particularly challenging. Here, we propose an interpretable deep learning method called moETM to perform integrative analysis of high-dimensional single-cell multimodal data. moETM integrates multiple omics data via a product-of-experts in the encoder and employs multiple linear decoders to learn the multi-omics signatures. moETM demonstrates superior performance compared with six state-of-the-art methods on seven publicly available datasets. By applying moETM to the scRNA + scATAC data, we identified sequence motifs corresponding to the transcription factors regulating immune gene signatures. Applying moETM to CITE-seq data from the COVID-19 patients revealed not only known immune cell-type-specific signatures but also composite multi-omics biomarkers of critical conditions due to COVID-19, thus providing insights from both biological and clinical perspectives.


A computational approach to qualitative analysis in large textual datasets.

  • Michael S Evans‎
  • PloS one‎
  • 2014‎

In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern.


Topic Application of the Probiotic Streptococcus dentisani Improves Clinical and Microbiological Parameters Associated With Oral Health.

  • María D Ferrer‎ et al.
  • Frontiers in cellular and infection microbiology‎
  • 2020‎

Streptococcus dentisani 7746, isolated from dental plaque of caries-free individuals, has been shown to have several beneficial effects in vitro which could contribute to promote oral health, including an antimicrobial activity against oral pathogens by the production of bacteriocins and a pH buffering capacity through ammonia production. Previous work has shown that S. dentisani was able to colonize the oral cavity for 2-4 weeks after application. The aim of the present work was to evaluate its clinical efficacy by a randomized, double-blind, placebo-controlled parallel group study. Fifty nine volunteers were enrolled in the study and randomly assigned to a treatment or placebo group. The treatment consisted of a bucco-adhesive gel application (2.5 × 10^9 cfu/dose) with a dental splint for 5 min every 48 h, for a period of 1 month (i.e., 14 doses). Dental plaque and saliva samples were collected at baseline, 15 and 30 days after first application, and 15 days after the end of treatment. At baseline, there was a significant correlation between S. dentisani levels and frequency of toothbrushing. Salivary flow, a major factor influencing oral health, was significantly higher in the probiotic group at day 15 compared with the placebo (4.4 and 3.4 ml/5 min, respectively). In the probiotic group, there was a decrease in the amount of dental plaque and in gingival inflammation, but no differences were observed in the placebo group. The probiotic group showed a significant increase in the levels of salivary ammonia and calcium. Finally, Illumina sequencing of plaque samples showed a beneficial shift in bacterial composition at day 30 relative to baseline, with a reduction of several cariogenic organisms and the key players in plaque formation, probably as a result of bacteriocins production. Only 58% of the participants in the probiotic group showed increased plaque levels of S. dentisani at day 30 and 71% by day 45, indicating that the benefits of S. dentisani application could be augmented by improving colonization efficiency. In conclusion, the application of S. dentisani 7746 improved several clinical and microbiological parameters associated with oral health, supporting its use as a probiotic to prevent tooth decay.


Measuring the strength of the horned passalus beetle, Odontotaenius disjunctus: revisiting an old topic with modern technology.

  • Andrew K Davis‎ et al.
  • Journal of insect science (Online)‎
  • 2013‎

Over a century ago, a pioneering researcher cleverly devised a means to measure how much weight the horned passalus beetle, Odontotaenius disjunctus (Illiger) (Coleoptera: Passalidae), could pull using a series of springs, pulleys, and careful observation. The technology available in modern times now allows for more rigorous data collection on this topic, which could have a number of uses in scientific investigations. In this study, an apparatus was constructed using a dynamometer and a data logger in an effort to ascertain the pulling strength of this species. By allowing beetles to pull for 10 min, each beetle's mean and maximum pulling force (in Newtons) were obtained for analyses, and whether these measures are related was determined. Then, whether factors such as body length, thorax size, horn size, or gender affect either measure of strength was investigated. Basic body measurements, including horn size, of males versus females were compared. The measurements of 38 beetles (20 females, 18 males) showed there was no difference in overall body length between sexes, but females had greater girth (thorax width) than males, which could translate into larger muscle mass. A total of 21 beetles (10 females, 11 males) were tested for pulling strength. The grand mean pulling force was 0.14 N, and the grand mean maximum was 0.78 N. Despite the fact that beetles tended to pull at 20% of their maximum capacity most of the time, and that maximum force was over 5 times larger than the mean force, the 2 measures were highly correlated, suggesting they may be interchangeable for research purposes. Females had twice the pulling strength (both maximum and mean force) as males in this species overall, but when the larger thorax size of females was considered, the effect of gender was not significant. Beetle length was not a significant predictor of pulling force, but horn size was associated with maximum force. The best predictor of both measures of strength appeared to be thorax size. There are a multitude of interesting scientific questions that could be addressed using data on beetle pulling strength, and this project serves as a starting point for such work.


TogoID: an exploratory ID converter to bridge biological datasets.

  • Shuya Ikeda‎ et al.
  • Bioinformatics (Oxford, England)‎
  • 2022‎

Understanding life cannot be accomplished without making full use of biological data, which are scattered across databases of diverse categories in life sciences. To connect such data seamlessly, identifier (ID) conversion plays a key role. However, existing ID conversion services have disadvantages, such as covering only a limited range of biological categories of databases, not keeping up with the updates of the original databases and outputs being hard to interpret in the context of biological relations, especially when converting IDs in multiple steps.


GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes.

  • Lakshmipuram Seshadri Swapna‎ et al.
  • Genome biology‎
  • 2023‎

Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infer cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated in deconvolving disease transcriptomes, GTM-decon can infer multiple cell-type-specific gene topic distributions per cell type, which captures sub-cell-type variations. GTM-decon can also use phenotype labels from single-cell or bulk data to infer phenotype-specific gene distributions. In a nested-guided design, GTM-decon identified cell-type-specific differentially expressed genes from bulk breast cancer transcriptomes.


Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets.

  • Shikhar Vashishth‎ et al.
  • Journal of biomedical informatics‎
  • 2021‎

Biomedical natural language processing tools are increasingly being applied for broad-coverage information extraction: extracting medical information of all types in a scientific document or a clinical note. In such broad-coverage settings, linking mentions of medical concepts to standardized vocabularies requires choosing the best candidate concepts from large inventories covering dozens of types. This study presents a novel semantic type prediction module for biomedical NLP pipelines and two automatically-constructed, large-scale datasets with broad coverage of semantic types.


Open science datasets from PREVENT-AD, a longitudinal cohort of pre-symptomatic Alzheimer's disease.

  • Jennifer Tremblay-Mercier‎ et al.
  • NeuroImage. Clinical‎
  • 2021‎

To move Alzheimer Disease (AD) research forward it is essential to collect data from large cohorts, but also make such data available to the global research community. We describe the creation of an open science dataset from the PREVENT-AD (PResymptomatic EValuation of Experimental or Novel Treatments for AD) cohort, composed of cognitively unimpaired older individuals with a parental or multiple-sibling history of AD. From 2011 to 2017, 386 participants were enrolled (mean age 63 years old ± 5) for sustained investigation among whom 349 have retrospectively agreed to share their data openly. Repositories are findable through the unified interface of the Canadian Open Neuroscience Platform and contain up to five years of longitudinal imaging data, cerebral fluid biochemistry, neurosensory capacities, cognitive, genetic, and medical information. Imaging data can be accessed openly at https://openpreventad.loris.ca while most of the other information, sensitive by nature, is accessible by qualified researchers at https://registeredpreventad.loris.ca. In addition to being a living resource for continued data acquisition, PREVENT-AD offers opportunities to facilitate understanding of AD pathogenesis.


Racial underrepresentation in dermatological datasets leads to biased machine learning models and inequitable healthcare.

  • Giona Kleinberg‎ et al.
  • Journal of biomed research‎
  • 2022‎

Clinical applications of machine learning are promising as a tool to improve patient outcomes through assisting diagnoses, treatment, and analyzing risk factors for screening. Possible clinical applications are especially prominent in dermatology as many diseases and conditions present visually. This allows a machine learning model to analyze and diagnose conditions using patient images and data from electronic health records (EHRs) after training on clinical datasets but could also introduce bias. Despite promising applications, artificial intelligence has the capacity to exacerbate existing demographic disparities in healthcare if models are trained on biased datasets.

