This service exclusively searches for literature that cites resources. Please be aware that the total number of searchable documents is limited to those containing RRIDs and does not include all open-access literature.


On page 1 showing 1 ~ 20 papers out of 78,733 papers

Evaluating statistical analysis models for RNA sequencing experiments.

  • Pablo D Reeb‎ et al.
  • Frontiers in genetics‎
  • 2013‎

Validating statistical analysis methods for RNA sequencing (RNA-seq) experiments is a complex task. Researchers often find themselves having to decide between competing models or assessing the reliability of results obtained with a designated analysis program. Computer simulation has been the most frequently used procedure to verify the adequacy of a model. However, datasets generated by simulations depend on the parameterization and the assumptions of the selected model. Moreover, such datasets may constitute a partial representation of reality, as the complexity of RNA-seq data is hard to mimic. We present the use of plasmode datasets to complement the evaluation of statistical models for RNA-seq data. A plasmode is a dataset obtained from experimental data but for which some truth is known. Using a set of simulated scenarios of technical and biological replicates, and publicly available datasets, we illustrate how to design algorithms to construct plasmodes under different experimental conditions. We contrast results from two types of methods for RNA-seq: (1) models based on negative binomial distribution (edgeR and DESeq), and (2) Gaussian models applied after transformation of data (MAANOVA). Results emphasize the fact that deciding what method to use may be experiment-specific due to the unknown distributions of expression levels. Plasmodes may help in choosing which method to apply by using a similar pre-existing dataset. The promising results obtained from this approach emphasize the need to promote and improve systematic data sharing across the research community to facilitate plasmode building. Although we illustrate the use of plasmodes for comparing differential expression analysis models, the flexibility of plasmode construction also allows comparing upstream analyses, such as normalization procedures or alignment pipelines.


Wastewater Quality Estimation Through Spectrophotometry-Based Statistical Models.

  • Daniel Carreres-Prieto‎ et al.
  • Sensors (Basel, Switzerland)‎
  • 2020‎

Local administrations are increasingly demanding real-time continuous monitoring of pollution in the sanitation system to improve and optimize its operation, to comply with EU environmental policies and to reach European Green Deal targets. The present work shows a full-scale Wastewater Treatment Plant field-sampling campaign to estimate COD, BOD5, TSS, P, TN and NO3-N in both influent and effluent, without pre-treatment of the samples or addition of chemicals to them, which reduces the duration and cost of analysis. Different regression models were developed to estimate the pollution load of sewage systems from the spectral response of wastewater samples measured at 380-700 nm through multivariate linear regressions and machine learning genetic algorithms. The tests carried out concluded that the models calculated by means of genetic algorithms can estimate the levels of five of the pollutants under study (COD, BOD5, TSS, TN and NO3-N), including both raw and treated wastewater, with an error rate below 4%. The multilinear regression models are limited to raw water and to the estimation of COD and TSS, with less than a 0.5% error rate.


Six Degrees of Epistasis: Statistical Network Models for GWAS.

  • B A McKinney‎ et al.
  • Frontiers in genetics‎
  • 2011‎

There is growing evidence that much more of the genome than previously thought is required to explain the heritability of complex phenotypes. Recent studies have demonstrated that numerous common variants from across the genome explain portions of genetic variability, spawning various avenues of research directed at explaining the remaining heritability. This polygenic structure is also the motivation for the growing application of pathway and gene set enrichment techniques, which have yielded promising results. These findings suggest that the coordination of genes in pathways that are known to occur at the gene regulatory level also can be detected at the population level. Although genes in these networks interact in complex ways, most population studies have focused on the additive contribution of common variants and the potential of rare variants to explain additional variation. In this brief review, we discuss the potential to explain additional genetic variation through the agglomeration of multiple gene-gene interactions as well as main effects of common variants in terms of a network paradigm. Just as is the case for single-locus contributions, we expect each gene-gene interaction edge in the network to have a small effect, but these effects may be reinforced through hubs and other connectivity structures in the network. We discuss some of the opportunities and challenges of network methods for analyzing genome-wide association studies (GWAS) such as the study of hubs and motifs, and integrating other types of variation and environmental interactions. Such network approaches may unveil hidden variation in GWAS, improve understanding of mechanisms of disease, and possibly fit into a network paradigm of evolutionary genetics.


Statistical representation models for mutation information within genomic data.

  • N Özlem Özcan Şimşek‎ et al.
  • BMC bioinformatics‎
  • 2019‎

As DNA sequencing technologies are improving and getting cheaper, genomic data can be utilized for diagnosis of many diseases such as cancer. Raw human genome data is very large for computational systems to handle. Therefore, there is a need for a compact and accurate representation of the valuable information in DNA. Complex genetic disorders often result from multiple gene mutations, and the mutations do not contribute equally to the development of a disease. Inspired by the field of information retrieval, we propose using the term frequency (tf) and BM25 term weighting measures with the inverse document frequency (idf) and relevance frequency (rf) measures to weight genes based on their mutations. The underlying assumption is that the more mutations a gene has in patients with a certain disease and the fewer mutations it has in other patients, the more discriminative that gene is.
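
The weighting idea in this abstract can be illustrated with a minimal sketch. It uses plain tf-idf only (not BM25 or rf) and is not the authors' implementation: the function name, the input layout, the gene labels, and the toy counts are all assumptions made for illustration. Here tf is the number of mutations a gene carries among patients with the target disease, and idf is the log of the total patient count over the number of patients in whom the gene is mutated.

```python
import math
from collections import Counter

def tf_idf_gene_weights(patients, target_label):
    """Weight genes by tf * idf.

    patients: list of (label, {gene: mutation_count}) pairs (hypothetical layout).
    tf  = mutations of the gene among patients with target_label.
    idf = log(total patients / patients with the gene mutated).
    """
    n_patients = len(patients)
    tf = Counter()  # mutations per gene among target-label patients
    df = Counter()  # number of patients (any label) carrying a mutation in the gene
    for label, counts in patients:
        for gene, count in counts.items():
            if count > 0:
                df[gene] += 1
                if label == target_label:
                    tf[gene] += count
    return {gene: tf[gene] * math.log(n_patients / df[gene]) for gene in df}

# Toy data (invented): GENE_A is mutated only in disease patients.
toy = [
    ("disease", {"GENE_A": 3, "GENE_B": 1}),
    ("disease", {"GENE_A": 2, "GENE_B": 1}),
    ("control", {"GENE_B": 1}),
    ("control", {"GENE_C": 2}),
]
weights = tf_idf_gene_weights(toy, "disease")
```

In the toy data, GENE_A receives the largest weight because it is mutated often in the disease group and nowhere else, which matches the assumption stated in the abstract.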


Statistical Models for Predicting Threat Detection From Human Behavior.

  • Timothy Kelley‎ et al.
  • Frontiers in psychology‎
  • 2018‎

Users must regularly distinguish between secure and insecure cyber platforms in order to preserve their privacy and safety. Mouse tracking is an accessible, high-resolution measure that can be leveraged to understand the dynamics of perception, categorization, and decision-making in threat detection. Researchers have begun to utilize measures like mouse tracking in cyber security research, including in the study of risky online behavior. However, it remains an empirical question to what extent real-time information about user behavior is predictive of user outcomes and demonstrates added value compared to traditional self-report questionnaires. Participants navigated through six simulated websites, which resembled either secure "non-spoof" or insecure "spoof" versions of popular websites. Websites also varied in terms of authentication level (i.e., extended validation, standard validation, or partial encryption). Spoof websites had modified Uniform Resource Locator (URL) and authentication level. Participants chose to "login" to or "back" out of each website based on perceived website security. Mouse tracking information was recorded throughout the task, along with task performance. After completing the website identification task, participants completed a questionnaire assessing their security knowledge and degree of familiarity with the websites simulated during the experiment. Despite being primed to the possibility of website phishing attacks, participants generally showed a bias for logging in to websites versus backing out of potentially dangerous sites. Along these lines, participant ability to identify spoof websites was around the level of chance. Hierarchical Bayesian logistic models were used to compare the accuracy of two-factor (i.e., website security and encryption level), survey-based (i.e., security knowledge and website familiarity), and real-time measures (i.e., mouse tracking) in predicting risky online behavior during phishing attacks. 
Participant accuracy in identifying spoof and non-spoof websites was best captured using a model that included real-time indicators of decision-making behavior, as compared to two-factor and survey-based models. Findings validate three widely applicable measures of user behavior derived from mouse tracking recordings, which can be utilized in cyber security and user intervention research. Survey data alone are not as strong at predicting risky Internet behavior as models that incorporate real-time measures of user behavior, such as mouse tracking.


Statistical quantification of confounding bias in machine learning models.

  • Tamas Spisak‎
  • GigaScience‎
  • 2022‎

The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, probes the null hypothesis that the model is unconfounded.


Bridging the gaps in statistical models of protein alignment.

  • Dinithi Sumanaweera‎ et al.
  • Bioinformatics (Oxford, England)‎
  • 2022‎

Sequences of proteins evolve by accumulating substitutions together with insertions and deletions (indels) of amino acids. However, it remains a common practice to disconnect substitutions and indels, and infer approximate models for each of them separately, to quantify sequence relationships. Although this approach brings with it computational convenience (which remains its primary motivation), there is a dearth of attempts to unify and model them systematically and together. To overcome this gap, this article demonstrates how a complete statistical model quantifying the evolution of pairs of aligned proteins can be constructed using a time-parameterized substitution matrix and a time-parameterized alignment state machine. Methods to derive all parameters of such a model from any benchmark collection of aligned protein sequences are described here. This has not only allowed us to generate a unified statistical model for each of the nine widely used substitution matrices (PAM, JTT, BLOSUM, JO, WAG, VTML, LG, MIQS and PFASUM), but also resulted in a new unified model, MMLSUM. Our underlying methodology measures the Shannon information content using each model to explain losslessly any given collection of alignments, which has allowed us to quantify the performance of all the above models on six comprehensive alignment benchmarks. Our results show that MMLSUM results in a new and clear overall best performance, followed by PFASUM, VTML, BLOSUM and MIQS, respectively, amongst the top five. We further analyze the statistical properties of MMLSUM model and contrast it with others.


An efficient simulator of 454 data using configurable statistical models.

  • Fredrik Lysholm‎ et al.
  • BMC research notes‎
  • 2011‎

Roche 454 is one of the major 2nd generation sequencing platforms. The particular characteristics of 454 sequence data pose new challenges for bioinformatic analyses, e.g. assembly and alignment search algorithms. Simulation of these data is therefore useful, in order to further assess how bioinformatic applications and algorithms handle 454 data.


Statistical Models for Tornado Climatology: Long and Short-Term Views.

  • James B Elsner‎ et al.
  • PloS one‎
  • 2016‎

This paper estimates regional tornado risk from records of past events using statistical models. First, a spatial model is fit to the tornado counts aggregated in counties with terms that control for changes in observational practices over time. Results provide a long-term view of risk that delineates the main tornado corridors in the United States where the expected annual rate exceeds two tornadoes per 10,000 square km. A few counties in the Texas Panhandle and central Kansas have annual rates that exceed four tornadoes per 10,000 square km. Refitting the model after removing the least damaging tornadoes from the data (EF0) produces a similar map but with the greatest tornado risk shifted south and eastward. Second, a space-time model is fit to the counts aggregated in raster cells with terms that control for changes in climate factors. Results provide a short-term view of risk. The short-term view identifies a shift of tornado activity away from the Ohio Valley under El Niño conditions and away from the Southeast under positive North Atlantic oscillation conditions. The combined predictor effects on the local rates is quantified by fitting the model after leaving out the year to be predicted from the data. The models provide state-of-the-art views of tornado risk that can be used by government agencies, the insurance industry, and the general public.


Fast optimization of statistical potentials for structurally constrained phylogenetic models.

  • Cécile Bonnard‎ et al.
  • BMC evolutionary biology‎
  • 2009‎

Statistical approaches for protein design are relevant in the field of molecular evolutionary studies. In recent years, new, so-called structurally constrained (SC) models of protein-coding sequence evolution have been proposed, which use statistical potentials to assess sequence-structure compatibility. In a previous work, we defined a statistical framework for optimizing knowledge-based potentials especially suited to SC models. Our method used the maximum likelihood principle and provided what we call the joint potentials. However, the method required numerical estimations by the use of computationally heavy Markov Chain Monte Carlo sampling algorithms.


Translating statistical species-habitat models to interactive decision support tools.

  • Lyndsie S Wszola‎ et al.
  • PloS one‎
  • 2017‎

Understanding species-habitat relationships is vital to successful conservation, but the tools used to communicate species-habitat relationships are often poorly suited to the information needs of conservation practitioners. Here we present a novel method for translating a statistical species-habitat model, a regression analysis relating ring-necked pheasant abundance to landcover, into an interactive online tool. The Pheasant Habitat Simulator combines the analytical power of the R programming environment with the user-friendly Shiny web interface to create an online platform in which wildlife professionals can explore the effects of variation in local landcover on relative pheasant habitat suitability within spatial scales relevant to individual wildlife managers. Our tool allows users to virtually manipulate the landcover composition of a simulated space to explore how changes in landcover may affect pheasant relative habitat suitability, and guides users through the economic tradeoffs of landscape changes. We offer suggestions for development of similar interactive applications and demonstrate their potential as innovative science delivery tools for diverse professional and public audiences.


Comparison and evaluation of statistical error models for scRNA-seq.

  • Saket Choudhary‎ et al.
  • Genome biology‎
  • 2022‎

Heterogeneity in single-cell RNA-seq (scRNA-seq) data is driven by multiple sources, including biological variation in cellular state as well as technical variation introduced during experimental processing. Deconvolving these effects is a key challenge for preprocessing workflows. Recent work has demonstrated the importance and utility of count models for scRNA-seq analysis, but there is a lack of consensus on which statistical distributions and parameter settings are appropriate.


Protein family comparison using statistical models and predicted structural information.

  • Richard Chung‎ et al.
  • BMC bioinformatics‎
  • 2004‎

This paper presents a simple method to increase the sensitivity of protein family comparisons by incorporating secondary structure (SS) information. We build upon the effective information theory approach towards profile-profile comparison described in [Yona & Levitt 2002]. Our method augments profile columns using PSIPRED secondary structure predictions and assesses statistical similarity using information theoretical principles.


Statistical Shape and Appearance Models: Development Towards Improved Osteoporosis Care.

  • Lorenzo Grassi‎ et al.
  • Current osteoporosis reports‎
  • 2021‎

Statistical models of shape and appearance have increased their popularity since the 1990s and are today highly prevalent in the field of medical image analysis. In this article, we review the recent literature about how statistical models have been applied in the context of osteoporosis and fracture risk estimation.


Analysis of repeat-protein folding using nearest-neighbor statistical mechanical models.

  • Tural Aksel‎ et al.
  • Methods in enzymology‎
  • 2009‎

The linear "Ising" model, which has been around for nearly a century, treats the behavior of linear arrays of repetitive, interacting subunits. Linear "repeat-proteins" have only been described in the last decade or so, and their folding energies have only been characterized very recently. Owing to their repetitive structures, linear repeat-proteins are particularly well suited for analysis by the nearest-neighbor Ising formalism. After briefly describing the historical origins and applications of the Ising model to biopolymers, and introducing repeat protein structure, this chapter will focus on the application of the linear Ising model to repeat proteins. When applied to homopolymers, the model can be represented and applied in a fairly simplified form. When applied to heteropolymers, where differences in energies among individual subunits (i.e. repeats) must be included, some (but not all) of this simplicity is lost. Derivations of the linear Ising model for both homopolymer and heteropolymer repeat-proteins will be presented. With the increased complexity required for analysis of heteropolymeric repeat proteins, the ability to resolve different energy terms from experimental data can be compromised. Thus, a simple matrix approach will be developed to help inform on the degree to which different thermodynamic parameters can be extracted from a particular set of unfolding curves. Finally, we will describe the application of these models to analyze repeat-protein folding equilibria, focusing on simplified repeat proteins based on "consensus" sequence information.
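
As a rough illustration of the homopolymer case described in this abstract, the sketch below enumerates every folded/unfolded configuration of a short linear array and computes the fraction of folded repeats. It is a brute-force toy under stated assumptions, not the chapter's derivation: the function name, the parameter names, and the energy values used in the example are hypothetical, each folded repeat is assumed to contribute an intrinsic free energy, and each pair of adjacent folded repeats is assumed to contribute an interfacial free energy.

```python
import math
from itertools import product

RT_KCAL_PER_MOL = 0.593  # approximate R*T near 298 K, in kcal/mol

def fraction_folded(n_repeats, dg_intrinsic, dg_interface, rt=RT_KCAL_PER_MOL):
    """Fraction of folded repeats in a homopolymer nearest-neighbor model.

    dg_intrinsic: free energy of folding one repeat (kcal/mol, assumed).
    dg_interface: free energy of one folded-folded neighbor contact (kcal/mol, assumed).
    Enumerates all 2**n_repeats states, so it is only practical for small arrays.
    """
    kappa = math.exp(-dg_intrinsic / rt)  # weight per folded repeat
    tau = math.exp(-dg_interface / rt)    # weight per folded-folded interface
    partition = 0.0
    folded_sum = 0.0
    for state in product((0, 1), repeat=n_repeats):
        n_folded = sum(state)
        n_interfaces = sum(a * b for a, b in zip(state, state[1:]))
        weight = kappa ** n_folded * tau ** n_interfaces
        partition += weight
        folded_sum += weight * n_folded
    return folded_sum / (n_repeats * partition)

# Unfavorable intrinsic folding, with and without a stabilizing interface.
coupled = fraction_folded(4, 2.0, -5.0)
uncoupled = fraction_folded(4, 2.0, 0.0)
```

With these made-up energies the coupled array is almost fully folded while the uncoupled one is mostly unfolded, which is the cooperative behavior that the nearest-neighbor coupling is meant to capture.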


Separating positional noise from neutral alignment in multicomponent statistical shape models.

  • E A Audenaert‎ et al.
  • Bone reports‎
  • 2020‎

Given sufficient training samples, statistical shape models can provide detailed population representations for use in anthropological and computational genetic studies, injury biomechanics, musculoskeletal disease models or implant design optimization. While the technique has become extremely popular for the description of isolated anatomical structures, it suffers from positional interference when applied to coupled or articulated input data. In the present manuscript we describe and validate a novel approach to extract positional noise from such coupled data. The technique was first validated and then implemented in a multicomponent model of the lower limb. The impact of noise on the model itself as well as on the description of sexual dimorphism was evaluated. The novelty of our methodology lies in the fact that no rigid transformations are calculated or imposed on the data by means of idealized joint definitions and by extension the models obtained from them.


Statistical models and computational tools for predicting complex traits and diseases.

  • Wonil Chung‎
  • Genomics & informatics‎
  • 2021‎

Predicting individual traits and diseases from genetic variants is critical to fulfilling the promise of personalized medicine. The genetic variants from genome-wide association studies (GWAS), including variants well below GWAS significance, can be aggregated into highly significant predictions across a wide range of complex traits and diseases. The recent arrival of large-sample public biobanks enables highly accurate polygenic predictions based on genetic variants across the whole genome. Various statistical methodologies and diverse computational tools have been introduced and developed to compute the polygenic risk score (PRS) more accurately. However, many researchers utilize PRS tools without a thorough understanding of the underlying model and how to specify the parameters for the best performance. It is advantageous to study the statistical models implemented in computational tools for PRS estimation and the formulas of parameters to be specified. Here, we review a variety of recent statistical methodologies and computational tools for PRS computation.
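
In its simplest form, a PRS is a weighted sum of an individual's risk-allele counts, with per-variant effect sizes as the weights. The sketch below shows only that basic sum, not any of the tools reviewed in the paper; the function name, variant identifiers, and effect sizes are invented for illustration.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of effect size times risk-allele dosage over all scored variants.

    dosages: {variant_id: 0, 1, or 2 copies of the risk allele} (hypothetical IDs).
    effect_sizes: {variant_id: per-allele effect estimate, e.g. from GWAS}.
    Variants missing from dosages are treated as dosage 0 (an assumption).
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in effect_sizes.items())

# Invented example values.
effects = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": -0.125}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
score = polygenic_risk_score(person, effects)  # 2*0.5 + 1*0.25 + 0*(-0.125) = 1.25
```

The methods reviewed in the paper differ mainly in how the effect sizes are chosen or adjusted; the final scoring step generally keeps this additive form.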


A statistical analysis of murine incisional and excisional acute wound models.

  • David M Ansell‎ et al.
  • Wound repair and regeneration : official publication of the Wound Healing Society [and] the European Tissue Repair Society‎
  • 2014‎

Mice represent the most commonly used species for preclinical in vivo research. While incisional and excisional acute murine wound models are both frequently employed, there is little agreement on which model is optimum. Moreover, current lack of standardization of wounding procedure, analysis time point(s), method of assessment, and the use of individual wounds vs. individual animals as replicates makes it difficult to compare across studies. Here we have profiled secondary intention healing of incisional and excisional wounds within the same animal, assessing multiple parameters to determine the optimal methodology for future studies. We report that histology provides the least variable assessment of healing. Furthermore, histology alone (not planimetry) is able to detect accelerated healing in a castrated mouse model. Perhaps most importantly, we find virtually no correlation between wounds within the same animal, suggesting that use of wound (not animal) biological replicates is perfectly acceptable. Overall, these findings should guide and refine future studies, increasing the likelihood of detecting novel phenotypes while reducing the numbers of animals required for experimentation.


Statistical analysis of longitudinal neuroimage data with Linear Mixed Effects models.

  • Jorge L Bernal-Rusiel‎ et al.
  • NeuroImage‎
  • 2013‎

Longitudinal neuroimaging (LNI) studies are rapidly becoming more prevalent and growing in size. Today, no standardized computational tools exist for the analysis of LNI data and widely used methods are sub-optimal for the types of data encountered in real-life studies. Linear Mixed Effects (LME) modeling, a mature approach well known in the statistics community, offers a powerful and versatile framework for analyzing real-life LNI data. This article presents the theory behind LME models, contrasts it with other popular approaches in the context of LNI, and is accompanied with an array of computational tools that will be made freely available through FreeSurfer - a popular Magnetic Resonance Image (MRI) analysis software package. Our core contribution is to provide a quantitative empirical evaluation of the performance of LME and competing alternatives popularly used in prior longitudinal structural MRI studies, namely repeated measures ANOVA and the analysis of annualized longitudinal change measures (e.g. atrophy rate). In our experiments, we analyzed MRI-derived longitudinal hippocampal volume and entorhinal cortex thickness measurements from a public dataset consisting of Alzheimer's patients, subjects with mild cognitive impairment and healthy controls. Our results suggest that the LME approach offers superior statistical power in detecting longitudinal group differences.


Statistical models for detecting differential chromatin interactions mediated by a protein.

  • Liang Niu‎ et al.
  • PloS one‎
  • 2014‎

Chromatin interactions mediated by a protein of interest are of great scientific interest. Recent studies show that protein-mediated chromatin interactions can have different intensities in different types of cells or in different developmental stages of a cell. Such differences can be associated with a disease or with the development of a cell. Thus, it is of great importance to detect protein-mediated chromatin interactions with different intensities in different cells. A recent molecular technique, Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET), which uses formaldehyde cross-linking and paired-end sequencing, is able to detect genome-wide chromatin interactions mediated by a protein of interest. Here we proposed two models (One-Step Model and Two-Step Model) for two sample ChIA-PET count data (one biological replicate in each sample) to identify differential chromatin interactions mediated by a protein of interest. Both models incorporate the data dependency and the extent to which a fragment pair is related to a pair of DNA loci of interest to make accurate identifications. The One-Step Model makes use of the data more efficiently but is more computationally intensive. An extensive simulation study showed that the models can detect those differentially interacted chromatins and there is a good agreement between each classification result and the truth. Application of the method to a two-sample ChIA-PET data set illustrates its utility. The two models are implemented as an R package MDM (available at http://www.stat.osu.edu/~statgen/SOFTWARE/MDM).


