This service exclusively searches for literature that cites resources. Please be aware that the total number of searchable documents is limited to those containing RRIDs and does not include all open-access literature.
Modern DNA sequencing methods produce vast amounts of data that often requires mapping to a reference genome. Most existing programs use the number of mismatches between the read and the genome as a measure of quality. This approach is without a statistical foundation and can for some data types result in many wrongly mapped reads. Here we present a probabilistic mapping method based on position-specific scoring matrices, which can take into account not only the quality scores of the reads but also user-specified models of evolution and data-specific biases.
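The idea of scoring a read through a position-specific matrix built from its quality scores can be sketched as follows. This is a minimal illustration, not the published method: the function names (`read_to_pssm`, `score_window`, `best_hit`) are hypothetical, the error model assumes a miscalled base is equally likely to be any of the other three, and no evolutionary model or data-specific bias is included.

```python
import math

BASES = "ACGT"

def read_to_pssm(read, phred):
    """Build per-position log-probabilities from a read and its Phred scores."""
    pssm = []
    for base, q in zip(read, phred):
        # Phred definition: error probability = 10^(-Q/10).
        # Clamp at 0.75 so a quality of 0 means "uniform over the four bases".
        err = min(10 ** (-q / 10), 0.75)
        row = {b: math.log(1 - err) if b == base else math.log(err / 3)
               for b in BASES}
        pssm.append(row)
    return pssm

def score_window(pssm, window):
    """Log-likelihood of the read given one genome window of equal length."""
    return sum(row[b] for row, b in zip(pssm, window))

def best_hit(read, phred, genome):
    """Return (log-likelihood, start index) of the best-scoring window."""
    pssm = read_to_pssm(read, phred)
    n = len(read)
    return max((score_window(pssm, genome[i:i + n]), i)
               for i in range(len(genome) - n + 1))

# Toy example: two windows each differ from the read by one mismatch,
# but only one of those mismatches falls on the low-quality base.
score, start = best_hit("ACGA", [30, 30, 30, 2], "TTACGTACCA")
```

In this toy case plain mismatch counting cannot separate the two candidate positions, whereas the quality-weighted score prefers the window whose mismatch coincides with the unreliable base call.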
The flexibility in gap cost enjoyed by hidden Markov models (HMMs) is expected to afford them better retrieval accuracy than position-specific scoring matrices (PSSMs). We attempt to quantify the effect of more general gap parameters by separately examining the influence of position- and composition-specific gap scores, as well as by comparing the retrieval accuracy of the PSSMs constructed using an iterative procedure to that of the HMMs provided by Pfam and SUPERFAMILY, curated ensembles of multiple alignments.
Recent accumulation of sequence and structural data, in conjunction with systematic classification into a set of families, has significantly advanced our understanding of diverse and specific protein functions. Analysis and interpretation of protein family data require comprehensive sequence and structural alignments. Here, we present a simple scheme for analyzing a set of experimental structures of a given protein or family of proteins, using microbial rhodopsins as an example. For a data set comprising around a dozen structures highly similar to each other (overall pairwise root-mean-squared deviation < 2.3 Å), intramolecular distance scoring analysis yielded valuable information with respect to structural properties, such as differences in the relative variability of transmembrane helices. Furthermore, a comparison with recent results for G protein-coupled receptors demonstrates how the results of the present analysis can be interpreted and effectively utilized for structural characterization of diverse protein families in general.
Systematic identification of binding partners for modular domains such as Src homology 2 (SH2) is important for understanding the biological function of the corresponding SH2 proteins. We have developed a World Wide Web-accessible computer program dubbed SMALI for scoring matrix-assisted ligand identification for SH2 domains and other signaling modules. The current version of SMALI harbors 76 unique scoring matrices for SH2 domains derived from screening oriented peptide array libraries. These scoring matrices are used to search a protein database for short peptides preferred by an SH2 domain. An experimentally determined cut-off value is used to normalize an SMALI score, thereby allowing for direct comparison in peptide-binding potential for different SH2 domains. SMALI employs scoring matrices distinct from those of Scansite, a popular motif-scanning program. Moreover, SMALI contains built-in filters for phosphoproteins, Gene Ontology (GO) correlation and colocalization of subject and query proteins. Compared to Scansite, SMALI exhibited improved accuracy in identifying binding peptides for SH2 domains. Applying SMALI to a group of SH2 domains identified hundreds of interactions that overlap significantly with known networks mediated by the corresponding SH2 proteins, suggesting SMALI is a useful tool for facile identification of signaling networks mediated by modular domains that recognize short linear peptide motifs.
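A scoring-matrix scan with a cut-off-normalized score, as described above, might look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the function name `scan_for_ligands`, the matrix layout (one dictionary per position C-terminal to the tyrosine), the matrix values and the cut-off are hypothetical and are not taken from SMALI.

```python
def scan_for_ligands(protein, matrix, cutoff):
    """Scan a protein for tyrosine-centred peptides preferred by one domain.

    matrix: list of dicts, one per position after the tyrosine (+1, +2, ...),
            mapping amino acid -> score (hypothetical values).
    cutoff: raw score used for normalization; a normalized score >= 1.0
            is treated here as a predicted binder.
    """
    width = len(matrix)
    hits = []
    for i, residue in enumerate(protein):
        # Need a tyrosine followed by enough residues to fill the matrix.
        if residue != "Y" or i + width > len(protein) - 1:
            continue
        peptide = protein[i + 1:i + 1 + width]
        raw = sum(matrix[j].get(aa, 0.0) for j, aa in enumerate(peptide))
        normalized = raw / cutoff
        if normalized >= 1.0:
            # Report the 1-based position of the tyrosine.
            hits.append((i + 1, "Y" + peptide, normalized))
    return hits

# Hypothetical matrix that favours a Y-E-E-I motif.
toy_matrix = [{"E": 1.0}, {"E": 1.0}, {"I": 2.0}]
hits = scan_for_ligands("AAYEEIGGYAAA", toy_matrix, cutoff=2.0)
```

Dividing by a per-domain cut-off puts every domain on the same scale, so a normalized score of 1.0 carries the same meaning regardless of which matrix produced it.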
The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins.
Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.
Protein domains are basic functional units of proteins. Many protein domains are pervasive among diverse biological processes, yet some are associated with specific pathways. Human complex diseases are generally viewed as pathway-level disorders. Therefore, we hypothesized that pathway-specific domains could be highly informative for human diseases. To test the hypothesis, we developed a network-based scoring scheme to quantify specificity of domain-pathway associations. We first generated domain profiles for human proteins, then constructed a co-pathway protein network based on the associations between domain profiles. Based on the score, we classified human protein domains into pathway-specific domains (PSDs) and non-specific domains (NSDs). We found that PSDs contained more pathogenic variants than NSDs. PSDs were also enriched for disease-associated mutations that disrupt protein-protein interactions (PPIs) and tend to have a moderate number of domain interactions. These results suggest that mutations in PSDs are likely to disrupt within-pathway PPIs, resulting in functional failure of pathways. Finally, we demonstrated the prediction capacity of PSDs for disease-associated genes with experimental validations in zebrafish. Taken together, the network-based quantitative method of modeling domain-pathway associations presented herein suggested underlying mechanisms of how protein domains associated with specific pathways influence mutational impacts on diseases via perturbations in within-pathway PPIs, and provided a novel genomic feature for interpreting genetic variants to facilitate the discovery of human disease genes.
Antimicrobial peptides (AMPs) are diverse, biologically active, essential components of the innate immune system. As compared to conventional antibiotics, AMPs exhibit broad spectrum antimicrobial activity, reduced toxicity and reduced microbial resistance. They are widely researched for their therapeutic potential, especially against multi-drug resistant pathogens. AMPs are known to have family-specific sequence composition, which can be mined for their discovery and rational design. Here, we present a detailed family-based study on AMP families. The study involved the use of sequence signatures represented by patterns and hidden Markov models (HMMs) present in experimentally studied AMPs to identify novel AMPs. Along with AMPs, peptides hitherto lacking antimicrobial annotation were also retrieved and wet-lab studies on randomly selected sequences proved their antimicrobial activity against Escherichia coli. CAMPSign, a webserver, has been created for researchers to effortlessly exploit the use of AMP family signatures for identification of AMPs. The webserver is available online at www.campsign.bicnirrh.res.in. In this work, we demonstrate an optimised and experimentally validated protocol along with a freely available webserver that uses family-based sequence signatures for accelerated discovery of novel AMPs.
Expression quantitative trait loci (eQTL) analysis is useful for identifying genetic variants correlated with gene expression; however, it cannot distinguish between causal and nearby non-functional variants. Because the majority of disease-associated SNPs are located in regulatory regions, they can impact allele-specific binding (ASB) of transcription factors and result in differential expression of the target gene alleles. In this study, our aim was to identify functional single-nucleotide polymorphisms (SNPs) that alter transcriptional regulation and thus potentially impact cellular function. Here, we present regSNPs-ASB, a generalized linear model-based approach to identify regulatory SNPs that are located in transcription factor binding sites. The input for this model includes ATAC-seq (assay for transposase-accessible chromatin with high-throughput sequencing) raw read counts from heterozygous loci, where differential transposase-cleavage patterns between two alleles indicate preferential transcription factor binding to one of the alleles. Using regSNPs-ASB, we identified 53 regulatory SNPs in human MCF-7 breast cancer cells and 125 regulatory SNPs in human mesenchymal stem cells (MSC). By integrating the regSNPs-ASB output with RNA-seq experimental data and publicly available chromatin interaction data from MCF-7 cells, we found that these 53 regulatory SNPs were associated with 74 potential target genes and that 32 (43%) of these genes showed significant allele-specific expression. By comparing all of the MCF-7 and MSC regulatory SNPs to the eQTLs in the Genotype-Tissue Expression (GTEx) Project database, we found that 30% (16/53) of the regulatory SNPs in MCF-7 and 43% (52/122) of the regulatory SNPs in MSC were also in eQTL regions. The enrichment of regulatory SNPs in eQTLs indicated that many of them are likely responsible for allelic differences in gene expression (chi-square test, p-value < 0.01).
In summary, we conclude that regSNPs-ASB is a useful tool for identifying causal variants from ATAC-seq data. This new computational tool will enable efficient prioritization of genetic variants identified as eQTL for further studies to validate their causal regulatory function. Ultimately, identifying causal genetic variants will further our understanding of the underlying molecular mechanisms of disease and the eventual development of potential therapeutic targets.
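The notion of allelic imbalance in read counts at a heterozygous locus can be illustrated with a much simpler test than the generalized linear model used by regSNPs-ASB. The sketch below swaps in an exact two-sided binomial test against an expected 50:50 allele ratio; the function name `binomial_two_sided_p` and the example counts are hypothetical and serve only to show the kind of signal such a model looks for.

```python
from math import comb

def binomial_two_sided_p(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Null hypothesis: each read comes from the reference allele with
    probability p_ref (0.5 for a balanced heterozygous site).
    """
    n = ref_reads + alt_reads

    def pmf(k):
        return comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k)

    observed = pmf(ref_reads)
    # Sum the probability of every outcome at least as unlikely as the
    # observed one; the small tolerance guards against rounding in ties.
    total = sum(pmf(k) for k in range(n + 1)
                if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical read counts at two heterozygous loci.
balanced = binomial_two_sided_p(10, 10)    # no evidence of imbalance
imbalanced = binomial_two_sided_p(30, 10)  # reads favour the reference allele
```

A per-locus test like this ignores overdispersion and covariates, which is one reason a regression-based model is a more natural fit for real ATAC-seq counts.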
The microbes that inhabit particular environments must be able to perform molecular functions that provide them with a competitive advantage to thrive in those environments. As most molecular functions are performed by proteins and are conserved between related proteins, we can expect that organisms successful in a given environmental niche would contain protein families that are specific for functions that are important in that environment. For instance, the human gut is rich in polysaccharides from the diet or secreted by the host, and is dominated by Bacteroides, whose genomes contain a highly expanded repertoire of protein families involved in carbohydrate metabolism. To identify other protein families that are specific to this environment, we investigated the distribution of protein families in the currently available human gut genomic and metagenomic data. Using an automated procedure, we identified a group of protein families strongly overrepresented in the human gut. These not only include many families described previously but also, interestingly, a large group of previously unrecognized protein families, which suggests that we still have much to discover about this environment. The identification and analysis of these families could provide us with new information about an environment critical to our health and well-being.
Maize (Zea mays L.), a model species for genetic studies, is one of the two most important crop species worldwide. The genome sequence of the reference genotype, B73, representative of the stiff stalk heterotic group was recently updated (AGPv4) using long-read sequencing and optical mapping technology. To facilitate the use of AGPv4 and to enable functional genomic studies and association of genotype with phenotype, we determined expression abundances for replicated mRNA-sequencing datasets from 79 tissues and five abiotic/biotic stress treatments revealing 36 207 expressed genes. Characterization of the B73 transcriptome across six organs revealed 4154 organ-specific and 7704 differentially expressed (DE) genes following stress treatment. Gene co-expression network analyses revealed 12 modules associated with distinct biological processes containing 13 590 genes providing a resource for further association of gene function based on co-expression patterns. Presence-absence variants (PAVs) previously identified using whole genome resequencing data from 61 additional inbred lines were enriched in organ-specific and stress-induced DE genes suggesting that PAVs may function in phenological variation and adaptation to environment. Relative to core genes conserved across the 62 profiled inbreds, PAVs have lower expression abundances which are correlated with their frequency of dispersion across inbreds and on average have significantly fewer co-expression network connections suggesting that a subset of PAVs may be on an evolutionary path to pseudogenization. To facilitate use by the community, we developed the Maize Genomics Resource website (maize.plantbiology.msu.edu) for viewing and data-mining these resources and deployed two new views on the maize electronic Fluorescent Pictograph Browser (bar.utoronto.ca/efp_maize).
In fungi, the most abundant transcription factor (TF) class contains a fungal-specific 'GAL4-like' Zn2C6 DNA binding domain (DBD), while the second class contains another fungal-specific domain, known as 'fungal_trans' or middle homology domain (MHD), whose function remains largely uncharacterized. Remarkably, almost a third of MHD-containing TFs in public sequence databases apparently lack DNA binding activity, since they are not predicted to contain a DBD. Here, we reassess the domain organization of these 'MHD-only' proteins using an in silico error-tracking approach. In a large-scale analysis of ~17,000 MHD-only TF sequences present in all fungal phyla except Microsporidia and Cryptomycota, we show that the vast majority (>90%) result from genome annotation errors and we are able to predict a new DBD sequence for 14,261 of them. Most of these sequences correspond to a Zn2C6 domain (82%), with a small proportion of C2H2 domains (4%) found only in Dikarya. Our results contradict previous findings that the MHD-only TFs are widespread in fungi. In contrast, we show that they are exceptional cases, and that the fungal-specific Zn2C6-MHD domain pair represents the canonical domain signature defining the most predominant fungal TF family. We call this family CeGAL, after the highly characterized members: Cep3, whose 3D structure is determined, and GAL4, a eukaryotic TF archetype. We believe that this will not only improve the annotation and classification of the Zn2C6 TFs but will also provide critical guidance for future fungal gene regulatory network analyses.
The funnel-web spider Macrothele calpeiana is a charismatic Mygalomorph of great interest for basic, applied and translational research. Nevertheless, the current scarcity of genomic and transcriptomic data for this species clearly limits research on this non-model organism. To overcome this limitation, we launched the first tissue-specific enriched RNA-seq analysis in this species using a subtractive hybridization approach, with two main objectives: to characterize the specific transcriptome of the putative chemosensory appendages (palps and first pair of legs), and to provide a new set of DNA markers for further phylogenetic studies. We have characterized the set of transcripts specifically expressed in putative chemosensory tissues of this species, many of them showing features shared by chemosensory system genes. Among specific candidates, we have identified some members of the iGluR and NPC2 families. Moreover, we have demonstrated the utility of these newly generated data as molecular markers by inferring the phylogenetic position of M. calpeiana in the phylogenetic tree of Mygalomorphs. Our results provide novel resources for researchers interested in spider molecular biology and systematics, which can help to expand our knowledge of the evolutionary processes underlying fundamental biological questions, such as species invasion or biodiversity origin and maintenance.
Over the last two decades, extensive studies have been performed at the molecular level to understand the evolution of carnivorous plants. As a result, the repertoire of protein components in the digestive fluids of several carnivorous plants has gradually become clear. However, the quantitative aspects of these proteins and the expression mechanisms of the genes that encode them are still poorly understood. In this study, using the Australian sundew Drosera adelae, we identified and quantified the digestive fluid proteins. We examined the expression and methylation status of the genes corresponding to major hydrolytic enzymes in various organs; these included thaumatin-like protein, S-like RNase, cysteine protease, class I chitinase, β-1,3-glucanase, and hevein-like protein. The genes encoding these proteins were exclusively expressed in the glandular tentacles. Furthermore, the promoters of the β-1,3-glucanase and cysteine protease genes were demethylated only in the glandular tentacles, similar to the previously reported case of the S-like RNase gene da-I. This phenomenon correlated with high expression of the DNA demethylase DEMETER in the glandular tentacles, strongly suggesting that it performs glandular tentacle-specific demethylation of the genes. The current study strengthens and generalizes the relevance of epigenetics to trap organ-specific gene expression in D. adelae. We also suggest similarities between the trap organs of carnivorous plants and the roots of non-carnivorous plants.
The complex cell wall and biofilm matrix (ECM) act as key barriers to antibiotics in mycobacteria. Here, the ECM and envelope proteins of Mycobacterium marinum ATCC 927, a nontuberculous mycobacterial model, were monitored over 3 months by label-free proteomics and compared with cell surface proteins on planktonic cells to uncover pathways leading to virulence, tolerance, and persistence. We show that ATCC 927 forms pellicle-type and submerged-type biofilms (PBFs and SBFs, respectively) after 2 weeks and 2 days of growth, respectively, and that the increased CelA1 synthesis in this strain prevents biofilm formation and leads to reduced rifampicin tolerance. The proteomic data suggest that specific changes in mycolic acid synthesis (cord factor), Esx1 secretion, and cell wall adhesins explain the appearance of PBFs as ribbon-like cords and SBFs as lichen-like structures. A subpopulation of cells resisting 64× MIC rifampicin (persisters) was detected in both biofilm subtypes and already in 1-week-old SBFs. The key forces boosting their development could include subtype-dependent changes in asymmetric cell division, cell wall biogenesis, tricarboxylic acid/glyoxylate cycle activities, and energy/redox/iron metabolisms. The effect of various ambient oxygen tensions on each cell type and nonclassical protein secretion are likely factors explaining the majority of the subtype-specific changes. The proteomic findings also imply that Esx1-type protein secretion is more efficient in planktonic (PL) and PBF cells, while SBF may prefer both the Esx5 and nonclassical pathways to control virulence and prolonged viability/persistence. In conclusion, this study reports the first proteomic insight into aging mycobacterial biofilm ECMs and indicates biofilm subtype-dependent mechanisms conferring increased adaptive potential and virulence of nontuberculous mycobacteria. 
IMPORTANCE Mycobacteria are naturally resilient, and mycobacterial infections are notoriously difficult to treat with antibiotics, with biofilm formation being the main factor complicating the successful treatment of tuberculosis (TB). The present study shows that nontuberculous Mycobacterium marinum ATCC 927 forms submerged- and pellicle-type biofilms with lichen- and ribbon-like structures, respectively, as well as persister cells under the same conditions. We show that both biofilm subtypes differ in terms of virulence-, tolerance-, and persistence-conferring activities, highlighting the fact that both subtypes should be targeted to maximize the power of antimycobacterial treatment therapies.
Hookworms infect over 400 million people, stunting and impoverishing them. Sequencing hookworm genomes and finding which genes they express during infection should help in devising new drugs or vaccines against hookworms. Unlike other hookworms, Ancylostoma ceylanicum infects both humans and other mammals, providing a laboratory model for hookworm disease. We determined an A. ceylanicum genome sequence of 313 Mb, with transcriptomic data throughout infection showing expression of 30,738 genes. Approximately 900 genes were upregulated during early infection in vivo, including ASPRs, a cryptic subfamily of activation-associated secreted proteins (ASPs). Genes downregulated during early infection included ion channels and G protein-coupled receptors; this downregulation was observed in both parasitic and free-living nematodes. Later, at the onset of heavy blood feeding, C-lectin genes were upregulated along with genes for secreted clade V proteins (SCVPs), encoding a previously undescribed protein family. These findings provide new drug and vaccine targets and should help elucidate hookworm pathogenesis.
Oomycetes include many devastating plant pathogens. Across oomycete diversity, plant-infecting lineages are interspersed by non-pathogenic ones. Unfortunately, our understanding of the evolution of lifestyle switches is hampered by a scarcity of data on the molecular biology of saprotrophic oomycetes, ecologically important primary colonizers of dead tissue that can serve as informative reference points for understanding the evolution of pathogens. Here, we established Salisapilia sapeloensis as a tractable system for the study of saprotrophic oomycetes. We generated multiple transcriptomes from S. sapeloensis and compared them with (i) 22 oomycete genomes and (ii) the transcriptomes of eight pathogenic oomycetes grown under 13 conditions. We obtained a global perspective on gene expression signatures of oomycete lifestyles. Our data reveal that oomycete saprotrophs and pathogens use similar molecular mechanisms for colonization but exhibit distinct expression patterns. We identify a S. sapeloensis-specific array and expression of carbohydrate-active enzymes and putative regulatory differences, highlighted by distinct expression levels of transcription factors. Salisapilia sapeloensis expresses only a small repertoire of candidates for virulence-associated genes. Our analyses suggest lifestyle-specific gene regulatory signatures and that, in addition to variation in gene content, shifts in gene regulatory networks underpin the evolution of oomycete lifestyles.
Wharton's jelly-derived mesenchymal stem cells (WJ-MSCs) are a valuable tool in stem cell research due to their high proliferation rate, multi-lineage differentiation potential, and immunotolerance properties. However, fibroblast impurity during WJ-MSCs isolation is unavoidable because of morphological similarities and shared surface markers. Here, a proteomic approach was employed to identify specific proteins differentially expressed by WJ-MSCs in comparison to those by neonatal foreskin and adult skin fibroblasts (NFFs and ASFs, respectively). Mass spectrometry analysis identified 454 proteins with a transmembrane domain. These proteins were then compared across the different cell-lines and categorized based on their cellular localizations, biological processes, and molecular functions. The expression patterns of a selected set of proteins were further confirmed by quantitative reverse transcription polymerase chain reaction (qRT-PCR), Western blotting, and immunofluorescence assays. As anticipated, most of the studied proteins had common expression patterns. However, EphA2, SLC25A4, and SOD2 were predominantly expressed by WJ-MSCs, while CDH2 and Talin2 were specific to NFFs and ASFs, respectively. Here, EphA2 was established as a potential surface-specific marker to distinguish WJ-MSCs from fibroblasts and for prospective use to prepare pure primary cultures of WJ-MSCs. Additionally, CDH2 could be used for a negative-selection isolation/depletion method to remove neonatal fibroblasts contaminating preparations of WJ-MSCs.
In embryonal rhabdomyosarcoma (ERMS) and generally in sarcomas, the role of wild-type and loss- or gain-of-function TP53 mutations remains largely undefined. Eliminating mutant or restoring wild-type p53 is challenging; nevertheless, understanding p53 variant effects on tumorigenesis remains central to realizing better treatment outcomes. In ERMS, >70% of patients retain wild-type TP53, yet mutations when present are associated with worse prognosis. Employing a kRASG12D-driven ERMS tumor model and tp53 null (tp53-/-) zebrafish, we define wild-type and patient-specific TP53 mutant effects on tumorigenesis. We demonstrate that tp53 is a major suppressor of tumorigenesis, where tp53 loss expands tumor initiation from <35% to >97% of animals. Characterizing three patient-specific alleles reveals that TP53C176F partially retains wild-type p53 apoptotic activity that can be exploited, whereas TP53P153Δ and TP53Y220C encode two structurally related proteins with gain-of-function effects that predispose to head musculature ERMS. TP53P153Δ unexpectedly also predisposes to hedgehog-expressing medulloblastomas in the kRASG12D-driven ERMS-model.