This service exclusively searches for literature that cites resources. Please be aware that the total number of searchable documents is limited to those containing RRIDs and does not include all open-access literature.

Page 1: showing papers 1 to 20 of 37

Protein profile of Beta vulgaris leaf apoplastic fluid and changes induced by Fe deficiency and Fe resupply.

  • Laura Ceballos-Laita et al.
  • Frontiers in plant science
  • 2015

The fluid collected by direct leaf centrifugation has been used to study the proteome of the sugar beet apoplastic fluid as well as the changes induced by Fe deficiency and Fe resupply to Fe-deficient plants in the protein profile. Plants were grown in Fe-sufficient and Fe-deficient conditions, and Fe resupply was carried out with 45 μM Fe(III)-EDTA for 24 h. Protein extracts of leaf apoplastic fluid were analyzed by two-dimensional isoelectric focusing-SDS-PAGE electrophoresis. Gel image analysis revealed 203 consistent spots, and proteins in 81% of them (164) were identified by nLC-MS/MS using a custom made reference repository of beet protein sequences. When redundant UniProt entries were deleted, a non-redundant leaf apoplastic proteome consisting of 109 proteins was obtained. TargetP and SecretomeP algorithms predicted that 63% of them were secretory proteins. Functional classification of the non-redundant proteins indicated that stress and defense, protein metabolism, cell wall and C metabolism accounted for approximately 75% of the identified proteome. The effects of Fe-deficiency on the leaf apoplast proteome were limited, with only five spots (2.5%) changing in relative abundance, thus suggesting that protein homeostasis in the leaf apoplast fluid is well-maintained upon Fe shortage. The identification of three chitinase isoforms among proteins increasing in relative abundance with Fe-deficiency suggests that one of the few effects of Fe deficiency in the leaf apoplast proteome includes cell wall modifications. Iron resupply to Fe deficient plants changed the relative abundance of 16 spots when compared to either Fe-sufficient or Fe-deficient samples. Proteins identified in these spots can be broadly classified as those responding to Fe-resupply, which included defense and cell wall related proteins, and non-responsive, which are mainly protein metabolism related proteins and whose changes in relative abundance followed the same trend as with Fe-deficiency.


Robust identification of orthologues and paralogues for microbial pan-genomics using GET_HOMOLOGUES: a case study of pIncA/C plasmids.

  • Pablo Vinuesa et al.
  • Methods in molecular biology (Clifton, N.J.)
  • 2015

GET_HOMOLOGUES is an open-source software package written in Perl and R to define robust core- and pan-genomes by computing consensus clusters of orthologous gene families from whole-genome sequences using the bidirectional best-hit, COGtriangles, and OrthoMCL clustering algorithms. The granularity of the clusters can be fine-tuned by a user-configurable filtering strategy based on a combination of blastp pairwise alignment parameters, hmmscan-based scanning of Pfam domain composition of the proteins in each cluster, and a partial synteny criterion. We present detailed protocols to fit exponential and binomial mixture models to estimate core- and pan-genome sizes, compute pan-genome trees from the pan-genome matrix using a parsimony criterion, analyze and graphically represent the pan-genome structure, and identify lineage-specific gene families for the 12 complete pIncA/C plasmids currently available in NCBI's RefSeq. The software package, license, and detailed user manual can be downloaded for free for academic use from two mirrors: http://www.eead.csic.es/compbio/soft/gethoms.php and http://maya.ccg.unam.mx/soft/gethoms.php.


Understanding the Mechanisms Behind the Response to Environmental Perturbation in Microbial Mats: A Metagenomic-Network Based Approach.

  • Valerie De Anda et al.
  • Frontiers in microbiology
  • 2018

To date, it remains unclear how anthropogenic perturbations influence the dynamics of microbial communities, what general patterns arise in response to disturbance, and whether it is possible to predict them. Here, we suggest the use of microbial mats as a model of study to reveal patterns that can illuminate the ecological processes underlying microbial dynamics in response to stress. We traced the responses to anthropogenic perturbation caused by water depletion in microbial mats from Cuatro Cienegas Basin (CCB), Mexico, by using a time-series spatially resolved analysis in a novel combination of three computational approaches. First, we implemented MEBS (Multi-genomic Entropy-Based Score) to evaluate the dynamics of major biogeochemical cycles across spatio-temporal scales with a single informative value. Second, we used robust Time Series-Ecological Networks (TS-ENs) to evaluate the total percentage of interactions at different taxonomic levels. Lastly, we utilized network motifs to characterize specific interaction patterns. Our results indicate that microbial mats from CCB contain an enormous taxonomic diversity with at least 100 phyla, mainly represented by members of the rare biosphere (RB). Statistical ecological analyses point out a clear involvement of anaerobic guilds related to sulfur and methane cycles during wet versus dry conditions, where we find an increase in fungi, photosynthetic, and halotolerant taxa. TS-ENs indicate that in wet conditions, there was an equilibrium between cooperation and competition (positive and negative relationships, respectively), while under dry conditions there is an over-representation of negative relationships. Furthermore, most of the keystone taxa of the TS-ENs at family level are members of the RB and the microbial mat core highlighting their crucial role within the community. Our results indicate that microbial mats are more robust to perturbation due to redundant functions that are likely shared among community members in the highly connected TS-ENs with density values close to one (≈0.9). Finally, we provide evidence that suggests that a large taxonomic diversity where all community members interact with each other (low modularity), the presence of permanent of low-abundant taxa, and an increase in competition can be potential buffers against environmental disturbance in microbial mats.


GET_PHYLOMARKERS, a Software Package to Select Optimal Orthologous Clusters for Phylogenomics and Inferring Pan-Genome Phylogenies, Used for a Critical Geno-Taxonomic Revision of the Genus Stenotrophomonas.

  • Pablo Vinuesa et al.
  • Frontiers in microbiology
  • 2018

The massive accumulation of genome-sequences in public databases promoted the proliferation of genome-level phylogenetic analyses in many areas of biological research. However, due to diverse evolutionary and genetic processes, many loci have undesirable properties for phylogenetic reconstruction. These, if undetected, can result in erroneous or biased estimates, particularly when estimating species trees from concatenated datasets. To deal with these problems, we developed GET_PHYLOMARKERS, a pipeline designed to identify high-quality markers to estimate robust genome phylogenies from the orthologous clusters, or the pan-genome matrix (PGM), computed by GET_HOMOLOGUES. In the first context, a set of sequential filters are applied to exclude recombinant alignments and those producing anomalous or poorly resolved trees. Multiple sequence alignments and maximum likelihood (ML) phylogenies are computed in parallel on multi-core computers. A ML species tree is estimated from the concatenated set of top-ranking alignments at the DNA or protein levels, using either FastTree or IQ-TREE (IQT). The latter is used by default due to its superior performance revealed in an extensive benchmark analysis. In addition, parsimony and ML phylogenies can be estimated from the PGM. We demonstrate the practical utility of the software by analyzing 170 Stenotrophomonas genome sequences available in RefSeq and 10 new complete genomes of Mexican environmental S. maltophilia complex (Smc) isolates reported herein. A combination of core-genome and PGM analyses was used to revise the molecular systematics of the genus. An unsupervised learning approach that uses a goodness of clustering statistic identified 20 groups within the Smc at a core-genome average nucleotide identity (cgANIb) of 95.9% that are perfectly consistent with strongly supported clades on the core- and pan-genome trees. In addition, we identified 16 misclassified RefSeq genome sequences, 14 of them labeled as S. maltophilia, demonstrating the broad utility of the software for phylogenomics and geno-taxonomic studies. The code, a detailed manual and tutorials are freely available for Linux/UNIX servers under the GNU GPLv3 license at https://github.com/vinuesa/get_phylomarkers. A docker image bundling GET_PHYLOMARKERS with GET_HOMOLOGUES is available at https://hub.docker.com/r/csicunam/get_homologues/, which can be easily run on any platform.


Protein disorder in plants: a view from the chloroplast.

  • Inmaculada Yruela et al.
  • BMC plant biology
  • 2012

The intrinsically unstructured state of some proteins, observed in all living organisms, is essential for basic cellular functions. The information available from plants in this field is limited, but it has reached a point where these proteins can be comprehensively classified on the basis of disorder, function and evolution.


MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle.

  • Valerie De Anda et al.
  • GigaScience
  • 2017

The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa.


Prediction of TF target sites based on atomistic models of protein-DNA complexes.

  • Vladimir Espinosa Angarica et al.
  • BMC bioinformatics
  • 2008

The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence.


Ensembl Genomes 2022: an expanding genome resource for non-vertebrates.

  • Andrew D Yates et al.
  • Nucleic acids research
  • 2022

Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception creating one of the world's most comprehensive genomic resources and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.


Tuning promoter boundaries improves regulatory motif discovery in nonmodel plants: the peach example.

  • Najla Ksouri et al.
  • Plant physiology
  • 2021

The identification of functional elements encoded in plant genomes is necessary to understand gene regulation. Although much attention has been paid to model species like Arabidopsis (Arabidopsis thaliana), little is known about regulatory motifs in other plants. Here, we describe a bottom-up approach for de novo motif discovery using peach (Prunus persica) as an example. These predictions require pre-computed gene clusters grouped by their expression similarity. After optimizing the boundaries of proximal promoter regions, two motif discovery algorithms from RSAT::Plants (http://plants.rsat.eu) were tested (oligo and dyad analysis). Overall, 18 out of 45 co-expressed modules were enriched in motifs typical of well-known transcription factor (TF) families (bHLH, bZip, BZR, CAMTA, DOF, E2FE, AP2-ERF, Myb-like, NAC, TCP, and WRKY) and a few uncharacterized motifs. Our results indicate that small modules and promoter window of [-500 bp, +200 bp] relative to the transcription start site (TSS) maximize the number of motifs found and reduce low-complexity signals in peach. The distribution of discovered regulatory sites was unbalanced, as they accumulated around the TSS. This approach was benchmarked by testing two different expression-based clustering algorithms (network-based and hierarchical) and, as control, genes grouped for harboring ChIPseq peaks of the same Arabidopsis TF. The method was also verified on maize (Zea mays), a species with a large genome. In summary, this article presents a glimpse of the peach regulatory components at genome scale and provides a general protocol that can be applied to other species. A Docker software container is released to facilitate the reproduction of these analyses.


Large Differences in Gene Expression Responses to Drought and Heat Stress between Elite Barley Cultivar Scarlett and a Spanish Landrace.

  • Carlos P Cantalapiedra et al.
  • Frontiers in plant science
  • 2017

Drought causes important losses in crop production every season. Improvement for drought tolerance could take advantage of the diversity held in germplasm collections, much of which has not been incorporated yet into modern breeding. Spanish landraces constitute a promising resource for barley breeding, as they were widely grown until last century and still show good yielding ability under stress. Here, we study the transcriptome expression landscape in two genotypes, an outstanding Spanish landrace-derived inbred line (SBCC073) and a modern cultivar (Scarlett). Gene expression of adult plants after prolonged stresses, either drought or drought combined with heat, was monitored. Transcriptome of mature leaves presented little changes under severe drought, whereas abundant gene expression changes were observed under combined mild drought and heat. Developing inflorescences of SBCC073 exhibited mostly unaltered gene expression, whereas numerous changes were found in the same tissues for Scarlett. Genotypic differences in physiological traits and gene expression patterns confirmed the different behavior of landrace SBCC073 and cultivar Scarlett under abiotic stress, suggesting that they responded to stress following different strategies. A comparison with related studies in barley, addressing gene expression responses to drought, revealed common biological processes, but moderate agreement regarding individual differentially expressed transcripts. Special emphasis was put in the search of co-expressed genes and underlying common regulatory motifs. Overall, 11 transcription factors were identified, and one of them matched cis-regulatory motifs discovered upstream of co-expressed genes involved in those responses.


A roadmap for gene functional characterisation in crops with large genomes: Lessons from polyploid wheat.

  • Nikolai M Adamski et al.
  • eLife
  • 2020

Understanding the function of genes within staple crops will accelerate crop improvement by allowing targeted breeding approaches. Despite their importance, a lack of genomic information and resources has hindered the functional characterisation of genes in major crops. The recent release of high-quality reference sequences for these crops underpins a suite of genetic and genomic resources that support basic research and breeding. For wheat, these include gene model annotations, expression atlases and gene networks that provide information about putative function. Sequenced mutant populations, improved transformation protocols and structured natural populations provide rapid methods to study gene function directly. We highlight a case study exemplifying how to integrate these resources. This review provides a helpful guide for plant scientists, especially those expanding into crop research, to capitalise on the discoveries made in Arabidopsis and other plants. This will accelerate the improvement of crops of vital importance for food and nutrition security.


Transcriptional Responses in Root and Leaf of Prunus persica under Drought Stress Using RNA Sequencing.

  • Najla Ksouri et al.
  • Frontiers in plant science
  • 2016

Prunus persica L. Batsch, or peach, is one of the most important crops and it is widely established in irrigated arid and semi-arid regions. However, due to variations in the climate and the increased aridity, drought has become a major constraint, causing crop losses worldwide. The use of drought-tolerant rootstocks in modern fruit production appears to be a useful method of alleviating water deficit problems. However, the transcriptomic variation and the major molecular mechanisms that underlie the adaptation of drought-tolerant rootstocks to water shortage remain unclear. Hence, in this study, high-throughput sequencing (RNA-seq) was performed to assess the transcriptomic changes and the key genes involved in the response to drought in root tissues (GF677 rootstock) and leaf tissues (graft, var. Catherina) subjected to 16 days of drought stress. In total, 12 RNA libraries were constructed and sequenced. This generated a total of 315 M raw reads from both tissues, which allowed the assembly of 22,079 and 17,854 genes associated with the root and leaf tissues, respectively. Subsets of 500 differentially expressed genes (DEGs) in roots and 236 in leaves were identified and functionally annotated with 56 gene ontology (GO) terms and 99 metabolic pathways, which were mostly associated with aminobenzoate degradation and phenylpropanoid biosynthesis. The GO analysis highlighted the biological functions that were exclusive to the root tissue, such as "locomotion," "hormone metabolic process," and "detection of stimulus," indicating the stress-buffering role of the GF677 rootstock. Furthermore, the complex regulatory network involved in the drought response was revealed, involving proteins that are associated with signaling transduction, transcription and hormone regulation, redox homeostasis, and frontline barriers. We identified two poorly characterized genes in P. persica: growth-regulating factor 5 (GRF5), which may be involved in cellular expansion, and AtHB12, which may be involved in root elongation. The reliability of the RNA-seq experiment was validated by analyzing the expression patterns of 34 DEGs potentially involved in drought tolerance using quantitative reverse transcription polymerase chain reaction. The transcriptomic resources generated in this study provide a broad characterization of the acclimation of P. persica to drought, shedding light on the major molecular responses to the most important environmental stressor.


Ensembl Genomes 2020-enabling non-vertebrate genomic research.

  • Kevin L Howe et al.
  • Nucleic acids research
  • 2020

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.


Evolution of Protein Ductility in Duplicated Genes of Plants.

  • Inmaculada Yruela et al.
  • Frontiers in plant science
  • 2018

Previous work has shown that ductile/intrinsically disordered proteins (IDPs) and residues (IDRs) are found in all unicellular and multicellular organisms, wherein they are essential for basic cellular functions and complement the function of rigid proteins. In addition, computational studies of diverse phylogenetic lineages have revealed: (1) that protein ductility increases in concert with organismic complexity, and (2) that distributions of IDPs and IDRs along the chromosomes of plant species are non-random and correlate with variations in the rates of the genetic recombination and chromosomal rearrangement. Here, we show that approximately 50% of aligned residues in paralogs across a spectrum of algae, bryophytes, monocots, and eudicots are IDRs and that a high proportion (ca. 60%) are in disordered segments greater than 30 residues. When three types of IDRs are distinguished (i.e., identical, similar and variable IDRs) we find that species with large numbers of chromosome and endoduplicated genes exhibit paralogous sequences with a higher frequency of identical IDRs, whereas species with small chromosomes numbers exhibit paralogous sequences with a higher frequency of similar and variable IDRs. These results are interpreted to indicate that genome duplication events influence the distribution of IDRs along protein sequences and likely favor the presence of identical IDRs (compared to similar IDRs or variable IDRs). We discuss the evolutionary implications of gene duplication events in the context of ductile/disordered residues and segments, their conservation, and their effects on functionality.


Genetic recombination is associated with intrinsic disorder in plant proteomes.

  • Inmaculada Yruela et al.
  • BMC genomics
  • 2013

Intrinsically disordered proteins, found in all living organisms, are essential for basic cellular functions and complement the function of ordered proteins. It has been shown that protein disorder is linked to the G + C content of the genome. Furthermore, recent investigations have suggested that the evolutionary dynamics of the plant nucleus adds disordered segments to open reading frames alike, and these segments are not necessarily conserved among orthologous genes.


Mycobacterium tuberculosis Complex Exhibits Lineage-Specific Variations Affecting Protein Ductility and Epitope Recognition.

  • Inmaculada Yruela et al.
  • Genome biology and evolution
  • 2016

The advent of whole-genome sequencing has provided an unprecedented detail about the evolution and genetic significance of species-specific variations across the whole Mycobacterium tuberculosis Complex. However, little attention has been focused on understanding the functional roles of these variations in the protein coding sequences. In this work, we compare the coding sequences from 74 sequenced mycobacterial species including M. africanum, M. bovis, M. canettii, M. caprae, M. orygis, and M. tuberculosis. Results show that albeit protein variations affect all functional classes, those proteins involved in lipid and intermediary metabolism and respiration have accumulated mutations during evolution. To understand the impact of these mutations on protein functionality, we explored their implications on protein ductility/disorder, a yet unexplored feature of mycobacterial proteomes. In agreement with previous studies, we found that a Gly71Ile substitution in the PhoPR virulence system severely affects the ductility of its nearby region in M. africanum and animal-adapted species. In the same line of evidence, the SmtB transcriptional regulator shows amino acid variations specific to the Beijing lineage, which affects the flexibility of the N-terminal trans-activation domain. Furthermore, despite the fact that MTBC epitopes are evolutionary hyperconserved, we identify strain- and lineage-specific amino acid mutations affecting previously known T-cell epitopes such as EsxH and FbpA (Ag85A). Interestingly, in silico studies reveal that these variations result in differential interaction of epitopes with the main HLA haplogroups.


Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species.

  • Bruno Contreras-Moreira et al.
  • Frontiers in plant science
  • 2017

The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting both in sequencing cost and the scale of the required computational analysis. A more affordable alternative is to focus on the genic repertoire by using transcriptomic data. Here, the software GET_HOMOLOGUES-EST was benchmarked with genomic and RNA-seq data of 19 Arabidopsis thaliana ecotypes and then applied to the analysis of transcripts from 16 Hordeum vulgare genotypes. The goal was to sample their pan-genomes and classify sequences as core, if detected in all accessions, or accessory, when absent in some of them. The resulting sequence clusters were used to simulate pan-genome growth, and to compile Average Nucleotide Identity matrices that summarize intra-species variation. Although transcripts were found to under-estimate pan-genome size by at least 10%, we concluded that clusters of expressed sequences can recapitulate phylogeny and reproduce two properties observed in A. thaliana gene models: accessory loci show lower expression and higher non-synonymous substitution rates than core genes. Finally, accessory sequences were observed to preferentially encode transposon components in both species, plus disease resistance genes in cultivated barleys, and a variety of protein domains from other families that appear frequently associated with presence/absence variation in the literature. These results demonstrate that pan-genome analyses are useful to explore germplasm diversity.


Evolutionary divergence of chloroplast FAD synthetase proteins.

  • Inmaculada Yruela et al.
  • BMC evolutionary biology
  • 2010

Flavin adenine dinucleotide synthetases (FADSs) - a group of bifunctional enzymes that carry out the dual functions of riboflavin phosphorylation to produce flavin mononucleotide (FMN) and its subsequent adenylation to generate FAD in most prokaryotes - were studied in plants in terms of sequence, structure and evolutionary history.


Genome-wide identification of transcription start sites, promoters and transcription factor binding sites in E. coli.

  • Alfredo Mendoza-Vargas et al.
  • PloS one
  • 2009

Despite almost 40 years of molecular genetics research in Escherichia coli a major fraction of its Transcription Start Sites (TSSs) are still unknown, limiting therefore our understanding of the regulatory circuits that control gene expression in this model organism. RegulonDB (http://regulondb.ccg.unam.mx/) is aimed at integrating the genetic regulatory network of E. coli K12 as an entirely bioinformatic project up till now. In this work, we extended its aims by generating experimental data at a genome scale on TSSs, promoters and regulatory regions. We implemented a modified 5' RACE protocol and an unbiased High Throughput Pyrosequencing Strategy (HTPS) that allowed us to map more than 1700 TSSs with high precision. From this collection, about 230 corresponded to previously reported TSSs, which helped us to benchmark both our methodologies and the accuracy of the previous mapping experiments. The other ca 1500 TSSs mapped belong to about 1000 different genes, many of them with no assigned function. We identified promoter sequences and type of sigma factors that control the expression of about 80% of these genes. As expected, the housekeeping sigma(70) was the most common type of promoter, followed by sigma(38). The majority of the putative TSSs were located between 20 to 40 nucleotides from the translational start site. Putative regulatory binding sites for transcription factors were detected upstream of many TSSs. For a few transcripts, riboswitches and small RNAs were found. Several genes also had additional TSSs within the coding region. Unexpectedly, the HTPS experiments revealed extensive antisense transcription, probably for regulatory functions. The new information in RegulonDB, now with more than 2400 experimentally determined TSSs, strengthens the accuracy of promoter prediction, operon structure, and regulatory networks and provides valuable new information that will facilitate the understanding from a global perspective the complex and intricate regulatory network that operates in E. coli.


3D-footprint: a database for the structural analysis of protein-DNA complexes.

  • Bruno Contreras-Moreira
  • Nucleic acids research
  • 2010

3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein-DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs and footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead.csic.es/3dfootprint with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.

