When applying genome-wide sequencing technologies to disease investigation, it is increasingly important to resolve sequence variation in regions of the genome that may have homologous sequences. The human mitochondrial genome challenges interpretation given the potential for heteroplasmy, somatic variation, and homologous nuclear mitochondrial sequences (numts). Identical twins share the same mitochondrial DNA (mtDNA) from early life, but whether the mitochondrial sequence remains similar is unclear. We compared an adult monozygotic twin pair using high-throughput sequencing and evaluated variants with primer extension and mitochondrial pre-enrichment. Thirty-seven variants were shared between the twin individuals, and the variants were verified on the original genomic DNA. These studies support highly identical genetic sequence in this case. Certain low-level variant calls were of high quality and homology to the mitochondrial DNA, and they were further evaluated. When we assessed calls in pre-enriched mitochondrial DNA templates, we found that these may represent numts, which can be differentiated from mtDNA variation. We conclude that twin identity extends to mitochondrial DNA, and it is critical to differentiate between numts and mtDNA in genome sequencing, particularly since significant heteroplasmy could influence genome interpretation. Further studies on mtDNA and numts will aid in understanding how variation occurs and persists.
The rapid decrease in the cost of DNA sequencing will enable its use for novel applications. Here, we investigate the use of DNA sequencing for simultaneous discovery and genotyping of polymorphisms in family linkage studies. In the proposed approach, short contiguous segments of genomic DNA, regularly spaced across the genome, are resequenced in each pedigree member, and all sequence polymorphisms discovered within a pedigree are used as genetic markers. We use computer simulations consistent with observed human sequence diversity to show that segments of 500-1,000 base pairs, spaced at intervals of 1-2 Mb across the genome, provide linkage information that equals or exceeds that of traditional marker-based approaches. We validate these results experimentally by implementing the sequence-based linkage approach for chromosome 19 in CEPH pedigrees.
The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.
Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE.
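To make the idea of a measure "based solely on the k-mer frequencies" concrete, the following is a minimal sketch of one such measure, the Euclidean distance between k-mer frequency vectors. It is not CAFE's implementation, and it does not cover CVTree, $d_2^*$ or $d_2^S$; the function names and the default `k=3` are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Return the relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_dissimilarity(seq_a, seq_b, k=3):
    """Euclidean distance between the k-mer frequency vectors of two sequences.

    Illustrative only: a frequency-based measure of the simple kind the
    abstract contrasts with the more expensive background-adjusted ones.
    """
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    kmers = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2 for m in kmers))
```

Computing this value for every pair of inputs would give a symmetric dissimilarity matrix of the kind that tree-building and ordination tools consume.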
One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans as evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved "chunks." Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.
The rapid advances of high-throughput sequencing technologies have dramatically promoted metagenomic studies of microbial communities that exist in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters.
Gene duplication and gene loss during the evolution of eukaryotes have hindered attempts to estimate phylogenies and divergence times of species. Although current methods that identify clusters of orthologous genes in complete genomes have helped to investigate gene function and gene content, they have not been optimized for evolutionary sequence analyses requiring strict orthology and complete gene matrices. Here we adopt a relatively simple and fast genome comparison approach designed to assemble orthologs for evolutionary analysis. Our approach identifies single-copy genes representing only species divergences (panorthologs) in order to minimize potential errors caused by gene duplication. We apply this approach to complete sets of proteins from published eukaryote genomes specifically for phylogeny and time estimation.
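The selection criterion described above (single-copy genes present in every species, giving a complete gene matrix) can be sketched as a simple filter over precomputed gene clusters. This is an illustration under stated assumptions, not the authors' pipeline: the input layout (a dict mapping a cluster ID to a list of `(species, gene_id)` pairs) and the function name `select_panorthologs` are hypothetical, and how the clusters are built is outside the sketch.

```python
def select_panorthologs(clusters, species):
    """Keep clusters that contain exactly one gene from every species.

    clusters: dict mapping cluster ID -> list of (species, gene_id) pairs
              (hypothetical layout, assumed for this sketch).
    species:  iterable of all species that must be represented.
    Returns a dict mapping cluster ID -> {species: gene_id}.
    """
    required = set(species)
    selected = {}
    for cluster_id, members in clusters.items():
        genes_by_species = {}
        for sp, gene_id in members:
            genes_by_species.setdefault(sp, []).append(gene_id)
        complete = set(genes_by_species) == required
        single_copy = all(len(genes) == 1 for genes in genes_by_species.values())
        if complete and single_copy:
            selected[cluster_id] = {sp: genes[0] for sp, genes in genes_by_species.items()}
    return selected
```

A cluster with a duplicated gene in any species, or with a species missing, is dropped, which trades coverage for the strict orthology the abstract asks for.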
microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.
The biosynthetic gene cluster of porothramycin, a sequence-selective DNA alkylating compound, was identified in the genome of the producing strain Streptomyces albus subsp. albus (ATCC 39897) and sequentially characterized. A 39.7 kb long DNA region contains 27 putative genes, 18 of them revealing high similarity with homologous genes from the biosynthetic gene cluster of the closely related pyrrolobenzodiazepine (PBD) compound anthramycin. However, considering the structures of both compounds, the number of differences in the gene composition of the compared biosynthetic gene clusters was unexpectedly high, indicating participation of alternative enzymes in biosynthesis of both porothramycin precursors, anthranilate, and branched L-proline derivative. Based on the sequence analysis of putative NRPS modules Por20 and Por21, we suppose that in porothramycin biosynthesis, the methylation of the anthranilate unit occurs prior to the condensation reaction, while modifications of the branched proline derivative, oxidation, and dimethylation of the side chain occur on the already condensed PBD core. The two corresponding specific methyltransferase-encoding genes, por26 and por25, were identified in the porothramycin gene cluster. Surprisingly, the methyltransferase gene por18, homologous to orf19 from anthramycin biosynthesis, was also detected in the porothramycin gene cluster even though the appropriate biosynthetic step is missing, as suggested by ultra high-performance liquid chromatography-diode array detection-mass spectrometry (UHPLC-DAD-MS) analysis of the product in the S. albus culture broth.
An important aspect of studying the relationship between protein sequence, structure and function is the molecular characterization of the effect of protein mutations. To understand the functional impact of amino acid changes, the multiple biological properties of protein residues have to be considered together.
Normal blood glucose level depends on the availability of insulin and its ability to bind the insulin receptor (IR), which regulates the downstream signaling pathway. Insulin sequence and blood glucose level usually vary among animals due to species specificity. The study of genetic variation of insulin, blood glucose level and the development of diabetic symptoms in Aves is interesting because birds have a higher optimal blood glucose level than mammals. Therefore, it is of interest to study the evolutionary relationship of avian insulin with that of mammals using sequence data. Hence, we compiled 32 Aves insulin sequences from GenBank to compare their sequence-structure features with phylogeny for evolutionary inference. The analysis shows long conserved motifs (about 14 residues) for functional inference. These sequences show high leucine content (20%) with a high instability index (>40). Amino acid positions 11, 14, 16 and 20 are variable and may contribute to binding to IR. We identified functionally critical variable residues in the dataset for possible genetic implication. Structural models of these sequences were developed for surface analysis towards functional representation. These data find application in the understanding of insulin function across species.
The single late 26S mRNA of Semliki Forest virus (SFV) directs the synthesis of the four viral structural proteins, C, E3, E2, and E1, and the recently described nonstructural protein, 6K. We report here partial NH2-terminal amino acid sequences of the SFV polypeptides E3 and 6K and of p62, the precursor to E3 and E2. In addition, we have determined a partial NH2-terminal sequence of the Sindbis virus homolog of 6K, the 4.2K protein. p62 and E3 of SFV have identical NH2-terminal amino acid sequences. Comparison of the partial NH2-terminal sequences of 6K of SFV and 4.2K of Sindbis virus with the deduced amino acid sequence encoded by the 26S mRNA of each virus reveals that the genes for these peptides are located in each case between those for E2 and E1. The order of the genes on the 26S mRNA of the alphaviruses is therefore 5'-C-E3-E2-6K-E1-3'. We discuss two mechanisms by which the nascent viral glycoproteins may be inserted into the membrane of the endoplasmic reticulum.
Current analyses of protein sequence/structure relationships have focused on expected similarity relationships for structurally similar proteins. To survey and explore the basis of these relationships, we present a general sequence/structure map that covers all combinations of similarity/dissimilarity relationships and provide novel energetic analyses of these relationships. To aid our analysis, we divide protein relationships into four categories: expected/unexpected similarity (S and S(?)) and expected/unexpected dissimilarity (D and D(?)) relationships. In the expected similarity region S, we show that trends in the sequence/structure relation can be derived based on the requirement of protein stability and the energetics of sequence and structural changes. Specifically, we derive a formula relating sequence and structural deviations to a parameter characterizing protein stiffness; the formula fits the data reasonably well. We suggest that the absence of data in region S(?) (high structural but low sequence similarity) is due to unfavorable energetics. In contrast to region S, region D(?) (high sequence but low structural similarity) is well-represented by proteins that can accommodate large structural changes. Our analyses indicate that there are several categories of similarity relationships and that protein energetics provide a basis for understanding these relationships.
With the exponential growth of biological sequence data (DNA or protein sequence), DNA sequence analysis has become an essential task for biologists to understand the features, functions, structures, and evolution of species. Encoding DNA sequences is an effective method to extract features from DNA sequences. It is commonly used for visualizing DNA sequences and analyzing similarities/dissimilarities between different species or cells. Although many encoding approaches have been proposed for DNA sequence analysis, we require more elegant approaches for higher accuracy. In this paper, we propose a novel encoding approach for measuring the degree of similarity/dissimilarity between different species. Our approach can preserve the physicochemical properties, positional information, and the codon usage bias of nucleotides. An extensive performance study shows that our approach provides higher accuracy than existing approaches in terms of the degree of similarity.
Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together.
The sequence method is an important approach to assess the baroreflex function, mainly because it is based on the spontaneous fluctuations of beat-by-beat arterial pressure (for example, systolic arterial pressure or SAP) and pulse interval (PI). However, some studies revealed that the baroreflex effectiveness index (BEI), calculated through the sequence method, shows an intriguing oscillatory pattern as a function of the delay between SAP and PI. It has been hypothesized that this pattern is related to the respiratory influence on SAP and/or PI variability, limiting the SAP ramps to 3 or 4 beats of length. In this study, this hypothesis was tested by assessing the sequence method using raw (original) and filtered series. Results were contrasted with the well-established transfer function, estimated between SAP and PI. Continuous arterial pressure recordings were obtained from healthy rats (N = 61) and beat-by-beat series of SAP and PI were generated. Low-pass (LP) and high-pass (HP) filtered series of SAP and PI were created by filtering the original series with a cutoff frequency of 0.8 Hz. Original series were analyzed by either the sequence method or cross-spectral analysis (transfer function) at low- (LF) and high- (HF) frequency bands, while filtered series were evaluated only by the sequence method. Baroreflex sensitivity (BRS) and BEI of the original series, calculated by the sequence method, were highly (85-90%) determined by the HP series, with no significant association between the original and LP series. A high correlation (>0.7) was found between the BRS estimated from the original series (sequence method) and the HF band (transfer function), as well as between the LP series (sequence method) and the LF band (transfer function). These findings confirmed the hypothesis that the sequence method quantifies only the high-frequency components of the baroreflex, neglecting the low-frequency influences, such as the Mayer waves. Therefore, we propose using both the original and LP filtered time series for a broader assessment of the baroreflex function using the sequence method.
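A minimal sketch of the sequence method as it is commonly described may help make BRS and BEI concrete. The abstract does not give the study's exact criteria, so everything here is an assumption: ramps are taken as runs of at least `min_len` beats with strictly rising or falling SAP, a ramp counts as a baroreflex sequence when the PI values shifted by `delay` beats change in the same direction, BRS is the mean regression slope of PI on SAP over those sequences, and BEI is the fraction of SAP ramps that are baroreflex sequences. Minimum-change and correlation thresholds that are often applied in practice are omitted, and the filtering step is not shown.

```python
def find_ramps(values, min_len=3):
    """Return (start, end, direction) for maximal strictly monotonic runs.

    end is exclusive; direction is +1 for rising and -1 for falling runs.
    """
    ramps = []
    n = len(values)
    i = 0
    while i < n - 1:
        diff = values[i + 1] - values[i]
        if diff == 0:
            i += 1
            continue
        direction = 1 if diff > 0 else -1
        j = i + 1
        while j < n - 1 and (values[j + 1] - values[j]) * direction > 0:
            j += 1
        if j - i + 1 >= min_len:
            ramps.append((i, j + 1, direction))
        i = j  # the turning point may start the next ramp
    return ramps


def regression_slope(x, y):
    """Least-squares slope of y on x (x is strictly monotonic, so variance > 0)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def sequence_method(sap, pi, delay=0, min_len=3):
    """Return (BRS, BEI) for beat-by-beat SAP and PI series of equal length."""
    slopes = []
    usable_ramps = 0
    for start, end, direction in find_ramps(sap, min_len):
        if end + delay > len(pi):
            continue  # the delayed PI segment would run past the recording
        usable_ramps += 1
        seg_pi = pi[start + delay:end + delay]
        same_direction = all((b - a) * direction > 0 for a, b in zip(seg_pi, seg_pi[1:]))
        if same_direction:
            slopes.append(regression_slope(sap[start:end], seg_pi))
    brs = sum(slopes) / len(slopes) if slopes else None
    bei = len(slopes) / usable_ramps if usable_ramps else None
    return brs, bei
```

With SAP in mmHg and PI in ms, the returned BRS is in ms/mmHg. Running the same function on the original and on a low-pass filtered series would give the two estimates the abstract proposes to report together.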
Clostridioides difficile sequence type 2 (ST2) has been increasingly recognized as one of the major genotypes in China, while the genomic characteristics and biological phenotypes of Chinese ST2 strains remain to be determined. We used whole-genome sequencing and phylogenetic analysis to investigate the genomic features of 182 ST2 strains, isolated between 2011 and 2017. PCR ribotyping (RT) was performed, and antibiotic resistance, toxin concentration, and sporulation capacity were measured. The core genome maximum-likelihood phylogenetic analysis showed that ST2 strains were distinctly segregated into two genetically diverse lineages [L1 (67.0% from Northern America) and L2], while L2 was further divided into two sub-lineages, SL2a and SL2b (73.5% from China). The 36 virulence-related genes were widely distributed in ST2 genomes, whereas only 11 antibiotic resistance-associated genes were found, and these were sparsely distributed. Among the 25 SL2b sequenced isolates, RT014 (40.0%, n = 10) and RT020 (28.0%, n = 7) were the two main genotypes, with no significant difference in antibiotic resistance (χ2 = 0.024-2.667, P > 0.05). A non-synonymous amino acid substitution was found in tcdB (Y1975D) which was specific to SL2b. Although there was no significant difference in sporulation capacity between the two lineages, the average toxin B concentration (5.11 ± 3.20 ng/μL) in SL2b was significantly lower in comparison to those in L1 (10.49 ± 15.82 ng/μL) and SL2a (13.92 ± 2.39 ng/μL) (χ2 = 12.30, P < 0.05). This study described the genomic characteristics of C. difficile ST2, with many virulence loci and few antibiotic resistance elements. The Chinese ST2 strains with the mutation in codon 1975 of the tcdB gene, clustering in SL2b and circulating in China, express low toxin B, which may be associated with mild or moderate C. difficile infection.
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.
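As a rough illustration of why a negative binomial error model needs a mean-variance link, the sketch below estimates a per-gene dispersion by the method of moments, using the common parameterization in which the variance equals `mean + alpha * mean**2`. This is not DESeq's procedure: DESeq links variance and mean by local regression across genes, and any normalization of library sizes is ignored here. The function name and the parameterization are assumptions made for this example.

```python
def nb_dispersion_moments(counts):
    """Method-of-moments negative binomial dispersion for one gene.

    counts: replicate read counts for a single gene (at least two values).
    Assumes variance = mean + alpha * mean**2 and returns (mean, variance, alpha).
    alpha is clipped at 0, the Poisson limit, when variance <= mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    if mean == 0:
        return mean, variance, 0.0
    alpha = max((variance - mean) / mean ** 2, 0.0)
    return mean, variance, alpha
```

With only a few replicates, per-gene estimates like this are very noisy, which is the motivation for pooling information across genes with similar means as the abstract describes.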