In this data article we present species trees based on coalescent species delimitation results for North American whipsnakes, as well as metadata pertaining to the article "The effect of missing data on coalescent species delimitation and a taxonomic revision of whipsnakes (Colubridae: Masticophis)" (MPE-2017-76-R1). Species trees were constructed using SNP data generated from double-digest RADseq, filtered to 80% completeness between species. Tables correspond with the primary manuscript and serve as a repository of genetic sequence information for whipsnakes. These data can be downloaded and combined with future whipsnake datasets.
Hypnales comprise over 50% of all pleurocarpous mosses. They represent a young radiation, which complicates phylogenetic analyses. To resolve the hypnalean phylogeny, it is necessary to use a phylogenetic marker that provides highly variable features to resolve species on the one hand and conserved features enabling a backbone analysis on the other. Therefore, we simultaneously used highly variable internal transcribed spacer 2 (ITS2) sequences and conserved secondary structures, as deposited with the ITS2 Database.
It is estimated that 15 to 20 million people are infected with the human T-cell lymphotropic virus type 1 (HTLV-1). At present, there are more than 2,000 unique HTLV-1 isolate sequences published. A central database to aggregate sequence information from a range of epidemiological aspects including HTLV-1 infections, pathogenesis, origins, and evolutionary dynamics would be useful to scientists and physicians worldwide. Here we describe a database that collects and annotates sequence data and can be accessed through a user-friendly search interface. The HTLV-1 Molecular Epidemiology Database website is available at http://htlv1db.bahia.fiocruz.br/.
MATLAB is a high-performance language for technical computing, integrating computation, visualization, and programming in an easy-to-use environment. It has been widely used in many areas, such as mathematics and computation, algorithm development, data acquisition, modeling, simulation, and scientific and engineering graphics. However, few functions are freely available in MATLAB to perform the sequence data analyses specifically required for molecular biology and evolution.
Next-generation sequencing provides a powerful means of molecular characterization. However, methods such as single-nucleotide polymorphism detection or whole-chromosome sequence analysis are computationally expensive, prone to errors, and are still less accessible than traditional typing methods. Here, we present the Listeria monocytogenes core-genome sequence typing method for molecular characterization. This method uses a high-confidence core (HCC) genome, calculated to ensure accurate identification of orthologs. We also developed an evolutionarily relevant nomenclature based upon phylogenetic analysis of HCC genomes. Finally, we created a pipeline (LmCGST; https://sourceforge.net/projects/lmcgst/files/) that takes in raw next-generation sequencing reads, calculates a subject HCC profile, compares it to an expandable database, assigns a sequence type, and performs a phylogenetic analysis.
The taxonomy and phylogeny of Asian Meconopsis (Himalayan blue poppy) remain largely unresolved. We used the internal transcribed spacer (ITS) region of nuclear ribosomal DNA (nrDNA) and the chloroplast DNA (cpDNA) trnL-F region for phylogenetic reconstruction of Meconopsis and its close relatives Papaver, Roemeria, and Stylomecon. We identified five main clades, which were well-supported in the gene trees reconstructed with the nrDNA ITS and cpDNA trnL-F sequences. We found that 41 species of Asian Meconopsis did not constitute a monophyletic clade, but formed two solid clades (I and V) separated in the phylogenetic tree by three clades (II, III and IV) of Papaver and its allies. Clade V includes only four Asian Meconopsis species, with the remaining 90 percent of Asian species included in clade I. In this core Asian Meconopsis clade, five subclades (Ia-Ie) were recognized in the nrDNA ITS tree. Three species (Meconopsis discigera, M. pinnatifolia, and M. torquata) of subgenus Discogyne were embedded in subclade Ia, indicating that the present definition of subgenera in Meconopsis should be rejected. These subclades are inconsistent with any series or sections of the present classifications, suggesting that classifications of the genus should be completely revised. Finally, proposals for further revision of the genus Meconopsis were put forward based on molecular, morphological, and biogeographical evidence.
The advent of the DNA sequencing age has led to a revolution in biology. The rapid and cost-effective generation of high-quality sequence data has transformed many fields, including those focused on discovering species and surveying biodiversity, monitoring movement of biological materials, forensic biology, and disease diagnostics. There is a need to build capacity to generate useful sequence data in countries with limited historical access to laboratory resources, so that researchers can benefit from the advantages offered by these data. Commonly used molecular techniques such as DNA extraction, PCR, and DNA sequencing are within the reach of small laboratories in many countries, with the main obstacles to successful implementation being lack of funding and limited practical experience. Here we describe a successful approach that we developed to obtain DNA sequence data during a small DNA barcoding project in Indonesia.
Nucleotide diversity estimates for the genes Cyt-b (cytochrome b) and Co-1 (cytochrome oxidase 1) are analyzed. Genetic divergence of populations (1) and taxa of different rank, such as subspecies, semispecies and/or sibling species (2), species within a genus (3), species from different genera within a family (4), and species from separate families within an order (5) have been compared using a database of p-distances and similar measures. Empirical data for 20,731 vertebrate and invertebrate animal species reveal various and increasing levels of genetic divergence of the sequences of the two genes, Cyt-b and Co-1, in the five groups compared. Mean unweighted scores of p-distances (%) for the five groups are: Cyt-b (1) 1.38±0.30, (2) 5.10±0.91, (3) 10.31±0.93, (4) 17.86±1.36, (5) 26.36±3.88 and Co-1 (1) 0.89±0.16, (2) 3.78±1.18, (3) 11.06±0.53, (4) 16.60±0.69, (5) 20.57±0.40. These estimates testify to the applicability of p-distance for most intraspecies and interspecies comparisons of genetic divergence up to the order level for the two genes compared. The results of the analysis of the nucleotide divergence within species and higher taxa of animals suggest that a phyletic evolution in animals is likely to prevail at the molecular level, and speciation mainly corresponds to the geographic or divergence mode (type D1). The prevalence of the D1 speciation mode does not mean that other modes are absent. At least seven possible modes of speciation are considered. The suggested approach allows the speciation modes to be recognized formally using operational genetic criteria. Such an approach may help to solve a key problem of the biological species concept, i.e. the lack of ability to monitor in most cases the reproductive isolation barriers between species.
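For readers unfamiliar with the measure, the p-distance used above is simply the proportion of aligned sites at which two sequences differ. The following is a minimal Python sketch, not the authors' code; the function name `p_distance` and the choice to skip sites containing gaps or ambiguous bases (`-`, `N`, `?`) are assumptions made for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap or an
    ambiguous base ('-', 'N', '?') are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N?" or b in "-N?":
            continue  # skip sites that cannot be compared
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Two toy 10-bp fragments differing at two sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA") * 100)  # p-distance in percent
```

Multiplying by 100 gives the percentage scale used for the group means reported in the abstract.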
Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD)--the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). 
RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species.
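The simulated collection of RAD sequences and the coverage filter described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' workflow: the function names, the tag length, and the toy genome are assumptions, and the clustering and alignment steps are omitted.

```python
from typing import Dict, List, Set


def simulate_rad_tags(genome: str, site: str, tag_len: int) -> List[str]:
    """Return the sequences flanking each occurrence of a recognition site.

    Assumption: a tag is the tag_len bases immediately upstream or
    downstream of the site; flanks that run off the sequence are skipped.
    """
    genome = genome.upper()
    site = site.upper()
    tags = []
    start = genome.find(site)
    while start != -1:
        upstream_begin = start - tag_len
        downstream_end = start + len(site) + tag_len
        if upstream_begin >= 0:
            tags.append(genome[upstream_begin:start])
        if downstream_end <= len(genome):
            tags.append(genome[start + len(site):downstream_end])
        start = genome.find(site, start + 1)
    return tags


def filter_by_coverage(clusters: Dict[str, Set[str]], min_taxa: int) -> Dict[str, Set[str]]:
    """Keep only clusters (putative loci) represented in at least min_taxa taxa."""
    return {locus: taxa for locus, taxa in clusters.items() if len(taxa) >= min_taxa}


# Toy example: one recognition site (CCTGCAGG) in a 20-bp "genome".
print(simulate_rad_tags("AAAACCTGCAGGTTTTGGGG", "CCTGCAGG", 4))

# Hypothetical clusters mapped to the taxa in which they were found.
clusters = {"locus1": {"sp_a", "sp_b", "sp_c"}, "locus2": {"sp_a"}}
print(filter_by_coverage(clusters, min_taxa=2))
```

Lowering `min_taxa` retains more loci at the cost of more missing data, which is the trade-off the abstract reports for the low end of taxonomic representation.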
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.
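To make the error model concrete, the sketch below shows the mean-dispersion form of the negative binomial distribution in plain Python. It is not the DESeq implementation: the function names are hypothetical, and the dispersion is passed in as a fixed number, whereas the abstract describes linking variance and mean by local regression.

```python
from math import exp, lgamma, log


def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial with the given mean and dispersion.

    The first term is the Poisson (shot noise) part; the second term
    is the overdispersion, which grows with the square of the mean.
    """
    return mean + dispersion * mean ** 2


def nb_log_pmf(k: int, mean: float, dispersion: float) -> float:
    """Log-probability of observing count k (requires dispersion > 0)."""
    r = 1.0 / dispersion      # "size" parameter
    p = r / (r + mean)        # success probability
    return (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
            + r * log(p) + k * log(1.0 - p))


# A gene with mean count 10 and dispersion 0.5 (illustrative values).
print(nb_variance(10.0, 0.5))
print(exp(nb_log_pmf(10, 10.0, 0.5)))
```

With zero dispersion the variance reduces to the mean, i.e. the Poisson case; count data from replicated sequencing experiments typically show more variability than that, which is why an overdispersed model is used.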
The risks associated with re-identification of human genetic data are severely limiting open data sharing in life sciences, even in studies where donor-related genetic variant information is not of primary interest. Here, we developed BAMboozle, a versatile tool to eliminate critical types of sensitive genetic information in human sequence data by reverting aligned reads to the genome reference sequence. Applying BAMboozle to functional genomics data, such as single-cell RNA-seq (scRNA-seq) and scATAC-seq datasets, confirmed the removal of donor-related single nucleotide polymorphisms (SNPs) and indels in a manner that did not disclose the altered positions. Importantly, BAMboozle only removes the genetic sequence variants of the sample (i.e., donor) while preserving other important aspects of the raw sequence data. For example, BAMboozled scRNA-seq data contained accurate cell-type associated gene expression signatures, splice kinetic information, and can be used for methods benchmarking. Altogether, BAMboozle efficiently removes genetic variation in aligned sequence data, which represents a step forward towards open data sharing in many areas of genomics where the genetic variant information is not of primary interest.
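The core idea of reverting an aligned read to the reference sequence can be illustrated with a short Python sketch. This is not BAMboozle's code and makes several simplifying assumptions: the function name is hypothetical, positions are 0-based, inserted and clipped bases are simply dropped, deletions are filled with reference bases, and the accompanying updates to CIGAR strings, base qualities, and auxiliary tags that a real tool would need are omitted.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def revert_to_reference(reference: str, pos: int, cigar: str) -> str:
    """Rebuild a read's sequence from the reference it was aligned to.

    pos is the 0-based leftmost reference position of the alignment.
    """
    ref_cursor = pos
    bases = []
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "M=XD":
            # Matches, mismatches, and deletions all take reference bases,
            # so SNPs and deletions in the donor disappear.
            bases.append(reference[ref_cursor:ref_cursor + length])
            ref_cursor += length
        elif op == "N":
            # Skipped region (e.g. an intron): advance without emitting bases.
            ref_cursor += length
        # I, S, H, P consume no reference; their bases are dropped here.
    return "".join(bases)


reference = "ACGTACGTACGT"
# A read with one insertion and one deletion relative to the reference.
print(revert_to_reference(reference, 2, "3M1I2M1D2M"))
```

Because the output depends only on the reference, the alignment start, and the CIGAR operations, none of the donor's original bases survive in the rebuilt sequence.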
The F8 and F9 genes encode coagulation factor VIII (FVIII) and FIX, respectively, and mutations in these genes are the genetic basis of hemophilia A/B. To determine whether a sequence variation in F8/F9 is a disease-causing mutation, frequency data from a control population are needed. This study aimed to obtain data on sequence variation in F8/F9 in a set of functionally validated control chromosomes of Korean descent.
Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data.
The amount of publicly available DNA sequence data is drastically increasing, making it a tedious task to create sequence databases necessary for the design of diagnostic assays. The selection of appropriate sequences is especially challenging in genes affected by frequent point mutations such as antibiotic resistance genes. To overcome this issue, we have designed the webtool resiDB, a rapid and user-friendly sequence database manager for bacteria, fungi, viruses, protozoa, invertebrates, plants, archaea, environmental and whole genome shotgun sequence data. It automatically identifies and curates sequence clusters to create custom sequence databases based on user-defined input sequences. A collection of helpful visualization tools gives the user the opportunity to easily access, evaluate, edit, and download the newly created database. Consequently, researchers no longer have to manually manage sequence data retrieval, deal with hardware limitations, and run multiple independent software tools, each having its own requirements, input and output formats. Our tool was developed within the H2020 project FAPIC aiming to develop a single diagnostic assay targeting all sepsis-relevant pathogens and antibiotic resistance mechanisms. ResiDB is freely accessible to all users through https://residb.ait.ac.at/.
The data presented here include a selection of five successfully amplified protein-coding markers for inferring phylogenetic relationships of the family of amphipod crustaceans Niphargidae. These markers have been efficiently amplified from niphargid samples for the first time and present the framework for robust phylogenetic assessment of the family Niphargidae. They are useful for phylogenetic purposes among other amphipod genera as well. In detail, the data consist of two parts: 1. Information regarding markers, specific oligonucleotide primer pairs and conditions for PCR reaction that enable successful amplification of specific nucleotide fragments. Two pairs of novel oligonucleotide primers were constructed which enable partial sequence amplification of two housekeeping genes: arginine kinase (ArgKin) and glyceraldehyde phosphate dehydrogenase (GAPDH), respectively. Additionally, three existing combinations of oligonucleotide primer pairs for protein-coding loci for glutamyl-prolyl tRNA synthetase (EPRS), opsin (OP) and phosphoenolpyruvate carboxykinase (PEPCK) were proven to be suitable to amplify specific nucleotide fragments from selected amphipod specimens; 2. Information on novel nucleotide sequences from amphipod taxa of the family Niphargidae and related outgroup taxa. Unilocus phylogenetic trees were constructed using Bayesian analysis and show relationships among selected taxa. Altogether 299 new nucleotide sequences from 92 specimens of the family Niphargidae and related outgroup amphipod taxa are deposited in the GenBank (NCBI) repository and available for further use in phylogenetic analyses.
Nanobodies are a class of antigen-binding protein derived from camelids that achieve comparable binding affinities and specificities to classical antibodies, despite comprising only a single 15 kDa variable domain. Their reduced size makes them an exciting target molecule with which we can explore the molecular code that underpins binding specificity: how is such high specificity achieved? Here, we use a novel dataset of 90 nonredundant, protein-binding nanobodies with antigen-bound crystal structures to address this question. To provide a baseline for comparison we construct an analogous set of classical antibodies, allowing us to probe how nanobodies achieve high specificity binding with a dramatically reduced sequence space. Our analysis reveals that nanobodies do not diversify their framework region to compensate for the loss of the VL domain. In addition to the previously reported increase in H3 loop length, we find that nanobodies create diversity by drawing their paratope regions from a significantly larger set of aligned sequence positions, and by exhibiting greater structural variation in their H1 and H2 loops.
Genome-sequencing projects are currently producing an enormous amount of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, known as clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs that automatically and accurately classify sequences into families have become a necessity. A significant number of methods have addressed the clustering of protein sequences and most of them can be categorized in three major groups: hierarchical, graph-based and partitioning methods. Among the various sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are extensively used in other fields, few applications have been found in the field of protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data and whether these methods can be efficient compared to the published clustering methods.
The conventional approach to finding structurally similar search models for use in molecular replacement (MR) is to use the sequence of the target to search against those of a set of known structures. Sequence similarity often correlates with structure similarity. Given sufficient similarity, a known structure correctly positioned in the target cell by the MR process can provide an approximation to the unknown phases of the target. An alternative approach to identifying homologous structures suitable for MR is to exploit the measured data directly, comparing the lattice parameters or the experimentally derived structure-factor amplitudes with those of known structures. Here, SIMBAD, a new sequence-independent MR pipeline which implements these approaches, is presented. SIMBAD can identify cases of contaminant crystallization and other mishaps such as mistaken identity (swapped crystallization trays), as well as solving unsequenced targets and providing a brute-force approach where sequence-dependent search-model identification may be nontrivial, for example because of conformational diversity among identifiable homologues. The program implements a three-step pipeline to efficiently identify a suitable search model in a database of known structures. The first step performs a lattice-parameter search against the entire Protein Data Bank (PDB), rapidly determining whether or not a homologue exists in the same crystal form. The second step is designed to screen the target data for the presence of a crystallized contaminant, a not uncommon occurrence in macromolecular crystallography. Solving structures with MR in such cases can remain problematic for many years, since the search models, which are assumed to be similar to the structure of interest, are not necessarily related to the structures that have actually crystallized. To cater for this eventuality, SIMBAD rapidly screens the data against a database of known contaminant structures. 
Where the first two steps fail to yield a solution, a final step in SIMBAD can be invoked to perform a brute-force search of a nonredundant PDB database provided by the MoRDa MR software. Through early-access usage of SIMBAD, this approach has solved novel cases that have otherwise proved difficult to solve.
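The first step of the pipeline, a lattice-parameter search, can be illustrated with a simplified Python sketch. This is not SIMBAD's implementation: the function name, the single relative tolerance, the ranking by summed deviation, and the entry identifiers are all assumptions, and the sketch compares raw unit-cell parameters without any cell reduction or alternative-setting handling that a production search would need.

```python
from typing import Dict, List, Tuple

# Unit cell as (a, b, c, alpha, beta, gamma): lengths in angstroms, angles in degrees.
Cell = Tuple[float, float, float, float, float, float]


def lattice_search(target: Cell, database: Dict[str, Cell],
                   tolerance: float = 0.05) -> List[Tuple[str, float]]:
    """Return database entries whose cell parameters all lie within a
    relative tolerance of the target, best match first."""
    hits = []
    for entry_id, cell in database.items():
        deviations = [abs(t - c) / t for t, c in zip(target, cell)]
        if max(deviations) <= tolerance:
            hits.append((entry_id, sum(deviations)))
    # Smaller total deviation means a closer match to the target cell.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical entries; the identifiers are placeholders, not real PDB codes.
database = {
    "entry_a": (50.5, 60.2, 69.8, 90.0, 90.0, 90.0),
    "entry_b": (80.0, 60.0, 70.0, 90.0, 90.0, 90.0),
    "entry_c": (50.0, 60.0, 70.0, 90.0, 90.0, 90.0),
}
target = (50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
for entry_id, score in lattice_search(target, database):
    print(entry_id, round(score, 4))
```

A search of this kind is cheap because it needs only six numbers per entry, which is why it can be run against a whole structure database before the more expensive screening steps.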
In standard high throughput sequencing analysis, genetic variants are not assigned to a homologous chromosome of origin. This process, called haplotype phasing, can reveal information important for understanding the relationship between genetic variants and biological phenotypes. For example, in genes that carry multiple heterozygous missense variants, phasing resolves whether one or both gene copies are altered. Here, we present a novel approach to phasing variants that takes advantage of unique properties of paired tumor:normal sequencing data from cancer studies.
The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.