Global changes in cannabis legislation after decades of stringent regulation, together with heightened demand for its industrial and medicinal applications, have spurred recent genetic and genomics research. An international research community emerged and identified the need for a web portal that hosts cannabis-specific datasets, seamlessly integrates multiple data sources, supports omics analyses and fosters information sharing. The Tripal platform was used to host public genome assemblies, gene annotations, quantitative trait loci and genetic maps, gene and protein expression data, and metabolic profiles together with their sample attributes. Single nucleotide polymorphisms (SNPs) were called from public resequencing datasets against three reference genomes. Additional applications, such as SNP-Seek and MapManJS, were embedded into Tripal. A multi-omics data integration web-service Application Programming Interface (API), developed on top of existing Tripal modules, returns generic tables of samples, properties and values. Use cases demonstrate the API's utility for various omics analyses, enabling researchers to perform multi-omics analyses efficiently.
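The API is described as returning generic tables of samples, properties and values. As a minimal sketch, assuming a hypothetical long-format response in which every row is a (sample, property, value) triple (the field names and example rows are illustrative, not the portal's actual schema), such a table can be pivoted into a sample-by-property matrix with the Python standard library:

```python
from collections import defaultdict

# Hypothetical long-format row: (sample, property, value).
# These names are illustrative only, not the API's real field names.
Row = tuple[str, str, str]


def pivot_long_to_wide(rows: list[Row]) -> tuple[list[str], dict[str, dict[str, str]]]:
    """Pivot (sample, property, value) triples into one dict per sample.

    Returns the sorted property names (the columns) and a mapping
    sample -> {property: value}. Missing combinations are simply absent.
    """
    wide: dict[str, dict[str, str]] = defaultdict(dict)
    properties: set[str] = set()
    for sample, prop, value in rows:
        if prop in wide[sample]:
            raise ValueError(f"duplicate value for {sample!r}/{prop!r}")
        wide[sample][prop] = value
        properties.add(prop)
    return sorted(properties), dict(wide)


def to_tsv(columns: list[str], wide: dict[str, dict[str, str]]) -> str:
    """Render the wide table as TSV, writing 'NA' for missing cells."""
    lines = ["\t".join(["sample", *columns])]
    for sample in sorted(wide):
        cells = [wide[sample].get(col, "NA") for col in columns]
        lines.append("\t".join([sample, *cells]))
    return "\n".join(lines)


if __name__ == "__main__":
    example = [("S1", "tissue", "leaf"), ("S1", "batch", "B1"), ("S2", "tissue", "root")]
    cols, table = pivot_long_to_wide(example)
    print(to_tsv(cols, table))
```

Missing sample/property combinations are written as `NA`, which both R's `read.table` and pandas' `read_csv` treat as missing by default.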
PubMed ID: 39469541
Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions, see the National Library of Medicine Terms and Conditions.
Computable knowledge regarding functions of genes and gene products. GO resources include biomedical ontologies that cover molecular domains of all life forms as well as extensive compilations of gene product annotations to these ontologies that provide largely species-neutral, comprehensive statements about what gene products do. Used to standardize representation of gene and gene product attributes across species and databases.
Software toolkit for detection and evolutionary analysis of gene synteny and collinearity.
Ontology that includes crop-specific trait ontologies for several economically important plants, such as rice, wheat, maize, potato, Musa (banana), chickpea and sorghum, along with other domains important for crop research, such as germplasm, passport data, trait measurement scales and experimental design factors.
Python package for data analysis that provides labeled data structures similar to R data frames. Its data structures are designed to make working with relational or labeled data easy and intuitive. Intended as a building block for practical, real-world data analysis and as an open source data analysis and manipulation tool.
R package for weighted correlation network analysis (WGCNA). A point-and-click version of WGCNA is also available, but it is no longer maintained: it is known to have compatibility problems with R 2.8.x and newer, and not all of the methods it implements are state of the art.
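WGCNA builds its networks from soft-thresholded correlations. Below is a minimal sketch of the unsigned weighted adjacency, a_ij = |cor(x_i, x_j)|^β, and of whole-network connectivity, written in Python for consistency with the other examples here. The function names are hypothetical and are not the R package's API, and β = 6 is used only as an illustrative default; the package itself adds topological overlap, module detection and much more.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def unsigned_adjacency(profiles: list[list[float]], beta: float = 6.0) -> list[list[float]]:
    """Soft-thresholded adjacency a_ij = |cor(x_i, x_j)| ** beta (diagonal = 1)."""
    g = len(profiles)
    adj = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(profiles[i], profiles[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj: list[list[float]]) -> list[float]:
    """Whole-network connectivity k_i = sum of a_ij over j != i."""
    return [sum(row) - row[i] for i, row in enumerate(adj)]
```

Raising |cor| to a power keeps strong correlations near their original value while pushing weak ones towards zero, which is why the result is a weighted rather than a hard-thresholded network.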
Open source web application to create and share documents that contain live code, equations, visualizations and narrative text. Used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization and machine learning.
Method for adjusting batch effects in microarray expression data using empirical Bayes methods.
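At the core of this kind of batch adjustment is a per-gene location/scale model. The sketch below is a simplification that shows only that location/scale step for one gene: each batch is standardized with its own mean and standard deviation and then mapped back onto the grand mean and the pooled within-batch standard deviation. It deliberately omits the empirical Bayes shrinkage of the batch parameters across genes, which is what the method described above adds, and it ignores biological covariates. The function name is hypothetical.

```python
import math
from collections import defaultdict


def location_scale_adjust(values: list[float], batches: list[str]) -> list[float]:
    """Remove per-batch mean and scale differences for ONE gene.

    Each batch is standardized with its own mean/SD, then mapped back onto
    the grand mean and pooled within-batch SD. No empirical Bayes shrinkage.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must be the same length")
    groups: dict[str, list[float]] = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)

    n = len(values)
    grand_mean = sum(values) / n
    means = {b: sum(vs) / len(vs) for b, vs in groups.items()}
    # Population SD per batch; a constant batch cannot be rescaled.
    sds = {
        b: math.sqrt(sum((v - means[b]) ** 2 for v in vs) / len(vs))
        for b, vs in groups.items()
    }
    if any(sd == 0 for sd in sds.values()):
        raise ValueError("each batch needs non-constant values")
    # Pooled within-batch SD: deviations from each sample's own batch mean.
    pooled_sd = math.sqrt(
        sum((v - means[b]) ** 2 for v, b in zip(values, batches)) / n
    )
    return [
        (v - means[b]) / sds[b] * pooled_sd + grand_mean
        for v, b in zip(values, batches)
    ]
```

With only a few samples per batch, the per-batch means and SDs estimated here are noisy; borrowing strength across genes through empirical Bayes shrinkage is what makes the full method robust in that setting.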
Open source whole-genome association analysis toolset designed to perform a range of basic, large-scale analyses in a computationally efficient manner. Used for analysis of genotype/phenotype data. Through integration with gPLINK and Haploview, there is some support for subsequent visualization, annotation and storage of results. PLINK 1.9 is an improved, second-generation version of the software.
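As an illustration of the kind of basic case/control analysis such a toolset runs, here is a minimal sketch of an allelic association test: a 1-df Pearson chi-square on the 2×2 table of allele counts in cases and controls, with genotypes coded as 0/1/2 copies of the tested allele. It is not PLINK's implementation, the function names are hypothetical, and missing genotypes are not handled.

```python
import math


def allele_counts(genotypes: list[int]) -> tuple[int, int]:
    """Return (tested-allele count, other-allele count) from 0/1/2 genotype codes."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    tested = sum(genotypes)
    return tested, 2 * len(genotypes) - tested


def allelic_test(cases: list[int], controls: list[int]) -> tuple[float, float]:
    """1-df Pearson chi-square on the 2x2 allele-count table; returns (chi2, p)."""
    table = [allele_counts(cases), allele_counts(controls)]
    total = sum(map(sum, table))
    if total == 0:
        raise ValueError("no genotypes supplied")
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("an allele or a group is empty")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, cases coded `[2, 2, 1, 1]` and controls `[0, 0, 1, 1]` give the allele table [[6, 2], [2, 6]], every expected count is 4, and the statistic is 4.0.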
Web application to search nucleotide databases using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast.
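Nucleotide BLAST searches start from short exact word matches and then extend them. Below is a minimal seed-and-extend sketch with illustrative match/mismatch scores and X-drop value (these are not BLAST's parameters, and this is not BLAST's code). It finds exact k-mer hits between a query and a subject and extends one hit without gaps. The default k = 11 echoes the traditional blastn word size. Real BLAST adds gapped extension and statistical significance estimates.

```python
from collections import defaultdict


def index_words(subject: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in the subject to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def seed_hits(query: str, subject: str, k: int = 11) -> list[tuple[int, int]]:
    """Exact word hits as (query_pos, subject_pos) pairs."""
    index = index_words(subject, k)
    return [(q, s) for q in range(len(query) - k + 1)
            for s in index.get(query[q:q + k], [])]


def extend_ungapped(query: str, subject: str, q: int, s: int, k: int,
                    match: int = 1, mismatch: int = -3, xdrop: int = 5):
    """Extend a seed both ways without gaps (X-drop stopping rule).

    Returns (query_start, query_end, subject_start, score); query_end is exclusive.
    """
    # Rightward extension from the end of the seed.
    best_right, score, right = 0, 0, 0
    i, j = q + k, s + k
    while i < len(query) and j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        if score > best_right:
            best_right, right = score, i - (q + k) + 1
        elif best_right - score > xdrop:
            break
        i += 1
        j += 1
    # Leftward extension from the start of the seed.
    best_left, score, left = 0, 0, 0
    i, j = q - 1, s - 1
    while i >= 0 and j >= 0:
        score += match if query[i] == subject[j] else mismatch
        if score > best_left:
            best_left, left = score, q - i
        elif best_left - score > xdrop:
            break
        i -= 1
        j -= 1
    seed_score = k * match
    return q - left, q + k + right, s - left, seed_score + best_left + best_right
```

Larger words produce fewer spurious seeds and faster searches at the cost of sensitivity, which is the trade-off between blastn and megablast.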
Software tool that displays large genomic datasets (e.g., gene expression data from Arabidopsis Affymetrix arrays) on diagrams of metabolic pathways or other biological processes.
Provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
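Generating alignments in a per-position format amounts to walking each read's CIGAR string along the reference. This minimal sketch computes per-position depth from reads given as hypothetical (0-based reference start, CIGAR) pairs rather than real SAM records, whose POS field is 1-based. As a design choice, it counts only aligned bases (M, =, X) and not deletions or skipped regions.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
COUNTED = set("M=X")   # aligned bases: advance the reference and add depth
REF_ONLY = set("DN")   # deletions / skipped regions: advance the reference only


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def depth(alignments: list[tuple[int, str]]) -> Counter:
    """Per-position depth from (0-based reference start, CIGAR) pairs."""
    cov: Counter = Counter()
    for start, cigar in alignments:
        pos = start
        for length, op in parse_cigar(cigar):
            if op in COUNTED:
                for p in range(pos, pos + length):
                    cov[p] += 1
                pos += length
            elif op in REF_ONLY:
                pos += length
            # I, S, H and P consume no reference positions.
    return cov
```

For example, a read starting at 1 with CIGAR `1M1D1M` covers positions 1 and 3 but not the deleted position 2.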
Software tool for quantifying transcript expression from RNA-seq data. Provides fast, bias-aware quantification, including transcriptome-wide correction for fragment GC-content bias.
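Transcript-level quantifiers commonly report abundance as transcripts per million (TPM), which normalizes counts by effective transcript length and then by library size. The sketch below shows only that normalization, TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. The tool itself estimates the counts c_i and the effective lengths l_i with a bias-aware statistical model that is not reproduced here. The function name is hypothetical.

```python
def tpm(counts: dict[str, float], eff_lengths: dict[str, float]) -> dict[str, float]:
    """Transcripts per million from read counts and effective lengths.

    TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6
    """
    rates = {}
    for tx, count in counts.items():
        length = eff_lengths[tx]
        if length <= 0:
            raise ValueError(f"effective length of {tx!r} must be positive")
        rates[tx] = count / length  # reads per base of transcript
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any transcript")
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}
```

Because TPM values always sum to one million, they compare relative abundances within a sample. Two transcripts with equal counts but a twofold difference in effective length therefore differ twofold in TPM.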
This resource is no longer in service (documented August 29, 2016). A software program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. It generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. The underlying methodology includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich dynamic programming (DP) for splice site detection, and microexon identification with statistical significance testing.
Software package for functional analysis of sequences by classifying them into families and predicting the presence of domains and sites. It scans sequences against InterPro's signatures, characterizing nucleotide or protein function by matching sequences against models from several different databases. Used in large-scale analysis of whole proteomes, genomes and metagenomes. Available as a web-based version, a standalone Perl version and a SOAP web service.
Software tool for assembling transcripts from RNA-Seq data. Explores the surprising computational parallels between the assembly of transcriptomes and of single-cell genomes. Suitable for all kinds of organisms. Part of the SPAdes package since version 3.9.
Software tool to identify candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity or constructed from RNA-Seq alignments to the genome using TopHat and Cufflinks. Starts from a FASTA or GFF file. Open reading frames (ORFs) can be scanned for homology to known proteins using BLASTP or Pfam searches, and ORFs with hits can be retained in the final selection. Predictions can then be visualized in a genome browser such as IGV.
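The ORF-scanning step can be illustrated with a minimal sketch that finds the longest ATG-initiated open reading frame on the forward strand, using the standard stop codons. This is a simplification, and the function name is hypothetical. The real tool also scans the reverse strand, reports 5'- and 3'-partial ORFs, applies a realistic minimum length by default and scores coding potential, and none of that is reproduced here.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str, min_codons: int = 1) -> Optional[tuple[int, int]]:
    """Longest complete ATG...stop ORF on the forward strand.

    Returns a half-open (start, end) interval that includes the stop codon,
    or None if no ORF of at least `min_codons` codons (stop included) exists.
    """
    seq = seq.upper()
    best: Optional[tuple[int, int]] = None
    for frame in range(3):
        start = None  # position of the first ATG not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                long_enough = (end - start) // 3 >= min_codons
                if long_enough and (best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None
    return best
```

Keeping the first ATG in each open frame, rather than restarting at later in-frame ATGs, is what makes the reported ORF the longest one in that frame. ORFs that run off the 3' end without a stop codon are ignored.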
Software suite for ultra-fast and sensitive sequence search and clustering. Used to search and cluster huge protein and nucleotide sequence sets. Designed to run on multiple cores and servers.
A software package to analyze next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping and a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of handling projects of any size. It is, first, a software library that makes it easy to write efficient analysis tools for next-generation sequencing data and, second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller and a local realigner. (Entry from Genetic Analysis Software.)
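Genotyping at a single site can be illustrated with the classic diploid likelihood model. Each read base is assumed to come from either allele with probability 1/2. A base with Phred quality Q is assumed to be miscalled with probability e = 10^(-Q/10), spread evenly over the three other bases, and reads are treated as independent. The sketch computes log10 P(reads | genotype) for all ten diploid genotypes. It is a textbook model with hypothetical function names, not this toolkit's current caller, which reassembles haplotypes locally and uses a pair-HMM.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def base_likelihood(read_base: str, allele: str, qual: int) -> float:
    """P(observed base | true allele) for a base with Phred quality `qual`."""
    if qual <= 0:
        raise ValueError("quality must be a positive Phred score")
    err = 10 ** (-qual / 10)
    return 1 - err if read_base == allele else err / 3


def genotype_log10_likelihoods(pileup: list[tuple[str, int]]) -> dict[str, float]:
    """log10 P(reads | genotype) for all ten diploid genotypes at one site.

    `pileup` is a list of (read base, Phred quality) pairs covering the site.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for base, qual in pileup:
            # Each read samples either chromosome with probability 1/2.
            p = 0.5 * base_likelihood(base, a1, qual) + 0.5 * base_likelihood(base, a2, qual)
            ll += math.log10(p)
        result[a1 + a2] = ll
    return result
```

Combining these likelihoods with a genotype prior gives posterior genotype probabilities. The most likely genotype alone ignores how much evidence separates it from the runner-up.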
Open source software tool to manipulate files in GFF format. Used to convert, sort, filter, transform, or cluster genomic features.
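Tools that convert, sort or filter GFF files all start by parsing its nine tab-separated columns. Below is a minimal sketch of a GFF3 line parser that assumes well-formed input. Coordinates are 1-based and inclusive, `.` marks an empty field, and attribute values are percent-encoded, with multiple values separated by commas. The `Feature` class is a hypothetical container, and the `##FASTA` section and multi-line features are not handled.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    source: str
    feature_type: str
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]
    strand: str              # "+", "-", "." or "?"
    phase: Optional[int]
    attributes: dict[str, list[str]] = field(default_factory=dict)


def parse_gff3_line(line: str) -> Optional[Feature]:
    """Parse one GFF3 data line; return None for blank, comment or directive lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr = cols
    attributes: dict[str, list[str]] = {}
    if attr != ".":
        for pair in filter(None, attr.split(";")):
            tag, _, value = pair.partition("=")
            # Decode escapes such as %3B (";") after splitting on the separators.
            attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return Feature(
        seqid=unquote(seqid), source=source, feature_type=ftype,
        start=int(start), end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attributes,
    )
```

Sorting parsed features then reduces to an ordinary key such as `(f.seqid, f.start, f.end)`.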
Service providing functional analysis of proteins by classifying them into families and predicting domains and important sites. InterPro combines protein signatures from a number of member databases into a single searchable resource, capitalizing on their individual strengths to produce a powerful integrated database and diagnostic tool. This integrated database of predictive protein signatures is used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at the superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites, and adds in-depth annotation, including GO terms, to the protein signatures. The data can be accessed programmatically via web services. The member databases use a number of approaches: ProDom provides sequence clusters built from UniProtKB using PSI-BLAST; PROSITE patterns provide simple regular expressions; PROSITE and HAMAP profiles provide sequence matrices; PRINTS provides fingerprints, which are groups of aligned, unweighted position-specific sequence matrices (PSSMs); and PANTHER, PIRSF, Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY provide hidden Markov models (HMMs). Contributions are welcome: users are encouraged to use the "Add your annotation" button on InterPro entry pages to suggest updated or improved annotation for individual InterPro entries.
Database of Cannabis sativa single nucleotide polymorphisms (SNPs) discovered from next-generation sequencing (NGS) reads and BioSamples available at NCBI. SNPs were called from the sequence reads against the cs10, Purple Kush and Finola reference assemblies using GATK and Parabricks pipelines.
Java software for trimming Illumina paired-end and single-end data: a flexible, pair-aware preprocessing tool optimized for Illumina next-generation sequencing data. Includes several processing steps for read trimming and filtering. Runs on Unix/Linux, Mac OS and Windows.
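One trimming step that such tools commonly provide is a sliding-window quality trim. The read is scanned from the 5' end and clipped once the average quality within the window falls below a threshold. Below is a minimal sketch of that idea, assuming Phred+33 quality strings and cutting at the start of the first failing window. The window size and threshold are illustrative, the function names are hypothetical, and the tool's own implementation may place the cut differently.

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        required: float = 15.0) -> tuple[str, str]:
    """Scan from the 5' end and cut at the start of the first window whose
    mean quality is below `required`. Reads shorter than the window are kept."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    q = phred33(qual)
    for i in range(len(q) - window + 1):
        if sum(q[i:i + window]) / window < required:
            return seq[:i], qual[:i]
    return seq, qual
```

Averaging over a window, rather than cutting at the first low-quality base, keeps reads that contain an isolated poor base call but are otherwise good.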
Functional genomics data repository supporting MIAME-compliant data submissions. Includes microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. Array- and sequence-based data are accepted. Provides a collection of curated gene expression DataSets, as well as original Series and Platform records. The database can be searched by keywords, organism, DataSet type and authors. DataSet records contain additional resources, including cluster tools and differential expression queries.
A collaboration involving developers of science-based ontologies who are establishing a set of principles for ontology development with the goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain. In addition to a listing of OBO ontologies, this site provides a statement of the OBO Foundry principles, discussion fora, technical infrastructure, and other services to facilitate ontology development. Feedback is welcome and participation encouraged.
Collection of protein sequence and functional information; a resource for protein sequence and annotation data. The UniProt consortium maintains the UniProt databases: the UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), UniProt Archive (UniParc) and UniProt Proteomes. UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR). Swiss-Prot is a curated subset of UniProtKB.
Collection of curated, non-redundant genomic DNA, transcript RNA, and protein sequences produced by NCBI. Provides a reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. Accessed through the Nucleotide and Protein databases.
Web search tool to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. Used to identify homologous sequences.
An open source data warehouse system built for the integration and analysis of complex biological data that enables the creation of biological databases accessed by sophisticated web query tools. Parsers are provided for integrating data from many common biological data sources and formats, and there is a framework for adding data. InterMine includes a user-friendly web interface that works "out of the box" and can be easily customized for specific needs, as well as a powerful, scriptable web-service API to allow programmatic access to data.
Relational database schema that underlies many GMOD installations. It is capable of representing many of the general classes of data frequently encountered in modern biology such as sequence, sequence comparisons, phenotypes, genotypes, ontologies, publications, and phylogeny. It has been designed to handle complex representations of biological knowledge and should be considered one of the most sophisticated relational schemas currently available in molecular biology. The price of this capability is that the new user must spend some time becoming familiar with its fundamentals.
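Chado represents sequence features generically. A `feature` row is typed by a controlled-vocabulary term, and its location is stored separately in `featureloc`, relative to another feature, using interbase (0-based) `fmin`/`fmax` coordinates. The sketch below uses a heavily reduced subset of those tables in SQLite so that it runs standalone; Chado itself is normally deployed on PostgreSQL, and its tables have many more columns. The data, the `demo_db` helper and the `features_in_range` function are illustrative. The query finds features of a given type that overlap an interval on a source sequence.

```python
import sqlite3

# Reduced, illustrative subset of the Chado cvterm/feature/featureloc tables.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER REFERENCES feature(feature_id),
    fmin          INTEGER,  -- interbase (0-based) start
    fmax          INTEGER,  -- interbase end
    strand        INTEGER   -- 1, -1 or NULL
);
"""

# Interval overlap on interbase coordinates: fmin < end AND fmax > start.
QUERY = """
SELECT f.uniquename, fl.fmin, fl.fmax, fl.strand
FROM feature f
JOIN cvterm t      ON t.cvterm_id = f.type_id
JOIN featureloc fl ON fl.feature_id = f.feature_id
JOIN feature src   ON src.feature_id = fl.srcfeature_id
WHERE t.name = ? AND src.uniquename = ? AND fl.fmin < ? AND fl.fmax > ?
ORDER BY fl.fmin
"""


def features_in_range(conn, type_name: str, seq_name: str, start: int, end: int):
    """Features of a given type overlapping the interbase interval [start, end)."""
    return conn.execute(QUERY, (type_name, seq_name, end, start)).fetchall()


def demo_db() -> sqlite3.Connection:
    """Build an in-memory database with one chromosome and two genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "chromosome"), (2, "gene")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)", [
        (1, "chr1", "chr1", 1), (2, "g1", "gene:g1", 2), (3, "g2", "gene:g2", 2)])
    conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 2, 1, 100, 500, 1), (2, 3, 1, 1000, 1500, -1)])
    return conn
```

Storing locations in their own table, rather than as columns of `feature`, is what lets the same feature be placed on more than one source sequence or assembly.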
A high-performance visualization tool, written primarily in JavaScript, for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
MetaCyc is a database of nonredundant, experimentally elucidated metabolic pathways. MetaCyc contains more than 1,200 pathways from more than 1,600 different organisms, and is curated from the scientific experimental literature. MetaCyc contains pathways involved in both primary and secondary metabolism, as well as associated compounds, enzymes, and genes.
Repository of raw sequencing data from next-generation sequencing platforms, including the Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos HeliScope, Complete Genomics and Pacific Biosciences SMRT. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Data submissions are welcome. This archive of high-throughput sequencing data is part of an international partnership of archives (INSDC) at NCBI, the European Bioinformatics Institute and the DNA Data Bank of Japan; data submitted to any of these three organizations are shared among them.