Integration of full-length transcriptomics and targeted metabolomics to identify benzylisoquinoline alkaloid biosynthetic genes in Corydalis yanhusuo.

Horticulture research | 2021

Corydalis yanhusuo W.T. Wang is a classic herb that is frequently used in traditional Chinese medicine and is efficacious in promoting blood circulation, enhancing energy, and relieving pain. Benzylisoquinoline alkaloids (BIAs) are the main bioactive ingredients in Corydalis yanhusuo. However, few studies have investigated the BIA biosynthetic pathway in C. yanhusuo, and the biosynthetic pathway of species-specific chemicals such as tetrahydropalmatine remains unclear. We performed full-length transcriptomic and metabolomic analyses to identify candidate genes that might be involved in BIA biosynthesis and identified a total of 101 full-length transcripts and 19 metabolites involved in the BIA biosynthetic pathway. Moreover, the contents of 19 representative BIAs in C. yanhusuo were quantified by classical targeted metabolomic approaches. Their accumulation in the tuber was consistent with the expression patterns of identified BIA biosynthetic genes in tubers and leaves, which reinforces the validity and reliability of the analyses. Full-length genes with similar expression or enrichment patterns were identified, and a complete BIA biosynthesis pathway in C. yanhusuo was constructed according to these findings. Phylogenetic analysis revealed a total of ten enzymes that may possess columbamine-O-methyltransferase activity, which is the final step for tetrahydropalmatine synthesis. Our results span the whole BIA biosynthetic pathway in C. yanhusuo. Our full-length transcriptomic data will enable further molecular cloning of enzymes and activity validation studies.

PubMed ID: 33423040

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


GenBank (tool)

RRID:SCR_002760

NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280,000 formally described species (Jan 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. It is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.

Pfam (tool)

RRID:SCR_004726

A database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Users can analyze protein sequences for Pfam matches, view Pfam family annotation and alignments, see groups of related families, look at the domain organization of a protein sequence, find the domains on a PDB structure, and query Pfam by keywords. There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high quality, manually curated families that may automatically generate a supplement using the ADDA database. These automatically generated entries are called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans (collections of Pfam-A entries which are related by similarity of sequence, structure or profile-HMM).

KOBAS (tool)

RRID:SCR_006350

Web server to identify statistically enriched pathways, diseases, and GO terms for a set of genes or proteins, using pathway, disease, and GO knowledge from multiple well-known databases. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). A standalone command-line version is also available.

HMDB (tool)

RRID:SCR_007712

Curated collection of human metabolite and human metabolism data which contains records for endogenous metabolites, with each metabolite entry containing detailed chemical, physical, biochemical, concentration, and disease information. This is further supplemented with thousands of NMR and MS spectra collected on purified reference metabolites.

MUSCLE (tool)

RRID:SCR_011812

Multiple sequence alignment method with reduced time and space complexity, offering high accuracy and high throughput. Data analysis service for multiple sequence comparison by log-expectation.

KEGG (tool)

RRID:SCR_012773

Integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information. In particular, gene catalogs in completely sequenced genomes are linked to higher-level systemic functions of cell, organism, and ecosystem. Analysis tools are also available. KEGG may be used as reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.

RSEM (tool)

RRID:SCR_013027

Software package for accurately quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data, with or without a reference genome.

MassLynx (tool)

RRID:SCR_014271

Software which can acquire, analyze, manage, and share mass spectrometry data. MassLynx controls any Waters mass spectrometry system, from sample and solvent management components to mass spectrometer and auxiliary detectors. The software can acquire nominal mass, exact mass, MS/MS and exact mass MS/MS data. The software system also maintains and consolidates all user sample data. Optional Application Manager programs provide additional information for specific MS analyses and data.

DIAMOND (tool)

RRID:SCR_016071

Software that performs sequence alignment for protein and translated DNA searches. Used for high-performance analysis of big sequence data, including protein-protein and DNA-protein searches.
