Publication

Full-length transcript sequencing accelerates the transcriptome research of Gymnocypris namensis, an iconic fish of the Tibetan Plateau.

Scientific Reports | 2020

Gymnocypris namensis, the only commercial fish in Namtso Lake in Tibet, China, is rated as a near-threatened species in the Red List of China's Vertebrates. As one of the highest-altitude schizothorax fish in China, G. namensis has strong adaptability to the harsh plateau environment. Although it is an indigenous economic fish with high research value, the biological characteristics, genetic diversity, and plateau adaptability of G. namensis remain unclear. Here, we used Pacific Biosciences single-molecule real-time long-read sequencing technology to generate full-length transcripts of G. namensis. Sequence clustering analysis and error correction with Illumina short reads yielded 319,044 polished isoforms. After removing redundant reads, 125,396 non-redundant isoforms were obtained. Among all transcripts, 103,286 were annotated in public databases. Natural selection has acted on 42 genes in G. namensis, which were enriched in the functions of mismatch repair and glutathione metabolism. In total, 89,736 open reading frames, 95,947 microsatellites, and 21,360 long non-coding RNAs were identified across all transcripts. This is the first transcriptome study of G. namensis using PacBio Iso-Seq. The acquisition of full-length transcript isoforms might accelerate transcriptome research on G. namensis and provide a basis for further research.

PubMed ID: 32541658

Research resources used in this publication

None found

Additional research tools detected in this publication

Antibodies used in this publication

None found

Associated grants

None

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.

Pfam (tool)

RRID:SCR_004726

A database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Users can analyze protein sequences for Pfam matches, view Pfam family annotation and alignments, see groups of related families, look at the domain organization of a protein sequence, find the domains on a PDB structure, and query Pfam by keywords. There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high quality, manually curated families that may automatically generate a supplement using the ADDA database. These automatically generated entries are called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans (collections of Pfam-A entries which are related by similarity of sequence, structure or profile-HMM).

Promega (tool)

RRID:SCR_006724

An antibody supplier.

OrthoMCL DB: Ortholog Groups of Protein Sequences (tool)

RRID:SCR_007839

OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families. OrthoMCL starts with reciprocal best hits within each genome as putative in-paralog/recent paralog pairs and reciprocal best hits across any two genomes as putative ortholog pairs. Related proteins are interlinked in a similarity graph. Then MCL (Markov Clustering algorithm, Van Dongen 2000; www.micans.org/mcl) is invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
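
The reciprocal-best-hit step described above can be illustrated with a small sketch. This is not OrthoMCL's code: it covers only the pairing of putative orthologs across two genomes and omits in-paralog detection, weight normalization, and MCL clustering. The function names, protein identifiers, and scores are hypothetical and chosen for illustration only.

```python
def best_hits(scores):
    """Map each query protein to its single highest-scoring subject.

    `scores` is a dict {(query, subject): similarity_score}; the layout is
    an assumption made for this sketch, not an OrthoMCL file format.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between proteins of genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 80}
b_vs_a = {("b1", "a1"): 190, ("b2", "a1"): 90, ("b2", "a2"): 75}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, a2's best hit is b2, but b2's best hit is a1, so only the a1/b1 pair is reciprocal.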

MISA (tool)

RRID:SCR_010765

Software tool that allows the identification and localization of perfect microsatellites as well as compound microsatellites which are interrupted by a certain number of bases.
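
As a rough illustration of what identifying perfect microsatellites involves, the following sketch scans a sequence with regular expressions. It is not MISA itself and does not handle compound microsatellites. The minimum repeat counts per motif length are assumed thresholds for this example, and the function name is hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of repeats.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_perfect_ssrs(seq, min_repeats=None):
    """Return sorted (start, end, motif, repeat_count) for perfect repeats.

    Coordinates are 0-based with an exclusive end.
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least n - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # since the shorter motif length already reports them.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            count = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, count))
    return sorted(hits)


example = "GGG" + "AT" * 7 + "CCGT" + "AGC" * 5 + "T"
ssrs = find_perfect_ssrs(example)
print(ssrs)
```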

KEGG (tool)

RRID:SCR_012773

Integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information. In particular, gene catalogs in completely sequenced genomes are linked to higher-level systemic functions of cell, organism, and ecosystem. Analysis tools are also available. KEGG may be used as reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.

Agilent Technologies (tool)

RRID:SCR_013575

Company provides laboratories worldwide with analytical instruments and supplies, clinical and diagnostic testing services, consumables, applications and expertise in life sciences and applied chemical markets.

TransDecoder (tool)

RRID:SCR_017647

Software tool to identify candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to a genome using Tophat and Cufflinks. It starts from a FASTA or GFF file. It can scan and retain open reading frames (ORFs) for homology to known proteins by using BlastP or a Pfam search and incorporate the results into the obtained selection. Predictions can then be visualized by using a genome browser such as IGV.
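
To make the idea of scanning transcripts for open reading frames concrete, here is a minimal sketch. It is not TransDecoder's implementation: it only reports ATG-to-stop ORFs on both strands and does no coding-potential scoring or homology search. The function names are hypothetical, and the default minimum length of 100 codons is an assumption for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orfs_in_strand(seq, min_codons):
    """Collect ATG...stop ORFs in the three reading frames of one strand."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                # Length counts codons from ATG up to, not including, the stop.
                if (i - start) // 3 >= min_codons:
                    found.append(seq[start:i + 3])
                start = None
    return found


def find_orfs(seq, min_codons=100):
    """Return ORF sequences found on the forward and reverse-complement strands."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return orfs_in_strand(seq, min_codons) + orfs_in_strand(reverse_complement, min_codons)


toy_transcript = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
short_orfs = find_orfs(toy_transcript, min_codons=3)
print(short_orfs)
```

The toy transcript is far shorter than a real one, so the example lowers `min_codons`; with the assumed default of 100 codons it would report nothing.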
