A haplotype-resolved genome assembly of Rhododendron vialii based on PacBio HiFi reads and Hi-C data.

Scientific Data | 2023

Rhododendron vialii (subgen. Azaleastrum) is an evergreen shrub with high ornamental value. Because of anthropogenic habitat fragmentation, the species was listed by China's Yunnan provincial government in 2021 as a plant species with extremely small populations (PSESP) requiring urgent protection. However, limited genomic resources hinder scientific understanding of the genetic threats the species currently faces. In this study, we assembled a high-quality haplotype-resolved genome of R. vialii based on PacBio HiFi long reads and Hi-C reads. The assembly contains two haploid genomes of 532.73 Mb and 521.98 Mb, with contig N50 lengths of 35.67 Mb and 34.70 Mb, respectively. About 99.92% of the assembled sequences were anchored to 26 pseudochromosomes, and the assembly includes 14 gapless chromosomes. Additionally, 60,926 protein-coding genes were identified, of which 93.82% were functionally annotated. This is the first reported genome of R. vialii, and we hope it will lay the foundations for further research into the conservation genomics and horticultural domestication of this ornamentally important species.
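
The contig N50 values above summarize assembly contiguity. The following is a minimal Python sketch of the usual definition, not the authors' tooling: sort the contig lengths from longest to shortest and return the length at which the running total first reaches half of the total assembly size. The function name `n50` and the toy lengths in the usage note are illustrative assumptions, not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float rounding.
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the total is 54, and 10 + 9 + 8 = 27 is the first running sum that reaches half of it.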

PubMed ID: 37438373

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.

BLASTN (tool)


Web application to search nucleotide databases using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast.
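
All three blastn algorithms start from short exact word matches ("seeds") shared by the query and a database sequence, which are then extended into alignments. In the BLAST+ command-line tools, the `blastn` task defaults to a word size of 11 and `megablast` to 28. The Python sketch below shows only the seed-finding step, using a hash index of subject words. It is a simplified illustration, not NCBI's implementation, and omits extension, scoring and significance statistics. The function name `find_word_seeds` is hypothetical.

```python
from collections import defaultdict


def find_word_seeds(query, subject, word_size=11):
    """Return (query_pos, subject_pos) pairs where an identical word of
    length word_size occurs in both sequences (0-based positions)."""
    # Index every word of the subject by its start positions.
    index = defaultdict(list)
    for i in range(len(subject) - word_size + 1):
        index[subject[i:i + word_size]].append(i)
    # Scan the query and look up each of its words in the index.
    seeds = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], ()):
            seeds.append((q, s))
    return seeds
```

With a small word size for readability, `find_word_seeds("ACGTACGT", "TTACGTAA", word_size=4)` returns `[(0, 2), (1, 3), (3, 1), (4, 2)]`. A larger word size, as in megablast, yields fewer seeds and a faster search at the cost of sensitivity to distant matches.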

BLASTX (tool)


Web application to search protein databases using a translated nucleotide query. Translated BLAST services are useful when trying to find proteins homologous to a nucleotide coding region. BLASTX compares the translation products of the nucleotide query sequence against a protein database. Because BLASTX translates the query sequence in all six reading frames and provides combined significance statistics for hits to different frames, it is particularly useful when the reading frame of the query sequence is unknown or the query contains errors that may lead to frame shifts or other coding errors. BLASTX is therefore often the first analysis performed on a newly determined nucleotide sequence and is used extensively in analyzing EST sequences. This search is more sensitive than nucleotide BLAST because the comparison is performed at the protein level.
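
The six-frame translation that BLASTX performs on its query can be sketched in a few lines of Python. This sketch assumes the standard genetic code (NCBI translation table 1) and maps any codon containing an ambiguous base to `X`. The frame labels `+1..+3` (offsets on the given strand) and `-1..-3` (offsets on the reverse complement) are one common convention; tools differ in how they number frames. All function names here are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return a dict mapping frame labels to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frame_translation("ATGAAATAG")["+1"]` is `"MK*"`, and the `-1` frame, read from the reverse complement `CTATTTCAT`, is `"LFH"`. BLASTX then searches each translated frame against the protein database.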

eggNOG (tool)


A database of orthologous groups of genes. The orthologous groups are annotated with functional description lines (derived by identifying a common denominator for the genes based on their various annotations) and with functional categories derived from the original COG/KOG categories. As of January 30, 2014, eggNOG's database contained 1.7 million orthologous groups in 3,686 species, covering over 7.7 million proteins (built from 9.6 million proteins).

Pfam (tool)


A database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Users can analyze protein sequences for Pfam matches, view Pfam family annotation and alignments, see groups of related families, look at the domain organization of a protein sequence, find the domains on a PDB structure, and query Pfam by keywords. Pfam has two components: Pfam-A and Pfam-B. Pfam-A entries are high-quality, manually curated families; they are supplemented by automatically generated entries, derived from the ADDA database, called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans (collections of Pfam-A entries that are related by similarity of sequence, structure or profile HMM).
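
Pfam HMMs are often searched locally with HMMER's `hmmscan`, whose `--domtblout` option writes one whitespace-separated line per domain hit. The Python sketch below turns such a table into a per-protein domain architecture by sorting hits along each sequence. It assumes the HMMER 3 column layout (22 fixed fields followed by a free-text description; 0-based field 12 is the independent E-value, and fields 19 and 20 are the envelope start and end). Check this layout against the user guide for your HMMER version. The E-value cutoff is a hypothetical default; Pfam's own curated gathering thresholds can be applied instead with `hmmscan --cut_ga`.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    family: str      # Pfam family name (the hmmscan "target")
    accession: str   # Pfam accession, e.g. with version suffix
    start: int       # envelope start on the protein (1-based)
    end: int         # envelope end on the protein (inclusive)
    i_evalue: float  # independent E-value of this domain hit


def domain_architecture(domtblout_lines, max_i_evalue=1e-5):
    """Group hmmscan --domtblout hits by query protein and order them by
    envelope start, giving a simple domain architecture per protein."""
    proteins = {}
    for line in domtblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        # The description (last column) may contain spaces.
        fields = line.split(maxsplit=22)
        hit = DomainHit(family=fields[0], accession=fields[1],
                        start=int(fields[19]), end=int(fields[20]),
                        i_evalue=float(fields[12]))
        if hit.i_evalue <= max_i_evalue:
            proteins.setdefault(fields[3], []).append(hit)
    return {query: sorted(hits, key=lambda h: h.start)
            for query, hits in proteins.items()}
```

A real annotation pipeline would also resolve overlapping hits, for example by keeping the better-scoring domain within a clan. That step is omitted here.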

PANTHER (tool)


System that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence. The Orthologs view shows curated orthology relationships between genes for human, mouse, rat, fish, worm, and fly.

InterPro (tool)


Service providing functional analysis of proteins by classifying them into families and predicting domains and important sites. InterPro combines protein signatures from a number of member databases into a single searchable resource, capitalizing on their individual strengths to produce a powerful integrated database and diagnostic tool. This integrated database of predictive protein signatures is used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites, and adds in-depth annotation, including GO terms, to the protein signatures. The data can also be accessed programmatically via web services.

The member databases use a number of approaches: ProDom provides sequence clusters built from UniProtKB using PSI-BLAST; PROSITE patterns provide simple regular expressions; PROSITE and HAMAP profiles provide sequence matrices; PRINTS provides fingerprints, which are groups of aligned, unweighted position-specific sequence matrices (PSSMs); and PANTHER, PIRSF, Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY provide hidden Markov models (HMMs).

Contributions are welcome: the "Add your annotation" button on InterPro entry pages can be used to suggest updated or improved annotation for individual InterPro entries.
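
PROSITE patterns are a compact notation for regular expressions over amino acids. Elements are separated by `-`, `x` matches any residue, `[...]` lists allowed residues, `{...}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, `<` and `>` anchor the N- and C-terminus, and a final `.` ends the pattern. The Python sketch below translates this core syntax into a Python regular expression. It is an illustrative converter, not InterPro's or PROSITE's own code. It rejects rarer forms, such as `>` inside brackets, with a `ValueError`.

```python
import re

# One pattern element: residue, any (x), allowed set [..] or
# forbidden set {..}, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern to a Python regex string."""
    pattern = pattern.rstrip(".")
    regex = ""
    if pattern.startswith("<"):        # N-terminal anchor
        regex += "^"
        pattern = pattern[1:]
    anchored_end = pattern.endswith(">")  # C-terminal anchor
    if anchored_end:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            part = "."
        elif residue.startswith("{"):
            part = "[^" + residue[1:-1] + "]"
        else:
            part = residue  # single residue or [..] set is already regex
        if low is not None:
            part += "{" + low + ("," + high if high else "") + "}"
        regex += part
    if anchored_end:
        regex += "$"
    return regex
```

For example, `prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")` gives `"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H"`, which can be passed straight to `re.search` on a protein sequence.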

Augustus (tool)


Software for gene prediction in eukaryotic genomic sequences. Serves as a basis for further steps in the analysis of sequenced and assembled eukaryotic genomes.

RepeatMasker (tool)


Software tool that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently, over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines, including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).
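
The masking step described above can be illustrated with a short Python sketch. Given repeat intervals, it either hard-masks them with `N` (RepeatMasker's default behaviour) or soft-masks them in lowercase, the form RepeatMasker produces with its `-xsmall` option and that many gene predictors accept. The function name is hypothetical. Intervals are assumed to be 0-based and half-open; RepeatMasker's `.out` table reports 1-based inclusive positions, so subtract 1 from each start before calling.

```python
def mask_intervals(seq, intervals, soft=False):
    """Mask 0-based, half-open [start, end) intervals in seq.

    soft=False replaces masked bases with 'N' (hard masking);
    soft=True lowercases them instead (soft masking).
    """
    chars = list(seq)
    for start, end in intervals:
        if not 0 <= start <= end <= len(seq):
            raise ValueError(f"interval out of range: {(start, end)}")
        for i in range(start, end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)
```

For example, `mask_intervals("ACGTACGTAC", [(2, 5)])` returns `"ACNNNCGTAC"`, and with `soft=True` it returns `"ACgtaCGTAC"`. Soft masking keeps the underlying bases available to downstream tools that may still want to align across repeats.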

PASA (tool)


Gene structure annotation and analysis tool that uses spliced alignments of expressed transcript sequences to automatically model gene structures. It also incorporates gene structures based on transcript alignments into existing gene structure annotations. It is one component of a larger eukaryotic annotation pipeline implemented at the Broad Institute.
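
One ingredient of modeling gene structures from spliced alignments is reading intron boundaries off the gaps between aligned exon blocks. Each boundary can then be checked against the dominant canonical GT...AG splice-site dinucleotides. The Python sketch below shows that step for a single plus-strand alignment. It is a simplified, hypothetical illustration, not PASA's algorithm. PASA additionally assembles overlapping alignments into consistent transcript models and handles the minus strand and non-canonical splice sites. Exon blocks are assumed to be 0-based, half-open and sorted.

```python
def infer_introns(exons, genome):
    """Return (start, end, is_canonical) for each intron implied by
    consecutive exon blocks of a plus-strand spliced alignment.

    exons: sorted, non-overlapping (start, end) pairs, 0-based half-open.
    is_canonical: True if the intron begins with GT and ends with AG.
    """
    genome = genome.upper()
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start <= prev_end:
            raise ValueError("exon blocks must be sorted and non-overlapping")
        start, end = prev_end, next_start  # the gap between the blocks
        canonical = genome[start:start + 2] == "GT" and genome[end - 2:end] == "AG"
        introns.append((start, end, canonical))
    return introns
```

For example, with `genome = "AAAGTCCCAGTTT"` and exon blocks `[(0, 3), (10, 13)]`, the function reports one intron, `(3, 10, True)`, because the gap starts with `GT` and ends with `AG`.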

StringTie (tool)


Software application for assembling RNA-seq alignments into potential transcripts, enabling improved reconstruction of a transcriptome from RNA-seq reads. This transcript assembly and quantification program is implemented in C++.
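
Among other values, StringTie reports transcript abundance as TPM (transcripts per million). The Python sketch below shows the standard TPM arithmetic from read counts and transcript lengths: divide each count by the transcript length in kilobases, then rescale so that all values sum to one million. This is the generic definition, offered for illustration. StringTie derives its estimates from read coverage during assembly rather than from this simple count-based formula. The function name is hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and lengths in bp."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("transcript lengths must be positive")
    # Reads per kilobase: corrects for longer transcripts drawing more reads.
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("at least one transcript needs a nonzero count")
    # Rescale so the values sum to one million across the sample.
    return [r / total * 1e6 for r in rates]
```

For example, `tpm([10, 20], [1000, 2000])` gives 500,000 for each transcript: the second transcript has twice the reads but is also twice as long, so both have the same per-kilobase rate.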

Semi-Manual Alignment to Reference Templates (tool)


Software tool that extends WholeBrain framework in R for segmenting and registering experimental images to Allen Mouse Common Coordinate Framework (CCF). Streamlines processing of large volumetric LSFM datasets and solves issues with non-uniform morphing across anterior-posterior axis with interactive “choice game.” Accounts for duplicate cell counts in adjacent z images and presents new ways to easily parse apart and interactively visualize final mapped datasets.