Construction of representative transcript and protein sets of human, mouse, and rat as a platform for their transcriptome and proteome analysis.

Genomics | 2004

The number of mammalian transcripts identified by full-length cDNA projects and genome sequencing projects is increasing remarkably. Clustering them into a strictly nonredundant and comprehensive set provides a platform for functional analysis of the transcriptome and proteome, but the quality of the clustering and predictive usefulness have previously required manual curation to identify truncated transcripts and inappropriate clustering of closely related sequences. A Representative Transcript and Protein Sets (RTPS) pipeline was previously designed to identify the nonredundant and comprehensive set of mouse transcripts based on clustering of a large mouse full-length cDNA set (FANTOM2). Here we propose an alternative method that is more robust, requires less manual curation, and is applicable to other organisms in addition to mouse. RTPSs of human, mouse, and rat have been produced by this method and used for validation. Their comprehensiveness and quality are discussed by comparison with other clustering approaches. The RTPSs are available at .

PubMed ID: 15533708

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

None

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


Database of Transcribed Sequences (tool)

RRID:SCR_002334

DoTS (Database Of Transcribed Sequences) is a human and mouse transcript index created from all publicly available transcript sequences. The input sequences are clustered and assembled to form the DoTS Consensus Transcripts that comprise the index. These transcripts are assigned stable identifiers of the form DT.123456 (and are often referred to as dots). The transcripts are in turn clustered to form putative DoTS Genes. These are assigned stable identifiers of the form DG.1234356. As of September 1, 2004, the DoTS annotation team has manually annotated 43,164 human and 78,054 mouse DoTS Transcripts (DTs), corresponding to 3,939 human and 7,752 mouse DoTS Genes (DGs). Use the manually annotated gene query to see the DoTS Transcripts that have been manually annotated. The focus of the DoTS project is integrating the various types of data (e.g., EST sequences, genomic sequence, expression data, functional annotation) in a structured manner that facilitates sophisticated queries that are otherwise not easy to perform. DoTS is built on the GUS Platform, which includes a relational database that uses controlled vocabularies and ontologies to ensure that biologically meaningful queries can be posed in a uniform fashion. An easy way to start using the site is to search for DoTS Transcripts using an existing cDNA or mRNA sequence. Click on the BLAST tab at the top of the page and enter your sequence in the form provided. All the transcripts with significant sequence similarity to your query sequence will be displayed. Alternatively, use one of the provided queries to retrieve transcripts using a number of criteria. These queries are listed on the query page, which can also be reached by clicking on the tab marked query at the top of the page. Finally, the Boolean query page allows these queries to be combined in a variety of ways. Sponsors: Funding provided by NIH grant RO1-HG-01539-03 and DOE grant DE-FG02-00ER62893.
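
The identifier forms described above (DT.… for transcripts, DG.… for genes) could be recognized with a small parser such as the following. This is a minimal sketch, not part of DoTS itself: the function name `parse_dots_id` is hypothetical, and the pattern assumes only a two-letter prefix, a dot, and a run of digits, leaving the digit count open because the two examples in the description differ in length.

```python
import re

# Assumption: a DoTS identifier is "DT" or "DG", a dot, then one or more digits.
# The number of digits is not fixed here, since the description does not state it.
DOTS_ID = re.compile(r"^(DT|DG)\.(\d+)$")


def parse_dots_id(identifier):
    """Return (kind, number) for a DoTS-style identifier, or None if it does not match."""
    match = DOTS_ID.match(identifier.strip())
    if match is None:
        return None
    # DT marks a consensus transcript, DG a putative gene.
    kind = "transcript" if match.group(1) == "DT" else "gene"
    return kind, int(match.group(2))
```

Calling `parse_dots_id("DT.123456")` would return a transcript entry, while a string that lacks either prefix returns `None`.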

UniGene (tool)

RRID:SCR_004405

THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 11, 2023. Web tool that provided an organized view of the transcriptome. Each entry was a collection of the computationally identified transcripts from the same locus, with information on protein similarities, gene expression, cDNA clones, and genomic location. The system automatically partitioned GenBank sequences into a nonredundant set of gene-oriented clusters.

BLAT (tool)

RRID:SCR_011919

Software designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
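
To make the stated threshold concrete, the sketch below checks whether two sequences meet a criterion of at least 95% identity over at least 25 bases. It is an illustration only and does not reproduce how BLAT works: it assumes the two sequences are already aligned, ungapped, and of equal length, and the function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions that match between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    # Compare case-insensitively, position by position.
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(a, b, min_identity=95.0, min_length=25):
    """True if the pair is long enough and similar enough under the assumed criterion."""
    return (
        len(a) >= min_length
        and len(a) == len(b)
        and percent_identity(a, b) >= min_identity
    )
```

With a 40-base sequence, a single mismatch gives 97.5% identity and passes, while three mismatches give 92.5% and fail.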
