Sim4cc: a cross-species spliced alignment program.

Nucleic Acids Research | 2009

Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes problems in the earlier tools by incorporating three new features: universal spaced seeds, to increase sensitivity and allow comparisons between species at various evolutionary distances, and powerful splice signal models and evolutionarily-aware alignment techniques, to improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity compared to existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64,000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.
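The abstract credits spaced seeds with much of sim4cc's gain in sensitivity. The Python sketch below illustrates the general idea only; it is not sim4cc's code. It indexes a target sequence by a spaced seed, in which '1' positions must match and '0' positions are ignored, so a hit can survive mismatches at the don't-care positions. The pattern `1101100111` and the function names are hypothetical. The sketch does not reproduce sim4cc's actual universal seed or its hit-extension and splice-junction logic.

```python
# Minimal sketch of spaced-seed hit detection between two DNA sequences.
# The seed pattern is a hypothetical example, NOT the universal spaced seed
# used by sim4cc; hits are only reported, never extended into alignments.
from collections import defaultdict

SEED = "1101100111"  # hypothetical: '1' = position must match, '0' = don't care


def seed_key(seq: str, start: int, seed: str) -> str:
    """Concatenate the bases found at the seed's '1' positions."""
    return "".join(seq[start + i] for i, c in enumerate(seed) if c == "1")


def spaced_seed_hits(query: str, target: str, seed: str = SEED) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs that agree at every '1' position."""
    span = len(seed)
    # Index every window of the target (e.g. the genomic sequence) by its seed key.
    index = defaultdict(list)
    for j in range(len(target) - span + 1):
        index[seed_key(target, j, seed)].append(j)
    # Scan the query (e.g. the cDNA) and look up each window's key.
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(seed_key(query, i, seed), []):
            hits.append((i, j))
    return hits
```

For example, with substitutions at positions 2 and 6 (both '0' positions of the hypothetical pattern), `spaced_seed_hits("ACGTACGTAC", "ACATACATAC")` still reports the hit `(0, 0)`, while a contiguous seed of the same weight, `"1111111"`, finds none. This tolerance to scattered mismatches is why spaced seeds tend to help more as the compared species diverge.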

PubMed ID: 19429899

Associated grants

  • Agency: NLM NIH HHS, United States
    Id: R01 LM006845
  • Agency: NLM NIH HHS, United States
    Id: R01 LM006845-09
  • Agency: NLM NIH HHS, United States
    Id: R01 LM006845-10
  • Agency: NLM NIH HHS, United States
    Id: R01-LM006845

This is a list of tools and resources that we have found mentioned in this publication.


UCSC Genome Browser (tool)

RRID:SCR_005780

Portal to interactively visualize genomic data. Provides reference sequences and working draft assemblies for a collection of genomes, and access to the ENCODE and Neanderthal projects. Includes a collection of vertebrate and model organism assemblies and annotations, along with a suite of tools for viewing, analyzing and downloading data.

VEGA (tool)

RRID:SCR_007907

Central repository for high-quality, frequently updated manual annotation of finished vertebrate genome sequence. Human, mouse and zebrafish are in the process of being completely annotated, whereas for other species only specific genomic regions of particular biological interest are annotated. The majority of the annotation comes from the HAVANA group at the Wellcome Trust Sanger Institute. Users can BLAST, search for specific text, export, and download data. Genomes and details of the projects for each species are available through the homepages for human, mouse and zebrafish. The website is built upon code from the Ensembl (http://www.ensembl.org) project. Some Ensembl features are not available in Vega; from the user's point of view, perhaps the most significant of these is MartView. However, because Vega human and mouse data are included in Ensembl, they can be queried using Ensembl MartView. Vega contains annotation of the human MHC region in eight haplotypes and of the LRC region in three haplotypes. Vega also contains annotation of the Insulin-Dependent Diabetes (IDD) regions on non-reference mouse assemblies.

View all literature mentions