Sim4cc: a cross-species spliced alignment program.

Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and poor splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes problems in the earlier tools by incorporating three new features: universal spaced seeds, which increase sensitivity and allow comparisons between species at various evolutionary distances; powerful splice signal models; and evolutionarily-aware alignment techniques. The latter two improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity than existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64,000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.
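The abstract names spaced seeds as the source of the sensitivity gain but does not describe them. As a rough illustration of the general idea only (not sim4cc's implementation), the sketch below indexes a genomic sequence by a seed pattern in which '1' positions must match and '0' positions are ignored, then looks up cDNA windows against that index. The pattern "1101011" and the function names spaced_key, index_spaced_seeds and find_seed_hits are hypothetical choices made for this example; the seeds actually used by sim4cc are not given in this record.

```python
from collections import defaultdict


def spaced_key(seq, start, pattern):
    """Return the bases of seq at the '1' positions of pattern, starting at start."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


def index_spaced_seeds(genome, pattern):
    """Map each spaced key to the list of genome positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(genome) - len(pattern) + 1):
        index[spaced_key(genome, pos, pattern)].append(pos)
    return index


def find_seed_hits(cdna, genome, pattern):
    """Return (cdna_pos, genome_pos) pairs whose windows agree at all '1' positions."""
    index = index_spaced_seeds(genome, pattern)
    hits = []
    for cpos in range(len(cdna) - len(pattern) + 1):
        key = spaced_key(cdna, cpos, pattern)
        for gpos in index.get(key, ()):
            hits.append((cpos, gpos))
    return hits


# Illustrative pattern (an assumption, not taken from sim4cc):
# positions 2 and 4 are "don't care", so substitutions there are tolerated.
PATTERN = "1101011"
```

With this pattern, a cDNA window that differs from the genome only at the ignored positions still produces a hit, whereas a contiguous seed of the same length ("1111111") would miss it. That tolerance for scattered substitutions is the usual reason spaced seeds are preferred when comparing diverged sequences.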

PubMed ID: 19429899


  • Zhou L
  • Pertea M
  • Delcher AL
  • Florea L


Nucleic Acids Research

Publication Data

June 23, 2009

Associated Grants

  • Agency: NLM NIH HHS, Id: R01 LM006845
  • Agency: NLM NIH HHS, Id: R01 LM006845-09
  • Agency: NLM NIH HHS, Id: R01 LM006845-10
  • Agency: NLM NIH HHS, Id: R01-LM006845

MeSH Terms

  • Algorithms
  • Animals
  • Dogs
  • Genome, Plant
  • Genomics
  • Humans
  • Mice
  • RNA Splicing
  • Reference Standards
  • Sequence Alignment
  • Software
  • Vertebrates