The data for the public working draft of the human genome contains roughly 400,000 initial sequence contigs in approximately 30,000 large-insert clones. Many of these initial sequence contigs overlap. A program, GigAssembler, was built to merge them and to order and orient the resulting larger sequence contigs based on mRNA, paired plasmid ends, EST, BAC end pairs, and other information. This program produced the first publicly available assembly of the human genome, a working draft containing roughly 2.7 billion base pairs and covering an estimated 88% of the genome. The assembly has been used for several recent studies of the genome. Here we describe the algorithm used by GigAssembler.
PubMed ID: 11544197
MeSH terms: Algorithms | Chromosomes, Artificial, Bacterial | Computational Biology | Contig Mapping | Expressed Sequence Tags | Genome, Human | Human Genome Project | Humans | RNA, Messenger | Repetitive Sequences, Nucleic Acid | Sequence Alignment | Software