Insertions and deletions (indels) are important types of structural variations. Obtaining accurate genotypes of indels may facilitate further genetic study. There are a few existing methods for calling indel genotypes from sequence reads. However, none of these tools can accurately call indel genotypes for indels of all lengths, especially for low coverage sequence data. In this paper, we present GINDEL, an approach for calling genotypes of both insertions and deletions from sequence reads. GINDEL uses a machine learning approach which combines multiple features extracted from next generation sequencing data. We test our approach on both simulated and real data and compare with existing tools, including Genome STRiP, Pindel and Clever-sv. Results show that GINDEL works well for deletions larger than 50 bp on both high and low coverage data. Also, GINDEL performs well for insertion genotyping on both simulated and real data. For comparison, Genome STRiP performs less well for shorter deletions (50-200 bp) on both simulated and real sequence data from the 1000 Genomes Project. Clever-sv performs well for intermediate deletions (200-1500 bp) but is less accurate when coverage is low. Pindel performs well only on high coverage data. To summarize, we show that GINDEL not only can call genotypes of insertions and deletions (both short and long) for high and low coverage population sequence data, but also is more accurate and efficient than other approaches. The program GINDEL can be downloaded at: http://sourceforge.net/p/gindel.
PubMed ID: 25423315
International collaboration producing an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts, in an effort to provide a foundation for investigating the relationship between genotype and phenotype. The genomes of about 2500 unidentified people from about 25 populations around the world were sequenced using next-generation sequencing technologies. Redundant sequencing of the same samples on various platforms and by different groups of scientists can be compared. The results of the study are freely and publicly accessible to researchers worldwide. The consortium identified the following populations whose DNA will be sequenced: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States. The goal of the Project is to find most genetic variants that have frequencies of at least 1% in the populations studied. Sequencing is still too expensive to deeply sequence the many samples being studied for this project. However, any particular region of the genome generally contains a limited number of haplotypes, so data can be combined across many samples to allow efficient detection of most of the variants in a region. The Project currently plans to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes for each sample that were not seen directly by the light sequencing.
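The trade-off between light per-sample sequencing and pooling across samples can be illustrated with a small calculation. The following is only a rough sketch under simplifying assumptions that are not part of the Project's description: read depth at a site is taken to be Poisson-distributed with mean equal to the coverage, each read from a heterozygous carrier is assumed to show the variant allele with probability 0.5, and sequencing errors are ignored. The function names are hypothetical.

```python
import math


def prob_no_alt_reads(mean_depth, alt_fraction=0.5):
    """Probability that a single carrier shows zero variant-supporting reads.

    Assumption: total depth ~ Poisson(mean_depth), and each read carries the
    variant allele independently with probability alt_fraction, so the number
    of variant reads is Poisson(mean_depth * alt_fraction).
    """
    return math.exp(-mean_depth * alt_fraction)


def prob_at_least_k_alt_reads(mean_depth, n_carriers, k=2, alt_fraction=0.5):
    """Probability of seeing at least k variant reads pooled over n_carriers.

    Assumption: carriers are independent, so pooled variant reads are
    Poisson with rate mean_depth * alt_fraction * n_carriers.
    """
    lam = mean_depth * alt_fraction * n_carriers
    below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    # One heterozygous sample at 4X: the variant allele is often not seen at all.
    print("P(variant unseen in one carrier):", prob_no_alt_reads(4.0))
    # About 50 carriers is roughly what a 1% allele frequency gives in 2500 samples.
    print("P(>=2 variant reads across 50 carriers):",
          prob_at_least_k_alt_reads(4.0, 50))
```

Under these assumptions a single heterozygous sample at 4X shows no variant-supporting read about 13.5% of the time (exp(-2)), which is why individual genotypes are incomplete, while the pooled evidence across roughly 50 carriers is almost certain to reveal the variant.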
All samples from the 1000 Genomes Project are available as lymphoblastoid cell lines (LCLs) and LCL-derived DNA from the Coriell Cell Repository as part of the NHGRI Catalog. The sequence and alignment data generated by the 1000 Genomes Project are made available as quickly as possible via its mirrored FTP sites: ftp://ftp.1000genomes.ebi.ac.uk and ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes
An independent federal agency created by Congress to promote the progress of science; to advance the national health, prosperity, and welfare; and to secure the national defense. It is the funding source for approximately 20 percent of all federally supported basic research conducted by America's colleges and universities. In many fields such as mathematics, computer science and the social sciences, NSF is the major source of federal backing. NSF leadership has two major components: a director who oversees NSF staff and management responsible for program creation and administration, merit review, planning, budget and day-to-day operations; and a 24-member National Science Board (NSB) of eminent individuals that meets six times a year to establish the overall policies of the foundation. The director and all Board members serve six-year terms. Each of them, as well as the NSF deputy director, is appointed by the President of the United States and confirmed by the U.S. Senate. At present, NSF has a total workforce of about 2,100 at its Arlington, Va., headquarters, including approximately 1,400 career employees, 200 scientists from research institutions on temporary duty, 450 contract workers and the staff of the NSB office and the Office of the Inspector General. NSF is the only federal agency whose mission includes support for all fields of fundamental science and engineering, except for medical sciences. It is tasked with keeping the United States at the leading edge of discovery in areas from astronomy to geology to zoology. So, in addition to funding research in the traditional academic areas, the agency also supports high-risk, high pay-off ideas, novel collaborations and numerous projects that may seem like science fiction today, but which the public will take for granted tomorrow.
And in every case, it ensures that research is fully integrated with education so that today's revolutionary work will also be training tomorrow's top scientists and engineers. NSF's task of identifying and funding work at the frontiers of science and engineering is not a top-down process.