Haplotype estimation using sequencing reads.

American Journal of Human Genetics | 2013

High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygous genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22%, from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb reads with 4%-15% error per base), phasing performance was substantially improved.
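To make the idea of a phase-informative read concrete, here is a minimal sketch of how base quality scores can enter a likelihood for a candidate pair of haplotypes. It is not the SHAPEIT2 model. It assumes biallelic sites coded 0/1, a read that records an observed allele and a Phred base quality at each heterozygous site it covers, an equal chance of the read coming from either haplotype, and a simplified error model in which a call mismatches its source haplotype with probability 10^(-Q/10). The names `phred_to_error`, `prob_read_from_hap`, and `read_likelihood` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical read representation:
# heterozygous-site index -> (observed allele 0/1, Phred base quality)
Read = Dict[int, Tuple[int, int]]


def phred_to_error(q: int) -> float:
    """A Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def prob_read_from_hap(read: Read, hap: Sequence[int]) -> float:
    """P(read | read was sequenced from hap), assuming independent base errors."""
    p = 1.0
    for site, (allele, q) in read.items():
        e = phred_to_error(q)
        # Simplified biallelic error model (assumption): the call matches its
        # source haplotype with probability 1 - e, otherwise shows the other allele.
        p *= (1.0 - e) if hap[site] == allele else e
    return p


def read_likelihood(read: Read, hap1: Sequence[int], hap2: Sequence[int]) -> float:
    """P(read | hap1, hap2): the read comes from either haplotype with probability 1/2."""
    return 0.5 * prob_read_from_hap(read, hap1) + 0.5 * prob_read_from_hap(read, hap2)


if __name__ == "__main__":
    read = {0: (0, 30), 1: (0, 30)}  # both calls show allele 0 at Q30
    print(read_likelihood(read, [0, 0], [1, 1]))  # phasing 00/11
    print(read_likelihood(read, [0, 1], [1, 0]))  # phasing 01/10
```

With two Q30 calls that both show allele 0, the phasing 00/11 scores about 0.499 and the phasing 01/10 about 0.001, so the read strongly favors the first. A read that covers only one heterozygous site scores exactly 0.5 under both phasings, which is why only reads spanning two or more heterozygous sites carry phase information.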

PubMed ID: 24094745

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: Wellcome Trust, United Kingdom
    Id: 090532
  • Agency: Medical Research Council, United Kingdom
    Id: G0801823

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


BEAGLE (tool)

RRID:SCR_001789

Software package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. BEAGLE can:

  • phase genotype data (i.e. infer haplotypes) for unrelated individuals, parent-offspring pairs, and parent-offspring trios;
  • infer sporadic missing genotype data;
  • impute ungenotyped markers that have been genotyped in a reference panel;
  • perform single-marker and haplotypic association analysis;
  • detect genetic regions that are homozygous-by-descent in an individual or identical-by-descent in pairs of individuals.

BEAGLE can also be used in conjunction with PRESTO, a program for fast and flexible permutation testing. PRESTO can compute empirical distributions of order statistics, analyze stratified data, and determine significance levels for one-stage and two-stage genetic association studies. BEAGLE is written in Java and runs on any computing platform with a Java version 1.6 interpreter (e.g. Windows, Unix, Linux, Solaris, Mac).
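Part of what makes parent-offspring trios informative for phasing is Mendelian transmission: parental genotypes often fix which allele a child received from each parent. The sketch below applies that rule at a single biallelic site. It is not BEAGLE's statistical model; the function name `phase_child_site`, the alt-allele-count coding (0, 1, 2), and the choice to return `None` for ambiguous sites are assumptions for illustration.

```python
from typing import Optional, Tuple

# Alleles a parent can transmit, given its genotype as an alt-allele count.
TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def phase_child_site(mother: int, father: int, child: int) -> Optional[Tuple[int, int]]:
    """Return the child's (maternal, paternal) alleles at one biallelic site.

    Genotypes are alt-allele counts (0, 1, 2). Returns None when the phase is
    ambiguous (mother, father, and child all heterozygous) and raises
    ValueError when the genotypes violate Mendelian inheritance.
    """
    # Every (maternal, paternal) allele combination consistent with the child.
    candidates = {
        (m, p)
        for m in TRANSMISSIBLE[mother]
        for p in TRANSMISSIBLE[father]
        if m + p == child
    }
    if not candidates:
        raise ValueError("genotypes are Mendelian-inconsistent")
    if len(candidates) > 1:
        return None  # both orientations possible: trio alone cannot phase this site
    return next(iter(candidates))


if __name__ == "__main__":
    sites = [(0, 1, 1), (2, 1, 1), (1, 1, 1)]  # (mother, father, child)
    print([phase_child_site(*s) for s in sites])  # [(0, 1), (1, 0), None]
```

Sites where all three individuals are heterozygous remain unphased by this rule, which is one reason statistical phasing methods are still applied to trio data.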

VCFtools (tool)

RRID:SCR_012092

Software package for working with VCF files. It provides easily accessible methods for working with complex genetic variation data in the form of VCF files, implements various utilities for processing Variant Call Format files (including validation, merging, and comparing), and provides a general Perl API.
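Phased haplotypes are commonly stored in VCF, where the genotype (GT) subfield separates alleles with `|` when phased and `/` when unphased. The following plain-Python sketch reads the GT of one sample from a VCF data line; it does not use VCFtools or its Perl API. It assumes a tab-delimited VCF 4.x data line whose FORMAT column (the ninth) contains `GT`; the name `parse_gt` is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_gt(vcf_line: str, sample_index: int = 0) -> Tuple[List[Optional[int]], bool]:
    """Return (allele indices, is_phased) for one sample on a VCF data line.

    Columns 1-9 are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT;
    sample columns follow. Missing alleles ('.') become None.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    gt = sample_values[fmt_keys.index("GT")]  # raises ValueError if GT is absent
    # '|' separates phased alleles, '/' separates unphased ones.
    phased = "|" in gt
    alleles = [None if a == "." else int(a) for a in gt.replace("|", "/").split("/")]
    return alleles, phased


if __name__ == "__main__":
    line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3\tGT:GQ\t0|1:48\t1/1:43\t./.:."
    for i in range(3):
        print(parse_gt(line, i))
```

In this sketch a haploid call such as `1` is reported as unphased, since it contains no `|`; a real pipeline may want to treat haploid calls separately.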

SAM format (tool)

RRID:SCR_012093

A generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms.
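Read-based phasing consumes exactly the fields a SAM record carries: aligned sequence, base qualities, pairing flags, and template length. The sketch below parses the 11 mandatory tab-separated fields of a plain-text SAM alignment line and decodes QUAL, which stores each Phred base quality as an ASCII character offset by 33. It assumes uncompressed SAM (not BAM), ignores optional `TAG:TYPE:VALUE` fields, and the names `SamRecord`, `parse_sam_line`, and `is_paired` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int   # observed template length, related to insert size
    seq: str
    qual: Optional[List[int]]  # Phred scores; None when QUAL is '*'


def parse_sam_line(line: str) -> SamRecord:
    """Parse the 11 mandatory fields of a SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    # QUAL encodes each Phred base quality as chr(Q + 33).
    qual = None if f[10] == "*" else [ord(c) - 33 for c in f[10]]
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], qual)


def is_paired(rec: SamRecord) -> bool:
    # FLAG bit 0x1: the template has multiple segments (e.g. paired-end reads).
    return bool(rec.flag & 0x1)


if __name__ == "__main__":
    rec = parse_sam_line("r002\t0\tref\t9\t30\t3M\t*\t0\t0\tAAA\tII#")
    print(rec.qual, is_paired(rec))  # [40, 40, 2] False
```

Decoded qualities of this kind are what a quality-aware read model would convert into per-base error probabilities.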

VCFtools (tool)

RRID:SCR_001235

Software package for working with VCF files. It provides easily accessible methods for working with complex genetic variation data in the form of VCF files, implements various utilities for processing Variant Call Format files (including validation, merging, and comparing), and provides a general Perl API.
