Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.

Nature Communications | 2015

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.
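
The abstract reports imputation accuracy separately for rare and low-frequency variants, that is, stratified by minor allele frequency (MAF). Below is a minimal Python sketch of such a stratified evaluation. It assumes that diploid genotypes are coded as 0/1/2 copies of the alternate allele and that imputed values are expected allele dosages between 0 and 2. It also assumes that accuracy is measured as the squared Pearson correlation (r²) between dosage and true genotype. r² is a common metric in imputation studies, but it is an assumption here and not necessarily the exact measure used in the paper. The function names, bin edges and toy genotypes are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from diploid genotypes coded as 0, 1 or 2 alternate alleles."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def dosage_r2(true_gt: Sequence[int], dosage: Sequence[float]) -> Optional[float]:
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns None when either vector is constant, because r² is undefined then
    (for example, a variant that is monomorphic in the evaluation sample).
    """
    n = len(true_gt)
    mx = sum(true_gt) / n
    my = sum(dosage) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_gt, dosage))
    sxx = sum((x - mx) ** 2 for x in true_gt)
    syy = sum((y - my) ** 2 for y in dosage)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


def mean_r2_by_maf_bin(
    variants: List[Tuple[List[int], List[float]]],
    edges: List[float],
) -> List[Optional[float]]:
    """Mean r² per MAF bin (edges[i], edges[i + 1]]; None for empty bins.

    Hypothetical helper: each variant is a (true genotypes, imputed dosages) pair.
    """
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for true_gt, dosage in variants:
        r2 = dosage_r2(true_gt, dosage)
        if r2 is None:
            continue  # skip variants where r² cannot be computed
        maf = minor_allele_frequency(true_gt)
        for i in range(len(edges) - 1):
            if edges[i] < maf <= edges[i + 1]:
                sums[i] += r2
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy data for 10 individuals (invented values, for illustration only).
rare = ([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0.9, 0.1, 0, 0, 0, 0, 0, 0, 0.2, 0])
common = ([0, 1, 2, 1, 0, 1, 2, 0, 1, 1], [0.1, 1.0, 1.8, 1.2, 0.0, 0.9, 2.0, 0.3, 1.1, 0.8])
print(mean_r2_by_maf_bin([rare, common], [0.0, 0.01, 0.1, 0.5]))
```

With only ten toy individuals the bins are necessarily coarse. An evaluation aimed at the 0.1% MAF range mentioned in the abstract would need far larger samples and finer bin edges, such as 0.001, 0.005 and 0.01.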

PubMed ID: 26368830

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: Wellcome Trust, United Kingdom
    Id: 095564
  • Agency: Medical Research Council, United Kingdom
    Id: MC_UU_12015/1
  • Agency: Wellcome Trust, United Kingdom
    Id: 096599
  • Agency: Wellcome Trust, United Kingdom
    Id: 098498
  • Agency: British Heart Foundation, United Kingdom
    Id: PG/13/66/30442
  • Agency: Wellcome Trust, United Kingdom
    Id: 095515
  • Agency: Medical Research Council, United Kingdom
    Id: MC_UU_12013/4
  • Agency: Medical Research Council, United Kingdom
    Id: MC_UU_12013/3
  • Agency: Medical Research Council, United Kingdom
    Id: MC_UU_12012/5
  • Agency: Wellcome Trust, United Kingdom
    Id: WT098051
  • Agency: Medical Research Council, United Kingdom
    Id: MR/L010305/1
  • Agency: CIHR, Canada
  • Agency: Department of Health, United Kingdom
  • Agency: Medical Research Council, United Kingdom
    Id: MC_UU_12013/1
  • Agency: British Heart Foundation, United Kingdom
    Id: RG/10/17/28553
  • Agency: Medical Research Council, United Kingdom
    Id: G0800509
  • Agency: Wellcome Trust, United Kingdom
    Id: 102215
  • Agency: British Heart Foundation, United Kingdom
    Id: RG/10/13/28570
  • Agency: Medical Research Council, United Kingdom
    Id: MC_PC_15018
  • Agency: Wellcome Trust, United Kingdom
    Id: WT091310
  • Agency: Wellcome Trust, United Kingdom
    Id: 098497
  • Agency: Wellcome Trust, United Kingdom
    Id: 100574
  • Agency: Wellcome Trust, United Kingdom
    Id: 100140
  • Agency: Wellcome Trust, United Kingdom
    Id: 091551

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


IMPUTE2 (tool)

RRID:SCR_013055

A computer program for phasing observed genotypes and imputing missing genotypes.
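
IMPUTE2 is based on a hidden Markov model in which each study haplotype is modelled as an imperfect mosaic of reference haplotypes. The Python sketch below is far simpler and is not IMPUTE2's algorithm. It only illustrates the basic idea behind reference-panel imputation. For a phased study haplotype observed at some sites, the sketch picks the reference haplotype with the fewest mismatches at those typed sites. It then copies that haplotype's alleles into the untyped sites. The function name, the 0/1 allele coding and the toy panel are hypothetical.

```python
from typing import List, Optional, Sequence

Haplotype = List[int]  # hypothetical coding: 0 = reference allele, 1 = alternate allele


def impute_by_best_match(
    study: Sequence[Optional[int]],
    panel: Sequence[Haplotype],
) -> Haplotype:
    """Fill untyped (None) sites in a phased study haplotype by copying the
    reference haplotype with the fewest mismatches at the typed sites.

    Toy illustration only: one best match is used across the whole region,
    with no recombination and no uncertainty (unlike HMM-based imputation).
    """
    typed = [i for i, allele in enumerate(study) if allele is not None]

    def mismatches(ref: Haplotype) -> int:
        return sum(ref[i] != study[i] for i in typed)

    best = min(panel, key=mismatches)  # ties go to the earliest panel haplotype
    return [best[i] if allele is None else allele for i, allele in enumerate(study)]


panel = [
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
study = [1, None, 0, None, 0]  # sites 1 and 3 are untyped
print(impute_by_best_match(study, panel))  # prints [1, 1, 0, 1, 0]
```

Because a single best match is used for the whole region, this toy ignores recombination and gives no measure of uncertainty. HMM-based methods instead let the copied reference haplotype switch along the chromosome, and they output genotype probabilities.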

International HapMap Project (tool)

RRID:SCR_002846

This resource is no longer in service (documented August 22, 2016). A multi-country collaboration among scientists and funding agencies to develop a public resource where genetic similarities and differences in human beings are identified and catalogued. Using this information, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. All of the information generated by the Project will be released into the public domain. Their goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. Public and private organizations in six countries are participating in the International HapMap Project. Data generated by the Project can be downloaded with minimal constraints. HapMap project related data, software, and documentation include: bulk data on genotypes, frequencies, LD data, phasing data, allocated SNPs, recombination rates and hotspots, SNP assays, Perlegen amplicons, raw data, inferred genotypes, and mitochondrial and chrY haplogroups; Generic Genome Browser software; protocols and information on assay design, genotyping and other protocols used in the project; and documentation of samples/individuals and the XML format used in the project.

GATK (tool)

RRID:SCR_001876

A software package to analyze next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping and a strong emphasis on data quality assurance. Its architecture, processing engine and high-performance computing features are designed to scale to projects of any size. It is, first, a software library that makes it easy to write efficient analysis tools for next-generation sequencing data and, second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller and a local realigner. (entry from Genetic Analysis Software)

SAMTOOLS (tool)

RRID:SCR_002105

The original SAMtools package has been split into three separate repositories: Samtools, BCFtools and HTSlib. Samtools manipulates next-generation sequencing data and is used for reading, writing, editing, indexing and viewing nucleotide alignments in SAM, BAM and CRAM format. BCFtools is used for reading and writing BCF2, VCF and gVCF files and for calling, filtering and summarising SNP and short indel sequence variants. HTSlib is used for reading and writing high-throughput sequencing data.

UnifiedGenotyper (tool)

RRID:SCR_004710

A multiple-sample, technology-aware SNP and indel caller.

SAMtools/BCFtools (tool)

RRID:SCR_005227

Provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Picard (tool)

RRID:SCR_006525

Java toolset for working with next generation sequencing data in the BAM format.
