Assessment of genotype imputation performance using 1000 Genomes in African American studies.

PLoS ONE | 2012

Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels. While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.
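The three performance metrics named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: all function names are hypothetical, best-guess genotypes are assumed to be coded 0/1/2, IQS is computed as Cohen's kappa across the three genotype classes (one reading of "concordance adjusted for chance agreement"), and the r2hat stand-in follows a MaCH-style ratio of observed dosage variance to the binomial variance 2p(1-p). IMPUTE2, BEAGLE, and MaCH each report their own variant of this quality estimate, so the numbers from this sketch will not exactly match any program's output.

```python
from collections import Counter
from math import nan
from typing import Iterable, Sequence, Tuple

# Assumption: best-guess genotypes are coded as 0, 1, or 2 copies of the
# alternate allele; dosages are expected allele counts in [0, 2].
GENOTYPE_CLASSES = (0, 1, 2)


def concordance(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Fraction of masked genotypes whose best-guess imputed call equals the true call."""
    if not true or len(true) != len(imputed):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(t == i for t, i in zip(true, imputed)) / len(true)


def iqs(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Concordance adjusted for chance agreement (Cohen's kappa over genotype classes)."""
    observed = concordance(true, imputed)
    n = len(true)
    true_counts, imputed_counts = Counter(true), Counter(imputed)
    # Agreement expected if imputed calls were independent of the true calls.
    chance = sum(true_counts[g] * imputed_counts[g] for g in GENOTYPE_CLASSES) / n**2
    if chance == 1.0:
        return nan  # both vectors monomorphic for the same genotype: kappa undefined
    return (observed - chance) / (1.0 - chance)


def estimated_r2(dosages: Sequence[float]) -> float:
    """Hypothetical MaCH-style r2hat: observed dosage variance over 2p(1-p)."""
    if not dosages:
        raise ValueError("expected at least one dosage")
    n = len(dosages)
    mean = sum(dosages) / n
    p = mean / 2.0  # estimated alternate-allele frequency
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:
        return nan  # monomorphic in the imputed data
    observed_var = sum((d - mean) ** 2 for d in dosages) / n
    return observed_var / expected_var


def mean_r2_above_maf(snps: Iterable[Tuple[float, float]],
                      maf_threshold: float = 0.02) -> float:
    """Average r2hat over (MAF, r2hat) pairs whose MAF is strictly above the threshold."""
    values = [r2 for maf, r2 in snps if maf > maf_threshold]
    return sum(values) / len(values) if values else nan


if __name__ == "__main__":
    true_calls = [0, 0, 1, 2, 1, 0]
    imputed_calls = [0, 0, 1, 1, 1, 0]
    print(round(concordance(true_calls, imputed_calls), 3))  # 0.833
    print(round(iqs(true_calls, imputed_calls), 3))          # 0.714
```

In this toy example IQS (5/7) falls below raw concordance (5/6) because part of the agreement on the common genotype would be expected by chance alone; that chance component grows as MAF shrinks, which is why IQS is described as especially informative for low-MAF SNPs. The `mean_r2_above_maf` helper mirrors the abstract's MAF>2% summary but is only an illustrative filter.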

PubMed ID: 23226329

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: NIDA NIH HHS, United States
    Id: 5R33DA027486
  • Agency: NIDA NIH HHS, United States
    Id: R01 DA025888
  • Agency: NIDA NIH HHS, United States
    Id: R33 DA027486
  • Agency: NIDA NIH HHS, United States
    Id: 5R01DA026141
  • Agency: NIDA NIH HHS, United States
    Id: R01 DA026141

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


International HapMap Project (tool)

RRID:SCR_002846

This resource is no longer in service (documented August 22, 2016). A multi-country collaboration among scientists and funding agencies to develop a public resource where genetic similarities and differences in human beings are identified and catalogued. Using this information, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. All of the information generated by the Project will be released into the public domain. Its goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. Public and private organizations in six countries are participating in the International HapMap Project. Data generated by the Project can be downloaded with minimal constraints. HapMap project related data, software, and documentation include: bulk data on genotypes, frequencies, LD data, phasing data, allocated SNPs, recombination rates and hotspots, SNP assays, Perlegen amplicons, raw data, inferred genotypes, and mitochondrial and chrY haplogroups; Generic Genome Browser software; protocols and information on assay design, genotyping, and other protocols used in the project; and documentation of samples/individuals and the XML format used in the project.

PLINK (tool)

RRID:SCR_001757

Open-source whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. Used for analysis of genotype/phenotype data. Through integration with gPLINK and Haploview, there is some support for subsequent visualization, annotation, and storage of results. PLINK 1.9 is an improved, second-generation version of the software.

BEAGLE (tool)

RRID:SCR_001789

Software package for analysis of large-scale genetic data sets with hundreds of thousands of markers genotyped on thousands of samples. BEAGLE can:
  • phase genotype data (i.e., infer haplotypes) for unrelated individuals, parent-offspring pairs, and parent-offspring trios;
  • infer sporadic missing genotype data;
  • impute ungenotyped markers that have been genotyped in a reference panel;
  • perform single-marker and haplotypic association analysis;
  • detect genetic regions that are homozygous-by-descent in an individual or identical-by-descent in pairs of individuals.

BEAGLE can also be used in conjunction with PRESTO, a program for fast and flexible permutation testing. PRESTO can compute empirical distributions of order statistics, analyze stratified data, and determine significance levels for one-stage and two-stage genetic association studies. BEAGLE is written in Java and runs on any computing platform with a Java version 1.6 interpreter (e.g., Windows, Unix, Linux, Solaris, Mac).

STRUCTURE (tool)

RRID:SCR_002151

Software package for using multilocus genotype data to investigate population structure. Used for inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. Can be applied to most commonly used genetic markers, including SNPs, microsatellites, RFLPs, and amplified fragment length polymorphisms (AFLPs).

KING (tool)

RRID:SCR_009251

Software toolset that uses the high-throughput SNP data typical of a genome-wide association study (GWAS) for applications such as family relationship inference and population structure identification (entry from Genetic Analysis Software).

MaCH-Admix (tool)

RRID:SCR_009598

Genotype imputation software that extends MaCH for faster and more flexible imputation, especially in admixed populations. It incorporates a novel piecewise reference selection method that creates reference panels tailored to the target individual(s). This reference selection method yields better imputation quality in shorter running time. MaCH-Admix also separates model parameter estimation from imputation. This separation allows users to perform imputation with standard reference panels and pre-calibrated parameters in a data-independent fashion. Alternatively, when working with study-specific reference panels or an isolated target population, users have the option to estimate these model parameters while performing imputation. MaCH-Admix includes many other useful options and supports VCF input files. All existing MaCH documentation applies to MaCH-Admix.

MACH (tool)

RRID:SCR_009621

Performs QTL analysis based on imputed dosages or posterior probabilities.

IMPUTE2 (tool)

RRID:SCR_013055

A computer program for phasing observed genotypes and imputing missing genotypes.