Comparing variant calling algorithms for target-exon sequencing in a large sample.

BMC bioinformatics | 2015

Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample sizes, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous, and sites with insufficient coverage may benefit from the more sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies through a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x coverage in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing.
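Genotype accuracy above is measured as concordance, either against GWAS array genotypes or between replicate pairs. The Python below is a minimal sketch of that calculation, not the paper's own pipeline. It assumes a hypothetical coding in which each genotype is an alternate-allele count (0, 1 or 2) and None marks a missing call; the function names are illustrative only.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical coding: number of alternate alleles
# (0 = hom-ref, 1 = het, 2 = hom-alt); None marks a missing call.
Genotype = Optional[int]


def genotype_concordance(calls_a: Sequence[Genotype],
                         calls_b: Sequence[Genotype]) -> Tuple[float, int]:
    """Return (fraction of identical genotypes, number of sites compared).

    Sites missing in either call set are skipped.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both call sets must cover the same sites in the same order")
    compared = matched = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        if a == b:
            matched += 1
    rate = matched / compared if compared else float("nan")
    return rate, compared


def nonref_concordance(calls_a: Sequence[Genotype],
                       calls_b: Sequence[Genotype]) -> float:
    """Concordance restricted to sites where at least one call is non-reference."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None and (a > 0 or b > 0)]
    if not pairs:
        return float("nan")
    return sum(1 for a, b in pairs if a == b) / len(pairs)


if __name__ == "__main__":
    sequencing = [0, 1, 2, 0, None, 1]
    gwas_array = [0, 1, 1, 0, 2, 1]
    print(genotype_concordance(sequencing, gwas_array))  # (0.8, 5)
    print(nonref_concordance(sequencing, gwas_array))
```

The same two functions apply unchanged to replicate pairs. Reporting the non-reference figure alongside the overall rate matters for rare variants: when most sites are homozygous reference in both call sets, overall concordance stays high even if many variant genotypes disagree.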

PubMed ID: 25884587

Associated grants

  • Agency: NEI NIH HHS, United States
    Id: EY022005
  • Agency: NEI NIH HHS, United States
    Id: R01 EY009859
  • Agency: NEI NIH HHS, United States
    Id: EY007003
  • Agency: NHGRI NIH HHS, United States
    Id: RC2 HG005552
  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG006513
  • Agency: NHGRI NIH HHS, United States
    Id: R01 HG005855
  • Agency: NHGRI NIH HHS, United States
    Id: U54HG003079
  • Agency: NHGRI NIH HHS, United States
    Id: HG005855
  • Agency: NHGRI NIH HHS, United States
    Id: HG006513
  • Agency: NEI NIH HHS, United States
    Id: EY016862
  • Agency: NEI NIH HHS, United States
    Id: R01 EY016862
  • Agency: NHGRI NIH HHS, United States
    Id: U54 HG003079
  • Agency: NHGRI NIH HHS, United States
    Id: HG005552
  • Agency: NEI NIH HHS, United States
    Id: EY09859
  • Agency: NHGRI NIH HHS, United States
    Id: HG007022
  • Agency: NHGRI NIH HHS, United States
    Id: R01 HG007022
  • Agency: NEI NIH HHS, United States
    Id: P30 EY007003
  • Agency: NEI NIH HHS, United States
    Id: F31 EY007003
  • Agency: Medical Research Council, United Kingdom
    Id: G0000067
  • Agency: NEI NIH HHS, United States
    Id: R01 EY022005

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions, see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


dbSNP (tool)

RRID:SCR_002338

A central repository database for both single-base nucleotide substitutions and short deletion and insertion polymorphisms. It distinguishes the report of how to assay a SNP from the use of that SNP with individuals and populations. This separation simplifies some issues of data representation. However, these initial reports describing how to assay a SNP will often be accompanied by SNP experiments measuring allele occurrence in individuals and populations. The community can contribute to this resource.


BWA (tool)

RRID:SCR_010910

Software for aligning sequencing reads against a large reference genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first is designed for sequence reads up to 100 bp, while the other two handle longer sequences ranging from 70 bp to 1 Mbp.


ANNOVAR (tool)

RRID:SCR_012821

An efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome builds hg18 and hg19, as well as mouse, worm, fly, yeast and many others). Given a list of variants with chromosome, start position, end position, reference nucleotide and observed nucleotides, ANNOVAR can perform: 1. gene-based annotation; 2. region-based annotation; 3. filter-based annotation; 4. other functionalities. (entry from Genetic Analysis Software)
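ANNOVAR's input is described above as five fields per variant: chromosome, start, end, reference and observed allele. The Python below is a minimal sketch of turning a VCF-style record into such a tab-separated line. It assumes the convention we understand ANNOVAR's own converter to use (deletions drop the VCF anchor base and use "-" as the observed allele; insertions use "-" as the reference with start equal to end); the function name is hypothetical, complex substitutions are deliberately rejected, and the output should be checked against the ANNOVAR documentation before real use.

```python
def vcf_to_avinput(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Convert one biallelic VCF-style record to a five-column ANNOVAR-style line.

    Columns: chromosome, start, end, reference allele, observed allele.
    Only SNVs and simple left-anchored indels are handled in this sketch.
    """
    if len(ref) == 1 and len(alt) == 1:
        # Single-nucleotide variant: start and end are the same base.
        start, end = pos, pos
    elif len(ref) > 1 and len(alt) == 1 and ref[0] == alt[0]:
        # Deletion: drop the shared anchor base; the deleted bases follow it.
        ref, alt = ref[1:], "-"
        start, end = pos + 1, pos + len(ref)
    elif len(ref) == 1 and len(alt) > 1 and ref[0] == alt[0]:
        # Insertion: drop the anchor base; no reference bases are consumed.
        ref, alt = "-", alt[1:]
        start, end = pos, pos
    else:
        raise ValueError("only SNVs and simple left-anchored indels are handled")
    return "\t".join([chrom, str(start), str(end), ref, alt])


if __name__ == "__main__":
    print(vcf_to_avinput("1", 100, "A", "G"))
    print(vcf_to_avinput("1", 100, "ACG", "A"))
    print(vcf_to_avinput("1", 100, "A", "ACG"))
```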


GLFSINGLE/GLFTRIO/GLFMULTIPLES (tool)

RRID:SCR_013128

A GLF-based variant caller for next-generation sequencing data. It takes one, three or multiple GLF-format genotype likelihood files as input and produces a set of variant calls in VCF format as output. (entry from Genetic Analysis Software)
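Multi-sample callers of this kind, like the population-based methods compared in the abstract, combine per-sample genotype likelihoods across all samples. The Python below is a minimal sketch of one standard way to do this, not the GLFMULTIPLES implementation: it estimates the alternate allele frequency by expectation-maximization under a Hardy-Weinberg prior, then calls each sample by its maximum posterior genotype. The input format (a tuple of P(reads | genotype) for 0, 1 and 2 alternate alleles per sample) and all function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed input: P(reads | genotype) for genotypes with 0, 1, 2 alt alleles.
Likelihoods = Tuple[float, float, float]


def hwe_prior(p: float) -> Tuple[float, float, float]:
    """Hardy-Weinberg genotype frequencies for alternate allele frequency p."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def estimate_alt_freq(gls: Sequence[Likelihoods],
                      iters: int = 100, tol: float = 1e-10) -> float:
    """EM estimate of the alternate allele frequency across all samples."""
    p = 0.01  # arbitrary starting value
    for _ in range(iters):
        prior = hwe_prior(p)
        expected_alt = 0.0
        for gl in gls:
            post = [l * pr for l, pr in zip(gl, prior)]
            total = sum(post)
            if total > 0:
                # E-step: expected number of alt alleles carried by this sample.
                expected_alt += (post[1] + 2.0 * post[2]) / total
        # M-step: allele frequency = expected alt alleles / total alleles.
        new_p = expected_alt / (2.0 * len(gls))
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p


def call_genotypes(gls: Sequence[Likelihoods]) -> Tuple[float, List[int]]:
    """Return the estimated frequency and the maximum-posterior genotype per sample."""
    p = estimate_alt_freq(gls)
    prior = hwe_prior(p)
    calls = []
    for gl in gls:
        post = [l * pr for l, pr in zip(gl, prior)]
        calls.append(max(range(3), key=post.__getitem__))
    return p, calls


if __name__ == "__main__":
    hom_ref = (1.0, 1e-3, 1e-6)
    het = (1e-3, 1.0, 1e-3)
    print(call_genotypes([hom_ref] * 8 + [het] * 2))
```

The design choice this illustrates is the trade-off behind the comparison in the paper: with a flat prior, a sample with likelihoods (0.5, 1.0, 0.01) would be called heterozygous, but under a Hardy-Weinberg prior with an alternate allele frequency of 0.1 its posterior favours homozygous reference. Borrowing strength across samples can rescue poorly covered sites, yet it can also suppress weakly supported true rare variants.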


PLINK (tool)

RRID:SCR_001757

An open-source whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. It is used for the analysis of genotype/phenotype data. Through integration with gPLINK and Haploview, there is some support for subsequent visualization, annotation and storage of results. PLINK 1.9 is an improved, second-generation version of the software.


UnifiedGenotyper (tool)

RRID:SCR_004710

A multiple-sample, technology-aware SNP and indel caller.


Picard (tool)

RRID:SCR_006525

A Java toolset for working with next-generation sequencing data in the BAM format.


1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequences of 1,700 individuals, freely available for research use via Amazon S3. The 1000 Genomes Project is an international research effort, coordinated by a consortium of 75 companies and organizations, to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can access on AWS free of charge for use in disease research.

The data can be found at: http://s3.amazonaws.com/1000genomes

The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year.

Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be accessed directly from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS stores the public data sets at no charge to the community; researchers pay only for the additional AWS resources they need for further processing or analysis of the data. All 200 TB of the latest 1000 Genomes Project data is available in a publicly accessible Amazon S3 bucket. The data can be accessed via simple HTTP requests or through the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to work with this data without the usual capital investment required for data at this scale.

AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via an Amazon S3 bucket also means that customers can process the information using Hadoop via Amazon Elastic MapReduce and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.


MACH (tool)

RRID:SCR_009621

QTL analysis based on imputed genotype dosages or posterior probabilities.
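A dosage here is the expected number of alternate alleles implied by the imputed genotype posterior probabilities, d = P(het) + 2·P(hom-alt), so uncertain genotypes contribute fractional values instead of hard calls. The Python below is a minimal sketch of dosage-based QTL testing, assuming a simple additive linear model (phenotype regressed on dosage by ordinary least squares); it is not MACH's own code, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def dosage(p_het: float, p_hom_alt: float) -> float:
    """Expected alternate-allele count from genotype posterior probabilities."""
    if min(p_het, p_hom_alt) < 0 or p_het + p_hom_alt > 1.0 + 1e-9:
        raise ValueError("invalid genotype probabilities")
    return p_het + 2.0 * p_hom_alt


def dosage_regression(dosages: Sequence[float],
                      phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of phenotype = alpha + beta * dosage."""
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("dosages do not vary; effect is not estimable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx                 # per-allele additive effect
    alpha = mean_y - beta * mean_x   # intercept
    return alpha, beta


if __name__ == "__main__":
    print(dosage(0.5, 0.25))
    print(dosage_regression([0.0, 1.0, 2.0, 0.5], [2.0, 5.0, 8.0, 3.5]))
```

A full analysis would also report a standard error and p-value for beta and adjust for covariates; the sketch stops at the effect estimate.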
