Utilizing genotype imputation for the augmentation of sequence data.

PloS one | 2010

In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs as associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve refining these putative loci.

Pubmed ID: 20543988

Associated grants

  • Agency: NIGMS NIH HHS, United States
    Id: U01 GM61388
  • Agency: NIGMS NIH HHS, United States
    Id: R01 GM28157

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


MACH 1.0 (tool)

RRID:SCR_001759

A Markov Chain based software tool for haplotyping, genotype imputation and disease association analysis that can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals.

International HapMap Project (tool)

RRID:SCR_002846

THIS RESOURCE IS NO LONGER IN SERVICE, documented August 22, 2016. A multi-country collaboration among scientists and funding agencies to develop a public resource where genetic similarities and differences in human beings are identified and catalogued. Using this information, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. All of the information generated by the Project will be released into the public domain. Their goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. Public and private organizations in six countries are participating in the International HapMap Project. Data generated by the Project can be downloaded with minimal constraints. HapMap project related data, software, and documentation include: bulk data on genotypes, frequencies, LD data, phasing data, allocated SNPs, recombination rates and hotspots, SNP assays, Perlegen amplicons, raw data, inferred genotypes, and mitochondrial and chrY haplogroups; Generic Genome Browser software; protocols and information on assay design, genotyping and other protocols used in the project; and documentation of samples/individuals and the XML format used in the project.

UCSC Genome Browser (tool)

RRID:SCR_005780

Portal to interactively visualize genomic data. Provides reference sequences and working draft assemblies for collection of genomes and access to ENCODE and Neanderthal projects. Includes collection of vertebrate and model organism assemblies and annotations, along with suite of tools for viewing, analyzing and downloading data.

1000 Genomes: A Deep Catalog of Human Genetic Variation (tool)

RRID:SCR_006828

International collaboration producing an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts, in an effort to provide a foundation for investigating the relationship between genotype and phenotype. The genomes of about 2500 unidentified people from about 25 populations around the world were sequenced using next-generation sequencing technologies. Redundant sequencing on various platforms and by different groups of scientists of the same samples can be compared. The results of the study are freely and publicly accessible to researchers worldwide. The consortium identified the following populations whose DNA will be sequenced: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States. The goal of the Project is to find most genetic variants that have frequencies of at least 1% in the populations studied. Sequencing is still too expensive to deeply sequence the many samples being studied for this project. However, any particular region of the genome generally contains a limited number of haplotypes. Data can be combined across many samples to allow efficient detection of most of the variants in a region. The Project currently plans to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes for each sample that were not seen directly by the light sequencing.
All samples from the 1000 Genomes Project are available as lymphoblastoid cell lines (LCLs) and LCL-derived DNA from the Coriell Cell Repository as part of the NHGRI Catalog. The sequence and alignment data generated by the 1000 Genomes Project are made available as quickly as possible via its mirrored FTP sites: ftp://ftp.1000genomes.ebi.ac.uk and ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes
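
The claim that about 4X coverage cannot provide the complete genotype of each sample can be illustrated with a rough calculation. The Python sketch below is not taken from the Project or the publication. It assumes that read depth at a site follows a Poisson distribution, that each read samples either allele of a heterozygous site with probability 0.5, and that there are no sequencing errors. The function names `poisson_pmf` and `prob_both_alleles_seen` are hypothetical and used only for illustration.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k reads when depth is Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_both_alleles_seen(mean_depth: float, max_reads: int = 100) -> float:
    """Chance that a heterozygous site shows at least one read of each allele.

    Assumptions (illustrative only): Poisson read depth, each read drawn
    from either allele with probability 0.5, no sequencing errors.
    The sum is truncated at max_reads, which is far into the tail for
    the depths used here.
    """
    total = 0.0
    for n in range(2, max_reads + 1):  # at least 2 reads are needed to see both alleles
        p_one_allele_only = 2 * 0.5 ** n  # all n reads come from the same allele
        total += poisson_pmf(n, mean_depth) * (1.0 - p_one_allele_only)
    return total


if __name__ == "__main__":
    for depth in (4.0, 30.0):
        p = prob_both_alleles_seen(depth)
        print(f"mean depth {depth:>4.0f}X: P(both alleles observed) = {p:.3f}")
```

Under these assumptions, roughly a quarter of heterozygous sites in a single sample would show reads from only one allele, or no reads at all, at 4X. This is consistent with the description above: individual genotypes are incomplete at light coverage, so the missing calls are estimated by combining data across samples.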

GWAS: Catalog of Published Genome-Wide Association Studies (tool)

RRID:SCR_012745

Catalog of published genome-wide association studies. A genome-wide association study (GWAS) examines a genome-wide set of genetic variants in different individuals to see whether any variant is associated with a trait or disease. The database includes only those GWAS publications that attempt to assay single nucleotide polymorphisms (SNPs). Publications are organized from most to least recent date of publication. Studies are identified through weekly PubMed literature searches, daily NIH-distributed compilations of news and media reports, and occasional comparisons with an existing database of GWAS literature (HuGE Navigator). Works with HANCESTRO ancestry representation.

1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data including DNA sequenced from more than 1,700 individuals that researchers can now access on AWS for use in disease research free of charge. The dataset is available to all via Amazon S3 at: http://s3.amazonaws.com/1000genomes The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year. Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be seamlessly accessed from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS is storing the public data sets at no charge to the community. Researchers pay only for the additional AWS resources they need for further processing or analysis of the data. All 200 TB of the latest 1000 Genomes Project data are available in a public Amazon S3 bucket. You can access the data via simple HTTP requests, or take advantage of the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to dive into this data without the usual capital investment required to work with data at this scale.
AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via a bucket in Amazon S3 also means that customers can crunch the information using Hadoop via Amazon Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.

dbSNP (tool)

RRID:SCR_002338

Database as central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms. Distinguishes report of how to assay SNP from use of that SNP with individuals and populations. This separation simplifies some issues of data representation. However, these initial reports describing how to assay SNP will often be accompanied by SNP experiments measuring allele occurrence in individuals and populations. Community can contribute to this resource.

Haploview (tool)

RRID:SCR_003076

A Java based software tool designed to simplify and expedite the process of haplotype analysis by providing a common interface to several tasks relating to such analyses. Haploview currently allows users to examine block structures, generate haplotypes in these blocks, run association tests, and save the data in a number of formats. All functionalities are highly customizable. (entry from Genetic Analysis Software)

  • LD & haplotype block analysis
  • haplotype population frequency estimation
  • single SNP and haplotype association tests
  • permutation testing for association significance
  • implementation of Paul de Bakker's Tagger tag SNP selection algorithm
  • automatic download of phased genotype data from HapMap
  • visualization and plotting of PLINK whole genome association results including advanced filtering options

Haploview is fully compatible with data dumps from the HapMap project and the Perlegen Genotype Browser. It can analyze thousands of SNPs (tens of thousands in command line mode) in thousands of individuals.

Note: Haploview is currently on a development and support freeze. The team is currently looking at a variety of options in order to provide support for the software. Haploview is an open source project hosted by SourceForge. The source can be downloaded at the SourceForge project site.
