Identity by descent (IBD) has played a fundamental role in the discovery of genetic loci underlying human diseases. Both pedigree-based and population-based linkage analyses rely on estimating recent IBD, and evidence of ancient IBD can be used to detect population structure in genetic association studies. Various methods for detecting IBD, including those implemented in the software programs fastIBD and GERMLINE, have been developed in the past several years using population genotype data from microarray platforms. Now, next-generation DNA sequencing data are becoming increasingly available, enabling the comprehensive analysis of genomes, including the identification of rare variants. These sequencing data may provide an opportunity to detect IBD with higher resolution than previously possible, potentially enabling the detection of disease-causing loci that were previously undetectable with sparser genetic data.
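As a rough illustration of how IBD can be screened for in genotype data, the Python sketch below implements one simple, commonly used heuristic; it is not the algorithm used by fastIBD or GERMLINE. If two individuals share at least one haplotype IBD at a marker, they cannot be opposite homozygotes there (one AA, the other BB), ignoring genotyping error and mutation. Long runs of markers with no opposite homozygotes are therefore candidate IBD segments. The 0/1/2 genotype coding, the function name `compatible_runs` and the `min_length_bp` threshold are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def compatible_runs(
    g1: Sequence[Optional[int]],
    g2: Sequence[Optional[int]],
    positions: Sequence[int],
    min_length_bp: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) positions of runs with no opposite homozygotes.

    Genotypes are coded as 0, 1 or 2 copies of one allele (None = missing;
    missing calls never break a run). A pair sharing a haplotype IBD at a
    marker cannot be 0 vs 2 there, so an opposite homozygote ends the run.
    Short runs also arise by chance (identity by state), hence the length
    filter.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last: Optional[int] = None
    for a, b, pos in zip(g1, g2, positions):
        opposite = a is not None and b is not None and abs(a - b) == 2
        if opposite:
            # Close the current run, keeping it only if it is long enough.
            if start is not None and last - start >= min_length_bp:
                runs.append((start, last))
            start = last = None
        else:
            if start is None:
                start = pos
            last = pos
    # Close a run that extends to the final marker.
    if start is not None and last - start >= min_length_bp:
        runs.append((start, last))
    return runs


if __name__ == "__main__":
    g1 = [0, 1, 2, 2, 0, 1, 1]
    g2 = [0, 2, 2, 0, 1, 1, 2]
    pos = [100, 200, 300, 400, 500, 600, 700]
    print(compatible_runs(g1, g2, pos, min_length_bp=150))
    # → [(100, 300), (500, 700)]
```

Because a run can only be broken at an informative marker, its boundaries are only as precise as the marker density allows, which is one way to see why denser sequencing data could sharpen IBD detection.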
PubMed ID: 22672699
Consortium of 50 research groups across the UK to harness the power of newly-available genotyping technologies to improve our understanding of the aetiological basis of several major causes of global disease. The consortium has gathered genotype data for up to 500,000 sites of genome sequence variation (single nucleotide polymorphisms, or SNPs) in samples ascertained for the disease phenotypes. Analysis of the resulting genome-wide association data has led to the identification of many SNPs and genes showing evidence of association with disease susceptibility, some of which will be followed up in future studies. In addition, the Consortium has gained important insights into the technical, analytical, methodological and biological aspects of genome-wide association analysis. The core of the study comprised an analysis of 2,000 samples from each of seven diseases (type 1 diabetes, type 2 diabetes, coronary heart disease, hypertension, bipolar disorder, rheumatoid arthritis and Crohn's disease). For each disease, the case samples were ascertained from sites widely distributed across Great Britain, allowing us to obtain considerable efficiencies by comparing each of these case populations to a common set of 3,000 nationally-ascertained controls, also from England, Scotland and Wales. These controls come from two sources: 1,500 are representative samples from the 1958 British Birth Cohort and 1,500 are blood donors recruited by the three national UK Blood Services. One of the questions the WTCCC study has addressed is the relative merits of these two strategies for generating representative population cohorts. Genotyping for this main case-control study was conducted by Affymetrix using the (commercial) Affymetrix 500K chip. In total, 17,000 samples (7 × 2,000 cases plus the 3,000 shared controls) were typed for 500,000 SNPs as part of this study. There are two additional components to the study.
First, the WTCCC award is part-funding a study of host resistance to infectious diseases in African populations. The same approach has been used to type 2,000 cases of tuberculosis (TB) and 2,000 cases of malaria, as well as 2,000 shared controls. As well as addressing diseases of major global significance and extending WTCCC coverage into infectious disease, the inclusion of samples of African origin has obvious benefits for the methodological aspects of genome-wide association analysis. Second, for four additional diseases (autoimmune thyroid disease, breast cancer, ankylosing spondylitis, multiple sclerosis), the WTCCC has completed an analysis of 15,000 SNPs designed to represent a large proportion of the known non-synonymous coding SNPs across the genome. This analysis was performed at the WTSI using a custom Infinium chip (Illumina).
Data release: The genotype data for the control samples (1958 British Birth Cohort and UK Blood Service) and for the seven diseases analyzed in the main study are now available to qualified researchers. Summary genotype statistics for these collections are available directly from the website. Access to the individual-level genotype data and summary genotype statistics is by application to the Consortium Data Access Committee (CDAC), with approval subject to a Data Access Agreement.
WTCCC2: A further round of GWA studies was funded in April 2008. These include 15 WTCCC-collaborative studies and 12 independent studies, totaling approximately 120,000 samples. Many of the studies represent major international collaborative networks that have together assembled large sample collections.
WTCCC2 will perform genome-wide association studies in 13 disease conditions: ankylosing spondylitis, Barrett's oesophagus and oesophageal adenocarcinoma, glaucoma, ischaemic stroke, multiple sclerosis, pre-eclampsia, Parkinson's disease, psychosis endophenotypes, psoriasis, schizophrenia, ulcerative colitis and visceral leishmaniasis. WTCCC2 will also investigate the genetics of reading and mathematics abilities in children and the pharmacogenomics of statin response. Over 60,000 samples will be analyzed using either the Affymetrix v6.0 chip or the Illumina 660K chip. WTCCC2 will also genotype 3,000 controls each from the 1958 British Birth Cohort and the UK Blood Service control group, and these 6,000 controls will be genotyped on both the Affymetrix v6.0 and Illumina 1.2M chips.
WTCCC3: The Wellcome Trust provided support for a further round of GWA studies in January 2009. These include 5 WTCCC-collaborative studies to be carried out in WTCCC3 and 5 independent studies, across a range of diseases. Many of the studies represent major international collaborative networks that have together assembled large sample collections. WTCCC3 will perform genome-wide association studies in the following 4 disease conditions: primary biliary cirrhosis, anorexia nervosa, pre-eclampsia in UK subjects, and the interactions between donor and recipient DNA related to early and late renal transplant dysfunction. WTCCC3 will also carry out a pilot study of the genetics of host control of HIV-1 infection. Over 40,000 samples will be analyzed using the Illumina 660K chip. WTCCC3 will use the 6,000 control genotypes generated by WTCCC2.
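The core WTCCC design compares each set of 2,000 cases with the 3,000 shared controls, one SNP at a time. The Python sketch below shows the simplest form of such a comparison, a 1-degree-of-freedom allelic chi-square test on a 2 × 2 table of allele counts. It is a generic illustration, not the Consortium's analysis pipeline; the function names and the genotype counts in the example are hypothetical.

```python
import math
from typing import Tuple


def allele_counts(genotypes: Tuple[int, int, int]) -> Tuple[int, int]:
    """Convert (n_AA, n_AB, n_BB) genotype counts to (count_A, count_B)."""
    n_aa, n_ab, n_bb = genotypes
    return 2 * n_aa + n_ab, n_ab + 2 * n_bb


def allelic_chi2(cases, controls) -> Tuple[float, float]:
    """1-df Pearson chi-square test on the 2 x 2 table of allele counts.

    Rows are cases/controls, columns are alleles A/B. Assumes both alleles
    are observed, so no expected count is zero. Returns (chi2, p_value).
    """
    table = [allele_counts(cases), allele_counts(controls)]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    cases = (700, 1000, 300)     # HYPOTHETICAL (AA, AB, BB) counts, 2,000 cases
    controls = (750, 1500, 750)  # HYPOTHETICAL counts, 3,000 shared controls
    chi2, p = allelic_chi2(cases, controls)
    print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

With these made-up counts, allele A has frequency 0.60 in cases versus 0.50 in controls, giving a chi-square of about 96.6. The allelic test treats the two alleles of each person as independent (Hardy-Weinberg equilibrium); trend tests on genotype counts are a common alternative.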
International collaboration producing an extensive public catalog of human genetic variation, including SNPs and structural variants and their haplotype contexts, in an effort to provide a foundation for investigating the relationship between genotype and phenotype. The genomes of about 2,500 anonymous individuals from about 25 populations around the world were sequenced using next-generation sequencing technologies. Because the same samples were sequenced redundantly on various platforms and by different groups of scientists, the results can be compared. The results of the study are freely and publicly accessible to researchers worldwide. The consortium identified the following populations whose DNA will be sequenced: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States. The goal of the Project is to find most genetic variants that have frequencies of at least 1% in the populations studied. Deeply sequencing each of the many samples in this project is still too expensive. However, any particular region of the genome generally contains a limited number of haplotypes, so data can be combined across many samples to allow efficient detection of most of the variants in a region. The Project currently plans to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2,500 samples should allow highly accurate estimation (imputation) of the variants and genotypes in each sample that were not seen directly by the light sequencing.
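A back-of-the-envelope calculation shows why 4X sequencing alone cannot provide complete genotypes. The Python sketch below assumes that read depth at a site follows a Poisson distribution with mean 4 (uniform coverage), that each read at a heterozygous site comes from either chromosome with probability 1/2, and that there are no sequencing errors; these are simplifying assumptions, not the Project's calling model. Under them, about 1.8% of sites get no reads at all, and both alleles of a heterozygous site are seen only about 75% of the time (closed form (1 - exp(-λ/2))^2 for mean depth λ). The function names are illustrative.

```python
import math


def poisson_pmf(mean: float, max_depth: int) -> list:
    """P(depth = d) for d = 0..max_depth under a Poisson(mean) model."""
    pmf = [math.exp(-mean)]
    for d in range(1, max_depth + 1):
        pmf.append(pmf[-1] * mean / d)  # recurrence avoids large factorials
    return pmf


def prob_no_reads(mean: float) -> float:
    """Probability that a site is covered by no reads."""
    return math.exp(-mean)


def prob_het_both_alleles(mean: float, max_depth: int = 100) -> float:
    """Probability that both alleles of a heterozygous site appear in reads.

    Given d reads, each drawn from either chromosome with probability 1/2,
    both alleles are seen unless all d reads come from the same chromosome.
    """
    return sum(
        p_d * (1.0 - 2.0 * 0.5 ** d)
        for d, p_d in enumerate(poisson_pmf(mean, max_depth))
        if d >= 2
    )


def expected_carriers(n_samples: int, allele_freq: float) -> float:
    """Expected number of samples carrying >= 1 copy (Hardy-Weinberg)."""
    return n_samples * (1.0 - (1.0 - allele_freq) ** 2)


if __name__ == "__main__":
    depth = 4.0
    print(f"no reads at site:           {prob_no_reads(depth):.4f}")
    print(f"het with both alleles seen: {prob_het_both_alleles(depth):.4f}")
    print(f"carriers of a 1% variant:   {expected_carriers(2500, 0.01):.2f}")
```

The last function shows why pooling helps: a variant with a 1% allele frequency is expected in roughly 50 of 2,500 samples, so even when the reads from one carrier miss the variant allele, it is likely to be observed in some of the others.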
All samples from the 1000 Genomes Project are available as lymphoblastoid cell lines (LCLs) and LCL-derived DNA from the Coriell Cell Repository as part of the NHGRI Catalog. The sequence and alignment data generated by the 1000 Genomes Project are made available as quickly as possible via its mirrored FTP sites: ftp://ftp.1000genomes.ebi.ac.uk and ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes
Software application for discovering long shared segments of identity by descent (IBD) between pairs of individuals in a large population. It takes as input genotype or haplotype marker data for individuals (and, optionally, a known pedigree) and generates a list of all pairwise shared segments.
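The core idea behind this kind of tool can be sketched in a few lines of Python. The sketch below follows the general hash-and-extend strategy associated with GERMLINE: cut each phased haplotype into fixed-width marker slices, group identical slices in a dictionary (a hash table) so that only exactly matching haplotypes are paired, then join runs of consecutive matching slices into candidate shared segments. It is deliberately simplified and is not GERMLINE's implementation: it assumes phased input, tolerates no genotyping or phasing errors, and measures length in markers rather than genetic distance. The function name, the string encoding of haplotypes and the `slice_len`/`min_slices` parameters are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def shared_segments(
    haplotypes: Dict[str, str], slice_len: int, min_slices: int
) -> Dict[Tuple[str, str], List[Tuple[int, int]]]:
    """Find runs of consecutive exactly matching slices between haplotype pairs.

    haplotypes maps an ID to a phased allele string; all strings have equal
    length. Returns {(id1, id2): [(start_marker, end_marker_exclusive), ...]}.
    Trailing markers that do not fill a whole slice are ignored.
    """
    n_markers = len(next(iter(haplotypes.values())))
    matched = defaultdict(set)  # pair -> indices of exactly matching slices
    for s in range(n_markers // slice_len):
        buckets = defaultdict(list)  # slice "word" -> haplotype IDs
        for hap_id, hap in haplotypes.items():
            buckets[hap[s * slice_len:(s + 1) * slice_len]].append(hap_id)
        for ids in buckets.values():
            # Only haplotypes in the same bucket are paired, which avoids
            # comparing every pair at every slice.
            for a, b in combinations(sorted(ids), 2):
                matched[(a, b)].add(s)

    segments = {}
    for pair, slices in matched.items():
        ordered = sorted(slices)
        runs = []
        run_start = prev = ordered[0]
        for s in ordered[1:] + [None]:  # None flushes the final run
            if s is not None and s == prev + 1:
                prev = s  # extend the current run by one slice
                continue
            if prev - run_start + 1 >= min_slices:
                runs.append((run_start * slice_len, (prev + 1) * slice_len))
            if s is not None:
                run_start = prev = s
        if runs:
            segments[pair] = runs
    return segments
```

For example, with `h1 = "0101100111"` and `h2 = "0101100000"` cut into slices of 2 markers, the two match exactly over the first three slices, so `shared_segments({"h1": h1, "h2": h2}, 2, 2)` reports markers 0 to 6 (end exclusive) as shared.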
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 22, 2016. A multi-country collaboration among scientists and funding agencies to develop a public resource in which genetic similarities and differences among human beings are identified and catalogued. Using this information, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. All of the information generated by the Project will be released into the public domain. The goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. Public and private organizations in six countries are participating in the International HapMap Project. Data generated by the Project can be downloaded with minimal constraints. HapMap project related data, software, and documentation include: bulk data on genotypes, frequencies, linkage disequilibrium (LD) data, phasing data, allocated SNPs, recombination rates and hotspots, SNP assays, Perlegen amplicons, raw data, inferred genotypes, and mitochondrial and chrY haplogroups; Generic Genome Browser software; protocols and information on assay design, genotyping and other protocols used in the project; and documentation of samples/individuals and the XML format used in the project.
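LD data of the kind listed above summarize how often alleles at two markers occur together on the same haplotype. The Python sketch below computes the standard pairwise measures D, |D'| and r² from phased two-SNP haplotypes. It is a textbook illustration of these measures, not a description of how the HapMap files were generated; the 0/1 allele coding and the function name `ld_stats` are assumptions.

```python
from typing import Sequence, Tuple


def ld_stats(haplotypes: Sequence[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2), alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    # Largest |D| possible given the allele frequencies (depends on sign of D).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else float("nan")
    # r^2 is undefined when either SNP is monomorphic.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, d_prime, r2
```

Two SNPs whose alleles always travel together give |D'| = 1 and r² = 1, while SNPs whose alleles combine independently give D = 0.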
A dataset containing the full genomic sequences of 1,700 individuals, freely available for research use. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can now access on AWS via Amazon S3 for use in disease research free of charge (http://s3.amazonaws.com/1000genomes). The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year. Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be accessed directly from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS stores the public data sets at no charge to the community; researchers pay only for the additional AWS resources they need for further processing or analysis of the data. All 200 TB of the latest 1000 Genomes Project data are in a publicly accessible Amazon S3 bucket, which can be accessed via simple HTTP requests or through the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to work with this data without the usual capital investment required at this scale.
AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via a bucket in Amazon S3 also means that customers can crunch the information using Hadoop via Amazon Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
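Because the bucket is public, individual files can be fetched with plain HTTP. The Python sketch below uses only the standard library to build a request for an object under the bucket URL quoted above, optionally limited to a byte range with the standard HTTP `Range` header, so that part of a large file can be read without downloading all of it. It assumes the path-style URL still resolves; the object key in the example is a hypothetical placeholder rather than a real path in the bucket, and the function names are illustrative.

```python
import urllib.request

# Bucket URL as quoted in the resource description; it may have changed since.
BUCKET_URL = "http://s3.amazonaws.com/1000genomes"


def object_url(key: str) -> str:
    """Join the bucket URL and an object key (leading slashes are dropped)."""
    return f"{BUCKET_URL}/{key.lstrip('/')}"


def ranged_request(key: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for bytes start..end (inclusive) of one object."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    req = urllib.request.Request(object_url(key))
    req.add_header("Range", f"bytes={start}-{end}")
    return req


if __name__ == "__main__":
    # HYPOTHETICAL key: replace with a real path listed in the bucket.
    req = ranged_request("path/to/some_file.txt", 0, 1023)
    print(req.full_url, req.get_header("Range"))
    # Fetching requires network access:
    # with urllib.request.urlopen(req) as resp:
    #     first_kb = resp.read()
```

A server that honours the range replies with status 206 (Partial Content) and returns only the requested bytes.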