Predicted gene expression in ancestrally diverse populations leads to discovery of susceptibility loci for lifestyle and cardiometabolic traits.

American journal of human genetics | 2022

One mechanism by which genetic factors influence complex traits and diseases is altering gene expression. Direct measurement of gene expression in relevant tissues is rarely tenable; however, genetically regulated gene expression (GReX) can be estimated using prediction models derived from large multi-omic datasets. These approaches have led to the discovery of many gene-trait associations, but whether models derived from predominantly European ancestry (EA) reference panels can map novel associations in ancestrally diverse populations remains unclear. We applied PrediXcan to impute GReX in 51,520 ancestrally diverse Population Architecture using Genomics and Epidemiology (PAGE) participants (35% African American, 45% Hispanic/Latino, 10% Asian, and 7% Hawaiian) across 25 key cardiometabolic traits and relevant tissues to identify 102 novel associations. We then compared associations in PAGE to those in a random subset of 50,000 White British participants from UK Biobank (UKBB50k) for height and body mass index (BMI). We identified 517 associations across 47 tissues in PAGE but not UKBB50k, demonstrating the importance of diverse samples in identifying trait-associated GReX. We observed that variants used in PrediXcan models were either more or less differentiated across continental-level populations than matched-control variants, depending on the specific population, reflecting sampling bias. Additionally, variants from identified genes specific to either PAGE or UKBB50k analyses were more ancestrally differentiated than those in genes detected in both analyses, underlining the value of population-specific discoveries. This suggests that while EA-derived transcriptome imputation models can identify new associations in non-EA populations, models derived from closely matched reference panels may yield further insights. Our findings call for more diversity in reference datasets of tissue-specific gene expression.
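
As a rough illustration of the imputation idea in the abstract, the sketch below treats predicted expression for one gene as a weighted sum of a person's allele dosages and then estimates a gene-trait slope by ordinary least squares. This is not PrediXcan's code: the function names, the variant IDs, the weights, and the trait values are all hypothetical toy inputs, and treating a missing variant as dosage 0 is a simplifying assumption.

```python
def predict_grex(dosages, weights):
    """Predicted expression for one person and one gene.

    dosages: dict mapping variant ID -> allele dosage (0 to 2)
    weights: dict mapping variant ID -> model weight
    A variant missing from `dosages` is treated as dosage 0 (an assumption).
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical weights for one gene in one tissue.
toy_weights = {"variant_a": 0.5, "variant_b": -0.25}

# Hypothetical genotype dosages for three people.
toy_people = [
    {"variant_a": 0, "variant_b": 0},
    {"variant_a": 1, "variant_b": 0},
    {"variant_a": 2, "variant_b": 1},
]

# Hypothetical trait values for the same three people.
toy_trait = [1.0, 2.0, 2.5]

grex = [predict_grex(person, toy_weights) for person in toy_people]
slope = ols_slope(grex, toy_trait)
```

A real analysis would use published tissue-specific weights, adjust for covariates such as ancestry principal components, and report a test statistic rather than a bare slope.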

PubMed ID: 35263625

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG007417
  • Agency: WHI NIH HHS, United States
    Id: 75N92021D00005
  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG004790
  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG007416
  • Agency: NIH HHS, United States
    Id: S10 OD028685
  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG007376
  • Agency: NCI NIH HHS, United States
    Id: P50 CA058223
  • Agency: NHLBI NIH HHS, United States
    Id: 75N92021D00001
  • Agency: NHLBI NIH HHS, United States
    Id: T32 HL007824
  • Agency: NHLBI NIH HHS, United States
    Id: R01 HL151152
  • Agency: NIDDK NIH HHS, United States
    Id: P30 DK020541
  • Agency: NHLBI NIH HHS, United States
    Id: R01 HL142825
  • Agency: NICHD NIH HHS, United States
    Id: R01 HD057194
  • Agency: NHLBI NIH HHS, United States
    Id: T32 HL007055
  • Agency: NHLBI NIH HHS, United States
    Id: R01 HL149683
  • Agency: Medical Research Council, United Kingdom
    Id: MC_PC_17228
  • Agency: NHLBI NIH HHS, United States
    Id: 75N92021D00002
  • Agency: NHLBI NIH HHS, United States
    Id: K99 HL130580
  • Agency: NHLBI NIH HHS, United States
    Id: R01 HL143885
  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG007419
  • Agency: NHLBI NIH HHS, United States
    Id: R01 HL142302
  • Agency: NHGRI NIH HHS, United States
    Id: L30 HG009840
  • Agency: NHLBI NIH HHS, United States
    Id: T32 HL129982
  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG004729
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH118278
  • Agency: NHGRI NIH HHS, United States
    Id: R56 HG010297
  • Agency: WHI NIH HHS, United States
    Id: 75N92021D00003
  • Agency: NHGRI NIH HHS, United States
    Id: R01 HG010297
  • Agency: NHLBI NIH HHS, United States
    Id: R21 HL140419
  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG007397
  • Agency: WHI NIH HHS, United States
    Id: 75N92021D00004
  • Agency: Medical Research Council, United Kingdom
    Id: MC_QA137853
  • Agency: NHGRI NIH HHS, United States
    Id: U01 HG004801
  • Agency: NIDDK NIH HHS, United States
    Id: R01 DK122503

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


1000 Genomes: A Deep Catalog of Human Genetic Variation (tool)

RRID:SCR_006828

International collaboration producing an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts, in an effort to provide a foundation for investigating the relationship between genotype and phenotype. The genomes of about 2500 unidentified people from about 25 populations around the world were sequenced using next-generation sequencing technologies. Redundant sequencing on various platforms and by different groups of scientists of the same samples can be compared. The results of the study are freely and publicly accessible to researchers worldwide. The consortium identified the following populations whose DNA will be sequenced: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States. The goal of the Project is to find most genetic variants that have frequencies of at least 1% in the populations studied. Sequencing is still too expensive to deeply sequence the many samples being studied for this project. However, any particular region of the genome generally contains a limited number of haplotypes. Data can be combined across many samples to allow efficient detection of most of the variants in a region. The Project currently plans to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes for each sample that were not seen directly by the light sequencing.
All samples from the 1000 Genomes Project are available as lymphoblastoid cell lines (LCLs) and LCL-derived DNA from the Coriell Cell Repository as part of the NHGRI Catalog. The sequence and alignment data generated by the 1000 Genomes Project are made available as quickly as possible via its mirrored FTP sites: ftp://ftp.1000genomes.ebi.ac.uk and ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes
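
The statement that 4X coverage cannot give a complete genotype for each sample, yet pooled data can still detect variants near 1% frequency, can be made concrete with a back-of-the-envelope calculation. The sketch below assumes read depth follows a Poisson distribution and that reads split evenly between the two copies of a chromosome; both are illustrative assumptions, not part of the Project's description, and the function names are hypothetical.

```python
import math


def prob_copy_unseen(mean_depth, ploidy=2):
    """Chance that one chromosome copy gets zero reads at a site,
    assuming Poisson-distributed depth split evenly across copies."""
    return math.exp(-mean_depth / ploidy)


def expected_copies(allele_freq, n_samples, ploidy=2):
    """Expected number of chromosome copies carrying the variant."""
    return allele_freq * ploidy * n_samples


def prob_variant_missed(mean_depth, allele_freq, n_samples, ploidy=2):
    """Chance that no carrying copy is covered by even a single read."""
    copies = expected_copies(allele_freq, n_samples, ploidy)
    return prob_copy_unseen(mean_depth, ploidy) ** copies


# Figures taken from the description: 4X coverage, 1% frequency, 2500 samples.
single_copy_miss = prob_copy_unseen(4)
pooled_miss = prob_variant_missed(4, 0.01, 2500)
```

Under these assumptions, a given copy in a single sample goes unread roughly 13.5% of the time, while the chance that every carrying copy across 2500 samples goes unread is vanishingly small. Real variant calling also has to separate true variants from sequencing errors, which this calculation ignores.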


1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use via Amazon S3. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, that researchers can now access on AWS for use in disease research free of charge. The data can be found at: http://s3.amazonaws.com/1000genomes The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year. Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be seamlessly accessed from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS is storing the public data sets at no charge to the community. Researchers pay only for the additional AWS resources they need for further processing or analysis of the data. All 200 TB of the latest 1000 Genomes Project data is available in a public Amazon S3 bucket. You can access the data via simple HTTP requests, or take advantage of the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to dive into this data without the usual capital investment required to work with data at this scale.
AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via a bucket in Amazon S3 also means that customers can crunch the information using Hadoop via Amazon Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
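
The "simple HTTP requests" mentioned above can be sketched with the Python standard library alone. The example builds URLs against the bucket address given in the description and parses the XML that an S3 bucket-listing request returns. It assumes the bucket still allows anonymous listing, which may no longer hold; the helper names, the prefix, and the object keys in the sample XML are hypothetical. No network call is made, so the sketch runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

# Bucket address as given in the description above.
BUCKET_URL = "http://s3.amazonaws.com/1000genomes"

# XML namespace used by S3 bucket-listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def object_url(key):
    """URL for a single object; slashes in the key are kept as-is."""
    return f"{BUCKET_URL}/{quote(key)}"


def listing_url(prefix, max_keys=10):
    """URL that asks S3 to list up to `max_keys` keys under `prefix`."""
    return f"{BUCKET_URL}?{urlencode({'prefix': prefix, 'max-keys': max_keys})}"


def parse_listing(xml_text):
    """Extract object keys from a bucket-listing XML document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


# Hypothetical response body, shaped like an S3 listing result.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>1000genomes</Name>
  <Contents><Key>example/dir/file_one.txt</Key></Contents>
  <Contents><Key>example/dir/file_two.txt</Key></Contents>
</ListBucketResult>"""

keys = parse_listing(SAMPLE_XML)
urls = [object_url(k) for k in keys]
```

To fetch a real listing, the string returned by `listing_url` could be passed to `urllib.request.urlopen` and the response body handed to `parse_listing`. For large transfers, an AWS SDK is likely the more practical route.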
