A Comparative Analysis of Genetic Ancestry and Admixture in the Colombian Populations of Chocó and Medellín.

G3 (Bethesda, Md.) | 2017

At least 20% of Colombians identify as having African ancestry, yielding the second largest population of Afro-descendants in Latin America. To date, there have been relatively few studies focused on the genetic ancestry of Afro-Latino populations. We report a comparative analysis of the genetic ancestry of Chocó, a state located on Colombia's Pacific coast with a population that is >80% Afro-Colombian. We compared genome-wide patterns of genetic ancestry and admixture for Chocó to six other admixed American populations, with an emphasis on a Mestizo population from the nearby Colombian city of Medellín. One hundred sample donors from Chocó were genotyped across 610,545 genomic sites and compared with 94 publicly available whole genome sequences from Medellín. At the continental level, Chocó shows mostly African genetic ancestry (76%) with a nearly even split between European (13%) and Native American (11%) fractions, whereas Medellín has primarily European ancestry (75%), followed by Native American (18%) and African (7%). Sample donors from Chocó self-identify as having more African ancestry, and conversely less European and Native American ancestry, than can be genetically inferred, as opposed to what we previously found for Medellín, where individuals tend to overestimate levels of European ancestry. We developed a novel approach for subcontinental ancestry assignment, which allowed us to characterize subcontinental source populations for each of the three distinct continental ancestry fractions separately. Despite the clear differences between Chocó and Medellín at the level of continental ancestry, the two populations show overall patterns of subcontinental ancestry that are highly similar. 
Their African subcontinental ancestries are only slightly different, with Chocó showing more exclusive shared ancestry with the modern Yoruba (Nigerian) population, and Medellín having relatively more shared ancestry with West African populations in Sierra Leone and Gambia. Both populations show very similar Spanish ancestry within Europe and virtually identical patterns of Native American ancestry, with main contributions from the Embera and Waunana tribes. When the three subcontinental ancestry components are considered jointly, the populations of Chocó and Medellín are shown to be most closely related, to the exclusion of the other admixed American populations that we analyzed. We consider the implications of the existence of shared subcontinental ancestries for Colombian populations that appear, at first glance, to be clearly distinct with respect to competing notions of national identity that emphasize ethnic mixing (mestizaje) vs. group-specific identities (multiculturalism).

PubMed ID: 28855283

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

  • Agency: Intramural NIH HHS, United States
    Id: Z99 LM999999
  • Agency: Intramural NIH HHS, United States
    Id: ZIA LM082713
  • Agency: Intramural NIH HHS, United States
    Id: ZIA LM082713-04

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


ADMIXTURE (tool)

RRID:SCR_001263

A software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm. It uses a block relaxation approach to alternately update allele frequency and ancestry fraction parameters. Each block update is handled by solving a large number of independent convex optimization problems, which are tackled using a fast sequential quadratic programming algorithm. Convergence of the algorithm is accelerated using a novel quasi-Newton acceleration method.
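The quantity being maximized can be illustrated with a short sketch. It assumes the STRUCTURE-style model as it is commonly written for biallelic SNPs: each genotype counts copies of one allele (0, 1, or 2), and an individual's allele probability at a SNP is the ancestry-weighted average of the population allele frequencies. The function and variable names (`log_likelihood`, `q`, `f`) are illustrative and are not part of the ADMIXTURE software itself. The sketch only evaluates the log-likelihood for fixed parameters; it does not implement the block relaxation or quasi-Newton steps.

```python
import math


def log_likelihood(genotypes, q, f):
    """Log-likelihood of genotypes under a STRUCTURE-style admixture model.

    genotypes[i][j]: copies (0, 1, or 2) of the counted allele for individual i at SNP j
    q[i][k]: ancestry fraction of individual i from population k (each row sums to 1)
    f[k][j]: frequency of the counted allele in population k at SNP j
    The constant binomial term is omitted because it does not depend on q or f.
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Ancestry-weighted allele frequency for this individual at this SNP.
            p = sum(q[i][k] * f[k][j] for k in range(len(f)))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


# Toy example (made-up numbers): one individual, one SNP, two source populations.
example = log_likelihood([[1]], q=[[0.5, 0.5]], f=[[0.2], [0.6]])
```

An optimizer in this setting would alternate between updating `f` with `q` held fixed and updating `q` with `f` held fixed, which is the block structure the description refers to.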


PLINK (tool)

RRID:SCR_001757

Open-source whole-genome association analysis toolset designed to perform a range of basic, large-scale analyses in a computationally efficient manner. It is used for the analysis of genotype/phenotype data. Through integration with gPLINK and Haploview, there is some support for subsequent visualization, annotation, and storage of results. PLINK 1.9 is the improved second generation of the software.


1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequences of more than 1,700 individuals, freely available for research use via Amazon S3. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, which researchers can access on AWS free of charge for use in disease research. The data can be found at: http://s3.amazonaws.com/1000genomes The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year. Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be accessed from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide the scalable compute resources needed to work with data collections of this size without the usual capital investment. AWS stores the public data sets at no charge to the community; researchers pay only for the additional AWS resources they need for further processing or analysis. The data can be accessed via simple HTTP requests or through the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Because the data is stored in an Amazon S3 bucket, users can also process it with Hadoop via Amazon Elastic MapReduce and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
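As a rough sketch of the "simple HTTP requests" route, the snippet below builds a bucket-listing URL from the address quoted above and parses the XML that the S3 REST listing interface returns. The helper names (`listing_url`, `parse_listing`) and the `phase3/` prefix are illustrative assumptions, and whether the bucket still permits anonymous listing at this URL is not confirmed here.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Bucket address as quoted in the entry above.
BASE_URL = "http://s3.amazonaws.com/1000genomes"
# XML namespace used by S3 bucket-listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(prefix="", max_keys=10):
    """Build a URL that asks S3 to list up to max_keys objects under prefix."""
    query = urllib.parse.urlencode({"prefix": prefix, "max-keys": max_keys})
    return f"{BASE_URL}?{query}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from a ListBucketResult XML document."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


# The prefix is a hypothetical example; fetch the URL with any HTTP client
# (for instance urllib.request.urlopen) and pass the response body to parse_listing.
url = listing_url(prefix="phase3/", max_keys=10)
```

For anything beyond browsing, one of the AWS SDKs mentioned above is likely the more robust choice, since it handles pagination and retries.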


KING (tool)

RRID:SCR_009251

Software toolset that uses high-throughput SNP data, of the kind typically generated in a genome-wide association study (GWAS), for applications such as family relationship inference and population structure identification (entry from Genetic Analysis Software).
