
MAPPIN: a method for annotating, predicting pathogenicity and mode of inheritance for nonsynonymous variants.

Nucleic Acids Research | 2017

Nonsynonymous single nucleotide variants (nsSNVs) constitute about 50% of known disease-causing mutations and understanding their functional impact is an area of active research. Existing algorithms predict pathogenicity of nsSNVs; however, they are unable to differentiate heterozygous, dominant disease-causing variants from heterozygous carrier variants that lead to disease only in the homozygous state. Here, we present MAPPIN (Method for Annotating, Predicting Pathogenicity, and mode of Inheritance for Nonsynonymous variants), a prediction method which utilizes a random forest algorithm to distinguish between nsSNVs with dominant, recessive, and benign effects. We apply MAPPIN to a set of Mendelian disease-causing mutations and accurately predict pathogenicity for all mutations. Furthermore, MAPPIN predicts mode of inheritance correctly for 70.3% of nsSNVs. MAPPIN also correctly predicts pathogenicity for 87.3% of mutations from the Deciphering Developmental Disorders Study with a 78.5% accuracy for mode of inheritance. When tested on a larger collection of mutations from the Human Gene Mutation Database, MAPPIN is able to significantly discriminate between mutations in known dominant and recessive genes. Finally, we demonstrate that MAPPIN outperforms CADD and Eigen in predicting disease inheritance modes for all validation datasets. To our knowledge, MAPPIN is the first nsSNV pathogenicity prediction algorithm that provides mode of inheritance predictions, adding another layer of information for variant prioritization.
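The published MAPPIN model is not reproduced here, but the abstract's core idea (a random forest that separates dominant, recessive, and benign nsSNVs using precomputed annotation features) can be sketched in a few lines. The snippet below is a minimal illustration assuming scikit-learn, synthetic data, and placeholder feature names (e.g. SIFT_score, phyloP) standing in for whatever annotations the authors actually used; it is not the authors' pipeline.

```python
# Illustrative sketch only: a three-class random forest over variant
# annotation features, mirroring the dominant/recessive/benign setup
# described in the abstract. Features and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

# Hypothetical per-variant features (stand-ins for dbNSFP-style scores).
feature_names = ["SIFT_score", "Polyphen2_score", "phyloP", "GERP_RS", "allele_freq"]
X = rng.random((1000, len(feature_names)))

# Labels: 0 = benign, 1 = recessive, 2 = dominant (randomly assigned for the demo).
y = rng.integers(0, 3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test),
                            target_names=["benign", "recessive", "dominant"]))
```

With real training labels drawn from curated dominant, recessive, and benign variant sets, the same multiclass setup yields per-class probabilities that can be used both for pathogenicity calls and for ranking by likely mode of inheritance.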

PubMed ID: 28977528

Research resources used in this publication

None found

Antibodies used in this publication

None found

Associated grants

None

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


dbNSFP (tool)

RRID:SCR_005178

A database for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome. Version 2.0 is based on Gencode release 9 / Ensembl version 64 and includes a total of 87,347,043 nsSNVs and 2,270,742 essential splice site SNVs. It compiles prediction scores from six prediction algorithms (SIFT, Polyphen2, LRT, MutationTaster, MutationAssessor and FATHMM), three conservation scores (PhyloP, GERP++ and SiPhy) and other related information, including allele frequencies observed in the 1000 Genomes Project phase 1 data and the NHLBI Exome Sequencing Project, gene IDs from different databases, functional descriptions of genes, and gene expression and gene interaction information. Some dbNSFP contents (though these may not be up to date) can also be accessed through variant tools, ANNOVAR, KGGSeq, the UCSC Genome Browser's Variant Annotation Integrator, the Ensembl Variant Effect Predictor and HGMD.
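dbNSFP is distributed as tab-delimited text files, one row per potential nsSNV, so a common access pattern is simply to pull a handful of score columns for variants of interest. The sketch below assumes a locally downloaded, gzip-compressed chromosome file and column headers such as SIFT_score and Polyphen2_HDIV_score; the file name is hypothetical and exact headers vary between dbNSFP releases, so treat them as placeholders.

```python
# Minimal sketch: scan a locally downloaded, tab-delimited dbNSFP
# chromosome file and collect selected prediction scores per variant.
# File name and column headers are assumptions; check your dbNSFP release.
import csv
import gzip

DBNSFP_FILE = "dbNSFP_chr1.gz"          # hypothetical local file
WANTED = ["SIFT_score", "Polyphen2_HDIV_score", "GERP++_RS"]

def iter_scores(path):
    with gzip.open(path, "rt") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            yield {
                "chr": row.get("#chr"),
                "pos": row.get("pos(1-based)"),
                "ref": row.get("ref"),
                "alt": row.get("alt"),
                **{col: row.get(col, ".") for col in WANTED},
            }

if __name__ == "__main__":
    for i, record in enumerate(iter_scores(DBNSFP_FILE)):
        print(record)
        if i >= 4:   # only show the first few rows
            break
```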


ClinVar (tool)

RRID:SCR_006169

Archive of aggregated information about sequence variation and its relationship to human health. Provides reports of relationships among human variations and phenotypes along with supporting evidence. Submissions from clinical testing labs, research labs, locus-specific databases, expert panels and professional societies are welcome. Collects reports of variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. Alleles described in submissions are mapped to reference sequences and reported according to the HGVS standard.
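As one concrete way to retrieve ClinVar records programmatically, the sketch below queries NCBI's public E-utilities esearch endpoint for ClinVar entries matching a search term. The gene symbol is only an illustrative choice, and this query is not part of the MAPPIN workflow described above.

```python
# Sketch: search ClinVar through NCBI E-utilities for a gene of interest.
# The gene symbol used here is only an example; any valid term works.
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def clinvar_ids(term, retmax=20):
    params = urllib.parse.urlencode({
        "db": "clinvar",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
    })
    with urllib.request.urlopen(f"{ESEARCH}?{params}") as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]

if __name__ == "__main__":
    # Example: ClinVar record IDs for variants in BRCA1 (illustrative gene choice).
    print(clinvar_ids("BRCA1[gene]"))
```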


Biological General Repository for Interaction Datasets (BioGRID) (tool)

RRID:SCR_007393

Curated repository of raw protein-protein and genetic interactions from major model organism species, with data compiled through comprehensive curation efforts.


1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can now access on AWS for use in disease research free of charge. The data are available via Amazon S3 at: http://s3.amazonaws.com/1000genomes. The project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year.

Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be seamlessly accessed from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide the highly scalable compute resources needed to take advantage of these large data collections. AWS stores the public data sets at no charge to the community; researchers pay only for the additional AWS resources they need for further processing or analysis of the data.

All 200 TB of the latest 1000 Genomes Project data are available in a public Amazon S3 bucket. The data can be accessed via simple HTTP requests or through the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to work with data at this scale without the usual capital investment, and AWS orchestration and automation services help teams make their research available for others to remix and reuse. Hosting the data in Amazon S3 also means it can be processed with Hadoop via Amazon Elastic MapReduce, taking advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
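Since the description notes that the data can be fetched with plain HTTP requests or the AWS SDKs, here is a minimal sketch using boto3 with unsigned (anonymous) requests to list a few objects in the public bucket named in the URL above. Object layout and any region settings are left to the reader; this is an illustration, not an official access recipe.

```python
# Sketch: anonymously list a few objects in the public 1000 Genomes S3 bucket.
# Requires boto3; no AWS credentials are needed for unsigned requests.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List the first few keys in the bucket referenced by
# http://s3.amazonaws.com/1000genomes
response = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=10)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```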
