Cell Identity Codes: Understanding Cell Identity from Gene Expression Profiles using Deep Neural Networks.

Scientific reports | 2019

Understanding cell identity is an important task in many biomedical areas. Expression patterns of specific marker genes have been used to characterize a limited number of cell types, but exclusive markers are not available for many cell types. A second approach is to use machine learning to discriminate cell types based on whole gene expression profiles (GEPs). The accuracy of simple classification algorithms such as linear discriminators or support vector machines is limited by the complexity of biological systems. We used deep neural networks to analyze 1040 GEPs from 16 different human tissues and cell types. After comparing different architectures, we identified a specific deep autoencoder structure that can encode a GEP into a vector of 30 numeric values, which we call the cell identity code (CIC). The original GEP can be reproduced from the CIC with an accuracy comparable to that of technical replicates of the same experiment. Although we use an unsupervised approach to train the autoencoder, we show that different values of the CIC are connected to different biological aspects of the cell, such as different pathways or biological processes. The network can use the CIC to reproduce the GEPs of cell types it never saw during training. It is also robust to some noise in the measurement of the GEP. Furthermore, we introduce the classifier autoencoder, an architecture that can accurately identify cell type based on the GEP or the CIC.
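To make the encode/decode idea in the abstract concrete, here is a minimal, untrained sketch of an autoencoder-shaped network in plain Python. Only the code length of 30 comes from the abstract; the hidden layer width, the tanh activation, the number of genes, and every function name are assumptions chosen for illustration, and this is not the authors' architecture. Training (minimizing reconstruction error over many GEPs) is deliberately left out, so the reconstruction here is meaningless beyond its shape.

```python
import math
import random

CODE_SIZE = 30  # length of the cell identity code (CIC), as stated in the abstract


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def identity(z):
    return z


def dense(layer, x, activation=math.tanh):
    """Apply one dense layer: activation(W x + b)."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def build_autoencoder(n_genes, hidden=64, code_size=CODE_SIZE, seed=0):
    """Build encoder and decoder layer lists (hidden=64 is an assumed width)."""
    rng = random.Random(seed)
    encoder = [make_layer(n_genes, hidden, rng),
               make_layer(hidden, code_size, rng)]
    decoder = [make_layer(code_size, hidden, rng),
               make_layer(hidden, n_genes, rng)]
    return encoder, decoder


def encode(encoder, gep):
    """Compress a gene expression profile into a code vector."""
    x = gep
    for layer in encoder:
        x = dense(layer, x)
    return x


def decode(decoder, code):
    """Expand a code vector back to one value per gene (linear output layer)."""
    x = dense(decoder[0], code)
    return dense(decoder[1], x, activation=identity)


n_genes = 200  # hypothetical profile size, far smaller than a real GEP
encoder, decoder = build_autoencoder(n_genes)
profile_rng = random.Random(1)
gep = [profile_rng.random() for _ in range(n_genes)]  # synthetic stand-in for a GEP

cic = encode(encoder, gep)
reconstructed = decode(decoder, cic)
print(len(cic), len(reconstructed))  # prints "30 200"
```

The bottleneck is the point of the design: every profile has to pass through 30 numbers, so a trained network of this shape is forced to keep only the information needed to rebuild the profile.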

PubMed ID: 30787315

Research resources used in this publication

None found

Additional research tools detected in this publication

Antibodies used in this publication

None found

Associated grants

None

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


ToppCluster (tool)

RRID:SCR_001503

A tool for performing multi-cluster gene functional enrichment analyses on large-scale data (microarray experiments with many time points, cell types, tissue types, etc.). It facilitates co-analysis of multiple gene lists and outputs a rich functional map showing the shared and list-specific functional features. The output can be visualized in tabular, heatmap or network formats using built-in options as well as third-party software. Functional enrichment is computed with the hypergeometric test, via the gene list enrichment analysis option available in ToppGene.
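As a rough illustration of the hypergeometric test mentioned above, the sketch below computes a one-sided enrichment p-value for a single gene list and a single annotation term. The function name, argument names, and example counts are hypothetical, and this is not ToppCluster's or ToppGene's code; any multiple-testing correction a real tool might apply is omitted.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Probability of seeing at least `overlap` annotated genes when `sample`
    genes are drawn without replacement from `population` genes, of which
    `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes in the background, 5 annotated with a term,
# a gene list of 5, of which 3 carry the annotation.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=5, overlap=3)
print(round(p_value, 4))  # prints 0.0726
```

A small p-value means the overlap between the gene list and the term is larger than expected by chance, which is what marks the term as enriched for that list.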

Wikipedia (tool)

RRID:SCR_004897

Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Its 19 million articles (over 3.6 million in English) have been written collaboratively by volunteers around the world, and almost all of its articles can be edited by anyone with access to the site. As of July 2011, there were editions of Wikipedia in 282 languages. Wikipedia was launched in 2001 by Jimmy Wales and Larry Sanger and has become the largest and most popular general reference work on the Internet, ranking around seventh among all websites on Alexa and having 365 million readers. The name Wikipedia was coined by Larry Sanger and is a combination of wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning quick) and encyclopedia. Wikipedia's departure from the expert-driven style of encyclopedia building and the large presence of unacademic content has been noted several times. Some have noted the importance of Wikipedia not only as an encyclopedic reference but also as a frequently updated news resource because of how quickly articles about recent events appear. Although the policies of Wikipedia strongly espouse verifiability and a neutral point of view, critics of Wikipedia accuse it of systemic bias and inconsistencies (including undue weight given to popular culture), and allege that it favors consensus over credentials in its editorial processes. Its reliability and accuracy are also targeted. A 2005 investigation in Nature showed that the science articles they compared came close to the level of accuracy of Encyclopedia Britannica and had a similar rate of serious errors.

ggplot2 (tool)

RRID:SCR_014601

Open-source software package for the R statistical programming language that creates plots based on the grammar of graphics. It is used for data visualization and breaks graphs up into semantic components such as scales and layers.