Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g., expression, epigenetics, genomics), and no single comprehensive tool provides a complete integrative analysis of the resources and data provided by all three public projects. The need to integrate these different analyses was recently raised. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process, and prepare TCGA data and, by harnessing several key Bioconductor packages, how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme, or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox, TCGAbiolinks.
PubMed ID: 28232861
Software project and repository based on the R programming language that provides packages and tools for the analysis and comprehension of high-throughput genomic data. Uses its own set of commands for package installation.
Curated protein-protein and genetic interaction repository of raw protein and genetic interactions from major model organism species, with data compiled through comprehensive curation efforts.
Set of software modules for performing common ChIP-seq data analysis tasks across the whole genome, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions. The tools are designed to be simple, fast and highly modular. Each program carries out a well-defined data processing procedure that can potentially fit into a pipeline framework. ChIP-Seq is also freely available on a Web interface.
Software package to create a QC report for an AffyBatch object. The report is intended to allow the user to quickly assess the quality of a set of arrays in an AffyBatch object.
Algorithm for identifying broad peaks in diffuse ChIP-seq datasets.
Software package that implements and enhances circular visualization in R. Because R is natively suited to drawing statistical graphics, the package offers a more general and flexible way to visualize large amounts of information in a circular layout.
Network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R. Package archive network for the R programming language.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on February 28, 2023. Python software package that provides infrastructure to process data from high-throughput sequencing assays. While the main purpose of HTSeq is to let users write their own analysis scripts, customized to their needs, it also includes a couple of stand-alone scripts for common tasks that can be used without any Python knowledge.
Software for numerical and graphical summaries of RNA-Seq read data. Provides within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Also provides between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).
Web-based application for testing for locus-locus interaction using genetic association. It is based upon the case-control study design and is designed so that non-specialists may routinely apply tests for interaction. GAIA allows simple testing of both additive and additive plus dominance interaction models and includes permutation testing to appropriately correct for multiple testing. The application is useful for both candidate gene based studies and genome-wide association studies. For large scale studies GAIA includes a screening approach which prioritizes loci for further interaction analysis. (entry from Genetic Analysis Software)
Open source software package for statistical programming language R to create plots based on grammar of graphics. Used for data visualization to break up graphs into semantic components such as scales and layers.
Software package to arrange multiple heatmaps and support various annotation graphics. Used to visualize associations between different sources of data sets and to reveal potential patterns.