Features that define the best ChIP-seq peak calling algorithms.

Briefings in Bioinformatics | 2017

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in the analysis of these data. Peak calling consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. We surveyed 30 methods and identified 12 features of the two sub-problems that distinguish methods from each other. We picked six methods [GEM, MACS2, MUSIC, BCP, Threshold-based method (TM) and ZINBA] that span this feature space and used a combination of 300 simulated ChIP-seq data sets, 3 real data sets and mathematical analyses to identify features of methods that allow some to perform better than the others. We prove that methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not. Methods that use windows of different sizes are more powerful than the ones that do not. For statistical testing of candidate peaks, methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test. BCP and MACS2 have the best operating characteristics on simulated transcription factor binding data. GEM has the highest fraction of the top 500 peaks containing the binding motif of the immunoprecipitated factor, with 50% of its peaks within 10 base pairs of a motif. BCP and MUSIC perform best on histone data. These findings provide guidance and rationale for selecting the best peak caller for a given application.
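
To make the Poisson versus Binomial distinction concrete, here is a minimal Python sketch (standard library only) that scores one candidate window both ways. It is not the code of any of the six peak callers, and its modeling choices are assumptions for illustration: the Poisson test treats the input count, scaled by the ChIP/input library-size ratio, as a fixed background rate λ, while the binomial test conditions on the combined ChIP + input count in the window, with the success probability set by the library-size ratio. The function names (`poisson_sf`, `binomial_sf`, `window_pvalues`) and the example counts are hypothetical.

```python
import math


def poisson_pmf(i: int, lam: float) -> float:
    """P(X = i) for X ~ Poisson(lam), lam > 0, computed in log space."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # At or below the mean: complement of the short lower tail.
        return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))
    # Above the mean the terms shrink monotonically; sum until negligible.
    total, i = 0.0, k
    while True:
        term = poisson_pmf(i, lam)
        total += term
        if term <= total * 1e-16:
            return min(1.0, total)
        i += 1


def binomial_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p), 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_coef = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_coef + i * log_p + (n - i) * log_q)
    return min(1.0, total)


def window_pvalues(chip: int, control: int, chip_depth: int, control_depth: int):
    """Hypothetical helper: score one candidate window with both tests."""
    ratio = chip_depth / control_depth
    # Poisson: background rate = scaled input count (floored to stay positive).
    lam = max(control * ratio, 1e-9)
    p_poisson = poisson_sf(chip, lam)
    # Binomial: condition on chip + control reads; under the null, a read in the
    # window comes from the ChIP library with probability chip_depth / total depth.
    p_binom = binomial_sf(chip, chip + control, chip_depth / (chip_depth + control_depth))
    return p_poisson, p_binom


# Assumed toy window: 30 ChIP reads vs 10 input reads, equal sequencing depths.
p_pois, p_bin = window_pvalues(30, 10, 10_000_000, 10_000_000)
print(f"Poisson p = {p_pois:.2e}, binomial p = {p_bin:.2e}")
```

For these counts the Poisson p-value is several orders of magnitude smaller than the binomial one, partly because the Poisson version treats the scaled input count as known rather than as a noisy estimate. One toy window shows the direction of the effect; it does not reproduce the paper's analysis.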

Pubmed ID: 27169896

Associated grants

  • Agency: NHLBI NIH HHS, United States
    Id: P01 HL089707
  • Agency: NHLBI NIH HHS, United States
    Id: UM1 HL098179

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


Bioconductor (tool)

RRID:SCR_006442

Software project based on the R programming language that provides tools for the analysis and comprehension of high-throughput genomic data, and a software repository for the R packages it distributes. Uses its own set of commands for package installation.

ChIP-seq (tool)

RRID:SCR_001237

Set of software modules for performing common ChIP-seq data analysis tasks across the whole genome, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions. The tools are designed to be simple, fast and highly modular. Each program carries out a well-defined data processing procedure and can potentially fit into a pipeline framework. ChIP-Seq is also freely available through a web interface.
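
One common use of positional correlation in ChIP-seq is estimating the average fragment length by strand cross-correlation: a fragment yields a plus-strand read at its left end and a minus-strand read at its right end, so the minus-strand profile resembles the plus-strand profile shifted by roughly one fragment length. The Python sketch below illustrates that idea under stated assumptions (one chromosome, per-base counts of read 5' ends on each strand); it is not code from the ChIP-Seq tools, and `strand_cross_correlation` and `estimate_fragment_length` are hypothetical names.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def strand_cross_correlation(
    plus: Sequence[int], minus: Sequence[int], max_shift: int
) -> List[float]:
    """Correlation between plus[i] and minus[i + s] for s = 0..max_shift (max_shift < len)."""
    n = len(plus)
    return [pearson(plus[: n - s], minus[s:]) for s in range(max_shift + 1)]


def estimate_fragment_length(plus, minus, max_shift: int) -> int:
    """Shift (in bp) at which the two strand profiles line up best."""
    cc = strand_cross_correlation(plus, minus, max_shift)
    return max(range(len(cc)), key=cc.__getitem__)


# Toy chromosome: each minus-strand 5' end sits 120 bp downstream of its
# plus-strand partner, mimicking fragments of roughly that length.
length = 1000
plus = [0] * length
minus = [0] * length
for start in (50, 230, 410, 620, 790):
    plus[start] += 1
    minus[start + 120] += 1

print(estimate_fragment_length(plus, minus, max_shift=200))  # 120
```

Pearson correlation is used so that a difference in total read depth between the two strands does not by itself change which shift scores highest.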

ChIPsim (tool)

RRID:SCR_001293

Software package providing a general framework for the simulation of ChIP-seq data. Although currently focused on nucleosome positioning, the package is designed to support different types of experiments.
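
The basic idea of simulating ChIP-seq reads can be sketched in a few lines of Python. This is a minimal illustration, not ChIPsim's interface, and the model is an assumption chosen for simplicity: one chromosome much longer than a fragment, a fixed fraction of fragments drawn so that they cover a binding site and the rest placed uniformly, normally distributed fragment lengths, and one read per fragment from a randomly chosen end. `simulate_chip_reads` and all of its parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def simulate_chip_reads(
    genome_len: int,
    sites: Sequence[int],
    n_reads: int,
    frac_bound: float = 0.3,
    frag_mean: float = 200.0,
    frag_sd: float = 20.0,
    min_frag: int = 36,
    seed: int = 0,
) -> List[Tuple[int, str]]:
    """Return n_reads (5' position, strand) pairs on one simulated chromosome."""
    rng = random.Random(seed)
    reads: List[Tuple[int, str]] = []
    while len(reads) < n_reads:
        # Fragment length ~ Normal(frag_mean, frag_sd), truncated at min_frag.
        frag_len = max(min_frag, round(rng.gauss(frag_mean, frag_sd)))
        if sites and rng.random() < frac_bound:
            # "Bound" fragment: covers a randomly chosen binding site,
            # with the site at a uniformly random offset inside the fragment.
            start = rng.choice(sites) - rng.randrange(frag_len)
        else:
            # Background fragment: uniform start anywhere on the chromosome.
            start = rng.randrange(genome_len)
        end = start + frag_len  # exclusive
        if start < 0 or end > genome_len:
            continue  # drop fragments that run off either end
        # Sequence one end: plus-strand reads start at the fragment's left end,
        # minus-strand reads at its right end.
        if rng.random() < 0.5:
            reads.append((start, "+"))
        else:
            reads.append((end - 1, "-"))
    return reads


sites = [20_000, 60_000]
reads = simulate_chip_reads(100_000, sites, n_reads=2_000, frac_bound=0.5, seed=1)
near = sum(1 for pos, _ in reads if any(abs(pos - s) <= 300 for s in sites))
print(f"{near} of {len(reads)} reads fall within 300 bp of a binding site")
```

Because plus-strand reads pile up just upstream of each site and minus-strand reads just downstream, the output also has the strand offset that the cross-correlation approach to fragment-length estimation relies on.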

MUlti SImulation Coordinator (tool)

RRID:SCR_001756

Software that allows large-scale neuron simulators to communicate during runtime. It allows data exchange among parallel applications in a cluster environment, interconnects large-scale neuronal network simulators with each other or with other tools, participates in multi-simulations, and is continuously developed and extended. Three simulators currently have MUSIC interfaces: Moose, NEURON and NEST. Three applications execute in parallel while exchanging data via MUSIC. The software interface promotes interoperability by allowing models written for different simulators to be simulated together in a larger system, and it enables re-use of models or tools by providing a standard interface. Because data are distributed over many processors, it is non-trivial to coordinate data transfer so that data reach the correct destination at the correct time. Current and future simulators can use MUSIC-compliant general-purpose tools and participate in multi-simulations, for example when:

  • Different parts of a complex nervous system model are best implemented in different simulators and need to communicate with each other.
  • Generated data need post-processing but are too large for intermediate storage, so the simulator must pass them directly to the post-processing module.

A standard interface enables straightforward, independent third-party development and community sharing of interoperable software tools for parallel processing. Other characteristics:

  • The library and utilities are written in C++ and use MPI.
  • A MUSIC interface can be added to existing simulators.
  • It works independently: no assumptions are made about other applications, which facilitates development of general-purpose tools.
  • Performance: data transport with high bandwidth and low latency.

HTSeq (tool)

RRID:SCR_005514

THIS RESOURCE IS NO LONGER IN SERVICE. Documented on February 28, 2023. Python software package that provides infrastructure to process data from high-throughput sequencing assays. While the main purpose of HTSeq is to let you write your own analysis scripts, customized to your needs, there are also a couple of stand-alone scripts for common tasks that can be used without any Python knowledge.

ENCODE (tool)

RRID:SCR_006793

Encyclopedia of DNA Elements: a list of functional elements in the human genome, including elements that act at the protein and RNA levels and regulatory elements that control the cells and circumstances in which a gene is active. Enables the scientific and medical communities to interpret the role of the human genome in biology and disease. Identifies common cell types to facilitate integrative analysis and new experimental technologies based on high-throughput sequencing. Includes a genome browser containing ENCODE and Epigenomics Roadmap data. Data are available for the entire human genome.

edgeR (tool)

RRID:SCR_012802

Bioconductor software package for Empirical analysis of Digital Gene Expression data in R. Used for differential expression analysis of RNA-seq and digital gene expression data with biological replication.

ChIPpeakAnno (tool)

RRID:SCR_012828

Software package that includes functions to retrieve the sequences around peaks, obtain enriched Gene Ontology terms, and find the nearest gene, exon, miRNA or custom features such as the most conserved elements.
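
Finding the nearest gene for each peak amounts to a lookup in a sorted list of feature coordinates. The Python sketch below shows the idea with binary search; it is not ChIPpeakAnno's API, and `annotate_nearest`, the single-chromosome setting, point features such as transcription start sites, and the strand-agnostic signed distance are assumptions for illustration. It also assumes at least one feature is supplied.

```python
import bisect
from typing import List, Sequence, Tuple


def annotate_nearest(
    peak_centers: Sequence[int], features: Sequence[Tuple[int, str]]
) -> List[Tuple[int, str, int]]:
    """For each peak center, return (center, nearest feature name, center - feature position).

    Assumes one chromosome and point features (e.g. TSS coordinates); strand is
    ignored. Ties go to the feature with the lower coordinate.
    """
    ordered = sorted(features)  # sort once by position
    positions = [pos for pos, _ in ordered]
    out = []
    for center in peak_centers:
        i = bisect.bisect_left(positions, center)
        # The nearest feature is the last one before `center` or the first at/after it;
        # listing the lower one first makes min() break ties toward it.
        candidates = [ordered[j] for j in (i - 1, i) if 0 <= j < len(ordered)]
        pos, name = min(candidates, key=lambda f: abs(f[0] - center))
        out.append((center, name, center - pos))
    return out


tss = [(5000, "geneB"), (1000, "geneA"), (9000, "geneC")]
for row in annotate_nearest([1200, 4000, 9500, 3000], tss):
    print(row)
```

Sorting once and using binary search keeps each lookup logarithmic in the number of features, which matters when annotating tens of thousands of peaks against a genome-wide gene list.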

GM12878 (cell line)

RRID:CVCL_7526

GM12878 is a transformed cell line of human (Homo sapiens) origin.
