Getting rid of noise in sequencing data

Klasfeld, Roulé, and Wagner develop a user-friendly tool to get rid of artifacts in chromatin immunoprecipitation data.

Doris Wagner, Samantha Klasfeld and Thomas Roulé

Background Chromatin immunoprecipitation followed by sequencing (ChIP-seq) and other genomic approaches reveal transcription factor occupancy at target loci, providing insight into gene activation and repression. These methods rely on amplification of factor-associated and control DNA, and amplification artifacts that obscure the detection of biologically meaningful binding events have been identified.

Question We asked whether we could develop a simple, versatile method to remove these artifacts. We then asked how incorporating this tool into an improved ChIP-seq analysis pipeline affects insight into factor occupancy.

Findings We were able to remove artifactual signals using a combination of common ChIP-seq analysis tools and control samples in a method we call greenscreen. A greenscreen filter can be generated in any new organism with a single ChIP experiment that has at least two controls (input samples). We developed and tested greenscreen filters for Arabidopsis, rice and human and found that filtering out peaks that overlap with greenscreen regions is required to detect similarities and differences between biological ChIP replicates, to test for overlap in genome occupancy by different transcription factors and to quantify changes in factor binding in different conditions. When linked to the optimized ChIP-seq pipeline we present, greenscreen also leads to the identification of more true peaks. The greenscreen tool thus improves the ability to answer biological questions with ChIP-seq and related approaches.
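As a rough illustration of the filtering step described above (dropping peaks that overlap greenscreen regions), here is a minimal Python sketch. It is not the published greenscreen pipeline, which relies on common ChIP-seq analysis tools. The function names, the half-open BED-style coordinates and the example intervals are all assumptions made up for this illustration.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_peaks(peaks, greenscreen):
    """Keep only peaks that do not overlap any greenscreen region.

    Both arguments are lists of (chrom, start, end) tuples using
    half-open, BED-style coordinates (an assumption of this sketch).
    """
    # Group greenscreen regions by chromosome so each peak is only
    # compared against regions on its own chromosome.
    regions_by_chrom = defaultdict(list)
    for chrom, start, end in greenscreen:
        regions_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in peaks:
        regions = regions_by_chrom.get(chrom, [])
        if not any(overlaps(start, end, s, e) for s, e in regions):
            kept.append((chrom, start, end))
    return kept


# Hypothetical coordinates, for illustration only.
greenscreen = [("Chr1", 1000, 2000), ("Chr2", 500, 900)]
peaks = [
    ("Chr1", 1500, 1700),  # inside a greenscreen region: removed
    ("Chr1", 3000, 3200),  # no overlap: kept
    ("Chr2", 900, 1000),   # only touches a region boundary: kept
    ("Chr3", 10, 50),      # chromosome has no greenscreen regions: kept
]
kept = filter_peaks(peaks, greenscreen)
print(kept)
```

The linear scan per chromosome keeps the sketch short. For genome-scale peak sets, a sorted or indexed interval lookup, or an established interval tool, would be the more practical choice.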

Next steps We would like to develop, test and optimize greenscreen filters for other plant species. We would also like to know whether greenscreen, or an adapted version of it, can filter artifactual signals from other datasets such as Hi-C, CUT&RUN, CUT&Tag, and ATAC-seq.


Samantha Klasfeld, Thomas Roulé, and Doris Wagner (2022) Greenscreen: A simple method to remove artifactual signals and enrich for true peaks in genomic datasets including ChIP-seq data