Pitfalls of Mapping High-Throughput Sequencing Data to Repetitive Sequences: Piwi's Genomic Targets Still Not Identified

Matters Arising

Highlights
• Published ChIP-seq datasets do not reveal Piwi's genomic binding sites
• Loss of Piwi does not lead to a broad redistribution of Pol II to transposons

Authors
Georgi K. Marinov, Jie Wang,..., Julius Brennecke, Katalin Fejes Toth

Correspondence
julius.brennecke@imba.oeaw.ac.at (J.B.), kft@caltech.edu (K.F.T.)

In Brief
Piwi silences transposon transcription in Drosophila ovaries. A previous report claimed to have identified Piwi's genomic binding sites by ChIP-seq. Marinov et al. reanalyzed the published datasets and found no support for an enrichment of Piwi at transposons. Instead, the previous conclusions result from flawed bioinformatic analyses. Piwi's genomic binding sites remain unknown.

Marinov et al., 2015, Developmental Cell 32, 765–771
March 23, 2015 ©2015 Elsevier Inc.
http://dx.doi.org/10.1016/j.devcel.2015.01.013

Developmental Cell
Matters Arising

Georgi K. Marinov (1,7), Jie Wang (2,7), Dominik Handler (3), Barbara J. Wold (1), Zhiping Weng (4), Gregory J. Hannon (5), Alexei A. Aravin (1), Phillip D. Zamore (6), Julius Brennecke (3,*), and Katalin Fejes Toth (1,*)
1 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
2 Department of Biochemistry, University at Buffalo, Buffalo, NY 14214, USA
3 Institute of Molecular Biotechnology of the Austrian Academy of Sciences IMBA, Vienna Biocenter (VBC), 1030 Vienna, Austria
4 Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
5 Watson School of Biological Sciences, Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
6 Howard Hughes Medical Institute, RNA Therapeutics Institute and Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
7 Co-first author
*Correspondence: julius.brennecke@imba.oeaw.ac.at (J.B.), kft@caltech.edu (K.F.T.)

SUMMARY

Huang et al. (2013) recently reported that chromatin immunoprecipitation sequencing (ChIP-seq) reveals the genome-wide sites of occupancy by Piwi, a piRNA-guided Argonaute protein central to transposon silencing in Drosophila. Their study also reported that loss of Piwi causes widespread rewiring of transcriptional patterns, as evidenced by changes in RNA polymerase II occupancy across the genome. Here we reanalyze their data and report that the underlying deep-sequencing dataset does not support the authors' genome-wide conclusions.
INTRODUCTION

PIWI-clade Argonaute proteins and their small RNA guides, PIWI-interacting RNAs (piRNAs), collaborate to repress selfish genetic elements such as transposons in animal gonads (Malone and Hannon, 2009; Siomi et al., 2011). The 23–30 nt piRNAs guide PIWI proteins to targets with complementary sequences. Piwi, one of the three Drosophila PIWI-clade proteins, is localized to the nucleus and represses transposon expression via transcriptional gene silencing (TGS). Target repression is accompanied by reduced RNA polymerase II (Pol II) occupancy and increased trimethylation of histone H3 lysine 9 (H3K9me3), a mark of heterochromatin (Le Thomas et al., 2013; Rozhkov et al., 2013; Shpiz et al., 2011; Sienski et al., 2012; Wang and Elgin, 2011). By analogy to centromeric silencing in Schizosaccharomyces pombe (Bühler and Moazed, 2007; Grewal, 2010), these data suggest that piRNAs guide Piwi to nascent transcripts at target loci, where Piwi promotes TGS and heterochromatin formation. Such a model is intuitively consistent with the findings of Huang et al. (2013), who reported strong ChIP-seq enrichments for Piwi at many genomic regions, typically transposons, for which complementary piRNAs are observed. ChIP experiments in our laboratories, however, have consistently failed to detect significant enrichment of Piwi at Piwi-repressed transposons, despite the use of various cross-linking conditions and different antibodies and tags for immunoprecipitation. We therefore reanalyzed the published ChIP-seq data (Huang et al., 2013). We determined (1) the degree of enrichment for Piwi at transposon loci and (2) the changes in Pol II occupancy at transposon loci upon loss of Piwi. In both cases, our independent analyses failed to confirm the published conclusions. Instead, we found that differences in data processing underlie the different outcomes.
We conclude that the genome-wide pattern of Piwi occupancy remains an open question despite multiple attempts to map it using contemporary ChIP-seq methods.

RESULTS

No Significant Enrichment of Piwi at Transposon Loci in the Huang et al. Datasets

For the reanalysis of the Huang et al. deep-sequencing data (Huang et al., 2013), we used standard read-mapping procedures and retained only reads that align to the genome with ≤2 mismatches (for details, see Supplemental Experimental Procedures). For comparison, we applied the same strategy to a published H3K9me3 ChIP-seq dataset from Drosophila ovaries (Muerdter et al., 2013). This histone mark is enriched in heterochromatin and on transposons and other genomic repeats. It is also present at transposon insertions repressed by nuclear Piwi via the piRNA pathway.

To ask whether the Piwi ChIP-seq dataset was enriched for transposon sequences, we first mapped all genome-mapping ChIP-seq and input control reads to a comprehensive list of consensus transposon sequences for Drosophila melanogaster. For each consensus sequence, we calculated normalized RPM values (reads per million sequenced reads; for details, see Supplemental Experimental Procedures). Piwi occupancy levels for transposons were indistinguishable from background (Figure 1A). In contrast, the H3K9me3 mark was as much as 10-fold enriched over most transposons. These results are in marked contrast with the conclusion that 86% of the Piwi ChIP-seq signal overlaps with transposons and repetitive sequences (Huang et al., 2013).

Figure 1. Piwi Is Not Enriched over Transposons in the Huang et al. Dataset
(A) Absence of enrichment in the Piwi ChIP-seq dataset and high enrichment of H3K9me3 (from Muerdter et al., 2013) over consensus transposons; each dot corresponds to a transposon consensus sequence.
(B) The concentration of Piwi signal over transposons in the Huang et al. dataset arises from failure to normalize multiply mapping reads. Shown is the region from Figure 2C of Huang et al. (2013). Top: Piwi ChIP-seq and background (input) data from Huang et al. showing (1) unique alignments; (2) all alignments, with reads normalized for mapping multiplicity; and (3) all alignments, with each alignment treated as a uniquely mapped read. Bottom: data processed per Huang et al. The enrichment of Piwi over repetitive elements is observed only when no multi-read normalization is applied, and it is seen in both the ChIP and the control datasets.
(C) The minimal Piwi ChIP-seq enrichment observed over some individual transposable elements is well within the range of experimental noise. Shown is the cumulative distribution function (CDF) of the ratio between total ChIP RPM and control/background RPM for each DNA, LINE, or LTR repetitive element (each dot represents an individual TE insertion). Piwi ChIP-seq data from Huang et al. (red) and H3K9me3 data from Muerdter et al. (blue) are plotted alongside the cumulative distributions for 11 transcription factor ChIP-seq datasets from modENCODE (gray), for which there is no expectation of enrichment at repetitive elements. Only repeat instances with at least 10 RPM in at least one of the ChIP and control datasets for each ChIP/background pairing were included. H3K9me3 showed high average enrichment over background at most of the elements in all three classes. In contrast, the Piwi ChIP-seq data were well within the range of the distributions for the modENCODE transcription factors.

Our analysis of the Piwi ChIP-seq data also does not support the ChIP-qPCR data presented by Huang et al. in their Figure S1 (Huang et al., 2013), which show that DNA fragments of two transposons (F-element and 1360) were retrieved at least 10-fold more efficiently in Piwi ChIP experiments than in control IPs; in fact, neither of these transposons was detectably enriched in the Piwi ChIP-seq dataset in our analysis, although both were significantly enriched in the H3K9me3 ChIP-seq data (Figure S1). The positive H3K9me3 ChIP-seq outcome from our analysis shows that a heterochromatin-associated mark can be, and was, successfully captured and the associated DNA efficiently sequenced. This argues against scenarios in which ChIP-enriched heterochromatic regions are detected by qPCR but missed by ChIP-seq because they are especially poor substrates for library building and/or sequencing. Of note, Huang et al. reported similar Piwi enrichments in ChIP-qPCR experiments performed on dissected ovaries and on whole flies (Figure S1 of Huang et al., 2013). Because Piwi is expressed at high levels only in gonadal cells, ChIP-qPCR signals are predicted to be diluted by somatic nuclei when whole flies instead of gonads are used as experimental input.

Next, we analyzed the Piwi ChIP-seq data at the genomic level. Figure 1B depicts a genomic region harboring three transposon insertions; this same region is shown in Figure 2C of Huang et al. (2013). Read coverage for the Piwi ChIP-seq and the corresponding input datasets was calculated in three ways: (1) considering only reads that map to the genome uniquely, (2) considering all reads mapping to the genome but normalizing each for the number of times it mapped to the genome, and (3) considering all alignments as if each were a unique read, without any normalization.
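The three coverage definitions can be made concrete with a minimal Python sketch. It works on a toy list of (read, locus) pairs rather than on real Bowtie output; the function name `coverage_tracks` and the locus labels are hypothetical, and loci stand in for genomic positions or bins.

```python
from collections import defaultdict


def coverage_tracks(alignments, total_reads):
    """Build three RPM-scaled signal tracks from (read_id, locus) alignment pairs.

    A read that maps to k loci appears k times in `alignments`.
    Loci stand in for genomic positions or bins (illustrative only).
    """
    loci_per_read = defaultdict(list)
    for read_id, locus in alignments:
        loci_per_read[read_id].append(locus)

    scale = 1e6 / total_reads  # one read's contribution, in reads per million
    unique = defaultdict(float)      # (1) uniquely mapping reads only
    weighted = defaultdict(float)    # (2) each alignment weighted by 1/k
    unweighted = defaultdict(float)  # (3) every alignment counted as a full read
    for loci in loci_per_read.values():
        k = len(loci)
        for locus in loci:
            if k == 1:
                unique[locus] += scale
            weighted[locus] += scale / k
            unweighted[locus] += scale
    return {"unique": dict(unique), "weighted": dict(weighted),
            "unweighted": dict(unweighted)}


# Toy input: r1 maps uniquely to a gene; r2 maps equally well to four TE copies.
alignments = [("r1", "geneA")] + [("r2", f"TE_copy{i}") for i in range(1, 5)]
tracks = coverage_tracks(alignments, total_reads=2)
for mode, track in tracks.items():
    print(mode, track, "total:", sum(track.values()))
```

Only the weighted track gives each read a total of one read's worth of signal. The unweighted track credits every copy of a repeat with a full read, so repeats are inflated by their copy number in ChIP and input alike, which is the pattern seen in Figure 1B.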
None of the three transposon insertions, nor their immediate genomic neighborhoods, stood out in the Piwi ChIP data compared to the background when (1) unique reads or (2) normalized reads were considered (Figure 1B). When the reads were (3) not corrected for mapping to multiple genomic sites, transposons emerged as strong peaks relative to flanking genomic sequences. However, transposons also emerged as strong peaks when the control dataset, the input genomic DNA itself, was mapped without accounting for mapping multiplicity. We used each of the three mapping strategies to determine the genome-wide average read density for the Piwi ChIP and input datasets over the three major transposable element classes in Drosophila (e.g., LINE elements in Figure 2). In all cases, we found no enrichment of Piwi over background, whereas the H3K9me3 dataset again displayed strong enrichment.

Figure 2. Distribution of Piwi and H3K9me3 over Repetitive Elements in the Genome
(A and B) The average signal distribution over LINE repetitive elements for ChIP (red) and background (yellow) datasets for Piwi from Huang et al. (2013) (A) and for H3K9me3 from Muerdter et al. (2013) (B). The background-normalized enrichment is shown in black. The 100 bp around the beginning and the end of individual elements are shown to scale; the rest of each LINE element is rescaled to 100 units. The RepeatMasker repetitive element annotation from the UCSC Genome Browser was used. A clear enrichment over background is observed in the H3K9me3 dataset, even when only uniquely aligning reads are considered. In contrast, the Piwi dataset from Huang et al. is essentially indistinguishable from background.

Finally, we asked whether Piwi might occupy only a subset of the thousands of transposon insertions in the Drosophila genome, an enrichment that could go undetected when analyzing genome-wide average signals. We compared the enrichment of Piwi at individual transposons with that of eleven transcription factors whose genome-wide occupancy has been determined in early fly embryos (modENCODE Consortium, 2010; Nègre et al., 2011); none of these developmental regulators is expected to be selectively enriched at transposon loci. Again, we found no specific enrichment of Piwi at transposon loci: the enrichment of Piwi at transposons was well within the range of enrichment observed for the transcription factors on the same set of transposons (Figure 1C). In contrast, the H3K9me3 mark was strongly enriched over all transposon classes. Taken together, these analyses show that the published Piwi ChIP-seq datasets do not support a specific enrichment of Piwi at transposons.

The Huang et al. Computational Pipeline Generates Artificial Enrichment of ChIP-Seq Datasets at Repetitive Loci

To identify the source of the discrepancy between our standard analysis pipeline and that of Huang et al., we examined the computational pipeline used in their studies (originally described in Yin et al., 2011), which the authors kindly shared with us. Rather than defining enrichments by the ratio of ChIP versus input reads, the Huang et al. pipeline identifies genomic regions of Piwi enrichment via a multi-step procedure (see Figure S2 and Supplemental Experimental Procedures for details). Two features of this pipeline could artificially amplify minor differences between ChIP and control datasets into large apparent enrichments at transposons. First, the pipeline makes no correction for reads mapping to multiple genomic locations. A single read must originate from a single genomic locus, no matter how many times it maps to the genome, so all widely used mapping software either randomly assigns a multiply mapping read to a single locus or apportions the read among the multiple loci.
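The random-assignment option differs from uncorrected counting in one key property: every read is counted exactly once. The following is a minimal Python sketch of that option, not the implementation of any particular aligner; the function name `assign_randomly`, the read layout, and the counts are hypothetical.

```python
import random
from collections import Counter


def assign_randomly(loci_per_read, seed=0):
    """Assign each read to exactly one of its equally good loci, chosen at random.

    `loci_per_read` maps a read ID to the list of loci where it aligns.
    """
    rng = random.Random(seed)  # seeded so the toy example is reproducible
    counts = Counter()
    for loci in loci_per_read.values():
        counts[rng.choice(loci)] += 1
    return counts


# Hypothetical library: 4,000 reads from a TE with four identical copies,
# plus one uniquely mapping read.
copies = ["TE_copy1", "TE_copy2", "TE_copy3", "TE_copy4"]
reads = {f"te_read{i}": copies for i in range(4000)}
reads["gene_read"] = ["geneA"]

counts = assign_randomly(reads)
print("total counts:", sum(counts.values()))  # one count per read
print({c: counts[c] for c in copies})         # roughly 1,000 per copy
```

Each copy receives about a quarter of the TE reads, and the library total stays at 4,001. Counting every alignment as a full read instead would credit each copy with all 4,000 reads and inflate the total to 16,001.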
Without such standard corrections for mapping multiplicity, all datasets, both ChIP-seq and input genomic DNA, produce artificially elevated signals at repetitive loci such as transposons. Because Huang et al. apply a cutoff threshold (see Experimental Procedures), this artificially elevated signal strongly biases the analysis toward repetitive regions. Second, although the subsequent analysis does take the input datasets into account, it does so in a nonstandard way by applying nonlinear transformations to the resulting signal tracks. As a consequence, the final score retains positive enrichments but sets negative enrichments (i.e., depletions) to zero. Ultimately, the combination of these steps yields exclusively positive enrichments, preferentially at transposons (Figure 1B), while signal in the direction of depletion is obscured. The algorithm is particularly prone to creating artificial peaks from ChIP-seq datasets with low signal-to-noise ratios (see below).

To illustrate this, we recapitulated the Huang et al. analysis with the input background and Piwi ChIP-seq data swapped, and then calculated the percentage of signal at annotated repeats. Strikingly, treating the genomic DNA input as the experiment and the Piwi ChIP-seq as the control produced strong signal enrichment at transposons. In fact, an even higher proportion of the final signal mapped to repeats in this analysis than when the datasets were correctly assigned to experiment and control (Figure 3A). The identity of the particular repeats contributing to the final signal, however, differed, as expected if the result stems from mistaking amplified positive noise for true signal. Figure 3B displays the final Huang et al. scores for Piwi ChIP-seq over background and for background over Piwi ChIP-seq at three individual, full-length transposon insertions. While some transposon insertions showed high signal in the Piwi/background track (e.g., roo), others showed high enrichment in the background/Piwi track (e.g., Max), and some showed a mixed signal, in which different portions of the element were highly enriched in either the background or the ChIP track (e.g., blood). These observations also suggest that the Huang et al. pipeline has the somewhat counterintuitive effect of generating much higher enrichments over transposons for ChIP datasets that contain very little or no true signal than for ChIP datasets that are strongly enriched at genomic features other than transposons.
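Why the swap experiment (Figure 3A) yields repeat "signal" in both directions can be sketched with synthetic data. This is a deliberately simplified stand-in, not the Huang et al. pipeline: the score is only background subtraction with negatives set to zero, counting noise uses a normal approximation to Poisson noise, and the bin counts and expected values are hypothetical. Both libraries are drawn from the same distribution, so there is no true signal; repeat bins simply have much higher expected counts, as when multi-mapping alignments are counted without normalization.

```python
import math
import random


def simulate_counts(expected, rng):
    # Normal approximation to Poisson counting noise, rounded and floored at zero.
    return [max(0, round(rng.gauss(mu, math.sqrt(mu)))) for mu in expected]


def clamped_difference(experiment, control):
    # Simplified score: subtract the control and set depletions to zero.
    return [max(e - c, 0) for e, c in zip(experiment, control)]


def fraction_in_repeats(score, is_repeat):
    total = sum(score)
    in_repeats = sum(s for s, r in zip(score, is_repeat) if r)
    return in_repeats / total if total else 0.0


rng = random.Random(1)
n_bins, n_repeat = 1000, 100
is_repeat = [i < n_repeat for i in range(n_bins)]
# Hypothetical expectations: repeat bins are inflated by unnormalized multi-mappers;
# ChIP and input share the same expectation (no true enrichment anywhere).
expected = [1000.0 if r else 5.0 for r in is_repeat]
chip = simulate_counts(expected, rng)
control = simulate_counts(expected, rng)

forward = fraction_in_repeats(clamped_difference(chip, control), is_repeat)
swapped = fraction_in_repeats(clamped_difference(control, chip), is_repeat)
print(f"repeat bins: {n_repeat / n_bins:.0%}  ChIP over input: {forward:.0%}  "
      f"input over ChIP: {swapped:.0%}")
```

Because noise grows with the expected count, clamping keeps the large positive fluctuations at inflated repeat bins in either direction, so both the forward and the swapped scores concentrate far more than 10% of their signal in repeats, and different bins supply it in each direction.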
In the latter case, transposons are globally depleted relative to the control because a high fraction of reads is concentrated in regions of true occupancy located elsewhere in the genome. This is not the case for input samples and poorly enriched ChIP experiments, which therefore show a higher apparent enrichment over TE sequences. Indeed, when we calculated the percentage of signal at transposons for the modENCODE transcription factor ChIP-seq datasets using the method of Huang et al., we observed highly variable results (Figure 3C). For some developmental regulators, the Huang et al. signal on repeats was similar to that of the Piwi dataset, while other factors displayed little signal on transposons.

The experimental characterization of the true genomic distribution of Piwi on chromatin thus remains an unresolved challenge. The difficulty in obtaining high-quality Piwi ChIP-seq datasets likely reflects the complexity of recovering DNA sequences that are transiently tethered to Piwi protein via nascent RNA. The inherent difficulty in shearing heterochromatin may also contribute to the problem (Teytelman et al., 2009).

No Support for Widespread Transcriptional Changes in piwi Mutants

Based on the same computational pipeline, Huang et al. also reported that in piwi mutants, Pol II is broadly redistributed from protein-coding genes to transposons. We calculated consensus transposon RPM values for the Pol II ChIP-seq datasets and their respective controls (Figure 4A). We found no clear differences between Pol II enrichments over transposons in wild-type versus piwi mutant flies. In both samples, Pol II was depleted at transposons compared to the input (Figures 4A and 4B), likely because Pol II ChIP-seq reads, unlike input reads, are concentrated at protein-coding genes. In contrast, Huang et al. reported that Pol II concentrated on transposons in piwi mutants compared to wild-type.
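The depletion argument above is simple arithmetic, sketched here under stated simplifying assumptions: a fraction of ChIP reads comes from genuinely bound regions outside transposons, and the remaining reads are background distributed exactly like the input. The function name and all numbers are hypothetical.

```python
def expected_te_rpm(input_te_fraction, fraction_at_true_sites):
    """Expected RPM over transposons for a ChIP library (simplified model).

    Assumes `fraction_at_true_sites` of reads come from bound regions outside
    transposons and the rest are background distributed like the input, which
    has `input_te_fraction` of its reads on transposons.
    """
    background_fraction = 1.0 - fraction_at_true_sites
    return 1e6 * input_te_fraction * background_fraction


input_te_fraction = 0.2  # hypothetical share of input reads on transposons
input_rpm = 1e6 * input_te_fraction
for label, f in [("strongly enriched ChIP", 0.4), ("weakly enriched ChIP", 0.02)]:
    chip_rpm = expected_te_rpm(input_te_fraction, f)
    print(f"{label}: ChIP/input over TEs = {chip_rpm / input_rpm:.2f}")
```

Under this model the ChIP/input ratio over transposons is simply one minus the fraction of reads at true sites: a strongly enriched ChIP, like Pol II, appears depleted at transposons, whereas a weakly enriched ChIP sits close to 1, where noise that survives clamping can be read as enrichment.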
A meta-profile of Pol II occupancy at all protein-coding loci showed ~2-fold greater enrichment at promoters in wild-type flies than in the mutant (Figure S3). For the piwi mutant dataset, this means that proportionally fewer reads originate from expressed genes relative to the remainder of the genome. Consequently, more background reads from transposons are recovered, and these are then amplified by the Huang et al. pipeline. Taken together, our analyses of the published ChIP-seq datasets find no support for a widespread role of Piwi in specifying patterns of transcription at transposons.

On the other hand, several studies have shown that loss of Piwi leads to pronounced changes in Pol II occupancy at piRNA-pathway-repressed transposon loci (Le Thomas et al., 2013; Rozhkov et al., 2013; Sienski et al., 2012). We note that these studies analyzed isolated ovaries or cultured ovarian somatic cells rather than entire flies. One conclusion of these studies is that biologically meaningful analyses of Piwi function using ChIP experiments require isolated tissues in which nuclear Piwi is highly expressed: the gonads.

The biologically relevant pattern of Piwi genomic occupancy remains unknown. Piwi associates with piRNAs complementary to virtually all transposon families, and loss of Piwi leads to the selective loss of the H3K9me3 mark at several transposon insertions (Sienski et al., 2012). These observations suggest that sequence complementarity between piRNAs and nascent target transcripts dictates the chromatin occupancy of Piwi. Considering the technical difficulties that have surrounded Piwi ChIP-seq, a first step toward identifying Piwi binding sites should be to verify direct occupancy at one or a few functional genomic target sites using alternative methods such as DamID (van Steensel and Henikoff, 2000). These validated sites could then be used as internal standards to establish approaches for mapping Piwi on chromatin across the genome.

Figure 3. The Huang et al. Data Processing Pipeline Generates Artificial Enrichment over Repetitive Regions
The Piwi ChIP-seq and input/background datasets were processed following the Huang et al. pipeline ("Piwi ChIP"). In addition, the pipeline was run with the ChIP and the input swapped, i.e., the control sample was treated as ChIP and vice versa, resulting in the "background" track.
(A) The fraction of signal mapping to transposable elements was calculated, revealing higher enrichment in the background than in the Piwi ChIP-seq dataset.
(B) Strong apparent enrichment over individual transposable elements was observed in the ChIP track (upper track), as reported by Huang et al., but also in the background track (lower track), and even over different portions of the same transposable element in both tracks (middle track), strongly arguing that the enrichment over transposable elements reported by Huang et al. is a computational artifact. Signal observed on individual copies correlates well with enrichment profiles obtained when reads are mapped to the consensus sequence of the respective transposons (shown below each track). Sequences showing enrichment in the background are indicated with gray blocks to depict the correlations between the signal on individual TE copies and the consensus sequence.
(C) Fraction of signal (calculated with the Huang et al. pipeline) mapping to transposable elements for the modENCODE transcription factor set.

Figure 4. No Redistribution of Pol II over Transposons Is Observed in piwi Mutant Flies
(A) Scatterplot displaying Pol II ChIP-seq RPM values versus input RPM values over consensus transposable elements in wild-type and piwi mutant flies.
(B) Pol II ChIP-seq and input RPM levels over the transposon consensus sequences of F-element and mdg3.

EXPERIMENTAL PROCEDURES

Data Processing

A detailed description of our computational analysis is provided in the Supplemental Experimental Procedures. In summary, the data from Huang et al. (2013) and from Muerdter et al. (2013) were processed using both the Huang et al. pipeline and more conventional procedures, incorporating three different signal normalization approaches. We aligned reads to the Drosophila melanogaster genome (dm3) using Bowtie (Langmead, 2010; version 0.12.7) and then generated signal tracks by calculating (1) normalized (RPM) coverage using only uniquely alignable reads; (2) RPM coverage using all alignments, weighting each according to the number of locations in the genome to which the read maps; and (3) RPM coverage using all alignments treated as if they were uniquely aligned reads (i.e., without normalization for multi-mappers, as in the Huang et al. pipeline).

The Huang et al. pipeline was reproduced according to the description and parameters presented in Yin et al. (2011). Briefly, it begins by recursively aligning reads with SOAP, allowing up to five mismatches and four indels. Alignments are then converted into 5′ coordinates, the chromosomes are split into 50 base pair (bp) bins, and each alignment contributes to ten bins according to a weighting scheme that decreases its weight in more distant bins.
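The binning step can be sketched in Python as follows. The 50 bp bin size, the use of 5′ coordinates, and the ten bins per alignment come from the description above; the exact weighting scheme is not given here, so the linear decay, the choice to extend in the read's direction, and all function names are hypothetical assumptions. Chromosome ends are not handled.

```python
from collections import defaultdict

BIN_SIZE = 50  # bp, as described for the Huang et al. pipeline
N_BINS = 10    # bins each alignment contributes to, as described


def five_prime_position(start, end, strand):
    # 0-based, half-open coordinates; on the minus strand the 5' end is the last base.
    return start if strand == "+" else end - 1


def decaying_weights(n=N_BINS):
    # HYPOTHETICAL weighting: linear decay n, n-1, ..., 1, normalized to sum to 1.
    raw = list(range(n, 0, -1))
    total = sum(raw)
    return [w / total for w in raw]


def bin_alignments(alignments, bin_size=BIN_SIZE, n_bins=N_BINS):
    """Spread each (start, end, strand) alignment over n_bins bins, starting at the
    bin holding its 5' end and moving in the read's direction (assumed)."""
    weights = decaying_weights(n_bins)
    track = defaultdict(float)
    for start, end, strand in alignments:
        first = five_prime_position(start, end, strand) // bin_size
        step = 1 if strand == "+" else -1
        for i, w in enumerate(weights):
            b = first + step * i
            if b >= 0:  # bins before the chromosome start are dropped
                track[b] += w
    return dict(track)


track = bin_alignments([(120, 156, "+"), (430, 466, "-")])
print({b: round(v, 3) for b, v in sorted(track.items())})
```

Note that every alignment contributes a full unit of weight here, mirroring the absence of multi-mapping normalization in the pipeline.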
The scores are then normalized according to the total number of alignments (rather than the total number of reads, i.e., no multi-mapping normalization is applied), and a critical value is calculated for each ChIP/input pair such that, beyond that value, the bin values are always higher in the ChIP than in the control dataset (Figure S2); a normalizer score is calculated from the bins with values lower than the critical value and is applied to the ChIP. The ChIP is further normalized by subtracting the background. Critically, when this step is performed, negative values are set to zero, leading to loss of data over regions of depletion relative to background. Finally, scores are divided by the trimmed mean, log-transformed, and again set to zero if negative.

Repeat Analysis

RepeatMasker annotation, downloaded from the UCSC Genome Browser, was used for the analysis of repetitive element coverage in genomic space. Consensus repetitive element sequences were downloaded from FlyBase (Marygold et al., 2013); reads were aligned against them using Bowtie, allowing for three mismatches and unlimited multi-mappers, and normalized RPM values were calculated for each element.

SUPPLEMENTAL INFORMATION

Supplemental Information includes Supplemental Experimental Procedures and three figures and can be found with this article online at http://dx.doi.org/10.1016/j.devcel.2015.01.013.

AUTHOR CONTRIBUTIONS

G.K.M., J.W., and D.H. performed the computational analyses; all authors analyzed the data and wrote the manuscript.

ACKNOWLEDGMENTS

We thank Haifan Lin for kindly sharing the detailed computational pipeline underlying the analyses in Huang et al. (2013).

Received: July 17, 2014
Revised: December 18, 2014
Accepted: January 14, 2015
Published: March 23, 2015

REFERENCES
Bühler, M., and Moazed, D. (2007). Transcription and RNAi in heterochromatic gene silencing. Nat. Struct. Mol. Biol. 14, 1041–1048.
Grewal, S.I. (2010). RNAi-dependent formation of heterochromatin and its diverse functions. Curr. Opin. Genet. Dev. 20, 134–141.
Huang, X.A., Yin, H., Sweeney, S., Raha, D., Snyder, M., and Lin, H. (2013). A major epigenetic programming mechanism guided by piRNAs. Dev. Cell 24, 502–516.
Langmead, B. (2010). Aligning short sequencing reads with Bowtie. Curr. Protoc. Bioinformatics Chapter 11, Unit 11.7. http://dx.doi.org/10.1002/0471250953.bi1107s32.
Le Thomas, A., Rogers, A.K., Webster, A., Marinov, G.K., Liao, S.E., Perkins, E.M., Hur, J.K., Aravin, A.A., and Tóth, K.F. (2013). Piwi induces piRNA-guided transcriptional silencing and establishment of a repressive chromatin state. Genes Dev. 27, 390–399.
Malone, C.D., and Hannon, G.J. (2009). Small RNAs as guardians of the genome. Cell 136, 656–668.
Marygold, S.J., Leyland, P.C., Seal, R.L., Goodman, J.L., Thurmond, J., Strelets, V.B., and Wilson, R.J.; FlyBase consortium (2013). FlyBase: improvements to the bibliography. Nucleic Acids Res. 41, D751–D757.
modENCODE Consortium, Roy, S., Ernst, J., Kharchenko, P.V., Kheradpour, P., Negre, N., Eaton, M.L., Landolin, J.M., Bristow, C.A., Ma, L., et al. (2010). Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330, 1787–1797.
Muerdter, F., Guzzardo, P.M., Gillis, J., Luo, Y., Yu, Y., Chen, C., Fekete, R., and Hannon, G.J. (2013). A genome-wide RNAi screen draws a genetic framework for transposon control and primary piRNA biogenesis in Drosophila. Mol. Cell 50, 736–748.
Nègre, N., Brown, C.D., Ma, L., Bristow, C.A., Miller, S.W., Wagner, U., Kheradpour, P., Eaton, M.L., Loriaux, P., Sealfon, R., et al. (2011). A cis-regulatory map of the Drosophila genome. Nature 471, 527–531.
Rozhkov, N.V., Hammell, M., and Hannon, G.J. (2013). Multiple roles for Piwi in silencing Drosophila transposons. Genes Dev. 27, 400–412.
Shpiz, S., Olovnikov, I., Sergeeva, A., Lavrov, S., Abramov, Y., Savitsky, M., and Kalmykova, A. (2011). Mechanism of the piRNA-mediated silencing of Drosophila telomeric retrotransposons. Nucleic Acids Res. 39, 8703–8711.
Sienski, G., Dönertas, D., and Brennecke, J. (2012). Transcriptional silencing of transposons by Piwi and maelstrom and its impact on chromatin state and gene expression. Cell 151, 964–980.
Siomi, M.C., Sato, K., Pezic, D., and Aravin, A.A. (2011). PIWI-interacting small RNAs: the vanguard of genome defence. Nat. Rev. Mol. Cell Biol. 12, 246–258.
Teytelman, L., Ozaydin, B., Zill, O., Lefrançois, P., Snyder, M., Rine, J., and Eisen, M.B. (2009). Impact of chromatin structures on DNA processing for genomic analyses. PLoS ONE 4, e6700.
van Steensel, B., and Henikoff, S. (2000). Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nat. Biotechnol. 18, 424–428.
Wang, S.H., and Elgin, S.C. (2011). Drosophila Piwi functions downstream of piRNA production mediating a chromatin-based transposon silencing mechanism in female germ line. Proc. Natl. Acad. Sci. USA 108, 21164–21169.
Yin, H., Sweeney, S., Raha, D., Snyder, M., and Lin, H. (2011). A high-resolution whole-genome map of key chromatin modifications in the adult Drosophila melanogaster. PLoS Genet. 7, e1002380.