Identification of endometrial cancer methylation features using a combined. methylation analysis method DISSERTATION

Size: px
Start display at page:

Download "Identification of endometrial cancer methylation features using a combined. methylation analysis method DISSERTATION"

Transcription

1 Identification of endometrial cancer methylation features using a combined methylation analysis method DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Michael P Trimarchi B.S. Biomedical Sciences Graduate Program The Ohio State University 2016 Dissertation Committee: Joanna L Groden, Advisor Paul Goodfellow Ralf A. Bundschuh Jeffrey Parvin Pearlly Yan

2 Copyrighted by Michael P Trimarchi 2016

3 Abstract Introduction: DNA methylation is a stable epigenetic mark that is frequently altered in tumors. DNA methylation marks are attractive biomarkers for disease states given the stability of DNA methylation in living cells and in biologic specimens. Widespread accumulation of methylation in regulatory elements in some cancers (termed the CpG island methylator phenotype, CIMP) can play an important role in tumorigenesis. High resolution assessment of CIMP for the entire genome, however, remains cost prohibitive and requires quantities of DNA that are not available for many tissue samples of interest. Genome-wide scans of methylation have been undertaken for large numbers of tumors, and higher resolution analyses have been performed for a limited number of cancer specimens. Yet methods for analyzing these large datasets and integrating findings from different studies have not been fully developed. An approach was developed to profile CIMP by combining the strengths of two different methylome profiling techniques. Methods: Methylomes for 76 primary endometrial cancer and 12 normal endometrial samples were generated using methylated fragment capture and second generation sequencing (MethylCap-seq). Publically available data from The Cancer Genome Atlas (TCGA) for 203 endometrial cancers, analyzed using the Infinium HumanMethylation 450 beadchip, were compared to the MethylCap-seq data. A MethylCap-seq quality control module was ii

4 developed to exclude sequencing samples with poor-quality methylation data from analysis. Additional MethylCap-seq datasets were also used to develop and validate the quality control module. Results: Analysis of total methylation in promoter CpG islands (CGIs) identified a subset of tumors with a methylator phenotype. I developed a 13- region methylation signature associated with a hypermethylator state using a training set of five highly methylated and eight lowly methylated tumors. The signature was validated using data from TCGA. High signature methylation score was associated with mismatch repair deficiency, high mutation rate, and low somatic copy number alteration in TCGA test set. In addition, the methylation signature distinguished >90% of endometrioid endometrial tumors from normal controls in the test set. Furthermore, classification of tumors by signature methylation score proved highly robust, showing good agreement with previously published methylation clusters for the test set as well as consistent ranking of tumors across alternative signatures. Conclusion: I identified a methylation signature for a hypermethylator phenotype in endometrial cancer and developed methods that could prove useful for identifying extreme methylation phenotypes in other cancers. iii

5 Acknowledgments My sincere thanks to my committee and my former advisor Tim H.M. Huang for making this project possible, as well as the current and former members of the Yan and Groden labs. Special thanks to Paul Goodfellow for helping reorient the project after my former advisor left, and to Ralf Bundschuh for his guidance in data analysis. iv

6 Vita June North Olmsted High School June B.S. Microbiology, Minor in Chemistry, The Ohio State University June 2007 to present... Graduate Research Associate, Biomedical Sciences Graduate Program, The Ohio State University Publications 2016 Michael P Trimarchi, Pearlly Yan, Joanna Groden, Ralf Bundschuh, Paul Goodfellow. Identification of endometrial cancer methylation features using a combined methylation analysis method. Manuscript in preparation Michael P Trimarchi, Mark Murphy, David Frankhouser, Benjamin AT Rodriguez, John Curfman, Guido Marcucci, Pearlly Yan, Ralf Bundschuh. Enrichment-based DNA methylation analysis using next-generation sequencing: sample exclusion, estimating changes in global methylation, and the contribution of replicate lanes. BMC Genomics 2012, 13(Suppl 8):S6 (17 December 2012). v

7 2012 Rodriguez B, Tam HH, Frankhouser D, Trimarchi M, Murphy M, Kuo C, Parikh D, Ball B, Schwind S, Curfman J, Blum W, Marcucci G, Yan P, Bundschuh R. A Scalable, Flexible Workflow for MethylCap-Seq Data Analysis. BMC Genomics 2012, 13(Suppl 6):S14 (26 October 2012) Yan P, Frankhouser D, Murphy M, Tam HH, Rodriguez B, Curfman J, Trimarchi M, Geyer S, Wu YZ, Whitman SP, Metzeler K, Walker A, Klisovic R, Jacob S, Grever MR, Byrd JC, Bloomfield CD, Garzon R, Blum W, Caligiuri MA, Bundschuh R, Marcucci G. Genome-wide methylation profiling in decitabinetreated patients with acute myeloid leukemia. Blood Jul Trimarchi MP, Mouangsavanh M, Huang TH., Cancer epigenetics: a perspective on the role of DNA methylation in acquired endocrine resistance. Chin J Cancer Nov;30(11): Cottrell, C. E., Bir, N., Varga, E., Alvarez, C. E., Bouyain, S., Zernzach, R., Thrush, D. L., Evans, J., Trimarchi, M., Butter, E. M., Cunningham, D., Gastier-Foster, J. M., McBride, K. L. and Herman, G. E. Contactin 4 as an autism susceptibility locus. Autism Res Jun;4(3): vi

8 Fields of Study Major Field: Biomedical Sciences Specialization: Cancer Research Interdisciplinary Specialization: Biomedical, Clinical & Translational Sciences vii

9 Table of Contents Abstract... ii Acknowledgments... iv Vita... v List of Figures... xii List of Tables... xiv Chapter 1. Background... 1 I. Epigenetics in cancer, with a focus on DNA methylation... 1 Chromatin model... 1 DNA methylation... 1 DNA methylation in cancer COMMENT: perhaps this is a better place for you to refer to the You and Jones Cancer Cell review Histone modifications... 3 DNA methylation interaction with histone modifications... 4 DNA methylation as a biomarker and therapeutic target... 5 II. Methylome Profiling... 6 Analysis methods... 6 Methylome Profiling: The Biology viii

10 Chapter 2. Thesis Rationale and Research Objectives Chapter 3. Enrichment-based DNA methylation analysis using nextgeneration sequencing: quality control, estimating changes in global methylation and the effects of increased sequencing depth I. Introduction II. Results and Discussion Quality control exclusion criteria reduce noise in methylation signal and improve analytical power The effect of additional sequencing lanes on quality control metrics The global methylation indicator (GMI) correlates inversely with an in vitro methylated tracer sequence III. Methods Patient samples Methylated-DNA capture (MethylCap-seq) MethylCap-seq experimental quality control and exclusion criteria Standard sequence file processing and alignment Standard global methylation analysis workflow Calculation of noise in methylation signal Calculation of the Global Methylation Indicator (GMI) Assessment of methylated fragment enrichment using an in vitro methylated construct ix

11 IV. Conclusions Chapter 4. Identification of endometrial cancer methylation features using a combined methylation analysis method I. Introduction II. Results Characterizing a CpG island methylator phenotype Methylation signature construction and technical validation Methylation signature stratifies endometrioid endometrial tumors by methylation phenotype and distinguishes tumors from normal controls. 56 High methylation score is associated with mismatch repair deficiency high mutation rate, and low somatic copy number alteration Methodological validity A signature for CIMP III. Discussion IV. Conclusion V. Methods Patient samples MethylCap-seq quality control MethylCap-seq data analysis Infinium validation of methylation signature candidates Computation of methylation score using the 13 promoter CGI signature 78 x

12 In silico analysis of TCGA endometrioid endometrial tumors Replicate signature analysis Chapter 5. Summary and Discussion I. Summary II. Discussion References Appendix A. Supplementary data for Chapter Appendix A1 QC table for endometrial cancer study Appendix A2 Replicate lane correlation, endometrial QC passed vs. QC failed samples Appendix A3 QC table for ovarian study Appendix A4 QC, GMI, plasmid RPM table for AML study Appendix B. Supplementary data for Chapter Appendix B1 - Cohort characteristics Appendix B2 - Sequencing summary xi

13 List of Figures Figure 1. QC exclusion criteria reduce noise in methylation signals Figure 2. Replicate sequencing lanes for MethylCap-seq experiments correlate highly Figure 3. Additional lanes of sequencing data moderately increase saturation but greatly increase 5X CpG coverage Figure 4. Global methylation indicator scales inversely with read counts from a "spiked" in vitro methylated construct Figure 5. Endometrioid endometrial malignancies show increased methylation in promoter CGI compared to unmatched normal controls Figure 6. Loci methylated in CGI-H tumors account for many of the cancerassociated methylation gains Figure 7. Methylation of 13 promoter-associated CGI distinguishes tumors with high promoter CGI methylation (CGI-H) from those with baseline promoter CGI methylation (CGI-0) Figure 8. Methylation signature stratifies endometrioid endometrial tumors by methylation phenotype and distinguishes tumors from normal controls Figure 9. Aggregate promoter CGI methylation shows weaker stratification potential compared to the 13-region methylation signature Figure 10. Methylation of EPHX3 is associated with decreased gene expression xii

14 Figure 11. High methylation score is associated with mismatch repair deficiency, high mutation rate, and low somatic copy number alteration Figure 12. Replicate 13-region methylation signatures rank tumors similarly.69 xiii

15 List of Tables Table 1. Differentially Methylated Regions, Endometrial Tumors vs. Nonmalignant Endometrial Tissue Table 2. Term enrichment associated with hypermethylated promoter CGI (MSigDB Perturbation) Table 3. Promoter CGI that distinguish CGI-H from CGI-0 tumors as measured by MethylCap-seq Table 4. Promoter CGI included in the final signature after validation by Infinium Table 5. Promoter CGI discarded from the final signature after validation by Infinium Table 6. Correlation between promoter CGI methylation and gene expression in TCGA tumors xiv

16 Chapter 1. Background I. Epigenetics in cancer, with a focus on DNA methylation Chromatin model DNA in cells does not exist freely, but associates with proteins to form a complex termed chromatin. According to the beads on a string model, DNA (the string) in cells is wound around nucleosomes (the beads), which in turn are composed of protein histones. The four core histone subunits (H2A, H2B, H3, H4) form heterodimers that complex together as an octamer to form the nucleosome core. The nucleosomes in turn are connected to a scaffolding by linker histones (H1, H5), contributing to higher order chromatin structure. Changes in patterns of DNA methylation and histone modification alter the conformation of chromatin and accessibility of the DNA to transcriptional machinery, either by directly introducing steric hindrance or indirectly causing surrounding chromatin to adopt an open or closed conformation. In addition, epigenetic modifications may cause individual nucleosomes to shift laterally across the DNA, exposing some areas of DNA for transcription and covering others [1]. DNA methylation DNA methylation is a covalent modification that occurs at cytosine nucleotides, in particular cytosines that precede a guanine (CpGs) [2]. The 1

17 process is catalyzed by DNA methyltransferases (DNMTs), which transfer the methyl group from S-adenosylmethionine to the target cytosine. Two families of DNMTs have been identified: DNMT1, which predominantly functions in maintenance of DNA methylation during DNA replication, and DNMT3, which is thought to be primarily responsible for de novo CpG methylation [3]. CpGs are strikingly rare in the genome compared to what would be expected from probabilistic estimates, and outside of transcribed regions CpGs are generally methylated. Areas of high CpG content, termed CpG islands, are found in approximately 40% of mammalian promoters, and unlike CpGs in the rest of the genome are usually unmethylated [4]. The methylation state of CpG islands in promoters can be an important factor controlling gene expression, with heavy methylation blocking gene transcription and sparse methylation permitting it. In addition, evidence suggests that heavy methylation throughout a region of chromatin can mediate long-range silencing that extends even to adjacent unmethylated genes [5]. DNA methylation in cancer COMMENT: perhaps this is a better place for you to refer to the You and Jones Cancer Cell review. Precisely controlled DNA methylation is important for imprinting (allelespecific expression of some genes) and cell differentiation in normal cells. In cancer cells, aberrant patterns of DNA methylation are frequently observed. In general, cancer cells feature global hypomethylation and focal promoter hypermethylation. Global hypomethylation has been tied to genomic instability, loss of imprinting, and activation of oncogenes [4,6]. Promoter hypermethylation, namely within CpG islands, leads to epigenetic silencing of 2

18 the target gene. In contrast to the rest of the genome, tumor suppressor promoters are frequently hypermethylated in tumors, suggesting that epigenetic silencing of tumor suppressors can be functionally equivalent to loss-of-function mutations [7]. DNA methylation perturbation has also been linked to resistance to therapeutics [8-10]. While epigenetic changes and genetic mutations may be mutually sufficient in some cases, mutations can also perturb the epigenome, and epigenetic changes can lead to increased mutation rate [11]. Histone modifications Histones also represent a target for epigenetic regulation, and H2A, H2B, H3, and H4 are all known to be post-translationally modified. These modifications generally occur on the N-terminal tails which protrude from the nucleosome, though H2A, H2B, and H3 are also known to be modified in the core globular domains. A wide range of modifications have been observed, including methylation, acetylation, phosphorylation, ubiquitination, SUMOylation, and ADP-ribosylation, with methylation (lysine and arginine residues) and acetylation (lysine residues) being the most studied [12]. The potentially limitless combinations of modifications and the clear role of histone modification in epigenetic regulation have inspired the concept of a histone code, a pattern of modifications that directs cellular development, selectively filtering the more expansive genetic code [13]. While the concept of a histone code remains controversial, several general themes have emerged. Histone acetylation is generally associated with opening of the chromatin and an activating effect on gene expression, with deacetylation having opposite 3

19 effects and being responsible for gene silencing. Histone methylation can be either activating or repressive, depending on the residue, the number of methyl groups on the residue (ranging from 0 to 3), and the presence of other histone modifications. While the effect of histone acetylation could potentially be explained by electrostatic interactions (as acetylation decreases the affinity of histones for DNA and thus might be hypothesized to push the chromatin apart), the variable effects of histone methylation suggests other enzyme intermediates. Indeed, the enzymes responsible for modifying histones (e.g., histone acetylases, deacetylases, and methyltransferases) are themselves often found complexed with other proteins (e.g., the Polycomb repressor complex (PRC)) which may help mediate downstream effects [14]. DNA methylation interaction with histone modifications Changes in patterns of histone modifications are associated with promoter CpG island DNA hypermethylation, which in turn is associated with the epigenetic silencing of tumor suppressor genes commonly found in various cancers. Recent evidence shows that histone methylation, depending on the residues involved, may either recruit (H3K27, H3K9) or prevent recruitment (H3K4) of DNMT3. Recruitment of DNMT3 subsequently results in DNA methylation and transcriptional downregulation of the target gene, providing a link between the two pathways of epigenetic control. This model would explain why CpG island hypermethylation in tumors is often correlated with increased H3K27 and H3K9 methylation, and decreased H3K4 methylation. Another model suggests that repressive histone modifications and DNA methylation may be mutually sufficient in some instances, such that 4

20 the presence of one can compensate for the other. For example, Polycomb group proteins (e.g., EZH2) associated with repressive histone modifications can recruit DNMT3, leading to hypermethylation of the DNA, loss of the repressive histone modifications, and maintenance of silencing. In addition, the enzymes that mediate and maintain epigenetic control (e.g., Polycomb group proteins) are often dysregulated in cancer cells, showing a mechanism by which cancer cells can manipulate gene expression on a more global scale [15]. DNA methylation as a biomarker and therapeutic target The study of cancer epigenetics has great potential to further the diagnosis and treatment of cancer. Promoter DNA hypermethylation is common in tumors, and could potentially serve as an early diagnostic marker. In prostate cancer, the gene GSTP1 is found to be methylated in 90% of clinical malignancies, but not benign hyperplasias, and can be detected in a variety of bodily fluids, including blood plasma [16]. Discovery of similar genes for breast cancer could potentially lead to more effective early detection strategies. In addition, DNA methylome profiles and histone modification maps could potentially serve as prognostic tools once a tumor is identified. As expression levels of certain genes is correlated with response to treatment, and epigenetic factors exert a powerful and durable effect on gene expression, such whole genome methods have the potential to become powerful tools for personalized treatment. Elucidation of epigenetic mechanisms may lead to the development of new treatments that specifically target epigenetic abnormalities or 5

21 vulnerabilities in cancer cells. Examples of two such classes of drugs include DNA demethylating agents and histone deacetylase inhibitors, which have shown promise in the treatment of leukemia and T-cell lymphoma, respectively [4]. Broadly demethylating or acetylating the genome may have adverse consequences, as use of such agents has been shown to promote metastasis and resistance to other therapeutics in some contexts [17,18]. In addition, promoter methylation of genes like MGMT may make some tumors more susceptible to specific therapies, and demethylating these regions could be counterproductive [19]. Greater understanding of epigenetic pathways should enable development of more targeted therapeutics with fewer offtarget effects. II. Methylome Profiling Affordable genome-wide approaches for assessing methylation have opened the way for methylome profiling of human cohorts. As sequencing costs continue to plummet, the limiting factor in genome-wide studies will no longer be the costs of sequencing, but the costs of analyzing the enormous datasets that result. While next-generation sequencing is becoming increasingly standardized, analysis is a complex task and an ongoing area of research. This section will summarize analytical methodology involved in methylome profiling, then discuss recent findings that are shaping methylome analysis. Analysis methods This section provides a conceptual overview of methylome analysis that begins with a discussion of methylation detection methods, and then 6

22 focuses on several topics specific to MethylCap-seq: relative vs. absolute methylation, normalization, and quantification of MethylCap-seq data. A summary of notable literature utilizing MethylCap-seq is also presented. For additional background, see two excellent reviews on methylome profiling methods and analysis [20,21]. Detection methods Because methylation marks are erased by typical amplification techniques (e.g., PCR), DNA must be specially processed prior to amplification and sequencing. One of three methods is typically used to detect DNA methylation: methylation-sensitive restriction enzyme digestion, sodium bisulfite treatment, or methylation-specific fragment capture (also referred to as affinity enrichment) [22]. Processed DNA is then incorporated into libraries for sequencing. Raw sequencing data, typically consisting of millions of short reads, is then aligned to the appropriate genome, providing a DNA methylation map of the sample. The methylation detection methods yield different types of signal, with corresponding advantages and disadvantages [21,23-25]. Sodium bisulfite treatment converts unmethylated cytosines to thymines (via a uracil intermediate), leaving only methylated cytosines (assuming complete conversion). Overlapping DNA fragments are then compared to determine the methylation percentage of particular CpGs (a measure of absolute methylation), which is comparable across the genome and between samples. However, this single-base resolution has a significant tradeoff: profiling every CpG in the human genome requires as much as a billion aligned reads, which is presently unfeasible for most clinical studies. 7

23 Instead, reads are typically focused in regions of interest, such as by selecting for CpG dense regions (e.g., reduced representation bisulfite sequencing). Thus breadth of genomic coverage is sacrificed for single-base resolution. Affinity enrichment-based methods (e.g., MethylCap-seq) capture methylated fragments from the sample, with the resolution determined by the fragment size (typically ~ bp in our studies). Because the fragments were captured, we know that they are methylated relative to the rest of the sample. However, absence of methylation must be indirectly inferred from absence of signal, which can be problematic when coverage depth is insufficient (in which case absence of signal is not equivalent to absence of methylation). Signal must be normalized to account for variable sequencing lane yield; this is accomplished by dividing aligned read counts in a region by the total aligned read count. The result is a relative methylation profile of a sample. These relative methylation profiles can be compared between samples (if the assumption of comparable overall methylation between samples is valid this assumption is usually not tested directly), and across the genome within the sample if further normalization for CpG density is performed. Confusing the matter, multiple published studies claim to convert relative methylation to absolute methylation values using CpG normalization [23,26]; while this may produce results that correlate on average with bisulfite-based methods, it is doubtful that such methods could account for systematic differences in global methylation between samples or groups of samples. In theory, an absolute methylation profile could be reconstructed by pairing sequencing data with a scaling factor, a measure of the overall methylation in a sample relative to 8

24 other samples. Enrichment-based methods, despite their lower resolution vs. bisulfite-based methods, can achieve much greater coverage of CpGs in the genome with the same sequencing depth [23]. Normalization and quantification of MethylCap-seq signal Normalization and quantification of methylation signal can be envisioned as a 5-step process. Fragment estimation and sequencing yield normalization are essential elements of the process. Count aggregation and exclusion of duplicate reads are recommended, while CpG density normalization is optional. Fragment estimation involves extending the short reads from the sequencer to reconstruct the original captured fragments. Count aggregation aggregates the reads in genomic bins of set size, which smoothes the data and condenses it. As methylation of adjacent CpGs is highly correlated and resolution of enrichment-based methods is inherently limited, count aggregation is a simple way to package the data with minimal data loss. Sequencing yield normalization normalizes signal to account for arbitrary sequencing lane yield; this is accomplished by dividing aligned read counts in a region by the total aligned read count. Duplicate exclusion attempts to correct for PCR artifacts introduced by genome amplification, where certain fragments are replicated more frequently relative to other fragments, sometimes by orders of magnitude. With the assumption that sequencing depth is low compared to the diversity of fragments used to generate the sequencing library, in the case that multiple reads share the same exact sequence all but one are assumed to be PCR artifacts, and thus these duplicate reads are excluded from analysis. Duplicate exclusion can 9

25 potentially result in analysis artifacts, as regions with higher CpG density or higher methylation may be more prone to random duplicates representing identical fragments from different cells (as opposed to PCR artifacts). Tools for MethylCap-seq data processing Two popular tools for processing of enrichment-based data are Batman [27] and MEDIPS [26]. After working briefly with MEDIPS, we developed our own custom workflow for enrichment-based sequencing data to enable greater flexibility in our analyses [28]. Batman was developed to address the poor agreement between enrichment-based methods and bisulfite-based methods especially problematic since bisulfite-based methods are often used to validate the findings of enrichment-based methods. The authors use a Bayesian deconvolution strategy to account for differences in CpG density that cause genomic regions with high CpG density (and thus more opportunities for methylation) to be preferentially enriched, a factor that does not affect bisulfite-based methods. Fragment estimation assumes a 500bp fragment size, and count aggregation uses 100bp bins. The tool is implemented in Java. While the tool greatly improves the agreement between enrichmentbased and bisulfite-based data, the Bayesian deconvolution model is computationally intensive. In addition, the tool is optimized for use for use with microarray data, and sequencing data requires substantial preprocessing before it can be input into Batman. MEDIPS similarly applies CpG density normalization to improve agreement with bisulfite-based methods. The authors use coupling factors 10

26 rather than a Bayesian deconvolution model, simplifying calculations and improving computation speed vs. Batman [26]. Fragment estimation and count aggregation bin sizes are customizable; default bin size is 50bp. MEDIPS is implemented in R, and use requires some familiarity with R. MEDIPS includes a methylprofiling function to compare two samples for differentially methylated regions (DMRs). Unfortunately, in most scenarios researchers wish to compare two or more groups of samples, and thus the feature has limited usefulness. While MEDIPS is functional, the tool is still slow and RAM intensive, and we found it difficult to use for larger studies involving many samples. The workflow used for this project skips the CpG density normalization step in favor of simplicity and transparency. Fragment estimation and count aggregation is customizable; I used 150bp fragment size and 500bp bins for our dataset of endometrial tumors profiled using MethylCap-seq. Implemented in a combination of bash and C++, the workflow is optimized for parallel processing in a supercomputing environment, and outputs a binary count file that is used for downstream analysis. Other published methods Harris et. al compared enrichment-based and bisulfite-based methylome profiling methods [24]. For enrichment-based methods, fragment estimation assumed a 150bp fragment size, duplicate reads were excluded, counts were aggregated in 200bp or 1000bp windows, and CpG density normalization was performed by iteratively comparing CpG contribution per estimated fragment with the regional fragment distribution. For comparison 11

27 between methods, methylation was binarized by setting tag density or betavalue thresholds (0.2 for the bisulfite-based methods). To avoid overlap during region classification, regions were assigned to UCSC features in a set priority order (promoter > exon > UTR > intron > intergenic). Bock et. al reported a method for integrated analysis of enrichmentbased data alongside bisulfite-based data [23]. For enrichment-based methods, duplicate reads were excluded, counts were not aggregated, and linear regression models were used to normalize for CpG density. They observed that combining lanes of data for the same sample improved correlation with bisulfite-based data. Statistical testing for differentially methylated regions was performed using Fisher s exact test. Lan et. al applied a bi-asymmetric-laplace model to disentangle signal from neighboring CpGs and thus output a methylation score for each CpG [29]. Notably, they eschewed fragment estimation in favor of a peak-finding approach (commonly used for ChIP-seq analyses), with the basic assumption that an adequate pile of fragments surrounding a methylated site provides sufficient information to attribute signal to individual CpGs. The approach claims to be able to increase resolution beyond the natural limits of fragment size (typically ~150bp) to less than 50bp, and the same approach was applied in a subsequent research article [30]. MethylCap-seq literature Sequencing-based profiling of the methylome began in earnest around 2008, with a subset of those studies using a MethylCap-seq approach. This section 12

28 summarizes the primary literature utilizing MethylCap-seq by category: methods article, methods comparison, research article. Method articles [27,29,31,32] Methods comparison [23-25,33] Research articles [30,34-36] [37-50] Methylome Profiling: The Biology This section will focus on biological findings resulting from and influencing methylome profiling. As the success of computational investigation hinges on knowing where to look and what to look for, this section will particularly emphasize genomic regions of interest. Promoter CGI methylation Methylation within promoter CGIs is well established as a mechanism of epigenetic silencing, and is usually the first genomic feature examined in any methylome profiling study. Many array-based methods, including early variants of the Infinium HumanMethylation bead array, only interrogate CpGs within promoter CGI. CGI are rarely methylated in the genome, even more so if they lie within promoters. Methylation of promoter CGI is thought to mediate silencing by suppressing the binding of transcription factors necessary for transcription initiation, most likely by recruiting suppressive factors, typically complexed with methyl-cpg-binding proteins, that provide a direct steric 13

29 hindrance or maintain the DNA in a closed conformation that is inaccessible to transcription factor complexes. CGI shore methylation Andrew Feinberg s group proposed that methylation changes associated with changes in gene expression tended to occur not in CGIs or promoters, but in flanking regions. This is consistent with so-called methylation spread theory [51], which postulates that methylation changes begin at foci adjacent to a gene promoter and gradually spread into the promoter, with transcriptional potential dropping off early in this process. Using the CHARM assay, which pairs methylation-sensitive restriction enzyme digestion with a tiling array, they showed that CGI shores (defined as flanking regions bp distant from a CGI) were the primary hotspots for epigenetic change in both colon cancer and hematopoietic progenitor differentiation [52,53]. They further showed that cancer-associated methylation accumulated in shores associated with tissue-specific methylation, also consistent with methylation spread theory. The role of CpG shores has since been independently confirmed in other contexts, including induced pluripotent stem cells [54], acute myeloid leukemia [55], and bladder cancer [56]. First exon methylation Early methylome profiling showed that first exons were frequently unmethylated together with CGI and promoters, suggesting a possible role in gene regulation [57]. This potential was confirmed in a recent study by Brenet et. al, which compared first exon and promoter methylation and showed that 14

30 first exon methylation was more tightly coupled to expression in an acute myeloid leukemia cell line [37]. Gene body methylation While first exons are typically unmethylated, gene bodies are frequently methylated [57,58]. Multiple studies have reported a direct correlation between gene body methylation and gene expression [41,59,60]. Particularly intriguing is that exon/intron borders often show sharp transitions in methylation, with exons heavily methylated and introns less methylated. While the role of gene body methylation remains unclear, the distinct pattern of methylation in exons and introns suggests that methylation in tandem with nucleosome positioning may play a role in alternative gene splicing. Intragenic CGI methylation While the role of promoter CGI methylation in gene silencing is well established, the role of intragenic CGI methylation is less clear. Using a method similar to MethylCap-seq, Adrian Bird s group found that intragenic CGI were frequently differentially methylated compared to promoter CGI and promoter CGI shores in differentiated mouse hematopoietic cells [42]. Unexpectedly, intragenic CGI methylation was associated with gene silencing. It is not clear whether this finding translates to human cells and other contexts; a previous study by their group found that intragenic CGI were differentially methylated with similar frequency in mouse and human cells (embryonic stem cells vs. whole blood), although intragenic CGI in colon cancer cells were not differentially methylated compared to normal tissue [49]. Partially Methylated Domains 15

31 Lister et. al introduced the concept of partially methylated domains (PMDs): large (median size ~150kB) genomic regions that showed reduced methylation and expression compared to surrounding regions [61]. Curiously, these PMDs were seen in differentiated cell lines but not embryonic stem cell lines. Reduced methylation seen in PMDs was restored when the cells were reprogrammed via induced pluripotency, suggesting a role for PMDs in cell differentiation [62]. Laird s group subsequently uncovered a unique pattern of aberrant methylation in colon tumors: focal CGI methylation occurring within extended regions of reduced methylation compared to surrounding regions [63]. These regions of reduced methylation overlapped significantly with Lister s PMDs and previously reported multi-gene domains of long range epigenetic silencing (LRES) in prostate cancer [5]. Significantly, methylation gains in tumors vs. normal tissue were more likely to occur in PMDs, and constitutive methylation was less likely to occur in PMDs. PMDs were seen in colon tumors and immortalized somatic cell lines but not in adjacent normal tissue, however only the colon tumors showed focal CGI hypermethylation within the PMDs. PMDs corresponded to nuclear-lamina-associated domains (LADs) seen in a fibroblast cell line, suggesting a general biological concept behind this striking methylation phenotype. Feinberg's group uncovered large blocks of contiguous hypomethylation in multiple colon tumors relative to normal tissue [64]. Notably, methylation loss was predominantly in regions of high methylation (~75%) in normal tissue, to intermediate levels in tumors (~55%). These 16

32 blocks overlapped highly with Lister's PMDs. Most genes in the blocks were silenced in both tumors and normal tissue; however genes from the blocks that were expressed in normal tissue were at increased risk for silencing in tumors. The reverse was also true; block genes silenced in normal tissue were often reexpressed in the tumors, with considerable variation between tumors. Taken together, one could speculate that PMDs represent potential hotspots for epigenetic change: zones where tumor suppressors are particularly vulnerable to silencing and oncogenes can be reactivated. The publications listed here all used whole genome bisulfite sequencing; it remains to be seen whether the findings can be recapitulated in large cohorts using less costly methods. Methylation and transcription factor binding sites While regulation of gene expression via promoter methylation is generally thought to be mediated through its effects on transcription factor binding, the role of DNA methylation at distal enhancers is not well characterized. Using whole genome bisulfite sequencing data from a mouse embryonic stem cell and a neural progenitor cell line, Stadler et. al characterized short stretches of low-intermediate methylation (10-50%) which they termed LMRs [65]. These LMRs tended to be distal from promoters and bore chromatin marks of enhancers. Furthermore, the authors demonstrated that the characteristic methylation state of LMRs was established by transcription factor binding. 17

33 More broadly, DNase I hypersensitivity, chromatin marks (e.g., H3K4me1 and H3K27ac), and binding by the histone acetyltransferase p300 are known to be associated with enhancers [66]. An ENCODE track provides compiled transcription factor binding sites and DNase I hypersensitivity regions from aggregate cell lines. The degree of context dependency of these data, and thus their applicability to studies in other tissues and malignancies, remains unclear. When examining profound epigenetic alterations such as those expected in cancer, comparisons with enhancer and transcription factor data from the same samples or same tissues would likely yield more robust results. Exon methylation and regulation of alternative splicing While several genomic profiling studies revealed sharp methylation changes defining exon and intron boundaries (see the discussion of gene body methylation), the effects of such methylation and a mechanism for these effects remained elusive. A recent study by Shukla et. al revealed a role for exon methylation in alterative splicing [67]. Exon methylation inhibited CTCF binding in human B-cell lines, decreasing incorporation of the exons into RNA transcripts. Exon incorporation in turn was mediated by RNA polymerase pausing during transcript elongation. In addition to DNA methylation, alternative splicing may be marked by other epigenetic factors including H3K36me3 [68]. 18

34 Chapter 2. Thesis Rationale and Research Objectives DNA methylation is a stable epigenetic mark often perturbed during carcinogenesis [4]. Growing evidence demonstrates a role for DNA methylation both as a regulator of gene expression and as a potential biomarker [69,70]. Widespread accumulation of methylation in regulatory elements in specific cancer types, termed the CpG island methylator phenotype (CIMP), may even direct carcinogenesis [71,72]. Few studies, however, have examined the CpG island methylator phenotype genome-wide in a clinical cohort, as until recently analyzing methylation genome-wide in a large number of samples was cost-prohibitive. As a result, the methodology for analyzing these large datasets remains poorly developed. The Infinium HumanMethylation beadchip is a popular tool for sampling methylation at many loci across the genome [73], but the suitability of this tool for defining CIMP de novo in an uncharacterized cancer type (as opposed to verifying CIMP) has not been thoroughly evaluated. Studies show general but not perfect agreement between Infinium and other methylome profiling methods in tumors and normal tissue [23,25], but agreement has not yet been evaluated in the context of the profound methylation dysregulation associated with CIMP. A clear need exists for a study that validated Infinium-based CIMP discovery using an alternative methylome profiling method. 19

35 To address this problem, I analyzed methylation in endometrioid endometrial cancers and unmatched normal endometria [74]. These data were generated using MethylCap-seq. Initial analyses suggested deficiencies in library preparation in some samples that could compromise the validity of methylation calls, leading me to develop a quality control module to identify and control for these issues. I also analyzed the Infinium methylation data from an endometrioid endometrial cohort from The Cancer Genome Atlas Consortium, which later published a cluster analysis of the Infinium data [75]. I set out to identify CIMP using endometrioid endometrial cancer as the model cancer and MethylCap-seq as the methylome-profiling method The goals of this project were: 1) develop and validate a MethylCap-seq quality control module to identify and exclude samples with spurious methylation data; 2) develop a method to identify putative CIMP tumors using the MethylCap-seq platform; 3) describe the differences between putative CIMP tumors using the Infinium platform; 4) compare agreement in CIMP classification between my method based on MethylCap-seq and the published clustering method based on Infinium; and 5) demonstrate a signature that could classify CIMP tumors prospectively without a full methylome profiling experiment. Chapter 3 addresses goal 1 as well as the general methodology developed for these analyses. Chapter 4 addresses goals 2 through 5 and presents data that were collected along the way. This thesis tests the hypothesis that a combined methylation analysis method using MethylCapseq and Infinium can be used to define CIMP de novo in endometrial cancer. 20

36 Chapter 3. Enrichment-based DNA methylation analysis using nextgeneration sequencing: quality control, estimating changes in global methylation and the effects of increased sequencing depth. I. Introduction Epigenetic mechanisms are responsible for determining and maintaining cell fate, stably differentiating various tissues in the human body. Epigenetic changes can induce chromatin remodeling, leading to lasting effects on gene expression. Epigenetic aberrations, including perturbed DNA methylation, have been implicated in a diverse array of cancer-associated pathways, including silencing of tumor suppressors [4], activation of oncogenes [6], promotion of metastasis [17] and resistance to therapeutic drugs [8-10]. DNA methylation is thought to be a secondary or tertiary effector in the epigenetic cascade [76], with its accumulation in promoter CpG islands leading to durable gene silencing [58]. Due to its biological and experimental stability as an epigenetic marker, DNA methylation is a potential biomarker for disease, especially cancer [70]. While methylation marks are erased by standard PCR, other methods preserve this information, including affinity-based capture of methylated regions and bisulfite conversion [21]. Low-cost genome sequencing is revolutionizing the field of DNA methylation analysis. New techniques are enabling deep methylome profiling of biological samples on an unprecedented scale [21]. These sequencing projects can yield several GB of data per sample, presenting significant 21

37 bioinformatic challenges. While standardized pipelines have been developed for the most common elements of analysis such as sequencing data processing and alignment, methylome analysis often relies on custom methods specific to the methylation assay. To address quality control concerns and to determine required sequencing depth, analytic methods must not only be assay-specific but must also be tailored to the specific experiment [20,24]. MethylCap-seq is a method for genome-wide profiling of DNA methylation that relies on enrichment of methylated DNA fragments using methyl-cpg binding domain protein 2 (MBD2) [77]. These fragments are then shotgun-sequenced, yielding a map of DNA methylation across the genome. Advantages of MethylCap-seq compared to other platforms include costeffective profiling of CpG dense regions and true whole genome coverage [25]. Optimization of MethylCap-seq data collection and analysis requires attention to data quality and saturation. Proper quality control ensures that methylation calls are not impacted by inconsistencies in methylation enrichment. Greater saturation increases statistical confidence in methylation calls, especially in sparsely methylated regions [78]. A challenge currently facing enrichment-based methods that rely on sequencing, such as MethylCap-seq, is estimating changes in global methylation. Information on absolute methylation levels is erased during sequencing normalization, which is necessary to control for day-to-day 22

38 variations in sequencing output. Thus, this information must be recaptured using other computational methods or assays. In this study, we provide evidence-based guidelines for quality control, illustrate the effects of additional reads on quality control metrics and provide empirical support for a global methylation indicator, an analytic tool that correlates with overall methylation levels. We hope these findings will help other groups standardize and optimize their MethylCap-seq experiments to take advantage of this promising methylome profiling method. II. Results and Discussion Quality control exclusion criteria reduce noise in methylation signal and improve analytical power. Our automated quality control (QC) module, based on MEDIPS [26], was implemented to identify technical problems in the sequencing data and flag potentially spurious samples. One goal of the QC module was to provide rapid feedback to investigators regarding dataset quality, facilitating protocol optimization prior to committing resources to a larger scale sequencing project. A second goal was to identify samples that should be excluded from analyses due to data validity concerns. The validity of a MethylCap-seq experiment is dependent on enrichment of methylated fragments prior to sequencing. A failure in enrichment invalidates any downstream data, and therefore identifying such failures is vital. Also important is verifying the statistical reproducibility of the data for each sample. Generating replicate sequencing lanes for each sample to assess experimental reproducibility 23

39 empirically is often not cost effective, and thus addressing this issue computationally is desirable. Similarly, the confidence in methylation calls is related to the breadth and strength of signal at the CpGs in the genome. We assessed enrichment of methylated fragments using the CpG enrichment parameter, which compares the frequency of CpGs in the sequenced sample with the frequency of CpGs in a reference genome (hg18 for this study). Statistical reproducibility was assessed by calculation of saturation, the Pearson correlation of two random partitions of the sequenced sample [26]. Breadth and strength of methylation signal was assessed using 5X CpG coverage, representing the fraction of CpG loci with five or more reads in the sample compared to the total number of CpGs in the reference genome. These QC parameters were calculated for each sample using MEDIPS [26]. Appendix A1 demonstrates the results of the QC module for the Endometrial Dataset. 203 lanes of sequencing data were generated for 101 unique samples. 43 lanes failed QC, representing 21 unique samples. To assess how lanes that pass QC might differ from lanes that failed QC, we computed the noise in methylation signal, representing percentage of uniquely aligned extended reads falling in 500bp bins without CpG dinucleotides (Figure 1). Median noise in samples that failed QC (6.40%) was more than three-fold greater than in samples that did not fail QC (2.04%, p<0.001), and closely resembled noise in input (7.82%). Excluding "QC-failed" lanes did not significantly decrease median noise levels (2.04 vs. 2.22, p=0.08) but did greatly decrease the variation in noise levels between samples. As the distribution of noise levels is positively skewed and not normal, a small 24

40 number of outliers would not be expected to significantly shift the median noise level. To investigate whether the additional noise seen in "QC-failed" samples impacted sequencing reproducibility, we computed the Pearson correlation between replicate lanes of samples that passed QC (n = 68) vs. those that failed QC (n = 9) (Appendix A2). Replicates of samples that passed QC correlated more highly than replicates of samples that failed QC (average r = 0.90 vs. 0.59; p<0.001). Variation in replicate correlation between samples was also noticeably less in the QC pass group (relative standard deviation = 6.7% vs. 27.1%). We surmise that failures in methylation enrichment result in a more random sampling of the fragment distribution regardless of methylation status, resulting in increased signal in regions where methylation should not be detectable. 25

41 Figure 1. QC exclusion criteria reduce noise in methylation signals. Percentages of uniquely aligned reads falling in 500bp bins containing no CpG dinucleotides pre- and post-qc analysis were plotted as a standard box plot for samples prior to QC filtering, samples that passed QC, and samples that did not pass QC in the endometrial dataset. An input from a sample that was not subjected to methylation capture is included for reference. The number of samples in each group is included above the baseline. Values for replicate lanes in each group were averaged, and samples were compared statistically using a Wilcoxon rank-sum test. Whiskers indicate 10 th and 90 th percentiles. 13.5% of 500bp bins in the genome are classified as CpGbarren. 26

42 As the goal of many methylome profiling studies is to identify differentially methylated regions (DMRs) between biological groups, we next assessed whether our QC exclusion criteria might improve our analytical power to detect DMRs. We compared DMRs between 89 endometrial tumors and 12 nonmalignant endometrial tissue samples across several genomic features. Excluding sequencing lanes that failed QC (corresponding to 19 tumor and 2 nonmalignant samples) resulted in more DMRs in every genomic feature assessed (Table 1). The greatest gains were seen in promoters and CpG shores, where the number of DMRs increased 22-fold and 2-fold, respectively. Gains in CpG islands and promoter-associated CpG islands were more modest (1.6-fold and 1.05-fold). These results trend inversely with CpG density, perhaps reflecting greater benefit from QC exclusion in regions where coverage is lower. We speculate that the improvements in DMR detection resulting from exclusion of samples that fail QC would be even greater when working with smaller sample sizes or biological groups with more similar methylation patterns. Table 1. Differentially Methylated Regions, Endometrial Tumors vs. Nonmalignant Endometrial Tissue Genomic feature CpG islands Promoter- associated CpG shores Promoters All samples Samples Passing QC only The effect of additional sequencing lanes on quality control metrics 27

43 DNA sequencing cores are frequently asked whether additional lanes of sequencing data are necessary or desirable for MethylCap-seq experiments. To address this question, we analyzed a large dataset of ovarian tumors of which 7 samples had been resequenced (using the same genomic library), for a total of 15 lanes (Appendix A3). First, the degree of correlation between the replicate lanes was analyzed to ensure that additional lanes of data would not introduce excessive variation. As shown in Figure 2, replicate lanes from sequencing the same library twice correlated highly (R 2 value of 0.98). Note that the question here was the value of additional sequencing lanes and not of additional technical or biological replicates the correlation between technical or biological replicates would be expected to be much lower than the correlation between two lanes sequencing the same library. CpG enrichment, saturation and 5X coverage were then evaluated for individual lanes and combined lanes (Figure 3). CpG enrichment varied somewhat between samples (range: ), but was extremely similar for replicate lanes (<1% percent deviation from the combined lane on average). Saturation improved modestly from a median of 0.79 to a median of As saturation values for individual lanes of MethylCap-seq data typically range from 0.6 to 0.85 for single lanes in our hands and we consider a saturation value of 0.6 acceptable for analysis, this improvement may be inconsequential although it is statistically significant. 5X coverage improved noticeably from a median of 0.21 to a median of 0.28, representing an average 38% gain. As 5X coverage represents a minimum signal level needed to reliably 28

44 differentiate a methylated locus from a locus with no methylation (or the absence of a methylation signal), we speculate that this increase could significantly increase the statistical power to detect DMRs, particularly in small or lightly methylated regions. 29

45 Figure 2. Replicate sequencing lanes for MethylCap-seq experiments correlate highly. Replicate lanes for each sample were randomly assigned to two partitions, and the average rpm of 6000 (of 6M) randomly selected 500bp bins were compared between partitions. 30

46 Figure 3. Additional lanes of sequencing data moderately increase saturation but greatly increase 5X CpG coverage. Variations in CpG enrichment (A), saturation (B), and 5X coverage (C) were assessed for 15 lanes of data in the ovarian study corresponding to 7 samples by generating plots of individual lanes and combined replicate lanes for each sample. (D) Average percent deviation of the individual lanes from the combined lane for each sample was plotted for each parameter. Error bars for (D) represent standard error. Asterisks represent Student t-test p<0.05. The global methylation indicator (GMI) correlates inversely with an in vitro methylated tracer sequence. We recently proposed a computational method to compare genomewide changes in methylation patterns between samples in a given experiment 31

47 [77]. As MethylCap-seq signal (in reads) is normalized by raw read counts to adjust for variability in lane yield, two samples with identically distributed methylation yet different absolute levels of methylation would be expected to yield identical normalized methylation signals at any given loci. The GMI method relies on the observation that in vitro methylated samples display characteristic changes in the methylation signal distribution as quantified in a MethylCap-seq experiment, and these changes are CpG-density dependent. Methylation signal shifts from low CpG content regions to high CpG content regions; this difference can be quantified by calculating the area under the curve of the average normalized methylation signal plotted across CpG density. The GMI is a potentially powerful tool for capturing differences in methylation distribution between samples. In an effort to validate the GMI as a surrogate for global methylation, we developed a complementary analysis utilizing an in vitro methylated construct. Samples from an acute myeloid leukemia (AML) study were used for this analysis [39]. This methylated construct was spiked into the genomic DNA in the AML samples prior to sonication at a defined concentration and subjected to methylated DNA enrichment along with the sample DNA. The "spike-in" was originally intended to verify successful enrichment; if enrichment occurs, PCR for the methylated plasmid would show increased copy number after enrichment. Additionally, this "spike-in" also indicates global methylation levels in a sample since the methylated plasmid competes with the natively methylated genomic DNA fragments for binding to the MBD protein. When the proportion of methylated to unmethylated genomic 32

48 fragments is high prior to enrichment, the methylated plasmid "spike-in" gets enriched relatively less, and vice versa. Indeed, we found that read counts aligned to the plasmid correlate inversely with GMI (Figure 4, Appendix A4). This result provides empirical evidence that GMI can capture changes in absolute global methylation levels for MethylCap-seq experiments. Such a metric could be useful for gauging response to treatments that are known or expected to alter the methylome. 33

49 Figure 4. Global methylation indicator scales inversely with read counts from a "spiked" in vitro methylated construct. The pires2-egfp plasmid was in vitro methylated and "spiked" at a set concentration into each of 14 samples from the decitabine study prior to sequencing. After sequencing, GMI was calculated and plotted against the inverse of the number of normalized reads aligning to the plasmid. A linear best fit was drawn through the experimental points (p = 0.036, R 2 = 0.318). III. Methods Patient samples 89 endometrioid endometrial cancer and 12 unmatched nonmalignant endometrium samples were obtained from Washington University. All studies 34

50 involving human samples were approved by the Human Studies Committee at the Washington University and at The Ohio State University. Seven ovarian cancer samples from a larger patient cohort were obtained from TriService General Hospital, Taipei, Taiwan. All studies involving human ovarian cancer samples were approved by the Institutional Review Boards of TriService General Hospital and National Defense Medical Center. Fourteen bone marrow samples from a single-center Phase II clinical trial involving patients with acute myeloid leukemia (AML) at The Ohio State University were obtained for this investigation. The study design and the results of the trial for the entire patient cohort have been reported elsewhere [79]. All studies involving these samples were approved by The Ohio State University Human Studies Committee. Methylated-DNA capture (MethylCap-seq) Enrichment of methylated DNA was performed with the Methyl Miner Kit (Invitrogen) according to the manufacturer s protocol as previously described [77]. Briefly, one microgram of sonicated DNA was incubated at room temperature on a rotator mixer in a solution containing 3.5 micrograms of MBD-Biotin Protein coupled to M-280 Streptavidin Dynabeads. Noncaptured DNA was removed by collecting beads with bound methylated DNA on a magnetic stand and washing three times with Bind/Wash Buffer. Enriched, methylated DNA was eluted from the bead complex with 1M NaCl and purified by ethanol precipitation. Library generation and 36-bp single- 35

51 ended sequencing were performed on the Illumina Genome Analyzer IIx according to the manufacturer s standard protocol. MethylCap-seq experimental quality control and exclusion criteria The automated quality control (QC) module was implemented as previously described [77]. Pre-aligned sorted.txt files from the Illumina CASAVA 1.7 pipeline were used to reduce turnaround time for analysis. In brief, duplicate alignments were removed from the aligned sequencing file (a correction for potential PCR artifacts), and the resulting output was loaded into an R workspace. MEDIPS [26] was used to analyze CpG enrichment, saturation, and CpG coverage. Sequencing lanes were excluded using the following thresholds: CpG enrichment < 1.4, saturation < 0.5 and CpG 5x coverage < These criteria and thresholds were chosen for technical relevance and their ability to identify known technical issues without a bias for specific biological groups. Samples were excluded if any of the criteria were not met. As CpG coverage was assessed qualitatively for analysis of the endometrial dataset, five lanes of data with borderline 5x CpG coverage were not excluded that would have qualified for exclusion due to this criterion. For the DMR comparison (Table 1), methylation signal was normalized for each lane and averaged among replicate lanes for each sample. The All group includes samples with merged QC pass lanes, samples with merged QC fail lanes, and samples with merged QC pass and QC fail lanes, and thus is not a straight sum of the QC-passed and QC-failed samples. 36

52 For the reproducibility comparison (Appendix A2), Pearson r was calculated using two replicate lanes corresponding to each sample represented in the QC pass and QC fail groups. When a sample had more than two replicate lanes in a single group, two lanes were randomly chosen for the analysis. Samples lacking two replicate lanes in either the QC pass or QC fail group were excluded from this analysis. Lanes corresponding to the same sample but generated using different library preparations were also excluded. Sequencing and QC summaries corresponding to the datasets referenced in this chapter can be viewed in Appendices A2, A3, and A4. Standard sequence file processing and alignment Sequence files were processed and aligned as previously described [77]. Briefly, QSEQ files from the Illumina CASAVA1.7 pipeline were converted to FASTA format, duplicate reads removed (to control for PCR bias), and uniquely aligned with Bowtie to generate SAM files using the following options: -f -t -p 1 -n 3 l 32 -k 1 -m 1 -S -y --chunkmbs 1024 max best [80]. Duplicate alignments (reads aligning to the same genomic position) were removed using SAMtools [81]. Standard global methylation analysis workflow Aligned sequence files in SAM format were analyzed using a custom analysis workflow as previously described [77]. Briefly, aligned reads were extended to the average fragment length (as determined by BioAnalyzer fragment analysis) and counted in 500bp bins genome-wide. The resulting 37

53 count distribution was normalized against the total aligned reads by conversion to reads per million (RPM). Methylation was categorized by genomic feature as follows: CpG islands (CGI, as defined in the UCSC genome browser), promoters (2kB in length, 1kB upstream and downstream of the TSS), CGI shores (200bp to 2kb distant from both ends of each CGI), and the first exon of RefSeq genes. CGI were further subdivided by proximity to promoters (within 10kB upstream or 1kB downstream of a 2kB promoter), and 2kB promoters were subdivided by overlap with CGI. Differentially methylated regions were identified by summing RPM across the bins for each locus in the genomic feature, then performing a Wilcoxon rank sum test to assess differences in these summed RPMs between sample groups. Results were then adjusted for multiple comparisons by setting a false discovery rate (FDR) cutoff of Calculation of noise in methylation signal Noise in methylation signal, representing extended reads falling in regions without CG dinucleotides, was quantified as the summation of reads falling into bins with zero CpG content. If a sample in a given group had multiple lanes of data, noise was computed for each lane individually and averaged among replicate lanes in the group. As a single sample could have a lane that passed QC and a lane that failed QC, the number of samples in each group does not sum to the total number of samples in the study. 38

54 Calculation of the Global Methylation Indicator (GMI) To assess genome-wide changes in methylation patterns for each sample in an experiment, a custom parameter termed the global methylation indicator (GMI) was calculated as previously described [77]. Briefly, normalized read counts (in RPM) were classified by CpG density and averaged to construct a methylation distribution. The average RPM were then summed across the distribution (i.e., the estimated area under the methylation distribution curve) to yield the GMI. Assessment of methylated fragment enrichment using an in vitro methylated construct Experimental procedure The 5.3Kb plasmid vector pires2-egfp, which contains three CpG islands, was used to assess methylated fragment enrichment. The construct was linearized with Nhe I and then in vitro methylated with M.SssI. The methylated "spiked-in" DNA was quantified by the Qubit High Sensitivity Assay and diluted. Plasmid was spiked into genomic DNA at a concentration of 1.5pg plasmid / 1µg genomic DNA (~2.5 plasmid copies per cell) prior to sonication of genomic DNA for library generation. Analysis Reads mapping to the construct were identified by converting QSEQ files to FASTA format as described above, then aligning the files with Bowtie using the following options: -q -t -p 1 -n 3 -l 32 -k 1 -S --chunkmbs max --best. Duplicate reads were retained for this analysis. To control for 39

55 variation in construct aligned read counts attributable to fluctuations in lane yield, construct aligned read counts were normalized against the total raw read counts by conversion to reads per million (RPM). IV. Conclusions This study shows that post-sequencing QC metrics can exclude poor quality samples from analysis, decreasing noise in methylation signal and improving power to detect DMRs. Furthermore, we show that resequenced lanes from the same library correlate very well, and that additional lanes of data have a small impact on saturation (data reproducibility) and a large impact on 5X CpG coverage (confidence in methylation calls at a given locus). Finally, we demonstrate that our computational indicator of global methylation correlates with an unrelated method that utilizes spike-in of DNA with known methylation status. These findings verify that MethylCap-seq, with appropriate quality control, is a reliable tool that provides reproducible relative methylation information on a feature by feature basis, provides information about levels of global methylation, and can be used to analyze large patient cohorts of hundreds of patients. 40

56 Chapter 4. Identification of endometrial cancer methylation features using a combined methylation analysis method I. Introduction During carcinogenesis cells within solid tumors acquire numerous aberrations, including mutations that alter the coding sequence of genes, as well as changes in gene expression. Changes in gene expression may be mediated in many ways including: altered transcription factor levels or function, mutations in DNA binding elements, mirnas, and chromatin remodeling. Chromatin remodeling, including epigenetic modifications to histones and DNA methylation, normally plays a key role in cell differentiation, stably switching cellular pathways on/off until the cells reach a terminally differentiated state that is typically irreversible. Epigenetic aberrancies can allow cancer cells to silence tumor suppressors and re-express oncogenes, giving tumor cells an additional option besides mutation to dysregulate key pathways [82]. DNA methylation is one of the better understood mechanisms of epigenetic control. DNA methylation in humans is mediated by the DNA methyltransferases DNMT1 and DNMT3, which add a methyl group to the 5 carbon of cytosine. In differentiated cells, DNA methylation occurs in the context of cytosine followed by guanine (CpG) in the DNA sequence. DNA methylation in promoter CpG islands (CGI) has been shown to mediate stable 41

57 gene silencing. Tumor suppressor silencing via DNA methylation is found in a wide range of tumor types [69]. While methylation marks are erased by standard PCR, several methods have been developed to preserve this information, including affinity-based capture of methylated regions and bisulfite conversion [21]. DNA methylation is increasingly being recognized as a potential biomarker [70]. The CpG island methylator phenotype (CIMP) is a cancer-specific accumulation of DNA methylation in the CpG islands of some tumors. Originally identified more than 15 years ago in colorectal cancer [83], CIMP has since been identified in multiple cancer types: glioma [84], breast cancer [85], acute myeloid leukemia [86], gastric cancer [87], clear cell renal cell carcinoma [88], oral squamous cell carcinoma [89], and hindbrain ependyomas [90]. CIMP may occur early in tumorigenesis: CIMP can be detected in colorectal serrated adenomas prior to malignant progression and the development of microsatellite instability [91]. Early diagnosis presents an opportunity for early intervention and improved outcomes, including lower treatment morbidity and lower recurrence rate. CIMP classification may have prognostic value: CIMP is associated with good prognosis in some cancer types (e.g., colorectal, breast) and poor prognosis in others (e.g., renal cell carcinoma) [72]. Accurate differentiation of relatively benign and aggressive cancer subtypes allows benign subtypes to be treated less aggressively and aggressive subtypes to be treated more aggressively. CIMP could also represent a therapeutic target for demethylating therapies [92]. Despite its 42

58 potential diagnostic, prognostic and therapeutic value, CIMP and its manifestations in different cancer types remain poorly understood. Defining CIMP in a new cancer type requires extensive methylome profiling. However, methylome profiling studies examining CIMP genomewide in endometrial cancer cohorts remain sparse. Several methods have emerged to profile the methylome, including the Infinium beadchip [73] and affinity-based methylation capture followed by shotgun sequencing (e.g., MethylCap-seq [77]). The Infinium beadchip is frequently used in clinical studies for its cost-effectiveness, scalability for large cohorts, high accuracy, and user-friendly analysis pipeline. The method relies on hybridization of bisulfite-converted DNA to the beadchip, followed by single-base extension. The end result is a readout of percent methylation for individual CpGs, with ~7 CpGs assessed per promoter CGI using the HumanMethylation 450 kit, or roughly 8% of the CpGs in promoter CGI. Methylation of nearby CpGs is not measured, but is assumed to be similar. MethylCap-seq is one of several affinity-based capture methods that leverage shotgun sequencing to assess methylation patterns. MethylCap-seq uses the MBD2 protein to capture methylated fragments, which are then sequenced to yield piles of methylation tags across the genome. By comparing tag frequency between samples, relative methylation levels can be inferred for a given region. As sequencing costs continue to fall, MethylCap-seq and similar methods will become increasingly cost-effective. For analysis of promoter CGI, MethylCap-seq has a particular advantage over Infinium: average methylation over the regions is measured, rather than assumed. 43

59 In this study, we developed a 13-region signature that stratifies endometrioid endometrial tumors by CpG island methylation status and distinguishes tumors from both normal control and adjacent normal tissue. This signature is based on a training set of MethylCap-seq data and validated for use on Infinium datasets. We also demonstrate a general method for identifying methylator phenotypes based on total promoter CGI methylation. This signature could prove useful for detecting and classifying endometrioid endometrial carcinomas, as well as catalyzing research into the role of dysregulated methylation in this disease. II. Results Characterizing a CpG island methylator phenotype Methylome data were analyzed from a previously published MethylCap-seq dataset of 76 endometrioid endometrial primary carcinomas and 12 non-matched normal control samples (hereafter referred to as the discovery set) [93]. To assess the extent of CpG island (CGI) methylation in tumors relative to normal controls, we compared overall methylation in CGIs as well as in other genomic features known to acquire methylation during carcinogenesis [37,52] (Figure 5A). Tumors overall showed a nearly 2-fold increase in methylation of promoter CGI, with reduced gains in CGI shores. These increases correlated with methylation gains in promoters containing CGI (but not promoters without CGI), as well as gains in first exons. To examine the variability of tumor methylation gains between different tumors, promoter CGI methylation was plotted for each tumor and compared to 44

60 normal controls (Figure 5B). Tumors displayed a spectrum of methylation gains ranging from slightly below the levels of normal controls to 5-fold higher. 45

61 Defining the CIMP Training Set Fold Change RPM A * * * * * * Normal Malignant B Promoter CGI methylation (RPM) Normal n=12 Malignant n=76 CGI-H (n=5) CGI-0 (n=8) Normal Malignant Figure 5. Endometrioid endometrial malignancies show increased methylation in promoter CGI compared to unmatched normal controls. (A) MethylCap-seq normalized signal in the discovery set was compared between malignancies (n=76) and normals (n=12) and plotted across autosomes for several genomic features: CpG islands (CGI), promoters (1kB upstream and downstream of the TSS), CGI shores (200bp to 2kb distant from both ends of each CGI), and the first exon of RefSeq genes. CGI were further subdivided by proximity to promoters (within 10kB upstream or 1kB downstream of a 2kB promoter), and 2kB promoters were subdivided by overlap with CGI. Bars denote mean 46

62 fold change relative to normal controls; error bars depict 25th and 75th percentiles. Asterisks denote Bonferroni-adjusted Wilcoxon rank sum test p<0.05. (B) Most endometrioid endometrial malignancies show increased promoter CGI methylation compared to normal controls, with a subset of highly methylated tumors (CGI-H) showing over three-fold more methylation compared to tumors with similar methylation to normal controls (CGI- 0). To identify where methylation changes occur in the most methylated of tumors, thresholds were drawn at the upper and lower extremes of the tumor methylation spectrum (dotted lines). To rule out the contribution of an enrichment bias, which would produce differences in methylation signal based on the efficiency of methylated fragment enrichment rather than biological methylation, we compared mitochondrial methylation amongst the 76 tumors as a negative control. Mitochondrial and nuclear methylation utilize different cellular machinery and occur in different cellular compartments [94,95], thus a positive correlation between mitochondrial methylation and CGI methylation would indicate a potential enrichment bias. No significant correlation was seen between mitochondrial methylation and promoter CGI methylation (Spearman r=-0.15, p=0.2). To define the changes in methylation underlying a putative CpG island methylator phenotype, we examined regional methylation differences between tumors with the highest (CGI-H) and lowest (CGI-0) levels of promoter CpG 47

63 island methylation (Figure 5B, dashed lines). Consistent with the premise of a CpG island methylator phenotype, methylation differences were almost exclusively unidirectional (4672 hypermethylated vs. 17 hypomethylated), and encompassed a large fraction of all promoter CGI (29%) (Figure 6B). To focus on methylation changes that might occur during carcinogenesis, the identified regions were further overlapped with methylation differences observed in tumors vs. normal samples (Figure 6A) promoter CGI were shared between the two comparisons (Figure 6C). 49% of CGI-H v. CGI-0 hypermethylated loci were also hypermethylated in tumors compared to normal tissue, which was twice the expected rate of 23% (Chi squared p- value < 0.001), suggesting that many of these loci represent hotspots in the genome that are particularly vulnerable to aberrant methylation in endometrial tissue. Pathway analysis of these shared promoter CGI showed enrichment for known targets of epigenetic regulation, including targets of the Polycomb Repressor Complex and regions known to be methylated in other cancers (Table 2). 48

64 A Comparison of methylation differences between tumors and normals B Comparison of methylation differences between hypermethylator and nonhypermethylator tumors C Comparison of regions methylated in hypermethylator tumors and frequently methylated in endometrioid endometrial cancer Hyper Hypo Unchanged Hyper Hypo Unchanged Malignant (n=76) v. Normal (n=12) CGI-H (n=5) v. CGI-0 (n=8) Malignant v. Normal CGI-H v. CGI-0 Figure 6. Loci methylated in CGI-H tumors account for many of the cancer-associated methylation gains. (A) Endometrioid endometrial tumors show increased methylation of over 20% of promoter CGI compared to unmatched normal controls. Differentially methylated promoter CGI were quantified in the discovery set. Depicted is the proportion of loci that were hypermethylated (Hyper), hypomethylated (Hypo), or unchanged in malignancies relative to normals. (B) CGI-H tumors gain methylation at nearly 30% of promoter CGI compared to tumors with methylation similar to normal controls (CGI- 0). (C) Differentially methylated regions between Malignant v. Normal and CGI-H v. CGI-0 show considerable overlap. Hypermethylated promoter CGI from (A) and (B) were intersected to identify cancerspecific hypermethylated regions. 49

65 Table 2. Term enrichment associated with hypermethylated promoter CGI (MSigDB Perturbation) Hypermethylated promoter CGI, Overlap comparison Malignant v. Normal CGI-H v. CGI-0 Term Name Hyper FDR Q-Val Hyper Fold Enrichment Hyper Foreground Region Hits Hyper Total Regions Genes identified as targets of the Polycomb protein SUZ12 Genes possessing promoter trimethylated H3K27 (H3K27me3) 2.76 x x Polycomb Repression Complex 2 (PRC) targets 2.77 x Genes identified as targets of the Polycomb protein EED Genes with hypermethylated DNA in lung cancer samples 4.18 x x Genes de novo DNA methylated in cancer Methylation signature construction and technical validation To identify a methylation signature that could be translated to other platforms (including Infinium and assays of individual regions), 16 candidates 50

66 were selected from the 2269 promoter CGI from Figure 6C (Table 3). The 2269 promoter CGI were sorted by Kruskal-Wallis p-value and fold difference for the CGI-H vs. CGI-0 comparison; potential candidates were excluded that overlapped regions associated with copy number amplification in TCGA endometrioid samples (8 of the top 50). In addition, a threshold Student p- value of p<0.01 (not corrected for multiple comparisons) was imposed to exclude candidates with large fold differences that appeared to be driven by outlier samples (5 of the top 21). Methylation of 15/16 candidates robustly distinguished the CGI-H and CGI-0 tumors; TMEM115 however showed hypermethylation in only 3 of 6 CGI-H tumors (Figure 7A). 51

67 Table 3. Promoter CGI that distinguish CGI-H from CGI-0 tumors as measured by MethylCap-seq Gene symbol Chromosomal location (hg19) CpG count fold change RPM TMEM115 chr3: TBX18 chr6: NODAL chr10: SVEP1 chr9: TNFSF11 chr13: OR10H2 chr19: KDM2B chr12: FGF12 chr3: APCDD1L chr20: EPHX3 chr19: ASCL1 chr12: EXOC3L4 chr14: SMOC2 chr6: B4GALNT1 chr12: GRM8 chr7: VILL chr3:

68 CGI-H CGI-0 CGI-H CGI-0 A MethylCap-seq B Infinium C 1.0 Signature average beta-value p=0.007 Log2 norm index Signature CGI Log2 norm index Discarded Candidate CGI 0.0 CGI-0 (n=5) CGI-H (n=6) Figure 7. Methylation of 13 promoter-associated CGI distinguishes tumors with high promoter CGI methylation (CGI-H) from those with baseline promoter CGI methylation (CGI-0). 11 tumors from the discovery set of 76 endometrioid endometrial tumors were chosen for technical validation via Infinium and indexed using a signature of 13 promoter-associated CGI. Two candidate regions that showed <0.1 difference in beta-value and p>0.05 between groups in the technical validation Infinium dataset were discarded. An additional region that showed a negative difference in beta-value was also discarded. (A) Methylation was compared between CGI-H (n=6) and CGI-0 tumors (n=5) across 16 promoter-associated CGI signature candidates using MethylCap-seq. Relative methylation was compared between regions by normalizing to the region average, then applying a log2 transformation. (B) Methylation differences seen in (A) were technically validated using the Infinium HumanMethylation 450 platform. Tumors were indexed using the average beta-value of all probes in each region (total of 88 probes), and relative methylation was compared as in (A). Regions (columns) 53

Epigenetics. Lyle Armstrong. UJ Taylor & Francis Group. f'ci Garland Science NEW YORK AND LONDON

Epigenetics. Lyle Armstrong. UJ Taylor & Francis Group. f'ci Garland Science NEW YORK AND LONDON ... Epigenetics Lyle Armstrong f'ci Garland Science UJ Taylor & Francis Group NEW YORK AND LONDON Contents CHAPTER 1 INTRODUCTION TO 3.2 CHROMATIN ARCHITECTURE 21 THE STUDY OF EPIGENETICS 1.1 THE CORE

More information

EPIGENOMICS PROFILING SERVICES

EPIGENOMICS PROFILING SERVICES EPIGENOMICS PROFILING SERVICES Chromatin analysis DNA methylation analysis RNA-seq analysis Diagenode helps you uncover the mysteries of epigenetics PAGE 3 Integrative epigenomics analysis DNA methylation

More information

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Philipp Bucher Wednesday January 21, 2009 SIB graduate school course EPFL, Lausanne ChIP-seq against histone variants: Biological

More information

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Not IN Our Genes - A Different Kind of Inheritance! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Epigenetics in Mainstream Media Epigenetics *Current definition:

More information

Jayanti Tokas 1, Puneet Tokas 2, Shailini Jain 3 and Hariom Yadav 3

Jayanti Tokas 1, Puneet Tokas 2, Shailini Jain 3 and Hariom Yadav 3 Jayanti Tokas 1, Puneet Tokas 2, Shailini Jain 3 and Hariom Yadav 3 1 Department of Biotechnology, JMIT, Radaur, Haryana, India 2 KITM, Kurukshetra, Haryana, India 3 NIDDK, National Institute of Health,

More information

Epigenetics Armstrong_Prelims.indd 1 04/11/2013 3:28 pm

Epigenetics Armstrong_Prelims.indd 1 04/11/2013 3:28 pm Epigenetics Epigenetics Lyle Armstrong vi Online resources Accessible from www.garlandscience.com, the Student and Instructor Resource Websites provide learning and teaching tools created for Epigenetics.

More information

Genetics and Genomics in Medicine Chapter 6 Questions

Genetics and Genomics in Medicine Chapter 6 Questions Genetics and Genomics in Medicine Chapter 6 Questions Multiple Choice Questions Question 6.1 With respect to the interconversion between open and condensed chromatin shown below: Which of the directions

More information

R. Piazza (MD, PhD), Dept. of Medicine and Surgery, University of Milano-Bicocca EPIGENETICS

R. Piazza (MD, PhD), Dept. of Medicine and Surgery, University of Milano-Bicocca EPIGENETICS R. Piazza (MD, PhD), Dept. of Medicine and Surgery, University of Milano-Bicocca EPIGENETICS EPIGENETICS THE STUDY OF CHANGES IN GENE EXPRESSION THAT ARE POTENTIALLY HERITABLE AND THAT DO NOT ENTAIL A

More information

FOLLICULAR LYMPHOMA- ILLUMINA METHYLATION. Jude Fitzgibbon

FOLLICULAR LYMPHOMA- ILLUMINA METHYLATION. Jude Fitzgibbon FOLLICULAR LYMPHOMA- ILLUMINA METHYLATION Jude Fitzgibbon j.fitzgibbon@qmul.ac.uk Molecular Predictors of Clinical outcome Responders Alive Non-responders Dead TUMOUR GENETICS Targeted therapy, Prognostic

More information

SUPPLEMENTARY FIGURES: Supplementary Figure 1

SUPPLEMENTARY FIGURES: Supplementary Figure 1 SUPPLEMENTARY FIGURES: Supplementary Figure 1 Supplementary Figure 1. Glioblastoma 5hmC quantified by paired BS and oxbs treated DNA hybridized to Infinium DNA methylation arrays. Workflow depicts analytic

More information

Gene Expression DNA RNA. Protein. Metabolites, stress, environment

Gene Expression DNA RNA. Protein. Metabolites, stress, environment Gene Expression DNA RNA Protein Metabolites, stress, environment 1 EPIGENETICS The study of alterations in gene function that cannot be explained by changes in DNA sequence. Epigenetic gene regulatory

More information

Dominic J Smiraglia, PhD Department of Cancer Genetics. DNA methylation in prostate cancer

Dominic J Smiraglia, PhD Department of Cancer Genetics. DNA methylation in prostate cancer Dominic J Smiraglia, PhD Department of Cancer Genetics DNA methylation in prostate cancer Overarching theme Epigenetic regulation allows the genome to be responsive to the environment Sets the tone for

More information

Histones modifications and variants

Histones modifications and variants Histones modifications and variants Dr. Institute of Molecular Biology, Johannes Gutenberg University, Mainz www.imb.de Lecture Objectives 1. Chromatin structure and function Chromatin and cell state Nucleosome

More information

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression Chromatin Array of nuc 1 Transcriptional control in Eukaryotes: Chromatin undergoes structural changes

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

ChIP-seq data analysis

ChIP-seq data analysis ChIP-seq data analysis Harri Lähdesmäki Department of Computer Science Aalto University November 24, 2017 Contents Background ChIP-seq protocol ChIP-seq data analysis Transcriptional regulation Transcriptional

More information

Are you the way you are because of the

Are you the way you are because of the EPIGENETICS Are you the way you are because of the It s my fault!! Nurture Genes you inherited from your parents? Nature Experiences during your life? Similar DNA Asthma, Autism, TWINS Bipolar Disorders

More information

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Accessing and Using ENCODE Data Dr. Peggy J. Farnham 1 William M Keck Professor of Biochemistry Keck School of Medicine University of Southern California How many human genes are encoded in our 3x10 9 bp? C. elegans (worm) 959 cells and 1x10 8 bp 20,000

More information

Back to the Basics: Methyl-Seq 101

Back to the Basics: Methyl-Seq 101 Back to the Basics: Methyl-Seq 101 Presented By: Alex Siebold, Ph.D. October 9, 2013 Field Applications Scientist Agilent Technologies Life Sciences & Diagnostics Group Life Sciences & Diagnostics Group

More information

Allelic reprogramming of the histone modification H3K4me3 in early mammalian development

Allelic reprogramming of the histone modification H3K4me3 in early mammalian development Allelic reprogramming of the histone modification H3K4me3 in early mammalian development 张戈 Method and material STAR ChIP seq (small-scale TELP-assisted rapid ChIP seq) 200 mouse embryonic stem cells PWK/PhJ

More information

Stem Cell Epigenetics

Stem Cell Epigenetics Stem Cell Epigenetics Philippe Collas University of Oslo Institute of Basic Medical Sciences Norwegian Center for Stem Cell Research www.collaslab.com Source of stem cells in the body Somatic ( adult )

More information

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers Gordon Blackshields Senior Bioinformatician Source BioScience 1 To Cancer Genetics Studies

More information

Eukaryotic transcription (III)

Eukaryotic transcription (III) Eukaryotic transcription (III) 1. Chromosome and chromatin structure Chromatin, chromatid, and chromosome chromatin Genomes exist as chromatins before or after cell division (interphase) but as chromatids

More information

What are the determinants of DNA demethylation following treatment of AML cell lines and patient samples with decitabine?

What are the determinants of DNA demethylation following treatment of AML cell lines and patient samples with decitabine? What are the determinants of DNA demethylation following treatment of AML cell lines and patient samples with decitabine? by Robert John Hollows This project is submitted in partial fulfilment of the requirements

More information

Biochemical Determinants Governing Redox Regulated Changes in Gene Expression and Chromatin Structure

Biochemical Determinants Governing Redox Regulated Changes in Gene Expression and Chromatin Structure Biochemical Determinants Governing Redox Regulated Changes in Gene Expression and Chromatin Structure Frederick E. Domann, Ph.D. Associate Professor of Radiation Oncology The University of Iowa Iowa City,

More information

Nature Structural & Molecular Biology: doi: /nsmb.2419

Nature Structural & Molecular Biology: doi: /nsmb.2419 Supplementary Figure 1 Mapped sequence reads and nucleosome occupancies. (a) Distribution of sequencing reads on the mouse reference genome for chromosome 14 as an example. The number of reads in a 1 Mb

More information

mirna Dr. S Hosseini-Asl

mirna Dr. S Hosseini-Asl mirna Dr. S Hosseini-Asl 1 2 MicroRNAs (mirnas) are small noncoding RNAs which enhance the cleavage or translational repression of specific mrna with recognition site(s) in the 3 - untranslated region

More information

The Epigenome Tools 2: ChIP-Seq and Data Analysis

The Epigenome Tools 2: ChIP-Seq and Data Analysis The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang zang@virginia.edu http://zanglab.com PHS5705: Public Health Genomics March 20, 2017 1 Outline Epigenome: basics review ChIP-seq overview

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Pan-cancer analysis of global and local DNA methylation variation a) Variations in global DNA methylation are shown as measured by averaging the genome-wide

More information

Comparative DNA methylome analysis of endometrial carcinoma reveals complex and distinct deregulation of cancer promoters and enhancers

Comparative DNA methylome analysis of endometrial carcinoma reveals complex and distinct deregulation of cancer promoters and enhancers Zhang et al. BMC Genomics 2014, 15:868 RESEARCH ARTICLE Open Access Comparative DNA methylome analysis of endometrial carcinoma reveals complex and distinct deregulation of cancer promoters and enhancers

More information

Tissue of origin determines cancer-associated CpG island promoter hypermethylation patterns

Tissue of origin determines cancer-associated CpG island promoter hypermethylation patterns RESEARCH Open Access Tissue of origin determines cancer-associated CpG island promoter hypermethylation patterns Duncan Sproul 1,2, Robert R Kitchen 1,3, Colm E Nestor 1,2, J Michael Dixon 1, Andrew H

More information

RNA-seq Introduction

RNA-seq Introduction RNA-seq Introduction DNA is the same in all cells but which RNAs that is present is different in all cells There is a wide variety of different functional RNAs Which RNAs (and sometimes then translated

More information

Analysis of shotgun bisulfite sequencing of cancer samples

Analysis of shotgun bisulfite sequencing of cancer samples Analysis of shotgun bisulfite sequencing of cancer samples Kasper Daniel Hansen Postdoc with Rafael Irizarry Johns Hopkins Bloomberg School of Public Health Brixen, July 1st, 2011 The

More information

September 20, Submitted electronically to: Cc: To Whom It May Concern:

September 20, Submitted electronically to: Cc: To Whom It May Concern: History Study (NOT-HL-12-147), p. 1 September 20, 2012 Re: Request for Information (RFI): Building a National Resource to Study Myelodysplastic Syndromes (MDS) The MDS Cohort Natural History Study (NOT-HL-12-147).

More information

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016 Session 6: Integration of epigenetic data Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016 Utilizing complimentary datasets Frequent mutations in chromatin regulators

More information

Eukaryotic Gene Regulation

Eukaryotic Gene Regulation Eukaryotic Gene Regulation Chapter 19: Control of Eukaryotic Genome The BIG Questions How are genes turned on & off in eukaryotes? How do cells with the same genes differentiate to perform completely different,

More information

Transcript-indexed ATAC-seq for immune profiling

Transcript-indexed ATAC-seq for immune profiling Transcript-indexed ATAC-seq for immune profiling Technical Journal Club 22 nd of May 2018 Christina Müller Nature Methods, Vol.10 No.12, 2013 Nature Biotechnology, Vol.32 No.7, 2014 Nature Medicine, Vol.24,

More information

Epigenetics DNA methylation. Biosciences 741: Genomics Fall, 2013 Week 13. DNA Methylation

Epigenetics DNA methylation. Biosciences 741: Genomics Fall, 2013 Week 13. DNA Methylation Epigenetics DNA methylation Biosciences 741: Genomics Fall, 2013 Week 13 DNA Methylation Most methylated cytosines are found in the dinucleotide sequence CG, denoted mcpg. The restriction enzyme HpaII

More information

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012 Elucidating Transcriptional Regulation at Multiple Scales Using High-Throughput Sequencing, Data Integration, and Computational Methods Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder

More information

Regulation of Gene Expression in Eukaryotes

Regulation of Gene Expression in Eukaryotes Ch. 19 Regulation of Gene Expression in Eukaryotes BIOL 222 Differential Gene Expression in Eukaryotes Signal Cells in a multicellular eukaryotic organism genetically identical differential gene expression

More information

Epigenomics. Ivana de la Serna Block Health Science

Epigenomics. Ivana de la Serna Block Health Science Epigenomics Ivana de la Serna Block Health Science 388 383-4111 ivana.delaserna@utoledo.edu Outline 1. Epigenetics-definition and overview 2. DNA methylation/hydroxymethylation 3. Histone modifications

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality. Supplementary Figure 1 Assessment of sample purity and quality. (a) Hematoxylin and eosin staining of formaldehyde-fixed, paraffin-embedded sections from a human testis biopsy collected concurrently with

More information

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing

More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Kaifu Chen 1,2,3,4,5,10, Zhong Chen 6,10, Dayong Wu 6, Lili Zhang 7, Xueqiu Lin 1,2,8,

More information

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types Epigenome Informatics Workshop Bioinformatics Research Laboratory 1 Introduction Active or inactive states of transcription factor

More information

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017 Epigenetics Jenny van Dongen Vrije Universiteit (VU) Amsterdam j.van.dongen@vu.nl Boulder, Friday march 10, 2017 Epigenetics Epigenetics= The study of molecular mechanisms that influence the activity of

More information

SUPPLEMENTAL INFORMATION

SUPPLEMENTAL INFORMATION SUPPLEMENTAL INFORMATION GO term analysis of differentially methylated SUMIs. GO term analysis of the 458 SUMIs with the largest differential methylation between human and chimp shows that they are more

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature10866 a b 1 2 3 4 5 6 7 Match No Match 1 2 3 4 5 6 7 Turcan et al. Supplementary Fig.1 Concepts mapping H3K27 targets in EF CBX8 targets in EF H3K27 targets in ES SUZ12 targets in ES

More information

Peak-calling for ChIP-seq and ATAC-seq

Peak-calling for ChIP-seq and ATAC-seq Peak-calling for ChIP-seq and ATAC-seq Shamith Samarajiwa CRUK Autumn School in Bioinformatics 2017 University of Cambridge Overview Peak-calling: identify enriched (signal) regions in ChIP-seq or ATAC-seq

More information

Integrated analysis of sequencing data

Integrated analysis of sequencing data Integrated analysis of sequencing data How to combine *-seq data M. Defrance, M. Thomas-Chollier, C. Herrmann, D. Puthier, J. van Helden *ChIP-seq, RNA-seq, MeDIP-seq, Transcription factor binding ChIP-seq

More information

Epigenetic programming in chronic lymphocytic leukemia

Epigenetic programming in chronic lymphocytic leukemia Epigenetic programming in chronic lymphocytic leukemia Christopher Oakes 10 th Canadian CLL Research Meeting September 18-19 th, 2014 Epigenetics and DNA methylation programming in normal and tumor cells:

More information

DNA methylation & demethylation

DNA methylation & demethylation DNA methylation & demethylation Lars Schomacher (Group Christof Niehrs) What is Epigenetics? Epigenetics is the study of heritable changes in gene expression (active versus inactive genes) that do not

More information

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Department of Biomedical Informatics Department of Computer Science and Engineering The Ohio State University Review

More information

Supplementary Figure S1. Appearacne of new acetyl groups in acetylated lysines using 2,3-13 C 6 pyruvate as a tracer instead of labeled glucose.

Supplementary Figure S1. Appearacne of new acetyl groups in acetylated lysines using 2,3-13 C 6 pyruvate as a tracer instead of labeled glucose. Supplementary Figure S1. Appearacne of new acetyl groups in acetylated lysines using 2,3-13 C 6 pyruvate as a tracer instead of labeled glucose. (a) Relative levels of both new acetylation and all acetylation

More information

Molecular Cell Biology. Prof. D. Karunagaran. Department of Biotechnology. Indian Institute of Technology Madras

Molecular Cell Biology. Prof. D. Karunagaran. Department of Biotechnology. Indian Institute of Technology Madras Molecular Cell Biology Prof. D. Karunagaran Department of Biotechnology Indian Institute of Technology Madras Module-9 Molecular Basis of Cancer, Oncogenes and Tumor Suppressor Genes Lecture 6 Epigenetics

More information

Mechanisms of alternative splicing regulation

Mechanisms of alternative splicing regulation Mechanisms of alternative splicing regulation The number of mechanisms that are known to be involved in splicing regulation approximates the number of splicing decisions that have been analyzed in detail.

More information

PRC2 crystal clear. Matthieu Schapira

PRC2 crystal clear. Matthieu Schapira PRC2 crystal clear Matthieu Schapira Epigenetic mechanisms control the combination of genes that are switched on and off in any given cell. In turn, this combination, called the transcriptional program,

More information

Nature Immunology: doi: /ni Supplementary Figure 1. DNA-methylation machinery is essential for silencing of Cd4 in cytotoxic T cells.

Nature Immunology: doi: /ni Supplementary Figure 1. DNA-methylation machinery is essential for silencing of Cd4 in cytotoxic T cells. Supplementary Figure 1 DNA-methylation machinery is essential for silencing of Cd4 in cytotoxic T cells. (a) Scheme for the retroviral shrna screen. (b) Histogram showing CD4 expression (MFI) in WT cytotoxic

More information

Ch. 18 Regulation of Gene Expression

Ch. 18 Regulation of Gene Expression Ch. 18 Regulation of Gene Expression 1 Human genome has around 23,688 genes (Scientific American 2/2006) Essential Questions: How is transcription regulated? How are genes expressed? 2 Bacteria regulate

More information

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

Results. Abstract. Introduc4on. Conclusions. Methods. Funding . expression that plays a role in many cellular processes affecting a variety of traits. In this study DNA methylation was assessed in neuronal tissue from three pigs (frontal lobe) and one great tit (whole

More information

Chromatin-Based Regulation of Gene Expression

Chromatin-Based Regulation of Gene Expression Chromatin-Based Regulation of Gene Expression.George J. Quellhorst, Jr., PhD.Associate Director, R&D.Biological Content Development Topics to be Discussed Importance of Chromatin-Based Regulation Mechanism

More information

Epigenetics: The Future of Psychology & Neuroscience. Richard E. Brown Psychology Department Dalhousie University Halifax, NS, B3H 4J1

Epigenetics: The Future of Psychology & Neuroscience. Richard E. Brown Psychology Department Dalhousie University Halifax, NS, B3H 4J1 Epigenetics: The Future of Psychology & Neuroscience Richard E. Brown Psychology Department Dalhousie University Halifax, NS, B3H 4J1 Nature versus Nurture Despite the belief that the Nature vs. Nurture

More information

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region methylation in relation to PSS and fetal coupling. A, PSS values for participants whose placentas showed low,

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc. Variant Classification Author: Mike Thiesen, Golden Helix, Inc. Overview Sequencing pipelines are able to identify rare variants not found in catalogs such as dbsnp. As a result, variants in these datasets

More information

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles ChromHMM Tutorial Jason Ernst Assistant Professor University of California, Los Angeles Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive

More information

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory Computational aspects of ChIP-seq John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory ChIP-seq Using highthroughput sequencing to investigate DNA

More information

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project Introduction RNA splicing is a critical step in eukaryotic gene

More information

The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome

The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome Yutao Fu 1, Manisha Sinha 2,3, Craig L. Peterson 3, Zhiping Weng 1,4,5 * 1 Bioinformatics Program,

More information

The silence of the genes: clinical applications of (colorectal) cancer epigenetics

The silence of the genes: clinical applications of (colorectal) cancer epigenetics The silence of the genes: clinical applications of (colorectal) cancer epigenetics Manon van Engeland, PhD Dept. of Pathology GROW - School for Oncology & Developmental Biology Maastricht University Medical

More information

DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging

DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging Genome Biology This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon. DiffVar: a new method for detecting

More information

EpiQuik Circulating Acetyl Histone H3K18 ELISA Kit (Colorimetric)

EpiQuik Circulating Acetyl Histone H3K18 ELISA Kit (Colorimetric) EpiQuik Circulating Acetyl Histone H3K18 ELISA Kit (Colorimetric) Base Catalog # PLEASE READ THIS ENTIRE USER GUIDE BEFORE USE Uses: The EpiQuik Circulating Acetyl Histone H3K18 ELISA Kit (Colorimetric)

More information

DNA methylation: a potential clinical biomarker for the detection of human cancers

DNA methylation: a potential clinical biomarker for the detection of human cancers DNA methylation: a potential clinical biomarker for the detection of human cancers Name: Tong Samuel Supervisor: Zigui CHEN Date: 1 st December 2016 Department: Microbiology Source: cited from Jakubowski,

More information

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016 Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf Rachel Schwope EPIGEN May 24-27, 2016 What does H3K4 methylation do? Plant of interest: Vitis vinifera Culturally important

More information

Solving Problems of Clustering and Classification of Cancer Diseases Based on DNA Methylation Data 1,2

Solving Problems of Clustering and Classification of Cancer Diseases Based on DNA Methylation Data 1,2 APPLIED PROBLEMS Solving Problems of Clustering and Classification of Cancer Diseases Based on DNA Methylation Data 1,2 A. N. Polovinkin a, I. B. Krylov a, P. N. Druzhkov a, M. V. Ivanchenko a, I. B. Meyerov

More information

Experimental Design For Microarray Experiments. Robert Gentleman, Denise Scholtens Arden Miller, Sandrine Dudoit

Experimental Design For Microarray Experiments. Robert Gentleman, Denise Scholtens Arden Miller, Sandrine Dudoit Experimental Design For Microarray Experiments Robert Gentleman, Denise Scholtens Arden Miller, Sandrine Dudoit Copyright 2002 Complexity of Genomic data the functioning of cells is a complex and highly

More information

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression www.collaslab.com An epigenetic approach to understanding (and predicting?) environmental effects on gene expression Philippe Collas University of Oslo Institute of Basic Medical Sciences Stem Cell Epigenetics

More information

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies Supplementary note: Comparison of deletion variants identified in this study and four earlier studies Here we compare the results of this study to potentially overlapping results from four earlier studies

More information

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland Expert-guided Visual Exploration (EVE) for patient stratification Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland Oncoscape.sttrcancer.org Paul Lisa Ken Jenny Desert Eric The challenge Given - patient clinical

More information

2009 LANDES BIOSCIENCE. DO NOT DISTRIBUTE.

2009 LANDES BIOSCIENCE. DO NOT DISTRIBUTE. [Epigenetics 4:2, 1-6; 16 February 2009]; 2009 Landes Bioscience Research Paper Determining the conservation of DNA methylation in Arabidopsis This manuscript has been published online, prior to printing.once

More information

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data Bioinformatics methods, models and applications to disease Alex Essebier ChIP-seq experiment To

More information

Dynamic reprogramming of DNA methylation in SETD2- deregulated renal cell carcinoma

Dynamic reprogramming of DNA methylation in SETD2- deregulated renal cell carcinoma /, Vol. 7, No. 2 Dynamic reprogramming of DNA methylation in SETD2- deregulated renal cell carcinoma Rochelle L. Tiedemann 1, Ryan A. Hlady 2, Paul D. Hanavan 3, Douglas F. Lake 3, Raoul Tibes 4,5, Jeong-Heon

More information

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells. Supplementary Figure 1 Characteristics of SEs in T reg and T conv cells. (a) Patterns of indicated transcription factor-binding at SEs and surrounding regions in T reg and T conv cells. Average normalized

More information

Epigenetic Mechanisms

Epigenetic Mechanisms RCPA Lecture Epigenetic chanisms Jeff Craig Early Life Epigenetics Group, MCRI Dept. of Paediatrics Overview What is epigenetics? Chromatin The epigenetic code What is epigenetics? the interactions of

More information

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD, Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD, Crawford, GE, Ren, B (2007) Distinct and predictive chromatin

More information

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK CHAPTER 6 DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK Genetic research aimed at the identification of new breast cancer susceptibility genes is at an interesting crossroad. On the one hand, the existence

More information

Overview: Conducting the Genetic Orchestra Prokaryotes and eukaryotes alter gene expression in response to their changing environment

Overview: Conducting the Genetic Orchestra Prokaryotes and eukaryotes alter gene expression in response to their changing environment Overview: Conducting the Genetic Orchestra Prokaryotes and eukaryotes alter gene expression in response to their changing environment In multicellular eukaryotes, gene expression regulates development

More information

Cancer and Gene Regulation

Cancer and Gene Regulation OpenStax-CNX module: m44548 1 Cancer and Gene Regulation OpenStax College This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 By the end of this section,

More information

Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing

Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing PacBio Americas User Group Meeting Sample Prep Workshop June.27.2017 Tyson Clark, Ph.D. For Research Use Only. Not

More information

Introduction to Genetics

Introduction to Genetics Introduction to Genetics Table of contents Chromosome DNA Protein synthesis Mutation Genetic disorder Relationship between genes and cancer Genetic testing Technical concern 2 All living organisms consist

More information

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Supplementary Figure 1 7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Regions targeted by the Even and Odd ChIRP probes mapped to a secondary structure model 56 of the

More information

Epigenetic Biomarkers of Breast Cancer Risk: Across the Breast Cancer Prevention Continuum

Epigenetic Biomarkers of Breast Cancer Risk: Across the Breast Cancer Prevention Continuum Epigenetic Biomarkers of Breast Cancer Risk: Across the Breast Cancer Prevention Continuum Mary Beth Terry, Jasmine A. McDonald, Hui Chen Wu, Sybil Eng and Regina M. Santella Abstract Epigenetic biomarkers,

More information

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction Optimization strategy of Copy Number Variant calling using Multiplicom solutions Michael Vyverman, PhD; Laura Standaert, PhD and Wouter Bossuyt, PhD Abstract Copy number variations (CNVs) represent a significant

More information

Imprinting. Joyce Ohm Cancer Genetics and Genomics CGP-L2-319 x8821

Imprinting. Joyce Ohm Cancer Genetics and Genomics CGP-L2-319 x8821 Imprinting Joyce Ohm Cancer Genetics and Genomics CGP-L2-319 x8821 Learning Objectives 1. To understand the basic concepts of genomic imprinting Genomic imprinting is an epigenetic phenomenon that causes

More information

Where Splicing Joins Chromatin And Transcription. 9/11/2012 Dario Balestra

Where Splicing Joins Chromatin And Transcription. 9/11/2012 Dario Balestra Where Splicing Joins Chromatin And Transcription 9/11/2012 Dario Balestra Splicing process overview Splicing process overview Sequence context RNA secondary structure Tissue-specific Proteins Development

More information

Introduction. Introduction

Introduction. Introduction Introduction We are leveraging genome sequencing data from The Cancer Genome Atlas (TCGA) to more accurately define mutated and stable genes and dysregulated metabolic pathways in solid tumors. These efforts

More information

Evaluating Classifiers for Disease Gene Discovery

Evaluating Classifiers for Disease Gene Discovery Evaluating Classifiers for Disease Gene Discovery Kino Coursey Lon Turnbull khc0021@unt.edu lt0013@unt.edu Abstract Identification of genes involved in human hereditary disease is an important bioinfomatics

More information

Fragile X Syndrome. Genetics, Epigenetics & the Role of Unprogrammed Events in the expression of a Phenotype

Fragile X Syndrome. Genetics, Epigenetics & the Role of Unprogrammed Events in the expression of a Phenotype Fragile X Syndrome Genetics, Epigenetics & the Role of Unprogrammed Events in the expression of a Phenotype A loss of function of the FMR-1 gene results in severe learning problems, intellectual disability

More information