Search for DNA methylation biomarkers in the circulating DNA of prostate and colorectal cancer


Search for DNA methylation biomarkers in the circulating DNA of prostate and colorectal cancer

by

Mina Park

A thesis submitted in conformity with the requirements for the degree of Master of Science
Graduate Department of Pharmacology and Toxicology
University of Toronto

Copyright by Mina Park (2012)

ABSTRACT

Early diagnosis represents an effective way to improve patient prognosis in cancer. New opportunities for cancer diagnosis and screening may arise from identification of cancer-specific epigenetic alterations in the cell-free circulating DNA (cirDNA). This study investigated biomarkers at the level of DNA methylation in the plasma cirDNA of individuals affected with prostate cancer or colorectal cancer. A methylation-sensitive restriction enzyme-based method was used to enrich methylated DNA fractions, which were interrogated on CpG island and human genome tiling microarrays. A number of genes and non-coding loci exhibited differential methylation between prostate cancer patients and controls. The candidate loci identified from these microarray experiments underwent verification by bisulfite modification coupled with pyrosequencing. Our results suggest that microarray-based studies of DNA methylation in the cirDNA can be a promising avenue for the identification of epigenetic biomarkers in cancer.

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Dr. Art Petronis, for giving me the opportunity to work under his supervision for the past few years. The support and academic training I have received have been invaluable. My gratitude goes to Dr. Rene Cortese, to whom I am indebted for his wonderful guidance and mentorship throughout my degree. I would also like to thank all the other members of the Krembil Epigenetics Laboratory. Their assistance, encouragement, and camaraderie, both in and out of the lab, have been instrumental in the completion of this degree. My thanks also go to my advisor, Dr. Albert Wong, for his advice throughout my program. Finally, I would like to thank my family and friends for their unrelenting support over the years. I am and will ever remain grateful, for it has truly made all the difference.

TABLE OF CONTENTS

Title
Abstract
Acknowledgements
Table of contents
List of tables
List of figures
Abbreviations

1.0 Introduction
1.1 Overview of the problem
1.2 Epigenetics
1.2.1 DNA methylation
1.2.2 DNA methylation and transcriptional repression
1.3 DNA methylation changes in cancer
1.3.1 Global genomic hypomethylation
1.3.2 Single-locus DNA hypomethylation
1.3.3 DNA hypermethylation
1.3.4 CpG island methylator phenotype
1.4 Cancer biomarkers
1.4.1 Sensitivity and specificity
1.5 Cell-free circulating DNA
1.5.1 Origins of circulating DNA
1.5.2 Mechanisms for DNA release into the circulation
1.5.3 Circulating DNA in cancer
1.5.4 Circulating DNA and cancer biomarkers
1.6 Studies of DNA methylation in the circulating DNA of cancer patients
1.6.1 Need for large scale studies of DNA methylation for identification of cancer biomarkers in the circulating DNA
1.7 Prostate cancer
1.7.1 Prostate specific antigen
1.7.2 Studies of DNA methylation in the circulating DNA of prostate cancer patients
1.8 Colorectal cancer
1.8.1 Screening modalities used in colorectal cancer
1.8.2 Studies of DNA methylation in the circulating DNA of colorectal cancer patients
1.9 Research objectives

2.0 Materials and methods
2.1 Samples
2.1.1 Prostate cancer study
2.1.2 Colorectal cancer study
2.2 DNA extraction
2.2.1 Prostate cancer study
2.2.2 Colorectal cancer study
2.3 DNA methylation detection
2.3.1 Principle of DNA methylation detection
2.3.2 DNA blunting
2.3.3 Adaptor ligation
2.3.4 DNA methylation-sensitive enzyme digest
2.3.5 Adaptor-mediated PCR
2.4 Microarray experiments and data analysis
2.4.1 Prostate cancer study
2.4.1.1 Microarrays
2.4.1.2 Microarray data analysis
2.4.2 Colorectal cancer study
2.4.2.1 Microarrays
2.4.2.2 Microarray data analysis
2.5 Fine mapping of individual CpG locations
2.5.1 Principle of fine mapping
2.5.2 Bisulfite treatment and whole bisulfitome amplification
2.5.3 Nested PCR
2.5.4 Pyrosequencing

3.0 Results
3.1 Microarray methylation analysis in the circulating DNA of prostate cancer
3.2 Verification of microarray findings by fine mapping of cytosines on selected genes
3.2.1 Genes showing statistically significant differential methylation by pyrosequencing
3.2.2 Concordance of microarray and pyrosequencing data
3.2.3 Predictive value of differential circulating DNA methylation in RNF219
3.3 Microarray methylation analysis in the circulating DNA of colorectal cancer
3.3.1 Potential candidate loci in the circulating DNA of colorectal cancer

4.0 Discussion
4.1 DNA methylation differences in the plasma circulating DNA of cancer patients
4.1.1 Prostate cancer study
4.1.2 Colorectal cancer study
4.2 Discovery of candidate gene markers
4.2.1 Prostate cancer study
4.2.1.1 Replication of microarray data by pyrosequencing
4.2.1.2 Performance characteristics of RNF219
4.2.2 Colorectal cancer study
4.3 Future directions

5.0 References

LIST OF TABLES

Table 1. Primers used for amplification of the external locus in nested PCR
Table 2. Primers used for amplification of the internal locus in nested PCR
Table 3. Loci that were significantly differentially methylated and mapped to repetitive elements, in order of significance
Table 4. Loci that were significantly differentially methylated and mapped to unique sequences, in order of significance
Table 5. Loci selected for fine mapping of methylated CpG positions
Table 6. Top 48 loci located within genes exhibiting differential methylation between colorectal cancer samples and controls
Table 7. Top 52 loci located in intergenic regions that exhibit differential methylation between colorectal cancer samples and controls

LIST OF FIGURES

Figure 1. Principle of DNA methylation detection technology in plasma cirDNA
Figure 2. Volcano plot of microarray data in prostate cancer and control samples using FDR-adjusted p-value as statistics
Figure 3. Pyrosequencing results showing methylation status in CpG sites of RNF219 in prostate cancer and control samples
Figure 4. Pyrosequencing results showing methylation status in CpG sites of SIX3 in prostate cancer and control samples
Figure 5. Pyrosequencing results showing methylation status in CpG sites of KIAA1539 in prostate cancer and control samples
Figure 6. Sample pyrosequencing results showing methylation status of loci in prostate cancer and control samples
Figure 7. Differential methylation values obtained from microarrays and pyrosequencing for candidate loci
Figure 8. DNA methylation in the RNF219 gene in two independent sample sets
Figure 9. Predictive accuracy of cirDNA methylation level in the RNF219 gene
Figure 10. Volcano plot of microarray data in colorectal cancer and control samples

ABBREVIATIONS

BPH       benign prostatic hyperplasia
caC       carboxylcytosine
CIMP      CpG island methylator phenotype
cirDNA    cell-free circulating DNA
DNMT      DNA methyltransferase
FDR       false discovery rate
FOBT      fecal occult blood test
mC        methylated cytosine
PCR       polymerase chain reaction
PSA       prostate specific antigen
ROC       receiver operator curve

1.0 INTRODUCTION

1.1 Overview of the problem

Cancer is a group of diseases that share the central characteristic of uncontrolled cellular proliferation, and it is a major public health problem. In 2008, there were an estimated 12.7 million new cancer cases, and the risk of dying from cancer before the age of 75 was 11.2%. Cancer is also a leading cause of death: it was responsible for 7.6 million deaths, or 13% of all deaths that year [1].

Early diagnosis of cancer is one of the best ways to reduce cancer-related mortality. When a tumour is detected at an early stage, the chances that available treatment options will succeed increase dramatically [2-4]. For a number of cancers, diagnosis at an early stage of disease is associated with improved survival outcomes. In colorectal cancer, 5-year survival is 90.1% when the cancer is diagnosed while it is localized to the colon, but falls to 11.7% if the cancer is diagnosed after it has metastasized. Similarly, 5-year survival for localized prostate cancer is 100%, but falls to 28.7% for diagnoses of metastatic prostate cancer [5].

Traditional cancer diagnosis is based on assessing the morphology of cancer cells [6]. This method is suitable for diagnosing cancer in sites of the body that are easily accessible, such as the cervix or blood. However, in cancers for which cells are not easily accessible, diagnosis by morphological assessment requires tumour biopsies obtained by invasive methods [7]. Discovering biomarkers that can detect cancer-specific changes in peripheral and easily accessible tissues is a way to bypass the lack of access to adequate testing material for cancer diagnosis.

1.2 Epigenetics

New opportunities for cancer diagnosis and screening may arise from identification of cancer-specific epigenetic alterations. Epigenetics refers to heritable changes in gene expression that are not based on the underlying DNA sequence [8]. In the human genome, the two main epigenetic mechanisms are modifications of histones, which are the main protein components of chromatin, and methylation of the cytosine nucleotide in DNA [9]. Alterations in chromatin structure are mediated through post-translational modifications of histones such as acetylation, methylation, and phosphorylation. These modifications are able to change the conformation of chromatin between an open, transcriptionally active form known as euchromatin and a condensed, transcriptionally inactive form known as heterochromatin [10]. DNA methylation refers to the covalent addition of a methyl group to position 5 of the cytosine pyrimidine ring [11], and it represents a relatively stable and conserved mark, which makes it an appealing option for epigenetic studies.

1.2.1 DNA methylation

In humans, DNA methylation occurs primarily in the context of a cytosine followed by a guanine, which is known as a CpG dinucleotide. It is estimated that 1% of cytosine moieties and between 70 and 80% of CpGs are methylated in humans [12]. Methylated CpGs are located mainly in repetitive genomic regions [13]. In contrast, CpG islands, which are areas that show a high density of CpG sites and are typically associated with active transcription [14], contain largely unmethylated CpGs. Approximately 60% of genes are estimated to be associated with a CpG island in their promoter regions [15].
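To make the notion of CpG density concrete, the following minimal Python sketch counts CpG dinucleotides on one strand of a sequence and computes the observed/expected CpG ratio. The ratio and the thresholds mentioned in the comments (length of at least 200 bp, GC content above 50%, observed/expected above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria for CpG islands; they are an assumption brought in for illustration and are not taken from this thesis. The function names and the example sequence are hypothetical.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) on one strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return count_cpg(seq) * len(seq) / (c * g)


if __name__ == "__main__":
    # Hypothetical toy sequence; real CpG island calls use windows of >= 200 bp.
    example = "ACGTCGCG"
    print(count_cpg(example), gc_content(example), cpg_obs_exp(example))
```

A window would be flagged as a candidate CpG island under the assumed criteria when its length, `gc_content`, and `cpg_obs_exp` all exceed the thresholds above; the toy sequence is far too short for that and only illustrates the arithmetic.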

In addition to methylated cytosine (mC), the existence of additional modifications to cytosine has recently been discovered. The Tet family of enzymes has been found to oxidize mC to hydroxymethylated cytosine [16], and to further convert these oxidized substrates into formylcytosine and carboxylcytosine (caC) [17]. This conversion is hypothesized to play an important role in demethylation, through the excision of caC by thymine-DNA glycosylase to yield unmethylated cytosine [18]. However, because most of the techniques used in epigenetic studies of cancer have not differentiated mC from other forms of modified cytosine [19], the term DNA methylation will be used throughout this work to refer to such modifications.

DNA methylation is mediated by a family of enzymes called DNA methyltransferases (DNMTs). There are currently four identified DNMTs that play a role in DNA methylation [11]. DNMT1 is proposed to be the maintenance DNMT, responsible for copying methylation patterns from hemimethylated templates to daughter strands during DNA replication [11]. DNMT3a and 3b are de novo methyltransferases, which set up the methylation patterns early in development [20]. DNMT3L is thought to facilitate de novo methylation by binding to DNMT3a and 3b and stimulating their activity [21].

1.2.2 DNA methylation and transcriptional regulation

DNA methylation has been associated with transcriptional repression [22], thereby playing an important role in the regulation of gene expression. One mode of repression is for DNA methylation to physically impede the binding of transcription factors [23]. Another mode is for methylated DNA to mediate transcriptional repression by attracting proteins that compact chromatin, disposing it to an inactive heterochromatic state [24]. A family of proteins containing a methyl-binding domain that is involved in this process has been characterized, the most studied of which is methyl CpG binding protein 2 (MeCP2) [25].

It is important to note, however, that the rule linking a high density of mC to suppression of transcriptional activity applies only to regulatory regions, such as promoters. One group of researchers investigating methylation on the X chromosome found patterns of gene body hypermethylation in the active X chromosome compared to the inactive X chromosome [26]. This finding suggests that DNA methylation in gene bodies is associated with gene expression, in contrast to the repressive effect that DNA methylation in promoter regions has on gene expression.

Given its important role in transcriptional regulation, DNA methylation is crucial for proper biological development and functioning. DNA methylation is essential for genomic imprinting [27], X-chromosome inactivation [28], and differentiation and maintenance of cellular identity [29-30]. Aberrant DNA methylation has been implicated in a large spectrum of human diseases, ranging from imprinting disorders such as Beckwith-Wiedemann syndrome and Prader-Willi syndrome [27], to complex diseases, the most studied of which is cancer [31].

1.3 DNA methylation changes in cancer

Abnormal patterns of DNA methylation are one of the most common alterations found in cancer [32-39]. Cancer cells exhibit a global loss of DNA methylation in addition to a gain of methylation in some CpG islands [39]. These alterations provide tumour cells with a growth advantage by elevating their genetic instability and allowing them to accrue progressive changes that support their continued proliferation and metastasis [27].

1.3.1 Global genomic hypomethylation

Loss of DNA methylation was the first epigenetic alteration identified in cancer cells [35]. Global genomic hypomethylation is largely due to loss of methylation in repetitive DNA sequences [40], and it has been seen universally across various cancers as well as in some premalignant adenomas [41]. Moreover, the degree of hypomethylation has been associated with disease severity and metastatic potential [35, 40].

There are many functional implications of global DNA hypomethylation as it relates to cancer. By weakening transcriptional repression, DNA hypomethylation can facilitate chromosomal instability, which is another hallmark of tumour cells [27]. Experiments in which methylation was depleted showed that loss of DNA methylation leads to aneuploidy and chromosomal rearrangements [42], which are thought to be primarily due to loss of methylated cytosines in centromeric or pericentric regions [43].

1.3.2 Single-locus DNA hypomethylation

Hypomethylation in coding sequences has also been observed in cancer [41]. A recent study found that CpG islands can be normally methylated in somatic tissues [44], and that the hypomethylation of these islands in cancer can activate nearby genes [45]. This has been found in genes with no known relationship to the disease, such as growth hormone (GH), α-chorionic gonadotropin (αhCG), and γ-globin (HBG1) in colorectal cancer [46]. However, hypomethylation has also been found in genes whose activation contributes to tumorigenesis [45]. Genes activated by hypomethylation in cancer include oncogenes such as homeobox proto-oncogene (HOX11) in leukemia [47], v-myc myelocytomatosis viral oncogene homolog (C-MYC) in colorectal cancer [48], and v-Ha-ras Harvey rat sarcoma viral oncogene homolog (HRAS) in melanoma [36], as well as non-oncogenes such as trefoil factor 1 (pS2) in breast cancer, which is implicated in the control of cell proliferation [49], and carbonic anhydrase 9 (MN/CA9) in renal cell carcinoma [50]. Moreover, hypomethylation in genes can disrupt genomic imprinting through activation of the normally silent allele, and a vast array of cancers exhibit such loss of imprinting [34].

1.3.3 DNA hypermethylation

Hypermethylation of DNA in cancer occurs concomitantly with global genomic hypomethylation. It frequently occurs in the CpG islands of gene promoters and, in many cases, is associated with transcriptional silencing [40, 51]. It is estimated that an average of 600 out of the approximately 45,000 CpG islands in the genome are hypermethylated in cancer [52]. Hypermethylation of promoters is an important mechanism for inactivation of tumour suppressor genes [53], and aberrant hypermethylation and downregulation have been observed in genes involved in the cell cycle, DNA repair, cell signaling, chromatin remodeling, transcription, and apoptosis for almost every type of tumour [27].

Studies have found that patterns of CpG hypermethylation occur in a cancer type-specific fashion [52], in both sporadic and inherited cancers of the same tumour type [32]. In these studies, cancer-associated DNA methylation was found to vary with the kind of cancer under investigation [32, 52]. It is suggested that this may be due to different growth selection pressures or individual CpG island susceptibilities in each tumour type [52]. Promoter hypermethylation in certain CpG islands may confer a selective advantage for the survival of a specific cell type [54]. Hence, a gene may be downregulated in one type of cancer but not another because the loss of its expression has cellular consequences that promote the growth of tumours in a specific tissue [54]. Known hypermethylated genes in different cancers include glutathione S-transferase P (GSTP1) in prostate cancer [55], breast cancer 1, early onset (BRCA1) in breast and ovarian cancers [32, 56], and mutL homolog 1, colon cancer, nonpolyposis type 2 (hMLH1) in gastric, colorectal, and endometrial cancers [37, 57-58].

1.3.4 CpG island methylator phenotype

One theory suggests that there is a CpG island methylator phenotype (CIMP) in human cancers. This theory developed from studies in colorectal cancer, which found a subset of cancers that displayed a 3-5 fold increased frequency of aberrant hypermethylation in multiple loci; this pattern of methylation in a cluster of genes was not seen in the remaining cases [59]. According to this theory, CIMP cancers are biologically unique compared to other cancers, with differences in genetics, histology, pathology, and clinical attributes [39]. However, this is still a very controversial concept, with no consensus on the choice of genes to include in a panel that distinguishes CIMP cancers from other types [45].

1.4 Cancer biomarkers

A biomarker is defined as "a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention" [60]. Biomarkers can have many clinical applications in disease detection and monitoring. Biomarkers can be used to detect the presence of a disease [60]; an example of a commonly used screening tool is the Pap smear, where abnormal cells may indicate cervical cancer [6]. Prognostic biomarkers predict the natural course of disease in an individual. For instance, the Breast Cancer Profiling, or H/I, test looks at the ratio of expression of homeobox B13 (HOXB13) to interleukin 17 receptor B (IL17RB) in tumour tissue and estimates the probability of disease recurrence in an individual after the original tumour has been resected, with increasing risk associated with higher ratios [61]. Predictive biomarkers are used to assess whether a patient will benefit from a particular treatment based on the characteristics of their disease [60]. They are also used in breast cancer, as patients whose tumours overexpress the v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 (HER2) gene may respond to treatment with trastuzumab, whereas those whose tumours express the estrogen receptor may benefit from treatment with tamoxifen [62]. In addition, biomarkers can be used to measure the treatment effects of a drug on the tumour, and these are called pharmacodynamic biomarkers [63].

1.4.1 Sensitivity and specificity

Potential biomarkers for cancer diagnosis have to distinguish between individuals who do and do not have cancer, and the performance of a biomarker is assessed by calculating the proportions of individuals whom the test classifies correctly. The sensitivity of a biomarker refers to the proportion of individuals with the disease who are correctly identified as positive by the test [64]. In contrast, the specificity of a biomarker is the proportion of individuals without the disease who are correctly identified as negative by the test [64]. These performance characteristics can be assessed from case-control studies. For example, prostate specific antigen, a commonly used biomarker in prostate cancer screening, has 80% sensitivity and 20% specificity [55]. Another example is the Pap test, which is widely used in screening for cervical cancer and has 58% sensitivity and 69% specificity [65].
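The two definitions above can be written out as a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and the case-control counts are invented so that the result reproduces the 80% sensitivity and 20% specificity quoted for prostate specific antigen; they are not data from any study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of diseased individuals that the test calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of disease-free individuals that the test calls negative."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical case-control study: 100 cancer cases and 100 controls.
    # Of the cases, 80 test positive and 20 test negative.
    # Of the controls, 20 test negative and 80 test positive.
    print(f"sensitivity = {sensitivity(80, 20):.2f}")
    print(f"specificity = {specificity(20, 80):.2f}")
```

With these assumed counts, a test can detect most cases (high sensitivity) while still misclassifying most controls as positive (low specificity), which is why both values are reported together.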

1.5 Cell-free circulating DNA

Discovering blood-based biomarkers is an appealing option, as blood is a minimally invasive and easily accessible specimen [7]. There are several potential biomarker targets in blood that exhibit cancer-related differences, including DNA, RNA, and proteins [6]. Of these, DNA biomarkers are especially attractive because DNA is easily stored and far more stable than RNA and proteins. Moreover, only small amounts of DNA are required for analysis, as the template can be amplified through PCR.

1.5.1 Origins of circulating DNA

Cell-free circulating DNA (cirDNA) refers to fragments of extracellular DNA that flow freely in the circulation. CirDNA is believed to originate from dead cells through apoptosis and necrosis [66-69]. Apoptosis refers to a process of programmed cell death involving the action of enzymes called caspases [66]. A major hallmark of apoptosis is internucleosomal cleavage of chromatin, which results in DNA fragments that exhibit a ladder-like pattern of 180 bp [70]. Necrosis is cell death that results from physical or chemical trauma [67]. High molecular weight DNA fragments are expected after necrosis, as it causes nonspecific and incomplete digestion of DNA [71].

The size distribution of DNA extracted from human plasma shows fragments of both 180 bp and high molecular weight, suggesting that cirDNA originates from both cell death processes [69]. Furthermore, experiments that induced apoptosis and necrosis in mouse liver cells found that DNA recovered from plasma showed a 180 bp ladder pattern and high molecular weight fragments (>10,000 bp), respectively [69]. DNA in plasma has also been shown to circulate in the form of nucleosomes [66, 69], which is expected after apoptosis, and being bound to protein likely protects DNA from further enzymatic digestion in the bloodstream [70].

1.5.2 Mechanisms for DNA release into the circulation

Though the precise manner in which cells release DNA into the circulation is unknown, some theories regarding this mechanism have been postulated. Cells that die by apoptosis and necrosis are rapidly cleared from the circulation through phagocytosis by macrophages and other cellular scavengers [68, 70]. Macrophages may play a role in the release of DNA from cells that die by necrosis. Cell culture studies have shown that macrophages that engulf necrotic cells release digested DNA into the medium, whereas macrophages that engulf apoptotic cells do not [67]. During apoptosis, DNA is fragmented and sequestered within blebs that move to the cell surface and can be released into the circulation [72]. In vitro studies have shown that apoptotic cells release DNA spontaneously as they die [66]. These studies suggest that apoptotic cells may release DNA directly into the blood, while the release of DNA from necrotic cells is dependent on other cellular factors [66].

Another possibility is that living cells can actively release DNA into the circulation [71]. Cell culture studies in lymphocytes show that they can release DNA into the supernatant in the absence of cell death [73]. Furthermore, it was shown that actively released DNA also displays a ladder-like pattern, suggesting that the ladder-like pattern seen in cirDNA may not be due solely to apoptosis [74]. Thus, there are several possible mechanisms by which DNA can enter the bloodstream and other bodily fluids.

1.5.3 Circulating DNA in cancer

CirDNA levels have been widely reported to be elevated in a number of cancers, including those of the colon [75-76], pancreas [76], prostate [77], breast [75, 78], lung [75, 79], ovary, uterus, and cervix [75]. Reported concentrations of cirDNA in plasma range from 0 to >1,000 ng/ml of blood in cancer patients [77-79], compared with between 0 and 100 ng/ml of blood in healthy subjects [79]. These values reflect considerable variation in cirDNA concentrations in both groups, which can be partly attributed to the different techniques used to quantify cirDNA and to the different DNA treatments employed by the different studies [80]. Averaged across multiple studies, cancer patients have 180 ng/ml of cirDNA while healthy subjects have 30 ng/ml [80].

It is thought that the high rates of apoptosis and necrosis in a tumour are related to the greater amounts of DNA found in the circulation of cancer patients. One explanation is that as tumours enlarge, they are likely to outgrow their blood supply, leading to hypoxia-induced necrosis and apoptosis in large regions of the tumour [68, 80]. This would lead to increased phagocytosis of tumour cells by macrophages and to DNA release by apoptosis, corresponding to higher levels of cirDNA. Moreover, excessive cell death could lead to reduced clearance of circulating nucleic acids by the liver and kidney [80].

1.5.4 Circulating DNA and cancer biomarkers

CirDNA has been a focus of biomarker research in oncology ever since tumour cells were shown to release their DNA into blood in 1987 [81]. Studies aiming to discover the cellular origins of cirDNA in cancer were able to detect alterations in cirDNA that matched those of the primary tumour, which led to the conclusion that some of the DNA circulating in blood comes from tumour cells [69, 73, 82-83]. For example, Chen et al. were able to detect cancer-specific microsatellite instability in the plasma of small cell lung carcinoma patients [82]. However, there are additional cellular sources of cirDNA. Jahr et al. found that cirDNA derives from both tumour and nontumour cells, with T-cells and endothelial cells making minimal contributions [69]. Moreover, they hypothesized that the nontumour fraction of cirDNA originates from nontumour cells that lie in the vicinity of the tumour and degenerate as the tumour grows [69]. This is consistent with other findings that cirDNA containing cancer-related mutations represents only a small fraction of the total cirDNA detected in plasma [68].

Cancer biomarkers in cirDNA can be based on a number of different alterations, including microsatellite instability, mutations, integrity, and methylation [79-80]. Microsatellites are short repetitive nucleotide sequences in the genome that are 1 to 6 bp long and polymorphic for length [79, 84]. Microsatellite instability is characterized by discrepancies in the number of nucleotide repeats found within microsatellites in tumour versus normal DNA [85], and can be demonstrated as loss of heterozygosity or as a shift after gel separation [79, 86]. Microsatellite instability is a common trait of cancer, and several studies have demonstrated that it can be detected in the cirDNA of cancer patients [82, 86-87]. However, studies comparing DNA extracted from the primary tumours and plasma of the same individuals have shown discrepancies in the detection of tumour-related loss of heterozygosity in cirDNA, and the overall sensitivity of tests for detecting microsatellite instability in cirDNA is very low, at only 0.5% [79].

Detecting tumour-specific gene mutations in cirDNA is another opportunity for biomarker development.
The most frequently analyzed mutations are in v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) and tumor protein 53 (p53), as they are the most commonly mutated genes in cancer [79]. However, despite the high frequency of these mutations in tumours, assays in cirDNA show inconsistent results in patients tested for these alterations [79]; several studies have found tumour-specific mutations in less than 10% of samples [88-91]. One reason for these results may be that the tumour-specific mutation is present at low abundance in the cirDNA and is masked by the presence of wild-type DNA [80]. Moreover, mutations in KRAS can be found in the cirDNA of patients with non-neoplastic disease, such as chronic pancreatitis [92], as well as in that of healthy controls [93]. Thus, issues of sensitivity and specificity are major drawbacks to this approach [80].

Assessing cirDNA integrity is a more recent advancement in cancer biomarker assay development. CirDNA integrity is measured as the ratio of longer to shorter DNA fragments [94], and studies of DNA integrity typically examine repeat sequences such as ALU and LINE1 [80]. Greater cirDNA integrity has been associated with cancer in a number of studies [78, 94-96]. However, because repeat sequences are interspersed throughout the genome, the ability to specify a cancer type is lost in these assays, though the sensitivity of the test may be enhanced [80].

1.6 Studies of DNA methylation in the circulating DNA of cancer patients

Detecting DNA methylation is a promising avenue for discovering biomarkers in cirDNA [80]. Assays for DNA methylation have several advantages. Tumour-specific DNA methylation represents a stable marker that will generally not be lost [79], and there are particular genes that are frequently methylated in certain cancers [80]. Hence, a plethora of studies has been dedicated to detecting cancer-specific DNA methylation markers in the circulation. The vast majority of these studies have employed a candidate-gene approach based on existing knowledge of DNA methylation changes in tumour tissues. The candidate-gene approach requires a priori knowledge and selection of cancer-related genes, drawn from genes that have already been found to be methylated in tumours [80]. Once selected, methylation at these loci can be detected by treatment with sodium bisulfite, which converts unmethylated cytosines, but not methylated cytosines, into uracil (which is read as thymine after subsequent PCR amplification). The modified DNA can then be analyzed by methylation-specific PCR (MSP), which uses primer sets specific for methylated or unmethylated DNA [97], real-time PCR [79], or DNA sequencing [98].

Using these methods, differential methylation of several genes in cirDNA has been found in numerous cancer types [79]. For example, one study in breast cancer found methylation of adenomatous polyposis coli (APC) in 29%, of Ras association domain family member 1 (RASSF1A) in 56%, and of death-associated protein kinase 1 (DAPK) in 35% of patients' plasma (n = 35), and no methylation of any of these genes in the plasma DNA of 20 healthy controls and eight patients with benign breast disease [99]. Another study found methylation of APC in the plasma of 47% of lung cancer patients (n = 89) and in no healthy controls (n = 50) [100]. In ovarian cancer patients, BRCA1 was methylated in 18%, and RASSF1A in 40%, of plasma samples (n = 50), and methylation of these genes was not found in any healthy controls (n = 40) [101]. Similar studies have identified differential cirDNA methylation in the plasma of patients with gastrointestinal, renal, hepatocellular, colorectal, prostatic, esophageal, cervical, and bladder cancer [79].

To date, there have been few microarray-based studies of cirDNA methylation in cancer, and those that have been conducted have been for pancreatic [102], breast [103], colorectal [104], and ovarian cancers [105-106]. The arrays used in these studies were custom designed [107] and were limited to just 56 genes.
However, they were able to find different methylation profiles in the plasma cirDNA of cancer patients and healthy controls. For instance, using a panel of five genes identified by this method, a sensitivity of 85% and a specificity of 61% were achieved for detecting ovarian cancer [106]. Of these genes, one had not previously been reported to be involved in cancer [106].

1.6.1 Need for large scale studies of DNA methylation for identification of cancer biomarkers in the circulating DNA

A shortcoming of the studies conducted to date on the cirDNA methylome, which refers to the distribution of methylated cytosines across the entire genome [108], is that they have tested only a small number of selected genes. Moreover, the candidate-gene approach has precluded the discovery of novel biomarkers in cirDNA at loci that are currently unknown to be altered in cancer. Hence, it is likely that many informative markers have been missed. In order to find the best biomarkers, it is necessary to use an approach that can comprehensively analyze methylation patterns in the cirDNA of cancer patients. Such an approach can be used to identify specific and unknown regions that are differentially methylated in cirDNA [109], and the results from such studies can inform the selection of genes for future analysis.

1.7 Prostate cancer

Prostate cancer is the most common cancer for men in the developed world [1]. In 2010 in the United States alone, it was responsible for 217,730 new cases and for 32,505 deaths [110]. Prostate cancer occurs in the prostate gland and is classified as an adenocarcinoma, arising from prostate gland epithelial cells [111]. The leading risk factor for developing this disease is age [111]. Most cases of prostate cancer present asymptomatically, especially in the earlier stages, though urinary tract symptoms such as urgency, frequency, and incomplete emptying have been associated with prostate cancer [111].

1.7.1 Prostate specific antigen

Prostate cancer is one of the few cancers for which a molecular marker is routinely used for detection, risk stratification, and monitoring [112]. Prostate specific antigen (PSA) has been used since 1994 to screen asymptomatic populations beginning at 50 years of age [112]. PSA is a serine protease whose physiological role is believed to be liquefying the seminal fluid [113], and it is present in conditions of normal health [114]. The traditional threshold used for biopsy is 4.0 ng/ml of blood [111], with values above this amount warranting follow-up. For biopsy, samples of the prostate are taken using transrectal ultrasound-guided needles [111]. Patient management after a positive diagnosis of prostate cancer includes watchful waiting to see whether the cancer progresses and proves to be more aggressive, radiation therapy to control cell growth, or surgical intervention to remove the prostate [111].

Despite the common use of PSA in clinical practice, its effectiveness as a diagnostic tool has come under strong questioning in recent years. The performance characteristics of PSA are mixed, with a sensitivity of 80% and a specificity of 20% [55]. As PSA is a marker that is specific to the prostate, not to prostate cancer, increased PSA levels are detected in non-neoplastic conditions of the prostate, such as benign prostatic hyperplasia (BPH) and prostatitis [112]. This leads to unnecessary biopsies, which are costly and carry significant risks and stress for patients [115]. Another challenge to PSA screening is that it leads to the overdiagnosis and overtreatment of clinically insignificant tumours. Estimates based on autopsies of men who have died from unrelated causes suggest that 70% of men in their 60s have a latent form of prostate cancer [112, 116]. It is therefore possible that a significant proportion of positive PSA tests detect tumours that would otherwise not have any clinical impact on an individual, either because the person would die of other causes before the tumour could progress or because it is essentially benign [116]. Corroborating this theory is a meta-analysis of six randomized controlled trials totalling 387,286 participants, which found that PSA screening had no effect on mortality [117]. Therefore, there exists a real need for improved diagnostic markers for prostate cancer.

1.7.2 Studies of DNA methylation in the circulating DNA of prostate cancer patients

There have been several studies of DNA methylation in the cirDNA of prostate cancer patients. The most widely studied and promising methylation marker in the cirDNA of prostate cancer is GSTP1, whose methylation is a common pathological event in prostate cancer [55, 118]. A meta-analysis of 22 different studies concluded that the average specificity of GSTP1 methylation was 89%, much higher than that of PSA; however, its sensitivity was lower than that of PSA, at 52% [55]. Other studies have found additional differentially methylated regions. For example, one study of DNA methylation in metastatic prostate cancer found hypermethylation of ATP-binding cassette, sub-family B, member 1 (MDR1) in 15 (83.3%), endothelin receptor type B (EDNRB) in 9 (50%), retinoic acid receptor, beta (RARβ) in 7 (38.9%), GSTP1 in 5 (27.8%), and RASSF1A in 3 (16.7%) metastatic prostate serum samples [119]. However, as most of the earlier studies tested only a small number of selected genes, there is a high likelihood that the most informative markers have been missed.

1.8 Colorectal cancer

Colorectal cancer is the fourth most common cancer in the world [1], and in the United States, it was responsible for 102,900 new cancer cases and 51,370 deaths in 2010 [110]. There are a number of risk factors associated with developing colorectal cancer, including age, a diet rich in processed food and red meats, obesity, physical inactivity, high consumption of alcohol, smoking, and a family history of the disease [120]. Though there are several types of colorectal cancer, the vast majority of cases (>90%) are adenocarcinomas [121], which originate from the lining of the colon [122]. Colorectal cancer most commonly presents to a primary care provider with non-urgent symptoms [123], such as rectal bleeding, diarrhea, constipation, weight loss, and abdominal pain [124], though a smaller percentage of cases present as emergencies [125].

1.8.1 Screening modalities used in colorectal cancer

It is widely acknowledged that the stage of colorectal cancer at the moment of diagnosis is the main prognostic factor for this disease [2, 4]. Five-year survival for patients diagnosed when their tumours are limited to the colon exceeds 90%, but drops to below 70% once the cancer has spread to lymph nodes, and below 12% when the cancer has already metastasized [5]. The most commonly used screening modalities for colorectal cancer detection can be divided into those based on finding markers in stool and those based on structural examinations of the colon.

Fecal occult blood tests (FOBTs) are based on the observation that small, visually undetectable amounts of blood (occult blood) are released into the bowel lumen in colorectal cancer [126]. FOBTs detect this blood loss in stool and are the main screening tool used in Europe and Canada [127]. There are two main types of FOBTs. Guaiac FOBTs (gFOBTs) work by detecting the peroxidase activity of hemoglobin when it interacts with guaiac in the presence of hydrogen peroxide. This interaction converts the colorless guaiac to a blue color [128]. In contrast, immunochemical FOBTs (iFOBTs) use antibodies directed against human globin to detect blood in stool. Though FOBTs represent a non-invasive diagnostic method, a limitation common to both of these tests is their poor sensitivity. The reported sensitivity of gFOBTs is generally low and varies widely, from 11 to 64%, and specificity varies from 90 to 98%, depending on the test brand used [129]. The performance characteristics of iFOBTs are improved compared to gFOBTs, but their sensitivities are still variable, ranging from 56 to 89%, with specificities of 91 to 97% [126]. Moreover, because FOBTs are designed to detect blood in stool, they are nonspecific tests for colorectal cancer: gastrointestinal bleeding can result from other causes, such as ulcers, inflammatory bowel disease, or the use of anticoagulant or antiplatelet medications, producing false positive results.

Colonoscopy is the gold standard for detecting colorectal cancer [126, 128], and it is the most commonly used screening modality in the United States [126]. The sensitivity of colonoscopy for detecting colorectal cancer is 95%, and the specificity ranges from 95 to 99% [126]. However, routine diagnosis of colorectal cancer by colonoscopy is impractical, as the procedure is invasive, consumes a great deal of healthcare resources, and presents significant inconvenience and discomfort to patients. Given the effectiveness of early diagnosis and treatment in colorectal cancer and the performance gaps of current non-invasive colorectal cancer markers, the development of novel biomarkers could result in improved patient outcomes for colorectal cancer.
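The sensitivity and specificity values quoted for PSA, FOBTs, and colonoscopy all derive from a two-by-two comparison of test results against true disease status. As a minimal illustration, the Python sketch below computes both measures. The function names and the cohort counts are hypothetical and are not taken from any of the cited studies.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of individuals with disease that the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of disease-free individuals that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening cohort (illustrative counts only):
# 100 individuals with cancer, 1000 without.
tp, fn = 64, 36
tn, fp = 950, 50

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

With these invented counts, 36 of 100 cancers are missed even though only 5% of healthy individuals are flagged, which is the kind of trade-off described above for gFOBTs.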

1.8.2 Studies of DNA methylation in the circulating DNA of colorectal cancer patients

Epigenetic studies of cirDNA in colorectal cancer have included the analysis of hMLH1, cyclin-dependent kinase inhibitor 2A (p16INK4a), septin 9 (SEPT9), and DAPK, with methylation differences at these loci being detected in 16% (n = 19), 36% (n = 58), 71% (n = 133), and 17% (n = 122), respectively, of examined samples [79, 130]. Of these genes, SEPT9 has been commercially developed and is marketed as Epi proColon in Europe and the Middle East [131]. The reported sensitivity and specificity for this test range from 68 to 72% and from 89 to 93%, respectively [132]. Though promising, the moderate sensitivity of this test leaves room for improvement in future studies.

1.9 Research objectives

Despite the potential for discovering novel DNA methylation biomarkers in the cirDNA of cancer patients, studies that comprehensively analyze the circulating methylome in cancer have not been conducted. The moderate performance characteristics of the DNA methylation markers discovered to date show promise but also leave much room for improvement. Dedicated epigenome-wide studies of cirDNA could identify novel DNA methylation biomarkers in regions of the genome that are currently unexplored.

The key objective of this study was to perform a comprehensive analysis of the plasma cirDNA methylome in prostate cancer and colorectal cancer. The general hypothesis driving this study was that cancer patients and controls would exhibit different methylation profiles in their plasma cirDNA.

For the prostate cancer study, which has been completed, a CpG island microarray-based scan covering 12,192 loci was performed on the methylation-enriched plasma cirDNA of prostate cancer patients (n = 20) and healthy controls (n = 20). Based on the results of the microarray data analysis, thirteen genes were selected for fine mapping using sodium bisulfite treatment and pyrosequencing, as well as for replication in a second, independent sample set.

For the colorectal cancer study, which is still underway, high resolution tiling microarrays were used to test plasma cirDNA samples from 93 colorectal cancer patients and 100 matched controls. The microarray platform was substantially expanded by using Affymetrix tiling arrays, which contained over 6.5 million probes covering the entirety of chromosomes 1 and 6. As in the prostate cancer study, the goal was to find genes and other regions showing significant differential methylation between colorectal cancer patients and controls and to identify informative epigenetic markers.

2.0 MATERIALS AND METHODS

2.1 Samples

2.1.1 Prostate cancer study

The first sample set consisted of individuals with a confirmed diagnosis of prostate cancer (n = 20, ages 68.9 ± 6.2 yrs) and healthy controls (n = 20, ages 46.3 ± 6.4 yrs). Cases and controls were recruited from several hospitals in Novosibirsk, Russia. The second sample set consisted of 20 prostate cancer patients and 18 control individuals who were diagnosed as having BPH. Cases and controls in the second sample set were matched by age (mean group ages were 68.7 ± 6.8 yrs and 69.1 ± 7.2 yrs for cases and controls, respectively). They were recruited from the Vilnius University Hospital in Vilnius, Lithuania. All individuals across both sample sets were male Caucasians, and the tumour stage for prostate cancer patients was T2-3N0MX. All participants provided written informed consent, and the research protocol was approved by the research ethics boards of CAMH (Toronto, Canada), Vilnius University Hospital (Vilnius, Lithuania), and the Institute of Chemical Biology and Fundamental Medicine (Novosibirsk, Russia).

2.1.2 Colorectal cancer study

The sample set for the colorectal cancer study consisted of patients diagnosed with T1-4N0MX colorectal cancer (n = 93; ages 70.4 ± 12.6 yrs) and unaffected individuals (n = 100; ages 57.0 ± 8.1 yrs). The numbers of male and female individuals were approximately equal in both study groups (46 male and 47 female colorectal cancer patients; 50 male and 50 female unaffected controls). Plasma samples of the colorectal cancer patients were provided by the Ontario Tumor Bank (Toronto, ON), while those of unaffected controls were provided by the Colon Family Registry (National Institutes of Health, USA). All participants provided written informed consent, and research protocols were approved by the research ethics boards of CAMH and the Ontario Institute for Cancer Research in Toronto, Canada.

2.2 DNA extraction

2.2.1 Prostate cancer study

Blood samples from all individuals were collected, and the plasma fraction was separated by centrifugation and frozen at -80 °C prior to nucleic acid extraction. Total cirDNA was isolated from 1 ml of plasma using the GF-1 Nucleic Acid Extraction Kit (Vivantis, Selangor, Malaysia) in the first sample set and the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) in the second sample set, according to the manufacturers' instructions. Isolated total plasma cirDNA was stored at -20 °C until use.

2.2.2 Colorectal cancer study

Total cirDNA was isolated from 0.5 ml of plasma from cases and controls using the QIAamp Circulating Nucleic Acids Kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. CirDNA was eluted in 100 µl of EB buffer and stored at -20 °C until use.

2.3 DNA methylation detection

2.3.1 Principles of DNA methylation detection

The method for DNA methylation detection used in the prostate and colorectal cancer studies is based on the preferential enrichment of methylated DNA, followed by interrogation on microarray platforms (Figure 1). Briefly, adaptors were ligated to the ends of the isolated total cirDNA prior to digestion with methylation-sensitive restriction enzymes, which do not cut in the presence of methylated cytosine (mC). After the enzymatic digestion, the only fragments that remain intact for subsequent PCR amplification are those whose restriction sites contain methylated CpGs (together with fragments that lack restriction sites altogether); hence, the methylated fraction becomes enriched. Moreover, as the PCR conditions in this protocol were optimized for shorter templates, any large-size genomic DNA that might be present in the total plasma cirDNA was not amplified.

2.3.2 DNA blunting

DNA was enzymatically treated to create blunt double-stranded ends. Fifty ng of total cirDNA was incubated with 1X NEB Buffer 2, 100 µM dNTPs, 60 units of T4 DNA polymerase, and 2 ng of BSA in a final volume of 112.2 µl. The mixture was incubated at 12 °C for 20 minutes, then transferred to ice. Blunted DNA was purified by standard phenol-chloroform extraction with 120 µl of phenol:chloroform:isoamyl alcohol (25:24:1), followed by ethanol precipitation. After precipitation, samples were dissolved in 25 µl of water.

2.3.3 Adaptor ligation

Two types of blunt universal adaptors were prepared, each by annealing two oligonucleotides (prostate cancer study: oJW102: GCGGTGACCCGGGAGATCTGAATTC and oJW103: GAATTCAGATC; colorectal cancer study: RCB1: ATTTGAACCCCTTCATGGGTACCA and RCB2: TGGGGAAGTACCCATGGT). The adaptors were prepared in a 100 µl reaction containing the two oligonucleotides at 40 µM in 1 M Tris (pH 7.9). The solution was heated to 95 °C for 5 minutes, incubated at 70 °C for 2 minutes, cooled at 25 °C for 2 minutes, and then incubated for 16 hours at 4 °C. The annealed adaptors were stored at -20 °C prior to use. For ligation, samples were incubated at 16 °C for 8 hours in a reaction volume of 50.2 µl containing 1X NEB ligation buffer, 0.1 pmol of adaptors, and 200 units of T4 DNA ligase.

2.3.4 DNA methylation-sensitive enzyme digestion

Enzyme digestion was performed using methylation-sensitive restriction enzymes, which do not cut when the corresponding restriction sites are methylated. Ten units each of HpaII, HpyCH4IV, and HinP1I, with 3 µl of 10X NEB Buffer 1, were added to one-half of the adaptor-ligated template, in a total reaction volume of 56 µl. The samples were incubated for 8 hours at 37 °C, then heated to 65 °C for 20 minutes to inactivate the enzymes.

2.3.5 Adaptor-mediated PCR

The digestion product was amplified under the following conditions: 1X PCR buffer, 2.8 mM MgCl2, 275 µM aminoallyl-dNTP mix, 1.6 µM primer, and 25 units of Taq DNA polymerase. Amplification of the prostate cancer study samples was performed in a final volume of 100 µl with 25 µl of the digested template and the oJW102 primer. For the colorectal cancer samples, amplification was performed with the RCB2 primer in a final volume of 400 µl with 50 µl of the digested template, as a larger amount of PCR product was required for subsequent microarray hybridization. PCR conditions were 72 °C for 5 minutes, then 30 cycles of 95 °C for 1 minute, 94 °C for 40 seconds, and 67 °C for 2.5 minutes, followed by a final elongation stage of 72 °C for 5 minutes. PCR products were checked by 1% gel electrophoresis and purified with the Qiagen MinElute kit. DNA concentration and quality were assessed by spectrophotometry using the Nanodrop 2000, with a 260/280 nm absorbance ratio of 1.8 indicating pure DNA.
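The enrichment logic of sections 2.3.1 and 2.3.4 can be sketched in a few lines of Python. The sketch assumes the standard recognition sequences of the three enzymes (HpaII: CCGG, HpyCH4IV: ACGT, HinP1I: GCGC) and treats a site as protected whenever the cytosine of its CpG is methylated. The function names, the example fragment, and the single-strand, position-based model of methylation are illustrative simplifications and not part of the protocol.

```python
import re

# Recognition sites of the three enzymes named in section 2.3.4.
# Each site contains one CpG; methylation of that cytosine blocks cleavage.
SITES = {"HpaII": "CCGG", "HpyCH4IV": "ACGT", "HinP1I": "GCGC"}


def cpg_positions_in_sites(seq: str) -> dict[int, str]:
    """Map the 0-based index of the CpG cytosine in every site to its enzyme."""
    hits = {}
    for enzyme, site in SITES.items():
        offset = site.index("CG")
        # Zero-width lookahead so that overlapping sites are all reported.
        for match in re.finditer(f"(?={site})", seq):
            hits[match.start() + offset] = enzyme
    return hits


def survives_digestion(seq: str, methylated: set[int]) -> bool:
    """True if every restriction site in seq is protected by methylation.

    A fragment without any site also survives (the non-informative fraction).
    """
    return all(pos in methylated for pos in cpg_positions_in_sites(seq))


fragment = "TTACCGGTTAGCGCAT"  # hypothetical adaptor-flanked insert
print(cpg_positions_in_sites(fragment))     # {4: 'HpaII', 11: 'HinP1I'}
print(survives_digestion(fragment, {4, 11}))  # True: both sites methylated
print(survives_digestion(fragment, {4}))      # False: GCGC site is cut
```

Only fragments for which `survives_digestion` returns True would retain adaptors at both ends and therefore be amplifiable in the adaptor-mediated PCR step.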

Figure 1. Principle of DNA methylation detection technology in plasma cirDNA. After enrichment of the methylated fraction of the cirDNA, PCR products are obtained only from cirDNA templates that either contain methylated CpG positions (enriched methylated fraction) or lack restriction sites (non-informative fraction). DNA samples isolated from plasma consist of fragmented circulating DNA originating from apoptosis/necrosis in tumour cells (right) and larger-size genomic DNA originating from circulating cells (e.g. lymphocytes) or other cellular sources (left). First, universal adaptors (magenta boxes) are ligated to the ends of all DNA molecules. Next, samples are digested with DNA methylation-sensitive restriction enzymes. These enzymes cut only at unmethylated CpG positions (white circles) and not at methylated CpG positions (black circles). Digested DNA is then amplified using primers that bind to the universal adaptors (green arrows). During the PCR reaction, DNA polymerase extends the primers (dashed green lines) according to its processivity and the optimized reaction conditions. PCR products are obtained only from undigested short templates that have ligated adaptors at both ends (mainly from tumour circulating DNA). In longer templates (which are expected from genomic DNA), the DNA polymerase cannot extend the primers across the distance between the 5′ and 3′ adaptors, and these templates are therefore not amplified.

2.4 Microarray experiments and data analysis 2.4.1 Prostate cancer study 2.4.1.1 Microarrays The microarray experiments performed on the prostate cancer sample set utilized a two channel, reference-pool based experimental design. In this design, DNA from individual samples is labelled with the fluorescent dye Cy3, and DNA from a common reference pool is labelled with Cy5. After hybridization to a single microarray, the difference between the hybridization intensities can be used to determine methylation differences, and comparing relative changes in each sample against the common reference pool allows the comparison of samples across arrays [133]. For the reference pool, DNA was extracted from the white blood cells of 20 individuals (13 females, 7 males) who were unrelated to this project. Participants were recruited from CAMH in Toronto, Canada, and provided informed consent. DNA was extracted using phenolchloroform isolation followed by ethanol precipitation. Isolated DNA was pooled and sheared by sonication to 200-500 bp fragments, prior to undergoing the methylation detection protocol. Two technical replicates of microarrays were used for all patients and controls. 1.5 μg of purified DNA was labeled using Cy3 for the cirdna samples and Cy5 for the reference pool, and was hybridized to University Health Network HCGI12K arrays containing 12, 192 probes [134]. 2.4.1.2 Microarray data analysis Hybridized microarrays were scanned using the Axon 4000B scanner, and signals were obtained using the GenePix Pro software version 6.1.0.4. The intensity values that were obtained 27

underwent extensive quality control. Data were corrected for background noise and were normalized using a variance stabilization and normalization method to yield a raw p-value based on a moderated t-statistic [135]. The raw p-values were corrected for multiple testing using false discovery rates (FDR), to yield FDR-adjusted p-values. After normalization, microarray signals were calculated for each probe by subtracting the signal intensities obtained from the cirdna sample (Cy3) from the ones obtained from the reference pool (Cy5). Coefficients were calculated as the ratio of methylation between prostate cancer and control individuals, with positive values signifying hypermethylation and negative values indicating hypomethylation in prostate cancer. 2.4.2 Colorectal cancer study 2.4.2.1 Microarrays Samples from the colorectal cancer study set were hybridized to Affymetrix GeneChip Human Tiling 2.0R A arrays, which have over 6.5 million probes covering chromosomes 1 and 6 at a 35-bp resolution (probes are 25 bp long with a 10 bp gap). Affymetrix microarrays use a single channel system using only one fluorophore. The design enables the hybridization of a single sample to each microarray, after which signal intensities are compared across microarrays and between different sample groups [133]. 9 µg of purified cirdna amplification product was used for DNA fragmentation, labelling and hybridization experiments, according to Affymetrix protocols. The Genechip Fluidics Station 450 and GeneChip Operating Software were used. 2.4.2.2 Microarray data analysis Hybridized microarrays were scanned using the Affymetrix GeneChip Scanner 3000 and CEL files were generated. Extensive data normalization was performed using the Affytiling and limma 28

packages implemented in the bioconductor suite (www.bioconductor.org). The raw intensities were corrected for background noise then underwent quantile normalization [136]. Normalized intensity values for each probe were corrected for multiple testing by local FDR. 2.5 Fine mapping of individual CpG locations 2.5.1 Principle of fine mapping Fine mapping of individual CpG sites relies on the selective conversion of unmethylated cytosine to uracil after treatment with sodium bisulfite, while methylated cytosines remain unchanged. After PCR amplification, uracil is amplified as thymine while mc remains as cytosine, thereby allowing the detection of methylation at a single base-pair resolution. Thirteen loci were selected from the microarray experiment for the further analysis and fine mapping of methylated cytosines. They were TRK-fused gene (TFG), atonal homolog 8 (ATOH8), SIX homeobox 3 (SIX3), NudC domain containing 3 (NUDCD3), protocadherin beta 1 (PCDHB1), KIAA1539 protein (KIA1539), ring finger protein 219 (RNF219), heparanase 2 (HPSE2), discs large homolog 2 (DLG2), guanine nucleotide binding protein (G protein) gamma 7 (GNG7), core-binding factor, runt domain alpha subunit 2 translocated to 2 (CBFA2T2), zinc finger CCCH-type containing 4 (ZC3H4) and ArfGAP with GTPase domain ankyrin repeat and PH domain 1 (AGAP1). 2.5.2 Bisulfite treatment and whole bisulfitome amplification In this study, the remaining cirdna from the samples used in the microarray studies was bisulfite-treated using the Qiagen Epitect kit. In order to enable the amplification of minimal amounts of template, bisulfite-treated cirdna was amplified using the Epitect Whole 29

Bisulfitome kit (Qiagen, Mississauga, ON), according to the manufacturer's instructions. This kit allows the whole genome amplification of bisulfite-treated DNA (whole bisulfitome amplification).

2.5.3 Nested PCR

Nested PCR was used for single locus amplification, as it allows for greater yield of amplification product while reducing the chances of contamination. The technique involves two successive runs of PCR, with the second reaction amplifying a target within the amplicons from the first reaction. Briefly, 5 µl of bisulfite-treated, whole-bisulfitome-amplified DNA was added to a reaction mix containing 1X Hotstart PCR buffer with 1.5 mM MgCl2, 120 nM specific primers, 200 nM dNTPs and 0.65 units of Hotstart Taq polymerase. The DNA was first amplified for 10 cycles with the external primers, then for 40 cycles with the internal primers. The primers used for each gene are listed in Tables 1 and 2. The reaction conditions were 95 °C for 15 minutes, followed by the appropriate number of cycles of 95 °C for 1 minute, 55 °C for 45 seconds and 72 °C for 1 minute, and a final extension step of 72 °C for 10 minutes.
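As a rough planning aid, the programmed hold time of each round of the nested PCR can be worked out from the conditions above. The sketch below is only illustrative: it assumes that both rounds use the same activation and final extension steps, as the text implies, and it ignores the time the thermocycler spends ramping between temperatures. The names `Step` and `hold_time_minutes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


# Conditions as stated in Section 2.5.3.
INITIAL = Step("initial activation", 95, 15 * 60)
CYCLE = (
    Step("denaturation", 95, 60),
    Step("annealing", 55, 45),
    Step("extension", 72, 60),
)
FINAL = Step("final extension", 72, 10 * 60)


def hold_time_minutes(n_cycles: int) -> float:
    """Sum of programmed hold times in minutes; ramp times are not included."""
    per_cycle = sum(step.seconds for step in CYCLE)
    total_seconds = INITIAL.seconds + n_cycles * per_cycle + FINAL.seconds
    return total_seconds / 60


external_round = hold_time_minutes(10)  # first round, external primers
internal_round = hold_time_minutes(40)  # second round, internal primers
print(f"external: {external_round} min, internal: {internal_round} min")
```

Under these assumptions the external round holds for 52.5 minutes and the internal round for 135 minutes, before ramp time is added.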

Table 1. Primers used for amplification of the external locus in nested PCR.

Locus     Forward primer (external fragment)  Reverse primer (external fragment)
ATOH8     AGGAGGTAGGTTTTGGGTTAAG              CCTCCCTCTCTCCCTTTCT
CBFA2T2   TAAAAATATTTTGAGTTAGGGGGTT           CAAAACCAACTCCCATTAAACAC
CENTG2    TTGTATGAGATATTGAGAGTATTAT           ACAAAAAAAACCTATACCCTCTAA
DLG2      TGGGGGGTTTAAGTTTTTTTGAT             CCTCTTAAACTCTCTCTTCAAAAT
GNG7      TTGGTTGTTTTTGAGGTTGGGT              AAACCCCTTACAAAAAAAATAAACT
HPSE2     TAGTAGAGATAGGGTTTTTTTATGT           TCTTTCACTAATTATCCTCCACA
KIAA1539  TTAAAGGAGGAAGGAGGAGATA              AAACCCTCAAACTAATAACTTTAAC
NUDCD3    TAGGGTTATTTTTTAGGTTTAGGTA           TTTCTAAATAAACCCCTAACAAACT
PCDHB1    AAGTATGTGATTAAGTGGATATTTA           AACTCCTAACCTCAAATAATCT
RNF219    GTTATATTTGTTTGGGGAAGGTAA            ACCCAAATAAATCCATTAATCA
SIX3      TTGTTAGTTTTTTTGTTGGGAGAAAT          ACTTTCCCACCCCAACCCTA
TFG       GTTTTAAATTTTTTGAGAGTTGGTT           AAATAATTCACCCCCATTCCTA
ZC3H4     GATTTGAGGGAGAGAGGGAA                ACCTTCAACTCTTTCTAACTCTC

Table 2. Primers used for amplification of the internal locus in nested PCR.¹

Locus     Forward primer (internal fragment)  Reverse primer (internal fragment)
ATOH8     B-ATTGGGTTTTTGTGTAAATTGAGG          CTACCTCCTTACCAACATTTCT
CBFA2T2   B-GTTTGTATTTGGAGAATTTAGGTG          TATAACCAAAACAATAACCCAAACT
CENTG2    B-TTGGGATGAGGTAAAAAATAGA            ACACACACTCAAACAAATAACTAAA
DLG2      B-GTTGTTTGGGAATGTAGTTTAAA           TCAAAATTCTTTTCAACTTTCCCT
GNG7      B-GGGTTTTTTAGTTTGAGTTTTTAGT         TACCACCTCTATATAATCTACCA
HPSE2     B-GTGTTGGGATTGTAGGTATGA             AACACTAAATTTAACAACTATCTAC
KIAA1539  B-AGGAAGGAGGAGATAAAGTGAT            CCCCTCTAAACTTATCATCACA
NUDCD3    B-AGGGAGAATAGTTTTAGTTTTGTT          ATAAAAATATAACCACCCTCAAAC
PCDHB1    B-TTGTTGTGTTTATATAATATTGAAA         TAATCTCCCCACCTTAACCT
RNF219    B-GTGATTGTGGGTATAGTTATAAAAT         ACTACCCCCATCTCCCAAAA
SIX3      B-AGTAGAAATTTTTAGAGGAAGTTAA         TTCCCACCCCAACCCTAAA
TFG       B-TTTTGAGAGTTGGTTGTAGTAGA           TCAACTATTACTACAATAATCAACA
ZC3H4     B-GAGGAAGGGTGAGATGGGA               ATCTCTACCCCTCTCCTACA

¹ B stands for biotin-labelled.
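The primers in Tables 1 and 2 target bisulfite-converted DNA, so the conversion principle from Section 2.5.1 can be illustrated in silico. The following is a minimal sketch and not part of the study's workflow: the function `bisulfite_convert` and the 11-base template are hypothetical, and only the top strand is modelled. The last lines check an observation that can be read directly from Table 1: the ATOH8 external forward primer contains no C and the reverse primer contains no G, which is what one would expect of primers placed on fully converted sequence outside CpG sites.

```python
from typing import Collection


def bisulfite_convert(seq: str, methylated: Collection[int] = ()) -> str:
    """Return the top-strand sequence as read after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U and amplified as T; a C whose 0-based
    index is listed in `methylated` is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


# Hypothetical template with CpG cytosines at indices 1, 5 and 9
# and one non-CpG cytosine at index 4.
template = "ACGTCCGATCG"

unmethylated = bisulfite_convert(template)        # no protected positions
one_site = bisulfite_convert(template, {5})       # only index 5 is methylated
all_cpg = bisulfite_convert(template, {1, 5, 9})  # all three CpGs methylated

# ATOH8 external primers copied from Table 1.
atoh8_forward = "AGGAGGTAGGTTTTGGGTTAAG"
atoh8_reverse = "CCTCCCTCTCTCCCTTTCT"
forward_has_c = "C" in atoh8_forward
reverse_has_g = "G" in atoh8_reverse

print(unmethylated, one_site, all_cpg, forward_has_c, reverse_has_g)
```

Comparing `one_site` with `unmethylated` shows the single C/T difference that pyrosequencing later quantifies at each CpG.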

2.5.4 Pyrosequencing

Pyrosequencing technology was used for site-specific DNA methylation analysis. Briefly, this technology involves a single-stranded biotinylated PCR fragment that is hybridized to a sequencing primer in a reaction containing DNA polymerase, ATP sulfurylase, luciferase and apyrase. When a dNTP complementary to the next base in the template is added to the reaction, its incorporation is catalyzed by DNA polymerase, which releases pyrophosphate (PPi) in an amount proportional to the number of nucleotides incorporated. ATP sulfurylase converts PPi to ATP, which drives the generation of light by luciferase. Apyrase degrades unincorporated dNTPs and excess ATP before the next nucleotide is dispensed. The light is detected by a camera and displayed as a peak on a Pyrogram [137]. PCR products were analyzed by pyrosequencing using a Qiagen PyroMark Q24 according to the manufacturer's standard protocol. Methylation values at single CpG positions were assessed using the PyroMark Q24 1.0.10 software. The primers used for pyrosequencing were the reverse primers used for amplification of the internal fragments in nested PCR (Table 2).
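To make the quantification step concrete, methylation at a CpG position is conventionally expressed as the fraction of signal from the nucleotide that reports the methylated state. The sketch below assumes that convention; it is not a description of the internals of the PyroMark software. The function name and the peak heights are hypothetical. Which pair of nucleotides is compared (C versus T, or G versus A) depends on the strand being sequenced.

```python
def percent_methylation(methylated_signal: float, unmethylated_signal: float) -> float:
    """Percentage of signal at one CpG that comes from the methylated allele."""
    total = methylated_signal + unmethylated_signal
    if total <= 0:
        raise ValueError("no signal at this position")
    return 100.0 * methylated_signal / total


# Hypothetical peak heights: (methylated-reporting, unmethylated-reporting).
peaks = {
    "CpG1": (30.0, 70.0),
    "CpG2": (5.0, 95.0),
    "CpG3": (50.0, 50.0),
}

methylation = {site: percent_methylation(m, u) for site, (m, u) in peaks.items()}
print(methylation)
```

A position with no signal raises an error instead of returning a misleading 0%, since a failed dispensation is not the same as an unmethylated site.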

3.0 RESULTS

3.1 Microarray methylation analysis in the circulating DNA of prostate cancer

The methylation profiles of the plasma cirDNA of 40 individuals (20 with prostate cancer and 20 healthy controls) were interrogated through a microarray scan covering 12,192 regions. The experiment was based on a double-channel, reference pool design in which the sample DNA and the reference DNA were hybridized to a single microarray. The raw data were corrected for background noise, normalized within and between arrays, and FDR-corrected for multiple testing. The FDR was used in this study because it is less conservative than family-wise error rate corrections such as the Bonferroni method. Normalized signal intensities, expressed relative to the reference pool, were then compared between the prostate cancer and control samples. As the profiling technology was based on enrichment of methylated DNA, higher intensity signals corresponded to higher levels of methylation. In total, 197 regions exhibited significant methylation differences between prostate cancer and controls (FDR-adjusted p < 0.05). Figure 2 shows a volcano plot of the FDR-adjusted p-values against the differential methylation between prostate cancer patients and healthy controls across all the interrogated sites. Of the loci that reached statistical significance, 20 showed increased and 177 showed decreased DNA methylation in the prostate cancer samples. Of the 197 regions displaying significant differential methylation, 133 could be mapped to genomic positions. The remainder could not be mapped because the microarray probes were generated from a CpG island library for which the remapping of probes is not fully complete [134]. Of the 133 mapped loci, 85 corresponded to repetitive elements; 79 of these showed decreased methylation and 6 showed increased methylation in prostate cancer compared to controls (Table 3). Forty-eight of the significantly

differentially methylated loci represented unique genomic regions. Of these, 6 showed increased and 42 showed decreased cirDNA methylation in prostate cancer (Table 4).

Figure 2. Volcano plot of microarray data in prostate cancer and control samples using the FDR-adjusted p-value as the statistic. 197 regions were significantly differentially methylated between the groups (FDR-adjusted p-value < 0.05). The X-axis represents DNA methylation differences between groups, with coefficients expressed on a log2 scale. Regions with increased microarray signals in prostate cancer had positive coefficients, and regions with increased signals in control individuals had negative coefficients. The Y-axis represents -log10-transformed p-values adjusted by multiple test correction. The number of probes represented by each point in the plot is shown as a color gradient from light gray (1 probe) to black (600 probes). The horizontal red line depicts the cutoff for the adjusted p-value (-log10(0.05) ≈ 1.3). The vertical red line depicts the 0 value, i.e. no difference in DNA methylation between cases and controls.
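The two quantities plotted in Figure 2 can be reproduced with a few lines of code. The sketch below is illustrative only. The text does not name the FDR procedure that was used, so the Benjamini-Hochberg step-up adjustment is an assumption here. The function names and the four example p-values are hypothetical and are not data from this study.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(coefficient, adjusted_p, alpha=0.05):
    """Label one region using the sign convention of Figure 2."""
    if adjusted_p >= alpha:
        return "not significant"
    return "hypermethylated" if coefficient > 0 else "hypomethylated"


# Height of the horizontal cutoff line on the -log10 axis.
cutoff_height = -math.log10(0.05)

# Hypothetical raw p-values and log2 coefficients for four regions.
raw_p = [0.01, 0.04, 0.03, 0.005]
coefficients = [-1.2, 0.4, -0.8, 0.9]

adjusted_p = benjamini_hochberg(raw_p)
labels = [classify(c, p) for c, p in zip(coefficients, adjusted_p)]
print(round(cutoff_height, 1), labels)
```

A region falls above the horizontal line in the volcano plot exactly when `classify` returns something other than "not significant".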