Circulating Nucleic Acids in Plasma and Serum, Springer Science+Business Media B.V. 2010
Corresponding author: Maniesh van der Vaart, Biochemistry, School for Physical and Chemical Sciences, North-West University, Potchefstroom, South Africa (Manieshv@gmail.com)
Dmitry V. Semenov, Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk, Russia (Semenov@niboch.nsc.ru)
Elena V. Kuligina, Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk, Russia
Vladimir A. Richter, Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk, Russia
Piet J. Pretorius, Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk, Russia (Piet.pretorius@nwu.ac.za)
Chapter 5
Advanced Analysis of Human Plasma Circulating DNA Sequences Produced by Parallel Tagged Sequencing on the 454 Platform

Maniesh van der Vaart, Dmitry V. Semenov, Elena V. Kuligina, Vladimir A. Richter, and Piet J. Pretorius

Abstract The structure of human plasma circulating nucleic acids is currently being studied extensively to extend fundamental knowledge of DNA and RNA functions, both inside cells and between cells, in the extracellular fluids of humans and animals. Previously, we reported a general analysis of DNA sequences from the plasma of 10 healthy individuals and 12 prostate cancer patients. To further characterize this array of sequences, we performed a comparative analysis of the chromosome distribution, repeat content and epigenetic characteristics of plasma DNA. Long terminal repeat (LTR) DNA, namely endogenous retrovirus-related (ERVL) and mammalian apparent LTR-retrotransposon (MaLR) repeats, was elevated in the plasma of healthy individuals, whereas repeats of other classes occurred at the same or a lower frequency than in random genomic DNA. Satellite repeats attributed to chromosome 12 were elevated in the plasma of prostate cancer patients. Analysis of the epigenomic and chromatin-structure attributes of circulating DNA revealed an elevated frequency of DNA associated with histone H3 dimethylated at lysine 27 (H3K27me2). The elevated frequency of LTR repeats in circulating human DNA may reflect the species specificity of a hypothetical active DNA release by human cells, and histone H3K27me2 may be involved in this process.

Keywords 454 sequencing · Plasma circulating DNA · Prostate cancer

Abbreviations
ERVL Endogenous retrovirus-related
LINEs Long interspersed nuclear elements
LTR Long terminal repeats
MaLR Mammalian apparent LTR-retrotransposon
SINEs Short interspersed nuclear elements

M. van der Vaart (✉)
Biochemistry, School for Physical and Chemical Sciences, North-West University, Potchefstroom, South Africa
e-mail: Manieshv@gmail.com

P. Gahan (ed.), Circulating Nucleic Acids in Plasma and Serum, DOI 10.1007/978-90-481-9382-0_5, © Springer Science+Business Media B.V. 2010

Introduction
The development of plasma/serum DNA-based diagnostics depends directly on DNA sequencing results, on comprehensive bioinformatic analysis of the sequencing data, and on elucidating the fundamental principles that govern the release of DNA by human cells (van der Vaart and Pretorius 2010). Recently, Beck et al. (2009), using the 454 sequencing platform, reported the analysis of 450,000 serum DNA sequences from 51 healthy humans and concluded that nonspecific DNA release is not the sole origin of circulating nucleic acids. In this report we present the results of an advanced analysis of human circulating DNA based on 454-platform sequencing of plasma DNA from 10 healthy individuals and 12 prostate cancer patients (van der Vaart et al. 2009).

Methods
Plasma DNA was collected and isolated from 22 individuals using a phenol-chloroform extraction method and was subsequently sequenced with a parallel tagged sequencing method on the 454 platform (van der Vaart et al. 2009). Initial bioinformatic analysis was performed as previously described (van der Vaart et al. 2009); further analysis used the genome browser database developed by the University of California Santa Cruz (UCSC: genome.ucsc.edu) and the EpiGRAPH software (Bock et al. 2009).

Results
Chromosome Distribution of Circulating DNA
The circulating DNA sequences obtained were optimally aligned to the reference human genome assembly (hg18). The chromosome distribution of circulating DNA was calculated and compared between the healthy and cancer groups (Table 5.1). No significant differences between the groups were observed (Mann-Whitney test, P ≥ 0.1 for every chromosome).
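As a rough illustration of how the quantities in Table 5.1 relate to one another, the Python sketch below computes one individual's relative number of locations per chromosome (as a percentage of that individual's mapped locations), a group mean ± SD, and the apparent density as total locations divided by chromosome length in Mb. The input format (one list of chromosome names per individual), the helper names and the use of the sample SD are assumptions made for illustration; the chapter does not describe its scripts. With the totals and lengths printed in Table 5.1, the density function reproduces the table's last column to within about 0.01 (for chromosome 1, 283/247 ≈ 1.146 against the tabulated 1.14); small discrepancies may reflect rounding of the tabulated lengths.

```python
from collections import Counter
from statistics import mean, stdev


def relative_distribution(chromosomes):
    """Share (%) of one individual's mapped locations on each chromosome."""
    counts = Counter(chromosomes)
    total = sum(counts.values())
    return {chrom: 100.0 * n / total for chrom, n in counts.items()}


def group_mean_sd(individuals, chrom):
    """Mean and sample SD (%) of one chromosome's share across a group (assumed SD type)."""
    shares = [relative_distribution(ind).get(chrom, 0.0) for ind in individuals]
    return mean(shares), stdev(shares)


def apparent_density(total_locations, length_mb):
    """Locations per megabase: total locations divided by chromosome length."""
    return total_locations / length_mb


# Hypothetical per-individual chromosome assignments (made up for illustration).
group = [
    ["chr1", "chr1", "chr2", "chrX"],
    ["chr1", "chr2", "chr2", "chr2"],
]
print(relative_distribution(group[0]))
print(group_mean_sd(group, "chr1"))

# Total locations and lengths (Mb) copied from Table 5.1.
for chrom, total, length in [("1", 283, 247), ("X", 94, 155), ("Y", 13, 57.8)]:
    print(chrom, round(apparent_density(total, length), 2))
```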
Table 5.1 Chromosome distribution of circulating DNA (nonredundant array). The relative number of locations is given as mean (%) ± SD.

| Chr | Prostate cancer patients | Healthy individuals | Mann-Whitney P | Total number of locations | Chromosome length (Mb) | Apparent density (locations/Mb) |
|---|---|---|---|---|---|---|
| 1 | 7.54 ± 3.02 | 8.07 ± 3.4 | 1 | 283 | 247 | 1.14 |
| 2 | 7.81 ± 2.51 | 7.61 ± 1.37 | 0.87 | 288 | 243 | 1.19 |
| 3 | 4.88 ± 0.93 | 5.78 ± 1.37 | 0.1 | 192 | 200 | 0.96 |
| 4 | 5.53 ± 1.78 | 4.71 ± 2 | 0.34 | 181 | 191 | 0.95 |
| 5 | 6.28 ± 2.23 | 4.42 ± 1.58 | 0.12 | 191 | 181 | 1.06 |
| 6 | 4.68 ± 2 | 4.81 ± 1.55 | 0.87 | 164 | 171 | 0.96 |
| 7 | 4.38 ± 1.7 | 5.26 ± 1.81 | 0.2 | 173 | 159 | 1.09 |
| 8 | 4.28 ± 1.89 | 4.93 ± 1.93 | 0.2 | 162 | 146 | 1.11 |
| 9 | 3.62 ± 1.92 | 4.70 ± 1.35 | 0.18 | 148 | 140 | 1.05 |
| 10 | 4.98 ± 1.29 | 4.85 ± 1.53 | 1 | 179 | 135 | 1.32 |
| 11 | 4.18 ± 1.66 | 4.45 ± 2.07 | 0.77 | 160 | 135 | 1.19 |
| 12 | 4.53 ± 2.14 | 3.40 ± 1.48 | 0.37 | 135 | 132 | 1.02 |
| 13 | 3.51 ± 1.2 | 2.94 ± 1.54 | 0.15 | 109 | 114 | 0.96 |
| 14 | 3.58 ± 1.85 | 3.60 ± 1.74 | 0.62 | 122 | 106 | 1.15 |
| 15 | 2.59 ± 1.63 | 2.09 ± 1.08 | 0.37 | 82 | 100 | 0.82 |
| 16 | 2.66 ± 1.52 | 3.16 ± 1.6 | 0.53 | 101 | 88.8 | 1.14 |
| 17 | 2.46 ± 1.47 | 2.77 ± 1.47 | 0.87 | 97 | 78.8 | 1.23 |
| 18 | 2.08 ± 0.97 | 2.00 ± 1.05 | 0.53 | 78 | 76.1 | 1.02 |
| 19 | 2.46 ± 1.31 | 1.77 ± 1.06 | 0.13 | 77 | 63.8 | 1.21 |
| 20 | 2.05 ± 1.14 | 2.20 ± 1.07 | 0.97 | 80 | 62.4 | 1.28 |
| 21 | 0.98 ± 0.85 | 1.17 ± 0.84 | 0.64 | 41 | 46.9 | 0.87 |
| 22 | 1.30 ± 0.48 | 1.34 ± 0.6 | 0.97 | 48 | 49.7 | 0.97 |
| X | 2.76 ± 1.01 | 2.48 ± 0.43 | 0.87 | 94 | 155 | 0.61 |
| Y | 0.26 ± 0.53 | 0.47 ± 0.48 | 0.2 | 13 | 57.8 | 0.22 |
| a | 10.63 ± 3.81 | 11.02 ± 2.78 | 1 | 387 | | |

a Doubtfully located.

Repeat Content of Circulating DNA
Previously we characterized the overall repeat content of plasma DNA by bulk counting of the repeats in the whole array of sequences (van der Vaart et al. 2009). In the present work, repeat content was evaluated as the overlap of unique circulating-DNA genome loci with the RepeatMasker annotations of the human genome (genome.ucsc.edu). Compared with the reference genome, short interspersed nuclear elements (SINEs) were slightly elevated and long interspersed nuclear elements (LINEs) were slightly underrepresented in circulating DNA from healthy individuals.
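The overlap-based evaluation described above can be sketched in Python as follows. This is a hypothetical illustration, not the authors' pipeline: it assumes that loci and RepeatMasker annotations are given as half-open (chromosome, start, end) intervals (for example, exported from the UCSC RepeatMasker track), and it assumes that each locus is assigned to the single repeat class with the largest base-pair overlap, with loci that overlap no repeat counted as "Unrepeated", a category that also appears in Table 5.2. The helper names are invented for this sketch.

```python
from collections import Counter, defaultdict


def index_repeats(annotations):
    """Group (chrom, start, end, repeat_class) annotations by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, rep_class in annotations:
        by_chrom[chrom].append((start, end, rep_class))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def assign_class(locus, by_chrom):
    """Repeat class with the largest bp overlap with the locus, or 'Unrepeated' (assumed rule)."""
    chrom, start, end = locus
    overlap_bp = Counter()
    for r_start, r_end, rep_class in by_chrom.get(chrom, []):
        if r_start >= end:
            break  # sorted by start, so no later annotation can overlap
        bp = min(end, r_end) - max(start, r_start)
        if bp > 0:
            overlap_bp[rep_class] += bp
    if not overlap_bp:
        return "Unrepeated"
    return overlap_bp.most_common(1)[0][0]


def repeat_content(loci, annotations):
    """Percentage of loci assigned to each repeat class."""
    if not loci:
        return {}
    by_chrom = index_repeats(annotations)
    counts = Counter(assign_class(locus, by_chrom) for locus in loci)
    return {cls: 100.0 * n / len(loci) for cls, n in counts.items()}


# Made-up coordinates (half-open intervals), for illustration only.
repeats = [
    ("chr1", 100, 400, "LINE"),
    ("chr1", 350, 600, "SINE"),
    ("chr2", 0, 50, "LTR"),
]
loci = [
    ("chr1", 300, 500),  # 100 bp of LINE, 150 bp of SINE -> SINE
    ("chr1", 700, 800),  # no overlap -> Unrepeated
    ("chr2", 10, 40),    # inside the LTR annotation -> LTR
    ("chr3", 0, 100),    # no annotations on this chromosome -> Unrepeated
]
print(repeat_content(loci, repeats))
```

The linear scan per chromosome is chosen for clarity; for genome-scale data an interval tree or a dedicated tool such as bedtools would be the more usual choice.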
ERVL repeats were elevated in the plasma DNA of healthy individuals, and MaLR repeats were moderately elevated in both groups (Table 5.2). These results confirm our previous findings (van der Vaart et al. 2009).

Table 5.2 Relative content of repeats in circulating DNA. Values are mean (%) ± SD; "+" marks a significant difference between the groups (Mann-Whitney test).

| Repeat class (% in reference genome^a) | Family/subfamily (% in reference genome^a) | Prostate cancer patients | Healthy individuals | Mann-Whitney P | Significant |
|---|---|---|---|---|---|
| DNA (2.84) | | 2.58 ± 1.51 | 2.09 ± 1.25 | 0.62 | |
| LINEs (20.42) | | 16.49 ± 6.30 | 16.65 ± 3.02 | 0.67 | |
| Low complexity (N/A) | | 0.07 ± 0.13 | 0.29 ± 0.38 | 0.11 | |
| LTR (8.29) | | 11.04 ± 3.34 | 13.40 ± 3.10 | 0.11 | |
| LTR | ERV | 5.38 ± 2.03 | 8.43 ± 3.53 | 0.03 | + |
| LTR | ERV1 (2.81) | 3.45 ± 1.58 | 3.89 ± 2.72 | 0.92 | |
| LTR | ERVK (0.31) | 0.16 ± 0.36 | 0.35 ± 0.48 | 0.23 | |
| LTR | ERVL (1.44) | 1.76 ± 1.13 | 4.19 ± 3.72 | 0.03 | + |
| LTR | MaLR (3.65) | 5.66 ± 1.91 | 4.97 ± 1.96 | 0.25 | |
| Satellite (N/A) | | 7.39 ± 2.40 | 6.65 ± 1.60 | 0.62 | |
| Satellite | Centromeric | 5.43 ± 2.21 | 4.67 ± 1.98 | 0.62 | |
| Satellite | Other | 1.95 ± 1.32 | 1.98 ± 0.68 | 0.92 | |
| Simple repeat (N/A) | | 0.48 ± 0.51 | 0.37 ± 0.17 | 0.97 | |
| SINE (13.4) | | 8.62 ± 3.34 | 11.84 ± 4.27 | 0.31 | |
| SINE | Alu (10.6) | 8.62 ± 3.34 | 11.85 ± 4.27 | 0.11 | |
| SINE | MIR (2.54) | 3.84 ± 3.72 | 2.51 ± 1.66 | 0.53 | |
| Other (srpRNA, rRNA, scRNA, 7SK, SVA, Unk) | | 0.20 ± 0.35 | 0.15 ± 0.19 | 0.85 | |
| Unrepeated | | 49.29 ± 7.15 | 46.03 ± 5.69 | 0.28 | |

a Repeat content of the reference human genome according to the International Human Genome Sequencing Consortium (2001).

Circulating Centromeric Satellite DNA
The attribution of satellite DNA to specific chromosomes is known to pose experimental and bioinformatic problems. The centromeric satellites of the circulating DNA are therefore preliminarily classified in Table 5.3 according to their most reliable matches in the human genome assembly (hg18). With this approach, we compared the apparent chromosome attribution of the circulating satellite DNA. Satellite DNA attributed to the precentromeric regions of chromosome 12 was elevated in the plasma of cancer patients (P = 0.050), whereas satellites attributed to the unplaced "random" sequences of chromosome 9 (chr9_random in hg18) were underrepresented in the plasma of cancer patients (P = 0.051) (Table 5.3). Detailed analysis of the centromeric regions of chromosome 12 may therefore be useful for developing novel diagnostic and prognostic markers of prostate cancer.
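Tables 5.1–5.3 compare the two groups with the Mann-Whitney test, but the chapter does not state which variant or software was used. The Python sketch below implements the two-sided test with the normal approximation, including tie and continuity corrections; it is an assumed variant, not necessarily the one behind the published P values. With groups as small as 10 and 12 individuals an exact test can give somewhat different P values, and an established implementation such as scipy.stats.mannwhitneyu would normally be preferred. The per-individual percentages in the usage lines are made up.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie and continuity corrections).

    Returns (U statistic for sample x, two-sided P value).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mu = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    ties = sum(t ** 3 - t for t in Counter(combined).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = max(abs(u1 - mu) - 0.5, 0.0) / math.sqrt(var)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u1, min(p, 1.0)


# Hypothetical per-individual percentages of satellites attributed to one chromosome.
cancer = [26.0, 31.5, 12.0, 40.2, 18.7]
healthy = [10.0, 4.1, 22.3, 8.8, 0.0, 15.6]
u, p = mann_whitney_u(cancer, healthy)
print(u, round(p, 3))
```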
Table 5.3 Apparent chromosome distribution of circulating centromeric DNA. Values are the relative number of locations of centromeric satellites, mean (%) ± SD.

| Chr | Prostate cancer patients | Healthy individuals | Mann-Whitney P |
|---|---|---|---|
| 1 | 8.33 ± 6.91 | 6.16 ± 5.63 | 0.457 |
| 2 | 2.54 ± 6.28 | 2.87 ± 7.86 | 0.883 |
| 3 | 0.00 ± 0.00 | 0.50 ± 1.58 | 0.315 |
| 5 | 6.20 ± 12.24 | 10.06 ± 11.97 | 0.342 |
| 6 | 3.05 ± 5.03 | 7.36 ± 10.51 | 0.336 |
| 7 | 14.46 ± 15.19 | 16.28 ± 11.09 | 0.594 |
| 8 | 4.29 ± 11.46 | 3.59 ± 6.25 | 0.737 |
| 9 | 2.34 ± 4.51 | 2.05 ± 3.73 | 1.000 |
| 10 | 3.38 ± 6.56 | 3.92 ± 7.34 | 0.834 |
| 11 | 13.89 ± 13.07 | 11.85 ± 10.22 | 0.869 |
| 12 | 26.18 ± 20.56 | 10.02 ± 10.41 | 0.050 |
| 17 | 2.40 ± 3.79 | 1.00 ± 3.16 | 0.261 |
| 18 | 0.44 ± 1.52 | 3.77 ± 10.48 | 0.473 |
| 19 | 7.97 ± 8.07 | 10.17 ± 14.74 | 0.945 |
| 20 | 0.49 ± 1.70 | 1.11 ± 3.51 | 0.895 |
| X | 2.17 ± 4.44 | 2.29 ± 4.24 | 0.867 |
| 9 random | 1.88 ± 4.95 | 6.38 ± 8.22 | 0.051 |
| 17 random | 0.00 ± 0.00 | 0.63 ± 1.98 | 0.315 |

Analysis of Epigenomic Characteristics with EpiGRAPH Software
To further characterize circulating DNA, we used the EpiGRAPH software for advanced genome and epigenome analysis. EpiGRAPH allowed us to compare about a thousand variables of the input genome locations with the attributes of randomly generated genome loci of similar length and distribution (Bock et al. 2009). Using EpiGRAPH, we confirmed our previous finding that ERVL repeats are 2–3 times more frequent in the plasma DNA of healthy individuals than in the plasma of prostate cancer patients (p < 9 × 10⁻⁶; cf. 4.19% versus 1.76% in Table 5.2). The EpiGRAPH data indicate that, among the histone modifications examined, H3K27me2 is the constant protein partner that distinguishes circulating DNA from randomly chosen genomic fragments (Fig. 5.1).

Fig. 5.1 Statistical significance of the differences between histone-bound circulating DNA loci and random genomic fragments for different arrays of sequences (determined with EpiGRAPH; Bock et al. 2009). Gray-filled rows indicate parameters that do not differ significantly between random genomic DNA and circulating DNA. Lines and arrows connect the positions of identical histone variables in the tables.

Conclusion
DNA containing ERVL repeats is represented at a significantly higher frequency in the circulating plasma DNA of healthy individuals than in genomic DNA. MaLR repeats are elevated in the plasma of both healthy individuals and prostate cancer patients. Circulating satellite DNA attributed to the subcentromeric regions of chromosome 12 may represent candidate biomarkers of prostate cancer. Circulating DNA was enriched in loci bound to histone H3K27me2, suggesting that this histone modification is involved in the externalization and stability of plasma DNA; however, this needs to be confirmed and evaluated in further investigations.

Acknowledgments This work was partially supported by the National Research Foundation of South Africa (MvdV), the Russian Federal Agency for Science and Innovations (#02.512.11.2257, #02.522.12.2005; DVS) and SB RAS Grant #18 (DVS).

References
Beck J, Urnovitz HB, Riggert J et al (2009) Profile of the circulating DNA in apparently healthy individuals. Clin Chem 55:730–738
Bock CK, Halachev K, Buch J et al (2009) EpiGRAPH: user-friendly software for statistical analysis and prediction of (epi)genomic data. Genome Biol 10:R14
International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
van der Vaart M, Pretorius PJ (2010) Is the role of circulating DNA as a biomarker of cancer being prematurely overrated? Clin Biochem 43:26–36
van der Vaart M, Semenov DV, Kuligina EV et al (2009) Characterisation of circulating DNA by parallel tagged sequencing on the 454 platform. Clin Chim Acta 409:21–27