Session 6: Integration of epigenetic data Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016
Utilizing complementary datasets
Frequent mutations in chromatin regulators Watson et al, Emerging patterns of somatic mutations in cancer, Nat Rev Gen, 2013
Dawson & Kouzarides, Cell, 2012
High rate of mutation in chromatin regulators Kandoth et al, Nature, 2013
https://www.encodeproject.org
Chromatin immunoprecipitation (ChIP-seq) Transcription factors, chromatin-associated proteins, and other DNA-binding proteins (generally with formaldehyde crosslinking) Histone modifications (micrococcal nuclease treatment) Sonication to ~200-400 bp fragments Park PJ, Nature Reviews Genetics, 2009
ChIP-seq from clinical samples?
ENCODE (The Encyclopedia Of DNA Elements)
ENCODE (human + model organisms)
Integrating TCGA & ENCODE data Can we help interpret genome variations (e.g., germline or somatic mutations) with ENCODE-type data? Sequencing of cancer genomes has revealed many mutations in the non-coding regions. Do they belong to UTRs, enhancers, transcription factor binding sites, DNase hypersensitivity sites, etc.? Science, 2012
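One way to ask which annotation class a non-coding mutation falls in is a simple interval lookup. The sketch below is illustrative only: the region coordinates, labels, and the helper names `build_index` and `annotate` are hypothetical, and coordinates are assumed to be 0-based, half-open (BED-style). Real ENCODE tracks would be loaded from files and queried with an interval tree or a tool such as bedtools.

```python
def build_index(regions):
    """Group (chrom, start, end, label) regions by chromosome, sorted by start."""
    index = {}
    for chrom, start, end, label in regions:
        index.setdefault(chrom, []).append((start, end, label))
    for intervals in index.values():
        intervals.sort()
    return index


def annotate(index, chrom, pos):
    """Return the sorted labels of every region that contains position pos."""
    labels = set()
    for start, end, label in index.get(chrom, []):
        if start > pos:
            break  # intervals are sorted by start, so none after this can contain pos
        if pos < end:
            labels.add(label)
    return sorted(labels)


# Hypothetical toy annotation (not real ENCODE coordinates).
regions = [
    ("chr1", 100, 200, "enhancer"),
    ("chr1", 150, 300, "DHS"),
    ("chr2", 50, 80, "TFBS"),
]
index = build_index(regions)

# Hypothetical somatic mutations as (chrom, position).
mutations = [("chr1", 160), ("chr1", 250), ("chr1", 400), ("chr3", 10)]
annotations = {m: annotate(index, *m) for m in mutations}
```

A mutation with an empty label list falls outside every annotated element in this toy set, which is not the same as being non-functional.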
Chromatin landscape influences mutation rate Heterochromatin has an elevated mutation rate Schuster-Böckler & Lehner, Chromatin organization is a major influence on regional mutation rates in human cancer cells, Nature, 2012
Chromatin state annotation of the genome Kharchenko et al, Nature, 2011
Annotating the genome with chromatin states Ernst et al, Mapping and analysis of chromatin state dynamics in nine human cell types, Nature, 2011
GWAS Non-coding variants enriched in regulatory DNA (DNase I hypersensitive sites) Maurano et al, Systematic localization of common disease-associated variation in regulatory DNA, Science, 2012
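An enrichment claim of this kind can be sanity-checked with a one-sided hypergeometric test. This is a minimal sketch, not the analysis used by Maurano et al: it ignores linkage disequilibrium and allele-frequency matching, the function names are hypothetical, and the counts are made up for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def fold_enrichment(k, N, K, n):
    """Observed fraction of drawn items that are marked, over the background fraction."""
    return (k / n) / (K / N)


# Made-up counts: N variants tested, K of them in DHSs,
# n GWAS-associated variants, k of those in DHSs.
N, K, n, k = 1000, 100, 50, 15
p_value = hypergeom_sf(k, N, K, n)
fold = fold_enrichment(k, N, K, n)
```

Exact integer binomials are used here to avoid a SciPy dependency; for genome-scale counts a library implementation would be the usual choice.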
ATAC-seq Nature Methods, 2013
FunSeq Khurana et al, Integrative annotation of variants from 1092 humans: application to cancer genomics, Science, 2013 Use enrichment of rare SNPs in 1000G as an estimate of purifying selection Regions of negative selection (sensitive and ultrasensitive) are enriched for disease-causing mutations
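The idea of using rare-SNP enrichment as a proxy for purifying selection can be sketched as the fraction of rare variants per annotation category. This is not the FunSeq implementation: the function name, the toy SNP list, and the default rarity cutoff on derived allele frequency (DAF) are assumptions chosen for illustration.

```python
from collections import defaultdict


def rare_fraction_by_category(snps, rare_threshold=0.005):
    """snps: iterable of (category, derived_allele_frequency).

    Returns {category: fraction of SNPs with DAF below rare_threshold}.
    A higher fraction of rare SNPs suggests stronger purifying selection.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [rare, total]
    for category, daf in snps:
        counts[category][1] += 1
        if daf < rare_threshold:
            counts[category][0] += 1
    return {cat: rare / total for cat, (rare, total) in counts.items()}


# Made-up SNPs: (annotation category, derived allele frequency).
snps = [
    ("coding", 0.001), ("coding", 0.002), ("coding", 0.2),
    ("intergenic", 0.001), ("intergenic", 0.3),
    ("intergenic", 0.1), ("intergenic", 0.4),
]
fractions = rare_fraction_by_category(snps)
```

In this toy input the coding category has the larger rare fraction, which is the direction expected for a more constrained category.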
Roadmap Epigenomics
Matching genomic and epigenomic profiles There are no epigenetic data in TCGA other than DNA methylation Available epigenome data (ENCODE, Roadmap, etc) are mostly for normal tissues and cell lines - are they informative for my cancer genomes?
Cell-of-origin can be predicted by chromatin features Mutation rate is highly variable along the cancer genome Chromatin accessibility / chromatin modification / replication timing can explain the variance in mutation rates along cancer genomes (up to 86%) Cell type of origin of a cancer can be predicted based on the distribution of mutations along its genome. Polak et al, Cell-of-origin chromatin organization shapes the mutational landscape of cancer, Nature, 2015
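"Variance explained" can be made concrete with the coefficient of determination for a single feature. Note the swap: Polak et al used a multi-feature regression model, whereas this sketch fits one predictor by ordinary least squares, for which R² equals the squared Pearson correlation. The window values and the name `r_squared` are hypothetical.

```python
def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Made-up per-window values: chromatin accessibility signal and
# somatic mutation count in the same genomic windows.
accessibility = [1.0, 2.0, 3.0, 4.0]
mutation_count = [8.0, 6.0, 4.0, 2.0]  # fewer mutations where chromatin is open

variance_explained = r_squared(accessibility, mutation_count)
```

To compare candidate cell types of origin under this simplification, one would compute the same quantity with each cell type's chromatin profile and look for the best fit.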
Nucleosome footprint informs tissue-of-origin Sequencing cell-free DNA yields a genome-wide map of in vivo nucleosome occupancy! Identify tissues-of-origin of cell-free DNA using nucleosome maps Snyder et al, Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin, Cell, 2016
3D Interaction Data Dekker, Marti-Renom, Mirny, Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data, Nat Rev Gen, 2013
3D interactions: in situ Hi-C Rao et al, Cell, 2014
Hi-C Interaction Map Dekker et al, Nat Rev Gen, 2013
Histone modification & chromatin domains Ho, Jung, Liu et al, Comparative analysis of metazoan chromatin organization, Nature, 2014
Features in Hi-C data In situ Hi-C in GM12878 ~5 billion contacts, ~1 kb resolution. Rao et al, A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping, Cell, 2014
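"Resolution" here refers to the bin size of the contact matrix. As a minimal sketch of how paired contacts become a map, the code below bins intra-chromosomal contact pairs into a symmetric matrix. The function name and the toy contacts are hypothetical, and normalization, which real Hi-C pipelines require, is omitted.

```python
def contact_matrix(contacts, chrom_length, bin_size):
    """Bin (pos_a, pos_b) contacts on one chromosome into a symmetric count matrix."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts to keep symmetry
    return matrix


# Made-up contacts on a 3 kb toy chromosome, binned at 1 kb.
contacts = [(100, 2500), (150, 2900), (1200, 1300)]
matrix = contact_matrix(contacts, chrom_length=3000, bin_size=1000)
```

Smaller bins need far more reads to fill, which is why kilobase resolution requires billions of contacts.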
Promoter-enhancer loops are associated with gene activation Rao et al, Cell, 2014
Features in Hi-C data Genomes are partitioned into contact domains (median: 185 kb), associated with distinct patterns of histone marks and segregated into subcompartments. ~10,000 loops, which frequently link promoters and enhancers, correlate with gene activation, and are conserved across cell types. Loop anchors typically occur at domain boundaries and bind CTCF. Rao et al, Cell, 2014
Disruption of CTCF binding Enhancers interact with their genes through loops. These interactions are usually contained within larger domains involving CTCF and cohesins Could somatic mutations disrupt the boundaries of these domains? Could the disruption of the boundaries lead to activation of oncogenes? Lucy Jung
Frequent mutations at CTCF-binding sites 213 colorectal tumors + ChIP-exo (CTCF/RAD21) Katainen et al, CTCF/cohesin-binding sites are frequently mutated in cancer, Nature Genetics, 2015
Elevated mutations at CTCF binding sites across tumor types Lucy Jung
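A basic check for elevated mutation at binding sites compares the per-base mutation rate inside the sites with the rate in their flanks. The sketch below assumes a single chromosome, non-overlapping sites and flanks, and 0-based half-open coordinates. The function name, flank width, and toy data are hypothetical, and this is not the method of any cited study.

```python
def site_and_flank_rates(sites, mutations, flank=1000):
    """Return (mutations per bp inside sites, mutations per bp in flanking windows).

    sites: list of (start, end) intervals; mutations: list of positions.
    """
    site_bp = flank_bp = 0
    site_hits = flank_hits = 0
    for start, end in sites:
        left = max(0, start - flank)  # clip the left flank at the chromosome start
        right = end + flank
        site_bp += end - start
        flank_bp += (start - left) + (right - end)
        for pos in mutations:
            if start <= pos < end:
                site_hits += 1
            elif left <= pos < right:
                flank_hits += 1
    return site_hits / site_bp, flank_hits / flank_bp


# Made-up data: one 20 bp binding site and five mutation positions.
sites = [(1000, 1020)]
mutations = [1005, 1010, 500, 1500, 3000]
site_rate, flank_rate = site_and_flank_rates(sites, mutations, flank=1000)
```

A site rate well above the flank rate is only suggestive. Local sequence context and mutational signatures can produce the same pattern, so a matched background is needed before drawing conclusions.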
IDH mutant gliomas Flavahan et al, Insulator dysfunction and oncogene activation in IDH mutant gliomas, Nature, 2016 CTCF is methylation-sensitive! Hypermethylation at cohesin/CTCF-binding sites compromises CTCF binding Loss of CTCF at boundary -> aberrant enhancer interaction with PDGFRA (oncogene) Treatment with a demethylating agent partially restores insulator function and downregulates PDGFRA; CRISPR-mediated disruption of the CTCF motif in IDH wild-type gliomaspheres upregulates PDGFRA and increases proliferation
Oncogene activation via loop disruption in T-ALL Perturbation of boundaries in non-malignant cells activated oncogenes. Hnisz et al., Activation of proto-oncogenes by disruption of chromosome neighborhoods, Science, 2016
3D chromosomal interaction data A Common Fund project (2015-20) A large number of investigators with genomics and imaging expertise Will generate a large number of 3D interaction maps for the community