Genome Control in Cell Identity and Disease. Outline: development and cell identity; loss of cell identity and disease; new diagnostics and therapeutics.
Development and cell identity. 30,000,000,000 cells.
Control of Cell Identity: Gene Expression Programs. The genome contains ~24,000 protein-coding genes. The program in any one cell type: ~14,000 genes are expressed at >1 mRNA molecule/cell, and ~3% of these are cell-type specific. [Figure 2: transcription factor sets and the cell types they specify (muscle, macrophage, pluripotent, islet β-cell, brown fat, neuron, cardiomyocyte, hepatocyte, neuron stem cell, Sertoli).]
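As a rough worked example of the numbers on this slide, the sketch below computes the fraction of protein-coding genes expressed in one cell type and the approximate number of cell-type-specific genes. The constant and function names are illustrative, and the inputs are the slide's approximate figures, so the results are estimates only.

```python
# Approximate figures taken from the slide; all results are estimates.
PROTEIN_CODING_GENES = 24_000
EXPRESSED_PER_CELL_TYPE = 14_000    # genes at >1 mRNA molecule/cell
CELL_TYPE_SPECIFIC_FRACTION = 0.03  # ~3% of the expressed genes


def expression_program_summary(total, expressed, specific_fraction):
    """Return (fraction of genes expressed, estimated cell-type-specific gene count)."""
    expressed_fraction = expressed / total
    specific_genes = round(expressed * specific_fraction)
    return expressed_fraction, specific_genes


expressed_fraction, specific_genes = expression_program_summary(
    PROTEIN_CODING_GENES, EXPRESSED_PER_CELL_TYPE, CELL_TYPE_SPECIFIC_FRACTION
)
print(f"Expressed in one cell type: {expressed_fraction:.0%} of protein-coding genes")
print(f"Estimated cell-type-specific genes: ~{specific_genes}")
```

On these figures, a few hundred genes out of roughly 14,000 expressed ones distinguish one cell type's program from another's.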
Control of Gene Expression Programs: Enhancers and Transcription Factors. [Figure labels: transcription factors, enhancer binding, cohesin, Mediator, initiation, Pol II, paused Pol II, elongation.]
A small number of master transcription factors dominate control of gene expression programs. Figure 2, cell types and their master transcription factors: muscle (MyoD); macrophage (C/EBPα or C/EBPβ); pluripotent (OCT4, SOX2, KLF4, MYC); islet β-cell (PDX1, NGN3, MAFA); brown fat (PRDM16, C/EBPβ); neuron (BRN2, ASCL1, MYT1L, optionally NEUROD1); cardiomyocyte (GATA4, TBX5, MEF2C); hepatocyte (GATA4, HNF1A, FOXA3, or HNF4A, FOXA1/2/3); neuron stem cell (BRN2, ASCL1, MYT1L); Sertoli (NR5A1, WT1, DMRT1, GATA4, SOX9). Other cell types labeled in the figure: B-cell, pancreatic exocrine, hepatocyte.
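To make the "small number of factors" point concrete, here is a minimal sketch that stores the Figure 2 factor sets as a lookup table and asks which factors recur across cell types. The pairing of factor sets with cell types assumes the figure's labels are listed in matching order, and the names `REPROGRAMMING_FACTORS`, `cell_types_using`, and `shared_factors` are hypothetical helpers introduced only for this illustration.

```python
# Each cell type maps to one or more alternative factor cocktails.
# The pairing assumes the figure lists factor sets and cell types in the
# same order (an assumption of this sketch, not stated on the slide).
REPROGRAMMING_FACTORS = {
    "Muscle": [["MyoD"]],
    "Macrophage": [["C/EBPα"], ["C/EBPβ"]],
    "Pluripotent": [["OCT4", "SOX2", "KLF4", "MYC"]],
    "Islet β-cell": [["PDX1", "NGN3", "MAFA"]],
    "Brown fat": [["PRDM16", "C/EBPβ"]],
    "Neuron": [["BRN2", "ASCL1", "MYT1L"]],  # NEUROD1 shown as optional
    "Cardiomyocyte": [["GATA4", "TBX5", "MEF2C"]],
    "Hepatocyte": [["GATA4", "HNF1A", "FOXA3"], ["HNF4A", "FOXA1/2/3"]],
    "Neuron stem cell": [["BRN2", "ASCL1", "MYT1L"]],
    "Sertoli": [["NR5A1", "WT1", "DMRT1", "GATA4", "SOX9"]],
}


def cell_types_using(factor, table=REPROGRAMMING_FACTORS):
    """Return the sorted cell types with at least one cocktail containing `factor`."""
    return sorted(
        cell_type
        for cell_type, cocktails in table.items()
        if any(factor in cocktail for cocktail in cocktails)
    )


def shared_factors(table=REPROGRAMMING_FACTORS):
    """Return {factor: cell types} for factors that appear in more than one cell type."""
    all_factors = {f for cocktails in table.values() for c in cocktails for f in c}
    shared = {}
    for factor in sorted(all_factors):
        users = cell_types_using(factor, table)
        if len(users) > 1:
            shared[factor] = users
    return shared


for factor, users in shared_factors().items():
    print(f"{factor}: {', '.join(users)}")
```

The table holds ten cell types, and the same factor can contribute to several identities depending on its partners.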
Core Transcriptional Regulatory Circuitry: ES Cells. [Figure legend: super-enhancer (SE), gene, master transcription factor. Genes shown with an SE and their encoded factors: Oct4, Sox2, Nanog, Klf4, Esrrb, Prdm14. Also labeled: cell identity genes.]
Summary: Development and cell identity. Many different cells, same genome; a subset of the 24,000 genes is on in each; gene control by master transcription factors. 30,000,000,000 cells.
Genome Sequence Variation. [Photo: Willie Shoemaker and Wilt Chamberlain.] 22 pairs of autosomes, 2 sex chromosomes; 6 billion bp; variation ~1/1200 bp.
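A quick back-of-the-envelope sketch of what these figures imply: dividing the genome size by the spacing between variants gives an order-of-magnitude count of variant positions. It assumes the ~1/1200 bp rate applies uniformly across the full 6 billion bp, which the slide does not state; the names below are illustrative.

```python
GENOME_SIZE_BP = 6_000_000_000  # genome size quoted on the slide
VARIANT_SPACING_BP = 1_200      # ~1 variant per 1,200 bp, from the slide


def expected_variant_sites(genome_bp, spacing_bp):
    """Estimate variant positions, assuming a uniform rate over the whole genome."""
    if spacing_bp <= 0:
        raise ValueError("spacing_bp must be positive")
    return genome_bp // spacing_bp


sites = expected_variant_sites(GENOME_SIZE_BP, VARIANT_SPACING_BP)
print(f"Roughly {sites:,} variant positions")
```

Under that assumption the estimate comes to about five million positions.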
Disease. 5000 genetic diseases: cancer, cardiovascular, autoimmune, etc. Each child carries ~6 deleterious genes. Probability of developing genetic disease: 67%. Example, risk of cancer (developing / dying): females 38% / 20%; males 44% / 23%. Sources: McKusick, Mendelian Inheritance in Man; American Cancer Society.
Loss of Cell Identity Due to Misregulated Gene Expression Programs. Small changes in key regulators can have a large impact on the gene expression program. [Figure 2: 24,000 protein-coding genes; master transcription factor sets and the cell types they specify.]
Disease-associated sequence variation often occurs in enhancers. 5,303 SNPs from 1,675 GWAS studies. Coding vs. non-coding SNPs: 7% coding, 93% non-coding. [Pie chart of SNPs by trait category: neurological/behavioral, cancer, metabolic, cardiovascular, diabetes, immune/autoimmune, kidney/lung/liver, other.] [Histogram, proximity to enhancers: number of non-coding SNPs (0 to 3000) against distance to the nearest enhancer (-100 to +100 kb).] 83% of trait-associated non-coding SNPs occur in the ~33% of the genome covered by all enhancer regions defined by H3K27ac. Maurano et al. and Stamatoyannopoulos, Science 2012; Hnisz et al., Cell 2013.
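The enrichment claim on this slide can be checked with simple arithmetic. The sketch below is illustrative only: it applies the 93% non-coding share to the 5,303 SNPs to estimate a count, then divides the share of SNPs in enhancer regions by the share of the genome those regions cover. The function and variable names are hypothetical, and the inputs are the slide's rounded percentages.

```python
TOTAL_GWAS_SNPS = 5_303
NONCODING_FRACTION = 0.93        # share of SNPs that are non-coding
SNPS_IN_ENHANCERS = 0.83         # share of non-coding SNPs in enhancer regions
GENOME_IN_ENHANCERS = 0.33       # share of the genome covered by those regions


def fold_enrichment(fraction_of_snps, fraction_of_genome):
    """Ratio of observed SNP share to the share expected from genome coverage alone."""
    if not 0 < fraction_of_genome <= 1:
        raise ValueError("fraction_of_genome must be in (0, 1]")
    return fraction_of_snps / fraction_of_genome


# Estimated count of non-coding SNPs, from the rounded percentage.
noncoding_snps = round(TOTAL_GWAS_SNPS * NONCODING_FRACTION)
enrichment = fold_enrichment(SNPS_IN_ENHANCERS, GENOME_IN_ENHANCERS)

print(f"Estimated non-coding SNPs: ~{noncoding_snps}")
print(f"Enrichment in enhancer regions: ~{enrichment:.1f}-fold")
```

If SNPs were spread evenly across the genome, about 33% would be expected to fall in enhancer regions; the observed 83% is about two and a half times that.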
Reminder: Enhancers and Transcription Factors in Control of Gene Expression Programs. [Figure labels: transcription factors, enhancer binding, cohesin, Mediator, initiation, Pol II, paused Pol II, elongation.]
Disease-associated sequence variation in enhancers can alter transcription factor binding: cancer. [Figure, colon cancer: tracks for enhancers, sequence variation, enhancer/promoter signal, and TCF7L2 binding near the MYC gene.] Sequence variation affects TCF7L2 transcription factor binding in colon and consequently MYC gene expression. High levels of MYC gene expression contribute to tumorigenesis. Tuupanen et al., Nature Genetics 2009; Kaur Sur et al., Science 2012.
Disease-associated sequence variation in enhancers can alter transcription factor binding: hemoglobinopathies. [Figure, BCL11A locus (chr2: 60,640 kb to 60,800 kb): tracks for enhancer signal, sequence variation, and GATA1 TF binding, with H3K4me1, H3K4me3, H3K27ac, H3K27me3, Pol II, TAL1, and GATA1, and DNase I in erythroid cells, brain, B-lymphocytes, and T-lymphocytes; sites +55, +58, +62; bar chart of relative enrichment.] The SNP decreases binding of GATA1 in erythroid cells, leading to decreased BCL11A expression. Low BCL11A leads to elevated fetal hemoglobin level. Bauer et al., Science 2013.
High-Throughput Genome Sequencing Is Changing the Preclinical and Clinical Landscape. Genome diagnostics.
Drugging Transcriptional Control. Cancer: tumor cell (disease) to tumor cell death. Diabetes: low metabolic activity, insulin-resistant adipocyte (disease) to high metabolic activity, insulin-sensitive adipocyte (health).
Drugging Transcriptional Control. [Figure labels: transcription factors, cofactors, enhancer binding, cohesin, Mediator, initiation, Pol II, paused Pol II, elongation.]
Regenerative Medicine: Reprogramming Cell Identity. Many different cell types are being generated through reprogramming.
However, not all gene control is well understood: Epigenetics. [Figure: panels labeled 3 years and 50 years. Identical DNA methylation patterns give yellow; different methylation patterns are green or red.] Identical genes, epigenetic differences.