Introduction to Systems Biology of Cancer Lecture 2

Similar documents
Session 4 Rebecca Poulos

Session 4 Rebecca Poulos

The Cancer Genome Atlas & International Cancer Genome Consortium

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

Supplemental Information. Integrated Genomic Analysis of the Ubiquitin. Pathway across Cancer Types

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

RNA-seq Introduction

Regulation of Gene Expression in Eukaryotes

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

ncounter Assay Automated Process Immobilize and align reporter for image collecting and barcode counting ncounter Prep Station

cis-regulatory enrichment analysis in human, mouse and fly

Session 6: Integration of epigenetic data. Peter J Park Department of Biomedical Informatics Harvard Medical School July 18-19, 2016

ncounter Assay Automated Process Capture & Reporter Probes Bind reporter to surface Remove excess reporters Hybridize CodeSet to RNA

ChIP-seq data analysis

RNA SEQUENCING AND DATA ANALYSIS

Gene Regulation Part 2

CRISPR/Cas9 Enrichment and Long-read WGS for Structural Variant Discovery

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Biomarker development in the era of precision medicine. Bei Li, Interdisciplinary Technical Journal Club

Peak-calling for ChIP-seq and ATAC-seq

The Biology and Genetics of Cells and Organisms The Biology of Cancer

Deep-Sequencing of HIV-1

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing

Ch. 18 Regulation of Gene Expression

TCGA. The Cancer Genome Atlas

The Cancer Genome Atlas

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

Prokaryotes and eukaryotes alter gene expression in response to their changing environment

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

Eukaryotic transcription (III)

Epigenetics: The Future of Psychology & Neuroscience. Richard E. Brown Psychology Department Dalhousie University Halifax, NS, B3H 4J1

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Genetics and Genomics in Medicine Chapter 6 Questions

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS

Overview: Conducting the Genetic Orchestra Prokaryotes and eukaryotes alter gene expression in response to their changing environment

Eukaryotic Gene Regulation

BIMM 143. RNA sequencing overview. Genome Informatics II. Barry Grant. Lecture In vivo. In vitro.

mirna Dr. S Hosseini-Asl

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

A Statistical Framework for Classification of Tumor Type from microrna Data

The Epigenome Tools 2: ChIP-Seq and Data Analysis

Ambient temperature regulated flowering time

Future applications of full length virus genome sequencing

BIO360 Fall 2013 Quiz 1

Cancer Informatics Lecture

Profiles of gene expression & diagnosis/prognosis of cancer. MCs in Advanced Genetics Ainoa Planas Riverola

Inference of Isoforms from Short Sequence Reads

File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables. File Name: Peer Review File Description:

Metabolomic and Proteomics Solutions for Integrated Biology. Christine Miller Omics Market Manager ASMS 2015

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RNA- seq Introduc1on. Promises and pi7alls

RESEARCHER S NAME: Làszlò Tora RESEARCHER S ORGANISATION: Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC)

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.

OverView Circulating Nucleic Acids (CFNA) in Cancer Patients. Dave S.B. Hoon John Wayne Cancer Institute Santa Monica, CA, USA

Imprinting. Joyce Ohm Cancer Genetics and Genomics CGP-L2-319 x8821

New technologies reaching the clinic

NEXT GENERATION SEQUENCING. R. Piazza (MD, PhD) Dept. of Medicine and Surgery, University of Milano-Bicocca

Session 2: Biomarkers of epigenetic changes and their applicability to genetic toxicology

Circular RNAs (circrnas) act a stable mirna sponges

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

Selective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples

AIDS - Knowledge and Dogma. Conditions for the Emergence and Decline of Scientific Theories Congress, July 16/ , Vienna, Austria

Supplementary Figure 1. Schematic diagram of o2n-seq. Double-stranded DNA was sheared, end-repaired, and underwent A-tailing by standard protocols.

Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers

Stochastic Simulation of P53, MDM2 Interaction

DSB. Double-Strand Breaks causate da radiazioni stress ossidativo farmaci

Removal of Shelterin Reveals the Telomere End-Protection Problem

Transcriptome Analysis

BIO360 Fall 2013 Quiz 1

RNA SEQUENCING AND DATA ANALYSIS

DNA-seq Bioinformatics Analysis: Copy Number Variation

micrornas (mirna) and Biomarkers

DSB. Double-Strand Breaks causate da radiazioni stress ossidativo farmaci

R. Piazza (MD, PhD), Dept. of Medicine and Surgery, University of Milano-Bicocca EPIGENETICS

SUPPLEMENTARY FIGURES: Supplementary Figure 1

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

Lectures 13: High throughput sequencing: Beyond the genome. Spring 2017 March 28, 2017

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

National Disease Research Interchange Annual Progress Report: 2010 Formula Grant

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

High Throughput Sequence (HTS) data analysis. Lei Zhou

Genomic Methods in Cancer Epigenetic Dysregulation

Lyon, 1 3 February 2012 Auditorium

GEN ETICS AN D GEN OM ICS IN CANCER PREVENTION AN D TREATM EN T. Robert Nathan Slotnick MD PhD Director, Medical Genetics and Genomics

Genes, Aging and Skin. Helen Knaggs Vice President, Nu Skin Global R&D

BIO360 Quiz #1. September 14, Name five of the six Hallmarks of Cancer (not emerging hallmarks or enabling characteristics ): (5 points)

Alternative RNA processing: Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins.

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

ESCs were lysed with Trizol reagent (Life technologies) and RNA was extracted according to

Liposarcoma*Genome*Project*

Clasificación Molecular del Cáncer de Próstata. JM Piulats

PI3K Background. The SignalRx R & D pipeline is shown below followed by a brief description of each program:

Transcription:

Introduction to Systems Biology of Cancer Lecture 2 Gustavo Stolovitzky IBM Research Icahn School of Medicine at Mt Sinai DREAM Challenges

High throughput measurements: The age of omics

Systems Biology deals with four main tasks Measurements New High Throughput Omics technologies Modeling Data exploration, deterministic statistical System Characterization & Predictions: Clinical & Biological Oscillatory region Non-oscillatory region dp53 = s p53 δ p53 p53 dt * n dmdm2 [ TP53 ( t τ )] = smdm2 + kmdm2 δ * n n dt [ TP53 ( t τ )] + K dtp53 = r dt TP53 * dtp53 = k dt fp dmdm 2 = r dt p53 µ * TP53 ATM TP53 + K MDM mdm2 [ µ 2 TP53 TP53 ν p MDM * k TP53 ν 2 TP53 rp + ( ν mdm2 TP53 MDM 2 TP53 + K MDM 2 µ * + k TP53 k * TP53 MDM 2 * TP53 + K * ATM ) ] MDM 2 * ATM + K * TP53 ATM TP53 + K Model testing and Validation mdm2 * TP53 MDM 2 d rp a * d fp p P53 basal transcription Low High Low High Mdm2 basal transcription

What do we need to measure in cancer research Given what we saw in the Lecture 1, we need to measure the elements of the genome that are disregulated, as well as their functional consequences. At the DNA level sequence (static) Mutations, Copy number alterations, Loss of heterozygosity, Translocations Epigenetics (static) DNA methylation, histone modifications (methylation, acetylation) At the RNA level, quantify amount (functional) Non-coding RNA, microrna, mrna, splice variants

What do we need to measure in cancer research At the protein level Protein amounts, phosphorylation and other postranslational modifications. Interactions maps Protein (e.g. TF)-DNA interactions, protein-protein interactions Phenotypes Cell viability, patient survival, Patient response to treatment

Omics Technologies

Many biological experiments involve sequencing

DNA Technology Milestones From Nature Milestones, DNA Technologies

Sanger Sequencing

Automatized Sanger Sequencing

Sanger Sequencing

Progress in sequencing 2003 First genome was a mixture of several volunteers Took 13 years (1990-2003), 3,000 scientists, $2.7 Billion Technology: Sanger Sequencing 2007 Second Genome J.C.Venter s genome Took 4 years (2003-2007), 30 scientists, $100 Million Technology: Improved Sanger Sequencing 2008 Third Genome James Watson Took 4.5 months (2008), ~30 scientists, $1.5 Million Technology: 454 (second generation, pyrosequencing) end 2014 ~ 250,000 Genomes Today sequencing costs < $1K Second GenerationTechnologies: 454 (defunct), Solid, Illumina (market leader), Third Generation Technologies: PacBio, Oxford nanopores

Sequencing is now at ~$1K

RNA-seq

Library Construction Illumina sequencing Poly A-based cdna synthesis Before Library Construc;on 1. Poly-A Selection (Total RNA mrna) 2. mrna fragmentiaton 3. First strand synthesis 4. Second strand synthesis

Library Construction Illumina sequencing Prepare for adapter ligation Adapter ligation

Flow cell with oligos Illumina sequencing Attach DNA to Surface Bridge Amplification

Illumina sequencing Bridge amplification Fragments become double stranded Denature the ds molecules

Illumina sequencing Bridge amplification Complete Amplification Sequencing by Synthesis Determine 1 st base

Sequencing by Synthesis Illumina sequencing Image 1 st base Determine 2 nd base

Sequencing by Synthesis Illumina sequencing Image 2 nd base Sequence over multiple Cycles

Other Sequencing Technologies Emulsion PCR, electrical detection of ph change Ion Torrent Single cell, optical detection, long reads PacBio

Other Sequencing Technologies Single cell, electrical detection, long reads Oxford Nanopore

Mapping RNA-seq reads to a reference genome reveals expression SOX2 Gene

Units of RNA-seq More reads map to longer genes. If comparing different genes, use RPKM: Read Per Kilobase Transcript Per Million Reads. If comparing genes to genes across different patients: CPM or Counts Per Million reads (Out of 1M reads, how many mapped to a given gene.)

Noise characteristics Low technical noise (~Poisson distribution) Biological noise can be big

ChIP-seq

Regulatory Genomics and the Biology of Transcription Factors There are 1,500 TF in humans Transcription factor (TF) binds to DNA and controls transcription: promotes or represses the recruitment of the RNA polymerase

TF determine gene regulatory circuits There are 1,500 TF in humans They activate or silence target genes The connectivity of TFs to targets defines transcriptional regulation networks Many network motifs present such as: Feed-forward loops (ensure signals) Fan-outs (amplify signals) Feed-back loops (create pulses) see Uri Alon s work Networks reveal cell logic Rick Young, MIT (Pioneer of ChIP-chip & ChIP-Seq)

ChIP-Seq: study TF-DNA interactions ChIP-Seq: Chromatin Immuno-precipitation followed by sequencing Selects proteins out with an antibody specific to that protein Sequences any of the DNA that is sticking to the selected proteins. From the reads, can we identify where the proteins are binding

ChIP-Seq protocol

ChIP-Seq Example: OCT4 binding in SOX2 Region in mouse ES cells Slide from David Gifford, MIT OpenCourseWare

The ENCODE Project https://www.encodeproject.org

Cancer omics: Learning from patient cohorts

The Cancer Genome Atlas (TCGA) A resource of matched tumor and normal tissues from 11,000 patients with 12 cancer types Cervical cancer Cholangiocarcinoma Esophageal carcinoma Liver hepatocellular carcinoma Mesothelioma Pancreatic ductal adenocarcinoma Paraganglioma & Pheochromocytoma Sarcoma Testicular germ cell cancer Thymoma Uterine carcinosarcoma Uveal melanoma A lot of data available. Go to https://tcga-data.nci.nih.gov/tcga/tcgadownload.jsp To explore data download

The Cancer Genome Atlas (TCGA) The Cancer Genome Atlas (TCGA) Research Network has reported integrated genome-wide studies of twelve distinct malignancies in 3,527 cases

Classical classification of cancer is based on cell of origin. Cancer genomics has found, additionally, that each tissue type can be further divided into 3 to 4 molecular subtypes This paper asks the question: Is there an alternative taxonomy beyond the tissue of origin? Based on 6 omics platforms: A pan-cancer classification.

mrna expression yielded 16 clusters of patients amongst the 12 tumor types

CNV yielded 8 clusters of patients amongst the 12 tumor types

How did they clustered using the 6 genomic platforms? For each sample (patient) and each genomic platform the authors created a binary vector of size = # of clusters Patient k cluster assignment in each platform CNV RNA-seq Then concatenate the clusters Patient k represented by binary vector across platforms 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 CNV 1 CNV 1 CNV 3 CNV 4 CNV 5 CNV 6 CNV 7 CNV 8 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

Perform patient clustering on the binary vectors Patient 1 Patient 2 Patient 3... Patient 3576

Consensus Clustering yielded 13 Pan Cancer clusters

This paper s results suggest that cell-oforigin rather than pathway based features dominate the molecular taxonomy of diverse tumor types. However, based on this study, one in ten cancer patients would be classified differently by this new molecular taxonomy versus our current tissue-of-origin tumor classification system.

If used to guide therapeutic decisions, this reclassification would affect a significant number of patients to be considered for nonstandard treatment regimens.

Proposed homework Read: The Cancer Genome Atlas Research Network, Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin, Cell 158, 929 944, August 14, 2014. Bring 1 important take home message Or Read: Trapnell et. al, Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Nat Protoc. 1;7(3):562-78, March 2012. Try to make sense of the RNA-seq. Or Explore the TCGA (The Cancer Genome Atlas) (cancergenome.nih.gov) Data Portal (tcga-data.nci.nih.gov/tcga/tcgahome2.jsp) dataportal. Try to download some files.