The Cancer Genome Atlas & International Cancer Genome Consortium

Similar documents
Session 4 Rebecca Poulos

Session 4 Rebecca Poulos

The Cancer Genome Atlas

The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

Supplemental Information. Integrated Genomic Analysis of the Ubiquitin. Pathway across Cancer Types

Introduction to Systems Biology of Cancer Lecture 2

List of Available TMAs in the PRN

Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers

Cancer Informatics Lecture

Original Article International Cancer Genome Consortium Data Portal a one-stop shop for cancer genomics data

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4

Discovery Dataset. PD Liver Luminal B/ Her-2+ Letrozole. PD Supraclavicular Lymph node. PD Supraclavicular Lymph node Luminal B.

Supplementary Figure 1: LUMP Leukocytes unmethylabon to infer tumor purity

Nature Getetics: doi: /ng.3471

The CGDS-R library. Anders Jacobsen January 17, 2018

About OMICS Group Conferences

Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc

CODING TUMOUR MORPHOLOGY. Otto Visser

Biomarker development in the era of precision medicine. Bei Li, Interdisciplinary Technical Journal Club

Supplementary Figures

Lyon, 1 3 February 2012 Auditorium

Integrated Analysis of Copy Number and Gene Expression

TCGA-Assembler: Pipeline for TCGA Data Downloading, Assembling, and Processing. (Supplementary Methods)

Supplementary Tables. Supplementary Figures

TCGA. The Cancer Genome Atlas

RNA SEQUENCING AND DATA ANALYSIS

Advances in Brain Tumor Research: Leveraging BIG data for BIG discoveries

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space

Supplementary Figure 1: High-throughput profiling of survival after exposure to - radiation. (a) Cells were plated in at least 7 wells in a 384-well

Clasificación Molecular del Cáncer de Próstata. JM Piulats

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Tableau user guide for all Official Statistics

New Opportunities in Cancer Research National Academy of Sciences Advancing Health Research in Poland and the United States December 3, 2009

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

Genetics/Genomics: role of genes in diagnosis and/risk and in personalised medicine

Genomic Methods in Cancer Epigenetic Dysregulation

Contents. 1.5 GOPredict is robust to changes in study sets... 5

Introduction. Introduction

User s Manual Version 1.0

DNA-seq Bioinformatics Analysis: Copy Number Variation

Genomic Instability. Kent Nastiuk, PhD Dept. Cancer Genetics Roswell Park Cancer Institute. RPN-530 Oncology for Scientist-I October 18, 2016

THE UC SANTA CRUZ GENOMICS INSTITUTE. David Haussler Distinguished Professor, Biomolecular Engineering Scientific Director

SUPPLEMENTARY METHODS

New molecular targets in lung cancer therapy

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA LEO TUNKLE *

Nature Methods: doi: /nmeth.3115

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

The 100,000 Genomes Project

Pancreas Cancer Genomics

Fusion Analysis of Solid Tumors Reveals Novel Rearrangements in Breast Carcinomas

The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical genomic driver associations

Cancer troublemakers: a tale of usual suspects and novel villains

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

CUP: Treatment by molecular profiling

NGS in Cancer Pathology After the Microscope: From Nucleic Acid to Interpretation

Cancer develops as a result of the accumulation of somatic

Hands-On Ten The BRCA1 Gene and Protein

Supplemental Table 1.1: Prostate cancer prognostic tools

SUPPLEMENTARY INFORMATION

Oncology 101. Cancer Basics

National Cancer Drugs Fund List - Approved

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

BRCAplus. genetic testing for hereditary breast cancer

ESMO Clinical Practice Guidelines. ECCO GUIDELINES FORUM Brussels, November 27

Supplementary Information

New Trials status update ATARI/NCRI

Cancer prevalence. Chapter 7

The 2015 donations will be transformative:

FACULTY MEMBERSHIP APPLICATION Tulane Cancer Center

Screening for novel oncology biomarker panels using both DNA and protein microarrays. John Anson, PhD VP Biomarker Discovery

ncounter Assay Automated Process Immobilize and align reporter for image collecting and barcode counting ncounter Prep Station

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

R2: web-based genomics analysis and visualization platform

Activation of cellular proto-oncogenes to oncogenes. How was active Ras identified?

Applications of IHC. Determination of the primary site in metastatic tumors of unknown origin

Overview of Hong Kong Cancer Statistics of 2015

Action to Cure Kidney Cancer Campaign to Fund Kidney Cancer Research

Nature Medicine: doi: /nm.4439

Structured Pathology Reporting of Cancer Newsletter

Section D. Genes whose Mutation can lead to Initiation

EPIGENOMICS PROFILING SERVICES

Macmillan Publications

LinkedOmics. A Web-based platform for analyzing cancer-associated multi-dimensional data. Manual. First edition 4 April 2017 Updated on July 3, 2017

Oncology Drug Development

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS

Histopathological diagnosis of CUP

Plasma-Based Molecular Testing for Early Detection of Lung Cancer

CDH1 truncating alterations were detected in all six plasmacytoid-variant bladder tumors analyzed by whole-exome sequencing.

Immunohistochemical classification of lung carcinomas and mesotheliomas. Prof. Mogens Vyberg NordiQC Institute of Pathology Aalborg, Denmark

Outline. Outline. Phillip G. Febbo, MD. Genomic Approaches to Outcome Prediction in Prostate Cancer

information in the genome organism: phenotype from its genotype, given a specific environment

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

LncRNA TUSC7 affects malignant tumor prognosis by regulating protein ubiquitination: a genome-wide analysis from 10,237 pancancer

Carcinoma of Unknown Primary (CUP)

Oncogenic nexus of cancerous inhibitor of protein phosphatase 2A (CIP2A): an oncoprotein with many hands

COST ACTION CA IDENTIFYING BIOMARKERS THROUGH TRANSLATIONAL RESEARCH FOR PREVENTION AND STRATIFICATION OF COLORECTAL CANCER (TRANSCOLOCAN)

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

New Drug development and Personalized Therapy in The Era of Molecular Medicine

IPA Advanced Training Course

Transcription:

The Cancer Genome Atlas & International Cancer Genome Consortium Session 3 Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 31 st July 2014 1 st August 2014

Facts on cancer In 2007 over 12 million new cases were diagnosed globally and approximately 7.6 million cancer deaths occurred Without new prevention, diagnosis and treatment programs, by 2050, these numbers are expected to raise to 27 million new cases and 17.5 million cancer deaths Garcia et al, Global Cancer Facts & Figures 2007, Atlanta, GA, American Cancer Society 2007.

Cancer is a disease of the genome Challenge in treating cancer: Every patient is different. Every tumour is different, even in the same patient. Tumours can be highly heterogeneous High rate of genomic abnormalities (few drivers, many passenger) Healthy 46 chromosomes Example cancer 59 chromosomes

What can go wrong in cancer genomes? Type of change DNA mutations DNA structural variations Copy number variation (CNV) DNA methylation mrna expression changes mirna expression changes Protein expression Some common technology to study changes WGS, WXS WGS CGH array, SNP array, WGS Methylation array, RRBS, WGBS mrna expression array, RNA-seq mirna expression array, mirna-seq Protein arrays, mass spectrometry WGS = whole genome sequencing, WXS = whole exome sequencing RRBS = reduced representation bisulfite sequencing, WGBS = whole genome bisulfite sequencing

Goal of cancer genomics Identify changes in the genomes of tumors that drive cancer progression. Identify new targets for therapy. Select drugs based on the genomics of the tumour i.e. personalised therapy.

The Cancer Genome Atlas (TCGA) Lunched in 2006 as a pilot and expanded in 2009 Objective is to make high-quality data publicly available to the cancer research community

Types of Cancers AML Breast Ductal* Breast Lobular/Breast Other Bladder (pap and non-pap) Cervical adeno & squamous Colorectal* Clear cell kidney* DLBCL Endometrial carcinoma* Esophageal adeno & squamous Gastric adenocarcinoma GBM* Head and Neck Squamous* Hepatocellular Lower Grade Glioma Lung adenocarcinoma* Lung squamous* Melanoma Ovarian serous cystadenocarcinoma* Papillary kidney Pancreas Prostate Sarcoma (dediff lipo, UPS, leiomyosarcoma) Papillary Thyroid* * Reached target of 500 tumours

Separate rare tumours project Adrenocortical Carcinoma Chromophobe kidney Mesothelioma Paraganglioma/Pheochromocytoma Uterine Carcinosarcoma Thymoma Uveal Melanoma Testicular Germ Cell Cholangiocarcinoma Diffuse Large B Cell Lymphoma

Current TCGA sampling progress

Types of data Core dataset: Pathology report Histology images Clinical data Whole exome-seq SNP 6.0 array mrnaseq mirnaseq Methylation array Future datasets: 50x Whole-genome sequencing Bisulfide sequencing Protein Array

Accessing TCGA data https://tcga-data.nci.nih.gov/tcga/

Click on Cases with Data for tumour of interest.

Data levels explained: https://tcgadata.nci.nih.gov/tcga/tcgadatatype.jsp

Generally: Level 1 = Raw data Level 2 = A little processed Level 3 = Normalised and processed Raw data require application to dbgap Raw sequence data is held at CGHub (https://cghub.ucsc.edu/ ) https://tcga-data.nci.nih.gov/tcga/tcgadatatype.jsp

Click on RNA-seq to select all RNA-seq samples

After filling out email and hitting download, a link to the achieve with the files below will be sent to you

Overall TCGA data portal is difficult to use Data portal is great for downloading data in large tab delimited format perfect for a bioinformatician, but files difficult to use for average biologist. Fortunately there are some alternatives: ICGC data portal (http://dcc.icgc.org/) cbioportal (www.cbioportal.org/)

International Cancer Genome Consortium (ICGC) Founded in 2007 A collaboration between 22 countries. Goal: To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe. Incorporates data from TCGA and the Sanger Cancer Genome Project.

Working groups

ICGC Samples Currently 67 tumour projects

ICGC Samples As of 27-Sept-2013

Data types Mandatory: Genomic DNA analyses of tumors (and matching control DNA) are core elements of the project. Complementary (Recommended): Additional studies of DNA methylation and RNA expression are recommended on the same samples that are used to find somatic mutations. Optional: Proteomic analyses Metabolomic analyses Immunohistochemical analyses

Data access policy

ICGC data portal (http://dcc.icgc.org/) Click on cancer projects

Cancer project view click on BRCA-US

Click on Genome Viewer

Looking at mutations in specific genes Type in ERG here

Gene centric view Drill down on types of mutation

Looks to be mostly intronic

Advanced Search Find out which cancers commonly have ERG missense mutations Type in ERG Note go back to home page and click on Advanced search

Go to Mutations tab Shows distribution of tumours with ERG missense mutations. Check missense Hover mouse to display name

Limitations of data portal The data portal is mutation centric i.e. All queries are related to retrieving tumours/samples with particular mutations in a particular gene. If we just want expression/methylation data for a particular gene still have to download the data. But at least data format is more user-friendly

Downloading data from ICGC Click download data Select cancer type of interest Note: go back to Advance search on home page

Can also download via Data repository link from home page. The advantage of ICGC is that data for all samples is in a single file so it is easier to work with in Excel (if file is small) or Galaxy (if file is big)

cbioportal (www.cbioportal.org/ ) A data analysis portal to TCGA data. Provides functions for visualisation, analysis and download of data. Maintained by Memorial Sloan-Kettering Cancer Center

Features of cbioportal Visualising frequency of mutations Correlation between occurrence of mutations Correlation of expression and CNV or methylation Visualisation of mutations Survival analysis Network analysis Gao et al (2013) Sci. Signal

Investigate mutations and CNA in ERG in AML 1. Select cancer study (AML, Provisional) 2. Select the type of aberration you are interested in. 3. Select the sample set 4. Type in gene (can accept any number) In the above query, we are telling cbioportal to perform an analyse comparing all AML samples with ERG mutation or CNA and those without ERG mutation nor CNA.

OncoPrint 8 out of 187 samples have amplification of ERG

Plots correlation ERG expression with CNA Samples with amplification possibly have higher expression

Survival analysis We know that high ERG expression is associated with poor survival (Marcucci et al JCO 2005). Seems like ERG amplification is also associated with poor survival.

Network analysis Not that interesting here, but would be more useful with a larger input gene set.

Bookmark can make URL to immediate share analysis with collaborators

Can also do gene summary across cancer types

Exercises 1. Download patient clinical annotations for AML using TCGA data portal and then using the ICGC data portal. 2. What is the cancer with most frequent RUNX1 mutations? And which cancer has the most RUNX1 missense mutation? (Use ICGC data portal) 3. Do AML patients with DNMT3a mutation have worst survival? (Use cbioportal)