Session 4 Rebecca Poulos

Similar documents
Session 4 Rebecca Poulos

The Cancer Genome Atlas & International Cancer Genome Consortium

The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley

Supplemental Information. Integrated Genomic Analysis of the Ubiquitin. Pathway across Cancer Types

The Cancer Genome Atlas

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers

Introduction to Systems Biology of Cancer Lecture 2

Original Article International Cancer Genome Consortium Data Portal a one-stop shop for cancer genomics data

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4

Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc

Cancer Informatics Lecture

The CGDS-R library. Anders Jacobsen January 17, 2018

List of Available TMAs in the PRN

Supplementary Tables. Supplementary Figures

FACULTY MEMBERSHIP APPLICATION Tulane Cancer Center

Nature Getetics: doi: /ng.3471

User s Manual Version 1.0

Supplementary Figure 1: LUMP Leukocytes unmethylabon to infer tumor purity

Tableau user guide for all Official Statistics

Supplementary Figures

Hands-On Ten The BRCA1 Gene and Protein

Cancer troublemakers: a tale of usual suspects and novel villains

CODING TUMOUR MORPHOLOGY. Otto Visser

Enterprise Interest Thermo Fisher Scientific / Employee

The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical genomic driver associations

Development status of OPDIVO (nivolumab) 1

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

RNA SEQUENCING AND DATA ANALYSIS

TCGA. The Cancer Genome Atlas

Discovery Dataset. PD Liver Luminal B/ Her-2+ Letrozole. PD Supraclavicular Lymph node. PD Supraclavicular Lymph node Luminal B.

Lyon, 1 3 February 2012 Auditorium

Supplemental Table 1.1: Prostate cancer prognostic tools

Information for You and Your Family

THE UC SANTA CRUZ GENOMICS INSTITUTE. David Haussler Distinguished Professor, Biomolecular Engineering Scientific Director

Genomic Instability. Kent Nastiuk, PhD Dept. Cancer Genetics Roswell Park Cancer Institute. RPN-530 Oncology for Scientist-I October 18, 2016

About OMICS Group Conferences

Fusion Analysis of Solid Tumors Reveals Novel Rearrangements in Breast Carcinomas

Development status of OPDIVO (nivolumab) 1

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space

UK Biobank. Death and Cancer Outcomes Report. September

The 2015 donations will be transformative:

Action to Cure Kidney Cancer Campaign to Fund Kidney Cancer Research

Interventions for non-metastatic squamous cell carcinoma of the skin: a systematic review and pooled analysis of observational studies

Advances in Brain Tumor Research: Leveraging BIG data for BIG discoveries

incidence rate x 100,000/year

File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables. File Name: Peer Review File Description:

TCGA-Assembler: Pipeline for TCGA Data Downloading, Assembling, and Processing. (Supplementary Methods)

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

Annual Flash Report (unaudited) Fiscal Year ended March 31, 2017

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Cancer develops as a result of the accumulation of somatic

Contents. 1.5 GOPredict is robust to changes in study sets... 5

CDH1 truncating alterations were detected in all six plasmacytoid-variant bladder tumors analyzed by whole-exome sequencing.

PRELIMINARY PROGRAM MAY 31 - JUNE 4, 2019

SUPPLEMENTARY METHODS

The 100,000 Genomes Project

Integration of Cancer Genome into GECCO- Genetics and Epidemiology of Colorectal Cancer Consortium

Development Pipeline Progress Status. February 1, 2019

SUPPLEMENTARY INFORMATION

MAPK Pathway. CGH Next Generation Sequencing. Molecular Tools in Care of Patients with Pigmented Lesions 7/20/2017

Test Bank for Robbins and Cotran Pathologic Basis of Disease 9th Edition by Kumar

Valida5on of a Microsatellite Instability Assay by NGS

Clasificación Molecular del Cáncer de Próstata. JM Piulats

Module 3: Pathway and Drug Development

New molecular targets in lung cancer therapy

IPA Advanced Training Course

BRCAplus. genetic testing for hereditary breast cancer

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS

Structured Pathology Reporting of Cancer Newsletter

CentoCancer STRIVE FOR THE MOST COMPLETE INFORMATION

Supplementary Information

The mutations that drive cancer. Paul Edwards. Department of Pathology and Cancer Research UK Cambridge Institute, University of Cambridge

GYNplus: A Genetic Test for Hereditary Ovarian and/or Uterine Cancer

Biomarker development in the era of precision medicine. Bei Li, Interdisciplinary Technical Journal Club

GYNplus. genetic testing for hereditary ovarian and/or uterine cancer

Macmillan Publications

Oncology 101. Cancer Basics

Integrated Analysis of Copy Number and Gene Expression

SHN-1 Human Digestive Panel Test results

Nature Methods: doi: /nmeth.3115

Identification of Potential Therapeutic Targets by Molecular and Genomic Profiling of 628 Cases of Uterine Serous Carcinoma

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

New Opportunities in Cancer Research National Academy of Sciences Advancing Health Research in Poland and the United States December 3, 2009

Supplementary Online Content

Cell Reproduction Virtual Lab. Interphase Prophase Metaphase Anaphase Telophase. Name

Genomic Methods in Cancer Epigenetic Dysregulation

Application of Whole Genome Microarrays in Cancer: You should be doing this test!!

Overview of Hong Kong Cancer Statistics of 2015

DNA-seq Bioinformatics Analysis: Copy Number Variation

NOSCAN CLINICAL MANAGEMENT GUIDELINE (CMG) AND NOSCAN CHEMOTHERAPY REVIEW (NCR) STATUS DOCUMENT May Status (G / A / R) Status (G / A / R)

Gene-microRNA network module analysis for ovarian cancer

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Nature Medicine: doi: /nm.4439

Cancer in the organ donor

RNA SEQUENCING AND DATA ANALYSIS

Structured Pathology Reporting of Cancer Newsletter

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

Transcription:

The Cancer Genome Atlas (TCGA) & International Cancer Genome Consortium (ICGC) Session 4 Rebecca Poulos Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 28 th 29 th January 2016

Facts on cancer An estimated 126,800 new cases of cancer will be diagnosed in Australia this year, with that number set to rise to 150,000 by 2020 Cancer is a leading cause of death in Australia. In 2012, > 43,000 people died from cancer, accounting for about 3 in every 10 deaths. Source: Cancer Council Australia (2016)

Cancer is a disease of the genome Challenges in treating cancer: Every patient is different Every tumour is different, even in the same patient Tumours can be highly heterogeneous High rate of genomic abnormalities (few drivers, many passenger) Healthy 46 chromosomes Example cancer 59 chromosomes

What can go wrong in cancer genomes? Types of changes DNA mutations - Point mutations - Insertions & deletions DNA structural variations Copy number variation (CNV) DNA methylation mrna expression changes mirna expression changes Protein expression Some common technologies used to study these changes WGS; WXS WGS CGH array; SNP array; WGS Methylation array; RRBS; WGBS mrna expression array; RNA-seq mirna expression array; mirna-seq Protein arrays; mass spectrometry WGS = whole genome sequencing, WXS = whole exome sequencing RRBS = reduced representation bisulfite sequencing, WGBS = whole genome bisulfite sequencing

Goal of cancer genomics Identify changes in the genomes of tumors that drive cancer progression Identify new targets for therapy Select drugs based on the genomics of the tumour i.e. personalised therapy

Cancer Sequencing Projects The Cancer Genome Atlas (TCGA) Led by NIH Initiated in 2006 (as a pilot program ) and expanded in 2009 Aim: To make the genomes of 20 cancers publically available Update today: 33 cancer types & subtypes analysed (11,000 samples)

TCGA pipeline Publically available for researchers

Types of Cancers Breast Ductal carcinoma Lobular carcinoma Central nervous system Glioblastoma multiforme Lower grade glioma Endocrine Adrenocortical carcinoma Papillary thyroid carcinoma Paraganglioma and pheochromocytoma Gastrointestinal Cholangiocarcinoma Colorectal Adenocarcinoma Liver Hepatocellular Carcinoma Pancreatic Ductal Adenocarcinoma Stomach-Esophageal Cancer Gynecological Cervical Cancer Ovarian Serous Cystadenocarcinoma Uterine Carcinosarcoma Uterine Corpus Endometrial Carcinoma Head and neck Squamous cell carcinoma Uveal melanoma Hematologic Acute myeloid leukemia Thymoma Skin Cutaneous melanoma Soft tissue Sarcoma Thoracic Lung Adenocarcinoma Lung Squamous Cell Carcinoma Mesothelioma Urologic Chromophobe Renal Cell Carcinoma Clear Cell Kidney Carcinoma Papillary Kidney Carcinoma Prostate Adenocarcinoma Testicular Germ Cell Cancer Urothelial Bladder Carcinoma

Datasets Data types Clinical data Images Microsatellite instability DNA sequencing mirna sequencing Protein expression mrna & RNA sequencing Array-based expression DNA methylation Copy number Data access tiers Open access - De-identified - Requires no certification Controlled access - No direct identifiers - Must complete Data Access Request (DAR) form

Click on Cases with Data for tumour of interest.

Data levels explained: https://tcgadata.nci.nih.gov/tcga/tcgadatatype.jsp The green A indicates that the relevant data is available for that sample

Generally: Level 1 = Raw data Level 2 = A little processed Level 3 = Normalised and processed Raw data is controlled access data and requires application to dbgap Raw sequence data is held at CGHub (https://cghub.ucsc.edu/ ) https://tcga-data.nci.nih.gov/tcga/tcgadatatype.jsp

Click on RNA-seq to select all RNA-seq samples

After filling out email and hitting download, a link to the achieve with the files below will be sent to you

TCGA data portal is difficult to use Data portal is great for downloading data in large tab delimited format perfect for a bioinformatician Data portal files are difficult to use for the average biologist Fortunately there are some alternatives: cbioportal (www.cbioportal.org/) ICGC data portal (http://dcc.icgc.org/)

cbioportal (www.cbioportal.org/ ) A data analysis portal to TCGA data Provides functions for visualisation, analysis and download of data. Maintained by Memorial Sloan-Kettering Cancer Center

Features of cbioportal Visualising frequency of mutations Correlation between occurrence of mutations Correlation of expression and CNV or methylation Visualisation of mutations Survival analysis Network analysis Gao et al (2013) Sci. Signal

In this query, we are telling cbioportal to perform an analyse comparing all AML samples with ERG mutation or CNA and those without ERG mutation nor CNA. Select cancer study (AML, Provisional) Select the type of aberration you are interested in (Mutations & CNA) Select the sample set (Tumour samples with CAN data) Type in gene - can accept any number. (For this example, we will look at ERG)

OncoPrint 9 out of 191 samples have alteration in ERG: - 8 samples have amplifications of ERG - 1 sample has a deep deletion of ERG

Plots correlation ERG expression with CNA Samples with amplification possibly have higher expression

Survival analysis We know that high ERG expression is associated with poor survival (Marcucci et al JCO 2005). Seems like ERG amplification is also associated with poor survival.

Network analysis This network analysis is not that interesting, but it could be more useful with a larger input gene set.

Bookmark You can make a URL to immediately share analysis with collaborators

Gene summaries across cancer types Select all cancers Select the type of aberration you are interested in (Mutations & CNA) Type in gene - can accept any number. (For this example, we will look at ERG)

Gene summaries across cancer types Aberration frequency Types of aberration Data types in analysis Cancer types

Cancer Sequencing Projects International Cancer Genome Consortium (ICGC) Collaboration between 22 countries Initiated in 2007 Aim: To catalogue genomic abnormalities in tumours from 50 different cancer types & subtypes Update today: 88 projects, 17 jurisdictions, >25,000 tumour genomes Uses data from TCGA and the Sanger Cancer Genome Project

Working groups

ICGC Samples

Mandatory: Data types Genomic DNA analyses of tumors (and matching control DNA) are core elements of the project. Complementary (Recommended): Additional studies of DNA methylation and RNA expression are recommended on the same samples that are used to find somatic mutations. Optional: Proteomic analyses Metabolomic analyses Immunohistochemical analyses

Data access policy

ICGC data portal (http://dcc.icgc.org/) Click on cancer projects

Cancer project view click on BRCA-US

Click on Genome Viewer

View top mutated genes

Or search by mutation ID/location

Looking at mutations in specific genes Type in BRAF here

Gene centric view General information Then scroll down

Gene centric view Types of cancers with the mutation

Gene centric view More detailed information

Gene centric view Click for more detail on the mutations

Hover over the section of the pie chart to see what region it represents Each mutation has a unique ID (click for more info)

Advanced Search Find out which cancers commonly have BRAF missense mutations Go to the home page and click advanced search

Advanced Search Find out which cancers commonly have BRAF missense mutations Search for BRAF

Advanced Search Find out which cancers commonly have BRAF missense mutations Go to the mutations tab Select missense Hover mouse to see details. Most common cancers with BRAF missense mutations are thyroid cancer and melanoma.

Limitations of data portal The data portal is mutation centric i.e. All queries are related to retrieving tumours/samples with particular mutations in a particular gene If you just want expression/methylation data for a particular gene you still have to download the data

Downloading data from ICGC Go to the home page and click advanced search

Downloading data from ICGC Click download donor data Select cancer type of interest Note: go back to Advance search on home page

Downloading data from ICGC Select the data types of interest Click Submit

Downloading data from ICGC Or download from the data repository

Click through filters to choose what data you want Then download the data you selected The advantage of ICGC is that data for all samples is in a single file so it is easier to work with in Excel (if file is small) or Galaxy (if file is big).

Summary - There are global cancer genome sequencing projects with publically available data - TCGA data can downloaded or easily viewed through cbioportal - ICGC data can be downloaded or viewed from the user interface

Exercises 1. Download patient clinical annotations for AML (TCGA dataset) using TCGA data portal and then using the ICGC data portal. 2. Using the ICGC data portal: a. What is the cancer with most frequent RUNX1 mutations? b. Which cancer has the most RUNX1 frameshift mutations? 3. Using cbioportal: a. Do kidney renal papillary cell carcinoma patients with BAP1 mutations have worse survival than those without?