The Cancer Genome Atlas July 14, 2011 Kenna M. Shaw, Ph.D. Deputy Director The Cancer Genome Atlas Program
TCGA: Core Objectives
TCGA launched in 2006 as a pilot and expanded in 2009. Its goals are to:
Establish the needed infrastructure, environment, and community where the big fish swim together
Develop a scalable pipeline beginning with the highest quality samples
Determine the feasibility of a large-scale, high-throughput, systematic approach to identifying all of the relevant genetic alterations in cancer
Systematically evaluate two cancers using a statistically robust sample set (500 cancers and matched controls)
Make the data publicly and broadly available to the cancer communities in a manner that protects patient privacy
TCGA: No Platform Left Behind
25 forms of cancer: glioblastoma multiforme (brain), squamous carcinoma (lung), serous cystadenocarcinoma (ovarian), etc.
Biospecimen Core Resource with more than 150 Tissue Source Sites
6 Cancer Genomic Characterization Centers
3 Genome Sequencing Centers
7 Genome Data Analysis Centers
Data Coordinating Center
Multiple data types:
Clinical diagnosis
Treatment history
Histologic diagnosis
Pathologic report/images
Tissue anatomic site
Surgical history
Gene expression/RNA sequence
Chromosomal copy number
Loss of heterozygosity
Methylation patterns
miRNA expression
DNA sequence
RPPA (protein)
Subset for Mass Spec
TCGA: Lessons Learned from the Pilot #1: It's About the Pathways, People!
TCGA: Lessons Learned from the Pilot #2: Comparing Cancer Types is Like Comparing Apples and Oranges
TCGA: Lessons Learned from the Pilot #3: If you build it [a data portal], they will come.
Making an Exhaustible Resource Inexhaustible #4: The model CAN work and we can make it happen.
TCGA: Lessons Learned from the Pilot #4: Slow and steady wins the race.
TCGA: Platforms - Then and Now
Platform | Pilot | Expansion
SNP/CNV | Affy SNP 6.0; Agilent CGH Array; Illumina 1M Duo | Affy SNP 6.0; Low Pass Sequencing*
Methylation | Infinium Array | Infinium Array
mRNA | Agilent 244K Array; Affy Human Exon Array; Affy U133 Array | RNAseq
miRNA | Agilent 8 x 15K Array | RNAseq
Mutation | 600-1000 genes | DNAseq: 90% whole exomes, 10% whole genomes
* Not all samples currently receive low pass sequencing for Copy Number/Rearrangement assays.
More information on platforms and data available at: http://tcga-data.nci.nih.gov/tcga/tcgaplatformdesign.jsp
TCGA Tumor Types
AML
Breast Ductal
Breast Lobular/Breast Other
Bladder (pap and non-pap)
Cervical adeno & squamous
Colon
Clear cell kidney
DLBCL
Endometrial carcinoma
Esophageal adeno & squamous
Gastric adenocarcinoma
GBM
Head and Neck Squamous
Hepatocellular
Lower Grade Glioma
Lung adeno
Lung squamous
Melanoma
Ovarian serous cystadenocarcinoma
Papillary kidney
Pancreas
Prostate
Rectal
Sarcoma (dediff lipo, UPS, leiomyosarcoma)
Thyroid
Sample Criteria Limit Askable Questions 10,000 10
Primary tumor only (except for melanoma)
Malignant (no in situ cases)
Snap frozen, <60 min from clamp to LN2
~50-100 mg (aka no biopsies)
Pathology review of tissue sent to TCGA
No more than 20% necrosis; at least 60% tumor cells
No prior treatment
Normal tissue: blood (buffy coat/white cells); some adjacent normal tissue allowable but limited
Clinical annotation
IRB approval for use in TCGA
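The sample criteria on this slide can be read as a simple eligibility checklist. The following is a minimal sketch of such a check, not TCGA's actual intake software. The `Sample` class, its field names, and `failed_criteria` are all hypothetical. It also assumes that "60% tumor cells" is a minimum, that 50 mg is a lower bound on tissue mass, and that every sample needs some normal tissue (blood or adjacent).

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Sample:
    """One candidate tumor sample. All field names are hypothetical."""
    tumor_type: str
    is_primary: bool
    is_malignant: bool             # False for in situ cases
    snap_frozen: bool
    minutes_clamp_to_ln2: float    # time from clamp to liquid nitrogen
    mass_mg: float
    pathology_reviewed: bool
    percent_necrosis: float
    percent_tumor_cells: float
    prior_treatment: bool
    has_normal_tissue: bool        # blood or adjacent normal tissue
    has_clinical_annotation: bool
    irb_approved: bool


def failed_criteria(s: Sample) -> List[str]:
    """Return the slide's criteria that the sample does not meet."""
    failures = []
    if not s.is_primary and s.tumor_type.lower() != "melanoma":
        failures.append("primary tumor only (except melanoma)")
    if not s.is_malignant:
        failures.append("malignant (no in situ cases)")
    if not s.snap_frozen or s.minutes_clamp_to_ln2 >= 60:
        failures.append("snap frozen, <60 min from clamp to LN2")
    if s.mass_mg < 50:  # assumption: 50 mg treated as a lower bound
        failures.append("about 50-100 mg of tissue")
    if not s.pathology_reviewed:
        failures.append("pathology review")
    if s.percent_necrosis > 20:
        failures.append("no more than 20% necrosis")
    if s.percent_tumor_cells < 60:  # assumption: 60% is a minimum
        failures.append("at least 60% tumor cells")
    if s.prior_treatment:
        failures.append("no prior treatment")
    if not s.has_normal_tissue:
        failures.append("matched normal tissue")
    if not s.has_clinical_annotation:
        failures.append("clinical annotation")
    if not s.irb_approved:
        failures.append("IRB approval for use in TCGA")
    return failures


# Illustrative values only; these are not real TCGA records.
good = Sample("GBM", True, True, True, 30, 80, True, 10, 75,
              False, True, True, True)
late = replace(good, minutes_clamp_to_ln2=90, percent_necrosis=35)

print(failed_criteria(good))
print(failed_criteria(late))
```

Returning the list of unmet criteria, rather than a single pass/fail flag, shows which requirement excludes a sample. That matters here because each criterion narrows the questions the resulting data set can answer.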
Tumor Project Progress
Race & Ethnicity Data Summary
Need to collaborate with biobanks that serve more diverse communities
SNP data might be a better metric for some information due to a) limited success in getting data and b) concerns with self-reported data
Acknowledgements
Margi Sheth (Tumor Project Groups)
John Demchok (Clinical Data Quality Manager)
Martin Ferguson (Clinical Informatics)
Julie Gastier-Foster/Robert Penny (BCRs)
TCGA Research Network
Kenna Shaw: shawk@mail.nih.gov
First Annual TCGA Scientific Symposium: http://www.capconcorp.com/meeting/2011/tcga/default.asp