Prognostic and predictive biomarkers
Marc Buyse
International Drug Development Institute (IDDI)
Louvain-la-Neuve, Belgium
marc.buyse@iddi.com
Prognostic biomarkers (example: gene signature)
PROGNOSTIC GENE SIGNATURES
Measure 25,000 genes in RNA from breast tumors
Apply algorithm to identify classifier
  Class of good prognosis
  Class of poor prognosis
MAMMAPRINT
Measure 25,000 genes in RNA from breast tumors (Agendia, 24,479 probe sets)
Apply algorithm to identify classifier
  Good prognosis (no metastases at 5 years)
  Poor prognosis (metastases within 5 years)
Ref: van 't Veer et al, Nature 2002; 415:539.
GENE SIGNATURES IN BREAST CANCER
70-gene «Amsterdam» signature (MammaPrint, Agendia)
76-gene «Rotterdam» signature (Veridex)
21-gene assay (Oncotype DX, Genomic Health)
97-gene «genomic grade» (MapQuant Dx, Ipsogen)
and others
These signatures were identified using different criteria and include different sets of genes. Yet they are broadly similar in their ability to classify patients as having a good or poor prognosis.
Ref: van de Vijver et al, NEJM 2002; 347:1999.
Impressive odds ratio:
OR = (31 / 18) / (3 / 26) = 15.0
Ref: van de Vijver et al, NEJM 2002; 347:1999.
BUT poor predictive accuracy
Sensitivity = 31 / 34 = .91
Specificity = 26 / 44 = .59
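The contrast between the odds ratio and the sensitivity/specificity pair can be checked from the same four counts. The sketch below assumes the 2x2 layout implied by the formulas on the slides (31 and 18 patients in the poor-prognosis class with and without metastases, 3 and 26 in the good-prognosis class); the function and variable names are illustrative only.

```python
# Counts read off the slides: rows are signature class, columns are 5-year outcome.
poor_met, poor_no_met = 31, 18   # poor-prognosis signature
good_met, good_no_met = 3, 26    # good-prognosis signature


def two_by_two_metrics(tp, fp, fn, tn):
    """Return (odds ratio, sensitivity, specificity) for a 2x2 table."""
    odds_ratio = (tp / fp) / (fn / tn)
    sensitivity = tp / (tp + fn)   # metastases correctly classed as poor prognosis
    specificity = tn / (tn + fp)   # non-metastases correctly classed as good prognosis
    return odds_ratio, sensitivity, specificity


odds_ratio, sensitivity, specificity = two_by_two_metrics(
    tp=poor_met, fp=poor_no_met, fn=good_met, tn=good_no_met
)
print(f"OR = {odds_ratio:.1f}")
print(f"Sensitivity = {sensitivity:.2f}, Specificity = {specificity:.2f}")
```

The unrounded odds ratio is about 14.9, which the slide reports as 15.0. The same table that gives this large odds ratio leaves 18 of 44 metastasis-free patients in the poor-prognosis class.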
Even Excellent Prognostic Models May Have Poor Discriminative Power
[Figure: sensitivity plotted against 1 - specificity, with the point sensitivity = 91%, specificity = 59% marked]
Ref: Pepe, Statist Med 2005; 24:3687.
ALL SIGNATURES HAVE POOR PREDICTIVE ACCURACY
[Figure: probability of distant metastasis at 5 years (0% to 100%) against relapse hazard score, for the good-prognosis and poor-prognosis groups]
Sensitivity: 52/56 = 93%
Specificity: 55/115 = 48%
Adapted from Foekens, Erasmus Medical Center, Rotterdam, the Netherlands
ALL SIGNATURES HAVE POOR PREDICTIVE ACCURACY
Average risk by score group:
  < 18:   7%
  18-30: 14%
  > 30:  31%
Ref: Paik et al, NEJM 2004; 351:2817.
Even Excellent Prognostic Models May Have Poor Discriminative Power
[Figure annotation: 2-20 months]
Ref: Royston et al, JNCI 2008; 100:92.
THE YOUDEN INDEX AS A MEASURE OF PREDICTIVE ACCURACY
Youden Index = Sensitivity + Specificity - 1
Scale: -1 (worst), 0 (useless*), 1 (perfect), with the region of interest marked on the scale
* Useless because a classification independent of true risk would classify patients equally well
Ref: Youden WJ. Index for rating diagnostic tests. Cancer 1950; 3:32.
Sensitivity = 31 / 34 = .91
Specificity = 26 / 44 = .59
Youden Index = .91 + .59 - 1 = .50
CLINICO-PATHOLOGICAL PROGNOSTIC FACTORS ALSO HAVE POOR PREDICTIVE ACCURACY
Metastases within 5 years                  Sensitivity   Specificity   Youden Index
Gene signature (Amsterdam validation)          0.91          0.59          0.50
Gene signature (independent validation)        0.90          0.42          0.32
Nottingham Prognostic Index                    0.91          0.32          0.23
Adjuvant! Online                               0.87          0.29          0.16
St Gallen criteria                             0.96          0.10          0.06
Ref: Buyse et al, JNCI 2006; 98:1183.
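As a quick check, the Youden Index column can be recomputed from the sensitivity and specificity values reported on the slide. This is only a sketch: the function name and the dictionary are illustrative, and the inputs are the rounded figures shown above, not the underlying patient counts.

```python
def youden_index(sensitivity, specificity):
    """Youden Index: -1 is worst, 0 is useless, 1 is perfect."""
    return sensitivity + specificity - 1


# (sensitivity, specificity) for metastases within 5 years, as reported on the slide.
classifiers = {
    "Gene signature (Amsterdam validation)": (0.91, 0.59),
    "Gene signature (independent validation)": (0.90, 0.42),
    "Nottingham Prognostic Index": (0.91, 0.32),
    "Adjuvant! Online": (0.87, 0.29),
    "St Gallen criteria": (0.96, 0.10),
}

youden = {name: youden_index(se, sp) for name, (se, sp) in classifiers.items()}
for name, y in youden.items():
    print(f"{name}: Y = {y:.2f}")
```

Every classification has high sensitivity, so the ranking is driven almost entirely by specificity.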
THE YOUDEN INDEX FOR EARLY BREAST CANCER PROGNOSTIC CLASSIFICATIONS
[Figure: classifications placed on the Youden Index scale from -1 to 1]
Amsterdam validation: Y = 0.50
Independent validation: Y = 0.32
Nottingham Prognostic Index: Y = 0.23
Adjuvant! Online: Y = 0.16
St Gallen: Y = 0.06
CLINICO-PATHOLOGICAL PROGNOSTIC FACTORS ALSO HAVE POOR PREDICTIVE ACCURACY
Metastases within 5 years                           Sensitivity   Specificity   Youden Index
Gene signature                                          0.90          0.42          0.32
Adjuvant! Online                                        0.87          0.29          0.16
Adjuvant! Online concordant with gene signature         0.93          0.28          0.21
Adjuvant! Online discordant with gene signature         0.40          0.30         -0.30
Ref: Buyse et al, JNCI 2006; 98:1183.
THE YOUDEN INDEX FOR EARLY BREAST CANCER PROGNOSTIC CLASSIFICATIONS
[Figure: classifications placed on the Youden Index scale from -1 to 1]
Adjuvant! discordant with gene signature: Y = -0.30
Adjuvant! Online: Y = 0.16
Adjuvant! concordant with gene signature: Y = 0.21
Independent validation: Y = 0.32
IRRELEVANT SIGNATURES DISCRIMINATE HIGH AND LOW RISK PATIENTS!
Effect of postprandial laughter on peripheral blood mononuclear cells
Skin fibroblast localization
Social defeat (mice brain)
Ref: Venet et al, PLoS Computational Biol 2011; 7:e1002240.
RANDOM SIGNATURES DISCRIMINATE HIGH AND LOW RISK PATIENTS!
Of 1890 signatures from the MSigDB database,
  67% are associated with breast cancer outcome at P < .05
  27% are associated with breast cancer outcome at P < 10^-5
Cell proliferation integrates most of the prognostic information present in the breast cancer transcriptome.
Ref: Venet et al, PLoS Computational Biol 2011; 7:e1002240.
SOME SIGNATURES REMAIN SIGNIFICANT AFTER ADJUSTMENT FOR PROLIFERATION
Red squares: OS hazard ratios after adjustment for the top 1% of genes most strongly correlated with proliferating cell nuclear antigen (PCNA)
Ref: Venet et al, PLoS Computational Biol 2011; 7:e1002240.
Predictive biomarkers (example: gene mutation)
A Predictive Biomarker in NSCLC
Gefitinib vs. Chemotherapy in NSCLC Ref: Mok et al, NEJM 2009; 361:947.
Gefitinib vs. Chemo: EGFR Mutation Ref: Slides courtesy of AstraZeneca
Gefitinib vs. Chemo: No EGFR Mutation Ref: Slides courtesy of AstraZeneca
Gefitinib is Either Better or Worse (Qualitative Interaction) Ref: Slides courtesy of AstraZeneca
Validation of Predictive Biomarkers
Need to show an interaction between the treatment effect and biomarker levels at baseline (or changes in the biomarker over time)
Hence, randomized evidence is usually needed (to estimate treatment effects by biomarker level reliably)
Moreover, the interaction test has very low power
Gefitinib is Either Better or Worse (Qualitative Interaction)
Treatment HR = 0.74
Interaction HR = HR(EGFR M+) / HR(EGFR M-) = 0.48 / 2.85 = 0.17
Interaction effect size >> treatment effect size
Interaction highly significant
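The comparison of effect sizes is easiest to see on the log scale, where the interaction is the difference between the two subgroup log hazard ratios. The sketch below uses only the three hazard ratios quoted on the slide; the variable names are illustrative, and no standard errors are available here, so it does not reproduce the significance test.

```python
import math

# Hazard ratios quoted on the slide (gefitinib vs. chemotherapy).
hr_overall = 0.74            # treatment effect in all patients
hr_mutation_positive = 0.48  # EGFR M+
hr_mutation_negative = 2.85  # EGFR M-

# Interaction HR: ratio of the subgroup-specific treatment effects.
interaction_hr = hr_mutation_positive / hr_mutation_negative

# On the log scale the interaction is a difference of log hazard ratios.
log_interaction = math.log(hr_mutation_positive) - math.log(hr_mutation_negative)
log_overall = math.log(hr_overall)

print(f"Interaction HR = {interaction_hr:.2f}")
print(f"|log interaction HR| = {abs(log_interaction):.2f}")
print(f"|log treatment HR|   = {abs(log_overall):.2f}")
```

The two subgroup hazard ratios lie on opposite sides of 1, which is what makes the interaction qualitative and not merely quantitative.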
Interaction Test Has a Very Low Power
Inflation factor required to increase the sample size so that the interaction test has the same power as the original sample size had for the overall treatment effect.
[Figure annotations: inflation factor 4 at a relative interaction size of 1; inflation factor 16 at a relative interaction size of 0.5]
Ref: Brookes et al, J Clin Epidemiol 2004; 57:229.
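The two values shown on the figure (4 and 16) are consistent with a simple rule of thumb, sketched below under stated assumptions: two equal-sized biomarker subgroups, 1:1 randomization and a common outcome variance, so that the interaction estimate has four times the variance of the overall treatment effect estimate. The function name and the closed form 4 / r^2 are an illustration of that setting, not a formula taken from the slides.

```python
def inflation_factor(relative_interaction_size):
    """Sample size multiplier needed for the interaction test to reach the
    power the original trial had for the overall treatment effect.

    relative_interaction_size: interaction effect divided by the overall
    treatment effect (assumed setting: equal subgroups, 1:1 allocation).
    """
    if relative_interaction_size <= 0:
        raise ValueError("relative interaction size must be positive")
    # Variance of the interaction estimate is 4x that of the overall effect,
    # and required sample size scales with 1 / (effect size squared).
    return 4.0 / relative_interaction_size ** 2


for ratio in (1.0, 0.5):
    print(f"relative size {ratio}: inflation factor {inflation_factor(ratio):.0f}")
```

Under these assumptions an interaction as large as the overall effect needs a fourfold sample size, and one half as large needs sixteenfold.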
Validation and clinical utility
Validation is relatively straightforward for prognostic biomarkers (independent of treatment), but what is the clinical relevance?
Validation is extremely challenging for predictive biomarkers (need for large randomized evidence), but the clinical relevance is clear!