Critical Appraisal Skills. Professor Dyfrig Hughes Health Economist AWMSG

Critical Appraisal Skills Professor Dyfrig Hughes Health Economist AWMSG

Critical appraisal of economic evaluations Quality of the underlying evidence Quality of the analysis Quality of reporting

1. Quality of the underlying evidence Consider the inputs to an economic evaluation Population Costs Utilities Events Event probabilities Treatment effect ADRs Survival..

Population Aligned with the licensed indication? Representative of eligible population in Wales? Considered uncertainty due to small populations? How does the modelled population reflect the trial population? Are there sub-groups that may be more relevant?

Resource use Resource implications should be identified, measured and valued within a Welsh context (i.e. using Welsh data on resource utilisation and unit costs). Submitted economic evaluations that do not include Welsh data are required to include a comment on the validity of using resource data from outside Wales, and make reference to any relevant differences in the healthcare environments. Data from any other UK country, or elsewhere, will not be accepted where Wales-specific data is available.

Costs Perspective Items of resources used (type) Is the list complete? Did they consider every possible cost impact of the new medicine? Items of resources used (number) How did they estimate the number of resources used? Based on expert opinion, evidence, data on file? Unit cost per item Standard sources? Relevant to Wales?

Costs Based on list price, not discounted (unless part of agreed PAS/WPAS)? Relevant to Wales where possible PEDW Patient Episode Database for Wales Comparable to HES in England tinyurl.com/pedwdata

Treatment effect Efficacy or effectiveness? May have been derived from more than one trial Appropriate systematic review and meta-analytic techniques required Non-inferiority equivalence Beware of selective use of sub-groups and inappropriate trials and whether the trial evidence ties in with the licensed indication(s) Always interested in the expected outcome e.g. mean survival (= area under the survival curve)

Effectiveness Indirect treatment comparisons only applicable if there are no direct trials of the relevant comparator Informed by a systematic review of the evidence Follow best practice: full details of SR, reasons for inclusion/exclusion, tests for heterogeneity, (in)consistency, etc.

Comparator Have all the appropriate comparators been considered? Have they modelled treatment pathways representative of care in Wales Did they seek (Welsh) clinical opinion?

Clinical inputs Adverse events Important source of morbidity and costs in many instances Need to make sure that the incidence of ADRs used in the model reflects that observed in the trial (or other) data source And that HR-QoL adjustments are made

Utilities Based predominantly on the EQ-5D (3L) Not validated in children Alternatives include HUI2, CHU-9D, AQoL-6D Are there quality of life impacts that are not captured in the QALY calculation? Use of disease-specific preference-based measures is increasing Sensitive to change, but not comparable across disease areas Companies should use primary QoL data from trial where possible and avoid unnecessary mapping

2. Quality of the analysis Extrapolating beyond trial data Trials are often of insufficient duration to capture all costs & consequences Leads to biased estimate of the ICER Modelled extrapolation e.g. Survival analysis Incorporate into Markov model

Modelling Models for extrapolation of benefit, specification of health states etc should be transparent, validated, subjected to different scenario analyses Did they consider alternative model specifications DES may be more applicable than Markov Is an overly complicated model necessary (reduces transparency)? Impact of structural uncertainty on the ICER

Markov Model Technique for analysing repeated events Models finite number of defined health states Useful for chronic diseases Individuals entering model progress from one state to another according to a set of transition probabilities

Markov Model Structure Patient Asymptomatic disease Progressive disease Pt 1 Pt 2 Death Pt n = transitional probability Pt 3

Markov Model Example 100 identical patients simultaneously enter the process If Pt 1 = 0.1; Pt 2 = 0.3; Pt 3 = 0.1 At t=t+1 80 patients will re-enter symptom-free 10 will progress to state of disease progression 10 will be dead At t=t+2 64 patients will remain symptom-free 15 will be in the state of disease progression 21 will be dead and so on..

100 Trial evidence Extrapolation 80 60 40 PFS Progression Dead 20 0 0 5 10 15 20 25

Trial Evidence Extrapolation Time Time Asymptomatic PFS Diseased Progressive Dead Dead Total Total 0 100 0 0 100 1 80 10 10 100 2 64 15 21 100 3 51.2 16.9 31.9 100 4 41.0 17.0 42.1 100 5 32.8 16.0 51.3 100 6 26.2 14.4 59.3 100 7 21.0 12.7 66.3 100 8 16.8 11.0 72.2 100 9 13.4 9.4 77.2 100 10 10.7 7.9 81.3 100 11 8.6 6.6 84.8 100 12 6.9 5.5 87.6 100 13 5.5 4.5 90.0 100 14 4.4 3.7 91.9 100 15 3.5 3.0 93.4 100 16 2.8 2.5 94.7 100 17 2.3 2.0 95.7 100 18 1.8 1.6 96.6 100 19 1.4 1.3 97.2 100 20 1.2 1.1 97.8 100 21 0.9 0.9 98.2 100

Analysis For each health state Attach a utility score Attach a cost Treatment might be expected to have effect on one of the transition probabilities Sum for Tx(A), Tx(B) over a lifetime Calculate ICER

Pt 1 = 0.1; Pt 2 = 0.3; Pt 3 = 0.1 Time PFS Prog Dead 0 100 0 0 1 80.0 10.0 10.0 2 64.0 15.0 21.0 3 51.2 16.9 31.9 19 1.4 1.3 97.2 20 1.2 1.1 97.8 21 0.9 0.9 98.2 22 0.7 0.7 98.6 23 0.6 0.6 98.8 Cost Utility PFS 400 0.8 Progressed 800 0.6 Expected QALYs = 5/12 = 0.42 QALYs Expected lifetime costs = 3,306

Pt 1 = 0.05; Pt 2 = 0.3; Pt 3 = 0.1 Time PFS Prog Dead 0 100 0 0 1 85.0 5.0 10.0 2 72.3 7.8 20.0 3 61.4 9.0 29.6 19 4.6 1.5 94.0 20 3.9 1.3 94.9 21 3.3 1.1 95.6 22 2.8 0.9 96.3 23 2.4 0.8 96.8 Cost Utility PFS 800 0.8 Progressed 800 0.6 Expected QALYs = 5.9/12 = 0.49 QALYs Expected lifetime costs = 6,079

ICER = 6079-3306 0.49 0.42 = 37,059 per QALY gained Not cost effective.

Extrapolation Beware submissions often choose parametric function based on one that makes ICER look lowest! Different parametric functions Diagnostics, visual inspection Based on fit to the observed data Duration of treatment benefit in extrapolated phase Nil Same as treatment phase and continues at the same level Diminishes in the long term Plausibility 12 week trial => lifetime benefit? Expert clinical opinion on plausibility

Curve fitting

Case study Sorafenib for the treatment of patients with advanced renal cell carcinoma who have failed prior interferon-α or interleukin-2 based therapy or are considered unsuitable for such therapy

Economic analysis Cost-utility model comparing sorafenib (plus BSC) with BSC alone Model was based on the TARGET trial, but results were extrapolated to a 10-year time horizon

Comparator Although [AWTTC] recognised that at the time of submission no other product was licensed, it remains that BSC is unlikely to be an appropriate comparator when treatment with interferon-α or interleukin-2 has failed

TARGET trial Almost 20% not previously exposed to cytokine (interferon-α or interleukin-2) based therapy Placebo control (BSC) Primary endpoint overall survival Secondary endpoint included Progressionfree survival

TARGET trial Because of benefits in progression-free survival, FDA advised to close the placebo arm, and allow cross-over After median 6.6 month follow-up Sorafenib group: 22% patients died Placebo group: 27% patients died HR 0.72, 95% CI 0.54 to 0.94

Economic model Patients entering the model were all in the progression free survival state Transition of patients among the health states estimated from the TARGET trial 3-month cycles Extrapolate from the endpoint of the trial to 10 years

Percentage of patients Percentage of patients Percenta Output 20% from model (trial time horizon) 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 40% 30% 10% 100% 0% 90% 0% 80% 70% 60% 50% 40% 30% 20% 10% 0% 1 4 7 10 13 16 19 22 25 28 31 34 Cycles elapsed PFS Progression Dead 1 4 7 10 13 16 19 22 25 28 31 34 37 40 Cycles elapsed 1 4 7 10 13 16 19 22 25 28 31 3 PFS Progression Dead

Percentage of patients Extrapolating to 10 years 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% 1 4 7 10 13 16 19 22 25 28 31 34 37 40 Cycles elapsed PFS Progression Dead

Model vs. Trial (median survival - months) Model Clinical trial Sorafenib 28.5 19.3 Placebo 13.5 15.9

What they should have done

Utility values used in the model Utility values were obtained from a survey of 31 UK clinicians, who were asked to complete the EQ-5D questionnaire For combined performance status 0 and 1 in the progressive health state, utility estimates ranged from -0.239 to 1.000 Could have done better..

Health Care resource use Derived from questionnaires with clinicians One set of clinicians (experienced in the use of sorafenib) provided data for healthcare resources used in the management of patients receiving sorafenib Another set of clinicians provided estimates in relation to BSC

Results Company s submission provided an estimate of 35,523 per QALY gained Shorter time horizon of 1-year yielded a cost per QALY estimate of 343,989

Sensitivity analysis No probabilistic analysis was performed Sensitivity analysis limited to parameter uncertainty, bounded by the 95% confidence interval, and choice of discount rate

Face validity Do the results seem plausible? E.g. Check whether small benefits (indicated by the RCT) translate to large QALY gains Does the company s base-case correspond with your preferred set of assumptions? Do utility values exceed population norms in treated patients? Are estimated QALY gains realistic based on the clinical trial data? Are findings consistent? How do they compare with other AWMSG/NICE determinations in comparable populations? How do the estimated life years gained compare with evidence taken directly from the RCTs?

Uncertainty...medicines with presented ICERs less than 20,000 per QALY gained may not be recommended if AWMSG are not persuaded by the plausibility of the inputs to the economic modelling and/or the certainty around the estimated ICER Above a most plausible ICER of 20,000 per QALY gained, judgements about the acceptability of the medicine as an effective use of NHS resources will specifically take account of...the degree of certainty surrounding the calculation of ICERs...

Structural uncertainty Scenario analyses Uncertainty Did they present the data for each and every scenario for you to decide which you consider to be plausible? Parameter uncertainty Sensitivity analyses on all key parameters Tornado diagrams, multi-way analyses Probabilistic Sensitivity Analysis C/E plane, % in each quadrant, CEACs Probability C/E at WTP 20k and 30k

3. Quality of reporting Husereau D, Drummond M, Petrou S, Carswell C, Moher D, Greenberg D, Augustovski F, Briggs AH, Mauskopf J, Loder E; CHEERS Task Force. Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement. BMJ. 2013 Mar 25;346:f1049.