Propensity Scores; Generalising Results from RCTs

Robbie Peck, University of Bath, June 5, 2016

The Idea

Randomised Controlled Trials (RCTs) are the gold standard for estimating the effect of a treatment on healthy patients under controlled conditions. They are typically small in scale and measure efficacy.

But will the treatment be effective in a particular target population? That question concerns effectiveness, and clinical trial participants might not be representative of the target population.

Definition: The propensity score of a patient is the probability that the patient takes part in the RCT, given the patient's covariates.

Propensity scores can be used to reweight the results of an RCT in order to estimate the effectiveness of a drug in a target population.

Problem Formulation

$\Omega$ = the target population of interest.
$X_i$ = the covariates of patient $i$, $i \in \Omega$.
$\Phi \subset \Omega$ = the subset of patients on which an RCT is performed.
$S_i$ = indicator variable for patient $i$ being in $\Phi$.
$T_i$ = indicator variable for patient $i$ being in the treatment group (as opposed to the control group) of the RCT.

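RCT average treatment effect:

$$\frac{1}{|\Phi|} \sum_{i : S_i = 1} \left[ Y_i(1) - Y_i(0) \right]$$

Target population average treatment effect:

$$\frac{1}{|\Omega|} \sum_{i=1}^{|\Omega|} \left[ Y_i(1) - Y_i(0) \right]$$

Here $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes of patient $i$ under treatment and under control. If $\Phi$ is representative of the target population $\Omega$, these two quantities should be equal.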
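To make the two averages concrete, here is a minimal Python sketch on made-up data. The six patients and their potential outcomes are hypothetical; in a real study only one potential outcome per patient is ever observed, so both are listed purely to illustrate the definitions.

```python
from statistics import mean

# Hypothetical target population of six patients.
# s = 1 if the patient is in the trial sample, y0 / y1 = potential outcomes.
patients = [
    {"s": 1, "y0": 2.0, "y1": 5.0},
    {"s": 1, "y0": 1.0, "y1": 4.0},
    {"s": 0, "y0": 3.0, "y1": 4.0},
    {"s": 0, "y0": 2.0, "y1": 3.0},
    {"s": 0, "y0": 4.0, "y1": 5.0},
    {"s": 0, "y0": 1.0, "y1": 2.0},
]


def average_effect(rows):
    """Mean of Y(1) - Y(0) over the given patients."""
    return mean(r["y1"] - r["y0"] for r in rows)


# Average over the trial sample only (S_i = 1).
trial_ate = average_effect([r for r in patients if r["s"] == 1])
# Average over the whole target population.
target_ate = average_effect(patients)

print(trial_ate, target_ate)
```

In this invented example the trial sample responds more strongly than the rest of the population, so the two averages differ, which is exactly the situation the reweighting is meant to address.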

Three Assumptions

1. $P(S_i = 1 \mid X_i) > 0$ for any covariates $X_i$. All patients in the target population have some probability of being included in the RCT.
2. $S \perp [Y(0), Y(1)] \mid X$. There are no unmeasured confounders (variables related to both trial sample selection and treatment effect).
3. $T \perp [S, Y(0), Y(1)] \mid X$. Treatment in the RCT is randomly assigned, independently of sample selection and responses.

Estimating the Propensity Scores

Definition: The propensity score of patient $i$ is $p_i = P(S_i = 1 \mid X_i)$.

Given the observed indicators $\{S_i\}_{i \in \Omega}$ and covariates $\{X_i\}_{i \in \Omega}$, estimate the propensity scores by logistic regression:

$$S_i \sim \mathrm{Bern}(p_i), \qquad \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 X_{1,i} + \dots + \beta_k X_{k,i}$$
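The logistic regression step could be sketched as follows. This is an illustrative, dependency-free fit by plain gradient ascent on the Bernoulli log-likelihood, not the fitting routine used in any particular study; the data, the learning rate and the iteration count are all assumptions chosen for the example. In practice a standard statistics package would be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(beta, x):
    """beta_0 + beta_1 * x_1 + ... + beta_k * x_k."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logistic(X, s, lr=0.1, n_iter=5000):
    """Fit the logit model for P(S = 1 | X) by gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, si in zip(X, s):
            err = si - sigmoid(linear_predictor(beta, x))
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        # Step along the average gradient of the log-likelihood.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: one covariate per patient, S_i = 1 for trial members.
X = [[-2.0], [-1.0], [-1.0], [0.0], [0.0], [1.0], [1.0], [2.0]]
s = [0, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logistic(X, s)
p_hat = [sigmoid(linear_predictor(beta, x)) for x in X]
print(beta, p_hat)
```

Patients with larger covariate values are more often in the trial in this toy data, so the fitted slope is positive and the estimated propensity scores increase with the covariate.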

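Assessing Generalisability

Propensity score difference := the difference between the mean estimated propensity scores of patients in the trial sample $\Phi$ and of the remaining patients in the target population $\Omega$:

$$\Delta_p = \frac{1}{|\Phi|} \sum_{i : S_i = 1} \hat{p}_i - \frac{1}{|\Omega \setminus \Phi|} \sum_{i : S_i = 0} \hat{p}_i$$

If the propensity score means differ by more than 0.25 standard deviations, then the clinical trial population $\Phi$ may not be representative of $\Omega$.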
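A small sketch of this check in Python follows. The estimated scores are invented, and the choice of standard deviation (here the population standard deviation of all estimated scores in the target population) is an assumption of this example, since the slides do not specify which one is meant.

```python
from statistics import mean, pstdev


def propensity_score_difference(p_hat, s):
    """Mean estimated score among trial members minus mean among the rest."""
    in_trial = [p for p, si in zip(p_hat, s) if si == 1]
    not_in_trial = [p for p, si in zip(p_hat, s) if si == 0]
    return mean(in_trial) - mean(not_in_trial)


# Hypothetical estimated propensity scores and trial indicators.
p_hat = [0.6, 0.5, 0.2, 0.1, 0.3, 0.1]
s = [1, 1, 0, 0, 0, 0]

delta_p = propensity_score_difference(p_hat, s)
# Assumption: standardise by the sd of all estimated scores.
delta_in_sd = delta_p / pstdev(p_hat)
may_not_generalise = abs(delta_in_sd) > 0.25

print(delta_p, delta_in_sd, may_not_generalise)
```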

To see if propensity score weighting works: weight the RCT control group so that its characteristics resemble those of the target population, and then compare the responses.

One propensity score method is Inverse Probability of Treatment Weighting (IPoTW): each individual is given a weight $1 / \hat{p}_i(X_i)$. Under our three assumptions, the responses of the RCT control group under this weighting are an unbiased estimate of those in the target population.
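The weighting of the control group might look like the sketch below. The two control patients and their scores are hypothetical, and normalising by the sum of the weights is an assumption of this example rather than something stated on the slides.

```python
def weighted_mean(values, weights):
    """Normalised weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical RCT control group: estimated propensity scores and responses.
control_p_hat = [0.5, 0.25]
control_y = [10.0, 20.0]

# Each control patient gets the weight 1 / p_hat.
weights = [1.0 / p for p in control_p_hat]

unweighted = sum(control_y) / len(control_y)
weighted = weighted_mean(control_y, weights)

print(unweighted, weighted)
```

The patient who was less likely to enter the trial counts for more, so the weighted mean moves towards that patient's response. The weighted mean is the quantity to compare with the responses observed in the target population.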

IPoTW can be unstable and give very large weights when estimated propensity scores are close to zero.

Subclassification and Full Matching use coarser weights by grouping individuals with similar propensity scores.

R package: Matching
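As an illustration of the coarser weights, here is one possible form of subclassification in Python (full matching is not shown). The cutpoint, the data and the weight formula (number of target population members in a subclass divided by the number of trial members in it) are assumptions of this sketch, not taken from the slides or from the Matching package.

```python
def subclass_weights(p_hat, s, cutpoints):
    """Give every trial member the weight
    (population count in its subclass) / (trial count in its subclass).
    Patients outside the trial get weight 0."""

    def subclass(p):
        # Index of the subclass: number of cutpoints lying below p.
        return sum(p > c for c in cutpoints)

    n_pop, n_trial = {}, {}
    for p, si in zip(p_hat, s):
        k = subclass(p)
        n_pop[k] = n_pop.get(k, 0) + 1
        if si == 1:
            n_trial[k] = n_trial.get(k, 0) + 1

    return [
        n_pop[subclass(p)] / n_trial[subclass(p)] if si == 1 else 0.0
        for p, si in zip(p_hat, s)
    ]


# Hypothetical estimated scores for a population of eight, four in the trial.
p_hat = [0.1, 0.15, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9]
s = [0, 0, 0, 1, 0, 1, 1, 1]

weights = subclass_weights(p_hat, s, cutpoints=[0.5])
print(weights)
```

All trial members in the same subclass share one weight, so no single very small propensity score can dominate. The sketch assumes every subclass contains at least one trial member.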

Estimating the Treatment Effect in the Target Population

Assuming we are satisfied that the weighting works for the control group, apply the same propensity score methods to the treatment group.

As before, weight the patients in the treatment group of the RCT. Compare the treatment effect in this weighted sample with that in the target population.
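One way to turn this into a number is to take the difference between the weighted mean response of the treated patients and that of the control patients. The sketch below does this with weights 1 / p_hat; the four trial patients are hypothetical and the estimator is an assumption of this example.

```python
def weighted_mean(values, weights):
    """Normalised weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical trial data: (estimated propensity score, treatment flag, response).
trial = [
    (0.50, 1, 12.0),
    (0.25, 1, 16.0),
    (0.50, 0, 10.0),
    (0.25, 0, 13.0),
]


def group_mean(rows, t, weighted=True):
    """Mean response in arm t, weighted by 1 / p_hat if requested."""
    ys = [y for p, ti, y in rows if ti == t]
    ws = [1.0 / p if weighted else 1.0 for p, ti, y in rows if ti == t]
    return weighted_mean(ys, ws)


# Unweighted effect: applies to the trial sample itself.
trial_effect = group_mean(trial, 1, weighted=False) - group_mean(trial, 0, weighted=False)
# Weighted effect: an estimate for the target population.
target_effect = group_mean(trial, 1) - group_mean(trial, 0)

print(trial_effect, target_effect)
```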

Application: PBIS Study in Maryland, USA

PBIS is a school prevention program which aims to "improve the school climate through better systems and procedures".

37 Maryland schools took part in an RCT to investigate the effect of PBIS in 2002-2003. The 37 schools were randomised into control and treatment groups. Define the target population as Maryland schools.

Main point: the characteristics (covariates) of the schools in the RCT differ from those in the target population. Schools in the trial had lower test scores and higher rates of free school meals than those in the target population.

Figure: Propensity score density of schools in the target population, with the propensity scores of the 37 schools in the RCT shown as vertical lines.

Mean propensity score difference $\Delta_p = 0.055$, so $\Phi$ is not representative of $\Omega$.

Figure: Observed and predicted Maths outcomes for schools in Maryland. Thick line: target population scores. Dashed line: RCT control group scores. Thin line: propensity score weighted RCT control group scores.

"Despite the differences seen between the trial and non-trial schools, it appears that the control schools in the trial reflect what was happening across the state as a whole, when weighted up to represent the population" - Stuart et al. (2010) [2]

Therefore, if we weight the RCT treatment group schools in the same way, we obtain an unbiased assessment of the effectiveness of PBIS in the target population of Maryland schools.

Link to ITT4: Trials Within Cohorts (TwiCs) [1]

Idea: Recruit a large observational cohort of patients with a condition. Regularly measure responses. Perform repeated RCTs over time on subsets of the cohort.

In each RCT: randomly select some patients, who are offered the treatment. Compare their responses with those of the remaining patients.
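The random selection step within one such RCT could be sketched as below. The cohort identifiers, the number of patients offered treatment and the function name select_for_offer are all hypothetical, and the fixed seed is used only to make the illustration reproducible.

```python
import random


def select_for_offer(eligible_ids, n_offer, seed=None):
    """Randomly pick n_offer patients to be offered the treatment.
    Returns (offered, comparison): two disjoint lists covering eligible_ids."""
    rng = random.Random(seed)
    offered = set(rng.sample(eligible_ids, n_offer))
    comparison = [i for i in eligible_ids if i not in offered]
    return sorted(offered), comparison


# Hypothetical cohort of 20 eligible patients, 5 of whom are offered treatment.
cohort = list(range(20))
offered, comparison = select_for_offer(cohort, n_offer=5, seed=1)

print(offered, comparison)
```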

Bibliography

[1] Relton C, Torgerson D, O'Cathain A, Nicholl J. Rethinking pragmatic randomised controlled trials: introducing the cohort multiple randomised controlled trial design. BMJ, 340:c1066, 2010.
[2] Stuart E, Cole S, Bradshaw C, Leaf P. The use of propensity scores to assess the generalizability of results from randomized trials. J R Stat Soc Ser A Stat Soc, 2010.