Propensity Score Methods for Longitudinal Data Analyses: General background, rationale and illustrations*


Bob Pruzek, University at Albany SUNY

Summary: Propensity Score Analysis (PSA) was introduced by Rosenbaum & Rubin (Biometrika, 1983). Since then PSA has become one of the most studied and frequently used new methods in statistics. Hundreds of papers have appeared covering philosophy, statistical theory, and a wide variety of applications (especially in health science and medicine). I focus on the background and logical foundations of PSA, then present and discuss graphics that illustrate various PSA methods; lastly, I describe with examples how conventional PSA methodology can be extended to accommodate longitudinal data analysis. It is noted that while longitudinal PSA often entails notable complications, special advantages can accrue to LDA-PSA if attention is given to certain aspects of observational study design.

*Talk for the INTEGRATIVE ANALYSIS OF LONGITUDINAL STUDIES OF AGING Conference, Victoria, BC, June 2010

PSA is based on the same logic that underpins analyses of true experiments. In true experiments, units are randomly allocated to (two) treatment groups at the outset of the study, that is, before the treatments begin. Randomization, in the words of R. A. Fisher, provides the reasoned basis for causal inference in experiments. Randomization ensures that units in the two treatment groups do not differ systematically on any covariate, which is why this operation supports causal interpretations: when one group scores notably higher than another on the ultimate response variable(s), this can (with qualifications*) be attributed to treatment differences; random assignment tends to make alternative explanations implausible.

*Three caveats, at least, are in order: 1. Randomization can go awry in practice, particularly when samples are not large; 2. Much depends on the details of how experiments are run; and 3. To say that treatments caused differences is not to say that one knows what feature(s) of the treatments had the noted effects. Statisticians generally study effects of causes, not causes of effects.

Observational studies entail comparison of groups not formed using randomization; units are said to select their own treatments. This means that observational studies give rise to a greater likelihood of Selection Bias (SB). SB refers to systematic covariate differences between groups, differences that can confound attempts to interpret response-variable differences. SB is the central problem that propensity score analysis aims to reduce, if not eliminate (usually, but not always, in the context of observational studies). This tends to be facilitated if one conceptualizes each observational study as having arisen from a (complex) randomized experiment. Three people have written the key articles and books that underpin propensity score methods: William Cochran; his student, Donald Rubin; and then Rubin's student, Paul Rosenbaum. A review of one of Cochran's studies, done 40 years ago, is worth brief examination.

Cochran (1968) compared death rates of smokers and non-smokers. It had been found, using unstratified data, that death rates for smokers and non-smokers were nearly identical (evidence that many smokers and manufacturers of tobacco products found greatly to their liking). Cochran decided to reanalyze the data after stratifying both smokers and non-smokers by age before computing death rates. After age-based stratification he re-calculated death rates. This led to the finding that death rates among smokers were on average 40-50% higher than for non-smokers! Moreover, this was for very large samples. Results of this kind represent early versions of what can now be seen as propensity score analysis. The advent of modern PSA methods helps investigators adjust for multiple covariates, not just one as in Cochran's case.
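As a minimal sketch of this kind of age-stratified (directly standardized) comparison, here is one way it might be done in R. The data, variable names, and strata below are entirely hypothetical and only illustrate the idea; they are not Cochran's data.

```r
## Hypothetical data: one row per person, with smoking status, age, and a death indicator.
set.seed(1)
d <- data.frame(
  smoker = rbinom(5000, 1, 0.4),
  age    = sample(20:90, 5000, replace = TRUE)
)
d$died <- rbinom(5000, 1, plogis(-6 + 0.07 * d$age + 0.4 * d$smoker))

## Unstratified (crude) death rates by smoking status
tapply(d$died, d$smoker, mean)

## Stratify by age group, compute death rates within strata,
## then average across strata using a common set of weights (overall stratum sizes).
d$age.grp <- cut(d$age, breaks = c(20, 40, 55, 70, 90), include.lowest = TRUE)
rates <- tapply(d$died, list(d$age.grp, d$smoker), mean)
w     <- prop.table(table(d$age.grp))     # common weights across strata
adj   <- colSums(rates * as.numeric(w))   # age-standardized rates per group
adj["1"] - adj["0"]                       # adjusted rate difference
```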

When there are many potential confounding variables in an observational study, direct stratification is unwieldy because the number of cells associated with the crossing of covariates is often huge; also, missing values will be found in many cells. Nevertheless, numerous covariates can be expected to confound interpretations. For many years analysts found it especially difficult to account for confounding effects. The key breakthrough came when Rosenbaum and Rubin (1983) showed how to produce a single variable, a propensity score, the use of which could greatly simplify treatment comparisons in observational studies. They noted that conditions may exist where treatment assignment Z (binary) is independent of the potential outcomes* Y(0) and Y(1), conditional on the observed baseline covariates X. That is, (Y(1), Y(0)) ⊥ Z | X, provided 0 < P(Z = 1 | X) < 1. This condition was defined as strong ignorability, which essentially means that all covariates that affect treatment assignment are included in X.

*Reference to potential outcomes invokes counterfactual logic.
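For readers who want the condition written out more formally, here is one standard way to state strong ignorability and the identification result it licenses. This is a sketch in conventional potential-outcomes notation; the slide itself does not spell out the derivation.

```latex
% Strong ignorability (Rosenbaum & Rubin, 1983):
%   (i) unconfoundedness given X, and (ii) overlap (positivity).
\big(Y(1), Y(0)\big) \perp\!\!\!\perp Z \mid X,
\qquad 0 < \Pr(Z = 1 \mid X) < 1 .

% Under these two conditions the average treatment effect is identified from
% observed data by averaging covariate-conditional group differences:
\mathrm{ATE}
  = \mathbb{E}\!\left[\,Y(1) - Y(0)\,\right]
  = \mathbb{E}_{X}\!\left[\,
      \mathbb{E}(Y \mid Z = 1, X) - \mathbb{E}(Y \mid Z = 0, X)
    \,\right].
```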

These authors defined the propensity score e(x) (a scalar function of X) as the probability of treatment assignment, conditional on the observed baseline covariates: e(x) = e_i = Pr(Z_i = 1 | X_i). They then demonstrated that the propensity score is a balancing score, meaning that, conditional on the propensity score, the distribution of measured baseline covariates is similar for the treated and untreated (or treatment and control) subjects. Therefore (Y(1), Y(0)) ⊥ Z | e(X), an analog of the preceding expression. In effect, e(x) summarizes the information in X. Rosenbaum and Rubin rely strongly on the assumption of strong ignorability. In practice, the preceding leads to an interest in estimating the (scalar) propensity score from the (vector of) appropriately chosen covariates, say X, so that comparisons of treatment and control response-score distributions can be made conditional on an estimated propensity score. The most common method for estimating e(x) entails use of logistic regression (LR).
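A minimal sketch of this estimation step in R follows. The data frame, covariate names, and treatment indicator are hypothetical placeholders (not from the talk); only the use of logistic regression fitted values as estimated propensity scores mirrors the text.

```r
## Hypothetical observational data: z = treatment indicator (0/1),
## x1, x2, x3 = baseline covariates measured before treatment.
set.seed(2)
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n, 1, 0.5))
df$z <- rbinom(n, 1, plogis(-0.3 + 0.8 * df$x1 - 0.5 * df$x3))

## Phase I: estimate e(x) = Pr(Z = 1 | X) by logistic regression.
## Main effects only here; interactions can be added when substantively relevant.
ps.model  <- glm(z ~ x1 + x2 + x3, family = binomial, data = df)
df$pscore <- fitted(ps.model)   # fitted probabilities = estimated propensity scores
head(df$pscore)
```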

In practice, there are two main phases of a propensity score analysis. In Phase I, pre-treatment covariates are used to construct a scalar variable, a propensity score, that summarizes key differences among units (or respondents) with respect to the two* treatments being compared. Generally the fitted values produced by logistic regression are taken as estimates of the propensity scores, the e(x)'s: e(x) = 1/(1 + e^-{linear function of covariates}). These e(x)'s are then used in Phase II in either of two ways: units in the treatment and control groups are either matched or stratified (sorted); then the two groups are compared on one or more outcome measures, conditional on the matches or strata. For matching, an algorithm or rule is used to match individuals in the T & C groups whose P-scores are reasonably close to one another; numerous methods are available. With stratification, responses of units in the two groups are compared within propensity-based strata. Both methods are illustrated below.

*Except for recent work, nearly all PSAs to date have focused on two groups. See my wiki: propensityscoreanalysis.pbworks.com
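Continuing the hypothetical example above, here is a minimal Phase II sketch using quintile stratification on the estimated propensity score. Matching could be used instead (e.g., via a dedicated matching package); this base-R version only illustrates the stratified comparison, and the outcome model is again invented.

```r
## Phase II (stratification): sort units into five propensity-score strata,
## compare treated and control outcomes within strata, then average.
df$y <- 2 + 1.5 * df$x1 - 1 * df$x3 + 0.5 * df$z + rnorm(nrow(df))  # hypothetical outcome

df$stratum <- cut(df$pscore,
                  breaks = quantile(df$pscore, probs = seq(0, 1, by = 0.2)),
                  include.lowest = TRUE, labels = FALSE)

## Within-stratum mean differences (treated minus control)
strata.diff <- sapply(split(df, df$stratum), function(s)
  mean(s$y[s$z == 1]) - mean(s$y[s$z == 0]))

## Average the stratum effects, weighting by stratum size (an ATE-type estimate;
## weighting by treated counts instead would target the ATT).
weights <- table(df$stratum) / nrow(df)
sum(strata.diff * as.numeric(weights))
```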

The following slide exhibits a flow chart showing how propensity score analysis proceeds when comparing two groups (to be read counterclockwise from the NW corner). Covariate selection is central: once the T & C groups have been defined, the key problem is to decide which covariates should be balanced for the T vs. C comparison; theory and prior evidence come into play. Use of all relevant covariates is advised; they should relate to the ultimate response variable as well as to the T vs. C distinction. Logistic regression modeling should consider main effects as well as interactions (based on substantive relevance and empirics). Once propensity scores have been calculated, it is helpful to demonstrate overlap of the P-score distributions for the T & C groups. Either or both of matching and stratification are generally used for analyses; the estimands, however, differ in the two cases (ATT, ATE). Outcomes are readily compared across the range of P-scores; see the loess graphic that follows. For matched data, either dependent- or independent-sample statistical methods and graphics may be used.
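One simple way to examine overlap of the P-score distributions is sketched below, continuing the hypothetical data frame from the earlier sketches; nothing here is specific to the talk's flow chart.

```r
## Visual check of P-score overlap for treated vs. control
## (df$pscore and df$z come from the earlier hypothetical glm sketch).
plot(density(df$pscore[df$z == 1]), col = "red", lwd = 2,
     xlim = c(0, 1), main = "Estimated propensity scores by group",
     xlab = "Estimated propensity score")
lines(density(df$pscore[df$z == 0]), col = "blue", lwd = 2)
legend("topright", legend = c("Treated", "Control"),
       col = c("red", "blue"), lwd = 2)

## Or, side-by-side boxplots
boxplot(pscore ~ z, data = df, names = c("Control", "Treated"),
        ylab = "Estimated propensity score")
```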

The next slide illustrates a Phase II analysis, where loess regression was used to compare infant birth weights of mothers who smoked (treatment group) with mothers who did not. Birth weights (in lbs.) are plotted (vertical) against LR-derived propensity scores (horizontal) for n = 189 infants. Two loess regression lines (dashed and solid) are shown, for infants of smoking (darkened points) and non-smoking (open circles) mothers. Vertical dashed lines depict eight quantile-based strata; effects are assessed within strata (and then averaged). In this case, after adjusting for covariate effects using P-scores, it is seen that birth weights are notably lower for infants whose mothers smoked than for controls. (Notably, overlap of the two P-score distributions provided reasonable support for the comparison, and all covariates were reasonably balanced across the eight P-score strata.) To complete the illustration, note that the Average Treatment Effect was 0.84 lbs., and the 95% CI yields the limits (0.30, 1.38), failing to span zero. (The graphic is based on the function loess.psa from the PSAgraphics package (R).)
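The slide's example appears consistent with the classic birthwt data in the MASS package (189 infants), though the slide does not say so; assuming that data set, a sketch along these lines could produce the kind of display described. The covariate choice is illustrative, and the argument order for loess.psa is assumed to be response, treatment, propensity (verify with ?loess.psa).

```r
## A sketch only: assumes MASS::birthwt (189 infants) and the argument order
## loess.psa(response, treatment, propensity); neither is confirmed by the slide.
library(MASS)         # birthwt data
library(PSAgraphics)  # loess.psa

data(birthwt)
bw <- birthwt
bw$bwt.lbs <- bw$bwt / 453.6   # grams to pounds

## Phase I: propensity of smoking from baseline covariates (an illustrative model)
ps.fit <- glm(smoke ~ age + lwt + factor(race) + ptl + ht + ui + ftv,
              family = binomial, data = bw)
bw$pscore <- fitted(ps.fit)

## Phase II: loess comparison of birth weight across the propensity-score range
loess.psa(bw$bwt.lbs, bw$smoke, bw$pscore)
```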

The next slide illustrates matching* in an observational study by Morton et al. (1982, Amer. Jour. Epidemiology, p. 549 ff) that entailed an especially simple form of propensity score analysis. Children of parents who had worked in a factory where lead was used in making batteries were matched by age and neighborhood with children whose parents did not work in lead-based industries. Whole blood was assessed for lead content to provide the responses. The results shown compare blood of Exposed with that of Control children in what can be seen as a paired-samples design. A conventional dependent-sample analysis shows that the (95%) C.I. for the population mean difference is far from zero (see line segment, lower left). The mean difference score is 5.78; the results support the conclusion that a parent's lead-related occupation can cause lead to be found in their children's blood.

*Using function granova.ds in package granova (R). The heavy black line on the diagonal corresponds to X = Y, so if X > Y its point lies below the identity line. Parallel projections to the lower-left line segment show the distribution of difference scores corresponding to the pairs; the red dashed line shows the average difference score, and the green line segment shows the 95% C.I.
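A minimal sketch of this kind of dependent-sample analysis is given below. The paired lead values are made up for illustration (they are not Morton's data), and granova.ds is assumed to accept a two-column data frame of paired scores; check ?granova.ds.

```r
## Hypothetical paired lead measurements, one value per matched child;
## these numbers are illustrative only, not Morton's data.
set.seed(3)
exposed <- round(rnorm(33, mean = 21, sd = 8))
control <- round(rnorm(33, mean = 15, sd = 5))

## Conventional dependent-sample analysis: CI for the mean of the pair differences
t.test(exposed, control, paired = TRUE)

## Dependent-sample graphic; granova.ds is assumed to take a two-column
## data frame of paired scores (verify with ?granova.ds).
library(granova)
granova.ds(data.frame(exposed, control))
```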


A graphic allows one to go beyond a numerical summary. In this case, note the wide dispersion of lead measurements for exposed children in comparison with their control counterparts. A follow-up showed that parental hygiene differed considerably across the battery-factory parents, and the variation in hygiene accounted in large measure for the dispersion of their children's lead measurements (a finding made possible because of Morton's close attention to detail in the initial data collection). Although Control and Exposed children may differ in other ways (than age and neighborhood of residence), these data seem persuasive in showing that lead-based battery-factory work puts children at risk for high levels of blood lead, except when the personal hygiene of the worker is effective. Rosenbaum (2002), who discusses this example in detail, uses a sensitivity analysis to show that the hidden bias would have to be substantial to explain away a difference this large. Sensitivity analyses can be essential to the wrap-up of a PSA study. In summary, these observational data appear to provide valuable evidence to support causal conclusions re: the hypothesis.
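For matched pairs, a Rosenbaum-style sensitivity analysis can be sketched with the rbounds package. The psens call below follows my understanding of its interface, psens(treated, control, Gamma, GammaInc); verify against the package documentation. The data are the hypothetical values from the previous sketch, not Rosenbaum's or Morton's.

```r
## Rosenbaum bounds for the matched-pair comparison: how strong would hidden bias
## (Gamma = odds of differential treatment assignment within a pair) have to be
## before the observed difference could plausibly be explained away?
library(rbounds)
psens(exposed, control, Gamma = 3, GammaInc = 0.25)
```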

The basic ideas of PSA have been simplified here in order to focus on key principles and methods central to modern-day propensity score applications. Recent PSA investigations have begun to move beyond the comparison of two treatments to compare three or more (however, most authors assume an underlying continuum, e.g., dose-response groups). Multilevel methods for PSA have begun to be published, as have methods for studying mediation; the role of stratification has also begun to see attention. A few studies have been aimed at missing-data imputation methods, including multiple imputation. Pearl (2010), in particular, has formalized basic ideas to help bridge the gap between mainstream PSA methods and structural/graphical modeling. To date, only a handful of authors seem to have addressed the central issue of this conference, viz., the analysis of longitudinal data, in particular P-score methodology to compare treated and control groups in observational data after adjusting for confounding covariates. In what follows, I use a basic illustration to show how the preceding methodology can be extended to deal with longitudinal data comparisons. The next slide sets the stage for how this might be done.

Longitudinal data are shown for four individuals, where slopes and intercepts are readily discerned for the 4 individuals across 5 time waves. For data like these, where the T & C groups could start prior to time 1 and key covariate data are available for all units, P-scores could be generated to assess T vs. C effects. Statistics that describe the profiles could then be used as responses in the PSA. A key is to find statistics sufficient to characterize the trajectories (regardless of the number of data waves). In this way, LDA versions of PSA may be straightforwardly generalized, moving from univariate to multivariate PSA. The panels distinguish 4 persons at 5 time points, with a common response for all. Assuming straight-line regression for all panels, two statistics are sufficient regardless of the number of waves; moreover, many fitted (smoothed) curves might entail few (often no more than 3 or 4) statistics to characterize time trends for assessments of treatment effects. In such cases PSA can be generalized to (low-dimensional) multivariate analysis to support observational LD analyses. Smoothing is the key; let us consider this topic next.
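A minimal sketch of reducing each person's trajectory to a small set of summary statistics follows (here, a per-subject intercept and slope from straight-line fits). The long-format data frame and its column names are hypothetical.

```r
## Hypothetical long-format longitudinal data: id, wave (1-5), y (response)
set.seed(4)
long <- expand.grid(id = 1:40, wave = 1:5)
long$y <- 10 + 0.5 * long$wave + rnorm(nrow(long), sd = 1.5) +
          rep(rnorm(40, sd = 2), times = 5)   # person-specific shifts

## Reduce each trajectory to two statistics: intercept and slope
traj <- t(sapply(split(long, long$id), function(d)
  coef(lm(y ~ wave, data = d))))
colnames(traj) <- c("intercept", "slope")
head(traj)

## These two columns can serve as a (low-dimensional) multivariate response
## in a PSA, compared across T & C conditional on estimated propensity scores.
```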

Given the preceding approaches for extending PSA to longitudinal data from observational studies, consider several further points (a small smoothing sketch follows this list):
1. A wide variety of so-called growth models are available to characterize longitudinal profiles; much recent work in this field has aimed at developing generalizations that extend the reach of the models.
2. Some authors have focused on smoothing profiles, in two distinctive ways: a. smoothing individual profiles by taking advantage of dependencies among adjacent or closely related observations within profiles, and b. smoothing by capitalizing on similarities among profiles across individuals; such smoothing entails borrowing strength from mutually related profiles. Double smoothing may also be employed.
3. Those who model as in 1. are often advised to smooth initially when individual observations are subject to considerable noise.
4. Model-based predictions (or fitted versions) of initial profiles, or at least their smoothed counterparts, are likely to be better targets for (PSA) studies than would be the initial (raw-data) profiles.
5. As in the case of simpler forms of PSA, sensitivity analyses will generally be advisable. (Rosenbaum, in his two books, considers this topic closely.)
6. A great deal of work on PSA remains to be done for longitudinal problems, and there are many opportunities for analysis in this area.
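A minimal sketch of point 2a, smoothing a single individual's profile, is shown below using the hypothetical long-format data from the earlier sketch; the choice of smoother and its span are arbitrary.

```r
## Smooth one subject's trajectory (point 2a): borrow strength across adjacent
## waves within the profile. With only a few waves, a local linear loess fit
## is about all that can be supported.
one <- subset(long, id == 1)                               # hypothetical data from earlier
sm  <- loess(y ~ wave, data = one, span = 1, degree = 1)   # within-profile smoother
one$y.smooth <- predict(sm)

plot(one$wave, one$y, pch = 16, xlab = "Wave", ylab = "Response",
     main = "Raw vs. smoothed profile, subject 1")
lines(one$wave, one$y.smooth, lwd = 2)
```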

These four panels illustrate the possibilities. They correspond to subsets of profiles, each for five animals, as clustered (see below). In particular, three principal components were derived from initially smoothed profiles, and these were in turn used to get doubly smoothed profiles computed as linear combinations of the PCs. Each profile can be fully described using 4 coefficients: an intercept and three PC regression coefficients. Clusters were based on these coefficients (in R). Given appropriate covariate data for each animal, these might be used for an observational-study comparison of animals whose diets differed from one another, i.e., using constructed P-scores. (All initial responses were measures of protein in milk over five weeks. These data are part of the Milk dataset in the nlme package; they are real.) As seen here, smoothing can work especially well for some data. Exploratory approaches (based on underlying components or latent variables) may often permit creation of smoothed versions of either original or pre-smoothed profiles. LDA versions of PSA may readily follow.
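Because the slide does not give the exact smoothing, number of PCs, or clustering method, the sketch below is an assumed reconstruction of the general idea only, using the Milk data from nlme (columns assumed to be protein, Time, Cow, Diet) restricted to the first five weeks to match the slide.

```r
## A rough sketch of the component-based "double smoothing" idea using nlme's
## Milk data. The specific choices (week cutoff, 3 PCs, k-means with 4 clusters)
## are illustrative assumptions, not the analysis shown on the slide.
library(nlme)
data(Milk)
m5 <- subset(Milk, Time <= 5)                          # first five weeks

## Wide matrix: one row per cow, one column per week
wide <- reshape(as.data.frame(m5)[, c("Cow", "Time", "protein")],
                idvar = "Cow", timevar = "Time", direction = "wide")
wide <- na.omit(wide)                                  # keep cows with complete profiles
X <- as.matrix(wide[, -1])

## Principal components of the (centered) profiles; keep three
pc <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pc$x[, 1:3]

## Doubly smoothed profiles: mean profile plus a linear combination of 3 PCs
smoothed <- matrix(colMeans(X), nrow(X), ncol(X), byrow = TRUE) +
            scores %*% t(pc$rotation[, 1:3])

## Describe each cow by 4 coefficients (overall level + 3 PC scores) and cluster
coefs <- cbind(level = rowMeans(X), scores)
cl <- kmeans(scale(coefs), centers = 4)
table(cl$cluster)
```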

Although time may not permit discussion, it may be useful to make some further observations pertaining to mainstream IALSA interests. Consider an example of an observational comparison of two groups (as suggested by S. Hofer): does engagement in intellectually challenging tasks, exercise, or social networks help to maintain cognitive functioning in later life? There have been a number of analyses of longitudinal observational studies and experimental studies of this topic. There is some evidence to suggest that physical activity enhances cognitive performance. Suppose we revisit such a question using a modern PSA approach. Let us limit attention to one measure of cognitive functioning and a clearly defined treatment (which could be a combination, but might be limited to one behavior, say engagement in exercise). Given a clear distinction between two groups, one of which will not have exercised (self-report?) and one of which will (at some defined level of rigor and regularity), we might aim to adjust for (all) relevant covariate differences. This is the hardest part, one that might ideally be done using a prospective approach, where one could have the luxury of naming in advance all covariates that seem likely to confound interpretations of T vs. C effects. Archival data might also be used, with the proviso that such data rarely contain all variables that ultimately matter (think of the critics). If cognitive functioning scores are available for individuals before and after the commencement of exercise, the initial covariate scores might be used in the construction of P-scores.

It is almost inevitable in practice that some covariate and longitudinal data will have gone missing. This is one key reason imputation methods have garnered so much interest. (But note that the underlying theory that supports PSA methods, à la Rubin, is strongly based on counterfactual logic, where one of the two potential outcomes will always be missing.) A special advantage for some longitudinal data sets is that missing LD values can be more reliably estimated than their counterparts outside the longitudinal framework. This means that multiple LD imputations, and many products of the analysis, may vary less than is typical. An additional concern is that responses (such as cognitive function scores) are likely to have been obtained at different times for different individuals, i.e., with different spacings and different numbers of occasions as well. Smoothing can often help with such problems, in which case the ultimate data used in the PSA can begin from statistics that characterize smoothed profiles rather than the original data. (This step may also help ameliorate problems induced by measurement errors.) Naturally, imputations can be done in different ways, perhaps using different imputation models; and the same goes for smoothing. When multiple analyses of the same data are employed, one will want to learn how much the results vary across methods. Further value may come from the use of mixed models in analysis. Ultimately, longitudinal PS analysis of such data may lead to fairly strong conclusions about treatment effects, conditional on the extent to which covariates are strongly ignorable and the extent to which results do not depend heavily on the particular methods used for analysis. Design is likely to be central.
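One minimal way to carry out the multiple-imputation step described above is sketched here with the mice package. The data frame, its columns, and the idea of re-estimating propensity scores within each imputed data set and comparing results are illustrative assumptions, not prescriptions from the talk.

```r
## Multiple imputation of missing baseline covariates, followed by a propensity
## model fit within each completed data set. 'df' and its columns are the
## hypothetical ones from the earlier sketches, with some values set missing.
library(mice)

df.miss <- df
df.miss$x2[sample(nrow(df.miss), 50)] <- NA            # inject some missingness

imp <- mice(df.miss[, c("z", "x1", "x2", "x3")], m = 5, printFlag = FALSE)

## Estimated propensity scores from each of the m completed data sets;
## comparing results across imputations shows how much they vary by method.
ps.list <- lapply(1:5, function(i) {
  d <- complete(imp, i)
  fitted(glm(z ~ x1 + x2 + x3, family = binomial, data = d))
})
sapply(ps.list, summary)
```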