Regression and Stepped Wedge Designs

Eric C. Wong, Po-Han Chen, Dorothy Hung, Palo Alto Medical Foundation Research Institute, Palo Alto, CA

ABSTRACT

Impact evaluation often requires assessing the impact of a new policy, intervention, product, or service in real-world, observational, or quasi-experimental situations. Often, these interventions are rolled out in phases at different points over time, making them good candidates for stepped wedge designs. Segmented regression is one method for measuring the change in a time series before and after an intervention. In this paper, we propose combining segmented regression and stepped wedge designs to analyze phased interventions over time. Specifically, we propose the use of generalized linear mixed models for a non-randomized, stepped wedge observational study. We start by describing segmented regression for an interrupted time series in one location receiving the intervention, then extend the approach to N locations receiving the intervention sequentially over time. Lastly, we discuss accounting for temporal autocorrelation and relevant clustering of individuals within locations.

INTRODUCTION

Many pragmatic interventions are deployed outside the controlled research world. This is common among real-world process improvement initiatives where an organization may be deploying new practices aimed at improvement. Often, the primary question of interest is whether the new changes altered the trajectory of measured outcomes over time. Is there a new normal? In the simplest case, there are many methods used to analyze the before and after change in an observational study. However, the real world can be more complex. It may be strategic for an organization to deploy an intervention sequentially, rolling it out in several phases across all its locations. Moreover, an organization with many locations may experience clustered effects where the intervention behaves more similarly among individuals within a location but differently across locations.

In this paper, we propose using generalized linear mixed models (as implemented by the GLIMMIX procedure), and describe several examples of increasing complexity. Throughout, we will use an example of a healthcare organization with multiple clinic locations. Each location has several departments and many physicians working in each department. A process improvement was phased in across departments over time.

WHAT IS SEGMENTED REGRESSION?

A popular approach to evaluating the introduction of a new policy or intervention is to observe the change of a measurable outcome over time. The intervention interrupts this time series of the outcome, and one is usually interested in how the path of the time series before the interruption compares to the path of the time series after the interruption. The changes to the time series may be either immediate or gradual over time.

Linear regression is a simple approach to modeling data over time, and segmented regression is a particular case. It is a piecewise regression in which, through model specification, the estimates from a single regression describe the segment before the interruption (e.g. the pre-intervention period) and the change from the pre-intervention segment after the interruption (e.g. the post-intervention period). The figure below illustrates some of the model specifications to follow.

[Figure: three-panel plot of an interrupted time series. Annotations: pre-intervention slope ($\beta_{time}$), intervention effect ($\beta_{intervention}$), post-intervention slope ($\beta_{time} + \beta_{time\_aft\_int}$).]

The original data is plotted in the first panel of the figure above. The gray shaded area marks the time period after the intervention began. The second and third panels show the relevant corresponding algebra derived from the model below. We start with a basic linear regression model. Although there are two segments, there is only one regression model.

$$Y = \beta_0 + \beta_{time} \times T + \beta_{intervention} \times I + \beta_{time\_aft\_int} \times T_{aft\_int}$$

$Y$: outcome
$\beta_0$: model intercept
$T$: time from the start of the observation period
$I$: intervention status (0 before intervention, 1 after intervention)
$T_{aft\_int}$: time after the start of the intervention, 0 otherwise
$\beta_{time}$: model parameter representing the pre-intervention slope
$\beta_{intervention}$: model parameter representing the immediate intervention effect, additive to the intercept
$\beta_{time\_aft\_int}$: model parameter representing the change in the pre-intervention slope after the intervention, interpreted as a gradual change to the time series following the intervention. The value of the post-intervention slope is ($\beta_{time} + \beta_{time\_aft\_int}$).

By adding terms to this model, one can account for additional locations, or for hierarchical structures such as when data on individuals within locations are available, as described in detail later.

WHAT ARE STEPPED WEDGE DESIGNS?

In situations when randomization of receipt of an intervention cannot be done, for example, when it is unethical to withhold an efficacious intervention from locations, randomization of the start time of an intervention can be considered instead. In these designs, all locations eventually receive the intervention, and also have varying amounts of baseline and follow-up data. Stepped wedge designs are a form of one-way crossover designs and can be used with these phased intervention designs across multiple locations. When locations or groups of locations begin an intervention at different time points, the timelines visually resemble a stepped wedge, from which the design derives its name.

Often, phased interventions in the real world can be framed in this paradigm. However, in the real world, the start time of the intervention is often determined strategically or opportunistically by organizational leadership and is not randomized. When using this paradigm to evaluate phased interventions, one should be diligent and cautious about bias when interpreting the results, as with any quasi-experiment.

Figure 1: Schematic of stepped wedge designs.

REAL WORLD SCENARIOS

In the real world, one could be analyzing a phased intervention in one or many locations. The data may be available at the location level or at a smaller level such as department or individual. We consider several of these permutations in the following scenarios, with description, input data set formats, SAS code, output, and interpretation.

Table 1: Scenarios by Number of Locations and Data Availability

Scenario   # of Locations   Data Availability                                Segmented Regression   Stepped Wedge
1          1                Location                                         Yes                    No
2          1                Individuals within Location                      Yes                    No
3          N                Locations                                        Yes                    Yes
4          N                Individuals within Locations                     Yes                    Yes
5          N                Individuals within Departments within Locations  Yes                    Yes

SCENARIO 1: # OF LOCATIONS = 1. AVAILABILITY = LOCATION.

This example can be solved using a traditional segmented regression for interrupted time series. Since there is only a single location with data available by location, a stepped wedge design is not yet necessary. A number of regression procedures can be used, and PROC GLIMMIX is one of them, shown below. PROC GLIMMIX is appropriate for generalized linear mixed models, on which there is ample literature. If one is interested in adjusting for correlation of measures over time across subjects, a RANDOM statement with the _RESIDUAL_, SUBJECT=, and TYPE= options can be used, as described in later scenarios.
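The three predictors of the segmented regression model can be derived from a date variable in a short DATA step. The following is a minimal sketch under stated assumptions, not the authors' code: the input data set name (work.raw_monthly), the intervention start date, and the convention that the first post-intervention month is counted as 1 are all hypothetical.

```sas
/* Assumed intervention start date (hypothetical value) */
%let int_start = '01JAN2013'd;

/* work.raw_monthly is a hypothetical data set with one row per month,
   sorted by DATE, holding a SAS date (date) and the outcome (Y) */
data work.scenario1;
  set work.raw_monthly;
  /* time from the start of the observation period: 1, 2, 3, ... */
  time = _n_;
  /* intervention status: 0 before the start date, 1 on or after it */
  intervention = (date >= &int_start);
  /* months since the intervention began, 0 beforehand;
     the first post-intervention month is counted as 1 (an assumption) */
  time_aft_int = intervention * (intck('month', &int_start, date) + 1);
run;
```

With this coding, the coefficient on time_aft_int is the change in slope rather than the post-intervention slope itself.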

Figure 2: Scenario 1 Plot

Table 2: Scenario 1 Input Data (columns: date, time, interv., time_aft_int, Y)

PROC GLIMMIX DATA= ;
  MODEL Y = time intervention time_aft_int / SOLUTION;
RUN;

SCENARIO 2: # OF LOCATIONS = 1. AVAILABILITY = (INDIVIDUALS WITHIN LOCATION).

In some real-world scenarios, one may still be studying one location, but have data from multiple individuals within that location. Segmented regression for interrupted time series is still an appropriate method. Since there is still only one location, stepped wedge designs are not yet leveraged in this scenario. We extend the previous methodology by accounting for individuals nested within the location. To account for repeated measures, one may use the PROC GLIMMIX RANDOM statement with the SUBJECT= and TYPE= options as appropriate.

Figure 3: Scenario 2 Plot

Table 3: Scenario 2 Input Data (columns: Person ID, date, time, interv., time_aft_int, Y)

PROC GLIMMIX DATA= ;
  CLASS person_id;
  MODEL Y = time intervention time_aft_int / SOLUTION;
  RANDOM _RESIDUAL_ / SUBJECT=person_id TYPE=AR(1);
RUN;

Table 4: Scenario 2 Output. Solutions for Fixed Effects
Effect (in order): Intercept, time, intervention, time_aft_int
Estimate: 8.9757.497 35.965.7765
SE: .6357.483.996.67
DF: 4 77 77 77
t: 9.85.3 36.9 6.3
Pr > |t|: <..343 <. <.

The intervention seems to have both an immediate and a gradual effect. The significant coefficient for intervention indicates an immediate upward shift in the trend at the start of the intervention, with the post-segment starting at ($\beta_0 + \beta_{intervention}$). The significant coefficient for time_aft_int indicates a change to the slope after the start of the intervention, with a post-slope value of ($\beta_{time} + \beta_{time\_aft\_int}$).
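Because the post-intervention slope is a sum of two coefficients, its estimate and standard error can be requested directly instead of being added by hand. The sketch below adds an ESTIMATE statement to the Scenario 2 model; the data set name work.scenario2 is a hypothetical placeholder, and the statement is an illustrative addition rather than part of the original code.

```sas
PROC GLIMMIX DATA=work.scenario2;   /* hypothetical data set name */
  CLASS person_id;
  MODEL Y = time intervention time_aft_int / SOLUTION;
  RANDOM _RESIDUAL_ / SUBJECT=person_id TYPE=AR(1);
  /* post-intervention slope = beta_time + beta_time_aft_int,
     reported with a confidence interval */
  ESTIMATE 'Post-intervention slope' time 1 time_aft_int 1 / CL;
RUN;
```

The ESTIMATE output reports the linear combination with its own standard error, which accounts for the covariance between the two coefficient estimates.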

SCENARIO 3: # OF LOCATIONS = N. AVAILABILITY = LOCATIONS.

When studying more than one location in a phased implementation, one has the opportunity to use the stepped wedge framework. First, one should update the input data set with information about locations (a location identifier) and when each location began the intervention. Using segmented regression as above, this means modifying the intervention and time after intervention (time_aft_int) variables to reflect location-specific time points of implementation.

In this real-world scenario, one is often interested in how each location performs before and after the intervention, as well as how the entire organization might perform before and after the intervention. Using generalized linear mixed models, we can estimate the location-specific impact as well as the organization-wide impact by specifying fixed and random effects in the model. As before, we will also account for repeated measures.

Figure 4: Scenario 3 Plot

Table 5: Scenario 3 Input Data (columns: location, date, time, interv., time_aft_int, Y; locations 1 through N)

Now that the data set has been expanded and modified to reflect the stepped wedge design through implementation dates (time, intervention, and time_aft_int) that vary by location, one can model the scenario using the mixed effects model below.

PROC GLIMMIX DATA=work.scenario3;
  CLASS location;
  MODEL Y = time intervention time_aft_int / SOLUTION;
  RANDOM INT intervention time_aft_int / SUBJECT=location SOLUTION;
  RANDOM _RESIDUAL_ / SUBJECT=location TYPE=AR(1);
RUN;

We interpret the fixed effects (time, intervention, and time_aft_int) as the effects common to all locations, namely the organization-wide effects. Then, we allow individual locations to have unique starting points as well as unique immediate and gradual effects of the intervention through random intercepts and random intervention and time_aft_int terms. We assume that before the intervention, every location has the same common pre-slope (from the fixed effects). Lastly, we add a RANDOM _RESIDUAL_ statement to model the subject and the autoregressive relationship within locations.
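Written out, the Scenario 3 specification corresponds to the following mixed model. This is a sketch for orientation: the indices $j$ (location) and $t$ (time point), the random-effect symbols $b_{0j}, b_{1j}, b_{2j}$, and the correlation parameter $\rho$ are notation introduced here, not taken from the paper.

```latex
$$
Y_{jt} = (\beta_0 + b_{0j})
       + \beta_{time}\, T_{t}
       + (\beta_{intervention} + b_{1j})\, I_{jt}
       + (\beta_{time\_aft\_int} + b_{2j})\, T_{aft\_int,jt}
       + \varepsilon_{jt}
$$

$$
b_{kj} \sim N(0, \sigma_k^2), \quad k = 0, 1, 2,
\qquad
\mathrm{Corr}(\varepsilon_{jt}, \varepsilon_{js}) = \rho^{|t - s|}
$$
```

The $\beta$ terms are the organization-wide fixed effects, the $b_{kj}$ terms are the location-specific deviations requested by the first RANDOM statement, and the AR(1) correlation comes from the RANDOM _RESIDUAL_ statement. There is no random term on $T_t$, which is the assumption of a common pre-slope. The independent variances $\sigma_k^2$ assume the procedure's default variance-components structure for the random effects.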

Table 6: Scenario 3 Output: Fixed and Random Effects Estimates

Solutions for Fixed Effects
Effect (in order): Intercept, time, intervention, time_aft_int
Estimate: 9.6.464 3.3356.9
SE: .6.53 6.6833.998
DF:
t: 7.7.87 4.69.
Pr > |t|: .33.384.46.485

Solution for Random Effects (Effect, Subject, then Estimate, SE Pred, DF, t, Pr > |t|)
Intercept      Location 1   -.69.894 -.8.8557
intervention   Location 1   3.99 6.743.58.563
time_aft_int   Location 1   .9445.9987.95.543
Intercept      Location 2   -.835.884 -.95.346
intervention   Location 2   8.846 6.759.3.93
time_aft_int   Location 2   -.9383.35 -.94.35
Intercept      Location 3   .998.886.3.597
intervention   Location 3   -.7569 6.745 -.89.64
time_aft_int   Location 3   -.6.9 -..3

Matching the figure, the solutions for the fixed effects show a significant immediate impact of the intervention (P<.46). Visually, this coincides with a positive vertical increase at the start of the intervention for each location. The amount of the increase varies by location and is measured by also considering the random effects, which, for example, in Location 3 show a near-significant attenuation (P<.64) relative to the other locations. In a similar manner, the increase in slope after the intervention (time_aft_int) in Location 1 is reflected in a near-significant value (P<.543) for the random effect.

SCENARIO 4: # OF LOCATIONS = N. AVAILABILITY = (INDIVIDUALS WITHIN LOCATIONS).

When individuals are nested within locations, the input data set and models can be scaled as before. An identifier for the individual is added to the data set, and options are added to the model to account for the nested relationship.

Figure 5: Scenario 4 Plot
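One way to assemble a person-level stepped wedge input data set is to merge a small lookup table of location start dates onto the person-month records and derive the location-specific predictors from it. The sketch below is illustrative only: the data set names work.person_month and work.loc_start, the variable start_date, and the observation start date in obs_start are assumptions, as is counting the first post-intervention month as 1.

```sas
/* Hypothetical inputs:
   work.person_month : location, person_id, date, Y (one row per person-month)
   work.loc_start    : location, start_date (one row per location)          */
%let obs_start = '01JAN2010'd;   /* assumed start of the observation period */

proc sort data=work.person_month; by location person_id date; run;
proc sort data=work.loc_start;    by location;                run;

data work.scenario4;
  merge work.person_month (in=in_pm) work.loc_start;
  by location;
  if in_pm;   /* keep only rows that have outcome data */
  /* months from the start of observation, common to all locations */
  time = intck('month', &obs_start, date) + 1;
  /* location-specific intervention status and time since its start */
  intervention = (date >= start_date);
  time_aft_int = intervention * (intck('month', start_date, date) + 1);
run;
```

Because start_date differs by location, the intervention and time_aft_int columns switch on at different calendar times, which is what gives the data its stepped wedge shape.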

Table 7: Scenario 4 Input Data (columns: location, Person ID, date, time, interv., time_aft_int, Y; locations 1 through N)

PROC GLIMMIX DATA=work.scenario4;
  CLASS location person_id;
  MODEL Y = time intervention time_aft_int / SOLUTION;
  RANDOM INT intervention time_aft_int / SUBJECT=location SOLUTION;
  RANDOM _RESIDUAL_ / SUBJECT=person_id(location) TYPE=AR(1);
RUN;

The only modification to the code is the RANDOM statement option SUBJECT=person_id(location). The tables report the same conclusion as before, noting a strong immediate effect of the intervention across all locations, and location-specific differences in Location 1's post-slope and in Location 3 at the start of the intervention.

Table 8: Scenario 4 Output: Fixed and Random Effects Estimates

Solutions for Fixed Effects (Effect, then Estimate, SE, DF, t, Pr > |t|)
Intercept      9.59.9598.3.4
time           .476.335 43..39
intervention   3.433 6.777 4.64.435
time_aft_int   ..9937..464

Solution for Random Effects (Effect, Subject, then Estimate, SE Pred, DF, t, Pr > |t|)
Intercept      Location 1   -.954.898 43 -.33.744
intervention   Location 1   3.84 6.79 43.56.5736
time_aft_int   Location 1   .975.9936 43.98.48
Intercept      Location 2   -.8.89 43 -.6.487
intervention   Location 2   9.7 6.79 43.36.73
time_aft_int   Location 2   -.96.995 43 -.97.334
Intercept      Location 3   .336.8888 43.49.37
intervention   Location 3   -3.95 6.789 43 -.93.544
time_aft_int   Location 3   -.85.9967 43 -..3

SCENARIO 5: # OF LOCATIONS = N. AVAILABILITY = (INDIVIDUALS WITHIN DEPARTMENTS WITHIN LOCATIONS).

For larger organizations with mature data collection infrastructure, it may be realistic that interventions are phased across many locations, affecting individuals within departments within locations. In our example of a healthcare organization, this could mean an intervention phased across multiple clinic locations, affecting doctors within departments within locations. The outcome of interest could be physician productivity, measured for each physician monthly, and one may be interested in how the outcome changed after the intervention, both for each individual location and over the whole organization. As before, identifiers are added to the data set for individual, department, and location.

Figure 6: Scenario 5 Plot

Table 9: Scenario 5 Input Data (columns: location, department, Person ID, date, time, interv., time_aft_int, Y; locations, departments, and persons each numbered 1 through N)

The PROC GLIMMIX syntax is modified to accommodate the nested structure and also to report by department within locations.

PROC GLIMMIX DATA=work.scenario5;
  CLASS location department person_id;
  MODEL Y = time intervention time_aft_int / SOLUTION;
  RANDOM INT intervention time_aft_int / SUBJECT=department(location) SOLUTION;
  RANDOM _RESIDUAL_ / SUBJECT=person_id(department location) TYPE=AR(1);
RUN;

Table 10: Scenario 5 Output: Fixed and Random Effects Estimates

Solutions for Fixed Effects (Effect, then Estimate, SE, DF, t, Pr > |t|)
Intercept      9.473.693 5 3.4 <.
time           .4398.33 86.88.599
intervention   3.94 3.449 5 9.83.
time_aft_int   .837.534 5.57.778

Solution for Random Effects (Effect, Subject, then Estimate, SE Pred, DF, t, Pr > |t|)
Intercept      Dept 1, Location 1   -.785.6973 86 -.4.6896
intervention   Dept 1, Location 1   3.37 3.33 86..369
time_aft_int   Dept 1, Location 1   .359.5383 86 3.97 <.
Intercept      Dept 2, Location 1   -.4643.6973 86 -.67.557
intervention   Dept 2, Location 1   5.495 3.33 86.63.8
time_aft_int   Dept 2, Location 1   .75.5383 86.99.47
Intercept      Dept 1, Location 2   -.9454.689 86 -.39.66
intervention   Dept 1, Location 2   8.7388 3.387 86.63.86
time_aft_int   Dept 1, Location 2   -.7959.543 86 -.47.43
Intercept      Dept 2, Location 2   -.89.689 86 -.9.34
intervention   Dept 2, Location 2   -.396 3.387 86 -.37.789
time_aft_int   Dept 2, Location 2   -.656.543 86 -.3.574
Intercept      Dept 1, Location 3   .86.677 86.8.7
intervention   Dept 1, Location 3   -3.5465 3.38 86 -4.9 <.
time_aft_int   Dept 1, Location 3   -.8395.548 86 -.53.6
Intercept      Dept 2, Location 3   .75.677 86.88.67
intervention   Dept 2, Location 3   -.679 3.38 86 -.8.48
time_aft_int   Dept 2, Location 3   -.9553.548 86 -.74.88

The fixed effects (organization-wide trend) support an immediate increase in the outcome following the start of the intervention. Additionally, Departments 1 and 2 from Location 1 have noteworthy gradual increases, i.e. changes in the slope following the intervention, and Department 1 from Location 3 has a noteworthy immediate decrease in the outcome following the start of the intervention. One should remember that the model parameters do not estimate the post-intervention values directly. Instead, they estimate the change from the pre-intervention values, so they must be interpreted together with the pre-intervention values to calculate the post-intervention coefficient values (e.g. intercept or slope). By interrogating the fixed and random effects, one can construct organization-wide and local interpretations about the intervention impact.

LIMITATIONS & FUTURE WORK

Many real-world scenarios are likely to use non-randomized phased interventions and are subject to selection biases and clustering that should be examined and accounted for by the user. The methods in this paper are described for non-randomized stepped wedge designs, and the results should always be interpreted and treated with caution. Future work could expand on this application to compare common methods used to account for selection bias.

Proponents of generalized estimating equations (GEE) may offer an alternative approach to generalized linear mixed models as a direct method for building population-average models (marginal distribution). After initial considerations, we developed a method based on GLMMs here, with advantages for measuring subject-specific models (conditional distribution) and for making simultaneous interpretations of the organization-wide and location-specific effects of an intervention. Future work may consider whether to develop a GEE approach (with code) within the segmented regression and stepped wedge paradigms.

Only the general question of whether the intervention was associated with an impact on the outcome was described here. Stepped wedge designs provide opportunities for many other types of within- or between-comparisons, described well in the literature.

Depending on the intervention, a training/learning period may precede the start of the intervention. This can be modeled separately for more accurate estimates of the post-intervention period by modifying the input data set and including new terms in the model that indicate the training period.
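The way pre-intervention values and change parameters combine can be made explicit for a given department. In the sketch below, $b_{1,d(l)}$ and $b_{2,d(l)}$ denote the random intervention and time_aft_int effects for department $d$ nested in location $l$; these symbols are notation introduced here for illustration, not the paper's.

```latex
$$
\text{immediate change for department } d(l):
\quad \beta_{intervention} + b_{1,d(l)}
$$

$$
\text{post-intervention slope for department } d(l):
\quad \beta_{time} + \beta_{time\_aft\_int} + b_{2,d(l)}
$$

$$
\text{organization-wide post-intervention slope}:
\quad \beta_{time} + \beta_{time\_aft\_int}
$$
```

A random-effect estimate near zero therefore means the department follows the organization-wide pattern, while a large positive or negative estimate shifts that department's own immediate change or post-intervention slope away from it.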
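One possible way to represent a training period is an indicator variable that is 1 only between the start of training and the start of the intervention. The sketch below illustrates that idea for the Scenario 3 layout; it is an assumption about how such a term could be coded, not a method given in the paper, and the variables train_start and start_date are hypothetical columns holding each location's training and intervention start dates.

```sas
/* Assumes work.scenario3 also carries date, train_start, and start_date
   (hypothetical variables) for each location */
data work.scenario3_train;
  set work.scenario3;
  /* 1 during the training window, 0 before it and after the intervention starts */
  training = (train_start <= date and date < start_date);
run;

PROC GLIMMIX DATA=work.scenario3_train;
  CLASS location;
  /* training absorbs the level shift during the learning period */
  MODEL Y = time training intervention time_aft_int / SOLUTION;
  RANDOM INT intervention time_aft_int / SUBJECT=location SOLUTION;
  RANDOM _RESIDUAL_ / SUBJECT=location TYPE=AR(1);
RUN;
```

With this coding, the intervention coefficient is measured against the pre-training baseline rather than against a baseline that mixes in the learning period.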

CONCLUSIONS

Phased interventions in the real world can be examined using segmented regression and non-randomized stepped wedge designs together. This paper presents several real-world scenarios of measuring interventions over time according to varying amounts of data availability, with data set, code, and output examples. The user may follow the most appropriate scenario. Large organizations with a nested hierarchy and frequently captured data could use this methodology to make interpretations about the effect of a process improvement intervention while accounting for the phased nature of the intervention, nested hierarchies, and repeated measures.

ACKNOWLEDGMENTS

The authors thank Dr. Alice Pressman for early discussions about this methodology.

RECOMMENDED READING

Hussey M, Hughes J. Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials. 28 (2007): 182-191.

Handley M, Schillinger D, Shiboski S. Quasi-Experimental Designs in Practice-based Research Settings: Design and Implementation Considerations. J Am Board Fam Med. 2011; 24:589-596.

Gebski V, Ellingson K, Edwards J, Jernigan J, Kleinbaum D. Modelling interrupted time series to evaluate prevention and control of infection in healthcare. Epidemiol. Infect. 140 (2012): 2131-2141.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Eric C. Wong
Palo Alto Medical Foundation Research Institute
795 El Camino Real
Palo Alto, CA 94301
wonge@pamfri.org

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.