Research Designs For Evaluating Disease Management Program Effectiveness Disease Management Colloquium June 27-30, 2004 Ariel Linden, Dr.P.H., M.S. President, Linden Consulting Group 1
What s the Plan? Discuss threats to validity Provide methods to reduce those threats using currently-used evaluation designs Offer additional designs that may be suitable alternatives or supplements to the current methods used to assess DM program effectiveness 2
Treatment Interference Loss to Attrition New Technology 3 Benefit Design Reimbursement Measurement Error VALIDITY Seasonality Hawthorne Effect Maturation Access Selection Bias Unit Cost Increases Case-mix Regression to the Mean Secular Trends
Selection Bias Definition: Participants are not representative of the population from which they were drawn: Motivation Severity or acuteness of symptoms Specifically targeted for enrollment 4
Selection Bias (cont ) Fix #1: Randomization How: Distributes the Observable and Unobservable variation equally between both groups Limitations: costly, difficult to implement, intent to treat, not always possible 5
Selection Bias (cont ) Pretest-posttest Control Group: R O 1 X O 2 R O 3 O 4 Solomon 4-Group Design: R O X O R O O R X O R O 6
Selection Bias (cont ) Fix #2: Standardized Rates How: Direct/indirect adjustment enables comparisons over time or across populations by weighting frequency of events Limitations: does not control for unobservable variation 7
Age-adjusted Program Results Age Group Pre-Program (rate/1000) r X P Program (rate/1000) r X P Proportion (P) of Population 20 29 7.3 0.9 10.2 1.2 0.1189 30 39 65.2 5.7 79.9 6.9 0.0868 40 49 190.8 13.4 173.6 12.2 0.0703 50 59 277.9 21.3 226.1 17.4 0.0768 60-69 408.4 25.2 287.8 17.7 0.0616 70-79 475.8 17.7 368.8 13.8 0.0373 80 + 422.2 8.4 356.0 7.0 0.0198 Adjusted rate 92.6 76.2 8
Tenure-adjusted Program Results Baseline Group Compared to inflationadjusted Baseline Group Compared to inflationadjusted 2003 prevalent group s 2003 claims 2003 prevalent group s 2004 claims plus 2004 incident group assumed to have cost 2003 prevalent group s claims in 2003 2002 prevalent group s 2003 claims 2003 Newly incident members actual claims, 2003 2003 prevalent group s 2004 claims 2004 Newly incident members actual claims, 2004 9
Selection Bias (cont ) Fix #3: Propensity Scoring What?: Logistic regression score for likelihood of being in intervention How: Controls for Observable variation 10 Limitations: does not control for unobservable variation
1 st Year CHF Program Results Intervention (N=94) Control Group (N=4606) P(T<=t) two-tail Age 77.4 76.6 NS % Female 0.51 0.56 NS % Portland 0.17 0.69 p<0.0001 Pre-Hospitalization 1.13 0.5 p<0.0001 Pre-ED 0.7 0.4 p=0.003 Pre-Costs $18,287 $8,974 p<0.0001 Post-Hospitalization 0.59 0.87 p=0.008 Post-ED 0.57 0.58 NS Post-Costs $11,874 $16,036 p=0.005 11
1 st Year CHF Program Results Admits Hospitalization Rate 1.2 1 0.8 0.6 0.4 0.2 1.13 0.5 0.87 0.59 0 Pre-Hospital Admits Post-Hospital Admits Intervention Group (N=94) Control Group (N=4606) 12
1 st Year CHF Program Results ER Visits 0.8 ED Visit Rate 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0.7 0.4 Pre-ED Visits 0.58 0.57 Post-ED Visits Intervention Group (N=94) Concurrent Control Group (N=4606) 13
1 st Year CHF Program Results Costs Average Cost $20,000 $18,000 $16,000 $14,000 $12,000 $10,000 $8,000 $6,000 $4,000 $2,000 $0 $18,287 $8,974 Pre-Costs Intervention Group (N=94) $16,036 $11,874 Post-Costs Control Group (N=4606) 14
1 st Year CHF Program Results Propensity Scoring Method Cases (N=94) Matched Controls (N=94) P(T<=t) two-tail Propensity Score 0.061 0.062 NS Age 77.4 78.2 NS % Female 0.51 0.51 NS % Portland 0.17 0.17 NS Pre-Hospitalization 1.13 1.09 NS Pre-ED 0.70 0.67 NS Pre-Costs $18,287 $17,001 NS Post-Hospitalization 0.59 1.17 0.005 Post-ED 0.57 0.77 0.026 15 Post-Costs $11,874 $24,085 0.003
1 st Year CHF Program Results Propensity Scoring Method - Admits 1.4 Hospitalization Rate 1.2 1 0.8 0.6 0.4 0.2 1.13 1.09 1.17 0.59 0 Pre-Hospital Admits Post-Hospital Admits cases (N=94) matched controls (N=94) 16
1 st Year CHF Program Results Propensity Scoring Method ED Visits ED Visit Rate 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0.7 0.67 Pre-ED Visits cases (N=94) 0.77 0.57 Post-ED Visits matched controls (N=94) 17
$30,000 1 st Year CHF Program Results Propensity Scoring Method Costs $25,000 $24,085 Average Cost $20,000 $15,000 $10,000 $18,287 $17,001 $11,874 $5,000 $0 Pre-Costs Post-Costs cases (N=94) matched controls (N=94) 18
Regression to the Mean Definition: After the first of two related measurements has been made, the second is expected to be closer to the mean than the first. 19
Regression to the Mean CAD Where the 1st Quintile (N=749) Went In Year 2 Where the 5th Quintile (N=748) Went In Year 2 Fourth quintile 16% Fifth quintile 17% Third quintile 13% Second quintile 19% First quintile 35% Fifth quintile 27% Fourth quintile 23% First quintile 11% Second quintile 18% Third quintile 21% 20
Regression to the Mean CHF Where the 1st Quintile (N=523) Went In Year 2 Where the 5th Quintile (N=537) Went In Year 2 Fourth quintile 17% Fifth quintile 15% Third quintile 15% Second quintile 18% First quintile 35% Fifth quintile 37% First quintile 7% Fourth quintile 23% Second quintile 14% Third quintile 19% 21
Regression to the Mean (cont ) Fix #1: Increase length of measurement periods How: Controls for movement toward the mean across periods Limitations: periods may not be long enough, availability of historic data 22
Regression to the Mean (cont ) Currently-Used Method Claims run-out periods Baseline Measurement Year 1st Contract Measurement Year 2nd Contract Measurement Year Compare to baseline Compare to baseline 23
Regression to the Mean (cont ) Valid Method (from Lewis presentation) 2 year freeze period + measurement Historic 2-Year Period Baseline Measurement Year 1 st Contract Measurement Year 2 year freeze period + measurement 24
Regression to the Mean (cont ) Fix #2: Time Series Analysis How: Controls for movement across many periods (preferably > 50 observations) Limitations: availability of historic data, change in collection methods 25
Measurement Error Definition: Measurements of the same quantity on the same group of subjects will not always elicit the same results. This may be because of natural variation in the subject (or group), variation in the measurement process, or both (random vs. systematic error). 26
Measurement Error (cont ) Fix #1: Use all suitables in the analysis (to adjust for the zeroes ) Fix #2: Use identical data methods pre and post (like unit claims-to-claims comparison) Fix #3: Use utilization and quality measures instead of cost. 27
Alternative Designs Survival Analysis Time Series Analysis Time-dependent Regression 28
Survival Analysis Features: Time to event analysis longitudinal Censoring Allows for varying enrollment points 29
Survival Analysis First Patient Contact Commencement of DM Intervention Improvement in clinical markers Reduction in Utilization and Costs?????? 30
Survival Analysis.6.5 Probability of Hospitalization.4.3.2.1 PRA Value High Risk High-censored Low Risk 0.0 0 10 20 30 Low-censored 31 Time (Months)
Time Series Analysis Features: Longitudinal analysis Serial Dependency (autocorrelation) Does not require explanatory variables Controls for trend and seasonality Can be used for forecasting 32
33 60 50 40 30 20 10 0 Time Series Analysis (cont ) Admits (Actual) SES DES ARIMA (1,0,0 ) SES (15.5%) DES (19.7%) ARIMA (16.0%) SES (3.6%) Historical Period Baseline Period 1st Measurement Period Apr-98 Jul-98 Oct-98 Jan-99 Apr-99 Jul-99 Oct-99 Jan-00 Apr-00 Jul-00 Oct-00 Jan-01 Apr-01 Jul-01 Oct-01 Jan-02 Apr-02 Jul-02 Oct-02 Jan-98 Month Admits PTMPY
Time-dependent Regression Combines important elements of other models to create a new method, including variables such as: Program tenure (censuring) Seasonality (important for Medicare) Can be used for forecasting 34
350 300 250 Admits/1000 200 150 100 50 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Admits/1000 (Tenure) Admits/1000 (Month) Simulated hospital admissions per thousand members based on program tenure and month-of-year (months 1-12 represent Jan Dec of program year 1, and months 13-24 represent Jan Dec of program year 2). 35
Conclusions Identify potential threats to validity before determining evaluation method Choose outcome variables that mitigate measurement bias (e.g. all identified members vs those with costs) There is no panacea! Use more than one design to validate results. 36
How does this presentation differ from what you just saw? Lewis approach is the only valid prepost population-based design in use today But valid = accurate. Valid just means adjustment for systematic error These methods reduce chances of nonsystematic error to increase accuracy 37
References (1) 1. Linden A, Adams J, Roberts N. An assessment of the total population approach for evaluating disease management program effectiveness. Disease Management 2003;6(2): 93-102. 2. Linden A, Adams J, Roberts N. Using propensity scores to construct comparable control groups for disease management program evaluation. Disease Management and Health Outcomes Journal (in print). 3. Linden A, Adams J, Roberts N. Evaluating disease management program effectiveness: An introduction to time series analysis. Disease Management 2003;6(4):243-255. 4. Linden A, Adams J, Roberts N. Evaluating disease management program effectiveness: An introduction to survival analysis. Disease Management 2004;7(2):XX-XX. 38
References (2) 5. Linden A, Adams J, Roberts N. Evaluation methods in disease management: determining program effectiveness. Position Paper for the Disease Management Association of America (DMAA). October 2003. 6. Linden A, Adams J, Roberts N. Using an empirical method for establishing clinical outcome targets in disease management programs. Disease Management. 2004;7(2):93-101. 7. Linden A, Roberts N. Disease management interventions: What s in the black box? Disease Management. 2004;7(4):XX-XX. 8. Linden A, Adams J, Roberts N. Evaluating disease management program effectiveness: An introduction to the bootstrap technique. Disease Management and Health Outcomes Journal (under review). 39
References (3) 9. Linden A, Adams J, Roberts N. Generalizability of disease management program results: getting from here to there. Managed Care Interface 2004;(July):38-45. 10. Linden A, Roberts N, Keck K. The complete how to guide for selecting a disease management vendor. Disease Management. 2003;6(1):21-26. 11. Linden A, Adams J, Roberts N. Evaluating disease management program effectiveness adjusting for enrollment (tenure) and seasonality. Research in Healthcare Financial Management. 2004;9(1): XX-XX. 12. Linden A, Adams J, Roberts N. Strengthening the case for disease management effectiveness: unhiding the hidden bias. J Clin Outcomes Manage (under review). 40
Software for DM Analyses The analyses in this presentation used XLStat for Excel. This is an Excel add-in, similar to the data analysis package that comes built-in to the program. Therefore, users familiar with Excel will find this program easy to use without much instruction. 41
Questions? Ariel Linden, DrPH, MS ariellinden@yahoo.com 42