Developing a Predictive Model of Physician Attribution of Patient Satisfaction Surveys


Paper 1089-2017
Ingrid C. Wurpts, Ken Ferrell, and Joseph Colorafi, Dignity Health

ABSTRACT

For all healthcare systems, considerable attention and resources are directed at gauging and improving patient satisfaction. Dignity Health has made substantial efforts to improve most areas of patient satisfaction; however, improving metrics around physician interaction with patients has been challenging. Failure to improve these publicly reported scores can result in reimbursement penalties, damage to Dignity's brand, and an increased risk of patient harm. One way to improve these scores is to better identify the physicians who present the best opportunity for positive change. The survey tool mandated by the Centers for Medicare and Medicaid Services (CMS), the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS), has three questions centered on patient experience with providers, specifically concerning listening, respect, and clarity of conversation. For purposes of relating patient satisfaction scores to physicians, Dignity Health has assigned scores based on the attending physician at discharge. A manual record review determined that this method rarely agrees with the physician identified by reviewers (PPV = 20.7%, 95% CI: 9.9%-38.4%). Using a variety of SAS tools and predictive modeling procedures, we developed a regression model with far better agreement with the chart abstractors (PPV = 75.9%, 95% CI: 57.9%-87.8%). By attributing providers based on this predictive model, opportunities for improvement can be targeted more accurately, resulting in improved patient satisfaction and outcomes while protecting fiscal health.

INTRODUCTION

This paper describes how the GLMSELECT procedure can be used to develop a parsimonious and generalizable regression model. We show how to use PROC GLMSELECT and related code with a real example from healthcare analytics.

The Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS, pronounced "H-caps") is a CMS-mandated survey sent to a random sample of patients after a hospital stay. Three of the 29 items on the HCAHPS ask patients how well the doctors treated them with respect, listened to them, and explained things to them, each scored on a 1-4 scale. One way to increase HCAHPS physician communication scores would be to look at how each physician scores on the surveys returned by their patients and target the low-scoring physicians for some type of communication-improvement intervention. However, even during a brief inpatient visit, a patient may see many different physicians. The challenge becomes: how do we know which physicians should be given credit for which surveys?

The current method of physician attribution is to use the attending physician at time of discharge. However, is there good reason to believe that the communication skills of the attending physician at discharge determine how patients answer the HCAHPS questions? We attempted to answer that question by first conducting a manual chart review for a sample of patients and then determining whether a statistical model could predict the attributable physician more accurately than the current method. We first performed a small pilot study with 29 medical service line encounters from January through March 2016.
We sent those cases to a set of clinicians to review and rate which physician(s) they believed would have a significant impact on how patients would respond to the three physician communication questions. Within each encounter, physicians rated as attributable by the abstractors were given a value of 1, and the other physicians a value of 0. Next, we used LASSO regression to determine which predictor variables from the clinical data could provide the best predictive model. With promising results from the pilot study, we refined the model by sampling another 155 encounters across the medical, surgical, and obstetrics service lines and training the model on this larger data set.
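The training data for this step can be pictured as one row per physician within each encounter, with the abstractor rating as a binary outcome. Below is a minimal sketch of that layout as a SAS data step; the encounter and physician identifiers are hypothetical, and only the FIN and PRIMARY variable names come from this paper.

data abstractor_ratings;
   /* One row per physician within an encounter. PRIMARY = 1 if the
      abstractors rated that physician as attributable, 0 otherwise.
      More than one physician per encounter can carry PRIMARY = 1,
      since abstractors sometimes named up to three physicians.     */
   length FIN $4 physician_id $4;
   input FIN $ physician_id $ primary;
   datalines;
1001 MD_A 1
1001 MD_B 0
1001 MD_C 0
1002 MD_D 1
1002 MD_E 1
;
run;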

BUILDING THE MODEL

Not only were we interested in building an accurate predictive model; we also wanted a model that could be implemented in the Dignity Health Enterprise Data Warehouse to determine the attributable physician across all inpatient encounters. Our data set contained 1,545 physicians across 155 encounters, which is a relatively small sample size for using data mining techniques to develop a generalizable model. We therefore used LASSO regression with k-fold cross-validation for predictor selection, along with model averaging, to ensure that the final model was not overfit to the training data. The model was trained on 1,021 physicians across 100 encounters and validated on 524 physicians across 55 encounters.

PREPARING THE TRAINING DATASET

Potential predictors for the model were created from the electronic health record (EHR) data. Any information in the EHR that was indicative of a physician's interaction with the patient was included, and the final analysis used 26 predictor variables taken from the EHR.

The SURVEYSELECT procedure can be used to split the data into training and validation sets, roughly 2:1. To account for the nested nature of the data, you can specify a sampling unit; we used the financial information number (FIN, which is unique across encounters). The SAMPSIZE=100 option then randomly selects 100 FINs, and the selection status is recorded in the SELECTED variable of the OUT= data set. Two data sets are then created from the 100 selected encounters (training data) and the 55 unselected encounters (validation data):

proc surveyselect data=hcahps out=train_validate method=srs
                  sampsize=100 seed=987654 noprint outall;
   samplingunit FIN;   /* sample whole encounters, not individual physicians */
run;

data train;
   set train_validate;
   if selected = 1;
run;

data validate;
   set train_validate;
   if selected = 0;
run;
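As a quick sanity check on the split, you can count encounters and physician rows in each set. This is a short sketch, assuming the TRAIN and VALIDATE data sets created above; with this seed, the counts should match those reported earlier (100 encounters and 1,021 physicians for training; 55 and 524 for validation).

proc sql;
   /* expected: 100 encounters, 1021 physician rows */
   select count(distinct FIN) as encounters,
          count(*)            as physician_rows
      from train;
   /* expected: 55 encounters, 524 physician rows */
   select count(distinct FIN) as encounters,
          count(*)            as physician_rows
      from validate;
quit;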

LASSO REGRESSION AND SCORING

Although PROC GLMSELECT does not support logistic regression with LASSO variable selection, we found that treating our binary outcome as continuous still provided a well-performing model (see Friedman, Hastie, & Tibshirani, 2001). We originally intended to train the model to detect one primary physician per encounter, but our clinical abstractors often indicated that up to three physicians were equally responsible for patient communication in an encounter. We therefore trained the model to predict the primary physicians (up to three per encounter) via a binary variable called PRIMARY. The PROC GLMSELECT code for building the regression model and scoring the validation data is shown below (bar notation with @2 requests the two-way interactions described after the code):

proc glmselect data=train;
   model primary = preda|predb|predc|predd|prede|predg|predh|predj|
                   predk|predl|predm|predn|predo|predp|predq|predr|
                   preds|predt|predu|predv|predw|predx|predy @2
         / selection = lasso(choose=cv stop=cv)
           stats = all
           cvmethod = index(FIN)
           cvdetails = all
           details = all
           stb;
   modelaverage tables=(effectselectpct(all) parmest(all))
                refit(minpct=50 nsamples=1000) alpha=0.1;
   output out=train_out;
   score data=validate out=validate_out;
quit;

All two-way interactions among the predictor variables are allowed via the @2 in the MODEL statement. You can specify LASSO regression with predictor selection based on cross-validation via selection=lasso(choose=cv stop=cv), and the cross-validation folds are determined via cvmethod=INDEX(FIN), so there are 100 cross-validation folds based on the 100 unique FINs in the training data. With our relatively small sample size, we wanted the final model to be parsimonious yet robust, so we use the MODELAVERAGE statement to repeat the cross-validated LASSO regression on 100 bootstrap samples of the data. The final model includes only those effects that appeared in at least 50 percent of the models, and that model is refit on 1,000 bootstrap samples to provide the final coefficients. Predicted values from the final model are saved to the TRAIN_OUT data set by the OUTPUT statement, and the final model is used to score the validation data and save the results to a new table by the SCORE statement.

RESULTS

You can request the effect selection percentages via tables=(effectselectpct(all)) in the MODELAVERAGE statement. As Output 1 shows, four effects occur in over 50% of the models.

Effect            Selection Percentage
predb                     3.00
predb*prede               1.00
predb*predf               3.00
prede*predg               2.00
predb*predd               2.00
predh                     1.00
predf*predh               1.00
predi                     3.00
predf*predi               1.00
predj                    13.00
predb*predj               4.00
predg*preda               3.00
predf*res                 2.00
predz                     2.00
predd*predz               1.00
predy                    11.00
predb*predy              42.00
predf*predy               1.00
predh*predy               1.00
predd*predy               1.00
predb*predx               3.00
predf*predx               1.00
predw                     6.00
predb*predw               2.00
predd*predg               1.00
predy*predd               2.00
predd*predc              80.00
predj*predc               1.00
predk*predc               1.00
predy*predc               2.00
predk                   100.00
predb*predk              93.00
predc*predk               6.00
predh*predk               1.00
predx*predk               1.00
predw*predk               1.00
predv*predk               4.00
predk*predd              21.00
predc*predk               4.00
predl                     1.00
predb*predl              15.00
predd*predl               2.00
predx*predl               3.00
predc*predl              79.00
predh*predm               1.00

Output 1. Output from the MODELAVERAGE Statement

The final model includes the four effects that appeared in the LASSO regression for at least 50 percent of the 100 bootstrap models: predk (100%), predb*predk (93%), predd*predc (80%), and predc*predl (79%). The model with these four effects is then refit on 1,000 bootstrap resamples, and the mean estimate across all 1,000 samples is reported, as requested by the refit(minpct=50 nsamples=1000) option. Standard deviations, medians, and 5th and 95th percentiles are provided as well. The estimates do not vary widely across models, which indicates that the final mean estimates of the coefficients are stable.

The GLMSELECT Procedure
Refit Model Averaging Results
Effects: Intercept predk predb*predk predd*predc predc*predl

Average Parameter Estimates
                    Mean        Standard          Estimate Quantiles
Parameter           Estimate    Deviation      5%          Median      95%
Intercept          -0.012672    0.004919   -0.020378   -0.012848   -0.004496
predd*predc         0.063256    0.013101    0.041578    0.063094    0.084844
predk               0.305929    0.065426    0.201972    0.302251    0.415444
predb*predk         0.253052    0.090659    0.108725    0.250794    0.396755
predc*predl         0.074705    0.017375    0.048202    0.073859    0.105193

Output 2. Output from the MODELAVERAGE Statement
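Output 2 translates directly into a scoring equation. Using the mean estimates (and keeping in mind that predictions from this linear LASSO are scores rather than true probabilities), each physician row is scored as

\hat{y} = -0.012672 + 0.305929\,\mathrm{predk} + 0.253052\,(\mathrm{predb}\cdot\mathrm{predk})
        + 0.063256\,(\mathrm{predd}\cdot\mathrm{predc}) + 0.074705\,(\mathrm{predc}\cdot\mathrm{predl})

and, within each encounter, the physician with the largest score is taken as the attributable physician, as described in the next section.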

ASSESSING MODEL PERFORMANCE

As mentioned above, the goal of this analysis is to provide a single attributable physician for each inpatient encounter. To do so, we select the physician with the maximum predicted probability within each encounter and compare this choice to the clinical abstractors' top choice(s). We also compare the current method, attending physician at discharge, to the abstractors' top choice(s). You can find the maximum prediction within each encounter using the MEANS procedure. Recall that PROC GLMSELECT created an output data set (output out=train_out) containing the predicted probabilities in the variable P_PRIMARY:

proc means data=train_out max noprint nway;  /* NWAY added here so the output
                                                holds only the per-FIN maxima */
   class FIN;
   var p_primary;
   output out=train_out_w_max max=max;
run;

This code creates a new data set, TRAIN_OUT_W_MAX, that contains the maximum value of P_PRIMARY for each FIN; merging that maximum back to TRAIN_OUT by FIN flags the physician whose prediction equals it, which gives us our attributable physician.

The positive predictive value (PPV) indicates the performance of the model in detecting true positives, that is, the probability that the physician selected by the model was also selected by the clinical abstractors. On the training data:

Algorithm PPV = 89/100 = 89%, 95% CI [82%, 95%]
Attending PPV = 65/100 = 65%, 95% CI [55%, 74%]

In other words, out of 100 encounters, the model correctly selected one of the top physicians 89% of the time, whereas the attending was one of the top physicians only 65% of the time. You can also estimate model performance on the scored validation data set created by score data=validate out=validate_out. Based on the scored validation data:

Algorithm PPV = 44/55 = 80%, 95% CI [67%, 90%]
Attending PPV = 33/55 = 60%, 95% CI [46%, 73%]

The PPV for the algorithm decreases somewhat in the validation data, but the relatively high PPV indicates that the model still has good predictive validity on new data.

CONCLUSION

We have shown how PROC GLMSELECT can be used to select predictor variables for a LASSO regression model, using k-fold cross-validation that respects real groupings in the data. We also showed how model averaging can be used to estimate the model on many bootstrap data sets and retain only the predictors that appear in the majority of the models. These data mining techniques were used to develop a scoring algorithm that will be implemented in the Enterprise Data Warehouse at Dignity Health to provide better physician attribution.

REFERENCES

Friedman, J., Hastie, T., & Tibshirani, R. (2001). The Elements of Statistical Learning. Springer Series in Statistics. New York: Springer.

ACKNOWLEDGMENTS

Thanks to our clinical abstractors for helping us obtain our training data set.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Ingrid C. Wurpts, PhD
Dignity Health
602-217-1708
Ingrid.wurpts@dignityhealth.org