A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY


Lingqi Tang 1, Thomas R. Belin 2, and Juwon Song 2
1 Center for Health Services Research, UCLA Neuropsychiatric Institute, Los Angeles, CA
2 Department of Biostatistics, UCLA School of Public Health, Los Angeles, CA
lqtang@ucla.edu

KEY WORDS: Multiple Imputation, Hot-deck Imputation, Mental Health

ABSTRACT: IMPACT is a multi-center randomized controlled trial of a disease management program for late life depression. Like many longitudinal clinical trials, this study faces problems of item nonresponse, unit-level nonresponse and dropout. In this paper, we compare two approaches to handling incomplete data. The first approach is based on hot-deck multiple imputation of missing responses, using a modified predictive mean matching method for item nonresponse (Bell, 1999) and the approximate Bayesian bootstrap for unit nonresponse (Lavori, Dawson and Shera 1995). In the second approach, we apply multiple imputation based on the multivariate normal model using SAS PROC MI software. The two methods, as well as complete-case analysis, are compared in a simulation study. Overall, hot-deck multiple imputation performed well, with good coverage rates for Monte Carlo means and for tests of intervention effects. For dichotomous variables, multiple imputation under the multivariate normal model had lower coverage in three variables that were derived from a highly skewed variable. Under complete-case analysis, on the other hand, 47% of variables had low coverage for Monte Carlo means, but coverage for the intervention effects was good because the biases in the intervention and control groups were in the same direction and cancelled.

1. THE IMPACT STUDY AND MISSING DATA

Project IMPACT (Improving Mood: Promoting Access to Collaborative Treatment for Late Life Depression) is a multi-center study to test the effectiveness of a new disease management model for late life depression. A total of 1,801 patients aged 60 and older with major depression or dysthymic disorder were enrolled in the study from 8 organizations. In each clinic, participants were randomly assigned to either the new treatment model or usual care. Usual care patients could use any primary care or specialty mental health care services available to them as usual. Intervention participants had access, for up to 12 months and in addition to the usual care component, to a depression care manager who was supervised by a psychiatrist and a primary care expert. After 12 months, all study participants continued with their regular primary care provider in care as usual. Telephone surveys were conducted over a two-year period to assess the effects of the new depression treatment model on various health and economic outcomes, as compared to the usual care delivery models in place at each site.

This paper uses IMPACT data from baseline and from the 3 and 6 month follow-ups. At the item level, most variables have missing rates of less than 2%. The unit response rates to the 3 and 6 month follow-up assessments were 90.2% and 87.2% respectively. Missing rates in the control group are higher than in the intervention group (10.1% vs. 8.3% at Month-3 and 12.6% vs. 10.7% at Month-6), although these differences are not significant. The missing data pattern is not monotone, because some patients (2.6%) missed the Month-3 assessment but participated at Month-6 (Table 1). A total of 1,521 participants completed both follow-ups, 783 in the intervention group and 738 in the care as usual group.

Table 1. Unit Response Pattern Over Waves

Month  Response status  Overall (N=1801)  Intervention (N=906)  Control (N=895)
3      Responded
       Missing
       Dropout
       Death
6      Responded
       Missing
       Dropout
       Death            23*               9*                    14*

*: cumulative number including month 3

To investigate whether unit-nonresponse behavior differs between the intervention and control groups, we tested the bivariate association of each independent variable in a large set of predictors with the outcome of response (coded 1 if response and 0 if nonresponse). The independent variables included in the bivariate analyses were demographic characteristics, clinical characteristics and prior treatments at baseline. For the Month-6 analyses, we also included clinical characteristics and depression treatment variables at 3 months. A number of covariates are associated with nonresponse at 3 and 6 months in both the intervention and the control group, although the strength and statistical significance of the associations vary somewhat.
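As a rough check on the statement that the Month-3 missing rates (10.1% in the control group vs. 8.3% in the intervention group) do not differ significantly, the comparison can be sketched as a pooled two-proportion z-test. This is only one conventional test and not necessarily the one the authors used; it also assumes that the reported rates apply to the full randomized groups (N=895 and N=906).

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-sample z-test for a difference in proportions.

    Returns the z statistic and the two-sided normal p-value.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Month-3 missing rates from the text; the denominators are an assumption.
z_stat, p_val = two_proportion_z(0.101, 895, 0.083, 906)
print(f"z = {z_stat:.2f}, two-sided p = {p_val:.3f}")
```

With these inputs the statistic is about 1.3, which is not significant at the 0.05 level, in line with the text.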

At 3 months, higher rates of nonresponse are associated with greater cognitive impairment at baseline in both groups. At 6 months, higher rates of nonresponse are associated with education level, functional impairment at baseline and quality of life at 3 months in both groups.

There are, however, a number of covariates that are significantly associated with nonresponse in one group but not the other. For example, nonresponse rates differ significantly across organizations at both follow-ups in the intervention group but not in the control group. A baseline treatment preference for counseling rather than antidepressant medication is associated with lower nonresponse rates in the intervention group but not in the control group. This could be explained by the fact that the intervention offers free counseling to all participants. Nonresponders in the intervention group, but not in the control group, are more likely to state a treatment preference for "neither counseling nor psychotherapy". This might be explained by the fact that intervention patients are actively encouraged to try one of these two treatments, which is less likely to be the case for usual care controls. Similarly, nonresponders to both follow-ups in the intervention group have significantly lower rates of prior antidepressant use than responders. This is not the case in the usual care group. It might reflect the fact that intervention patients were actively encouraged to try treatments such as antidepressants, so those who did not wish to take such treatments were more likely to drop out than their counterparts in the usual care group, where there is less such pressure.

These findings indicate that unit-nonresponse behavior in the IMPACT study differs between the intervention group and the usual care group, and that this behavior changes over time. We chose mixed-effects models to conduct intent-to-treat analyses.
In the mixed-effects models, we specified an unstructured within-subject covariance structure to account for the within-subject correlation over time. Mixed-effects models handle cases with incomplete follow-up, and they are known to provide unbiased estimates when the missing data are missing at random (Littell, 1996). However, the missing-at-random assumption is made conditional on the variables included in the model. If there are variables that are related to the missing data mechanism but not included in the mixed-effects model, the model may yield biased estimates. In addition, such programs are designed to handle missing outcomes, and, as with many other regression programs, they typically drop subjects from the analysis when any explanatory variable is missing.

2. IMPUTATION PROCEDURES

In this paper, we considered two multiple imputation methods to handle missing data. The first is referred to as the hot-deck multiple imputation procedure, since it uses a modified predictive mean matching method for item nonresponse (Bell, 1999; Little 1988) and the approximate Bayesian bootstrap for unit nonresponse (Lavori, Dawson and Shera 1995). In the second method, we apply multiple imputation under a multivariate normal model using the missing data procedure (PROC MI) in SAS software.

HOT-DECK MULTIPLE IMPUTATION

For the hot-deck multiple imputation procedure, we dealt with the item-level missing data and the unit-level missing data sequentially. We first impute the baseline data 5 times. We then conduct unit-level imputation for the 3 month follow-up data using information from the imputed baseline data. The item-level imputation at the 3 month follow-up is performed after the unit-level imputation. For the 6 month follow-up data, we perform the unit-level imputation using information from the 5 imputed baseline and 3 month data sets, followed by the item-level imputation.
These sequential procedures will be continued to the end of the study, resulting in five imputed data sets at each follow-up.

The hot-deck imputation for item-level missing data modifies the predictive mean matching method (Bell, 1999). This method involves two steps: (1) forming imputation classes; and (2) drawing imputations at random from observed data within each class. To impute a variable Y, for instance, we select 20 variables, X1-X20, as predictors. In step 1, each missing value of Y is predicted using a multiple regression of Y on all of the independent variables that were observed for that case. Suppose that Respondent A was missing Y and X15 to X20. Respondent A's prediction would be based on a multiple linear regression fitted to people with complete data for Y and X1 to X14 (without regard to which of X15 to X20 were observed). Predicted values are then computed for anyone with complete data on X1 to X14 (everyone in the regression, Respondent A, and all others with the same missing data pattern as Respondent A). The cases are then sorted into equal-sized cells based on their predicted values. In step 2, a value of Y is imputed for Respondent A by choosing an observed value of Y at random (with replacement) from the cell into which Respondent A fell.

To reflect the uncertainty of the donor cells, we created bootstrap weights before each hot-deck imputation. We then used the bootstrap weights in the weighted multiple regression as well as in the selection of donors. In each follow-up month, the imputed data sets differ only in the bootstrap weights and in the seed used to obtain the random numbers employed in the hot-deck imputation.

The hot-deck imputation for unit-level missing data is based on the approximate Bayesian bootstrap method, stratifying by propensity scores of response (Lavori, Dawson and Shera 1995). This method consists of two steps: (1) forming imputation cells by stratification on propensity scores; and (2) imputation based on the approximate Bayesian bootstrap approach within each cell. Let $T$ be the number of follow-up waves and $z_t$ be the indicator for response to the $t$-th wave (coded 1 if response and 0 if nonresponse), $t = 1, 2, \ldots, T$. In step 1, we model the propensity of response to the $t$-th wave through baseline and prior waves:

$$
\begin{aligned}
e_1(X_0) &= \Pr(z_1 = 1 \mid X_0), \\
&\;\;\vdots \\
e_t(X_0, Y_1, \ldots, Y_{t-1}) &= \Pr(z_t = 1 \mid X_0, Y_1, \ldots, Y_{t-1}),
\end{aligned}
$$

where $X_0$ is the baseline covariate vector and $Y_t$ is the outcome vector at the $t$-th wave. Given the stratification based on the quartiles of the propensity scores among the nonrespondents, in step 2 we use the approximate Bayesian bootstrap algorithm (Lavori, Dawson and Shera 1995) to select donors:

(a) Sample $n_{obs}$ values (the number of respondents to the wave) at random with replacement from the observed responses in an imputation cell. This forms a potential set of observed responses.
(b) Sample $n_{miss}$ values (the number of nonrespondents to the wave) with replacement from the potential observed sample from (a). The sampled $n_{miss}$ cases are the donors for the imputation.
(c) Replace the missing data with the observed data from the donors determined in (b).

MULTIPLE IMPUTATION BASED ON A MULTIVARIATE NORMAL MODEL

Multiple imputation based on the multivariate normal model assumes that the complete data follow a multivariate normal distribution (Schafer 1997). Here we chose the SAS MI procedure to generate five multiply imputed data sets.
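The approximate Bayesian bootstrap steps (a) to (c) described for the hot-deck unit-level imputation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (`propensity`, `y`) and the function names are hypothetical, the propensity scores are assumed to have been estimated already, and the additional stratification by intervention status and gender is omitted.

```python
import bisect
import random
import statistics


def abb_donors(observed, n_missing, rng):
    """Approximate Bayesian bootstrap within one imputation cell."""
    # (a) Resample n_obs values with replacement from the observed responses.
    potential = [rng.choice(observed) for _ in range(len(observed))]
    # (b) Draw n_miss donors with replacement from that potential sample.
    return [rng.choice(potential) for _ in range(n_missing)]


def impute_wave(records, rng):
    """Fill missing 'y' values by ABB within propensity-score cells.

    records: list of dicts with keys 'propensity' (float) and 'y'
    (None when the unit did not respond). Modified in place.
    """
    missing = [r for r in records if r["y"] is None]
    # Cell boundaries: quartiles of the propensity score among nonrespondents.
    cuts = statistics.quantiles([r["propensity"] for r in missing], n=4)
    cells = {}
    for r in records:
        cell = bisect.bisect_right(cuts, r["propensity"])
        cells.setdefault(cell, []).append(r)
    for members in cells.values():
        observed = [r["y"] for r in members if r["y"] is not None]
        holes = [r for r in members if r["y"] is None]
        if not holes:
            continue
        if not observed:
            raise ValueError("imputation cell has no respondents to donate")
        # (c) Replace each missing response with its donor's observed value.
        for r, donor in zip(holes, abb_donors(observed, len(holes), rng)):
            r["y"] = donor
    return records
```

Calling `impute_wave` five times with different seeds (for example `random.Random(1)` through `random.Random(5)`) on fresh copies of the data would give five imputed versions of a wave.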
The IMPACT data contain both continuous and categorical variables. Even though categorical variables do not satisfy the normality assumption, multiple imputation may be robust when the amount of missing information is not large (Schafer, 1997). The percentage of missing values in each variable was quite low in the IMPACT study, and we used dummy codes for categorical variables. Several variables measured as counts had highly skewed, non-normal distributions. A log transformation was applied to those variables, and the imputed values were transformed back to the original scale. Imputation under the normal distribution produces imputed values on a continuous scale, so we rounded those values to fall within the possible range of each variable.

The IMPACT study measures outcome variables repeatedly over time. However, because there were only three measurement occasions (baseline, 3 months, and 6 months), we used a cross-sectional imputation model instead of a longitudinal one. In the imputation model, each longitudinally measured variable was treated as a set of separate variables. For example, the mean SCL-20 depression score was measured at baseline, three months, and six months and was treated as three different variables in the model. In some sense, this model may be more general than a longitudinal model, because it allows for all possible correlations among variables.

3. APPLICATION

This section describes the implementation of the two imputation procedures and presents numeric results from the data analyses.

HOT-DECK MULTIPLE IMPUTATION

To conduct the unit-level imputation, we first fit multiple logistic regressions, stratified by intervention arm, to estimate the propensity scores of response. We started with a large set of independent variables to be considered for a logistic regression on the outcome of response (coded 1 if response and 0 if nonresponse).
In order to control for multiple comparisons, we entered into the modeling procedure only those variables whose bivariate association with response was marginally statistically significant (P-value < 0.1). The final model included the predictors that were significant (P-value < .05) in at least one of the 5 multiply imputed data sets for either the intervention group or the care as usual group. We also forced in gender and two design variables: recruitment method (screening or referral) and site (7 dummy variables for 8 sites).

To form the imputation cells, we used intervention status and gender as the primary stratification variables, because men and women may have different characteristics. Within each intervention and gender group, 4 imputation cells were formed based on the quartiles of the estimated propensity scores $e_t(X_0, Y_1, \ldots, Y_{t-1})$ among the nonresponse group (see Table 2). Averaged across the 5 imputed data sets, the propensity scores differ by 2 percent between the intervention group and the care as usual group (mean (SD) = .90 (.04) vs. .92 (.05) at Month-3, and mean (SD) = .87 (.08) vs. .89 (.08) at Month-6). Table 2 shows the distribution of the possible matches in each imputation cell, stratified by intervention status, gender and the quartiles of the propensity scores.

Table 2. Imputation Cells Stratified by Gender and Propensity Scores

                               Possible matches
                               Month 3               Month 6
Group  Sex  Quartile group     Missing wave  Obs.    Missing wave  Obs.
UC     M
       F
IV     M
       F
All

MULTIPLE IMPUTATION BASED ON A MULTIVARIATE NORMAL MODEL

The imputation model based on the multivariate normal model contained 91 variables: an indicator variable for intervention status, 35 baseline characteristics, 20 outcome variables at baseline, 17 outcome variables at the Month-3 follow-up, and 18 outcome variables at the Month-6 follow-up. All variables had missing rates of less than 2%. The outcome variables at each time point comprised 8-9 primary outcome variables of main interest as well as 7-11 secondary outcome variables for later analysis. Five of the outcome variables of main interest were derived from three original variables. The model contained those three original variables, and the five outcome variables were derived from them after imputation. Two outcome variables (dysthymia and current employment status) were measured only at baseline, and major depression (SCID) was not measured at the 3 month follow-up, resulting in a different number of outcome variables at each time point. Some outcome variables were composite measures built from many item variables. Because a much bigger model containing every item variable had problems with convergence of the EM algorithm, we did not consider item-level imputation here.

The convergence of data augmentation was checked with time-series plots and autocorrelation plots. The first thousand iterations were discarded as a burn-in period. We used a single chain, and the 5 imputed data sets were generated one thousand iterations apart.
Imputation of categorical variables could generate unreasonable imputed values. For example, a missing ethnicity might be imputed as two ethnicities. In that case, the imputed values were treated as probabilities, and the ethnicity variables were randomly redrawn until exactly one ethnicity was chosen. By Month-3 and Month-6, 11 and 12 participants had died, respectively, and their values should not be imputed. Since the MI procedure did not allow us to specify who should not be imputed, those participants' values were deleted after imputation. That is, the final imputed data were missing for those participants on follow-up measures after their death.

RESULTS

We conducted outcome analyses using the data sets produced by the two imputation procedures. The dependent variables in IMPACT included 9 binary and 3 interval-valued variables: self-reported use of antidepressants or psychotherapy in the last 3 months, potentially effective use of antidepressants or psychotherapy, SCL-20 depression scores, rates of remission of depression (defined by an SCL-20 depression score < 0.5), rates of treatment response (defined as a 50% or greater drop in SCL-20 depression score from baseline), the proportion of subjects who met diagnostic criteria for major depression on the SCID, health-related functional impairment, and quality of life at 3 and 6 months. For each dependent variable, we conducted an intent-to-treat analysis of repeated measures. We fitted mixed-effects regression models (or mixed-effects logistic regression models for dichotomous variables) using the baseline, 3 month and 6 month follow-up data, with regression adjustment for recruitment method (screening or referral) and participating study organization. To account for the within-subject correlation over time, we specified an unstructured within-subject covariance structure in the mixed-effects models.
Table 3 presents the imputation effects on unadjusted analyses for 3 key outcome variables (DEPTX = any depression treatment, MAJDEP = major depression, SCL = SCL-20 depression score). Point estimates from the two imputation methods were compared to the complete-data analysis. SE0 stands for the standard error from the complete-data analysis, and λ is the estimated fraction of missing information. Both methods produced similar point estimates. On average, the hot-deck multiple imputation produced higher standard errors, larger λ and slightly less significant intervention effects. Among the 12 dependent variables, the difference in t-statistics is, on average, less than [value missing].

Table 3. Imputation Effects for Key Outcome Variables from Unadjusted Analyses

                     MI hot-deck                          MI multivariate normal
Y           Data     Difference on    SE/SE0   100λ       Difference on    SE/SE0   100λ
                     point estimates                      point estimates
DEPTX03*    All
            UC
            I
DEPTX06**   All
            UC
            I
MAJDEP06**  All
            UC
            I
SCL03*      All
            UC
            I
SCL06**     All
            UC
            I

* Represents 3-month follow-up, ** Represents 6-month follow-up.

4. SIMULATION

To compare bias and coverage rates for the two imputation methods in the IMPACT study, we performed a Monte Carlo simulation study. The group of 1521 participants who completed both follow-ups was taken to be the population for this simulation study (intervention group = 783, care as usual group = 738). For each intervention arm and follow-up wave, simple random samples of n = 250 were drawn without replacement. After a sample was drawn, a missing data mechanism was imposed on each person. The mechanism mimicked an MAR mechanism: it used the coefficients estimated from all participants in the multiple logistic regressions for the propensity scores of response, with the intercept adjusted so that the missingness rates matched those observed in the IMPACT data. The procedure was conducted separately for the intervention and usual care groups and for each follow-up month. The missingness rates matched the true data, with 8% and 11% for the intervention group at months 3 and 6, and 10% and 11% for the care as usual group. After the pattern of unit missingness was imposed, the two imputation methods were applied, and point estimates (the group means) and intervention effects were calculated for each dependent variable. The estimated variances were adjusted by the finite population correction factor because the sampling fraction is large in the simulation. The procedure was carried out 1000 times.
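The intercept adjustment used to impose the MAR mechanism can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' procedure: the linear predictors (fitted coefficients times covariates, excluding the intercept) are taken as given, bisection is one of several ways to solve for the intercept, and all function names are hypothetical.

```python
import math
import random


def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mean_response_rate(intercept, linear_predictors):
    """Average response probability implied by a given intercept."""
    total = sum(expit(intercept + lp) for lp in linear_predictors)
    return total / len(linear_predictors)


def calibrate_intercept(linear_predictors, target_rate, lo=-30.0, hi=30.0):
    """Find the intercept whose mean response rate equals target_rate.

    Uses bisection, which works because the mean response rate is
    strictly increasing in the intercept.
    """
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if mean_response_rate(mid, linear_predictors) < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def impose_missingness(linear_predictors, intercept, rng):
    """Draw response indicators z (1 = responded, 0 = missing)."""
    return [1 if rng.random() < expit(intercept + lp) else 0
            for lp in linear_predictors]


# Hypothetical linear predictors for a sample of 250; target 92% response.
rng = random.Random(2024)
lps = [rng.gauss(0.0, 1.0) for _ in range(250)]
b0 = calibrate_intercept(lps, 0.92)
z = impose_missingness(lps, b0, rng)
```

Because missingness depends only on observed covariates through the linear predictor, the imposed mechanism is MAR by construction; the realized missing rate in any one sample varies around the target.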
Tables 4 and 5 display the simulation results for point estimation and intervention effects for 3 key outcome variables. The tables report the Monte Carlo mean estimated bias, the actual coverage rate (cvg.), that is, the percentage of the 1000 nominal 95% intervals that covered the true estimand, and the length of the 95% confidence interval (lngh).

Overall, hot-deck multiple imputation performed well, with good coverage rates. Multiple imputation under the multivariate normal model showed good coverage for all continuous variables. For dichotomous variables, it had lower coverage in three variables, which were derived from a highly skewed variable, the number of all mental health therapy visits. Imputed values of this variable tended to be more positive than the observed ones; the percentage of zeros was 40.9% among imputed cases and 66.7% among observed cases. Because the distribution of this variable is very skewed, its mean is much higher than its median, and the imputation may end up producing values higher than those observed. Similar patterns were seen in other simulated data, with large bias in this variable and in the variables derived from it. On the other hand, 47% of variables in the complete-case analysis showed lower coverage for Monte Carlo means, but coverage for the intervention effects was good because the biases in the intervention and control groups were in the same direction and cancelled.
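The quantities summarized in the simulation (bias, coverage, interval length, and the fraction of missing information λ) can be sketched with Rubin's (1987) combining rules. The sketch below makes simplifying assumptions that may differ from the authors' computations: λ uses the large-sample form (1 + 1/m)B/T without the degrees-of-freedom adjustment, intervals use the normal quantile 1.96 rather than a t reference distribution, and the finite population correction is left out. Function names are hypothetical.

```python
import math


def rubin_combine(estimates, variances):
    """Combine m completed-data estimates and their variances.

    Returns the pooled estimate, the total variance T, and the
    approximate fraction of missing information lambda.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m                                   # W
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # B
    total = within + (1.0 + 1.0 / m) * between                    # T
    lam = (1.0 + 1.0 / m) * between / total
    return q_bar, total, lam


def summarize(replications, truth, z=1.96):
    """Monte Carlo bias, coverage rate and mean interval length.

    replications: list of (pooled_estimate, total_variance) pairs,
    one per simulated data set.
    """
    n = len(replications)
    bias = sum(est for est, _ in replications) / n - truth
    halves = [z * math.sqrt(var) for _, var in replications]
    covered = sum(1 for (est, _), h in zip(replications, halves)
                  if abs(est - truth) <= h)
    mean_length = sum(2.0 * h for h in halves) / n
    return bias, covered / n, mean_length


# Hypothetical estimates from m = 5 imputed data sets.
q, t, lam = rubin_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
```

In a simulation such as the one described, `rubin_combine` would be applied once per replicate and the resulting pairs passed to `summarize` together with the population value.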

Table 4. Point estimates for key outcome variables

                   Complete-case analysis    Hot-deck                        Multivariate normal
Y         Data     bias   cvg.   lngh        bias   cvg.   lngh   100λ       bias   cvg.   lngh   100λ
DEPTX03   All
          UC
          IV
DEPTX06   All
          UC
          IV
MAJDEP06  All
          UC
          IV
SCL03     All
          UC
          IV
SCL06     All
          UC
          IV

Table 5. Intervention effects for key outcome variables

          Complete-case analysis    Hot-deck                        Multivariate normal
Y         bias   cvg.   lngh        bias   cvg.   lngh   100λ       bias   cvg.   lngh   100λ
DEPTX
DEPTX
MAJDEP
SCL
SCL

REFERENCES

Bell, R. (1999). Presentation at Depression PORT Methods Workshop (I). RAND, Santa Monica, CA.
Lavori, P., Dawson, R. and Shera, D. (1995). A multiple imputation strategy for clinical trials with truncation of patient data. Statistics in Medicine, 14.
Littell, R. C., Milliken, G. A., Stroup, W. W. and Wolfinger, R. D. (1996). SAS System for Mixed Models. SAS Institute, Inc.
Little, R. J. (1988). Missing data adjustments in large surveys. Journal of Business and Economic Statistics, 6.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: J. Wiley & Sons.
Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall.
