A SIMULATION STUDY OF ESTIMATORS FOR RATES OF CHANGES IN LONGITUDINAL STUDIES WITH ATTRITION

Fong Wang, Genentech Inc.
Mary Lange, Immunex Corp.

Abstract

Many longitudinal studies and clinical trials are designed to estimate rates of change over time in one or more outcome variables in several groups. Most such studies have incomplete data because some patients drop out before completing the study. The missing data may induce bias and inefficiency in naive estimates of important parameters. We used Monte Carlo methods to compare the bias and efficiency of several two-stage estimators of the effect of treatment on the mean rate of change when the missing data arise from one of four processes. In general, the weighted least squares estimator does relatively well, as does an analysis-of-covariance type estimator proposed by Wu et al. (1989).

Introduction

In longitudinal studies, responses are measured on each individual on multiple occasions. In this paper, we consider longitudinal clinical trials designed to estimate and compare rates of change of response over time, as measured by the slope of the linear regression of response on time, in two treatment groups. In designed studies with each individual measured at every occasion, an optimal estimate of each group's mean slope is obtained by first computing a least squares slope for each individual in the group, then averaging these least squares slopes (Lange and Laird, 1989).

Attrition in studies can cause both bias and loss of efficiency, as well as complicate the analysis. In a study similar to ours, Wu and Carroll (1988) compared the bias and efficiency of three estimators: unweighted least squares, weighted least squares, and a pseudo maximum likelihood approach, through a Monte Carlo experiment.
Wu and Bailey (1988) compared five estimators: unweighted least squares, weighted least squares, pseudo maximum likelihood, and two noniterative univariate weighted least squares estimators designed to account for time of dropout. They used simulated data subject to missing data processes that are missing completely at random and nonignorable (Little and Rubin, 1987). This paper builds on their work by using Wu and Carroll's (1988) basic study design, but examines four types of mechanisms for missing data and considers several additional estimators. We compare seven two-stage estimators: (1) the unweighted least squares (UWLS) estimator, (2) the complete-case estimator, (3) the restricted maximum likelihood (REML) estimator, (4) the generalized least squares (GLS) estimator, (5) the weighted least squares (WLS) estimator, and (6)-(7) the two estimators (ANCOVA and weighted mean) proposed by Wu and Bailey (1988).

The Model

We assume that the experimental units in a study are assigned to two groups with sample sizes $n_k$, where $k=1$ and $k=2$ represent a control group and a treatment group, respectively; the total sample size is $n = n_1 + n_2$. There are $T$ measurements taken at prespecified times for every individual with complete data. We now give the model for the complete data.

Stage one: A $T \times 1$ complete observation vector for the $i$th individual in the $k$th group satisfies

$$Y_{ik} = Z\beta_{ik} + e_{ik},$$

where $i = 1, \ldots, n_k$; $k = 1, 2$; $e_{ik} \sim N(0, \sigma^2 I)$; $\beta_{ik}' = (\beta_{i1k}, \beta_{i2k})$; and $Z$ is a $T \times 2$ design matrix determined by the times at which the measurements are taken.

Stage two: The unobservable vector $\beta_{ik}$, representing the true intercept and slope of the corresponding individual, is bivariate normally distributed with mean vector $\alpha_k' = (\alpha_{1k}, \alpha_{2k})$ and covariance matrix $D$. The main parameter of interest is the difference in mean slopes, $\beta = \alpha_{21} - \alpha_{22}$.

Missing Data Mechanisms

For simplicity, we assume missing data are due to attrition only. Because most estimators are based on first computing a least squares slope for each individual, we assume that each individual in the study has at least three observations. The types of missing data mechanisms we study can be classified using the typology described in Rubin (1976) and Little and Rubin (1987):

a. Missing completely at random (MCAR): Attrition occurs at random and does not depend on an individual's characteristics.
b. Missing at random (MAR): Attrition occurs at random, but with a probability that depends on an individual's previously observed response.
c. Two types of nonignorable (NOI) missingness: Here the probability of attrition depends on unobserved characteristics (true slope and/or intercept) of the individuals under study.

We use A1 through A4 to denote the four missing data processes.
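As a rough illustration of the two-stage model, the sketch below draws complete data for one group and recovers each individual's least squares slope. It is a minimal sketch in Python with NumPy, not the SAS/IML code used in the study. The visit times, the mean intercept, the covariance matrix D, and the error standard deviation are hypothetical values chosen only for illustration; the mean slopes of -90 (control) and -45 (treatment) are the true values listed in Table 1. The function names `simulate_group` and `ls_slopes` are also hypothetical.

```python
import numpy as np

def simulate_group(n, alpha, D, sigma, times, rng):
    """Draw complete data for one group from the two-stage model."""
    times = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(times), times])   # T x 2 design matrix
    # Stage two: individual (intercept, slope) vectors, bivariate normal.
    beta = rng.multivariate_normal(alpha, D, size=n)    # n x 2
    # Stage one: Y_ik = Z beta_ik + e_ik, with e_ik ~ N(0, sigma^2 I).
    Y = beta @ Z.T + rng.normal(0.0, sigma, size=(n, len(times)))
    return Z, beta, Y

def ls_slopes(Z, Y):
    """Least squares slope for every individual (one row of Y each)."""
    coef, *_ = np.linalg.lstsq(Z, Y.T, rcond=None)      # coef is 2 x n
    return coef[1]

# Hypothetical settings (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
times = np.arange(0.0, 3.25, 0.25)                # years; assumed visit schedule
D = np.array([[250000.0, 0.0], [0.0, 900.0]])     # assumed covariance of (intercept, slope)
sigma = 100.0                                     # assumed within-subject SD

_, _, Y_ctl = simulate_group(100, [3000.0, -90.0], D, sigma, times, rng)
Z, _, Y_trt = simulate_group(100, [3000.0, -45.0], D, sigma, times, rng)

# Difference in mean slopes, control minus treatment (true value -45 here).
diff = ls_slopes(Z, Y_ctl).mean() - ls_slopes(Z, Y_trt).mean()
print(round(diff, 1))
```

With complete data, averaging the individual slopes in each group and differencing the two averages gives the simple estimator that the attrition processes then disturb.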
Following Little and Rubin (1987) and Wu and Carroll (1988), we assumed that the probability $\pi$ of attrition during any time interval $(t_j, t_{j+1})$ was determined by a probit model: $\pi = \Phi(z_{ijk})$, where

$$z_{ijk} = \gamma_1 \beta_{i1k} + \gamma_2 \beta_{i2k} + \gamma_3 Y_{i,j-1,k} + \gamma_{4j},$$

and a different subset of the $\gamma$s is set to zero for each process. We chose the parameters of the model so that, at the end of the study, the attrition rate for each process was approximately 50%.

Process A1 is MCAR; the conditional probability $\pi$ that an individual drops out in the interval $(t_j, t_{j+1})$, given they are still in the study at $t_j$, is constant ($\gamma_1 = \gamma_2 = \gamma_3 = 0$). Under the MCAR assumption, the sampling distribution of the observed data is a marginal of the complete data distribution. This implies that moment-based estimators will be unbiased in this missing data setting.

Process A2 is MAR; here, $\pi$ depends on the previously observed $Y_{i,j-1,k}$ ($\gamma_1 = \gamma_2 = 0$). Individuals with lower values are more likely to drop out. When data are MAR, the sampling distribution of the observed data no longer equals the marginal distribution, but depends upon the missing data process (Rubin, 1976). Hence we might expect moment estimators to be biased. However, the likelihood of the observed data is proportional to the marginal likelihood; thus it is not necessary to specify the missingness process in order to obtain valid MLEs.

Processes A3 and A4 are both nonignorable. Here, $\pi$ depends upon an individual's true, unobserved parameter vector $\beta_{ik}$ ($\gamma_3 = 0$). For A3, we assume the dependence of $\pi$ on $\beta_{ik}$ is the same for both treatment groups; for A4 we assume the dependence of $\pi$ on $\beta_{ik}$ differs between the two treatment groups. This might occur, for example, in an unblinded study.

Monte Carlo Experiment

Our study was motivated by design and analysis problems encountered in clinical trials of lung diseases. The data are simulated using a model developed by Wu and Carroll (1988) for a study of antiproteolytic replacement therapy among individuals with PiZ emphysema. Early studies (Laurell and Eriksson, 1963) showed that individuals with PiZ phenotype tend to develop severe α1-antitrypsin deficiency and hence pulmonary emphysema and more rapid decline in lung function. The clinical trial was designed to compare rates of loss of FEV1 (ml/sec.) in therapeutic and control groups over time. The simulation model consists of patients randomly divided into two groups with equal sample sizes of 100 in each group.
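The probit attrition model used for the four processes can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the coefficient values, the standardized toy data, and the function names `probit` and `n_observed` are hypothetical, and the last term of the linear predictor is treated as a single constant for simplicity. Dropout is only allowed once an individual has three observations, matching the assumption made in the paper.

```python
import numpy as np
from math import erf, sqrt

def probit(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def n_observed(beta, Y, gamma, rng, min_obs=3):
    """Number of visits each individual attends before attrition.

    gamma = (g1, g2, g3, g4): coefficients on the true intercept, the true
    slope, a previously observed response, and a constant term (assumed
    constant over time in this sketch).
    """
    n, T = Y.shape
    g1, g2, g3, g4 = gamma
    counts = np.full(n, T)
    for i in range(n):
        # j is the 1-based index of the last visit attended so far.
        for j in range(min_obs, T):
            # Y[i, j - 2] is the response at visit j-1 (1-based).
            z = g1 * beta[i, 0] + g2 * beta[i, 1] + g3 * Y[i, j - 2] + g4
            if rng.random() < probit(z):      # drops out in (t_j, t_{j+1})
                counts[i] = j
                break
    return counts

# Toy standardized data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
T = 8
beta = rng.normal(0.0, 1.0, size=(200, 2))
Y = beta[:, [0]] + beta[:, [1]] * np.linspace(0.0, 1.0, T) + rng.normal(0.0, 0.5, (200, T))

processes = {
    "A1 (MCAR)": (0.0, 0.0, 0.0, -1.3),    # constant hazard
    "A2 (MAR)": (0.0, 0.0, -0.5, -1.5),    # lower observed values drop out more
    "A3 (NOI)": (-0.3, -0.5, 0.0, -1.5),   # depends on unobserved intercept and slope
}
for name, gamma in processes.items():
    counts = n_observed(beta, Y, gamma, rng)
    print(name, "attrition rate:", np.mean(counts < T))
```

A process like A4 would call `n_observed` with a different `gamma` for each treatment group. In practice the coefficients would be tuned until the end-of-study attrition rate is near the 50% target.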
The study duration is three years, with observations bimonthly during the first 6 months, and then quarterly through 3 years. The simulation involved 600 replications under each of four types of missingness.

Methods

We ran the simulation using PC-SAS on a local area network. We computed all but the REML estimates using the SAS/IML procedure PROC IML; we used EPILOG-PLUS to obtain the REML estimates. Disk space was at a premium, but computer time was free, so we set up the replications to run overnight on whatever dormant machines were available. After estimates from a replication were computed, they were appended to a permanent SAS dataset. We used the SAS macro language to design programs that were interruptible and restartable from the point of interruption. Thus, higher priority (daytime) tasks were not delayed, and we were able to develop and execute the analyses incrementally.

Results

Table 1 displays the means and standard deviations of the estimates of slope for the two groups; Table 2 presents the results for the estimated difference of slopes.

As expected, when the dropout process is MCAR, all estimators are unbiased, both for the group means and their difference. The unweighted estimator is quite inefficient, even compared to the complete-case estimator.

The situation is quite different when the likelihood of dropout depends on a previously observed data value (MAR). In this case, the REML estimator is unbiased and efficient for both the group means and their difference. The complete-case analysis also works well here, and except for a small bias, it is quite competitive with the REML estimator. All the other estimators suffer from considerable bias in estimating the group means, or their difference, or both; and the UWLS and weighted mean estimators are grossly inefficient for the difference. (The relatively large bias of these estimators for group effects is due to the dependence of missingness on the last observed value. Because these estimators are based on the individual LS slopes, the leverage of the last observation is greatest for the dropouts, resulting in negatively biased estimates.)

When the dropout process is nonignorable, but the same in both groups (A3), the UWLS, ANCOVA and weighted mean estimators are all relatively unbiased for group mean slopes. The UWLS is, as usual, unacceptably inefficient. For the difference, the ANCOVA estimator is the best in terms of bias, with an acceptable efficiency.

For the case in which the missing data process differs for the two groups (A4), all estimators except the inefficient UWLS have substantial bias. Among the remaining estimators, the ANCOVA is acceptably efficient, and has the smallest bias, although that bias is substantial.
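One way to see why an unweighted average of individual slopes loses efficiency under attrition: under the two-stage model, the least squares slope of an individual observed at visits $t_1, \ldots, t_m$ has variance $D_{22} + \sigma^2 / \sum_j (t_j - \bar{t})^2$, so slopes from early dropouts are much noisier than slopes from completers. The sketch below contrasts the unweighted mean with a generic inverse-variance weighted mean. The weighting shown is an assumption made for illustration and is not claimed to reproduce the WLS or weighted mean estimators evaluated in the study; the function names and all numeric values are hypothetical.

```python
import numpy as np

def slope_variance(times, m, d22, sigma2):
    """Variance of the LS slope fitted to the first m visits."""
    t = np.asarray(times[:m], dtype=float)
    sxx = np.sum((t - t.mean()) ** 2)
    return d22 + sigma2 / sxx

def unweighted_mean(slopes):
    """Simple average of the individual LS slopes."""
    return float(np.mean(slopes))

def inverse_variance_mean(slopes, times, n_obs, d22, sigma2):
    """Average of individual LS slopes weighted by 1 / Var(slope)."""
    var = np.array([slope_variance(times, m, d22, sigma2) for m in n_obs])
    w = 1.0 / var
    return float(np.sum(w * np.asarray(slopes)) / np.sum(w))

# Hypothetical example: two completers and one early dropout.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # assumed visit times (years)
slopes = [-60.0, -40.0, -150.0]               # assumed individual LS slopes
n_obs = [7, 7, 3]                             # visits attended by each individual

print(unweighted_mean(slopes))
print(inverse_variance_mean(slopes, times, n_obs, d22=900.0, sigma2=10000.0))
```

The dropout's slope, fitted to only three visits, receives a small weight, so it pulls the weighted mean far less than the unweighted one.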
Discussion

We have investigated the relative performance of several commonly used estimators of rates of change in the presence of attrition. Not unexpectedly, our findings concerning their relative performance depend strongly on our assumptions about the nature of the dropout process.

The missing completely at random process assumes attrition from the study is at random, without regard to treatment or individual characteristics. The classical example of MCAR is patients leaving town for reasons unrelated to the study. The missing at random mechanism assumes the likelihood of dropout depends upon previously observed outcomes. Thus, in a trial designed to test effectiveness of a drug in inhibiting the decline of T4 counts among HIV-infected patients, the MAR process could correspond to patients dropping out if their T4 count reached some threshold. Nonignorable processes assume the probability of dropout depends on unobservable quantities. For example, in a study designed to control hypertension, patients with home blood pressure kits may decide not to return for protocol measurements on the basis of their own measurements of blood pressure.

In practice, investigators usually do not know the dropout mechanism, although some scenarios may be more plausible than others. The real situation is usually complex, and may involve dropouts of several types. Although there is much discussion in the literature of the MAR assumption because of the implications for validity of likelihood-based methods, whether it is more or less appropriate than a nonignorable model must depend on the particular situation. While the missing at random assumption is testable as an alternative to the missing completely at random assumption, it will generally not be possible to distinguish between ignorable and nonignorable mechanisms by examination of the data.

References

Lange N and Laird NM, (1989), "The Effect of Covariance Structure on Variance Estimation in Balanced Growth-Curve Models with Random Parameters," JASA, 84, 241-247.
Laurell CB and Eriksson S, (1963), "The Electrophoretic Alpha1-Globulin Pattern of Serum in Alpha1-Antitrypsin Deficiency," Scand J of Clinical Laboratory Investigation, 15, 132-140.
Little RJA and Rubin DB, (1987), Statistical Analysis with Missing Data, New York: Wiley & Sons.
Rubin DB, (1976), "Inference and Missing Data," Biometrika, 63(3), 581-592.
Wu MC and Bailey KR, (1988), "Analyzing Changes in the Presence of Informative Right Censoring Caused by Death and Withdrawal," Statistics in Medicine, 7, 337-346.
Wu MC and Bailey KR, (1989), "Estimation and Comparison of Changes in the Presence of Informative Right Censoring: Conditional Linear Model," Biometrics, 45, 939-955.
Wu MC and Carroll RJ, (1988), "Estimation and Comparison of Changes in the Presence of Informative Right Censoring: Modeling the Censoring Process," Biometrics, 44, 175-188.

Acknowledgements

We would like to thank our colleagues at Syntex, Immunex, and Genentech for their generous support in the preparation of this paper.

SAS and SAS/IML are registered trademarks of SAS Institute Inc., Cary, NC, USA. EPILOG-PLUS is a registered trademark of Epicenter Software, Pasadena, CA, USA.

Table 1. Mean and Standard Deviation (SD) of the Estimates for the Treatment (Trt) and Control (Ctl) Groups

Dropout probability depends on: A1, a constant; A2, $Y_{i,j-1,k}$; A3, $\beta_{i1k}$ and $\beta_{i2k}$; A4, $\beta_{i1k}$, $\beta_{i2k}$, and group.

                     A1 - MCAR        A2 - MAR        A3 - NOI        A4 - NOI
Estimator          Trt     Ctl     Trt     Ctl     Trt     Ctl     Trt     Ctl
True Value       -45.0   -90.0   -45.0   -90.0   -45.0   -90.0   -45.0   -90.0
UWLS             -46.3   -90.5  -103.1  -148.7   -46.8   -90.6   -47.6   -91.0
  SD              22.7    20.3    29.6    26.9    28.2    30.3    23.3    32.1
GLS              -46.0   -90.3   -64.0  -109.1   -22.1   -58.3   -36.3   -58.2
  SD              12.5    10.8    18.8    18.8    15.9    17.3    14.8    17.1
WLS              -46.0   -90.1   -61.6  -106.5   -32.6   -73.0   -45.9   -72.8
  SD              12.5    10.7    14.2    13.3    12.7    12.1    11.8    12.1
ANCOVA           -46.3   -90.3   -90.5  -137.2   -48.2   -94.5   -51.5   -83.9
  SD              14.2    12.7    25.0    25.9    17.2    18.8    14.7    19.0
Weighted Mean    -46.1   -89.9   -94.0  -132.1   -42.3   -80.4   -47.1   -77.4
  SD              13.5    12.0    20.2    24.5    16.8    16.8    13.2    15.6
REML             -45.8   -89.6   -49.3   -93.6   -15.0   -48.1   -31.1   -50.4
  SD              11.7    10.8    13.4    12.5    12.0    11.6    12.2    11.8
Complete Case    -45.9   -89.9   -49.4   -91.9   -28.4   -67.4   -46.0   -67.5
  SD              14.3    12.8    13.1    12.4    12.8    13.1    12.1    13.0

Table 2. Bias (SD) of the Difference between Control and Treatment Groups

Dropout probability depends on: A1, a constant; A2, $Y_{i,j-1,k}$; A3, $\beta_{i1k}$ and $\beta_{i2k}$; A4, $\beta_{i1k}$, $\beta_{i2k}$, and group.

Estimator           A1 - MCAR        A2 - MAR        A3 - NOI        A4 - NOI
UWLS               0.8 (29.8)     -0.6 (39.8)      1.2 (41.6)      1.6 (40.2)
GLS                0.7 (16.6)     -0.1 (18.3)      8.8 (17.5)     23.1 (17.7)
WLS                0.9 (16.6)      0.1 (18.1)      4.6 (17.3)     18.1 (17.2)
ANCOVA             1.0 (16.5)     -1.7 (19.7)     -1.3 (18.2)     12.6 (18.0)
Weighted Mean      1.2 (18.1)      6.9 (31.2)      6.9 (22.7)     14.7 (20.4)
REML               1.2 (16.0)      0.7 (17.6)     11.9 (17.0)     25.7 (16.9)
Complete Case      1.0 (19.5)      2.5 (18.2)      6.0 (18.4)     23.4 (17.9)
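The two tables can be cross-checked against each other: because the mean of a difference equals the difference of the means, the bias in Table 2 should equal the control mean minus the treatment mean from Table 1, minus the true difference of -90 - (-45) = -45. The short sketch below does this arithmetic for five of the estimators. The sign convention (control minus treatment) is inferred from the Table 2 title and the reported values rather than stated in the paper, and the function name `implied_bias` is hypothetical.

```python
TRUE_DIFF = -90.0 - (-45.0)   # true control slope minus true treatment slope

# (Trt, Ctl) mean slopes under A1, A2, A3, A4, copied from Table 1.
table1 = {
    "UWLS":   [(-46.3, -90.5), (-103.1, -148.7), (-46.8, -90.6), (-47.6, -91.0)],
    "GLS":    [(-46.0, -90.3), (-64.0, -109.1), (-22.1, -58.3), (-36.3, -58.2)],
    "WLS":    [(-46.0, -90.1), (-61.6, -106.5), (-32.6, -73.0), (-45.9, -72.8)],
    "ANCOVA": [(-46.3, -90.3), (-90.5, -137.2), (-48.2, -94.5), (-51.5, -83.9)],
    "REML":   [(-45.8, -89.6), (-49.3, -93.6), (-15.0, -48.1), (-31.1, -50.4)],
}

# Reported bias of the difference under A1, A2, A3, A4, copied from Table 2.
table2 = {
    "UWLS":   [0.8, -0.6, 1.2, 1.6],
    "GLS":    [0.7, -0.1, 8.8, 23.1],
    "WLS":    [0.9, 0.1, 4.6, 18.1],
    "ANCOVA": [1.0, -1.7, -1.3, 12.6],
    "REML":   [1.2, 0.7, 11.9, 25.7],
}

def implied_bias(trt, ctl):
    """Bias of (control - treatment) implied by the Table 1 group means."""
    return round((ctl - trt) - TRUE_DIFF, 1)

implied = {name: [implied_bias(t, c) for t, c in rows] for name, rows in table1.items()}
for name in table1:
    print(name, implied[name], table2[name])
```

For these rows the implied and reported biases agree to the one decimal place shown in the tables.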