Statistics & Analysis

Analysis of Correlated Recurrent and Terminal Events Data in SAS

Li Lu¹, Chenwei Liu²
¹ The EMMES Corporation, Rockville, MD
² Core Genotyping Facility, Division of Cancer Epidemiology and Genetics, Advanced Technology Program, SAIC-Frederick, Inc., NCI-Frederick, Frederick, MD

ABSTRACT
Recurrent events data have become increasingly important in clinical studies. There are many methods to analyze this type of data, and several papers have been presented on how to perform repeated events analysis in SAS. In clinical studies, we may encounter recurrent disease episodes in patients with a terminal event such as death, and the terminal event is often strongly correlated with the recurrent event process. In this paper, using a simulated organ transplant dataset as an example, we demonstrate how to model and analyze correlated recurrent and terminal events data in SAS.

INTRODUCTION
Recurrent event data are commonly encountered in clinical and observational studies, for example repeated tumor occurrences, repeated hospitalizations, and multiple rejection episodes after organ transplant. The observation of recurrent events can be interrupted by loss to follow-up, end of study, or a terminal event such as death. Analysis usually focuses either on failure time, using standard survival analysis, or on the recurrent event process, using the Mean Cumulative Function for the number of events [1]. In many instances the terminal event is related to the recurrent process and therefore needs to be treated as informative censoring. Analyzing the recurrent events or the terminal event separately may lead to biased estimates when such informative censoring exists. It is important to take both the terminal event and the recurrent events into account whenever domain knowledge gives reasonable grounds to suspect that they are related.

Frailty models have been proposed and successfully used in the analysis of correlated failure time data. The frailty approach aims to account for heterogeneity caused by unmeasured covariates. Frailty models are extensions of the proportional hazards model.
The recurrent event process can be modeled by a random effects (frailty) proportional hazards model. In the presence of a dependent terminal event, the random effects are also incorporated into the model for the terminal event [2]. Such models are conceptually shared frailty models. They are useful for assessing the covariate effects on both processes as well as the level of their correlation.

A novel Gaussian quadrature estimation method has been proposed by Liu and Huang [3] for various frailty proportional hazards models. This estimation method is relatively straightforward and has been implemented in SAS PROC NLMIXED. Based on their work, we present in this paper a simple SAS macro that conducts the analysis and generates additional hazard and survival plots.

DEFINITION
As proposed in [2], a random effect is shared by the proportional hazards models for the recurrent events and the terminal event:

$$r_i(t) = \exp(\beta^{T} z_i + \nu_i)\, r_0(t) \qquad (1)$$

$$\lambda_i(t) = \exp(\alpha^{T} z_i + \gamma \nu_i)\, \lambda_0(t) \qquad (2)$$

where $\beta$ and $\alpha$ are the coefficients of the observed covariates $z_i$, and $r_0(t)$ and $\lambda_0(t)$ are the baseline hazards for the recurrent and terminal event processes, respectively. The correlation between these two processes is introduced by the shared frailty $\nu_i$, which can have a different impact on $r_i(t)$ and $\lambda_i(t)$ through the coefficient $\gamma$.
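The role of the coefficient on the shared frailty can be made explicit by factoring the terminal event hazard. The lines below are only a rewriting of the terminal event model, with $z_i$ denoting the covariates of patient $i$ and $\alpha$ the terminal event coefficients (notation assumed here for illustration):

```latex
\lambda_i(t) = \exp(\alpha^{T} z_i)\,\bigl[\exp(\nu_i)\bigr]^{\gamma}\,\lambda_0(t)
% gamma = 0 : the terminal hazard does not depend on nu_i, so the two
%             processes are unrelated given the covariates
% gamma > 0 : patients with a higher recurrent event rate (large nu_i)
%             also have a higher terminal event hazard
% gamma < 0 : a higher recurrent event rate goes with a lower terminal hazard
```

Under this reading, testing whether $\gamma$ differs from zero is a way to judge whether the terminal event acts as informative censoring for the recurrent events.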
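To estimate the parameters, a piecewise constant baseline hazard can be adopted. When only time-independent covariates are considered, the likelihood can be simplified as follows:

$$L = \prod_{i=1}^{n} \int \prod_{j=1}^{n_i} \exp(l_{ij})\, f_{\theta}(\nu_i)\, d\nu_i \qquad (3)$$

where $l_{ij}$ is:

$$l_{ij} = \delta_{ij}\,[\beta^{T} z_i + \nu_i + \log r_0(t_{ij})] + \Delta_{ij}\,[\alpha^{T} z_i + \gamma \nu_i + \log \lambda_0(x_{ij})] - \exp(\beta^{T} z_i + \nu_i)\, R_0(x_{ij}) - \exp(\alpha^{T} z_i + \gamma \nu_i)\, \Lambda_0(x_{ij}) \qquad (4)$$

Here $\delta_{ij}$ and $\Delta_{ij}$ are the recurrent event and terminal event indicators for record $j$ of patient $i$, $n_i$ is the number of records of patient $i$, $R_0$ and $\Lambda_0$ are the cumulative baseline hazards of the recurrent and terminal event processes, and $f_{\theta}$ is the density of the frailty.

Using Gaussian quadrature techniques, the likelihood can be approximated by a weighted average of the integrand evaluated at $Q$ predetermined quadrature points $u_q$ over the random effect, thus

$$\hat{l}_{ij} = \delta_{ij}\,[\beta^{T} z_i + u_q + \log r_0(t_{ij})] + \Delta_{ij}\,[\alpha^{T} z_i + \gamma u_q + \log \lambda_0(x_{ij})] - \exp(\beta^{T} z_i + u_q)\, R_0(x_{ij}) - \exp(\alpha^{T} z_i + \gamma u_q)\, \Lambda_0(x_{ij}) \qquad (5)$$

Equation (5) can be specified by simple programming statements in SAS PROC NLMIXED. The likelihood (3) can then be maximized, and the estimates of beta, alpha, gamma, and the piecewise constant baseline hazards can be readily obtained from the PROC NLMIXED output.

A SAS MACRO TO PERFORM THE ESTIMATES
We put the data preparation and the estimation with PROC NLMIXED in a simple SAS macro; the code of the macro definition is listed in the appendix. As shown in the code, the quantiles for recurrent events and terminal events are obtained separately using PROC UNIVARIATE. Those quantiles are stored in macro variables and later used to calculate an indicator of the quantile interval in which an event falls and the corresponding duration within each interval, so that the baseline hazard and cumulative hazard can be calculated.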
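The interval bookkeeping behind the piecewise constant hazard can be illustrated in isolation. The sketch below is not part of the macro: the dataset name piecewise_demo, the three intervals, and the cut points 0, 100, 250, and 600 days are hypothetical values chosen only to make the arithmetic easy to follow.

```sas
/* Minimal sketch: split one observed time across piecewise-constant intervals. */
/* Cut points are hypothetical stand-ins for the quantiles from PROC UNIVARIATE. */
data piecewise_demo;
  array cut{4} _temporary_ (0 100 250 600);
  array dur{3} dur1-dur3;   /* time spent in each interval */
  array ev{3}  ev1-ev3;     /* 1 if the event falls in the interval */
  time = 180;               /* an event observed at day 180 */
  do k = 1 to 3;
    dur{k} = 0;
    ev{k}  = 0;
  end;
  do k = 1 to 3;
    if time <= cut{k+1} then do;
      ev{k}  = 1;                     /* event falls in interval k */
      dur{k} = time - cut{k};         /* partial exposure in that interval */
      leave;                          /* later intervals keep duration 0 */
    end;
    else dur{k} = cut{k+1} - cut{k};  /* full exposure in earlier intervals */
  end;
  drop k;
run;

proc print data=piecewise_demo noobs;
run;
```

For this record the step yields dur1=100, dur2=80, dur3=0 and ev1=0, ev2=1, ev3=0. With interval hazards h1 to h3, the hazard at the event time is then h2 and the cumulative hazard is h1*100 + h2*80, which is the same sum-of-products form used in the PROC NLMIXED programming statements.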
The macro takes the following parameters:

  inds       the input dataset. It should include all recurrent event, censoring, and terminal event observations. The user should first check the integrity of this dataset, e.g., each recurrent event time should be smaller than the censoring or death time of the same patient.
  idvar      the patient identifier variable.
  timevar    the time variable.
  statusvar  the event status variable: 0 for censoring, 1 for a recurrent event, and 2 for the terminal event.
  covar      the covariate, such as treatment group. Currently only one covariate can be specified; the macro is easy to modify to take more covariates.
  inpar      the input dataset with the starting values of the parameters.
  parest     the output dataset that holds the parameter estimates.
  nu_est     the output dataset that holds the random effect estimates for each individual.
  cumh       the output dataset that holds the estimated cumulative hazard function of the recurrent events.
  outs       the output dataset that holds the estimated survival function of the terminal event.

APPLICATION EXAMPLE
As an application example, we generate an organ transplant dataset. We assume that the patients experience rejection episodes after transplant and that some of them have a terminal event such as death. We assume a total of 75 patients are randomized into a treatment or a placebo group. 75% of them have neither an episode nor death. For the other 25% of patients, the number of recurrent episodes ranges from 1 to 6: among them, 60% have only one episode, 30% have 2 to 3 episodes, and 10% have over 4 episodes. Of those who have recurrent rejection episodes, 75% are censored at the end and 25% have the terminal event of death.
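The integrity check recommended for the input dataset can be sketched as follows. The records in trans_demo are made up for illustration, the variable names ptid, endt, treat, and event follow the layout of the example dataset, and the output table name bad_ids is an assumption.

```sas
/* Hypothetical records: one row per episode, censoring, or death */
data trans_demo;
  input ptid endt treat event;
  datalines;
1 120 1 1
1 340 1 1
1 900 1 0
2 200 2 1
2 450 2 2
3 700 1 0
;
run;

/* List patients whose last recurrent event is not earlier than their   */
/* censoring or death time, or who have no censoring/death record at all */
proc sql;
  create table bad_ids as
  select ptid,
         max(case when event = 1 then endt else . end)        as last_recur,
         max(case when event in (0, 2) then endt else . end)  as end_time
  from trans_demo
  group by ptid
  having calculated last_recur >= calculated end_time;
quit;
```

For the six rows above, bad_ids comes out empty. A patient with no recurrent event has a missing last_recur, which SAS orders below any number, so such patients are not flagged.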
The generated dataset TRANS has 5 variables, including: ptid, the patient identifier; endt, the time to recurrent event, censoring, or death; treat, the group variable (1 for treatment, 2 for placebo); and event, the status variable, which can be 0 for censoring, 1 for a recurrent event, and 2 for the terminal event. Please note that this dataset has one row for each episode, censoring, or death, so it may have several rows for the same patient.

Based on the assumption that the baseline hazard constants are very small and may increase slightly with time, we first specify the following initial values for the parameters in a dataset using a DATA step:

    data initpar;
      input parameter $ estimate @@;
      datalines;
    r1 .1 r2 .1 r3 .1 r4 .2 r5 .2 r6 .2 r7 .3 r8 .2 r9 .3 r10 .3
    h1 .2 h2 .2 h3 .2 h4 .3 h5 .3 h6 .3 h7 .4 h8 .4 h9 .4 h10 .5
    beta1 1 alpha1 1 gamma 1 vara 1
    ;

Then the macro is called as follows:

    %recurr(trans, ptid, endt, event, treat, initpar, est1, est_nu, cumh, outs);

While running the above code, it is always good practice to check the log file and the PROC NLMIXED output for optimization convergence and warning messages. The running time of the macro depends on the size of the input dataset and on the initial values specified. If the process takes too long, it may be a good idea to comment out the RANDOM statement in the macro definition, call the macro, and save the estimate output in a dataset. This run should be very quick, and its estimates should be closer to the optimal values than an initial guess. The macro can then be run again with the RANDOM statement restored, using the previous estimate output as inpar.

Part of the output from the above macro call is listed in Table 1. The estimated beta1 is 0.9837 (p=0.0007): the treatment has a significant effect on the hazard of recurrent episodes. The estimated alpha1 is 0.9776 (p<0.0001): the treatment also has a significant effect on the hazard of the terminal event. The hazards of recurrent episodes and death are positively associated (gamma=1.0261, p<0.0001), indicating that a joint analysis of the recurrent and terminal events is necessary.
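The two-stage strategy for long running times can be sketched as below. The dataset names est0 and initpar2 are hypothetical: est0 stands for the estimates saved from a first run made with the RANDOM statement commented out. The sketch relies on the PARMS statement accepting a dataset with the variables Parameter and Estimate, which is the same layout as the initial values dataset.

```sas
/* Stage 1 (assumed already done): a run without the RANDOM statement  */
/* saved its ParameterEstimates table as est0 (hypothetical name).     */

/* Turn the stage 1 estimates into starting values for stage 2 */
data initpar2;
  set est0(keep=Parameter Estimate);
run;

/* Stage 2: full model with the RANDOM statement restored */
%recurr(trans, ptid, endt, event, treat, initpar2, est1, est_nu, cumh, outs);
```

It is worth confirming that initpar2 still contains a row for every parameter named in the PARMS and BOUNDS statements, including vara, before starting the second run.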
    proc sort data=outs; by endt; run;
    proc sort data=cumh; by endt; run;

    axis1 label=(angle=90 rotate=0 'Estimated Survival') minor=none;
    axis2 label=('Time (Days)') minor=none order=(0 to 2000 by 400);
    axis3 label=(angle=90 rotate=0 'Estimated Cumulative Hazard') minor=none;

    /* Step plot of the predicted survival for terminal events */
    symbol1 value=dot c=black h=.1in i=stepj r=1;
    symbol2 value=circle c=blue h=.1in i=stepj r=1;
    proc gplot data=outs(where=(event=2));
      title "Survival Plot for terminal events";
      plot pred*endt=treat / frame cframe=ligr legend vaxis=axis1 haxis=axis2;
    run;

    /* Scatter plot of the predicted cumulative hazard for recurrent events */
    symbol1 value=dot c=black h=.1in i=none r=1;
    symbol2 value=circle c=blue h=.1in i=none r=1;
    proc gplot data=cumh(where=(event=1));
      title "Hazard plot for recurrent events";
      plot pred*endt=treat / frame cframe=ligr vaxis=axis3 haxis=axis2;
    run;
    quit;
We further output the predicted survival for terminal events and the cumulative hazard for recurrent events to the datasets outs and cumh, respectively. They can be plotted using PROC GPLOT as shown above; the results are in Figure 1 and Figure 2.

Table 1. Parameter Estimates from PROC NLMIXED.

Parameter  Estimate   Std Error  DF   t Value  Pr > |t|  Alpha  Lower      Upper     Gradient   Active BC
r1         0.001359   0.000820   915  1.66     0.0980    0.05   -0.00025   0.002968  0.245381
r2         0.000470   0.000263   915  1.78     0.0746    0.05   -0.00005   0.000986  3.389322
r3         0.000447   0.000239   915  1.87     0.0619    0.05   -0.00002   0.000917  -3.87576
r4         0.000310   0.000161   915  1.92     0.0551    0.05   -6.82E-6   0.000626  9.335858
r5         0.000488   0.000249   915  1.96     0.0500    0.05   -3.9E-8    0.000976  -1.6195
r6         0.000548   0.000275   915  2.00     0.0463    0.05   8.931E-6   0.001087  -0.477
r7         0.000673   0.000329   915  2.04     0.0412    0.05   0.000027   0.001320  -19.8713
r8         0.000512   0.000244   915  2.09     0.0365    0.05   0.000032   0.000992  -5.93225
r9         0.001808   0.000863   915  2.10     0.0364    0.05   0.000114   0.003501  7.93371
r10        0.003372   0.001676   915  2.01     0.0445    0.05   0.000084   0.006660  0.33675
h1         6.673E-6   .          915  .        .         0.05   .          .         13167.2
h2         0.000016   1.253E-6   915  13.1     <.0001    0.05   0.000014   0.000019  225.1748
h3         7.657E-6   2.369E-6   915  3.23     0.0013    0.05   3.007E-6   0.000012  999.3356
h4         0          .          915  .        .         0.05   .          .         82916.21   Lower BC
h5         0          .          915  .        .         0.05   .          .         29722.8    Lower BC
h6         0          .          915  .        .         0.05   .          .         12366.36   Lower BC
h7         0          .          915  .        .         0.05   .          .         95.78      Lower BC
h8         0          .          915  .        .         0.05   .          .         13663.37   Lower BC
h9         0          .          915  .        .         0.05   .          .         12813.46   Lower BC
h10        0.000117   0.000084   915  1.39     0.1659    0.05   -0.00005   0.000282  31.32537
beta1      0.9837     0.2888     915  3.41     0.0007    0.05   0.4169     1.5504    9.99867
alpha1     0.9776     0.2363     915  4.14     <.0001    0.05   0.5138     1.4414    17.58792
gamma      1.0261     0.1033     915  9.93     <.0001    0.05   0.8234     1.2289    -3.16135
vara       1.0221     0.3027     915  3.38     0.0008    0.05   0.4281     1.6160    -11.311

We should point out that the estimated cumulative hazard is not normalized; Figure 1 only shows the trend of the hazard. The random effect causes some fluctuation in the hazard, i.e., a later data point may have a lower hazard than an earlier one. Nevertheless, the differences between the treatment group (black) and the placebo group (blue) are visible. Figure 1 shows that the treatment group (treat=1) has a lower risk of recurrent events. Figure 2 shows that survival in the treatment group is better than in the placebo group, i.e., treated patients have a lower risk of the terminal event.
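The treatment coefficients in Table 1 are on the log hazard scale, so exponentiating them gives hazard ratios. The step below is a minimal sketch that assumes est1 is the parameter estimates dataset named in the macro call and that it carries the usual Parameter, Estimate, Lower, and Upper columns; the output dataset hazard_ratios and the new variable names are hypothetical.

```sas
/* Convert the covariate coefficients to hazard ratios with confidence limits */
data hazard_ratios;
  set est1;
  where Parameter in ('beta1', 'alpha1');
  HazardRatio = exp(Estimate);  /* per one-unit increase in the covariate */
  HR_Lower    = exp(Lower);
  HR_Upper    = exp(Upper);
  keep Parameter Estimate HazardRatio HR_Lower HR_Upper;
run;

proc print data=hazard_ratios noobs;
run;
```

With the estimates reported in Table 1, exp(0.9837) is about 2.67 and exp(0.9776) is about 2.66. Because treat is coded 1 for treatment and 2 for placebo, these ratios compare placebo with treatment for patients with the same frailty.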
[Plot: estimated cumulative hazard of recurrent events versus Time (Days), by treat (1, 2)]
Figure 1. Recurrent events hazard by treatment group.

[Plot: estimated survival for terminal events versus Time (Days), by treat (1, 2)]
Figure 2. Terminal events survival by treatment group.

CONCLUSIONS
Recurrent events with informative censoring arise in many clinical studies. Liu et al. [2][3] proposed shared frailty models and Gaussian quadrature estimation methods for the joint analysis of recurrent events and terminal events. We have presented here a simple macro that carries out the analysis and makes this kind of data easier to analyze. It is only necessary to prepare the data in the format shown in this paper. We also discussed the computational issues that may arise and strategies to shorten the PROC NLMIXED running time. Advanced users can easily modify the macro to specify other random effects distributions, to use a different number of piecewise constant baseline hazards, and to output different prediction expressions.
REFERENCES
[1] Gordon Johnston and Ying So. Analysis of Data from Recurrent Events. Proceedings of the Twenty-Eighth Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc.
[2] Liu L, Wolfe RA, Huang X. Shared frailty models for recurrent events and a terminal event. Biometrics 2004; 60:747-756.
[3] Lei Liu and Xuelin Huang. The use of Gaussian quadrature for estimation in frailty proportional hazards models. Statistics in Medicine 2007; 27(14):2665-83.

ACKNOWLEDGMENTS
SAS is a registered trademark of SAS Institute Inc. of Cary, North Carolina.
This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract N01-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.

CONTACT INFORMATION
Your code requests, comments, and questions are valued and encouraged. Contact the author at:
Li Lu
The EMMES Corporation
401 N. Washington St, Suite 700
Rockville MD 20850
(301) 251-1161 x 276
llu@emmes.com

APPENDIX

    %macro recurr(inds, idvar, timevar, statusvar, covar, inpar, parest, nu_est, cumh, outs);

    * Obtain quantiles for recurrent events;
    proc univariate data=&inds(where=(&statusvar=1)) noprint;
      var &timevar;
      output out=quant_r pctlpts=0 10 20 30 40 50 60 70 80 90 100 pctlpre=qr;
    run;

    * Obtain quantiles for death;
    proc univariate data=&inds(where=(&statusvar=0 or &statusvar=2)) noprint;
      var &timevar;
      output out=quant_d pctlpts=0 10 20 30 40 50 60 70 80 90 100 pctlpre=qd;
    run;

    * Store the recurrent event quantiles in the macro variable quant_r;
    proc transpose data=quant_r out=quant_r2;
    run;

    data _null_;
      length a $ 150;
      retain a ' ';
      set quant_r2 end=last;
      a = trim(a) || ' ' || col1;
      if last then call symput('quant_r', a);
    run;

    * Store the death and censoring quantiles in the macro variable quant_d;
    proc transpose data=quant_d out=quant_d2;
    run;

    data _null_;
      length a $ 150;
      retain a ' ';
      set quant_d2 end=last;
      a = trim(a) || ' ' || col1;
      if last then call symput('quant_d', a);
    run;

    * Calculate the duration in each quantile interval, the indicator of event in each interval;
    data all;
      set &inds;
      array quant_r {11} _temporary_ (&quant_r);
      array quant_d {11} _temporary_ (&quant_d);
      array dur_r {10} dur_r1-dur_r10;
      array dur_d {10} dur_d1-dur_d10;
      array event_r {10} event_r1-event_r10;
      array event_d {10} event_d1-event_d10;

      do i=1 to 10;
        dur_r{i}=0;
        dur_d{i}=0;
        event_r{i}=0;
        event_d{i}=0;
      end;

      * For recurrent event;
      if &statusvar=1 then do;
        do i=2 to 11;
          if &timevar<=quant_r{i} then do;
            event_r{i-1}=1;
            dur_r{i-1}=&timevar-quant_r{i-1};
            i=11;  /* force the loop to end after this interval */
          end;
          else dur_r{i-1}=quant_r{i}-quant_r{i-1};
        end;
      end;
      else do; /* If death or censored observation */
        do i=2 to 11;
          if &timevar<=quant_d{i} then do;
            event_d{i-1}=(&statusvar=2);
            dur_d{i-1}=&timevar-quant_d{i-1};
            i=11;  /* force the loop to end after this interval */
          end;
          else dur_d{i-1}=quant_d{i}-quant_d{i-1};
        end;
      end;
    run;

    ods output ParameterEstimates=&parest;
    proc nlmixed data=all qpoints=5;
      parms / data=&inpar;
      bounds r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 vara >= 0;

      /* baseline hazard and cum baseline hazard, recurrent events */
      base_haz_r = r1*event_r1 + r2*event_r2 + r3*event_r3 + r4*event_r4 + r5*event_r5
                 + r6*event_r6 + r7*event_r7 + r8*event_r8 + r9*event_r9 + r10*event_r10;
      cum_base_haz_r = r1*dur_r1 + r2*dur_r2 + r3*dur_r3 + r4*dur_r4 + r5*dur_r5
                     + r6*dur_r6 + r7*dur_r7 + r8*dur_r8 + r9*dur_r9 + r10*dur_r10;

      /* baseline hazard and cumulative baseline hazard for death */
      base_haz_d = h1*event_d1 + h2*event_d2 + h3*event_d3 + h4*event_d4 + h5*event_d5
                 + h6*event_d6 + h7*event_d7 + h8*event_d8 + h9*event_d9 + h10*event_d10;
      cum_base_haz_d = h1*dur_d1 + h2*dur_d2 + h3*dur_d3 + h4*dur_d4 + h5*dur_d5
                     + h6*dur_d6 + h7*dur_d7 + h8*dur_d8 + h9*dur_d9 + h10*dur_d10;

      mu1 = beta1 * &covar + nu;            /* linear predictor for recurrent event */
      mu2 = alpha1 * &covar + gamma * nu;   /* linear predictor for death event */

      /* minus the cumulative hazards, as in the last two terms of equation (4) */
      loglik1 = -exp(mu1) * cum_base_haz_r;
      loglik2 = -exp(mu2) * cum_base_haz_d;

      /* log likelihood for recurrent event */
      if &statusvar=1 then loglik = log(base_haz_r) + mu1 + loglik1 + loglik2;
      /* log likelihood for death */
      if &statusvar=2 then loglik = loglik1 + log(base_haz_d) + mu2 + loglik2;
      /* log likelihood for censoring */
      if &statusvar=0 then loglik = loglik1 + loglik2;

      model &timevar ~ general(loglik);
      random nu ~ normal(0, vara) subject=&idvar out=&nu_est;
      predict exp(loglik2) out=&outs;   /* estimated survival function of death event */
      predict -loglik1 out=&cumh;       /* estimated cum hazard of recurrent event */
    run;

    %mend;