Recursive partitioning of resistant mutations for longitudinal markers based on a U-type score

Size: px

Start display at page:

Download "Recursive partitioning of resistant mutations for longitudinal markers based on a U-type score"

Christopher Shaw
5 years ago
Views:

1 Biostatistics (2011), 0, 0, pp doi: /biostatistics/kxr011 Biostatistics Advance Access published May 19, 2011 Recursive partitioning of resistant mutations for longitudinal markers based on a U-type score CHENGCHENG HU Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ 85724, USA hucc@ .arizona.edu VICTOR DEGRUTTOLA Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA SUMMARY Development of human immunodeficiency virus resistance mutations is a major cause of failure of antiretroviral treatment. We develop a recursive partitioning method to correlate high-dimensional viral sequences with repeatedly measured outcomes. The splitting criterion of this procedure is based on a class of U-type score statistics. The proposed method is flexible enough to apply to a broad range of problems involving longitudinal outcomes. Simulation studies are performed to explore the finite-sample properties of the proposed method, which is also illustrated through analysis of data collected in 3 phase II clinical trials testing the antiretroviral drug efavirenz. Keywords: Antiretroviral drugs; Longitudinal data; Recursive partitioning; Repeated measurements; Resistance mutations; Tree method. 1. INTRODUCTION Highly active antiretroviral therapy (HAART), which usually consists of 3 or more different antiretroviral drugs, has been very effective in suppressing the replication of human immunodeficiency virus (HIV) and slowing the progression to acquired immune deficiency syndrome (AIDS). After HAART initiation, viral load, which is quantified as the number of HIV RNA copies per milliliter of blood plasma, quickly declines. Under the selective pressure of antiretroviral drugs, however, viruses may develop drug resistance mutations, resulting in a slower decline or even a rebound of the viral load. These mutations tend to persist after change in therapy and impact the effect of subsequent treatment regimens, as there is considerable cross-resistance among drugs within the same drug class. Furthermore, minority species of virus with drug resistance mutations, present since initial infection, may emerge as the dominant species under drug pressure. Antiretroviral resistance is a major cause of treatment failure. In clinical studies of HIV-infected patients, resistant mutations are generally monitored by sequencing the relevant portions of the HIV genome; in this paper, we focus on the reverse transcriptease (RT) region of the HIV genome, To whom correspondence should be addressed. c The Author Published by Oxford University Press. All rights reserved. For permissions, please journals.permissions@oup.com.

2 2 C. HU AND V. DEGRUTTOLA corresponding to the classes of drugs used in the studies that motivated our research. These genetic regions have a total length of several hundred codons, and high dimensionality complicates the effort to understand the relationship between HIV genotype and viral response to therapy. To apply traditional statistical methods to viral genetic data, we need to reduce the dimensionality of the viral genome and to classify the genetic sequences into a small number of relatively homogeneous classes. Such method may also be useful in stratification of patients in clinical trials and in guiding treatment choices. Recursive partitioning is of particular interest in the study of HIV-resistant mutations. Such methods naturally accommodate high-dimensional data, allow exploration of complex interactions of different codons, and provide easily interpretable results. For an introduction to recursive partitioning methods, the readers are referred to the seminal work of Breiman and others (1984), while Zhang and Singer (1999) gives a very nice summary of more recent developments. This method has been employed to investigate the association between drug-resistant mutations and a single measurement of viral load or 50% inhibitory concentration (IC 50 ) of a certain drug, which is a drug susceptibility phenotype. See Sevin and others (2000), Foulkes and DeGruttola (2002), Foulkes and others (2004), and Beerenwinkel and others (2002) for details. Recursive partitioning methods have also been developed for repeatedly measured responses. Zhang (1998) studied multiple binary responses with joint probability distribution from the exponential family. The method of Segal (1992) for longitudinal responses requires the observations to be spaced equally in time, and the covariance structure needs to be modeled. Lee (2006) proposed a recursive partitioning method based on generalized estimating equation (GEE) models, which also requires a common set of observation time points for all study participants or the covariance structure needs to be modeled. In this article, we develop a tree-based method to classify HIV genetic sequences with respect to longitudinal outcome measures such as plasma HIV-1 RNA (viral load) and CD4 count. Although in most studies, all participants follow the same schedule of clinical visits, at which the biomarkers will be measured, the actual visit time could be weeks or even months away from the scheduled time. Also, trajectory of a patient s longitudinal viral load can be quite erratic and parametric modeling could be very difficult. As examples, Figure 1 shows viral RNA trajectories of 6 subjects who started combination therapy Fig. 1. Viral load trajectories of 6 participants in the DMP-266 studies.

3 Recursive partitioning of resistant mutations 3 containing the drug efavirenz. Two subjects, represented by solid curves, kept the viral load suppressed below 400 copies/ml (the horizontal line) since a few weeks after treatment initiation. Two other subjects, represented by broken curves, initially had suppressed viral load but then it rebounded back to high levels. Viral load of the remaining 2 subjects (dotted curves) was never suppressed. We see it would be challenging to find an appropriate parametric model for viral load trajectories the variable is generally measured at different time points for different subjects, and the mean structure is hard to model. The method we propose in Section 2 is fully nonparametric and does not require modeling the mean and covariance structure; it also allows irregular times of measurement. Performance of the method is studied by simulation in Section 3 under realistic settings with moderate sample sizes, and in Section 4, the method is applied to data from 3 phase II clinical trials, from which the 6 subjects shown in Figure 1 are selected. Discussions and potential extensions are relegated to Section METHODS 2.1 Definition of U-type score For a study of n subjects, assume that the viral load of the ith subject (i = 1,..., n) was measured at baseline t i0 = 0 and n i subsequent time points 0 < t i1 < < t i,ni, with values Y i0, Y i1,..., Y i,ni, respectively. Viral load declines in the first few months of successful treatment; we restrict our attention to this phase of the study to focus on therapeutic efficacy. Similar to May and DeGruttola (2007), for 2 different subjects indexed by i and j and one viral measurement of each person measured at t ik > 0 and t jl > 0, respectively, we define 1 if Y ik < Y jl and t ik t jl, or Y ik = Y jl and t ik < t jl ; D{(Y ik, t ik ), (Y jl, t jl )} = 1 if Y ik > Y jl and t ik t jl, or Y ik = Y jl and t ik > t jl ; 0 otherwise. This score compares 2 outcome measurements of 2 different subjects; if the ith subject achieved a lower viral load at t ik than did the jth subject at time t jl t ik, a score of 1 is assigned. On the other hand, if the ith subject had a higher viral load at t ik than did the jth subject at time t jl t ik, a score of 1 is assigned. Since viral load is expected to decline, we assign a score of 0 if a subject achieved a lower viral load level at a later time than did another subject, as we are unable to judge which performance was superior. To compare the rates of decline for any 2 subjects, we sum over scores of all pairwise comparisons between their viral load measurements D(i, j) = 1 n i n j n i n j D{(Y ik, t ik ), (Y jl, t jl )}. (2) k=1 l=1 This score compares the full viral load trajectories of the 2 subjects. A positive score implies that the ith subject had a better response to the therapy than did the jth subject. (1) 2.2 Properties of U-type score When the viral loads of 2 subjects follow exactly the same downward trend and the observations at different times follow the same process, we can easily show that the U-type score comparing them has expectation 0. Prospective studies, including clinical trials, normally require the study participants to visit clinics at a fixed set of time points for collection of laboratory and clinical data. The date at which the

4 4 C. HU AND V. DEGRUTTOLA visit is scheduled depends on the convenience of both the study subject and the clinic staff, and hence could differ from the pre-set timeline by a few days or even weeks. For the ith subject, let tik be the kth time point (k = 0,..., m) of scheduled clinical visit. Note that {ti0,..., t im } could differ from the {t i0,..., t i,ni } since the subject might miss one or more of the clinical visits. Let ik be an indicator of whether the ith subject had the kth scheduled visit. In the decline phase of an HIV study, we can generally assume that {ti0,..., t im } and { i0,..., im } are identically and independently distributed across all study participants. We also assume that the visit indicators are independent of all other data, that is, visits are missing completely at random. Let Yik be the (possibly unobserved) viral load of the ith subject at the scheduled time t ik. It can be seen that D(i, j) = (n i n j ) 1 m ml=1 k=1 ik jl D{(Yik, t ik ), (Y jl, t jl )}. If the viral load process is the same for both the ith subject and the jth subject, viral load data of the ith and jth subjects are identically distributed. For any k and l, it can be easily seen that E D{(Yik, t ik ), (Y jl, t jl )} = E D{(Y jk, t jk ), (Y il, t il )} = E D{(Yil, t il ), (Y jk, t jk )}. In the special case of k = l, this reduces to E D{(Y ik, t ik ), (Y jk, t jk )} = 0. Since the visit indicators ik s are assumed to be independent of all other data, we get E D(i, j) = 0. When the viral load processes for subjects i and j are different, assume Yik = μ i(tik ) + ɛ i(tik ) and Y jl = μ j(t jl ) + ɛ j(t jl ), where μ i( ) and μ j ( ) are deterministic functions representing the means of the time-varying viral loads for the 2 subjects, and ɛ i ( ) and ɛ j ( ) are independently and identically distributed mean-zero error processes that are also independent of all other data. If μ j (t) > μ i (t) for all positive time t in the range we consider, we can easily prove that for 1 k l m, E [D{(Yik, t ik ), (Y jl, t jl )} + D{(Yil, t il ), (Y jk, t jk )}]>0, and for 1 k m, E D{(Y ik, t ik ), (Y jk, t jk )}>0. We then have E D(i, j) > 0. Thus, we can be sure that when the viral load of subjects i and j are identically distributed, the expectation of D(i, j) is 0, and if the viral load of the ith subject has a mean curve lower than that of the jth subject, the expectation of D(i, j) is positive. These properties demonstrate that D(, ) provides a valid pairwise comparison measure between any 2 subjects. 2.3 Alternative definition of score Flexibility in the definition of the pairwise comparison score allows accommodation of special characteristics of the outcome measures. For example, in a study with longitudinal viral loads as the outcomes, we can revise the score D(, ) to take into account information about viral response. If a subject fails to attain viral load suppression below a certain threshold in the first few months of receiving a new therapy, the viral load is unlikely to become suppressed later on the same treatment, and the therapy is considered to have failed. If one subject always has higher viral load levels than does another, but neither attains viral load levels below a fixed threshold, both are considered treatment failures and we may not want to distinguish between them. Also, if the viral load of a subject who complies with therapy initially becomes suppressed but rebounds later, viral load is unlikely to be suppressed again on the same treatment, which is also considered to have failed. Again we might not want to distinguish between 2 such subjects. Regardless of the nature of treatment failure, it may be most appropriate to consider any subject who failed to have had a worse response than any other subject who did not fail, within the same interval of time. Investigators may choose to consider subjects whose viral loads are never suppressed to have worse treatment response than those who have an initial viral suppression followed by a rebound. Alternatively, because subjects of the latter description are at more risk of developing new resistance mutations than are those who never responded to treatment, we may consider transient response to be worse than no response. The choice may depend on the goal of the analysis: focusing on biological activity would lead to different choices for the score definition than focusing on clinical benefit. Viral suppression usually occurs within a few weeks after initiation of an effective new treatment. In HIV/AIDS studies, the most common thresholds for viral suppression are 400 copies/ml or 50 copies/ml, depending on assay characteristics. Then definition of viral rebound varies across studies; in the studies

5 Recursive partitioning of resistant mutations 5 we consider, the definition was 2 consecutive viral loads of over 400 copies/ml (or 50 copies/ml) or one single viral load of over 4000 copies/ml after an initial viral suppression. For the ith subject, let 0 for viral suppression; V (i) = 1 for initial viral suppression followed by rebound; 2 for no suppression. Now, we can define an alternative score comparing 2 different subjects indexed by i and j: 1 if V (i) < V ( j); 1 if V (i) > V ( j); D(i, j) = 0 if V (i) = V ( j) > 0; D(i, j) if V (i) = V ( j) = 0, where D(i, j) is defined in (2). In this definition, an initial viral suppression followed by rebound is considered a response superior to no suppression. Another marker of interest in HIV studies is CD4 T-lymphocyte count, which is expected to rise after initiation of an effective therapy. A more rapid rise is considered to be a more favorable response; hence, a score can be defined as in (1), reversing the directions of the inequalities involving the outcomes. 2.4 Construction of trees Here, we introduce a recursive partitioning method to correlate baseline variables of potentially high dimensions, like viral genetic mutations, with trajectories of a longitudinal biomarker, like viral load. A splitting criterion based on the U-type score D(, ) defined in (2) measures the difference between 2 mutually exclusive groups of study subjects. Based on a viral genetic mutation or other baseline variable chosen by the splitting criterion, the cohort of all subjects (defined as the root node) is divided into 2 subgroups, called daughter nodes. The same splitting rule is then applied recursively to each daughter node to build a large tree. A pruning procedure is then developed to remove some branches to obtain a tree with proper size. The methods are motivated by those of Breiman and others (1984) and also by the survival tree methodology of LeBlanc and Crowley (1993), who used the log-rank statistic as the splitting criterion. For a study of n subjects, assume all had the relevant genetic sequence at study entry. For the ith subject (i = 1,..., n), let Z ir be an indicator of mutation at the rth codon (r = 1,..., R). Let v be an arbitrary node, which is a subset of study participants. For any r, let w r be the subset of v with all study subjects having a mutation at the rth codon: w r = {i v : Z ir = 1}. Thus, v\w r consists of subjects with a wild-type rth codon. Let 2 G(r, v) = 1ˆσ r 2 D(i, j), j v\w r i w r where ˆσ r is an estimator for the standard deviation of the sum of U-type scores in the formula above. As for general tree methods, a binary split of v can also be defined by dichotomizing a baseline continuous or ordinal variable Z, where the 2 daughter nodes are defined as {Z z} and {Z < z} for a threshold z or the split can be defined by dichotomizing a baseline nominal (unordered categorical) variable W, where the 2 daughter nodes are defined as {W A} and {W A} for a subset A of all levels of W. (3)

6 6 C. HU AND V. DEGRUTTOLA The variance estimator ˆσ r can be obtained from general theory on two-sample U-statistics, which is covered in textbooks like Serfling (1980). For an arbitrary binary split of v into w and v\w, let m 1 = w and m 2 = v\w, the sizes of w and v\w, respectively, and let m = m 1 + m 2. Define j v\w D(i, j). We have m(u w E U w ) d U w = (m 1 m 2 ) 1 i w N(0, σ10 2 /p + σ 01 2 /(1 p)), where σ10 2 = Cov{D(i, j), D(i, j )}, σ01 2 = Cov{D(i, j), D(i, j)}, and p is the limit of m 1 /m. Here, i, i w, and i i, while j, j v\w, and j j. These quantities can be easily estimated from the data: ˆσ 10 = (m 1 ) 1 i w {m 2(m 2 1)} 1 j j D(i, j)d(i, j ) Uw 2, and ˆσ 01 = (m 2 ) 1 j v\w {m 1(m 1 1)} 1 i i D(i, j)d(i, j) Uw 2. As mentioned earlier, a binary split can be defined based on any ordered or unordered baseline variable (Breiman and others, 1984). For brevity of notation, however, we assume that there is no baseline variable other than those mutation indicators. We now split the node v based on the mutation status at the r th codon, where G(r, v) = max 1 r R G(r, v). The 2 daughter nodes are then w r and v\w r, with G(v) G(r, v) a goodness of split measure indicating the difference in viral load trajectories of the 2 daughter nodes. We apply this procedure first to the root node of all subjects, and then recursively to the resulting daughter nodes. The procedure does not stop until every terminal node satisfies one of the criteria: the node size is smaller than a prespecified number, say 5; the node contains only homogeneous subjects with respect to the splitting criterion, that is, G(r, ) = 0 for all 1 r R or all subjects in the node have the same baseline viral sequence. This splitting process generally results in a large tree, and pruning is needed to avoid overfitting. 2.5 Determination of proper tree size To describe the pruning procedure, we introduce some notation. For any tree or any branch of a tree T, T denotes the set of all terminal nodes of T, and T o T \ T denotes all internal nodes of T. When T 1 is a subtree of T 2, we write T 1 T 2, while T 1 T 2 specifies that T 1 is a proper subtree of T 2. Also T 0 denotes the large tree obtained in the splitting procedure. The goodness of split measure for a tree or a branch of a tree T, G(T ) = v T o G(v), is the sum of goodness of split measures for all internal nodes of T. When further splits are made, G(T ) can only increase. As in Breiman and others (1984) and LeBlanc and Crowley (1993), we add a penalty term to the splitting criterion, penalizing the complexity of the tree. For α 0, define G α (T ) = G(T ) α T o. Thus, α is the penalty deducted from the splitting criterion for each additional split. Our goal is to find a subtree of T 0 that maximizes G α ( ) for any α 0. As pointed out by LeBlanc and Crowley (1993), α can be chosen between 2 and 4. Using α = 2 is in the same spirit of the Akaike information criterion, while a penalty of α = 4 is roughly equivalent to setting the significance level at 0.05 for each split. For any tree T, if T T and G α (T ) = max T T G α (T ), we call T an optimally pruned subtree of T for the penalty value α. For α 0, let T (α) denote the smallest optimally pruned subtree of T with respect to α. For any node v of T, let T v be the branch of T rooted at v. As in Breiman and others (1984), we can establish the existence of T (α) by induction on T and further prove that T (α) = {v T : G α (T v ) > 0 for any ancestor v of v}. Note that T 0 (0) = T 0. To identify T 0 (α) for all α 0, for any node v T 0, we define o G(T 0v )/ T 0v if v T o 0, g 0 (v) = + otherwise. Here, T 0v is the branch of T 0 rooted at v. Let α 1 = min v T0 g 0 (v). Then T 0 = T 0 (α) for all α < α 1 and T 1 = T 0 (α 1 ) can be obtained by removing from T 0 every branch T 0v with g 0 (v) = α 1. Now, for any v T 1, define

7 Recursive partitioning of resistant mutations 7 o G(T 1v )/ T 1v if v T o 1, g 1 (v) = + otherwise. Let α 2 = min v T1 g 1 (v). Then T 1 = T 0 (α) for α 1 α < α 2 and T 2 = T 0 (α 2 ) can be obtained by removing from T 1 every branch T 1v with g 1 (v) = α 2. This process is repeated until for some S > 0, T S is reduced to the tree containing the root node of T 0 only. We then have a sequence of trees T S T 1 T 0 and a sequence of numbers = α S+1 > α S > > α 1 > α 0 = 0 so that T s = T 0 (α) for α s α < α s+1. Like LeBlanc and Crowley (1993), we define the cost for a node v as R(v) = v T o G(v ) if v T o 0, 0v and 0 otherwise. For any T T 0, let R(T ) = v T R(v) and R α(t ) = R(T ) + α T. This implies that G α (T ) = G(T 0 ) + α R α (T ) and our proposed pruning process can be described in terms of R α ( ) instead of G α ( ), and all optimal properties of the pruning process can be proved as in Chapter 10 of Breiman and others (1984). Since G α is calculated from the same data used to grow and prune the tree, it is a biased estimate of the expected G α for the same tree applied to a new independent data set. If the sample size is large, we can randomly split the data into a training sample and a test sample. A tree will be grown and pruned on the training sample, and the goodness of split measures can be calculated using the test sample, resulting in an unbiased estimate of G α. Data sets of moderate size do not allow for separate training and test samples, so resampling methods are used to calculate a proper estimate of G α as in LeBlanc and Crowley (1993). Let X 1 and X 2 be 2 samples of data, and let G(X 1 ; X 2, T ) be the G statistic calculated from X 1 for a tree T built on X 2. Then G(T ) = G(X; X, T ) for the data set X, and what we need is an estimator for G (T ) = E{G(X ; X, T )}, where X is a sample independent of X. Let α s = α s α s+1 for 1 s S 1. We use the following resampling method to identify the proper tree size. First, draw B bootstrap samples from X and denote them by X b (b = 1,..., B). Then grow a large tree T b for each b, and for each s find T b (α s ), the optimally pruned subtree of T b for the per split penalty α s. The overoptimism in G(T b(α s )) can be estimated by O bs = G(X b ; X b, T b (α s )) G(X; X b, T b (α s )). Then ˆω s = ( B b=1 O bs )/B is a reasonable estimate for the overoptimism in G(T (α s )). For any fixed per split penalty α, we can then choose the optimal subtree as the T (α s ) that maximizes Ĝ α (T (α s )) = G(X; X, T (α s )) ˆω s α T (α s ). If Ĝ α (T (α s )) is maximized at s = S 1 and Ĝ α (T (α S 1 )) < 0, the optimal subtree contains only the root node. 3. SIMULATION STUDIES Simulation studies are carried out under various settings. In each setting, n = 200 subjects have baseline (t = 0) viral load and viral genetic sequences and are scheduled to have 5 viral load measurements at t = 1, 2,..., 5. Thus, n i = 5 for i = 1,..., n. For any scheduled t > 0, the actual measurement time follows a uniform distribution on [t 0.25, t ]. For r = 1,..., 30, let C ir be the indicator of mutation at the rth codon, which are all independent in the simulation. Let Y ik be the log 10 viral load at time t ik for the ith subject. In all scenarios, we assume Y ik = β 0 + β 1i t ik + ɛ ik, where the random error ɛ ik N(0, σ 2 ) is independent of other variables. We set σ = log 10 2, so that a change of one standard deviation in the error is equivalent to doubling or halving the viral load on the original scale. The baseline mean β 0 is set to be log 10 (50000), close to the median observation in the efavirenz studies to be analysed in the next section. The slope β 1i depends on the resistance status of each patient. All simulated viral loads are imputed at 50, reflecting the threshold of accurate quantification for commonly used assays. At any split the left daughter node is associated with better response. Data are simulated under 4 general scenarios:

8 8 C. HU AND V. DEGRUTTOLA I. Each codon has a 0.5 chance to be mutant and none is resistant. Two values of the slope β 1i are simulated; β 1i = log 10 (2) (rapid decline) and β 1i = log 10 (2)/2.5 (slow decline). Here, the correct tree is one with root node only. II. Each codon has a 0.5 chance to be mutant, and a mutant first codon confers resistance to the study drug. For resistant subjects β 1i = 0, and for other subjects sensitive to drugs, we simulate 2 separate settings of rapid decline (β 1i = log 10 (2)) and slow decline (β 1i = log 10 (2)/2.5). Here, the correct tree has a single split on codon 1. III. Each codon has a 0.5 chance to be mutant, and a subject is resistant to drugs only if the first and second codons are both mutant. Hence, about 25% of all subjects are resistant. For resistant subjects, the slope β 1i = 0, and for the sensitive subjects, we simulate 2 separate settings of rapid decline (β 1i = log 10 (2)) and slow decline (β 1i = log 10 (2)/2.5). Here, the correct tree has 2 splits on the first 2 codons in either order, and the second split should be at the right daughter node of the root. IV. The first codon has probability of 1/3 to be mutant and the other codons have 0.5 chance to mutant. A subject is resistant if C i1 = 1 with slope β 1i = 0. A subject is sensitive to drugs if C i1 = C i2 = 0, with the slope β 1i set to be negative. If C i1 = 0 and C i2 = 1, the viral load declines initially at the first 3 time points after baseline and then rebounds to baseline level. For these rebounding subjects, the slope β 1i is negative for k 3 and β 1i = 0 for k = 4, 5. Under this scenario, the cohort is divided into 3 subsets of approximately equal sizes: resistant, sensitive, and rebounding. Again we simulate 2 separate settings of rapid decline (β 1i = log 10 (2)) and slow decline (β 1i = log 10 (2)/2.5) for the sensitive subjects and for the initial decline phase of the rebounding subjects. Multiple trees are considered correct: root node is split on codon 1 and its left daughter node is split on codon 2, root node is split on codon 2 and its left daughter node is split on codon 1, and also this tree with an extra split added to the right daughter node based on codon 1. One thousand data sets are generated for each setting. In the pruning process, we use 50 bootstrap samples to calculate the overoptimism in the goodness of split measure. Two values are used for the persplit penalty (α = 2 and 4). Table 1 provides the proportions of getting various types of trees for each setting. The correct trees under all scenarios have been described above. When there should be at least one split but the resulting tree has none we refer to it as a null tree. A noisy tree contains all correct split(s) and also some extra noise. A partially correct tree (denoted by partial in the table) contains some but not all correct splits, and an incorrect tree has none of the correct split(s) but some noise. For the first scenario of no resistance mutations, the method performs well for all settings, with >95% power to obtain the correct tree. Not surprisingly, the performance is slightly better in the slow decline setting, where observations at different time points are closer to each other. For the second scenario with one resistant mutation, the method performs well in all settings, with >97% power to obtain the correct tree. For the small number of data sets where the correct tree is not obtained, all selected trees have the root node correctly split on codon 1, with some added noise. For the third scenario with interaction effect of 2 codons, the power to identify the correct tree is very high (96%) in the rapid decline setting. In the slow decline setting, viral load trajectories of resistant and sensitive subjects are not as well separated as in the rapid decline setting, and the power is moderately high (>80%). For the fourth scenario with resistant, sensitive and rebounding subjects, the power is very high (>97%) in the rapid decline setting and is moderately high (>76%) in the slow decline setting. The 2 penalty values α = 2 or 4 induced similar performance in all scenarios. From Table 1, it can be seen that the average tree sizes are close for these 2 α values. Simulation studies are also carried out for the smaller sample sizes of n = 100 and n = 30, and results are summarized in Section 1 of the web-based supplementary material available at Biostatistics online. For n = 100, the proposed method still performs well except for the slow decline settings of scenarios III and IV, where the power is low to detect the resistant mutations. This is due to the small number of

9 Recursive partitioning of resistant mutations 9 Table 1. Results of simulation studies Scenario Rate of α Proportion Proportion Proportion Proportion Proportion Mean number decline correct (%) null (%) noisy (%) partial (%) incorrect (%) of nodes I Rapid Slow II III IV Rapid Slow Rapid Slow Rapid Slow subjects with resistant mutations (e.g., 25 in scenario III). When n = 30 we see some further reduction in the power to detect resistant mutations. For scenarios II and III, we also study through simulation the power of a parametric approach using the true functional form of the viral load over time. For each node, a GEE model is fitted and a splitting criterion based on the residual sum of squares is used. The algorithm has an almost 100% power to identify the correct resistant codons when n = 200 and 100. The power deteriorated substantially only in the slow decline setting of scenario III when n = 30. As discussed in Section 1, viral trajectories can be quite complicated as they reflect a complex interaction of viral dynamics, evolutionary dynamics, intercurrent illnesses, and behavior. These factors induce a high level of heterogeneity in the data, so parametric modeling will be very difficult, and sometimes even impossible. 4. PHASE II EFAVIRENZ STUDIES Data from 3 phase II clinical trials of efavirenz (DMP , 004 and 005) were analyzed using the proposed methods. Most study participants were previously exposed to nucleoside reverse transcriptase inhibitors (NRTIs), often leading to development of various resistance mutations to NRTIs. The drug under study belongs to the drug class of nonnucleoside reverse transcriptase inhibitors (NNRTIs) and proved to be useful in construction of potent regimens. Because most study participants received efavirenz in combination with 2 NRTIs, baseline NRTI mutations could predict response to treatment. The details of these studies can be found in Bacheler and others (2000). A total of 156 subjects had viral genetic sequences and viral load available at baseline, and at least one viral load measurement after study entry. The December 2009 version of the International AIDS Society-USA drug resistance mutations list (Johnson and others, 2009) specifies 25 NRTI and NNRTI resistance sites, and 12 of them appeared in the baseline data of these 3 studies. They are all considered as potential predictors for future viral load, along with the baseline viral load dichotomized at 50,000 copies/ml, which is close to the median of 55,115 copies/ml. These predictors are summarized in Table 2.

10 10 C. HU AND V. DEGRUTTOLA Table 2. Frequencies of baseline mutations and high viral load Variable Frequency 95% confidence interval RT (0.101, 0.220) RT (0.040, 0.131) RT (0.031, 0.115) RT (0.080, 0.191) RT (0.000, 0.035) RT (0.000, 0.035) RT (0.002, 0.046) RT (0.000, 0.035) RT (0.278, 0.433) RT (0.022, 0.099) RT (0.145, 0.277) RT (0.040, 0.131) RNA 50, (0.457, 0.618) We first investigate plasma HIV-1 RNA trajectories in the first 12 weeks after study entry. The U-type score D(, ) defined in (2) serves as the basis of the first analysis; another analysis makes use of the alternative score D(, ) defined in (3). In the latter, viral suppression is defined as HIV-1 RNA level below 400 copies/ml, and rebound, as 2 consecutive values above 400 copies/ml or a single one above 4000 copies/ml after an initial suppression. As viral load testing did not always happen on the scheduled day, we allow a 1-week window before and after the scheduled dates. In this way, we include all viral loads measured in the first 13 weeks in the 12-week analysis. In calculation of both scores, viral load measurements of 2 different subjects taken within 7 days of each other were considered to have occurred at the same time. During the period of time under consideration, the median number of viral load measurements was 7, and nearly 95% of subjects had at least 4 viral load measurements. For the alternative score, the numbers of subjects with V (i) = 0 (viral suppression), 1 (initial viral suppression followed by rebound) and 2 (no suppression) were 55(35%), 37(24%), and 64(41%), respectively. To identify appropriate size of a tree, the resampling method mentioned at the end of Section 2 was used to correct for the overoptimism in G(T ), with 50 bootstrap samples. The per-split penalty α is first set to be 2, and analysis is repeated with α = 4. Figure 2 shows the tree constructed from original scores with α set to be 2; subjects with better response were included in the left daughter nodes. We see that the root node was split on baseline viral load, with low levels associated with better response. For subjects with low baseline viral load, baseline mutation at RT184 predicted worse response, as did mutation at RT210 for those with high viral load. Subjects with low viral load and mutant RT184 were split based on RT69, and such subjects with wild-type RT69 were further split based on RT219. The mean profile for trajectories in each terminal node is shown in Figure 1 of the web-based supplementary material available at Biostatistics online. The alternative scores gave a very similar tree (not shown), but did not contain the split based on RT210 for subjects with high baseline viral loads. A smaller tree is expected, since the alternative score contains less information than did the original, but the 2 types of scores gave a quite consistent picture about resistant mutations. When α was set to be 4, the resulting tree was unchanged for both scores. When viral load trajectories in the first 16 weeks are studied, setting α = 2 and using the original scores gave a similar tree (Figure 3) to that for response in the first 12 weeks. The tree has one fewer split that associated with RT69. The mean profile for trajectories in each terminal node is shown in Figure 2 of the web-based supplementary material available at Biostatistics online. The alternative scores, on the other hand, did not give any split. When α is set at 4, both types of scores gave the tree with no

11 Recursive partitioning of resistant mutations 11 Fig. 2. Tree for DMP-266 studies based on viral load (VL) of the first 12 weeks. Fig. 3. Tree for DMP-266 studies based on viral load (VL) of the first 16 weeks. split. We see that the association between baseline viral load and mutations with viral response during the 16-week period was weaker. This may reflect the impact of new resistance mutations after study entry. As suggested by a referee we also carried out a traditional variable selection procedure using GEE models. All predictors significant at 0.10 level in single-covariate models were included in a backward elimination process at the 0.05 significance level. Similar models were obtained for both the 12-week and the 16-week periods, which show that high baseline viral load along with mutations at RT75 and RT101

12 12 C. HU AND V. DEGRUTTOLA were associated with poor outcome, while mutation at RT106 was associated with favorable outcome. One cause of the difference in results from this approach and our recursive partitioning model could be the additive versus interaction nature of the methods. Also, the proposed recursive partitioning method gives a predictive model, while GEE is used to find association between predictors and the outcome. 5. DISCUSSION We have proposed a recursive partitioning method to correlate baseline variables, which could be of high dimension, with the entire observed trajectory of a longitudinally measured marker. No functional form is assumed for the marker trajectory, and no parametric assumptions are made on the marker distribution. This method is especially useful for markers like the viral load in HIV/AIDS studies where parametric modeling can be very complicated. The definition of the U-type score is very flexible and can depend on the nature of specific markers and the goals of analysis. We explored different ways of defining the score for markers expected to fall (e.g. viral load) and those expected to rise (e.g. CD4 T-cell count). We can also incorporate qualitative information like suppression and rebound. We showed through simulation that our method works very well for modest sample sizes. The pruning process is able to simultaneously identify signals in the data and exclude noise. Choice of per-split penalty α = 2 or 4 had little impact on trees an observation that also applied in the analysis of DMP-266 data. The choice of α did not affect the resulting tree for the strong signal observed during the first 12 weeks of the study. During that time period, we see that similar trees were generated from the original and alternative scores, despite the very different score definitions. During the 16-week period in which the associations appeared to be somewhat weaker, the original scores, containing more information, led to a greater number of splits, as did the choice of α = 2 compared to α = 4. In the development of the score, we assumed that longitudinal marker measurements are missing completely at random. More general missing mechanism will be explored in future research. When viral loads are missing at random, the missingness mechanism will be modeled, and an inverse probabilityweighted version of D(i, j) can be defined. In Section 3, we see that the power of the proposed method is low for some scenarios when n 100. The score comparing 2 viral load observations (Y 1, t 1 ) and (Y 2, t 2 ) from 2 different subjects is set to 0 if Y 1 < Y 2 but t 1 > t 2. This score avoids specification of the mean trajectory and parametric assumptions for the longitudinal viral load but does not utilize all information contained in such pair of observations. Making use of such information, when possible, could improve the power of the proposed method. Further research on this issue will use smoothing techniques in the calculation of scores. SUPPLEMENTARY MATERIAL Supplementary material is available at ACKNOWLEDGMENTS The authors would like to thank Bristol-Myers Squibb Company for providing the DMP 266 data, and 2 reviewers and the associate editor for helpful suggestions and comments. Conflict of Interest: None declared. National Institutes of Health (AI51164). FUNDING

13 Recursive partitioning of resistant mutations 13 REFERENCES BACHELER, L., ANTON, E., KUDISH, P., BAKER, D., BUNVILLE, J., KRAKOWSKI, K., BOLLING, L., AUJAY, M. X. W., ELLIS, D., BECKER, M. and others (2000). Human immunodeficiency virus type 1 mutations selected in patients failing efavirenz combination therapy. Antimicrobial Agents and Chemotherapy 44, BEERENWINKEL, N., SCHMIDT, B., WALTER, H., KAISER, R., LENGAUER, T., HOFFMANN, D., KORN, K. AND SELBIG, J. (2002). Diversity and complexity of HIV-1 drug resistance: a bioinformatics approach to predicting phenotype from genotype. Proceedings of the National Academy of Sciences of the United States of America 99, BREIMAN, L., FRIEDMAN, J., OLSHEN, R. AND STONE, C. (1984). Classification and Regression Trees. Boca Raton, FL: Chapman & Hall/CRC Press. FOULKES, A. S. AND DEGRUTTOLA, V. (2002). Characterizing the relationship between HIV-1 genotype and phenotype: prediction based classification. Biometrics 58, FOULKES, A. S, DEGRUTTOLA, V. AND HERTOGS, K. (2004). Combining genotype groups and recursive partitioning: an application to human immunodeficiency virus type 1 genetics data. Applied Statistician 53, JOHNSON, V. A., BRUN-VZINET, F., CLOTET, B., GÜNTHARD, H. F., KURITZKES, D. R., PILLAY, D., SCHAPIRO, J. M. AND RICHMAN, D. D. (2009). Update of the drug resistance mutations in HIV-1: December Topics in HIV Medicine 17, LEBLANC, M. AND CROWLEY, J. (1993). Survival trees by goodness of split. Journal of the American Statistical Association 88, LEE, S. K. (2006). On classification and regression trees for multiple responses and its application. Journal of Classification 23, MAY, S. AND DEGRUTTOLA, V. (2007). Nonparametric tests for dependent observations obtained at varying time points. Biometrics 63, SEGAL, M. R. (1992). Tree-structured methods for longitudinal data. Journal of the American Statistical Association 87, SERFLING, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons. SEVIN, A. D., DE GRUTTOLA, V., NIJHUIS, M., SCHAPIRO, J. M., FOULKES, A. S., PARA, M. F. AND BOUCHER, C. A. (2000). Methods for investigation of the relationship between drug-susceptibility phenotype and human immunodeficiency virus type-1 genotype with applications to AIDS clinical trials group 333. Journal of Infectious Diseases 182, ZHANG, H. P. (1998). Classification trees for multiple binary responses. Journal of the American Statistical Association 93, ZHANG, H. P. AND SINGER, B. (1999). Recursive Partitioning in the Health Sciences. New York: Springer. [Received July 20, 2010; revised April 11, 2011; accepted for publication April 20, 2011]

A novel approach to estimation of the time to biomarker threshold: Applications to HIV

A novel approach to estimation of the time to biomarker threshold: Applications to HIV Pharmaceutical Statistics, Volume 15, Issue 6, Pages 541-549, November/December 2016 PSI Journal Club 22 March 2017