A Neuropsychological Test Battery for Use in Alzheimer Disease Clinical Trials

ORIGINAL CONTRIBUTION

John Harrison, C. Psychol, PhD; Sonia L. Minassian, DrPH; Lisa Jenkins, PhD; Ronald S. Black, MD; Martin Koller, MD, MPH; Michael Grundman, MD, MPH

Objective: To report the psychometric properties of an alternative instrument to the cognitive subscale of the Alzheimer's Disease Assessment Scale, a neuropsychological test battery (NTB) for measuring drug efficacy in Alzheimer disease clinical trials.

Design: The NTB was evaluated in a randomized, double-blind, placebo-controlled trial of AN1792(QS-21) (synthetic β-amyloid plus an adjuvant) (300 patients) and isotonic sodium chloride solution (72 patients). The test-retest reliability of the NTB was examined, and the NTB was correlated with other cognitive (cognitive subscale of the Alzheimer's Disease Assessment Scale and Mini-Mental State Examination) and functional (Disability Assessment Scale for Dementia and Clinical Dementia Rating Sum of Boxes) measures. In addition, a factor analysis was performed on NTB components. Finally, the sensitivity of the NTB to change was assessed as a function of Mini-Mental State Examination performance.

Results: The NTB had high test-retest reliability at 6 (Pearson product moment correlation [r] = 0.92) and 12 (r = 0.88) months. Internal consistency was high (Cronbach α = 0.84). The correlations between the NTB z score and scores on traditional measures of cognition and function were significantly different from 0 (P < .001). A factor analysis yielded memory and executive function factors. The NTB z score declined linearly over 1 year in patients receiving placebo and, in contrast to the Alzheimer's Disease Assessment Scale cognitive subscale, demonstrated similar declines in patients with high (21-26) and low (15-20) Mini-Mental State Examination scores at baseline.
Conclusions: The NTB exhibits excellent psychometric properties and seems to be a reliable and sensitive measure of cognitive change in patients with mild to moderate Alzheimer disease. The psychometric properties of the NTB suggest that it may have particular utility in evaluating drug efficacy in clinical trials in which patients with mild Alzheimer disease are included.

Arch Neurol. 2007;64(9):1323-1329

Author Affiliations: CPC Pharma Services, Warminster, England (Dr Harrison); Elan Pharmaceuticals, San Diego, California (Drs Minassian, Koller, and Grundman); and Wyeth Research, Collegeville, Pennsylvania (Drs Jenkins and Black).

The Food and Drug Administration and the European Medicines Authority require Alzheimer disease (AD) treatments to demonstrate efficacy through cognitive outcome measures. 1 In practice, this requirement has most often been met by including the cognitive subscale of the Alzheimer's Disease Assessment Scale (ADAS-Cog) in clinical trials. However, this instrument has been widely recognized as suboptimal, particularly with respect to the measurement of areas of cognition that are compromised in patients with AD, including attention, planning, working memory, and executive function. 2,3 To remedy this deficiency, further subtests have been added to the ADAS-Cog, including a delayed word recall task, a maze task, 4 digit cancellation, 4 and a subjective judgment included from the ADAS noncognitive subscale, which is designed to assess concentration and distractibility. However, the utility and sensitivity of these measures are unproved and the standard ADAS-Cog is still the most commonly used cognitive outcome measure in AD clinical trials. 5 This is unfortunate, because the ability of pharmacological interventions to affect areas of cognition, such as executive function, offers much promise.
A further area of concern with the use of the ADAS-Cog is its lack of sensitivity for measuring cognitive change in patients with relatively high Mini-Mental State Examination (MMSE) scores. In planning clinical drug trials for patients with AD, it is assumed that patients receiving placebo will decline on the ADAS-Cog by approximately 7 points per year. However, several recently reported trials have
suggested this may not be true. Wilcock et al 6 reported that the 6-month placebo group decline on the ADAS-Cog in patients scoring 18 or higher on the MMSE was only slightly more than 1 point. Other recent studies have reported similarly modest changes in ADAS-Cog scores over time. For example, Seltzer et al 7 reported that the AD placebo group (mean baseline MMSE score, 24.3; SD, 1.3) in a 24-week study of donepezil hydrochloride showed less than a 1-point change. In addition, Kurz et al, 8 in a study that examined treatment effects of rivastigmine by disease stage in AD, reported that the placebo group of 288 patients scoring 22 or higher on the MMSE at baseline changed by just 1.13 ADAS-Cog points in 26 weeks. The ADAS-Cog change from baseline scores in patients with mild cognitive impairment has been even more modest. 7,9 These findings suggest that it would be prudent to identify cognitive outcome measures that are more sensitive to small changes in function in studies of patients with mild AD (ie, MMSE score ≥20) and mild cognitive impairment.

In a previously reported, randomized, double-blind, placebo-controlled trial of AN1792(QS-21) (synthetic β-amyloid plus an adjuvant), 10 a neuropsychological test battery (NTB), a novel combination of 6 well-known, validated, cognitive tests, yielding 9 measures of patient performance, was used. Scores for each of the individual NTB components from baseline to month 12 in the antibody responder (anti-AN1792 IgG [total] serum titer ≥1:2200 at any postinjection visit) and placebo groups were standardized into z scores and converted into an overall composite z score across all NTB components. The study found less worsening in the change from baseline NTB z scores at month 12 in the antibody responder group compared with z scores in the placebo group. 
10 The mean NTB z score improvement from baseline to month 12 was 0.03 (SD, 0.37) in the antibody responder group, while the NTB decline was 0.20 (SD, 0.45) in the placebo group. There were no significant differences in the decline on other traditional AD assessments (eg, ADAS-Cog, Disability Assessment Scale for Dementia [DAD], Clinical Dementia Rating Sum of Boxes [CDR-SOB], and Alzheimer Disease Cooperative Study Clinical Global Impression of Change [ADCS-CGIC]) between the placebo and responder groups. Concentration-response analysis indicated significant relationships between geometric mean titer and the NTB z score; greater improvements from baseline in the NTB z scores were associated with higher IgG antibody titers. 10 The NTB components were selected because they indexed memory and/or executive function. Tests of visual and verbal memory were selected for inclusion. Delayed verbal recall has been repeatedly shown to be a sensitive measure of cognitive decline in patients with AD. 11 To maximize the chances of capturing delayed verbal memory effects, the Rey Auditory Verbal Learning Test was included in the NTB. However, to reduce the number of outcome variables to a manageable number, we combined total words recalled at delayed testing with total number of words correctly recognized. No correction for false-positive recognition was made in this analysis. Impaired executive function, defined as the inability to plan, organize, set, and adapt current and past knowledge to future behavior, is one of the core features of AD. In one study, 12 64% of patients had executive dysfunction and these patients performed worse on tests of cognition, dementia severity, and activities of daily living compared with patients with normal executive function. Neuropsychological tests are typically classified according to the primary area of cognition believed to be necessary for successful completion of the task. 
However, this does not suggest that other cognitive skills are not required, because even a word recall task will require satisfactory attentional, working memory, language, and other skills. 13 The NTB was also designed to incorporate tests of paired associative learning. Impairment on these tasks can be induced through scopolamine hydrobromide administration, 14 and successful paired associative learning task performance is known to rely on the integrity of medial temporal lobe structures. These brain areas are reported to be among the first to experience degeneration in AD. 15 This class of test has been shown to be a sensitive measure of cognitive decline in patients with AD. 16

In this study, we performed a factor analysis of the individual NTB components to determine whether a memory factor and an executive function factor could be identified. We also characterized the internal consistency and test-retest reliability of the NTB and, to address the issue of clinical relevance, correlated the NTB with cognitive and functional measures. The concurrent validity of the NTB was also assessed by correlating the NTB z scores with scores from the ADAS-Cog, MMSE, and CDR-SOB. Finally, the sensitivity to change of the NTB was assessed in relation to MMSE baseline performance.

METHODS

PATIENTS

Eligible patients were aged 50 to 85 years and met the criteria for a diagnosis of probable AD as defined by the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer's Disease and Related Disorders Association 17; magnetic resonance imaging of the brain was conducted to rule out other brain pathology and to support the clinical diagnosis of AD. Additional inclusion and exclusion criteria and other details of patient disposition are described elsewhere. 10

STUDY DESIGN AND TREATMENT

The NTB and other cognitive and functional measures were evaluated using data from a randomized, placebo-controlled, double-blind trial conducted at 28 centers in the United States and Europe between September 24, 2001, and December 18, 2002. A total of 372 patients with mild to moderate AD were randomized in a double-blind manner to receive treatment with AN1792, 225 µg (Elan Pharmaceuticals), and QS-21, 50 µg (Antigenics, Framingham, Massachusetts), containing 0.4% polysorbate 80 or isotonic sodium chloride solution in a 4:1 ratio. However, dosing was discontinued after reports of encephalitis in 6.0% of AN1792(QS-21)-treated patients.

SELECTION OF THE NTB COMPONENT TESTS

The NTB is composed of paper-and-pencil assessments and takes on average approximately 40 minutes to complete. The
NTB consists of 9 validated components: Wechsler Memory Scale visual immediate (score range, 0-18), Wechsler Memory Scale verbal immediate (score range, 0-24), Rey Auditory Verbal Learning Test (RAVLT) immediate (score range, 0-105), Wechsler Memory Digit Span (score range, 0-24), Controlled Word Association Test (COWAT), Category Fluency Test (CFT), Wechsler Memory Scale visual delayed (score range, 0-6), Wechsler Memory Scale verbal delayed (score range, 0-8), and RAVLT delayed (score range, 0-30). 18 The RAVLT delayed measure is composed of delayed recall and recognition performance components that are summed to yield a score ranging from 0 to 30. Three of the NTB tasks (visual paired associates, verbal paired associates, and digit span) were drawn from the Wechsler Memory Scale-Revised. 18 In addition to the digit span task, we also selected 2 tests of verbal fluency, the COWAT and the CFT. The former task examines the patient's phonological fluency skills and the latter examines the patient's semantic fluency (in this case, the ability to generate exemplars from the category "animals"). Key psychometric properties, such as the relative lack of ceiling and floor effects, sensitivity to cognitive decline in patients with mild AD, and robust test-retest reliability, have been previously established. 13,19 Test-retest reliability for component tasks has been shown to range from 0.68 to 0.91. 13 Reference to standard neuropsychological texts, such as that of Lezak, 13 suggested that all 9 tasks were likely to be familiar to investigator site psychometricians. The cognitive scales were administered by 2 blinded raters: the first administered the ADAS-Cog, DAD, and NTB, and the second administered the ADCS-CGIC and the CDR-SOB.

STATISTICAL ANALYSIS

z Scores were calculated for each of the 9 NTB components using the sample baseline mean and standard deviation for all patients with baseline scores. The resultant z scores were averaged to obtain the NTB z score.
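The composite scoring just described can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function names, the dictionary layout, the use of the sample (n - 1) standard deviation, and the toy data are all assumptions. The `min_components` parameter mirrors the rule given in the Statistical Analysis section that a patient needs scores on at least 6 of the 9 components.

```python
from statistics import mean, stdev

def baseline_stats(baseline_rows):
    """Return {component: (mean, SD)} over patients with a baseline score."""
    names = {name for row in baseline_rows for name in row}
    stats = {}
    for name in names:
        values = [row[name] for row in baseline_rows if row.get(name) is not None]
        stats[name] = (mean(values), stdev(values))  # sample SD is an assumption
    return stats

def composite_z(scores, stats, min_components=6):
    """Average the component z scores; None when too few components are present."""
    zs = [
        (value - stats[name][0]) / stats[name][1]
        for name, value in scores.items()
        if value is not None
    ]
    return mean(zs) if len(zs) >= min_components else None

# Toy data with 2 hypothetical components instead of the 9 NTB components.
baseline = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
stats = baseline_stats(baseline)
followup = composite_z({"a": 3, "b": 30}, stats, min_components=2)
too_sparse = composite_z({"a": 3, "b": None}, stats, min_components=2)
```

Because every visit is standardized against the baseline mean and standard deviation, a later composite below 0 reads directly as decline relative to the baseline sample. The same function could be applied to a subset of components to form a subscale score.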
An executive function z score was obtained by averaging the z scores from NTB components measuring executive function (CFT, COWAT, and Wechsler Memory Digit Span). The remaining 6 components, which measure memory, were averaged to obtain a memory z score. The NTB z scores were calculated for patients who had no missing data for at least 6 of the 9 component tests. Using only the data from the placebo group, the test-retest reliability of the NTB was characterized by correlating the baseline NTB z score with the NTB scores obtained at months 6 and 12. The study was not designed to formally examine the test-retest reliability of the NTB and, therefore, although placebo group NTB z scores declined on average throughout the study, the 6- and 12-month z score correlations with the baseline z score provide the best available measure of test-retest reliability. Baseline scores on the components of the NTB were used to calculate the Cronbach α, a measure of the internal consistency of the NTB. To characterize the concurrent validity of the NTB, baseline, month 6, and month 12 NTB z scores for all study participants were correlated with baseline, month 6, and month 12 scores on the MMSE, ADAS-Cog, CDR-SOB, and DAD. Furthermore, the NTB 6- and 12-month z score changes from baseline were correlated with available 6- and 12-month changes from baseline for the MMSE, ADAS-Cog, CDR-SOB, ADCS-CGIC, and DAD. All correlations were calculated using a Pearson product moment correlation coefficient. A factor analysis was conducted with the 9 components of the NTB at baseline to determine whether a memory factor and an executive function factor could be identified. The Pearson product moment correlations between the baseline scores for each of the 9 NTB components were calculated and used in the factor analysis. The factor analysis was conducted using the principal components method of factor extraction with a varimax rotation. Finally, the sensitivity to change of the NTB as a function of MMSE performance was determined by comparing the average 12-month NTB and ADAS-Cog changes from baseline for placebo group patients with high and low MMSE scores. A z score was also calculated for the ADAS-Cog using the sample baseline mean and standard deviation for all patients with baseline scores to allow a comparison between the average 12-month NTB and ADAS-Cog z score changes from baseline for placebo group patients with high and low MMSE scores.

Figure 1. The neuropsychological test battery (NTB) test-retest reliability at month 6. The Pearson product moment correlation coefficient (r) was 0.92 (P < .001) between baseline and month 6. Axes: baseline NTB z score and month 6 NTB z score.

RESULTS

A total of 372 patients were randomly assigned to receive study treatment; 300 received AN1792(QS-21) and 72 received placebo. Included patients were approximately 72 years old, with a slightly greater preponderance of women (55.1%), and had a mean MMSE score of 20. Of the patients, 83.3% were at steady-state use of acetylcholinesterase inhibitors at study enrollment, and inclusion criteria required that acetylcholinesterase inhibitor dose be unchanged for the duration of the study. No patient was withdrawn from the study because of changes in the doses of acetylcholinesterase inhibitor medication. The NTB z scores for patients receiving placebo at baseline were correlated with their z scores at 6 and 12 months to assess test-retest reliability. The NTB had high test-retest reliability at 6 (r = 0.92; 95% confidence interval, 0.87-0.95) (Figure 1) and 12 (r = 0.88; 95% confidence interval, 0.79-0.93) months. The NTB also had high levels of internal consistency, with a Cronbach α of 0.84.
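The reliability statistics reported here (a Pearson correlation with a 95% confidence interval, and the Cronbach α) can be sketched as follows. The source does not state how the confidence intervals were computed; the Fisher z transformation used below is a common choice and is an assumption, as are the function names and the toy inputs.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean, variance

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fisher_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z transformation (assumed method)."""
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / sqrt(n - 3)
    z = atanh(r)
    return tanh(z - half_width), tanh(z + half_width)

def cronbach_alpha(items):
    """items: one list of scores per component, all over the same patients."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

# Toy inputs, not study data.
r_toy = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
low, high = fisher_ci(0.73, 367)
alpha_toy = cronbach_alpha([[1, 2, 3, 4]] * 3)
```

As a rough cross-check, `fisher_ci(0.73, 367)` gives an interval of about 0.68 to 0.77. The α function uses the usual variance form, k/(k - 1) × (1 - Σ item variances / variance of the total score), so components that move together push α toward 1.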
There was a correlation between the NTB z score and scores on traditional measures of cognition (MMSE and ADAS-Cog) and function (DAD and CDR-SOB) (P < .001) at baseline, month 6, and month 12 for patients receiving placebo and for all patients included in the study. Given the negligible differences between correlations calculated for all patients and for patients receiving placebo, only the correlations for all patients are presented in Table 1. In particular, correlations with the ADAS-Cog are strong at all 3 time points. The similarity of the correlations on each measure at baseline, month 6, and month 12 supports the assertion that the NTB has strong test-retest reliability.

Table 1. Pearson Product Moment Correlation Coefficients Between the NTB and Other Measures at Baseline and at 6 and 12 Months (a)

Measure      Baseline Data                       6-mo Data                           12-mo Data
ADAS-Cog     -0.73 (-0.77 to -0.68) [n = 367]    -0.78 (-0.82 to -0.73) [n = 309]    -0.77 (-0.82 to -0.71) [n = 233]
DAD          0.37 (0.28 to 0.46) [n = 367]       0.48 (0.39 to 0.56) [n = 309]       0.54 (0.44 to 0.63) [n = 230]
CDR-SOB      -0.49 (-0.56 to -0.41) [n = 367]    -0.56 (-0.63 to -0.48) [n = 305]    -0.62 (-0.69 to -0.53) [n = 229]
MMSE         0.69 (0.63 to 0.74) [n = 367]       NA                                  0.77 (0.71 to 0.82) [n = 217]
ADCS-CGIC    NA                                  -0.31 (-0.41 to -0.20) [n = 292]    -0.41 (-0.51 to -0.30) [n = 230]

Abbreviations: ADAS-Cog, Alzheimer's Disease Assessment Scale cognitive subscale; ADCS-CGIC, Alzheimer Disease Cooperative Study Clinical Global Impression of Change; CDR-SOB, Clinical Dementia Rating Sum of Boxes; DAD, Disability Assessment Scale for Dementia; MMSE, Mini-Mental State Examination; NA, data not available; NTB, neuropsychological test battery.
(a) Data are given as mean (95% confidence interval) correlation with the NTB. The number of participants is given in brackets. All coefficients are statistically significantly different from 0 (P < .001).

The correlations between the changes from baseline on the NTB and the changes from baseline for cognitive and functional measures for all patients were also different from 0 (P < .001) (Table 2). These correlations were smaller than correlations between scores obtained for each measure at baseline and 6 and 12 months (Table 1).

Table 2. Pearson Product Moment Correlation Coefficients for the Change From Baseline Between the NTB and Other Measures at 6 and 12 Months (a)

Measure      6-mo Change From Baseline           12-mo Change From Baseline
ADAS-Cog     0.34 (0.24 to 0.44) [n = 309]       0.39 (0.28 to 0.49) [n = 233]
DAD          0.26 (0.15 to 0.36) [n = 309]       0.31 (0.19 to 0.42) [n = 230]
CDR-SOB      0.23 (0.12 to 0.33) [n = 305]       0.42 (0.31 to 0.52) [n = 229]
MMSE         NA                                  0.39 (0.27 to 0.50) [n = 217]
ADCS-CGIC    -0.25 (-0.35 to -0.14) [n = 292]    -0.41 (-0.50 to -0.29) [n = 230]

Abbreviations: See Table 1.
(a) Data are given as mean (95% confidence interval) correlation with change from the baseline NTB. The number of participants is given in brackets. All coefficients are statistically different from 0 (P < .001).

Table 3. Factor Analysis Results (z Scores)

Test Type             Measure        Factor 1    Factor 2
Memory                WMVer-D        0.75        0.28
                      WMVer-I        0.70        0.41
                      RAVLT-D        0.68        0.11
                      WMVis-D        0.69        0.15
                      WMVis-I        0.63        0.28
                      RAVLT-I (a)    0.49        0.66
Executive function    CFT            0.33        0.63
                      COWAT          0.15        0.84
                      WMDS           0.02        0.84

Abbreviations: CFT, Category Fluency Test; COWAT, Controlled Word Association Test; RAVLT-D, Rey Auditory Verbal Learning Test delayed; RAVLT-I, Rey Auditory Verbal Learning Test immediate; WMDS, Wechsler Memory Digit Span; WMVer-D, Wechsler Memory Scale verbal delayed; WMVer-I, Wechsler Memory Scale verbal immediate; WMVis-D, Wechsler Memory Scale visual delayed; WMVis-I, Wechsler Memory Scale visual immediate.
(a) Although RAVLT-I is considered a measure of memory and is, therefore, included in the memory z score, in the factor analysis, RAVLT-I loaded more heavily on factor 2 than factor 1.

The factor analysis yielded 2 factors with eigenvalues greater than 1.0, indicating the presence of 2 factors. The factor loadings for both factors are given in Table 3 and are shown graphically in Figure 2. The factors for the Wechsler Memory Digit Span, COWAT, CFT, and RAVLT immediate tests load heavily on factor 2 (factor loadings, >0.6). The rest of the NTB tests load primarily on factor 1. Of the 4 tests that primarily load on factor 2, 3 measure executive function, and all of the tests that primarily load on factor 1 measure memory.
Thus, factor 1 can be viewed as a memory factor and factor 2 can be viewed as an executive function factor. Finally, the average 12-month NTB and ADAS-Cog changes from baseline were examined for patients with mild and moderate AD receiving placebo. Patients with mild AD had MMSE scores of 21 to 26 at baseline, and patients with moderate AD had MMSE scores of 15 to 20 at baseline (Figure 3). The mean (SD) NTB change from baseline for patients with a high MMSE score (-0.21 [0.47]) was similar to that for patients with a low MMSE score (-0.18 [0.42]). In contrast, the mean ADAS-Cog change from baseline to month 12 in the placebo group was different for patients with high and low MMSE scores. Patients with high baseline MMSE scores declined by a mean (SD) of 0.63 (4.32) ADAS-Cog points, and patients with low baseline MMSE scores and more severe AD declined by a mean (SD) of 4.92 (7.61) points on the ADAS-Cog. As expected, the mean ADAS-Cog z score change from baseline to month 12 was also different for patients with high and low MMSE scores. Patients with high baseline MMSE scores declined by a mean (SD) of 0.07 (0.46) ADAS-Cog z score points, and patients with low baseline MMSE scores with more severe manifestations of AD declined by a mean (SD) of 0.52 (0.80) ADAS-Cog z score points.

Figure 2. Factor analysis of 9 individual neuropsychological test battery components that measure memory or executive function. Axes: factor 1 and factor 2 loadings; components are marked as memory or executive function.

Figure 3. Decline in the neuropsychological test battery (NTB) z score (A) and the cognitive subscale of the Alzheimer's Disease Assessment Scale (ADAS-Cog) score (B) by the Mini-Mental State Examination (MMSE) score at baseline. Each panel shows mean change from baseline to month 12 for patients with MMSE scores of 21-26 and 15-20.

COMMENT

Knowing the limitations of the ADAS-Cog, the primary aim of including the NTB in the AN1792(QS-21) phase 2 study 10 was to establish a sensitive easy-to-administer test battery to assess cognitive change in patients with mild to moderate AD. The NTB z score composite assessing cognitive change in patients in the antibody responder group vs the placebo group yielded statistically significant evidence of reduced impairment in the treated group. This result was in contrast to the analyses conducted with the ADAS-Cog and other functional and cognitive measures, which did not find evidence of reduced impairment in the treatment group. Furthermore, a high correlation was found between performance on the NTB and traditional measures at baseline, month 6, and month 12, suggesting that the NTB exhibits high concurrent validity vs measures such as the ADAS-Cog and MMSE. In addition, we also observed statistically significant correlations with functional measures, such as the CDR-SOB and DAD.
These correlations suggest that these scales predominately measure the same constructs. The correlations between the NTB change from baseline at 6 and 12 months and the other functional and cognitive measures were all different from 0. Good test-retest reliability was a key requirement of the position paper published by the International Working Group on Harmonization of Dementia Drug Guidelines. 2 Further requirements included validity (the instrument must measure the intended disease-relevant cognitive functions), reliability (test-retest, also interrater and intrarater if scoring is subjective), appropriate sensitivity range (the absence of ceiling and floor effects), longitudinal data (information should be available on expected change during the trial), practice effects (determines the need for practice sessions before starting a trial), and equivalent forms (for repeated testing during the trial). Of these requirements, the most difficult to satisfy when selecting from the available cognitive tests are the requirements for practice effects and equivalent forms. Standard texts, such as that of Lezak, 13 reveal that evidence regarding practice effects and the availability of genuinely equivalent cognitive tests is difficult to find. While the guidelines of Ferris et al 2 represent excellent guidance, it could be argued that cognitive tests with high levels of test-retest reliability are sufficiently psychometrically robust to preclude the necessity for parallel forms. The NTB was initially designed to emphasize assessing cognition in the domains of memory and executive function. When all 9 NTB components were subjected to factor analysis, the model yielded evidence to support 2 clear factors. One factor is best characterized as memory, and the other factor represents executive function. 
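The kind of factor analysis discussed here (principal components extraction from the correlation matrix, retention of factors with eigenvalues greater than 1.0, and a varimax rotation) can be sketched with NumPy. This is an illustrative sketch on synthetic data, not the authors' analysis: the function names, the simulated 6-plus-3 component structure, and the convergence settings are assumptions, and statistical packages may differ in normalization details.

```python
import numpy as np

def pca_loadings(data, min_eigenvalue=1.0):
    """Principal-component loadings for factors with eigenvalue > min_eigenvalue."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() <= criterion * (1 + tol):   # no further improvement
            break
        criterion = s.sum()
    return loadings @ rotation

# Synthetic stand-in: 6 "memory-like" and 3 "executive-like" measures.
rng = np.random.default_rng(0)
n = 500
memory = rng.normal(size=(n, 1))
executive = rng.normal(size=(n, 1))
data = np.hstack([
    0.8 * memory + 0.6 * rng.normal(size=(n, 6)),
    0.8 * executive + 0.6 * rng.normal(size=(n, 3)),
])
unrotated = pca_loadings(data)
rotated = varimax(unrotated)
dominant = np.abs(rotated).argmax(axis=1)      # factor each measure loads on most
```

Varimax is an orthogonal rotation, so each measure's communality (row sum of squared loadings) is unchanged; only the split between factors is redistributed to make the loading pattern easier to read. The sign and order of rotated factors are arbitrary, so comparisons should use absolute loadings.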
However, while the NTB was designed to emphasize the memory and executive function domains, it also provides an index of global cognitive function by drawing on the many cortical areas required to support language (COWAT and CFT), attention (digit span forward), visual perception (Wechsler Memory Scale visual), verbal memory (Wechsler Memory Scale verbal and RAVLT), working memory (digit span backward), and list learning (RAVLT). The outcome measures that contributed to the executive function factor were all derived from tests generally
acknowledged 20 to index this cognitive domain. Further interpretation of the executive function factor is challenging largely because of the lack of a clear definition of the term. There are a number of definitions, and while some family resemblance between them is evident, some are couched in neuroanatomical terms, typically with reference to the prefrontal cortex, 21 while others are more functional. An alternative view is that all cognitive tasks require some contribution from executive function; what varies from task to task is the extent to which this resource is taxed. Tasks with demanding working memory components seem to require significant contributions from executive function. Consequently, those tasks that require planning, the development of strategy, or the online monitoring of task components are all likely to make robust demands on executive function resources. Thus, given the need to search semantic memory for appropriate items in the 2 NTB word fluency tasks (CFT and COWAT), and the need to ensure that words are not repeated, it is not surprising that performance on these tests loads on the putative executive function factor. The need to maintain and order items in working memory would also explain why digit span loads on the executive function factor. The same need to use working memory when attempting to hold in memory the 15 items of the RAVLT could also account for why this metric seems to load on both factors. We suspect that executive function could be profitably indexed using other putative tests of this resource. Further studies might usefully use measures that index executive function through assessing set-shifting abilities and abstraction. The measures of cognition that compose the NTB were selected to demonstrate sensitivity to cognitive deterioration and to exhibit robust psychometric properties. The NTB performed well, exhibiting a test-retest reliability correlation coefficient of 0.92 at 6 months and 0.88 at 12 months.
This is among the higher levels of reliability for cognitive measures and suggests that the NTB is a highly reliable index of cognitive function and that it is well suited for repeated assessment. An analysis of the NTB's internal consistency also yielded satisfactory levels of performance. The observed Cronbach α of 0.84 represents high levels of consistency, exceeding the desirable threshold of 0.80 proposed by Kline. 22 The sensitivity of the NTB z score was compared with traditional measures, including the ADAS-Cog. The NTB seems to be sensitive to cognitive change independently of MMSE status when stratified by baseline score. The NTB showed similar decline in patients with mild (high baseline MMSE score [21-26]) and moderate (low baseline MMSE score [15-20]) AD. In contrast, performance on the ADAS-Cog shows systematic change depending on disease severity at baseline. The ADAS-Cog showed cognitive decline for patients with moderate AD but did not show decline in patients with mild AD. This is consistent with previous reports 7,8 of comparatively small annual decline on the ADAS-Cog in those with mild AD. The relative lack of decline in ADAS-Cog scores in patients with mild AD becomes problematic in the design of clinical trials for disease-modifying AD therapies. For trials that are powered to detect a reduction in the rate of decline over 1 year in the drug-treated patients compared with placebo, the number of subjects with mild AD required to detect a reduction in ADAS-Cog would be excessive. For example, sample size estimates were calculated to determine the number of patients with mild AD (MMSE score, 21-26) necessary to detect a drug-placebo difference of 50% of the annual placebo decline using the 12-month NTB and ADAS-Cog changes for patients with mild AD based upon the results of the AN1792(QS-21) study.
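A normal-approximation sketch of this kind of sample size estimate is given below, using the 12-month placebo-group values reported in the Results for patients with mild AD (ADAS-Cog decline 0.63, SD 4.32; NTB decline 0.21, SD 0.47). The source does not state which formula or software produced its estimates, so the function name, the two-sided test, the equal-variance assumption, and the `two_sample` switch are all assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(placebo_decline, sd, reduction=0.5, alpha=0.05, power=0.80,
                two_sample=True):
    """Approximate n to detect a `reduction` fraction of the placebo decline.

    Normal approximation: n = c * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2,
    with c = 2 for a comparison of two independent groups and c = 1 otherwise.
    """
    delta = reduction * placebo_decline
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = (z_sum * sd / delta) ** 2
    return ceil(2 * n if two_sample else n)

adas_two = n_per_group(0.63, 4.32)
ntb_two = n_per_group(0.21, 0.47)
adas_one = n_per_group(0.63, 4.32, two_sample=False)
ntb_one = n_per_group(0.21, 0.47, two_sample=False)
ratio = adas_two / ntb_two
```

Required n scales with (SD/δ)², so under either form the ADAS-Cog needs roughly 9 times as many patients as the NTB. The per-group figures reported in this section (1478 and 159) sit close to the single-factor form of this approximation rather than the two-group form, which suggests that the authors' exact method differs from this sketch. The ratio between instruments, which is the point of the comparison, is unaffected.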
To detect a change from baseline in the treatment group that is 50% less than that in the placebo group, with 80% power, a significance level of .05, and assuming equal sample sizes in the treatment and placebo groups, 1478 patients per group would be needed using the ADAS-Cog (mean placebo group decline, 0.63; SD, 4.32). In contrast, only 159 patients per group would be necessary to detect a change from baseline in the treatment group that is 50% less than that in the placebo group for the NTB (mean placebo group decline, 0.21; SD, 0.47). While the difference in sample sizes required for 80% power using the ADAS-Cog and NTB is accentuated by the particularly low rate of decline on the ADAS-Cog in the placebo group of the AN1792(QS-21) trial, the insensitivity of the ADAS-Cog is also apparent from other recent studies 7,8 that found relatively small declines on this measure in patients with mild AD. This study has several limitations. To our knowledge, this is the first study to formally evaluate the NTB as a composite measure for AD clinical trials. Performance on the NTB from other AD patient samples would provide additional support for the NTB as an alternative cognitive tool in AD clinical trials. The NTB could also be useful in evaluating patients with mild cognitive impairment, so future studies of the NTB might consider including these patients. Another limitation of this trial was that it was not designed to assess the test-retest reliability of the NTB. While the 6- and 12-month NTB scores allowed us to estimate test-retest reliability, examination of test-retest reliability over shorter periods would also be of interest to evaluate in future studies. In summary, the NTB has exhibited excellent psychometric properties, especially in the areas of concurrent validity and test-retest reliability.
In contrast to the ADAS-Cog, the NTB z score showed sensitivity to change across the range of MMSE scores in this trial and was the only measure of cognitive efficacy to yield a statistically significant effect in favor of the AN1792(QS-21) immunotherapeutic treatment. The NTB may represent a useful cognitive measure for clinical trials to assess cognitive change in patients with mild to moderate AD or potentially mild cognitive impairment and may be more effective than traditional measures in detecting cognitive changes in the clinical assessment of a therapy designed to reverse or halt disease pathological features.

Accepted for Publication: January 24, 2007.

Correspondence: John Harrison, C. Psychol, PhD; CPC Pharma Services, PO Box 3223, Warminster, Wiltshire BA12 8XA, England (johncpc@btinternet.com).

Author Contributions: Dr Harrison had full access to all of the data and takes responsibility for the integrity of
the data and the accuracy of the data analysis. Study concept and design: Harrison, Minassian, Jenkins, Black, Koller, and Grundman. Acquisition of data: Black. Analysis and interpretation of data: Harrison, Minassian, Jenkins, Black, and Grundman. Drafting of the manuscript: Harrison, Minassian, Jenkins, and Grundman. Critical revision of the manuscript for important intellectual content: Harrison, Minassian, Jenkins, Black, Koller, and Grundman. Statistical analysis: Harrison, Minassian, Jenkins, and Grundman. Obtained funding: Grundman. Administrative, technical, and material support: Harrison, Black, and Koller. Study supervision: Black, Koller, and Grundman.

Financial Disclosure: Dr Minassian is a salaried employee of and has been awarded stock options from Elan Pharmaceuticals; Dr Black is a full-time employee of and holds stock in Wyeth Research; and Drs Koller and Grundman are salaried employees of and have stock options and ownership in Elan Pharmaceuticals.

Funding/Support: The AN1792 clinical study was supported by Elan Pharmaceuticals and Wyeth Research; however, no support was provided for the analyses described herein.

Role of the Sponsor: Researchers from Elan Pharmaceuticals and Wyeth Research participated in the design, conduct, and analysis of the research described herein.

Disclaimer: The intent of the present study is not to provide the clinical results of the AN1792 trial (previously published) but to describe the properties of instruments used in that trial.

REFERENCES

1. Wesnes K, Harrison JE. The evaluation of cognitive function in the dementias: methodological and regulatory considerations. Dialogues Clin Neurosci. 2003;5(1):77-88.
2. Ferris SH, Lucca U, Mohs R, et al. Objective psychometric tests in clinical trials of dementia drugs: position paper from the International Working Group on Harmonization of Dementia Drug Guidelines. Alzheimer Dis Assoc Disord. 1997;11(suppl 3):34-38.
3. Veroff AE, Bodick NC, Offen WW, Sramek JJ, Cutler NR. Efficacy of xanomeline in Alzheimer disease: cognitive improvement measured using the Computerized Neuropsychological Test Battery (CNTB). Alzheimer Dis Assoc Disord. 1998;12(4):304-312.
4. Mohs RC, Knopman D, Petersen RC, et al. Development of cognitive instruments for use in clinical trials of antidementia drugs: additions to the Alzheimer's Disease Assessment Scale that broaden its scope: the Alzheimer's Disease Cooperative Study. Alzheimer Dis Assoc Disord. 1997;11(suppl 2):S13-S21.
5. Mohs R. ADAS-Cog: what, why and how? Alzheimer's Insights. 1997;3(1):1-2.
6. Wilcock GK, Lilienfeld S, Gaens E; Galantamine International-1 Study Group. Efficacy and safety of galantamine in patients with mild to moderate Alzheimer's disease: multicentre randomised controlled trial. BMJ. 2000;321(7274):1445-1449.
7. Seltzer B, Zolnouni P, Nunez M, et al; Donepezil 402 Study Group. Efficacy of donepezil in early-stage Alzheimer disease: a randomized placebo-controlled trial. Arch Neurol. 2004;61(12):1852-1856.
8. Kurz A, Farlow M, Quarg P, Spiegel R. Disease stage in Alzheimer disease and treatment effects of rivastigmine. Alzheimer Dis Assoc Disord. 2004;18(3):123-128.
9. Petersen RC, Thomas RG, Grundman M, et al; Alzheimer's Disease Cooperative Study Group. Vitamin E and donepezil for the treatment of mild cognitive impairment. N Engl J Med. 2005;352(23):2379-2388.
10. Gilman S, Koller M, Black RS, et al; AN1792(QS-21)-201 Study Team. Clinical effects of Aβ immunization (AN1792) in patients with AD in an interrupted trial. Neurology. 2005;64(9):1553-1562.
11. Welsh K, Butters N, Hughes J, Mohs R, Heyman A. Detection of abnormal memory decline in mild cases of Alzheimer's disease using CERAD neuropsychological measures. Arch Neurol. 1991;48(3):278-281.
12. Swanberg MM, Tractenberg RE, Mohs R, Thal LJ, Cummings JL. Executive dysfunction in Alzheimer disease. Arch Neurol. 2004;61(4):556-560.
13. Lezak MD. Neuropsychological Assessment. 3rd ed. New York, NY: Oxford University Press Inc; 1995.
14. Robbins TW, Semple J, Kumar R, et al. Effects of scopolamine on delayed-matching-to-sample and paired associates tests of visual memory and learning in human subjects: comparison with diazepam and implications for dementia. Psychopharmacology (Berl). 1997;134(1):95-106.
15. Braak H, Griffing K, Braak E. Neuroanatomy of Alzheimer's disease. Alzheimer Res. 1997;3:235-247.
16. Fowler KS, Saling MM, Conway EL, Semple JM, Louis WJ. Computerized neuropsychological tests in the early detection of dementia: prospective findings. J Int Neuropsychol Soc. 1997;3(2):139-146.
17. McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer's Disease. Neurology. 1984;34(7):939-944.
18. Wechsler D. Wechsler Memory Scale-Revised Manual. San Antonio, TX: Psychological Corp; 1987.
19. Mitrushina M, Boone KB, Razani J, D'Elia LF. Handbook of Normative Data for Neuropsychological Assessment. New York, NY: Oxford University Press Inc; 1999.
20. Graham NL, Emery T, Hodges JR. Distinctive cognitive profiles in Alzheimer's disease and subcortical vascular dementia. J Neurol Neurosurg Psychiatry. 2004;75(1):61-71.
21. Cummings J, Mega M. Neuropsychiatry and Behavioural Neuroscience. 2nd ed. New York, NY: Oxford University Press Inc; 2003.
22. Kline P. A Psychometrics Primer. London, England: Free Association Books; 2000.