Rheumatology 1999;38:870-877

Generic and condition-specific outcome measures for people with osteoarthritis of the knee

J. E. Brazier, R. Harper, J. Munro, S. J. Walters and M. L. Snaith¹

School for Health and Related Research and ¹Institute for Bone and Joint Medicine, Medical School, University of Sheffield, Sheffield, UK

Abstract

Objectives. The aims of this study were to evaluate two condition-specific and two generic health status questionnaires for measuring health-related quality of life in patients with osteoarthritis (OA) of the knee, and to offer guidance to clinicians and researchers in choosing between them.

Methods. Patients were recruited from two settings: 118 from knee surgery waiting lists and 112 from rheumatology clinics. Four self-completion questionnaires [Western Ontario and McMaster University Osteoarthritis Index (WOMAC), Health Assessment Questionnaire (HAQ), Short Form-36 (SF-36) and Euroqol] were sent to subjects on two occasions 6 months apart. Construct validity, convergent validity, internal consistency and responsiveness were examined using primarily non-parametric methods.

Results. All instruments proved satisfactory in terms of ease of use, acceptability to patients, internal consistency and reliability. In the surgical group, the OA-specific WOMAC performed better than the HAQ and the generic measures in terms of validity and responsiveness to change, whereas in the rheumatology group the SF-36 was more responsive.

Conclusion. WOMAC is the instrument of choice for evaluating the outcome of knee replacement surgery in OA. The SF-36 provides a more general insight into patients' health and may be more responsive to change than the WOMAC in a heterogeneous rheumatology clinic population. Researchers wishing to undertake an economic evaluation might consider the EQ-5D for a surgical, but not a rheumatology clinic, group.

KEY WORDS: Osteoarthritis, Knee, Health-related quality of life, Outcomes.

Submitted 12 March 1998; revised version accepted 1 April 1999.
Correspondence to: J. Brazier, School for Health and Related Research, University of Sheffield, Regent Court, 30 Regent Street, Sheffield S1 4DA, UK.
© 1999 British Society for Rheumatology

Osteoarthritis (OA) is the single most important cause of disability and limitation of activity of elderly people in the UK [1]. As a method of treatment, joint replacement is now commonplace for hips and increasingly so for knees, while the pharmacological management of people with OA continues to be important. In this context, it is necessary to identify valid and acceptable outcome measures so that progress in treating OA can be evaluated. Such measures should benefit not only clinicians managing OA and purchasers of health care for this condition, but also, ultimately, patients through improved forms of treatment.

It is increasingly recognized that a key outcome measure for any health care intervention for OA, as for many other conditions, is change in health-related quality of life (HRQoL) [2, 3]. In the field of OA of the knee, clinicians and researchers are faced with a choice of measures. One established measure for arthritis is the UK version of the Health Assessment Questionnaire (HAQ), but most experience with this instrument has been with rheumatoid arthritis [4]. A more recent development has been a measure specific to OA: the Western Ontario and McMaster University Osteoarthritis Index (WOMAC) [5]. This has shown considerable promise, but there has been little published comparative evidence on its validity from an independent group of researchers.

There has also been increasing use of generic (i.e. not disease-specific) measures of health-related quality of life. These have the potential advantage of being better able to measure side-effects or complications of treatment, which may be unrelated to the condition itself. Many people with OA also have co-morbidities, so to obtain a more holistic view of HRQoL in this patient group there is a case for using a generic measure. A generic measure which has been widely used for other conditions is the Short Form-36 (SF-36), which generates a profile of eight dimensions and for which there is some evidence of validity in OA patients [3, 6]. However, it is claimed that generic measures are less responsive to health changes than condition-specific measures. This claim needs to be tested.

In a resource-constrained environment, it is important to be able to examine the relative cost-effectiveness of interventions, and for this a single-index measure of HRQoL is required. The Euroqol instrument (EQ) is a
recently developed, brief and easy-to-use single-index measure [7], which has been used successfully in patients with rheumatoid arthritis, but its validity has yet to be examined in OA patients. Concern has been raised about its crudeness and whether it is sufficiently sensitive to many changes in health [8].

The objective of the study reported here was to assess these four instruments for measuring HRQoL in patients with OA of the knee, in terms of their ability to discriminate between patient groups (i.e. their discriminative properties) and their sensitivity to change both in patients undergoing surgery and in patients being treated medically (i.e. their evaluative properties). The aim is to help clinicians and researchers choose instruments when measuring the outcomes of surgery or pharmacological interventions.

Methods

Recruitment of patients

Patients were recruited from two distinct clinical settings in five UK hospitals. Over an 8-month period, all new patients attending rheumatology clinics, and those assessed pre-operatively for total knee replacement (TKR), were eligible for the study. From these groups, recruitment was restricted to patients with a diagnosis of OA of the knee made by a hospital rheumatology or orthopaedic specialist. No further inclusion or exclusion criteria were applied, so that the recruited patients were likely to be representative of those seen in everyday hospital clinical practice in the UK.

Patients were invited to participate in the study by a letter explaining the study and signed by their consulting physician or surgeon. Their names and addresses were obtained from the physicians' secretaries or from the pre-operative assessment clinic. A questionnaire booklet and pre-paid reply envelope were enclosed together with the letter, patient information sheet and consent form. For rheumatology clinic patients, a global assessment of the severity of their condition was supplied by their consulting physician, based upon their clinical impression.

Instruments

The questionnaire booklet contained the four self-completed HRQoL questionnaires (WOMAC, HAQ, SF-36 and EQ), a short section on socio-demographic information and recent use of health services, and a question asking about any other major health problems apart from their knee trouble (i.e. co-morbidities). Medical data were obtained from the medical records of rheumatology clinic patients when these were not available from physicians' secretaries.

The WOMAC is a 24-item questionnaire, taking around 5 min to complete, and originally designed for use in clinical trials in patients with OA of the knee or hip. The Likert-scaled version was used in this study. Scores are generated for the three dimensions of Pain, Stiffness and Physical Function by summing the coded responses and then dividing by the number of items to provide a score range of 0-4 [9]. The HAQ was developed for arthritic conditions in general, and the final version, modified for British patients [4, 10], contains 20 items covering eight categories of disability which combine to derive a single disability index ranging from 0 to 3 (Table 1). For both these instruments, a low score indicates good health.

The SF-36, revised for use in a British population [11], contains 36 items and generates a profile of eight dimension scores ranging from 0 to 100, where high scores indicate good health [12]. The EQ is a brief two-page questionnaire, the first page containing five items describing health status across five dimensions (mobility, self-care, usual activity, pain/distress and depression/anxiety) (the EQ-5D), and the second displaying a visual analogue rating scale on which respondents mark an assessment of their overall health [7]. The responses to the five items of the EQ-5D can be scored using a utility-weighted algorithm [13], which has been recommended for use in economic evaluation. The EQ therefore provides two single-index measures of health, the Rating scale and the EQ-5D index, ranging from 0 to 100.

TABLE 1. Dimension scores for knee replacement patients at initial assessment

                                      Mean      S.D.   % floor % ceiling
WOMAC (n = 109-118)
  Pain                                 2.2       0.7       2.5       0.0
  Stiffness                            2.4       0.8       6.8       0.9
  Physical function                    2.3       0.7       1.8       0.0
HAQ (n = 109-118)
  Dressing and grooming                1.8       0.7       6.2      11.5
  Rising                               2.0       0.6      14.9       3.5
  Eating                               1.8       0.7       6.1      12.3
  Walking                              1.9       0.5       7.0       2.6
  Hygiene                              1.7       0.9      13.7      16.2
  Reach                                1.9       0.8      19.8       8.6
  Grip                                 1.6       0.9       4.3      22.2
  Activities                           2.1       0.8      28.8       4.2
  Disability index                     1.9       0.5       0.0       0.0
SF-36 (n = 103-113)
  Physical functioning                21.0      18.8      16.5       0.0
  Social functioning                  51.2      28.5       7.3       9.1
  Role limitations (physical)         11.8      25.5      76.9       2.8
  Role limitations (emotional)        42.0      44.3      46.3      31.5
  Pain                                35.2      23.1       9.7       3.5
  Mental health                       67.6      18.4       0.9       2.8
  Vitality                            40.9      19.6       2.8       0.0
  General health perception           56.4      23.9       1.9       3.8
EQ (n = 107-114)
  EQ-5D Index                         44.7      17.6       0.0       0.0
  Rating scale                        61.5      21.9       0.0       0.9

Scores for the SF-36 and EQ range from 0 to 100, with high scores indicating good health. The range for WOMAC is 0-4. For the HAQ, dimensions range from 0 to 3. For both these instruments, low scores indicate good health.

The recommended methods of substitution for missing responses for the WOMAC and SF-36 were carried out, but not for the HAQ and EQ, for which there are no methods of substitution.

An initial assessment using these four instruments was undertaken at recruitment, with a follow-up assessment ~6 months later. At follow-up, item 2 of the
SF-36 questionnaire was adapted as follows to measure patient-perceived health change: "Compared to the last time you completed the questionnaire, how would you rate your health in general now?". The responses available were: much better, somewhat better, about the same, somewhat worse, much worse.

Analysis

The primary purpose of the analysis was to assess the discriminative and evaluative properties of the four measures of HRQoL used, i.e. their ability to discriminate between patient groups and their sensitivity to change, respectively.

The discriminative properties were examined in terms of their construct validity, where the distribution of scores is compared between groups with expected health differences. For the rheumatology clinic group, this was undertaken by estimating score differences between those classified by their physician as having mild or moderate disease, on the one hand, and those with severe disease, on the other. Further, for both rheumatology and surgical groups, score differences were examined between those who reported and those who did not report a non-musculoskeletal co-morbidity. The significance of any difference was tested with the Mann-Whitney U-test (a non-parametric equivalent of the t-test), and the importance of each difference was assessed by calculating an effect size, which is the mean difference between the groups divided by the pooled standard deviation. This can be regarded as an indication of the ability of a measure to distinguish the signal from the overall noise or variance, and it provides the basis for comparing measures with differing scales. Effect sizes were judged against criteria recommended by Cohen [14]: values of 0.2 to <0.5, 0.5 to <0.8 and ≥0.8 indicate small, moderate and large effect sizes, respectively.

Validity was also examined in terms of the convergence between like dimensions of the WOMAC, HAQ and SF-36 questionnaires. The opportunity was also taken to examine the internal consistency of the measures by calculating Cronbach's α coefficients for the two condition-specific questionnaires and the SF-36. According to Streiner and Norman [15], a value of 0.8 is usually regarded as acceptable. This statistic is not relevant for the EQ, which has only one item per dimension.

The evaluative properties were examined in terms of sensitivity to change, or responsiveness. In part, the ability to respond to change can be assessed in terms of the proportion of patients at the floor (i.e. the worst score) or the ceiling (the best score) of each scale [16]. If many patients score at either extreme of a scale, the instrument will have limited ability to register deterioration or improvement, respectively. A more complete method is to examine the change in scores in patients who have experienced a change in health status.

For the knee replacement group, responsiveness was assessed in terms of the changes in scores before and after their arthroplasty, since this procedure has been found to bring about health improvement in most patients [2]. For the rheumatology clinic group, there was no external indicator of change, and hence responsiveness was assessed by comparing mean changes in scores across three distinct groups of patients: those who rated their health as having improved, worsened or stayed the same between the first and second surveys (i.e. their response to the self-perceived transition question of the SF-36 questionnaire). (Item 2 is not used in the scoring of the SF-36.) This global change item has been used to assess responsiveness for a number of conditions, including rheumatoid arthritis [17, 18].

The statistical significance of the changes in scores between different groups was assessed using the Kruskal-Wallis test in the rheumatology clinic group and by the Mann-Whitney U-test in the knee replacement group. For both groups, the four measures were also compared in terms of the standardized response mean (SRM), which is the mean change between assessments divided by the standard deviation of the change, and can be thought of as an indicator of the ability to distinguish signal from noise [19]. Cohen's criteria for effect size were also applied to this statistic [14].

Results

Response

Knee replacement sample. Questionnaire booklets were mailed to 151 patients whose names were on surgical waiting lists to undergo TKR surgery in the near future. Contact was not made with two patients, one of whom was admitted for surgery earlier than expected, and another who could not be traced. One hundred and eighteen patients (effective response rate 79%) consented to participate by returning questionnaires at the initial assessment, with no adverse comments received. Reminder letters to non-respondents were not sent because of the short interval between theatre lists being drawn up and the date for surgery. Of the responders to the initial assessment, 109 returned questionnaires at the follow-up assessment.

The mean age of respondents was 71 yr (range 47-87 yr). More than half the sample were female. Non-respondents were of similar age to respondents, and were more likely to be female.

Rheumatology clinic sample. Questionnaire booklets were sent to 125 patients attending rheumatology out-patient clinics with a primary diagnosis of OA. Contact was not made with one patient who had changed address. After one reminder, 112 patients (effective response rate 90%) consented to participate by returning questionnaires, with no adverse comments. Of these, 102 returned questionnaires at the follow-up assessment. The mean age of rheumatology clinic respondents was 64 yr, considerably younger than the sample of patients undergoing TKR, with more than twice as many women as men. Fifty-four per cent of patients were classified by their physicians as having mild disease and 41% as having severe disease. Only six (5%) patients were classified as moderate and, on the basis of their scores, these were combined in subsequent analyses with those classified as mild. Non-respondents were on average
younger (57 yr), more likely to be female and more likely to have mild disease.

Patients recruited from both sources were broadly typical of patients with OA of the knee seen in secondary care settings in the UK.

Completion

For the WOMAC, HAQ and EQ questionnaires, item completion rates exceeded 90%. The majority of SF-36 dimensions achieved completion rates of >90% for each dimension. The exception was the Physical functioning dimension for the knee replacement sample (86% completion). It was found that 11 knee replacement and eight rheumatology clinic respondents failed to separate the pages of the booklet, thus omitting items unintentionally.

Cross-sectional analysis: discriminative properties of instruments

The dimension scores at the initial assessment for the knee replacement sample are shown in Table 1 and for the rheumatology clinic sample in Table 2. [Two-week retest reliability was assessed (for the WOMAC only) by examining score differences for patients who said that their health had not changed (n = 30). For all three dimensions, there were no statistically significant differences between test and retest scores [20]. The reliability properties of the other instruments are already well established.]

TABLE 2. Dimension scores for rheumatology clinic patients at initial assessment

                                      Mean      S.D.   % floor % ceiling
WOMAC (n = 105-109)
  Pain                                 2.0       0.9       1.8       1.8
  Stiffness                            2.3       0.9       7.3       1.8
  Physical function                    2.0       0.8       0.0       0.0
HAQ (n = 108-110)
  Dressing and grooming                1.5       0.8       4.6      17.4
  Rising                               1.6       0.8       6.4      10.9
  Eating                               1.5       0.9       4.5      23.6
  Walking                              1.6       0.8       6.4      12.7
  Hygiene                              1.8       0.9      17.3      13.6
  Reach                                1.9       0.9      19.1      12.7
  Grip                                 1.6       0.9       4.5      20.9
  Activities                           1.9       0.8      20.9       8.2
  Disability index                     1.7       0.7       0.0       5.6
SF-36 (n = 103-110)
  Physical functioning                27.9      22.7      11.3       0.9
  Social functioning                  53.1      30.4       3.7      11.9
  Role limitations (physical)         12.4      23.8      73.3       1.9
  Role limitations (emotional)          41      43.9      47.1      29.8
  Pain                                32.7      19.8       8.2       0.0
  Mental health                       63.2      20.6       0.0       2.8
  Vitality                            37.7      19.7       2.8       0.9
  General health perception           45.4      23.5       1.0       1.0
EQ (n = 107-108)
  EQ-5D Index                         46.0      16.8       0.0       0.0
  Rating scale                        58.0      20.0       0.9       0.9

Construct validity. Mean score differences between patients with a clinical assessment of mild/moderate vs severe disease in the rheumatology clinic group were highly significant, at the 1% level, for all three dimensions of the WOMAC questionnaire and for the HAQ (Table 3). Seven dimensions of the SF-36 also discriminated significantly between the patient groups in terms of severity. Results for the EQ were mixed: the difference for the EQ-5D index, but not for the Rating scale, reached significance. These results were reflected in the effect sizes, which were large for the Pain dimension of WOMAC and moderate for most of the other significant dimensions. Overall, the larger effect sizes were associated with WOMAC.

TABLE 3. Score differences between rheumatology clinic patients with mild/moderate and severe knee osteoarthritis

                                 Mean difference Pooled S.D. Effect size    P(a)
WOMAC (n = 105-110)
  Pain                                       0.8         0.9        0.95    0.00
  Stiffness                                  0.7         0.9        0.78    0.00
  Physical function                          0.6         0.8        0.76    0.00
HAQ (n = 108)
  Disability index                           0.4         0.7        0.63    0.00
SF-36 (n = 103-110)
  Physical functioning                      17.0        22.7        0.75    0.00
  Social functioning                        18.3        30.4        0.60    0.00
  Role limitations (physical)               15.3        23.8        0.64    0.00
  Role limitations (emotional)              26.1        43.9        0.60    0.00
  Pain                                      12.8        19.8        0.65    0.00
  Mental health                              4.7        20.6        0.23    0.28
  Vitality                                   9.3        19.7        0.47    0.01
  General health perception                  8.2        23.5        0.35    0.04
EQ (n = 106-107)
  EQ-5D Index                               13.2        16.8        0.79    0.00
  Rating scale                               5.3        20.0        0.27    0.12

Effect size = (mean score of mild/moderate group) - (mean score of severe group), divided by the overall pooled S.D. Effect sizes were judged against criteria recommended by Cohen [14]: 0.2 to <0.5, 0.5 to <0.8 and ≥0.8 indicating small, moderate and large effect sizes, respectively.
(a) P value from Mann-Whitney U-test comparing mean scores in each group.

Six dimensions of the SF-36, the EQ Rating scale and the condition-specific HAQ produced significant differences between rheumatology clinic patients with co-morbidity and those without, at the 5% level
(Table 4). However, neither the three WOMAC dimensions nor the EQ-5D produced significant differences. The largest effect size was observed for the General health perception dimension of the SF-36 (0.78), with moderate effect sizes being observed for three other SF-36 dimensions and the HAQ (Table 4).

For those patients soon to undergo TKR surgery, all instruments discriminated to some extent between patients with and without co-morbidity. However, the relative performance of the instruments was different. The clearest picture emerged for the EQ-5D and EQ Rating scale. For the SF-36, differences in Mental health, Pain and General health perception reached statistical significance. The HAQ Disability index and two dimensions of WOMAC were significantly different. Moderate effect sizes were found for two WOMAC dimensions, three SF-36 dimensions and both EQ indices, but not for the HAQ (Table 4).

TABLE 4. Effect sizes for patients in relation to co-morbidity

                                     Rheumatology clinic        Knee replacement
                                                patients                patients
                                 Effect size        P(a) Effect size        P(a)
WOMAC (n = 94-109)
  Pain                                  0.12        0.84        0.60        0.01
  Stiffness                             0.09        0.50        0.53        0.01
  Physical function                     0.30        0.43        0.42        0.10
HAQ (n = 84)
  Disability index                      0.58        0.02        0.37        0.02
SF-36 (n = 92-104)
  Physical functioning                  0.49        0.05        0.34        0.08
  Social functioning                    0.56        0.01        0.23        0.44
  Role limitations (physical)           0.61        0.01        0.16        0.25
  Role limitations (emotional)          0.04        0.70        0.27        0.17
  Pain                                  0.39        0.07        0.56        0.02
  Mental health                         0.49        0.03        0.51        0.05
  Vitality                              0.62        0.01        0.28        0.17
  General health perception             0.78        0.01        0.60        0.01
EQ (n = 94-105)
  EQ-5D Index                           0.32        0.14        0.62        0.01
  Rating scale                          0.70        0.00        0.64        0.00

Effect size = (mean score of co-morbidity group) - (mean score of no co-morbidity group), divided by the overall pooled S.D.
(a) P value from Mann-Whitney U-test comparing mean scores in each group.

Convergent validity. An inspection of the correlations between like dimensions found the expected convergence between dimension scores across instruments. Spearman's rank correlation coefficient between the WOMAC Physical function dimension and the HAQ Disability index was 0.68. For the generic SF-36 and WOMAC, the correlation between the physical functioning dimensions was 0.70, as was that between the pain dimensions. These correlations exceeded those between WOMAC Physical function and the WOMAC dimensions of Pain and Stiffness (0.65 and 0.63, respectively). As expected, correlations of Mental health and Vitality (SF-36) with WOMAC dimensions were low.

Internal consistency. Cronbach's α coefficients were acceptable for all three dimensions of the WOMAC, according to the standards recommended by Streiner and Norman [15]. For the HAQ, four of the eight categories did not meet these standards. The α coefficients of the SF-36 were also below these standards, but in only one instance was the α coefficient <0.7 (role limitations due to physical problems).

TABLE 5. Mean score differences between initial assessment and follow-up in relation to patient-perceived health change for rheumatology clinic patients

                                        Better (n = 6-8)        Same (n = 43-51)       Worse (n = 32-38)
                                Mean diff(a)        S.D.   Mean diff        S.D.   Mean diff        S.D.      P(b)
WOMAC
  Pain                                   0.5         0.8         0.1         0.5         0.1         0.6      0.05
  Stiffness                              0.6         0.7         0.0         0.7         0.2         0.7      0.04
  Physical function                      0.6         0.8         0.1         0.6         0.1         0.5      0.02
HAQ
  Disability index                       0.5         0.6         0.0         0.4         0.1         0.3      0.11
SF-36
  Physical functioning                  29.2        22.2         2.2        19.2         1.1        10.5      0.01
  Social functioning                     1.4        11.0         0.4        24.6         5.4        22.9      0.56
  Role limitations (physical)           50.0        29.9         8.9        28.8         0.0        14.0      0.00
  Role limitations (emotional)           0.0         0.0         2.1        34.3         5.9        45.3      0.96
  Pain                                  25.0        18.5         7.3        19.8         7.4        18.2      0.00
  Mental health                          1.1        10.5         3.1        13.9         4.0        10.6      0.03
  Vitality                               3.1        14.1         4.5        16.5         6.8        14.1      0.01
  General health perception              9.6         3.6         0.6        13.7         3.5        10.1      0.01
EQ
  EQ-5D Index                            9.7         7.4         1.4        12.3         1.5        15.4      0.03
  Rating scale                          10.3        16.3         4.5        12.0         6.0        15.9      0.01

(a) Mean difference = (mean score at follow-up) - (mean score at initial assessment), where a mean difference of >0 for the SF-36 and EQ, and of <0 for the WOMAC and HAQ, indicates a health improvement.
(b) P value from a Kruskal-Wallis test comparing mean score differences by patient-perceived health change group.
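The effect size and SRM defined in the Analysis section, and the Cronbach's α used for internal consistency, each take only a few lines to compute. The Python sketch below is illustrative only: the function names are hypothetical, the scores are made up rather than taken from the study, and it assumes that the "overall pooled S.D." is the sample S.D. of both groups combined (the Pooled S.D. column of Table 3 matches the whole-sample S.D. values in Table 2), with sample (n - 1) variances throughout. Cohen's thresholds are applied to the absolute value, because the sign of each statistic depends on whether high or low scores indicate good health on a given instrument.

```python
from statistics import mean, stdev, variance


def effect_size(group_a, group_b):
    """Mean difference (A minus B) divided by the S.D. of all scores combined.

    Assumption: the paper's 'overall pooled S.D.' is read here as the sample
    S.D. of both groups taken together.
    """
    combined = list(group_a) + list(group_b)
    return (mean(group_a) - mean(group_b)) / stdev(combined)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean within-patient change divided by the S.D. of those changes."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    each list ordered by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def cohen_label(value):
    """Classify |value| with the Cohen thresholds quoted in the paper [14]."""
    size = abs(value)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Hypothetical WOMAC-style scores (0-4, low = good health); not study data.
    mild = [1.0, 1.5, 2.0, 1.5]
    severe = [2.5, 3.0, 2.0, 3.5]
    es = effect_size(mild, severe)
    print(f"effect size {es:.2f} ({cohen_label(es)})")

    before = [2.0, 2.5, 3.0, 2.0]
    after = [1.0, 2.0, 1.5, 1.5]
    srm = standardized_response_mean(before, after)
    print(f"SRM {srm:.2f} ({cohen_label(srm)})")

    print(f"alpha {cronbach_alpha([[1, 2, 3], [1, 2, 3]]):.2f}")
```

With these made-up WOMAC-style numbers both statistics come out negative, which on that scale means the first group, or the follow-up assessment, is the healthier one. To mimic the rheumatology column of Table 7, the changes of patients reporting worse health would be multiplied by minus one before the SRM is computed.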
Longitudinal analysis: evaluative properties of the instruments

Score distributions. Floor effects of >10% of responses were observed for the Physical functioning and Role limitations dimensions of the SF-36 in both samples (Tables 1 and 2). For the HAQ, over 10% of responses for Rising, Hygiene, Reach and Activities in the knee replacement sample were on the floor, as were those for Hygiene, Reach and Activities in the rheumatology clinic sample. Two dimensions of the SF-36, Social functioning and Role limitations due to emotional problems, showed ceiling effects in both patient groups. The HAQ showed ceiling effects in four dimensions in the knee replacement sample and in seven in the rheumatology clinic sample. Neither the WOMAC nor the EQ indices demonstrated substantial floor or ceiling effects.

Perceived health change. (i) Rheumatology clinic sample. For the rheumatology clinic sample, all dimension scores were found to be associated to some extent with the perceived direction of change (Table 5). The pattern was significant for six dimensions of the SF-36, both EQ indices and all dimensions of the WOMAC, but not for the HAQ, at the 5% level using the Kruskal-Wallis test. The condition-specific measures did not perform noticeably better than either of the generic measures in terms of the standardized response mean (Table 7). Only the SRM for the Pain dimension of the SF-36 was moderate in size, while all other SRMs for all four instruments were either small or below the threshold for a small effect.

(ii) Knee replacement sample. The mean changes found at post-operative follow-up were statistically significant for all dimensions of the WOMAC and for the HAQ Disability index. For the SF-36, three dimensions (Physical functioning, Pain and Vitality) were significantly different, as was the EQ-5D, but not the EQ Rating scale (Table 6). These results were reflected in the SRMs: high values were observed for the WOMAC Physical function and Pain dimensions, compared with a small SRM for the HAQ. SRMs were moderate for the Physical functioning and Pain dimensions of the SF-36 and for the EQ-5D (Table 7).

TABLE 6. Mean score differences between initial assessment and follow-up: knee replacement patients

                                  Mean difference(a)  S.D. of difference    P(b)
WOMAC (n = 93-106)
  Pain                                           0.8                 0.9    0.01
  Stiffness                                      0.7                 1.0    0.01
  Physical function                              0.8                 0.8    0.01
HAQ (n = 94-108)
  Disability index                               0.2                 0.6    0.00
SF-36 (n = 84-103)
  Physical functioning                          13.5                19.0    0.00
  Social functioning                             5.6                29.4    0.06
  Role limitations (physical)                    6.5                34.9    0.08
  Role limitations (emotional)                   3.9                47.1    0.42
  Pain                                          16.6                26.2    0.00
  Mental health                                  0.1                15.5    0.96
  Vitality                                       4.8                16.4    0.01
  General health perception                      0.9                13.7    0.56
EQ (n = 94-97)
  EQ-5D Index                                    9.4                16.6    0.00
  Rating scale                                   0.1                15.9    0.93

(a) Mean difference = (recorded mean score at follow-up) - (recorded mean score at initial assessment), where a mean difference of >0 for the SF-36 and EQ, and of <0 for the WOMAC and HAQ, indicates a health improvement.
(b) P value from paired t-test comparing mean score differences within individual patients.

TABLE 7. Responsiveness of instruments indicated by SRM

                                  Rheumatology clinic(a)        Knee replacement
                                                patients                patients
                                                  SRM(b)                     SRM
WOMAC                                          n = 41-46              n = 93-106
  Pain                                              0.27                    1.03
  Stiffness                                         0.34                    0.63
  Physical function                                 0.39                    1.06
HAQ                                            n = 43-46              n = 94-108
  Disability index                                  0.33                    0.33
SF-36                                          n = 41-45              n = 84-103
  Physical functioning                              0.34                    0.71
  Social functioning                                0.22                    0.19
  Role limitations (physical)                       0.36                    0.19
  Role limitations (emotional)                      0.12                    0.08
  Pain                                              0.55                    0.63
  Mental health                                     0.33                    0.01
  Vitality                                          0.43                    0.29
  General health perception                         0.47                    0.06
EQ                                             n = 41-42               n = 94-97
  EQ-5D Index                                       0.20                    0.56
  Rating scale                                      0.42                    0.01

(a) This table includes only those rheumatology patients who reported a change in health between assessments; the scores of those reporting worse health are multiplied by minus one.
(b) SRM is the mean change in score from initial assessment to follow-up, divided by the S.D. of the change in scores. These were judged against criteria recommended by Cohen [14]: 0.2 to <0.5, 0.5 to <0.8 and ≥0.8 indicating small, moderate and large effect sizes, respectively.

Discussion

The high response rates achieved, and the absence of any adverse comments from respondents, suggest that all instruments may be acceptable to this clinical population. In addition, in both samples, completion rates were very satisfactory, which is encouraging in an elderly group of patients. These results confirm previous studies using these instruments [3, 5, 9, 21]. WOMAC, SF-36 and HAQ scores were similar, though not identical, to those found in other OA cohorts [9, 21].

Differences in performance between these measures were found in the comparisons of validity and responsiveness. This is the first time that these outcome measures have been evaluated together in terms of their discriminatory and evaluative properties for these two
groups of patients with OA of the knee, and there is no reason to suppose that the same instrument should perform well in both groups. It is commonly assumed that the condition-specific measure should be the more responsive, and this hypothesis is supported by our results in the knee replacement group, who received a major intervention. The OA-specific WOMAC Physical function scale was more responsive than the more general HAQ Disability index, and both this dimension and Pain were more responsive than the equivalent dimensions of the SF-36. These results confirm previous studies comparing WOMAC with the HAQ [9] and the SF-36 [3, 21]. In the present study, WOMAC emerges as the instrument of choice for assessing the consequences of surgery for OA of the knee. However, the results also support the use of a generic instrument in this group of patients, since the SF-36 was better at distinguishing those reporting a non-musculoskeletal co-morbidity from those who did not.

The advantages of the OA-specific WOMAC were less clear for the rheumatology clinic patients, in whom the HAQ and the equivalent dimensions of the SF-36 (Physical functioning and Pain) were just as able to distinguish between severity groups. Furthermore, many of their dimension scores were able to discriminate between patients with and without non-musculoskeletal co-morbidity, whereas the WOMAC was not. Most importantly, some dimensions of the SF-36 (Pain, Vitality and General health) were more responsive than the WOMAC for these patients. A possible reason for this result could be that the rheumatology clinic patients were a less well-defined and less homogeneous group of patients, with more frequent health problems unrelated to OA of the knee. The changes being experienced by this group may have been more general in nature, so that the generic SF-36 was better at detecting them. An important feature of the rheumatology patients was that more of them reported a deterioration in their health than an improvement, and this was better reflected in the SF-36 than in the condition-specific measures. However, there are reasons to interpret this result with caution. The analysis of this group is limited by a small sample size, although it compares well with other studies. In addition, the result may apply only to the broad mix of patients typically attending NHS rheumatology clinics, rather than to medically managed OA patients in general.

For both patient groups, there is the question of which generic instrument is most appropriate for use [21]. This is the first time that the EQ has been evaluated in patients with OA. A study using the EQ in patients with rheumatoid arthritis found that it performed as well as the more specific HAQ [22]. In the present study, the EQ-5D was able to discriminate on the basis of severity for patients with OA of the knee attending a rheumatology clinic, and was comparable in terms of responsiveness to the best-performing dimensions of the SF-36 in the knee replacement group. However, the EQ-5D was noticeably less responsive to change in the rheumatology clinic group than many of the dimensions of the SF-36. This may reflect the fact that it is based on a cruder description of status in any given dimension, which makes it efficient for large changes, but less so for the more subtle and diverse changes experienced by the rheumatology clinic group. The advantage of the EQ-5D is its brevity (occupying a single page), but this is at the expense of lower sensitivity, and it does not give the broad picture available from a profile measure such as the SF-36. In summary, these results suggest that the EQ-5D may be suitable for economic evaluations of surgical interventions in this group, but for other purposes the SF-36 would be preferred.

The EQ Rating scale is even simpler, but its performance was inconsistent. It proved unable to distinguish between severity groups, and was remarkably unresponsive to the changes following TKR. It performed better in detecting non-musculoskeletal co-morbidity and change in the rheumatology clinic group, but dimensions of the SF-36 performed as well or better in all respects.

Conclusions

This investigation has confirmed that WOMAC is the instrument of choice for evaluating the outcome of TKR in patients with OA of the knee. For a more general insight into patients' health, and as a means of making comparisons across conditions, the SF-36 should also be used. For researchers wishing to undertake an economic evaluation, the EQ-5D might be considered for a surgical, but not a heterogeneous medically managed clinic, group. Our results suggest that the SF-36 is probably a better choice than WOMAC for detecting change in the less condition-specific morbidity found in this diverse patient population, though care should be taken in generalizing this last result to all medically managed OA populations.

Acknowledgements

We wish to express our gratitude to the consultant orthopaedic surgeons and rheumatologists and their staff, and to the patients who gave up their time to complete the questionnaires. The study was funded by the UK NHS Executive (Trent).

References

1. McAlindon TE, Cooper C, Kirwan JR, Dieppe PA. Knee pain and disability in the community. Br J Rheumatol 1992;31:189-92.
2. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care 1990;28:632-42.
3. Bombardier C, Melfi CA, Paul J, Green R, Hawker G, Wright J et al. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care 1995;33(suppl.):AS131-44.
4. Kirwan JR, Reeback JS. Stanford Health Assessment Questionnaire modified to assess disability in British patients with rheumatoid arthritis. Br J Rheumatol 1986;25:206-9.
5. Bellamy N, Watson Buchanan W, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes in antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15:1833-40.
6. Stucki G, Liang MH, Phillips C, Katz JN. The Short Form-36 is preferable to the SIP as a generic health status measure in patients undergoing elective total hip arthroplasty. Arthritis Care Res 1995;8:174-81.
7. The Euroqol Group. Euroqol: a facility for the measurement of health-related quality of life. Health Policy 1990;16:199-208.
8. McDowell I, Newell C. A guide to rating scales and questionnaires. Oxford: Oxford University Press, 1987.
9. Griffiths G, Bellamy N, Bailey WH, Bailey SI, McLaren AC, Campbell J. A comparative study of the relative efficiency of the WOMAC, AIMS and HAQ instruments ...
... Economics, York Health Economics Consortium, NHS Centre for Reviews and Dissemination. York: University of York, 1995.
14. Cohen J. Statistical power analysis for the behavioural sciences. New York: Academic Press, 1978.
15. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press, 1989.
16. Fortin PR, Stucki G, Katz JN. Measuring relevant change: an emerging challenge in rheumatological clinical trials. Arthritis Rheum 1995;38:1027-30.
17. Fitzpatrick R, Ziebland S, Jenkinson C, Mowat A, Mowat A. A generic health status instrument in the assessment of rheumatoid arthritis. Br J Rheumatol 1992;31:87-90.
18. Garratt AM, Ruta DA, Abdalla MI. The SF-36 health survey questionnaire: an outcome measure suitable for routine use within the NHS. Br Med J 1992;306:1440-4.
19. Katz JN, Larson MG, Phillips CB, Fossel AH, Liang MH. Comparative measurement sensitivity of short and longer health status instruments.
Med Care 1992; in evaluating the outcome of total knee arthroplasty. 30:917 25. Inflammopharmacology 1995;3:1 6. 20. Brazier J, Snaith M, Munro J. Measuring health outcome 10. Wilkin D, Hallam L, Doggett MA. Measures of need and in people with osteoarthritis of the knee. Report to NHS outcome for primary health care. Oxford: Oxford Executive ( Trent), UK, 1996. University Press, 1992. 21. Hawker G, Melfi C, Paul J, Green C, Bombardier C. 11. Brazier JE, Harper R, Jones NMB et al. Validating the Comparison of a generic (SF-36) and a disease-specific SF-36 health survey questionnaire: new outcome measure ( WOMAC) instrument in the measurement of outcomes for primary care. Br Med J 1992;305:160 4. after knee replacement surgery. J Rheumatol 1995; 12. Ware JE, Snow KK, Kolinski M, Gandeck B. SF-36 22:1193 6. health survey manual and interpretation guide. Boston: 22. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. The Health Institute, New England Medical Centre, 1993. Measuring health-related quality of life in rheumatoid 13. Williams A. The measurement and valuation of health: a arthritis: validity, responsiveness and reliability of Euroqol chronicle. Discussion Paper 136. Centre for Health ( EQ-5D). Br J Rheumatol 1997;36:551 9.