Generic and condition-specific outcome measures for people with osteoarthritis of the knee

Rheumatology 1999;38:870–877

J. E. Brazier, R. Harper, J. Munro, S. J. Walters and M. L. Snaith¹
School for Health and Related Research and ¹Institute for Bone and Joint Medicine, Medical School, University of Sheffield, Sheffield, UK

Submitted 12 March 1998; revised version accepted 1 April 1999.
Correspondence to: J. Brazier, School for Health and Related Research, University of Sheffield, Regent Court, 30 Regent Street, Sheffield S1 4DA, UK.
© 1999 British Society for Rheumatology

Abstract

Objectives. The aims of this study were to evaluate two condition-specific and two generic health status questionnaires for measuring health-related quality of life in patients with osteoarthritis (OA) of the knee, and to offer guidance to clinicians and researchers in choosing between them.

Methods. Patients were recruited from two settings: 118 from knee surgery waiting lists and 112 from rheumatology clinics. Four self-completion questionnaires [Western Ontario and McMaster University Osteoarthritis Index (WOMAC), Health Assessment Questionnaire (HAQ), Short Form-36 (SF-36) and Euroqol] were sent to subjects on two occasions 6 months apart. Construct validity, convergent validity, internal consistency and responsiveness were examined using primarily non-parametric methods.

Results. All instruments proved satisfactory in terms of ease of use, acceptability to patients, internal consistency and reliability. In the surgical group, the OA-specific WOMAC performed better than the HAQ and the generic measures in terms of validity and responsiveness to change, whereas in the rheumatology group the SF-36 was more responsive.

Conclusion. WOMAC is the instrument of choice for evaluating the outcome of knee replacement surgery in OA. The SF-36 provides a more general insight into patients' health and may be more responsive to change than the WOMAC in a heterogeneous rheumatology clinic population. Researchers wishing to undertake an economic evaluation might consider the EQ-5D for a surgical, but not a rheumatology clinic group.

KEY WORDS: Osteoarthritis, Knee, Health-related quality of life, Outcomes.

Osteoarthritis (OA) is the single most important cause of disability and limitation of activity of elderly people in the UK [1]. As a method of treatment, joint replacement is now commonplace for hips and increasingly so for knees, while the pharmacological management of people with OA continues to be important. In this context, it is necessary to identify valid and acceptable outcome measures so that progress in treating OA can be evaluated. Such measures should benefit not only clinicians managing OA and purchasers of health care for this condition, but also, ultimately, patients through improved forms of treatment.

It is increasingly recognized that a key outcome measure for any health care intervention for OA, as for many other conditions, is change in health-related quality of life (HRQoL) [2, 3]. In the field of OA of the knee, clinicians and researchers are faced with a choice of measures. One established measure for arthritis is the UK version of the Health Assessment Questionnaire (HAQ), but most experience with this instrument has been with rheumatoid arthritis [4]. A more recent development has been a measure specific to OA: the Western Ontario and McMaster University Osteoarthritis Index (WOMAC) [5]. This has shown considerable promise, but there has been little published comparative evidence on its validity by an independent group of researchers.

There has also been increasing use of generic (i.e. not disease-specific) measures of health-related quality of life. These have the potential advantage of being more able to measure side-effects or complications of treatment, which may be unrelated to the condition itself. Many people with OA will also have co-morbidities and so to obtain a more holistic view of HRQoL in this patient group there is a case for using a generic measure. A generic measure which has been widely used for other conditions is the Short Form-36 (SF-36), which generates a profile of eight dimensions and for which there is some evidence for validity in OA patients [3, 6]. However, it is claimed that generic measures are less responsive to health changes than condition-specific measures. This claim needs to be tested.

In a resource-constrained environment, it is important to be able to examine the relative cost effectiveness of interventions, and for this a single-index measure of HRQoL is required. The Euroqol instrument (EQ) is a

recently developed brief and easy to use single-index measure [7], which has been used successfully on patients with rheumatoid arthritis, but its validity has yet to be examined for OA patients. Concern has been raised about its crudeness and whether it is sufficiently sensitive to many changes in health [8].

The objective of the study reported here was to assess these four instruments for measuring HRQoL in patients with OA of the knee, in terms of their ability to discriminate between patient groups (i.e. their discriminative properties) and their sensitivity to change in both patients undergoing surgery and in patients being treated medically (i.e. their evaluative properties). The aim is to help clinicians and researchers in choosing instruments when measuring the outcomes of surgery or pharmacological interventions.

Methods

Recruitment of patients

Patients were recruited from two distinct clinical settings in five UK hospitals. Over an 8 month period, all new patients attending rheumatology clinics, and those assessed pre-operatively for total knee replacement (TKR), were eligible for the study. From these groups, recruitment was restricted to patients with a diagnosis of OA of the knee made by a hospital rheumatology or orthopaedic specialist. No further inclusion or exclusion criteria were applied, so that the recruited patients were likely to be representative of those seen in everyday hospital clinical practice in the UK.

Patients were invited to participate in the study by letter explaining the study and signed by their consulting physician or surgeon. Their names and addresses were obtained from the physicians' secretaries or from the pre-operative assessment clinic. A questionnaire booklet and pre-paid reply envelope were enclosed together with the letter, patient information sheet and consent form. For rheumatology clinic patients, a global assessment of the severity of their condition was supplied by their consulting physician, based upon their clinical impression.

Instruments

The questionnaire booklet contained the four self-completed HRQoL questionnaires (WOMAC, HAQ, SF-36 and EQ), a short section on socio-demographic information and recent use of health services, and a question asking about any other major health problems apart from their knee trouble (i.e. co-morbidities). Medical data were obtained from medical records of rheumatology clinic patients when these were not available from physicians' secretaries.

The WOMAC is a 24-item questionnaire, taking around 5 min to complete, and originally designed for use in clinical trials in patients with OA of the knee or hip. The Likert-scaled version was used in this study. Scores are generated for the three dimensions of Pain, Stiffness and Physical Function by summing the coded responses and then dividing by the number of items to provide a score range of 0–4 [9].

The HAQ was developed for arthritic conditions in general and the final version, modified for British patients [4, 10], contains 20 items covering eight categories of disability which combine to derive a single disability index ranging from 0 to 3 (Table 1). For both these instruments, a low score indicates good health.

The SF-36, revised for use in a British population [11], contains 36 items and generates a profile of eight dimension scores ranging from 0 to 100, where high scores indicate good health [12].

The EQ is a brief two-page questionnaire, the first page containing five items describing health status across five dimensions (mobility, self-care, usual activity, pain/distress and depression/anxiety) (the EQ-5D), and the second displaying a visual analogue rating scale on which the respondent marks an assessment of their overall health [7]. The responses to the five items of the EQ-5D can be scored using a utility-weighted algorithm [13], which has been recommended for use in economic evaluation. The EQ therefore provides two single-index measures of health, the Rating scale and the EQ-5D index, ranging from 0 to 100.

The recommended methods of substitution for missing responses for the WOMAC and SF-36 were carried out, but not for the HAQ and EQ, for which there are no methods of substitution.

TABLE 1. Dimension scores for knee replacement patients at initial assessment

                                  Mean  S.D.  % floor  % ceiling
WOMAC (n = 109–118)
  Pain                             2.2   0.7      2.5        0.0
  Stiffness                        2.4   0.8      6.8        0.9
  Physical function                2.3   0.7      1.8        0.0
HAQ (n = 109–118)
  Dressing and grooming            1.8   0.7      6.2       11.5
  Rising                           2.0   0.6     14.9        3.5
  Eating                           1.8   0.7      6.1       12.3
  Walking                          1.9   0.5      7.0        2.6
  Hygiene                          1.7   0.9     13.7       16.2
  Reach                            1.9   0.8     19.8        8.6
  Grip                             1.6   0.9      4.3       22.2
  Activities                       2.1   0.8     28.8        4.2
  Disability index                 1.9   0.5      0.0        0.0
SF-36 (n = 103–113)
  Physical functioning            21.0  18.8     16.5        0.0
  Social functioning              51.2  28.5      7.3        9.1
  Role limitations (physical)     11.8  25.5     76.9        2.8
  Role limitations (emotional)    42.0  44.3     46.3       31.5
  Pain                            35.2  23.1      9.7        3.5
  Mental health                   67.6  18.4      0.9        2.8
  Vitality                        40.9  19.6      2.8        0.0
  General health perception       56.4  23.9      1.9        3.8
EQ (n = 107–114)
  EQ-5D Index                     44.7  17.6      0.0        0.0
  Rating scale                    61.5  21.9      0.0        0.9

Scores for the SF-36 and EQ range from 0 to 100, with high scores indicating good health. The range for WOMAC is 0–4. For the HAQ, dimensions range from 0 to 3. For both these instruments, low scores indicate good health.

An initial assessment using these four instruments was undertaken at recruitment, with a follow-up assessment ~6 months later. At follow-up, item 2 of the

SF-36 questionnaire was adapted as follows to measure patient-perceived health change: 'Compared to the last time you completed the questionnaire, how would you rate your health in general now?'. The responses available were: 'much better', 'somewhat better', 'about the same', 'somewhat worse', 'much worse'.

Analysis

The primary purpose of the analysis was to assess the discriminative and evaluative properties of the four measures of HRQoL used, i.e. their ability to discriminate between patient groups and their sensitivity to change, respectively.

The discriminative properties were examined in terms of their construct validity, where the distribution of scores is compared between groups with expected health differences. For the rheumatology clinic group, this was undertaken by estimating score differences between those classified by their physician as having mild or moderate disease, on the one hand, and those with severe disease, on the other. Further, for both rheumatology and surgical groups, score differences were examined between those who reported and those who did not report a non-musculoskeletal co-morbidity. The significance of any difference was tested with the Mann-Whitney U-test (a non-parametric equivalent of the t-test), and the importance of each difference assessed by calculating an effect size, which is the mean difference between the groups divided by the pooled standard deviation. This can be regarded as an indication of the ability of a measure to distinguish the signal from the overall noise or variance, and it provides the basis for comparing measures with differing scales. Effect sizes were judged against criteria recommended by Cohen [14]: 0.2 to <0.5, 0.5 to <0.8 and ≥0.8 indicating small, moderate and large effect sizes, respectively.

Validity was examined in terms of the convergence between like dimensions of the WOMAC, HAQ and SF-36 questionnaires. The opportunity was also taken to examine the internal consistency of the measures by calculating Cronbach's α coefficients for the two condition-specific questionnaires and the SF-36. According to Streiner and Norman [15], a value of 0.8 is usually regarded as acceptable. This statistic is not relevant for the EQ, which has only one item per dimension.

The evaluative properties were examined in terms of sensitivity to change or responsiveness. In part, the ability to respond to change can be assessed in terms of the proportion of patients at the floor (i.e. the worst score) or the ceiling (the best score) of each scale [16]. If many patients score at either extreme of a scale, the instrument will have limited ability to register deterioration or improvement, respectively. A more complete method is to examine the change in scores in patients who have experienced a change in health status.

For the knee replacement group, responsiveness was assessed in terms of the changes in scores before and after their arthroplasty, since this procedure has been found to bring about health improvement in most patients [2]. For the rheumatology clinic group, there was no external indicator of change, and hence responsiveness was assessed by comparing mean changes in scores across three distinct groups of patients: those who rated their health as having improved, worsened or stayed the same between the first and second surveys (i.e. their response to the self-perceived transition question of the SF-36 questionnaire). (Item 2 is not used in the scoring of the SF-36.) This global change item has been used to assess responsiveness for a number of conditions, including rheumatoid arthritis [17, 18].

The statistical significance of the changes in scores between different groups was assessed using the Kruskal-Wallis test in the rheumatology clinic group and by the Mann-Whitney U-test in the knee replacement group. For both groups, the four measures were also compared in terms of the standardized response mean (SRM), which is the mean change between assessments divided by the standard deviation of the change, and can be thought of as an indicator of the ability to distinguish signal from noise [19]. Cohen's criteria for effect size were also applied to this statistic [14].

Results

Response

Knee replacement sample. Questionnaire booklets were mailed to 151 patients whose names were on surgical waiting lists to undergo TKR surgery in the near future. Contact was not made with two patients, one of whom was admitted for surgery earlier than expected, and another who could not be traced. One hundred and eighteen patients (effective response rate 79%) consented to participate by returning questionnaires at the initial assessment, with no adverse comments received. Reminder letters to non-respondents were not sent because of the short interval between theatre lists being drawn up and the date for surgery. Of the responders to the initial assessment, 109 questionnaires were returned at the follow-up assessment.

The mean age of respondents was 71 yr (range 47–87 yr). More than half the sample were female. Non-respondents were of similar age to respondents, and were more likely to be female.

Rheumatology clinic sample. Questionnaire booklets were sent to 125 patients attending rheumatology outpatient clinics with a primary diagnosis of OA. Contact was not made with one patient who had changed address. After one reminder, 112 (effective response rate 90%) patients consented to participate by returning questionnaires, with no adverse comments. Of these, 102 returned questionnaires at the follow-up assessment.

The mean age of rheumatology clinic respondents was 64 yr, considerably younger than the sample of patients undergoing TKR, with more than twice as many women as men. Fifty-four per cent of patients were classified by their physicians as having mild disease and 41% as having severe disease. Only six (5%) patients were classified as moderate and, on the basis of their scores, these were combined in subsequent analyses with those classified as mild. Non-respondents were on average

younger (57 yr), more likely to be female and to have mild disease.

Patients recruited from both sources were broadly typical of patients with OA of the knee seen in secondary care settings in the UK.

Completion

For the WOMAC, HAQ and EQ questionnaires, item completion rates exceeded 90%. The majority of SF-36 dimensions achieved completion rates of >90% for each dimension. The exception was that of the Physical functioning dimension for the knee replacement sample (86% completion). It was found that 11 knee replacement and eight rheumatology clinic respondents failed to separate the pages of the booklet, thus omitting items unintentionally.

Cross-sectional analysis: discriminative properties of instruments

The dimension scores at the initial assessment for the knee replacement sample are shown in Table 1 and for the rheumatology clinic sample in Table 2. [Two week retest reliability was assessed (for the WOMAC only) by examining score differences for patients who said that their health had not changed (n = 30). For all three dimensions, there were no statistically significant differences between test and retest scores [20]. The reliability properties of the other instruments are already well established.]

TABLE 2. Dimension scores for rheumatology clinic patients at initial assessment

                                  Mean  S.D.  % floor  % ceiling
WOMAC (n = 105–109)
  Pain                             2.0   0.9      1.8        1.8
  Stiffness                        2.3   0.9      7.3        1.8
  Physical function                2.0   0.8      0.0        0.0
HAQ (n = 108–110)
  Dressing and grooming            1.5   0.8      4.6       17.4
  Rising                           1.6   0.8      6.4       10.9
  Eating                           1.5   0.9      4.5       23.6
  Walking                          1.6   0.8      6.4       12.7
  Hygiene                          1.8   0.9     17.3       13.6
  Reach                            1.9   0.9     19.1       12.7
  Grip                             1.6   0.9      4.5       20.9
  Activities                       1.9   0.8     20.9        8.2
  Disability index                 1.7   0.7      0.0        5.6
SF-36 (n = 103–110)
  Physical functioning            27.9  22.7     11.3        0.9
  Social functioning              53.1  30.4      3.7       11.9
  Role limitations (physical)     12.4  23.8     73.3        1.9
  Role limitations (emotional)    41    43.9     47.1       29.8
  Pain                            32.7  19.8      8.2        0.0
  Mental health                   63.2  20.6      0.0        2.8
  Vitality                        37.7  19.7      2.8        0.9
  General health perception       45.4  23.5      1.0        1.0
EQ (n = 107–108)
  EQ-5D Index                     46.0  16.8      0.0        0.0
  Rating scale                    58.0  20.0      0.9        0.9

Construct validity. Mean score differences between patients with a clinical assessment of mild/moderate vs severe disease in the rheumatology clinic group were highly significant for all three dimensions of the WOMAC questionnaire and the HAQ at the 1% level (Table 3). Seven dimensions of the SF-36 also discriminated significantly between the patient groups in terms of severity. Results for the EQ were mixed, where differences between the EQ-5D measures, but not the Rating scale, reached significant levels. These results were reflected in the effect sizes, which were large for the Pain dimensions of WOMAC and moderate for the other significant dimensions. Overall, the larger effect sizes were associated with WOMAC.

TABLE 3. Score differences between rheumatology clinic patients with mild/moderate and severe knee osteoarthritis

                                  Mean difference  Pooled S.D.  Effect size     Pᵃ
WOMAC (n = 105–110)
  Pain                                        0.8          0.9         0.95   0.00
  Stiffness                                   0.7          0.9         0.78   0.00
  Physical function                           0.6          0.8         0.76   0.00
HAQ (n = 108)
  Disability index                            0.4          0.7         0.63   0.00
SF-36 (n = 103–110)
  Physical functioning                       17.0         22.7         0.75   0.00
  Social functioning                         18.3         30.4         0.60   0.00
  Role limitations (physical)                15.3         23.8         0.64   0.00
  Role limitations (emotional)               26.1         43.9         0.60   0.00
  Pain                                       12.8         19.8         0.65   0.00
  Mental health                               4.7         20.6         0.23   0.28
  Vitality                                    9.3         19.7         0.47   0.01
  General health perception                   8.2         23.5         0.35   0.04
EQ (n = 106–107)
  EQ-5D Index                                13.2         16.8         0.79   0.00
  Rating scale                                5.3         20.0         0.27   0.12

Effect size = (mean score of mild/moderate group) - (mean score of severe group) divided by the overall pooled S.D. Effect sizes were judged against criteria recommended by Cohen [14]: 0.2 to <0.5, 0.5 to <0.8 and ≥0.8 indicating small, moderate and large effect sizes, respectively.
ᵃP value from Mann-Whitney U-test comparing mean scores in each group.

Six dimensions of the SF-36, the EQ Rating scale and the condition-specific HAQ produced significant differences between rheumatology clinic patients with co-morbidity and those without, at the 5% level

(Table 4). However, neither the three WOMAC dimensions nor the EQ-5D produced significant differences. The largest effect size was observed for the General health perception dimension of the SF-36 (0.78) with moderate effect sizes being observed for three other SF-36 dimensions and the HAQ (Table 4).

For those patients soon to undergo TKR surgery, all instruments discriminated to some extent between patients with and without co-morbidity. However, the relative performance of the instruments was different. The clearest picture emerged for the EQ-5D and EQ Rating scale. For the SF-36, differences between Mental health, Pain and General health perception reached statistical significance. The HAQ Disability index and two dimensions of WOMAC were significantly different. Moderate effect sizes were found for two WOMAC dimensions, three SF-36 dimensions and both EQ indices, but not for the HAQ (Table 4).

TABLE 4. Effect sizes for patients in relation to co-morbidity

| Instrument and dimension | Rheumatology clinic patients: effect size | P^a | Knee replacement patients: effect size | P^a |
|---|---|---|---|---|
| WOMAC (n = 94–109) | | | | |
| Pain | 0.12 | 0.84 | 0.60 | 0.01 |
| Stiffness | 0.09 | 0.50 | 0.53 | 0.01 |
| Physical function | 0.30 | 0.43 | 0.42 | 0.10 |
| HAQ (n = 84) | | | | |
| Disability index | 0.58 | 0.02 | 0.37 | 0.02 |
| SF-36 (n = 92–104) | | | | |
| Physical functioning | 0.49 | 0.05 | 0.34 | 0.08 |
| Social functioning | 0.56 | 0.01 | 0.23 | 0.44 |
| Role limitations (physical) | 0.61 | 0.01 | 0.16 | 0.25 |
| Role limitations (emotional) | 0.04 | 0.70 | 0.27 | 0.17 |
| Pain | 0.39 | 0.07 | 0.56 | 0.02 |
| Mental health | 0.49 | 0.03 | 0.51 | 0.05 |
| Vitality | 0.62 | 0.01 | 0.28 | 0.17 |
| General health perception | 0.78 | 0.01 | 0.60 | 0.01 |
| EQ (n = 94–105) | | | | |
| EQ-5D Index | 0.32 | 0.14 | 0.62 | 0.01 |
| Rating scale | 0.70 | 0.00 | 0.64 | 0.00 |

Effect size = [(mean score of co-morbidity group) − (mean score of no co-morbidity group)] divided by the overall pooled S.D.
^a P value from Mann–Whitney U-test comparing mean scores in each group.

Convergent validity. An inspection of the correlation of like dimensions across all dimensions found the expected convergence between dimension scores across instruments. Spearman's rank correlation coefficient between the WOMAC's Physical functioning dimension and the HAQ Disability index was 0.68. For the generic SF-36 and WOMAC, correlation between the physical functioning dimensions was 0.70 and also 0.70 between pain dimensions. These correlations exceeded those between WOMAC Physical function and WOMAC dimensions of Pain and Stiffness (0.65 and 0.63, respectively). As expected, correlations of Mental health and Vitality (SF-36) with WOMAC dimensions were low.

Internal consistency. Cronbach's α coefficients were acceptable for all three dimensions of the WOMAC, according to standards recommended by Streiner and Norman [15]. For the HAQ, four of the eight categories did not meet these. The α coefficients of the SF-36 were also below these standards, but in only one instance was the α coefficient <0.7 (role limitations due to physical problems).

TABLE 5. Mean score differences between initial assessment and follow-up in relation to patient-perceived health change for rheumatology clinic patients

| Instrument and dimension | Better (n = 6–8): mean diff^a | S.D. | Same (n = 43–51): mean diff | S.D. | Worse (n = 32–38): mean diff | S.D. | P^b |
|---|---|---|---|---|---|---|---|
| WOMAC | | | | | | | |
| Pain | 0.5 | 0.8 | 0.1 | 0.5 | 0.1 | 0.6 | 0.05 |
| Stiffness | 0.6 | 0.7 | 0.0 | 0.7 | 0.2 | 0.7 | 0.04 |
| Physical function | 0.6 | 0.8 | 0.1 | 0.6 | 0.1 | 0.5 | 0.02 |
| HAQ | | | | | | | |
| Disability index | 0.5 | 0.6 | 0.0 | 0.4 | 0.1 | 0.3 | 0.11 |
| SF-36 | | | | | | | |
| Physical functioning | 29.2 | 22.2 | 2.2 | 19.2 | 1.1 | 10.5 | 0.01 |
| Social functioning | 1.4 | 11.0 | 0.4 | 24.6 | 5.4 | 22.9 | 0.56 |
| Role limitations (physical) | 50.0 | 29.9 | 8.9 | 28.8 | 0.0 | 14.0 | 0.00 |
| Role limitations (emotional) | 0.0 | 0.0 | 2.1 | 34.3 | 5.9 | 45.3 | 0.96 |
| Pain | 25.0 | 18.5 | 7.3 | 19.8 | 7.4 | 18.2 | 0.00 |
| Mental health | 1.1 | 10.5 | 3.1 | 13.9 | 4.0 | 10.6 | 0.03 |
| Vitality | 3.1 | 14.1 | 4.5 | 16.5 | 6.8 | 14.1 | 0.01 |
| General health perception | 9.6 | 3.6 | 0.6 | 13.7 | 3.5 | 10.1 | 0.01 |
| EQ | | | | | | | |
| EQ-5D Index | 9.7 | 7.4 | 1.4 | 12.3 | 1.5 | 15.4 | 0.03 |
| Rating scale | 10.3 | 16.3 | 4.5 | 12.0 | 6.0 | 15.9 | 0.01 |

^a Mean difference = (mean score at follow-up) − (mean score at initial assessment), where a mean difference of >0 for SF-36 and EQ and of <0 for WOMAC and HAQ indicates a health improvement.
^b P value from a Kruskal–Wallis test comparing mean score differences by patient-perceived health change group.
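The effect size defined in the footnote to Table 4 can be illustrated with a short sketch. This is not the authors' analysis code. It assumes that "overall pooled S.D." means the pooled within-group standard deviation, which is one common reading; the function names and the sample scores are hypothetical. The size bands follow the criteria the paper attributes to Cohen [14], and the label "negligible" for values below 0.2 is added here only for completeness.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled within-group S.D. (assumed meaning of 'overall pooled S.D.')."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return sqrt(pooled_var)


def effect_size(comorbid, no_comorbid):
    """(mean of co-morbidity group - mean of no co-morbidity group) / pooled S.D."""
    return (mean(comorbid) - mean(no_comorbid)) / pooled_sd(comorbid, no_comorbid)


def cohen_label(es):
    """Classify the magnitude of an effect size, ignoring its sign."""
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here; the paper names only three bands


# Hypothetical dimension scores for two small groups of patients.
with_comorbidity = [40, 50, 60]
without_comorbidity = [50, 60, 70]
es = effect_size(with_comorbidity, without_comorbidity)
print(es, cohen_label(es))
```

Because the classification uses the absolute value, the same bands apply whether a higher score means better health (SF-36, EQ) or worse health (WOMAC, HAQ).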

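Responsiveness is reported in Table 7 as the standardized response mean (SRM), defined in that table's footnotes as the mean change in score from initial assessment to follow-up divided by the S.D. of the change, with the changes of rheumatology patients who reported worse health multiplied by minus one. The following is a minimal sketch of that calculation under stated assumptions: the function names, the "better"/"worse" labels and the example change scores are hypothetical, and the sample S.D. (n − 1 denominator) is assumed.

```python
from statistics import mean, stdev


def align_changes(changes, perceived):
    """Multiply the change scores of patients reporting worse health by -1."""
    return [-c if p == "worse" else c for c, p in zip(changes, perceived)]


def standardized_response_mean(changes):
    """Mean change divided by the (sample) S.D. of the change scores."""
    return mean(changes) / stdev(changes)


# Hypothetical follow-up minus baseline scores for three patients.
changes = [10, -20, 30]
perceived = ["better", "worse", "better"]
aligned = align_changes(changes, perceived)
print(standardized_response_mean(aligned))
```

Reversing the sign for the "worse" group lets improvement and deterioration contribute in the same direction, so a single SRM can summarise patients who changed either way.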
Longitudinal analysis: evaluative properties of the instruments

Score distributions. Floor effects of >10% of responses were observed for the Physical functioning and Role limitations dimensions of the SF-36 in both samples (Tables 1 and 2). For the HAQ, over 10% of responses for Rising, Hygiene, Reach and Activities in the knee replacement sample were on the floor, and for Hygiene, Reach and Activities in the rheumatology clinic sample. Two dimensions of the SF-36, Social functioning and Role limitations due to emotional problems, showed ceiling effects in both patient groups. The HAQ showed ceiling effects in four dimensions in the knee replacement sample and seven in the rheumatology clinic sample. Neither the WOMAC nor the EQ indices demonstrated substantial floor or ceiling effects.

Perceived health change. (i) Rheumatology clinic sample. For the rheumatology clinic sample, all dimension scores were found to be associated to some extent with the perceived direction of change (Table 5). The pattern was found to be significant for six dimensions of the SF-36, both EQ indices and all dimensions of the WOMAC, but not the HAQ, at the 5% level using the Kruskal–Wallis test. The condition-specific measures did not perform noticeably better than either of the generic measures in terms of the standardized response mean (Table 7). Only the SRM for the Pain dimension of SF-36 was moderate in size, while all other SRMs for all four instruments were either small or not significant.

(ii) Knee replacement sample. The mean changes found at post-operative follow-up were statistically significant for all dimensions of the WOMAC and the HAQ disability index. For the SF-36, three dimensions (Physical functioning, Pain and Vitality) were significantly different, as was the EQ-5D, but not the EQ Rating scale (Table 6). These results were reflected in the SRMs, where a high value was observed for Physical function and Pain for the WOMAC compared to a small SRM for the HAQ. SRMs were moderate for the Physical functioning and Pain dimensions of the SF-36 and for the EQ-5D (Table 7).

TABLE 6. Mean score differences between initial assessment and follow-up: knee replacement patients

| Instrument and dimension | Mean difference^a | S.D. of difference | P^b |
|---|---|---|---|
| WOMAC (n = 93–106) | | | |
| Pain | 0.8 | 0.9 | 0.01 |
| Stiffness | 0.7 | 1.0 | 0.01 |
| Physical function | 0.8 | 0.8 | 0.01 |
| HAQ (n = 94–108) | | | |
| Disability index | 0.2 | 0.6 | 0.00 |
| SF-36 (n = 84–103) | | | |
| Physical functioning | 13.5 | 19.0 | 0.00 |
| Social functioning | 5.6 | 29.4 | 0.06 |
| Role limitations (physical) | 6.5 | 34.9 | 0.08 |
| Role limitations (emotional) | 3.9 | 47.1 | 0.42 |
| Pain | 16.6 | 26.2 | 0.00 |
| Mental health | 0.1 | 15.5 | 0.96 |
| Vitality | 4.8 | 16.4 | 0.01 |
| General health perception | 0.9 | 13.7 | 0.56 |
| EQ (n = 94–97) | | | |
| EQ-5D Index | 9.4 | 16.6 | 0.00 |
| Rating scale | 0.1 | 15.9 | 0.93 |

^a Mean difference = (recorded mean score at follow-up) − (recorded mean score at initial assessment), where a mean difference of >0 for SF-36 and EQ and of <0 for WOMAC and HAQ indicates a health improvement.
^b P value from paired t-test comparing mean score differences, within individual patients.

TABLE 7. Responsiveness of instruments indicated by SRM

| Instrument and dimension | Rheumatology clinic patients^a: SRM^b | Knee replacement patients: SRM |
|---|---|---|
| WOMAC | n = 41–46 | n = 93–106 |
| Pain | 0.27 | 1.03 |
| Stiffness | 0.34 | 0.63 |
| Physical function | 0.39 | 1.06 |
| HAQ | n = 43–46 | n = 94–108 |
| Disability index | 0.33 | 0.33 |
| SF-36 | n = 41–45 | n = 84–103 |
| Physical functioning | 0.34 | 0.71 |
| Social functioning | 0.22 | 0.19 |
| Role limitations (physical) | 0.36 | 0.19 |
| Role limitations (emotional) | 0.12 | 0.08 |
| Pain | 0.55 | 0.63 |
| Mental health | 0.33 | 0.01 |
| Vitality | 0.43 | 0.29 |
| General health perception | 0.47 | 0.06 |
| EQ | n = 41–42 | n = 94–97 |
| EQ-5D Index | 0.20 | 0.56 |
| Rating scale | 0.42 | 0.01 |

^a This table includes only those rheumatology patients who reported a change in health between assessments; the scores of those reporting worse health are multiplied by minus one.
^b SRM is the mean change in score from initial assessment to follow-up, divided by the S.D. of the change in scores. These were judged against criteria recommended by Cohen [14]: 0.2 to <0.5, 0.5 to <0.8 and ≥0.8 indicating small, moderate and large effect sizes, respectively.

Discussion

The high response rates achieved, and the absence of any adverse comments from respondents, suggest that all instruments may be acceptable to this clinical population. In addition, in both samples, completion rates were very satisfactory, which is encouraging in an elderly group of patients. These results confirm previous studies using these instruments [3, 5, 9, 21]. WOMAC, SF-36 and HAQ scores were similar, though not identical, to those found in other OA cohorts [9, 21].

Differences in performance between these measures were found in the comparisons of validity and responsiveness. This is the first time that these outcome measures have been evaluated together in terms of their discriminatory and evaluative properties for these two groups of patients with OA of the knee, and there is no reason to suppose that the same instrument should perform well in both groups. It is commonly assumed that the condition-specific measure should be the more responsive and this hypothesis is supported by our results in the knee replacement group, who received a major intervention. The OA-specific WOMAC physical functioning scale was more responsive than the more general HAQ Disability index, and this and Pain were more responsive than the equivalent dimensions of the SF-36. These results confirm previous studies comparing WOMAC to the HAQ [9] and the SF-36 [3, 21]. In the present study, WOMAC emerges as the instrument of choice for assessing the consequences of surgery for OA of the knee. However, the results also support the use of a generic instrument on this group of patients, since the SF-36 was better at distinguishing those reporting a non-musculoskeletal co-morbidity from those who did not.

The advantages of the OA-specific WOMAC were less clear for the rheumatology clinic patients, in whom the HAQ and equivalent dimensions of the SF-36 (Physical functioning and Pain) were just as able to distinguish between severity groups. Furthermore, many of their dimension scores were able to discriminate between patients with and without non-musculoskeletal co-morbidity, whereas the WOMAC was not. Most importantly, some dimensions of the SF-36 (Pain, Vitality and General health) were more responsive than the WOMAC for these patients. A possible reason for this result could be that the rheumatology clinic patients were a less well-defined and homogeneous group of patients, with more frequent health problems unrelated to OA of the knee. The changes being experienced by this group may have been more general in nature, so that the generic SF-36 was better at detecting them. An important feature of the rheumatology patients was that more of them reported a deterioration in their health than an improvement, and this was better reflected in the SF-36 than in the condition-specific measures. However, there are reasons to interpret this result with caution. The analysis of this group is limited by a small sample size, although it compares well with other studies. In addition, the result may apply only to the broad mix of patients typically attending NHS rheumatology clinics, rather than to medically managed OA patients in general.

For both patient groups, there is the question of which generic instrument is most appropriate for use [21]. This is the first time that the EQ has been evaluated in patients with OA. Results from a study using the EQ with patients with rheumatoid arthritis found that it performed as well as the more specific HAQ [22]. In the present study, the EQ-5D was able to discriminate on the basis of severity for patients with OA of the knee attending a rheumatology clinic, and was comparable in terms of responsiveness to the best-performing dimensions of the SF-36 in the knee replacement group. However, the EQ-5D was noticeably less responsive to change in the rheumatology clinic group than many of the dimensions of the SF-36. This may reflect the fact that it is based on a more crude description of status in any given dimension, which makes it efficient for large changes, but less so for the more subtle and diverse changes experienced by the rheumatology clinic group. The advantage of the EQ-5D is its brevity (occupying a single page), but this is at the expense of lower sensitivity, and it does not give the broad picture available from a profile measure such as the SF-36. In summary, these results suggest that the EQ-5D may be suitable for economic evaluations of surgical interventions in this group, but for other purposes, the SF-36 would be preferred.

The EQ Rating scale is even simpler, but its performance was inconsistent. It proved unable to distinguish between severity groups, and remarkably unresponsive to the changes following TKR. It performed better in detecting non-musculoskeletal co-morbidity and change in the rheumatology clinic group, but dimensions of the SF-36 performed as well or better in all respects.

Conclusions

This investigation has confirmed that WOMAC is the instrument of choice for evaluating the outcome of TKR in patients with OA of the knee. For a more general insight into patients' health and as a means of making comparisons across conditions, the SF-36 should also be used. For researchers wishing to undertake an economic evaluation, the EQ-5D might be considered for a surgical but not a heterogeneous medically managed clinic group. Our results suggest that the SF-36 is probably a better choice than WOMAC for detecting change in the less condition-specific morbidity found in this diverse patient population, though care should be taken in generalizing this last result to all medically managed OA populations.

Acknowledgements

We wish to express our gratitude to the consultant orthopaedic surgeons and rheumatologists and their staff, and to the patients who gave up their time to complete the questionnaires. The study was funded by the UK NHS Executive (Trent).

References

1. McAlindon TE, Cooper C, Kirwan JR, Dieppe PA. Knee pain and disability in the community. Br J Rheumatol 1992;31:189–92.
2. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care 1990;28:632–42.
3. Bombardier C, Melfi CA, Paul J, Green R, Hawker G, Wright J et al. Comparison of a generic and a disease-specific measure of pain and physical function after knee replacement surgery. Med Care 1995;33(suppl.):AS131–44.
4. Kirwan JR, Reeback JS. Stanford Health Assessment Questionnaire modified to assess disability in British patients with rheumatoid arthritis. Br J Rheumatol 1986;25:206–9.
5. Bellamy N, Watson Buchanan W, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: A health status instrument for measuring clinically important patient relevant outcomes in antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988;15:1833–40.
6. Stucki G, Liang MH, Phillips C, Katz JN. The Short Form-36 is preferable to the SIP as a generic health status measure in patients undergoing elective total hip arthroplasty. Arthritis Care Res 1995;8:174–81.
7. The Euroqol Group. Euroqol – a facility for the measurement of health-related quality of life. Health Policy 1990;16:199–208.
8. McDowell I, Newell C. A guide to rating scales and questionnaires. Oxford: Oxford University Press, 1987.
9. Griffiths G, Bellamy N, Bailey WH, Bailey SI, McLaren AC, Campbell J. A comparative study of the relative efficiency of the WOMAC, AIMS and HAQ instruments in evaluating the outcome of total knee arthroplasty. Inflammopharmacology 1995;3:1–6.
10. Wilkin D, Hallam L, Doggett MA. Measures of need and outcome for primary health care. Oxford: Oxford University Press, 1992.
11. Brazier JE, Harper R, Jones NMB et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. Br Med J 1992;305:160–4.
12. Ware JE, Snow KK, Kolinski M, Gandeck B. SF-36 health survey manual and interpretation guide. Boston: The Health Institute, New England Medical Centre, 1993.
13. Williams A. The measurement and valuation of health: a chronicle. Discussion Paper 136. Centre for Health Economics, York Health Economics Consortium, NHS Centre for Reviews and Dissemination. York: University of York, 1995.
14. Cohen J. Statistical power analysis for the behavioural sciences. New York: Academic Press, 1978.
15. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press, 1989.
16. Fortin PR, Stucki G, Katz JN. Measuring relevant change: an emerging challenge in rheumatological clinical trials. Arthritis Rheum 1995;38:1027–30.
17. Fitzpatrick R, Ziebland S, Jenkinson C, Mowat A, Mowat A. A generic health status instrument in the assessment of rheumatoid arthritis. Br J Rheumatol 1992;31:87–90.
18. Garratt AM, Ruta DA, Abdalla MI. The SF-36 health survey questionnaire: an outcome measure suitable for routine use within the NHS. Br Med J 1992;306:1440–4.
19. Katz JN, Larson MG, Phillips CB, Fossel AH, Liang MH. Comparative measurement sensitivity of short and longer health status instruments. Med Care 1992;30:917–25.
20. Brazier J, Snaith M, Munro J. Measuring health outcome in people with osteoarthritis of the knee. Report to NHS Executive (Trent), UK, 1996.
21. Hawker G, Melfi C, Paul J, Green C, Bombardier C. Comparison of a generic (SF-36) and a disease-specific (WOMAC) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol 1995;22:1193–6.
22. Hurst NP, Kind P, Ruta D, Hunter M, Stubbings A. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness and reliability of Euroqol (EQ-5D). Br J Rheumatol 1997;36:551–9.