ORIGINAL ARTICLE

Assessments of Interrater Reliability and Internal Consistency of the Norwegian Version of the Berg Balance Scale

Karin E. Halsaa, PT, Therese Brovold, PT, Vibeke Graver, PhD, PT, Leiv Sandvik, PhD, Astrid Bergland, PhD, PT

ABSTRACT. Halsaa KE, Brovold T, Graver V, Sandvik L, Bergland A. Assessments of interrater reliability and internal consistency of the Norwegian version of the Berg Balance Scale. Arch Phys Med Rehabil 2007;88:94-8.

Objective: To investigate the interrater reliability and the internal consistency of the Norwegian version of the Berg Balance Scale (BBS) when applied to patients in a geriatric department.
Design: Interrater reliability was measured using the κ statistic and intraclass correlation coefficients (ICCs).
Setting: Geriatric rehabilitation unit and geriatric day hospital in Norway.
Participants: Eighty-three patients were included; 25 were inpatients in a geriatric rehabilitation unit, whereas 58 were admitted to a geriatric day hospital.
Interventions: Not applicable.
Main Outcome Measure: The BBS.
Results: The κ values for the different BBS items varied from 0.83 to 1.00, and the ICC for the sum score of the BBS was .998 (95% confidence interval, .996-.999). The mean value of the BBS was 44.4. There was a significant negative relation between age and the sum score (r=-.36). The sum scores of the BBS ranged from 12 to 56. The patients were able to perform the BBS without ceiling effect. The score values 3 and 4 were used more frequently than the score values 0, 1, and 2.
Conclusions: The Norwegian version of the BBS seems to have excellent interrater reliability and high internal consistency when applied to patients in geriatric rehabilitation.
Key Words: Balance; Geriatrics; Outcome assessment (health care); Rehabilitation.
© 2007 by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation

From the Departments of Geriatric Medicine (Halsaa, Brovold), Medicine (Graver), and Statistics (Sandvik), Ullevaal University Hospital, Oslo, Norway; and Faculty of Health Sciences, Oslo University College, Oslo, Norway (Bergland).
Supported by the Norwegian Fund for Postgraduate Training in Physiotherapy.
No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated.
Reprint requests to Karin E. Halsaa, PT, Dept of Geriatric Medicine, Ullevaal University Hospital, N-0407 Oslo, Norway, e-mail: karin.halsaa@ulleval.no.
0003-9993/07/8801-10857$32.00/0
doi:10.1016/j.apmr.2006.10.016

BALANCE IS OFTEN IMPAIRED in the elderly, and improvement in balance is an important goal of rehabilitation. Systematic physiotherapeutic assessment of patients with balance problems is important in planning treatment and assessing changes in motor function over time. Measuring balance can assist the clinician in selection of appropriate therapy and serve as an outcome measurement. 1,2 The Berg Balance Scale (BBS) is a brief and frequently used measure of balance for elderly people. The construct, concurrent, and predictive validity of the BBS has been found to be good. 3 More than 100 articles have cited the BBS since 1992. 4 The BBS can be used to assess the balancing ability of the frail elderly, to monitor changes in balance over time, to screen patients for rehabilitation therapy services, and to predict falls in both community-dwelling and institutionalized older adults. 1,4-11 Several studies have shown high levels of inter- and intraobserver agreement for the test as a whole and for the individual items.
5,12-17 During rehabilitation, more than 1 physiotherapist may assess an elderly patient, and high interrater reliability is therefore essential. Because errors can occur within each testing, high reliability is required when repeated measures are used to monitor the clinical status of patients or evaluate the effectiveness of treatments. The BBS has been translated into Norwegian, but the reliability of the translated test has not been evaluated. One reason for translation into Norwegian is the possibility of participating in international clinical trials that use this instrument. Another is that we can safely assume that studies using the English language version could be applicable to older adults in Norway. The purposes of this study were to assess the interrater reliability of the Norwegian version of the BBS when applied to patients in geriatric rehabilitation departments, to assess the internal consistency, and to investigate how the different scoring levels of the 14 items were used.

METHODS

Participants
The subjects were 83 patients, all admitted to Ullevaal University Hospital, Oslo, Norway; 25 were inpatients in a geriatric rehabilitation unit, and 58 were admitted to a geriatric day hospital (ie, an outpatient geriatric rehabilitation unit where the patients stay for 5 hours, 2 or 3 days a week, during a period of 3 weeks). Criteria for exclusion were impairment causing difficulties in understanding verbal communication, such as cognitive deficit and aphasia (diagnosed by the physician who had responsibility for inclusion), and not speaking the Norwegian language properly. Subjects with a recent fracture were excluded because we thought that pain would affect their performance. All the patients who were able to walk with or without a walking aid and had a recommendation for physiotherapy from a doctor were consecutively included. The mean age was 82 years (range, 69-95y); 58 were women and 25 men.
The primary reasons for admittance were as follows: several falls (23 persons), cerebral stroke (11 persons), general poor health (9 persons), Parkinson's disease (9 persons), low back pain (7 persons), pneumonia (6 persons), heart failure (6 persons), osteoarthritis of the hip (5 persons), rheumatoid arthritis (4 persons), and diabetes (2 persons). All the subjects were ambulatory. Twenty-eight people did not require the use of walking aids, 17 used a cane, and 38 used walking frames. Data on demographic characteristics and comorbidity were collected from medical records.
INTERRATER RELIABILITY OF THE NORWEGIAN BERG BALANCE SCALE, Halsaa 95

Procedure
Two experienced physiotherapists who had used the BBS for several years were involved in the study. They were accustomed to using the standardized instructions for administering the test. Before commencing the study, they had 2 weeks of intensive practical training with the Norwegian version of the BBS, including discussing and comparing test results, in order to be quite sure how details concerning the patients' performances should be scored.

The patients were tested once only. This model was chosen because all the patients were undergoing rehabilitation, and their condition could have improved between sessions if they had been tested on 2 different days. They could also have performed better after getting to know the assessment, and thus have felt more secure, if they had been tested twice. In addition, the scores could simply differ on day 2 because the patient had a better or worse day. Both physiotherapists scored all the patients simultaneously. They alternated between instructing and scoring and observing and scoring. They did not look at each other's ratings and did not discuss their assessments. All the tests were performed in the same room. The Regional Committee for Ethics in Medical Research approved the study.

Instrument
The BBS is a performance-based measure of balance consisting of 14 observable tasks frequently encountered in everyday life (table 1). Scoring is based on the patient's ability to perform the 14 tasks or movements independently and meet certain time and distance requirements. The test is simple, easy to administer, and safe for the elderly to perform. The evaluators rate performance on a 5-level scale from 0 (cannot perform) to 4 (normal performance) for 14 different tasks involving functional balance control, including transfer, turning, and stepping. 3,5 The sum score ranges from 0 to 56.

Statistical Analysis
Data were analyzed by using the SPSS program.
a Intraclass correlation (2-way mixed-model, single measure) was used to measure interrater reliability of the BBS sum score. 18 An intraclass correlation coefficient (ICC) of .80 or higher reflects high reliability, .60 to .79 moderate reliability, and less than .60 indicates that reliability is poor. 18-20

The interrater agreement of individual items of the BBS was analyzed by means of a κ score. A κ score indicates the agreement between raters, adjusted for the amount of agreement expected by chance and the magnitude of disagreements. 20 To calculate κ and to construct categories used by both evaluators, we condensed item-rating categories to eliminate the categories used by only 1 evaluator. A κ value of .75 or higher indicates excellent agreement, .40 to .74 indicates fair to good agreement, and less than .40 indicates poor agreement. 18,21

The floor and ceiling effects of the sum score reflect the extent that scores cluster at the bottom and top of the scale range. Floor and ceiling effects of more than 20% are considered to be significant. 14 The magnitude of the floor and ceiling effects may be indicative of the sum score's ability to discriminate between subjects.

To test the construct validity and dimensionality of the BBS, factor analysis with varimax rotation was performed. Factors were extracted with an eigenvalue greater than 1. Internal consistency of the BBS was tested both by item-to-total correlation and by calculating the Cronbach α 12,22,23 for each evaluator's scorings. The Cronbach α is regarded as high if it is at least .80. 22 An item-to-total correlation shows the degree of association between each item and the total score of the other items in the scale. An item-to-total correlation is considered adequate if it is above .40. 20 The Spearman rank correlation coefficient was used to investigate the relation between variables.
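As an illustration of the item-level agreement statistic described above, the following is a minimal sketch and not the authors' SPSS procedure. It assumes a weighted κ with linear disagreement weights |i - j|, because the text describes an adjustment for "the magnitude of disagreements" but does not state the weighting scheme. The function names `weighted_kappa` and `interpret_kappa` are hypothetical; the interpretation cut-offs are those given in the Statistical Analysis section.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories=(0, 1, 2, 3, 4)):
    """Cohen's kappa with linear disagreement weights |i - j| (assumed scheme).

    Returns None when chance-expected disagreement is zero, i.e. when the
    ratings show no variation and kappa is undefined.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    pairs = Counter(zip(rater_a, rater_b))  # joint counts of (score A, score B)
    marg_a = Counter(rater_a)               # marginal counts for rater A
    marg_b = Counter(rater_b)               # marginal counts for rater B

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in categories:
        for j in categories:
            weight = abs(i - j)
            observed += weight * pairs[(i, j)] / n
            expected += weight * (marg_a[i] / n) * (marg_b[j] / n)

    if expected == 0:
        return None
    return 1.0 - observed / expected


def interpret_kappa(kappa):
    """Apply the cut-offs quoted in the text (.75 and .40)."""
    if kappa is None:
        return "undefined"
    if kappa >= 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


if __name__ == "__main__":
    # Made-up ratings for illustration only; these are not study data.
    a = [0, 0, 4, 4]
    b = [0, 4, 4, 4]
    k = weighted_kappa(a, b)
    print(k, interpret_kappa(k))
```

When every patient receives the same score from both raters, the chance-expected disagreement is zero and the sketch returns `None` rather than a number, which is the situation in which an agreement coefficient cannot be computed for an item.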
Cross-Cultural Translation
The procedure used to produce the Norwegian version of the BBS was the forward-backward translation method, 24 involving the following steps.

Step 1. Step 1 is the translation into Norwegian of the original version of the BBS. English-Norwegian translators, native Norwegian speakers, with more than 15 years of education, were involved. Each translator independently translated the BBS and then compared and discussed the result with that of the other, until a common version was reached.

Table 1: Distribution of Scores From 1 Evaluator Within Each of the 14 Items of the BBS (N=83)

                                                  Scoring Values
Item Number and Description                      0     1     2     3     4   Mean
 1. Sitting to standing                          0     1     2    29    51    3.6
 2. Standing unsupported                         1     1     2     1    78    3.9
 3. Sitting unsupported                          0     0     0     0    83    4.0
 4. Standing to sitting                          0     0     1    32    50    3.6
 5. Transfers                                    0     1     2    25    55    3.6
 6. Standing with eyes closed                    1     2     2     2    76    3.8
 7. Standing with feet together                  6     4     4     5    64    3.4
 8. Reaching forward with outstretched arm       9     4     5    30    35    2.9
 9. Retrieving an object from floor              5     1     0     3    74    3.7
10. Turning to look behind                       4     1     9    10    59    3.5
11. Turning 360°                                 4     7    34    11    27    2.6
12. Placing alternate foot on stool             17     6     5    14    41    2.7
13. Standing with 1 foot in front               34     0     8    33     8    1.8
14. Standing on 1 foot                          11    54     5     6     7    1.4
Total                                           92    82    79   201   708
Step 2. Step 2 is back-translation of the Norwegian version of the BBS into English: the preliminary version was given to 2 native English speakers who were experienced translators, each producing a translation into English. These translators were unaware of both the methodology and the aims of the study.

RESULTS
A total of 83 patients (25 men, 58 women; mean age ± standard deviation, 82±5.5y; range, 69-95y) were included. The mean values of the BBS scored by the 2 evaluators were 44.4±8.6 and 44.3±8.6, respectively. The items are presented in table 1. There was a significant negative relation between age and the BBS sum score (r=-.36) and between age and the items sitting to standing (r=-.24), standing with feet together (r=-.24), reaching forward with outstretched arm (r=-.24), turning to look behind (r=-.27), turning 360° (r=-.41), placing alternate foot on stool (r=-.31), and standing on 1 foot (r=-.28) for both evaluators. The sum score of the BBS was similar for men and women.

Distribution
The sum scores ranged from 12 to 56 for both evaluators. Two persons got the top sum score (56) on the 14 items, and nobody got the sum score of 0. Table 1 displays the frequency distribution for the scores of the 14 items. Some rating categories were not used at all, and others were used very sparingly. In total, each evaluator completed 1162 scores (see table 1). The score values 0, 1, 2, 3, and 4 were used 7.9%, 7.1%, 6.8%, 17.3%, and 60.9% of the time, respectively. The items standing with 1 foot in front and standing on 1 foot had the lowest mean scores, indicating a greater degree of difficulty.

Reliability and Construct
The extent of agreement (κ) between scores for each of the 14 items obtained by both evaluators was excellent (table 2). The κ value ranged from 0.83 to 1.00, and the mean κ was .94. The evaluators scored differently on only 17 occasions out of the total 1162 scores (1.5%).
The largest score difference was 2, which was related to turning to look behind. The ICC between the 2 raters for the BBS sum score was .988 (95% confidence interval, .966-.999).

Table 2: Reliability Coefficient (κ) for Each Item of the BBS (N=83)

Item                                             κ
 1. Sitting to standing                          0.95
 2. Standing unsupported                         1.00
 3. Sitting unsupported                          *
 4. Standing to sitting                          0.85
 5. Transfers                                    0.97
 6. Standing with eyes closed                    1.00
 7. Standing with feet together                  0.94
 8. Reaching forward with outstretched arm       1.00
 9. Retrieving object from floor                 0.94
10. Turning to look behind                       0.83
11. Turning 360°                                 0.97
12. Placing alternate foot on stool              0.98
13. Standing with 1 foot in front                0.96
14. Standing on 1 foot                           0.88

*Everyone scored 4.
Rating categories 0 and 1 are merged because the 2 score levels were not used by both evaluators.

Table 3: Results From Factor Analyses of the BBS

                                                 Name of the Factor
Item                                             Changing    Maintaining    From Broad to Narrow
                                                 Position    Position       Base of Support
 1. Sitting to standing                          .80
 2. Standing unsupported                                     .78
 3. Sitting unsupported*
 4. Standing to sitting                          .77
 5. Transfer                                     .86
 6. Standing with eyes closed                                .78
 7. Standing with feet together                              .58
 8. Reaching forward with outstretched arm                   .61
 9. Retrieving an object from floor                          .70
10. Turning to look behind                                   .59
11. Turning 360°                                 .65
12. Placing alternate foot on stool              .86
13. Standing with 1 foot in front                                           .85
14. Standing on 1 foot                                                      .43

*Everyone scored 4 (not entered in the factor analysis).
Item load scored below 0.4 on the factors.

Factor analysis on the 14 items of the BBS gave 3 factors with eigenvalues greater than 1. Together the 3 factors accounted for 65% of the matrix variance (30%, 26%, and 9%, respectively). The first factor, which we decided to call changing position, consisted of the items sitting to standing, standing to sitting, transfers, turning 360°, and placing alternate foot on stool (table 3).
The second factor, which we called maintaining the position, contained the items standing unsupported, standing with eyes closed, standing with feet together, reaching forward with outstretched arm, retrieving an object from floor, and turning to look behind. The third factor, which we called from broad to narrow base of support, covered the items standing with 1 foot in front and standing on 1 foot.

The Cronbach α coefficient of the BBS sum score was .87. The correlation matrix, calculated for the 14 items and item-to-total correlation, is presented in table 4. A correlation coefficient could not be computed for item 3 (sitting unsupported) because the scores did not vary. The significant item-to-item correlations range from r=.15 to r=.87. Except for 2 items, the item-to-total correlations for all items were higher than .40.

Table 4: Correlations Between Items and Between the BBS Sum Score and Items

Item  1     2     4     5     6     7     8     9     10    11    12    13    14
1     1
2     .26*  1
4     .67   .16   1
5     .87   .31   .68   1
6     .44   .31   .33   .35   1
7     .55   .41   .42   .56   .51   1
8     .55   .28*  .34   .46   .37   .48   1
9     .37   .40   .27*  .42   .43   .40   .37   1
10    .51   .26*  .50   .49   .41   .45   .37   .47   1
11    .60   .32   .44   .58   .41   .58   .58   .44   .42   1
12    .68   .25*  .59   .74   .31   .59   .53   .41   .54   .70   1
13    .17   .23*  .23*  .24*  .15   .14   .09   .04   .21   .10   .12   1
14    .35   .33   .37   .36   .31   .42   .38   .42   .41   .43   .51   .09   1
Sum   .76   .33   .65   .74   .41   .64   .69   .47   .56   .80   .81   .37   .60

Abbreviations: 1, sitting to standing; 2, standing unsupported; 3, sitting unsupported; 4, standing to sitting; 5, transfers; 6, standing with eyes closed; 7, standing with feet together; 8, reaching forward with outstretched arm; 9, retrieving object from floor; 10, turning to look behind; 11, turning 360°; 12, placing alternate foot on stool; 13, standing with 1 foot in front; 14, standing on 1 foot; Sum, sum score of the BBS.
*P<.05. †P<.01.

DISCUSSION
All the κ values in the present study were above .82. Considering that κ values greater than .75 signify excellent agreement, 18,21,25 our study shows excellent interrater reliability when using the BBS to assess balance of patients in geriatric rehabilitation. These findings fit well with the results in other studies. 5,12-17 The generalization of the results is strengthened by the varied clinical characteristics of the subjects and the lack of control of the test conditions (see participant description in the Methods section). The raters had used the BBS for several years, and the test cannot be assumed to be as reliable with less experienced health care professionals.

The Cronbach α measure was high (.87), indicating strong internal consistency. This finding confirms that BBS items describe a homogeneous variable, in line with results from the original version of the BBS. 25 The primary advantage of having multiple homogeneous items in the BBS is that they provide a basis for a more consistent estimate of the ability of subjects to balance. Most of the item-to-total correlation coefficients are above the critical value of .40 (see table 4). Although some item relations showed fairly high correlation (see table 4), none had a correlation coefficient exceeding .90, so the items were not so highly related as to be redundant. 5

The BBS assesses both static and dynamic aspects of balance, 26 as shown in table 3. Our factor analysis indicates that 3 factors have emerged (see table 3). The first factor (changing position) addresses the ability to maintain balance when changing position(s). The second factor (maintaining position) relates to maintaining the same position with a broad base of support.
The third factor (from broad to narrow base) is related to maintaining balance with a narrow base of support when starting in a position with a broad base of support. Factor analyses in other studies have shown that only 1 or 2 factors have emerged, 16 and a possible reason for this discrepancy could be that our study population was less heterogeneous than the study population of Ottonello et al. 16 In our study, the mean BBS value (44.4) was higher than that reported by Ottonello et al. 16 This difference is probably associated with a lower level of function and more impairments in the Ottonello study.

Floor and ceiling effects have been shown in other studies. 14,15 However, in our study, no significant ceiling or floor effects were seen. A distinct feature of the BBS in our study was that some ratings were not used at all or were underused (see table 1). We found no variability between patients in the item sitting unsupported, which corresponds with the experience of Ottonello 16 and Berg 12 and colleagues, who reported that more than 90% had a top score on this item, indicating a very low degree of difficulty. By condensing item-rating categories, we could eliminate underused categories and construct categories that better separated people of differing abilities. The score values 3 and 4 were used significantly more frequently than the score values 0, 1, and 2, indicating that 3 levels might be better than 5 levels in our population. This is supported by the results of Kornetti 4 and Wang 27 and colleagues.

The BBS is frequently used, 4 also in Norway. Our study has shown that the Norwegian version of the BBS has excellent interrater reliability and internal consistency. Thus, Norwegian researchers may participate in multicenter international clinical trials that use this instrument. In addition, studies performed by using the Norwegian version of the BBS can safely be included in review articles and meta-analyses.
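The distribution figures discussed above can be rechecked directly from the counts in table 1. The sketch below is illustrative only: the variable and function names are hypothetical, the counts are transcribed from table 1 (1 evaluator, N=83), and the 20% cut-off is the one quoted in the Statistical Analysis section. It recomputes how often each score value was used and the share of patients at the floor (sum score 0) and ceiling (sum score 56).

```python
# Counts of score values 0..4 per item, transcribed from table 1.
TABLE_1 = {
    1: (0, 1, 2, 29, 51),
    2: (1, 1, 2, 1, 78),
    3: (0, 0, 0, 0, 83),
    4: (0, 0, 1, 32, 50),
    5: (0, 1, 2, 25, 55),
    6: (1, 2, 2, 2, 76),
    7: (6, 4, 4, 5, 64),
    8: (9, 4, 5, 30, 35),
    9: (5, 1, 0, 3, 74),
    10: (4, 1, 9, 10, 59),
    11: (4, 7, 34, 11, 27),
    12: (17, 6, 5, 14, 41),
    13: (34, 0, 8, 33, 8),
    14: (11, 54, 5, 6, 7),
}
N_PATIENTS = 83
N_AT_CEILING = 2   # patients with sum score 56, as reported in Results
N_AT_FLOOR = 0     # patients with sum score 0, as reported in Results
THRESHOLD = 20.0   # percent; larger effects are considered significant


def category_usage(table):
    """Return total counts and rounded percentages for score values 0..4."""
    totals = [sum(row[score] for row in table.values()) for score in range(5)]
    grand_total = sum(totals)
    percents = [round(100 * t / grand_total, 1) for t in totals]
    return totals, percents


totals, percents = category_usage(TABLE_1)
ceiling_pct = round(100 * N_AT_CEILING / N_PATIENTS, 1)
floor_pct = round(100 * N_AT_FLOOR / N_PATIENTS, 1)
# Mean sum score implied by the table: all awarded points divided by patients.
mean_sum = round(sum(s * t for s, t in enumerate(totals)) / N_PATIENTS, 1)

print(totals)        # [92, 82, 79, 201, 708]
print(percents)      # [7.9, 7.1, 6.8, 17.3, 60.9]
print(ceiling_pct, floor_pct, ceiling_pct > THRESHOLD)  # 2.4 0.0 False
print(mean_sum)      # 44.3
```

Under these counts the ceiling share (2.4%) and floor share (0%) both fall well below the 20% cut-off, and the score values 3 and 4 together account for 909 of the 1162 ratings.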
CONCLUSIONS
The Norwegian version of the BBS appears to have excellent interrater reliability and high internal consistency when used by experienced physiotherapists on patients in geriatric rehabilitation.

References
1. Boulgarides LK, McGinty SM, Willett JA, Barnes CW. Use of clinical and impairment-based tests to predict falls by community-dwelling older adults. Phys Ther 2003;83:328-39.
2. Wade DT. Measurement in neurological rehabilitation. Oxford: Oxford Univ Pr; 1992.
3. Berg KO, Wood-Dauphinee SL, Williams JI, Maki B. Measuring balance in the elderly: validation of an instrument. Can J Public Health 1992;83(Suppl 2):S7-11.
4. Kornetti DL, Fritz SL, Chiu YP, Light KE, Velozo CA. Rating scale analysis of the Berg Balance Scale. Arch Phys Med Rehabil 2004;85:1128-35.
5. Berg KO, Wood-Dauphinee SL, Williams JI, Gayton D. Measuring balance in the elderly: preliminary development of an instrument. Physiother Can 1989;41:304-11.
6. Shumway-Cook A, Baldwin M, Polissar NL, Gruber W. Predicting the probability for falls in community-dwelling older adults. Phys Ther 1997;77:812-8.
7. Thorbahn LD, Newton RA. Use of the Berg Balance Test to predict falls in elderly persons. Phys Ther 1996;76:576-85.
8. Bohannon RW, Leary KM. Standing balance and function over the course of acute rehabilitation. Arch Phys Med Rehabil 1995;76:994-9.
9. Coweley A, Kerr K. A review of clinical balance tools for use with elderly populations. Crit Rev Phys Rehabil Med 2003;15:167-205.
10. Whitney SL, Poole JL, Cass SP. A review of balance instruments for older adults. Am J Occup Ther 1998;52:666-71.
11. Harris JE, Eng JJ, Marigold DS, Tokuno CD, Louis CL. Relationship of balance and mobility to fall incidence in people with chronic stroke. Phys Ther 2005;85:150-8.
12. Berg KO, Wood-Dauphinee SL, Williams JI. The balance scale: reliability assessment with elderly residents and patients with acute stroke. Scand J Rehabil Med 1995;27:27-36.
13. Tyson SF, DeSouza LH. Reliability and validity of functional balance tests post stroke. Clin Rehabil 2004;18:916-23.
14. Mao HF, Hsueh IP, Tang PF, Sheu CF, Hsieh CL. Analysis and comparison of the psychometric properties of three balance measures for stroke patients. Stroke 2002;33:1022-7.
15. Norén AM, Bogren U, Bolin J, Stenstrøm C. Balance assessment in patients with peripheral arthritis: applicability and reliability of some clinical assessments. Physiother Res Int 2001;6:193-204.
16. Ottonello M, Ferriero G, Benevolo E, Sessarego P, Dughi D. Psychometric evaluation of the Italian version of the Berg balance scale in rehabilitation inpatients. Eur Med Phys 2003;39:181-9.
17. Liston RA, Brouwer BJ. Reliability and validity of measures obtained from stroke patients using the Balance Master. Arch Phys Med Rehabil 1996;77:425-30.
18. Fleiss JL. The design and analysis of clinical experiments. New York: John Wiley & Sons; 1986. p 1-32.
19. Richman J, Makrides L, Prince B. Research methodology and applied statistics. Physiother Can 1980;32:253-7.
20. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991.
21. McCluggage WG, Bharucha H, Caughley LM, et al. Interobserver variation in the reporting of cervical colposcopic biopsy specimens: comparison of grading systems. J Clin Pathol 1996;49:833-5.
22. Cronbach LJ. Coefficient alpha and the internal structure of test. Psychometrika 1951;16:297.
23. Streiner DL, Norman GR. Health measurement scales. New York: Oxford Univ Pr; 1989.
24. Lin YH, Chen CY, Chiu PK. Cross-cultural research and back-translation. Sport J 2005;8:1-10.
25. Portney JG, Watkins MP. Foundations of clinical research. Upper Saddle River: Prentice Hall Health; 2000.
26. Berg KO, Maki BE, Williams JI, Holliday PJ, Wood-Dauphinee SL. Clinical and laboratory measures of postural balance in an elderly population. Arch Phys Med Rehabil 1992;73:1073-80.
27. Wang CH, Hsueh IP, Sheu CF, Yao G, Hsieh CL. Psychometric properties of 2 simplified 3-level balance scales used for patients with stroke. Phys Ther 2004;84:430-8.

Supplier
a. Version 13.00; SPSS Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60606.