Melinda H. Tarbell, Alexis D. Henry, Wendy J. Coster. 324 May/June 2004, Volume 58, Number 3

Psychometric Properties of the Scorable Self-Care Evaluation Melinda H. Tarbell, Alexis D. Henry, Wendy J. Coster OBJECTIVE. The purpose of this study was to examine evidence for the reliability and validity of the Scorable Self-Care Evaluation (SSCE), an 18-item assessment of observed and perceived self-care performance commonly used with persons with psychiatric disabilities. METHOD. As part of a longitudinal study, 70 adults with psychiatric disabilities were administered two cognitive measures, the Wisconsin Card Sorting Test and the Logical Memory subscales of the Weschler Memory Scale, at baseline, and the SSCE at follow-up. After transforming the weighted item scores, intraclass correlation coefficients were used to examine inter-rater reliability and Rasch analysis was used to examine internal consistency of the SSCE. Spearman rank-order correlations were used to examine construct validity. RESULTS. High interrater reliabilities were found for the four subscale scores (ICCs ranging from.96 to 1.00, p <.001) and the total scores (ICC =. 98, p <.001) of the SSCE. Rasch analysis indicated that no items misfit; however, some items showed a weak distribution across all possible scores. The SSCE subscale and total scores correlated to varying degrees with the cognitive measures. CONCLUSION. The SSCE has the potential to be a reliable and valid clinical measure, as demonstrated by the results of the current study. However, these results were only achieved using a transformation of the current scoring system for the SSCE, pointing to the need for further revision of the test items and scoring system. Tarbell, M. H., Henry, A. D., & Coster, W. J. (2004). Psychometric properties of the scorable self-care evaluation. American Journal of Occupational Therapy, 58, 324 332. Melinda H. Tarbell, MS, OTR/L, is Occupational Therapy Clinician, Acute Care Rehabilitation Department, Concord Hospital, Concord, New Hampshire 03301; mtarbell@highstream.net Alexis D. Henry, ScD, OTR/L, FAOTA, is Assistant Professor of Occupational Therapy, Department of Rehabilitation Sciences, Sargent College of Health and Rehabilitation Sciences, Boston University, Boston, Massachusetts, and Adjunct Research Assistant Professor of Psychiatry, Center for Mental Health Services Research, University of Massachusetts Medical School, Worcester, Massachusetts. Wendy J. Coster, PhD, OTR/L, FAOTA, is Associate Professor and Director of Occupational Therapy Programs, Department of Rehabilitation Sciences, Sargent College of Health and Rehabilitation Sciences, Boston University, Boston, Massachusetts. Independent functioning in the community requires mastery of a variety of basic and instrumental activities of daily living. Basic activities of daily living (ADL) include self-care tasks such as dressing, eating, grooming, and toileting. Additional activities required for independent living in the community are often referred to as instrumental activities of daily living (IADL), and include activities such as money and household management, shopping, and community mobility (Rogers & Holms, 2003). Some argue that functioning is also a measure of social performance, including age appropriate tasks such as working, attending school, forming relationships, and socializing with peers (Kuntz, 1995). ADL and IADL performance is linked to and may be affected by cognitive functioning (DePoy & Gitlow, 1998). Research has shown a relationship between measures of cognition and measures of ADL and IADL capacity among individuals with varying disabling conditions (e.g., Allen & Blue, 1998; Diamond, Felsenthal, Macciocchi, Butler, & Lally-Cassady, 1996; Neistadt, 1994). There is considerable evidence that cognitive impairments often accompany serious mental disorders such as schizophrenia, bipolar disorder, and major depression, placing individuals with these disorders at risk for difficulties in daily functioning (Green, 1996, 2000; Schretlen, et al., 2000). Specific cognitive impairments associated with mental disorders include difficulties with memory, 324 May/June 2004, Volume 58, Number 3

attention, reaction time, problem-solving, processing speed and capacity, concept formation, sensory-motor processing, and integration of sensory stimuli (Kern & Green, 1998). Relationships between cognitive impairments and ADL/IADL functioning have been found in several recent studies of adults with serious mental disorders. (e.g., Evans, et al., 2003; Twamley, et al., 2002; Velligan, Bow-Thomas, Mahurin, Miller, & Halgunseth, 2000). Specifically, secondary verbal memory, immediate memory, vigilance, and executive functioning have emerged as predictors of community functioning in this population (Green, 1996; Green, Kern, Braff, & Mintz, 2000). People with serious mental disorders appear to be at a greater risk for difficulties with more complex community living skills, or IADL, than basic ADL (Kuntz, 1995). Thus, for a performance measure to be a useful predictor of community functioning, it must assess aspects of both ADL and IADL performance. There are limited empirical data to support that assessments conducted in a clinical setting can predict an individual s performance in his or her actual community (Brown, Moore, Hemman, & Yunek, 1996). Continued research is needed to identify reliable and valid measures that can be used to inform recommendations regarding independence in the community among these individuals. This paper investigates the psychometric properties of the Scorable Self-Care Evaluation (SSCE), which is a published measure of ADL and IADL capacities designed primarily for use with adults with psychiatric disabilities (Clark & Peters, 1993). The SSCE uses both observation and self-report methods and is intended to comprehensively and quantifiably assess a person s performance of self-care tasks. The SSCE includes 18 items distributed across four subscales: personal care; housekeeping; work and leisure; and financial management. There are an unequal number of items within each subscale. The items, the range of possible scores, and method of administration are shown in Table 1. The personal care subscale consists of five items designed to measure performance and knowledge in activities of daily living. The housekeeping subscale has four items designed to assess knowledge and skill in domestic tasks. The work and leisure subscale consists of three items to measure an individual s ability to use community resources. The six items included in the financial management subscale assess a person s ability to earn and use money appropriately. Prior research on the psychometric properties of the SSCE is limited. As shown in Table 1, the range of possible scores on individual items varies from 0 to 2 (for the procurement of supplemental income item) to 0 to 27 (for the first aid item). Zero reflects good performance or no errors. Summary scores are calculated for each of the four subscales. The four subscale scores are summed for a maximum total score of 160. Clark and Peters (1993) report percentile scores based on normative data from 30 males and 37 females without disabilities, which the authors maintain can be used to identify where an individual falls on a functional needs skill development dysfunctional continuum. Reliability of the SSCE was examined using this Table 1. Score Ranges and Testing Methods for the 18 Items of the Scorable Self-Care Evaluation. Testing Method Item Score Range Rater Task Self- Knowledge- Observation Completion Report Based Personal Care Items Initial appearance 0 8 X Orientation 0 7 X Hygiene 0 6 X Communication 0 6 X First aid 0 27 X Housekeeping Items Food selection 0 26 X Household chores 0 10 X Safety 0 6 X Laundry 0 15 X Work and Leisure Items Leisure activity 0 4 X Transportation 0 4 X X X Job-seeking skills 0 4 X Financial Management Items Making change 0 4 X Checking 0 11 X Paying personal bills 0 5 X Budgeting 0 10 X Procurement of supplemental income 0 2 X Source of income 0 6 X Note. A score of zero reflects good performance or no deficits or errors, whereas a higher score denotes an increasing number of deficits. The American Journal of Occupational Therapy 325

same sample of adults without disabilities. Although item to total correlations ranged from.07 to.65, subscale to total correlations for the 4 subscales were.52 to.75. Test retest reliability, calculated on 10 individuals from the sample over a 2-month period, ranged from.05 to 1.00 for subscale scores. Interrater reliability among five raters ranged from.50 to 1.00 for the subscale and total scores. No further validity or reliability data on the SSCE have been reported. As this summary makes clear, although the SSCE has relevant content for persons with psychiatric disabilities, the published psychometric evidence supporting the measure is not strong. Among the problems with the SSCE are the limited normative data and the fact that the data were not generated using a sample of adults with psychiatric disabilities, the primary population for whom the measure was intended. Also problematic is that the scoring system of the SSCE appears to lack any conceptual underpinning. That is, the authors provide no rationale for the varying ranges of possible scores across the items. Typically, when test items are weighted differently it is because certain items are believed to be more important to the underlying construct being measured than others. Thus the current scoring system implies that certain, high score items (e.g., first aid or food selection) are both more difficult and more important to self-care performance than other low score items (e.g., procurement of supplemental income or hygiene), yet no empirical or conceptual support for this implication is provided. Although psychometrics are not strong, the SSCE appears to have clinical value in that the items cover a range of relevant ADL and IADL activities for adults with psychiatric disabilities. Moreover, the SSCE has a structured set of performance-based items that can be quickly and inexpensively administered in a clinical setting. Accordingly, we undertook a more rigorous evaluation of the psychometric properties of the measure. Specifically, we examined interrater and internal consistency reliabilities and construct validity of the SSCE among a group of adults with psychiatric disabilities using both traditional and contemporary approaches. As an initial assessment of construct validity of the SSCE, we examined both convergent and discriminant validity. Convergent validity is typically established by examining how well a measure correlates with other measures of the same construct. Discriminant validity is established by demonstrating a lack of significant correlations with measures of unrelated constructs (Polgar, 2003). Because research indicates that deficits in ADL/IADL functioning among people with serious mental disorders are related to cognitive impairments, we expected to find moderate correlations between the SSCE and measures of higher order cognitive functioning. Methods Study Design The current study used data collected as part of a larger study examining the predictive validity of the Cognitive Disabilities Model (Henry, McCraith, & St. Germain, 2001). The larger study involved a prospective follow-up of adults with serious mental disorders, and included baseline and two waves of follow-up data collection. In the current study, we used data collected at baseline and at the first follow-up, which occurred 1 to 6 months after baseline. Participants The participants were 70 adults with serious mental disorders who were part of the larger study. Participants were recruited from three acute inpatient units in two general hospitals, one state hospital, and one day program in a community mental health center in a mid-sized New England city. Institutional Review Board approval for the study was obtained from all sites, and occupational therapists at each site recruited and obtained written informed consent from participants. The participants included 38 men and 32 women, ages 18 to 62 years (M = 36.89 years). Sixty-one (87.1%) participants were Caucasian and over half (n = 41, 58.6%) were never married. The primary diagnoses of the participants were schizophrenia (n = 23, 32.9%), bipolar disorder (n = 23, 32.9%), and major depression (n = 7, 10%). Seventeen participants (24.3%) had other diagnoses including post-traumatic stress disorder and dissociative identity disorder. Thirty-seven (52.9%) participants had at least some post-high school education, and the rest had a high school education or less. Only 19 (27.1%) participants were working for pay at any type of job (including supported employment) at or just prior to the time of study intake. Forty participants (57.1%) were living independently in the community at or just prior to study intake, while the rest were living in supported housing, group homes, or hospitals. Procedure Baseline measures were administered to participants within 1 month of study intake. Demographic and diagnostic data were collected by occupational therapists at each site via record review. Trained research assistants administered two well-known neurocognitive measures at baseline, the Wisconsin Card Sorting Test (Heaton, 1993) and the Logical Memory I and II subtests of the Wechsler Memory Scale (Wechsler, 1987). At 1 to 6 months after study intake, the research assistants administered the SSCE to participants. This follow-up data collection took place at each participant s desired location including private homes, the 326 May/June 2004, Volume 58, Number 3

research center where the study was housed, day programs, or hospital if the person was hospitalized at the time of follow-up. Measures The Wisconsin Card Sorting Test (WCST) (Heaton, 1993) is a widely used measure of higher order cognitive functioning. Higher WCST scores have predicted better community function (Jaeger & Douglas, 1992) and work-related behaviors (Lysaker, Bell, & Beam-Goulet, 1995) among people with serious mental disorders. The WCST requires participants to match cards that vary in the color, shape, and number of symbols they display to four key cards. Several scores are generated reflecting correct responses, errors and the ability to switch and maintain conceptual set. For this study, only the number of correct categories score from the WCST was used. Scores range from 0 to 6, with a high score denoting better performance. Research assistants were trained in the administration of the WCST by a consulting neuropsychologist; a computer program was used to generate scores. The Logical Memory I (LM-I) and II (LM-II) subtests of the Wechsler Memory Scale-Revised (Wechsler, 1987) were used to assess memory functions. The LM-I and LM- II measure immediate and delayed recall, respectively. Participants are asked to retell each of two brief stories; a higher score indicates better recall of details. Wechsler (1987) reported average internal consistency and test retest reliability scores and strong interrater reliability scores for LM-I and LM-II subscales. The research assistants were trained to administer the LM-I and LM-II by the consulting neuropsychologist, and interrater reliability of the assistants rating 6 subjects was examined using intraclass correlation coefficients (ICCs =.98 for LM-I and.94 for LM-II). The Scorable Self-Care Evaluation (SSCE) (Clark & Peters, 1993), described in detail above, takes approximately 45 minutes to administer. Research assistants were trained in the administration of the SSCE by the principal investigator for the larger study. As noted previously, in the original scoring system for the SSCE certain items with many scale points carry much more weight than others. Because there is no conceptual or psychometric basis for this approach, we standardized item scores to weight each item equally, as described below. Data Analysis Summary data on demographic and diagnostic variables were generated using the Statistical Analysis System (SAS), version 8.2 (SAS Institute, 2001). The initial step in conversion of the SSCE data was to transform all item raw scores to T scores (mean = 50, SD = 10) to create a uniform scale across the 18 items. These T scores were used to examine the psychometric properties of the SSCE. Pilot data were collected while training the research assistants in the administration of the SSCE for the larger study. These data were used to examine inter-rater reliability among the four research assistants, who rated the same 5 individuals on the SSCE. Reliability was examined using intraclass correlation coefficients (ICC 2,1) (Shrout & Fleiss, 1979). Internal consistency of the SSCE, including indices of item difficulty, redundancy, distribution, and misfit characteristics were examined using the Rasch item response model (Wright & Stone, 1979) with the MiniStep computer program (Linacre, 2001). The SSCE authors indicate that the 18 items delineate a single construct measuring performance of self-care tasks. The Rasch model assesses how well the items fit this unidimensional continuum. Items that fit well are good representations of the underlying construct, while items that misfit may be poor representations or may represent a different construct altogether (Bond & Fox, 2001). In order to perform the Rasch analysis, we needed to further transform the T scores. Because Rasch analysis requires whole numbers, each T score was rounded to the nearest multiple of 10 and truncated to the second digit to reduce the values to a single-digit ordinal scale (L. H. Ludlow, personal communication). For example, a T score of 52 was transformed to a score of 5. Rasch analysis converts these ordinal data to an interval level continuum. We used the partial credit model, which does not require an assumption that scoring categories have the same interpretation from one item to the next (Wright & Masters, 1982). The Rasch analysis generated mean square fit statistics (MnSq) and a variable map. The Infit MnSq is used to examine the extent of randomness present in the scale as a test of item and person fit. The ideal value for the Infit MnSq is 1.0, and values above 1.4 are identified as misfit (Linacre, 2001). Misfitting values indicate that the item or person does not perform as predicted by the probabilistic model generated from the analysis. Infit MnSq values less than 0.6 indicate redundancy in the model (Bond & Fox, 2001). The variable map translates the total score on a particular scale into a profile of probable item responses that are located on a 0 100 criterion-referenced continuum of function (Coster, Ludlow, & Mancini, 1999). The 0 100 continuum is a transformation of the Rasch logit estimates. The items are represented in order of increasing difficulty, with a unique scaling distribution for each item. Since a lower score indicates better performance on the SSCE, the variable map described below (see Figure 1) shows that scores (truncated T scores) closer to the left are more difficult to obtain and scores farther to the right are easier (Bond The American Journal of Occupational Therapy 327

& Fox, 2001; Coster, Denny, Haltiwanger, & Haley, 1998). Lastly, in order to examine convergent and discriminant (construct) validity of the SSCE, we calculated Spearman rank-order correlation coefficients between the baseline demographic, diagnostic and neurocognitive data and the SSCE subscale and total scores. Results Table 2. Item Calibrations and Fit Statistics. Item Calibration Standard Error Infit Mnsq Zstd Hygiene 63.6 4.3 1.0-0.1 Supplemental income 61.9 4.1 1.0-0.1 Orientation 58.0 2.6 1.1 0.3 Job seeking skills 53.3 2.0 1.1 0.2 Initial appearance 53.3 1.6 1.4 1.2 Food selection 50.1 1.4 1.1 0.5 Making change 49.3 1.2 1.2 0.9 Paying bills 48.9 1.3 0.9-0.9 Chores 48.9 1.4 1.1 0.5 Laundry 48.5 1.3 0.8-1.4 Transportation 48.5 1.6 0.9-0.9 Safety 48.3 1.6 1.0-0.1 Source of income 47.7 1.3 1.1 0.7 Check writing 47.1 1.2 0.7-1.6 Communication 43.6 1.3 0.7-2.0 Leisure 43.6 1.3 1.3 1.6 Budget 43.3 1.5 1.1 0.7 First aid 42.3 1.4 1.0-0.1 Person reliability:.71 Item Reliability:.88 Person Real Separation: 1.56 Item Real Separation: 2.68 The interrater reliability coefficients were high for all four subscale (ICC s ranging from.96 to 1.00, p s <.001), and for the total scores (ICC =. 98, p <.001). Table 2 provides the item fit statistics, calibration (item difficulty), and standard errors for the 18 items of the SSCE. Item reliability was 0.88 and person reliability was 0.71. The total scores of 67 of the 70 participants were identified as valid, indicating that their scores fit the predictive model across the items. The MnSq statistics indicate that no items misfit, although, initial appearance approaches misfit (Mnsq = 1.4). Based on item calibrations, hygiene (63.6) was the easiest item, and first aid (42.3) was the most difficult. The variable map (Figure 1) shows the range and distribution of the truncated T scores (values ranging from 3 to 9) for each of the 18 items. These are spread along a 0 100 functional continuum, which is based on logits. The variable map shows that the ratings and distribution for each item are unique. For example, the scores on first aid span across a broad range on the continuum of function (1 to 72), while the scores on check writing discriminate people only within a narrow range (30 to 56). The distribution within each item also varies. It is harder to score a 4 on safety than on first aid, even though first aid is a more difficult item overall. Similarly, less change is needed to improve from a 7 to a 6 than from an 8 to a 7 on the making change item. First aid, safety, transportation, and food selection items showed the greatest distribution across the functional continuum. Hygiene and supplemental income showed a more narrow distribution and limited scores. Item Hygiene Supplemental income Orientation Job skills Initial appearance Food selection Make change Paying bills Chores Laundry Transportation Safety Income source Check writing Communication Leisure Budget First aid Continuum of Function Figure 1. The variable map shows the range and distribution of scores (truncated t scores) for each of the 18 items along a 0 100 continuum, which is based on logits. On this map, lower scores on the continuum indicate better functioning. Sixty-seven persons fit the data and were used in this analysis. 328 May/June 2004, Volume 58, Number 3

Spearman correlations between baseline demographic, diagnostic, and neurocognitive data and SSCE subscale and total scores are presented in Table 3. Means for the cognitive measures were 3.03 (SD = 2.33, range 0 6) for the WCST categories, 18.77 (SD = 7.69, range 0 25) for LM I, and 13.89 (SD = 7.60, range 0 25) for LM-II. Neither the SSCE subscale nor the total score was significantly correlated with age, education, or diagnosis. Women performed slightly better than men on the personal care subscale (r = -.29, p =.015). The correlation between financial management and independent living situation was weak (r = -.24, p =.05). However, the correlations between SSCE scores and working status at study intake were moderately strong for housekeeping (r = -.31, p =.01), financial management (r = -.35, p =.003), and total (r = -.34, p =.004) scores. The WCST correlated significantly with all SSCE subscales at a moderate level (r s = -.30 to -.34, p s <.05), except for work and leisure (r = -.22). The WMS-R LM-I and LM-II subscales, on the other hand, were most highly correlated with the housekeeping (r = -. 49 and -. 53, p =.0001) and work and leisure (r = -.32 and -.32, p =.009) subscales, and the SSCE total score (r = -.30 and -.35, p =.01). Discussion This study examined inter-rater and internal consistency reliabilities of the SSCE. For both types of reliability, Polgar (2003) suggests that acceptable reliability levels vary according to the purpose of the assessment. Measures used to make significant decisions about a person should have a minimum reliability level of 0.90, while measures that are intended to serve as a screening device should have a minimum level of reliability of 0.80. A minimum reliability level of 0.60 is generally considered unacceptable for clinical use, but is suggested as appropriate for a measure that is administered to a group, as in research. Clark and Peters (1993) stated that the SSCE was designed to obtain a baseline profile to screen individuals as well as guide treatment goals and discharge planning. The former purpose requires a minimum reliability level of 0.80, while the latter requires at least 0.90. However, the SSCE is intended to encompass a broad spectrum of performance, and a lower level of reliability may be tolerable. Interrater reliability was very strong, and is comparable to reliabilities previously reported by Clark and Peters (1993). This high interrater reliability was achieved in the context of significant training in the SSCE administration provided to the research assistants in this study. While the scoring criteria for most items are sufficiently defined in the SSCE manual, the principal investigator and the research assistants developed additional scoring criteria for specific items during training. Generalization of this reliability estimate is limited to raters who undergo similar training. Internal consistency reliability indicates the extent to which the items collectively measure the same underlying construct. The Rasch analysis indicated that the item reliability was 0.88, which is within the acceptable range for measurements used for descriptive or screening purposes and nears the criterion for measures used to make major decisions about an individual such as establishing discharge placement or service allocation (Polgar, 2003). The results further indicated that no items were excessively redundant or misfitting. In addition to providing data on how the items work together, Rasch analysis also describes the individual items. For example, hygiene does not appear to spread out sufficiently, as it is only represented by two points in this sample. Although it is not written as a dichotomous item, it appears that intermediate ratings are not distinctive enough to be utilized by raters. Items such as first aid, safety, food selection, transportation, and budgeting appear to capture Table 3. Correlations Between Demographics, Diagnosis, and Cognitive Measures and SSCE Subscale and Total Scores. Personal Care Housekeeping Work/Leisure Financial Total Demographics/Diagnosis Age.11 -.06 -.11.06.08 Gender -.29* -.06 -.02 -.18 -.20 Education -.06 -.12 -.18 -.06 -.09 Living Independently -.18 -.07 -.05 -.24* -.20 Working at Study Intake -.21 -.31 -.16 -.35** -.34** Diagnosis.21.18 -.02.15.18 Cognitive Measures WCST-Categories -.33** -.30* -.22 -.33** -.34** LM-I -.21 -.49*** -.32** -.27* -.30** LM-II -.29* -.53*** -.32** -.26* -.35** * p <.05 ** p <.01 *** p <.001 Note. Gender: 1 = male, 2 = female; Education: 0 = completed high school or less, 1= completed at least some college; Living Independently: 1 = yes, 0 = no; Working at Study Intake: 1 = yes, 0 = no; Diagnosis: 1 = psychotic disorder, 0 = nonpsychotic disorder. WCST-Categories = Wisconsin Card Sort Test-Categories Completed. LM-I and LM-II = Logical Memory I and II subtests of the Wechsler Memory Scale. For WCST-Categories, LM-I and LM-II, higher score = better performance. For SSCE subscales and total score, lower score = better performance. The American Journal of Occupational Therapy 329

most of the participants in the middle of their scales, which better approximate a bell-shaped curve. The total set of individual scores for these items is distributed relatively symmetrically (skewness < 1.0). Items with more scale points, such as first aid or safety, allow for more variability in the scoring and a more descriptive representation of performance in that area. In addition to reliability, this study also examined the construct validity of the SSCE by examining its correlations with demographics, diagnosis, and two well-established cognitive measures. In general, there was no relationship between age, education, or diagnosis and SSCE scores. However, people who were working at study intake tended to score better on the SSCE, and participants living independently performed better on the financial management subscale. Living independently and working immediately prior to study intake are likely to be indicators of higher functioning individuals. It may be that individuals who live independently are more familiar with financial management tasks, such as check writing and budgeting. Furthermore, women performed better on the personal care subscale; this is consistent with research that men with psychiatric disabilities tend to have slightly lower self-care skills (Chaves, Seeman, Mari, & Maluf, 1993; Paradiso & Robinson, 1998). As expected, the relationship between SSCE and cognitive measures generally was moderate across all subscales. The WCST-categories score was significantly related to all subscale and total scores on the SSCE except for the work and leisure subscale, and was most highly correlated with the financial management and personal care subscales. This cognitive measure is particularly sensitive to frontal lobe dysfunction, such as difficulty with problem solving and impoverished thinking (Morrison-Stewart, Williamson, Corning, & Kutcher, 1992). The financial management subscale items of making change, checking, paying personal bills, and budgeting, and the personal care subscale items of first aid and communication are performance-based and require problem-solving, organization, and decision-making skills to complete them successfully. In contrast, the items of the work and leisure subscale are posed in an interview format, and the directive questions require memory recall, but little problem solving is involved. Although other studies have found relationship between work functioning and WCST (Lysaker, Bell, & Beam-Goulet, 1995), this study did not support these findings. It is likely that the work and leisure items do not tap into the type of cognitive function that the WCST measures. The WMS-R LM-I and LM-II subscales, on the other hand, focus on measuring immediate and delayed recall, respectively. Memory functions such as information storage and retrieval, recall, and sequencing events are components of these subscales. The housekeeping subscale of the SSCE was most strongly related to the WMS-R LM-I and LM-II subscales. A closer examination of the items that comprise the housekeeping subscale reveals that they involve a higher degree of recall, sequencing, and memory organization functions. For example, food selection requires an individual to recall what he has eaten in the past 3 days. Household chores and laundry also require recall memory and sequencing of tasks. People with psychiatric disabilities that score poorly on the housekeeping subscale may have underlying problem in these cognitive functions. This preliminary psychometric research on the SSCE suggests that the measure has potential as a reliable and valid performance measure when used with persons with serious mental illness. For the most part, the scoring criteria are clearly defined and the measure can be administered in a reasonable amount of time, pointing to its potential clinical utility. Moreover, the Rasch analysis suggests that the task items fit a unidimensional construct. The significant relationships between the SSCE and cognitive measures highlight the validity of having a functional performance evaluation sensitive to the unique qualities of persons with serious mental illness. This study supports the literature that cognitive impairments accompany psychiatric disabilities. Furthermore, persons scoring lower on the cognitive measures tended to score lower on the SSCE, indicating that ADL and IADL performance may be dependent upon a person s cognitive status. The SSCE appears to uniquely capture functional limitations related to executive functioning, which supports the use of this measure with the population used in this study. The process of developing an assessment begins with a clear definition of the construct being assessed (DePoy & Gitlow, 1998). Clark and Peters (1993) describe the SSCE as a unique and useful instrument that measures an individual s performance of self-care tasks. However, they do not define how competent performance is manifest. On the SSCE, functional performance is evaluated through an individual s awareness of self-care norms, recall of self-care information, report of frequency of participation in selfcare activities and performance of self-care tasks. The items in an assessment should reflect a consistent operational definition of functional performance and, at present, the SSCE does not meet this standard. Currently, the SSCE is organized into four subscales with a combination of self-report, observation, information-based, and task-based items. The item difficulty information provided in the findings of the current study suggests that the self-report items tend to be easier than the performance-based items overall. Creating some harder self-report questions and restructuring the 330 May/June 2004, Volume 58, Number 3

items into a self-report subscale and a task completion subscale may be a more valid approach. Clark and Peters (1993) suggest that these tasks of the SSCE are important for community living. Although the measure incorporates activities such as budgeting, laundry, making change, and paying bills, other important activities may have been overlooked. For example, the community living training units provided by Johnson and Orichowskyj (1999) include IADLs not examined in the SSCE, such as consumer buying, banking, catalogue shopping, postal service, library service, and vacation planning. Updating the scoring criteria could enhance some of the existing items. For example, food selection does not use current nutrition standards as a basis for its scoring criteria, and budgeting does not use a current price guide for monthly expenses. Technological advances and societal changes since the development of the SSCE also need to be considered. For example, automated telephone systems are prevalent in current society, and an item to examine navigating through such a system may be an important addition to the measure. Revision of the SSCE should be guided by contemporary models of disablement. For example, the conceptual framework offered by the International Classification of Functioning, Disability, and Health (ICF) may offer guidelines for revising the SSCE to most meaningfully capture relevant activity performance and capacity among people with serious mental disorders (World Health Organization, 2002). The most significant limitation of the SSCE, however, is that the subscales and items are weighted unequally across the measure, and summing the components risks over- or under-emphasizing certain aspects of functioning that the measure intends to describe. In the current design heavily weighted items that are not necessarily more important have extra pull in subscale and total scores. Furthermore, raw summary scores indicate the number of items correctly performed and whether an individual s performance is higher or lower, but they do not provide exact interval measurements (Davies & Gavin, 1999). Thus the total score may be misleading as an overall statement of performance. For example, a total score of 66 may be a composite of high and low item scores, or it may be composed of consistent scores across the items (Merbitz, Morris, & Grip, 1989). In the present study, data were transformed in order to correct for unequal weighting of items. However, this procedure cannot realistically be applied in a clinical setting. The SSCE would clearly benefit from the development of a conceptually-sound rating scale that is consistent across items. The findings from this study may be used to revise the current SSCE. Based on the analyses presented in this paper, it is recommended that occupational therapists be very aware of the purposes, strengths, and limitations of the SSCE administered in a psychiatric clinical setting. Acknowledgments The authors thank Tara St. Germain, the project coordinator for the larger study from which these data were drawn, and Elizabeth Henrikson, for providing training to the research assistants in administration and scoring of the WCST, LMI and LMII. We also thank Gary Bedell and Larry Ludlow for their assistance with Rasch methodology, data transformation, and interpretation. This study was supported in part by a grant to Dr. Henry from the American Occupational Therapy Foundation. The study was completed in partial fulfillment of the Master of Science degree in occupational therapy, awarded to Mrs. Tarbell, at Boston University. References Allen, C. K., & Blue, T. (1998). Cognitive disabilities model: How to make clinical judgments. In N. Katz (Ed.), Cognition and occupation in rehabilitation: Cognitive models for intervention in occupational therapy. (pp. 225 279). Bethesda, MD: American Occupational Therapy Association. Bond, T. G., & Fox, C. M. (2001). Applying the Rasch Model. Fundamental measurement in the human sciences. Mahwah, NJ: Earlbaum. Brown, C., Moore, W. P., Hemman, D., & Yunek, A. (1996). Influence of instrumental activities of daily living assessment method on judgments of independence. American Journal of Occupational Therapy, 50, 202 206. Chaves, A. C., Seeman, M. V., Mari, J. J., & Maluf, A. (1993). Schizophrenia: Impact of positive symptoms on gender social role. Schizophrenia Research, 11, 41 45. Clark, E. N., & Peters, M. (1993). Scorable self-care evaluation, revised. Tucson, AZ: Therapy Skill Builders. Coster, W., Ludlow, L., & Mancini, M. (1999). Using IRT variable maps to enrich understanding of rehabilitation data. Journal of Outcome Measurement, 3, 123 133. Davies, P. L., & Gavin, W. J. (1999). Quantitative research series Measurement issues in treatment effectiveness studies. American Journal of Occupational Therapy, 53, 363 372. DePoy, E., & Gitlow, L. (1998). Cognitive rehabilitation research in occupational therapy: A critical review. In N. Katz (Ed.), Cognition and Occupation in Rehabilitation: Cognitive Models for Intervention in Occupational Therapy, (pp. 343 358). Bethesda, MD: American Occupational Therapy Association. Diamond, P. T., Felsenthal, G., Macciocchi, S. N., Butler, D. H., & Lally-Cassady, D. (1996). Effect of cognitive impairment on rehabilitation outcome. American Journal of Physical Medicine and Rehabilitation, 75, 40 43. Evans, J. D., Heaton, R. K., Paulsen J. S., Palmer, B. W., The American Journal of Occupational Therapy 331

Patterson, T., & Jeste, D. V. (2003). The relationship of neuropsychological abilities to specific domains of functional capacity in older schizophrenia patients. Biological Psychiatry, 53(5), 422 430. Green, M. F. (1996). What are the functional consequences of neurocognitive deficits in schizophrenia? American Journal of Psychiatry, 153(3), 321 330. Green, M. F., Kern, R. S., Braff, D. L., & Mintz, J. (2000). Neurocognitive deficits and functional outcome in schizophrenia: Are we measuring the right stuff? Schizophrenia Bulletin, 26(1), 119 136. Heaton, R. K. (1993). Wisconsin Card Sorting Test manual. Odessa, FL: Psychological Assessment Resources. Henry, A. D., McCraith, D., & St. Germain, T. L. (2001). The usefulness of the Cognitive Disabilities Model in predicting community functioning. Paper presented at the 81st American Occupational Therapy Association Annual Conference, Philadelphia, PA. Johnson, R. P., & Orichowskyj, R. M. (1999). Out in the world: A community living skills manual, second edition. Abstract retrieved April 25, 2001, from http://www.idyllarbor.com/ books/b304.htm Kern, R. S., & Green, M. F. (1998). Cognitive remediation in schizophrenia. In K. T. Mueser & N. Tarrier (Eds.), Handbook of social functioning in schizophrenia (pp. 342 354). Boston: Allyn & Bacon. Kuntz, C. (1995). Persons with severe mental illness: How do they fit into long-term care. Office of Disability, Aging and Long-Term Care Policy and U.S. Department of Health and Human Services. Retrieved July 28, 2002, from http://aspe.hhs.gov/daltcp/home.htm Linacre, J. M. Ministep Rasch Measurement, V3.3. Retrieved November 18, 2001, from http://www.winsteps.com Lysaker, P., Bell, M., & Beam-Goulet, J. (1995). Wisconsin Card Sorting Test and work performance in schizophrenia. Psychiatry Research, 56, 45 51. Merbitz, C., Morris, J., & Grip, J. C. (1989). Ordinal scales and foundations of misinference. Archives of Physical Medicine and Rehabilitation, 70, 308 312. Morrison-Stewart, S. L., Williamson, P. C., Corning, W. C., & Kutcher, S. P. (1992). Frontal and non-frontal lobe neuropsychological test performance and clinical symptomatology in schizophrenia. Psychological Medicine, 22, 353 359. Neistadt, M. E. (1994). A meal preparation treatment protocol for adults with brain injury. American Journal of Occupational Therapy, 48, 431 438. Paradiso, S., & Robinson, R. G. (1998). Gender differences in poststroke depression. Journal of Neuropsychiatry and Clinical Neurosciences, 10, 41 47. Polgar, J. M. (2003). Critiquing assessments. In E. B. Crepeau, E. S. Cohn, & B. A. Schell (Eds.), Willard & Spackman s occupational therapy (10th ed., pp. 299 313). Philadelphia: Lippincott Williams & Wilkins. Rogers, J. C., & Holm, M. B. (2003). Activities of daily living and instrumental activities of daily living. In E. B. Crepeau, E. S., Cohn, & B. A. Schell (Eds.), Willard & Spackman s occupational therapy, (pp. 315 339). Philadelphia: Lippincott Williams & Wilkins. Schretlen, D., Jayaram, G., Maki, P., Park, K., Abebe, S., & DiCarlo, M. (2000). Demographic, clinical, and neurocognitive correlates of everyday functional impairment in severe mental illness. Journal of Abnormal Psychology, 109, 134 138. SAS Institute. (1999 2001). Statistical Analysis System. Cary, NC: SAS Institute. Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420 428. Twamley, E. W., Doshi, R. R., Nayak, G. V., Palmer, B. W., Golshan, S., Heaton, R. K., et al. (2002). Generalized cognitive impairments, ability to perform everyday tasks, and level of independence in community living situations of older patients with psychosis. American Journal of Psychiatry, 159(12), 2013 2020. Velligan, D. I., Bow-Thomas, C. C., Mahurin, R. K., Miller, A. L., & Halgunseth, L. C. (2000). Do specific neurocognitive deficits predict specific domains of community function in schizophrenia? Journal of Nervous & Mental Disease, 188(8), 518 524. Wechsler, D. (1987). Wechsler Memory Scale Revised. New York: The Psychological Corporation. World Health Organization. (2002). International Classification of Functioning, Disability and Health. Geneva: World Health Organization. Wright, B. D., & Linacre, J. M. (1989). Observations are always ordinal; measurements, however, must be interval. Archives Physical Medicine and Rehabilitation, 70, 857 860. Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press. Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA Press. 332 May/June 2004, Volume 58, Number 3