A Computer Adaptive Testing Simulation Applied to the FIM Instrument Motor Component

Marcel P. Dijkers, PhD

From the Department of Rehabilitation Medicine, Mount Sinai School of Medicine, New York, NY. Supported by the National Institute on Disability and Rehabilitation Research, Office of Special Education and Rehabilitative Services, US Department of Education (grant nos. H133N50008, H133N ). No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the author(s) or upon any organization with which the author(s) is/are associated. Reprint requests to Marcel P. Dijkers, PhD, Mount Sinai School of Medicine, Dept of Rehabilitation Medicine, Box 1240, One Gustave Levy Pl, New York, NY, marcel.dijkers@mountsinai.org.

ABSTRACT. Dijkers MP. A computer adaptive testing simulation applied to the FIM instrument motor component. Arch Phys Med Rehabil 2003;84:

Objective: To determine whether computer adaptive testing (CAT) can be used to decrease the number of FIM instrument motor component items administered in assessing persons with spinal cord injury (SCI).

Design: For a CAT simulation, a 3-step algorithm was used to select 6 FIM items for each individual; items were selected according to the subject's motor ability as estimated by 2 initial items. Separate estimates of motor ability for admission, discharge, and follow-up data (plus combined time points) derived from 6 items were compared statistically with estimates derived from 14 items (walking and wheelchair mobility were split).

Setting: Records from the Spinal Cord Injury Model Systems (SCIMS).

Participants: Patients served by the SCIMS, for whom complete motor FIM information was available for rehabilitation admission (N = 5969), discharge (N = 5964), or follow-up at a first or later anniversary (N = 5176).

Interventions: Not applicable.
Main Outcome Measures: Similarity of mean, standard deviation, skewness, kurtosis, and Rasch reliability and separation of persons and items based on 6 and 13 items; intraclass correlation coefficient (ICC) for parallel estimates.

Results: Calibrations for FIM items and FIM steps differed for the 3 time points, but showed sufficient agreement (ICC, .90) that combined calibration was feasible. Means and other distribution characteristics differed minimally between the 6- and 13-item estimates. The person and item separations and reliabilities were somewhat lower and the mean measurement errors somewhat higher for the 6-item estimates, but only marginally so. ICCs between 6- and 13-item estimates were .95 or higher.

Conclusion: CAT can be used to reduce data collection time; the level of precision of estimates is minimally less than that provided by traditional assessment approaches.

Key Words: Outcome assessment (health care); Questionnaires; Rehabilitation; Reproducibility of results; Spinal cord injuries.

© 2003 by the American Congress of Rehabilitation Medicine and the American Academy of Physical Medicine and Rehabilitation

REHABILITATION RESEARCHERS and clinicians are increasingly confronted with the problem that fairly good measures of many patient or subject characteristics exist, but the time available to administer them is limited. A possible solution is individually customized tests containing carefully selected items that provide maximum information on the person to whom they are administered. New methods in fundamental measurement, and the wide availability of computers, facilitate such adaptive testing. This article shows the overall approach of computer adaptive testing (CAT). I used the FIM instrument to show how an algorithm for item selection and administration is developed.
If an instructor were to administer to a class of college students taking a calculus course a midterm test that consisted entirely of third-grade arithmetic problems, 2 predictable results would ensue: all students would get an A, and the instructor would not obtain any useful information about their knowledge of mathematics, either absolutely or relative to each other. Conversely, if a teacher used a college calculus examination as the year-end test for third-grade arithmetic, we would not be surprised to get a class full of disappointed (if not bewildered) students, all with a grade of F, while the teacher would gain no information about how much mathematics knowledge the year's instruction had produced. In situations like these, it is easy to see that the difficulty of a test should be matched to the ability of the test takers, or no useful information will result.

What holds true at the group level applies at the individual level. If the test contains no items uniquely attuned to the performance abilities of each level of test taker, the test takers cannot be differentiated. The arithmetic genius in third grade will get the same grade as the very good students (an A) unless the test contains items that only a mathematical prodigy can handle. The best and most efficient testing, of knowledge as well as of all other characteristics (eg, personality traits, functional ability), is done when the items have a level of difficulty that is approximately at the level of ability (strength of the characteristic, construct, or "latent trait") of the person being measured. 1,2 In that way, each additional item on the test or measure provides the maximum amount of information. If we know each item's difficulty level and we know which items a person passed and which ones he/she failed, a close estimate of the strength of the characteristic of interest is possible.
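The point that an item is most informative when its difficulty is close to the person's ability can be illustrated with the dichotomous Rasch model, in which the probability of passing depends only on the difference between ability and difficulty (both in logits). The sketch below is an illustration only and is not part of the study: it assumes pass/fail items rather than the 7-level FIM ratings, and the function names are hypothetical.

```python
import math

def pass_probability(ability, difficulty):
    """Dichotomous Rasch model: P(pass), with ability and difficulty in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def item_information(ability, difficulty):
    """Fisher information of one pass/fail item: p * (1 - p).

    It peaks at 0.25 when difficulty equals ability (p = 0.5) and falls
    toward 0 as the item becomes much too easy or much too hard.
    """
    p = pass_probability(ability, difficulty)
    return p * (1.0 - p)

def most_informative(ability, difficulties):
    """Return the difficulty in the list that is most informative for this person."""
    return max(difficulties, key=lambda b: item_information(ability, b))
```

For a person at 0 logits, an item at 0 logits carries information 0.25, whereas an item 4 logits harder carries less than 0.02, which is the calculus-for-third-graders situation in numerical form.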
This approach calls for individualized tests, rather than the one-size-fits-all approach typical of classical test theory, in which all subjects are administered exactly the same (lengthy) test or battery of tests. 2 We need an approach in which test difficulty is tailored to the (anticipated) skill level of the test takers. If no a priori basis exists for estimating that level, an algorithm can be used in which everyone starts off with an item of intermediate difficulty. Depending on a subject's success or failure on the first item, an easier or more difficult item is provided, until the appropriate zone of trait strength (eg, knowledge, functional ability, attitude) is reached. Next, the subject is given a

COMPUTER ADAPTIVE TESTING AND THE FIM INSTRUMENT, Dijkers
short series of additional items or questions at that level. This eliminates the chance of error and establishes that the subject's exact knowledge or ability level has been identified. 3,4

Theoretically, individualized testing can be done with paper-and-pencil tests (1 item administered at a time, the next item selected from a pool with known difficulty level). In practice that approach is impossible, especially when groups of subjects are measured at the same time (eg, school end-of-term examinations). The alternative is CAT, which has a 30-year history in education. 5-9 In the CAT approach, banks of true-false and multiple choice items are developed. Their difficulty level is determined by administering them to large numbers of students of the same grade level(s) as the eventual CAT test takers. Computer programs present these questions 1 at a time on the test takers' computer screens; the test takers respond by using the keyboard. Basing its logic pathway on the result for each item (pass or fail), an algorithm selects for each individual student easier or more difficult questions from the bank, until 1 of 2 conditions is met: (1) a prespecified number of questions has been administered (fixed test-length CAT), or (2) the student's competence level has been determined with a preset level of reliability. In the latter condition, the number of items administered may vary considerably between test takers, depending on the consistency of the test taker and other factors. 10

CAT has several advantages, the 2 most important of which are that total test length can be shortened (typically by 50%, sometimes by more) and that competence levels are known with much greater precision, not just for the middle band of students, but also for students with very low or very high skill (equiprecision).
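The generic CAT loop described above (start at intermediate difficulty, move to easier or harder items after each response, stop at a fixed length or at a preset precision) can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm used in this study: it assumes pass/fail items scored with the dichotomous Rasch model, selects the unused item closest in difficulty to the current ability estimate, and re-estimates ability by maximum likelihood. All names (`run_cat`, `estimate_ability`, the item bank) are hypothetical.

```python
import math

def pass_probability(theta, b):
    """Dichotomous Rasch model; theta (ability) and b (difficulty) in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, theta=0.0):
    """Maximum-likelihood ability estimate and its standard error.

    responses is a list of (difficulty, passed) pairs. Newton-Raphson steps
    are capped at 1 logit and the estimate is clamped to [-6, 6], because
    all-pass and all-fail patterns have no finite estimate.
    """
    for _ in range(100):
        probs = [pass_probability(theta, b) for b, _ in responses]
        info = sum(p * (1.0 - p) for p in probs)
        residual = sum(int(passed) - p
                       for (_, passed), p in zip(responses, probs))
        step = max(-1.0, min(1.0, residual / info))
        theta = max(-6.0, min(6.0, theta + step))
        if abs(step) < 1e-6:
            break
    probs = [pass_probability(theta, b) for b, _ in responses]
    info = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(info)

def run_cat(bank, answer, max_items=None, target_se=None):
    """Administer items from bank (name -> difficulty) one at a time.

    answer(name) returns True for a pass. The loop stops after max_items
    items (fixed test length), once the standard error is at or below
    target_se, or when the bank is exhausted.
    """
    remaining = dict(bank)
    responses, administered = [], []
    # Start at 0 logits, assumed here to be the middle of the item bank.
    theta, se = 0.0, float("inf")
    while remaining:
        if max_items is not None and len(administered) >= max_items:
            break
        if target_se is not None and se <= target_se:
            break
        # Next item: the unused one whose difficulty is closest to the estimate.
        name = min(remaining, key=lambda n: abs(remaining[n] - theta))
        difficulty = remaining.pop(name)
        responses.append((difficulty, answer(name)))
        administered.append(name)
        theta, se = estimate_ability(responses, theta)
    return theta, se, administered
```

With `max_items` set, the sketch behaves as a fixed test-length CAT; with `target_se` set, the number of items administered varies between test takers, as described for the second stopping condition.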
11 The higher test security, instant scoring, and immediate feedback to the student that computers make possible are considered to be part of the secondary benefits of CAT. 3,10,12 CAT is now used outside of education, for instance, in credentialing examinations of health professionals. 13

CAT is only possible because of another innovation that started in education: Rasch analysis of tests. Rasch theory and analysis focus on the individual test item, whereas classical test theory focuses on the entire test. The classical approach has given us a multitude of mechanisms to evaluate the quality of a test as a whole, such as the Cronbach α. Rasch analysis, by contrast, concentrates on the relation between the latent trait to be measured and the specific items assembled as presumptive indicators of that trait. The goal of the analysis is to determine if the items all reflect the same dimension, and to find for each item the likelihood of endorsement by people with varying amounts of the trait. The analysis typically results in estimates of item difficulties (termed calibrations), and estimates of the abilities of the persons to whom the test was administered. Rasch analysis also provides diagnostic indices that indicate the degree to which individual items do not belong in the test (eg, a geography question in a math test), and the degree to which individual persons do not belong in the group being tested (eg, the bright child who aces the arithmetic test, except for all questions involving fractions, because he/she was sick when that topic was covered).

In recent years, a number of introductory expositions, all nontechnical, of Rasch theory and analysis have been published in the medical rehabilitation literature. 16 For those who have grounding in statistics, there are more advanced accounts. 17,18 Rasch analysis has been used in rehabilitation research, but CAT has not.
The likely reason is that applications of Rasch analysis in rehabilitation have focused on functional assessment. Investigators have used it to analyze the output from functional assessment instruments that were already in wide use before the introduction of these new analytical techniques. These studies have shown that the activities of daily living (ADLs) included in the 3 most commonly used measures, the FIM instrument, the Level of Rehabilitation Scale (LRS), and the Patient Evaluation and Conference System (PECS), constitute a single dimension, and that the individual ADLs differ in difficulty level, from the easiest (generally: feeding) to the most difficult (always: climbing stairs). The number of ADLs included in these instruments is generally 15 or fewer, seemingly not enough to make it worthwhile to do complicated analyses to determine whether targeting item selection at the patient's ability level may lead to test shortening. Furthermore, FIM, LRS, and PECS data commonly are the product of a comprehensive assessment at the time of rehabilitation admission or discharge. In the production of such data, issues of clinical relevance, completeness of assessment, and turf division between disciplines are all interwoven, making it unlikely that CAT or a paper-and-pencil analog would be introduced.

However, other situations exist in which functional assessment instruments are used. Specifically, they are used in research. The only identified medical rehabilitation application of CAT 22 was a simulation intended to determine whether the motor FIM, as administered by means of a standardized questionnaire as part of research follow-up, can be shortened. The results of that study indicated that with spinal cord injury (SCI) subjects, a motor FIM consisting of 7 items can be used, instead of the standard 13, with only a minuscule loss of reliability.
The time savings from a typical 15- to 20-minute FIM interview may not be in proportion to the reduction in number of items, but even a 5-minute decrease is a gain. The same efficiency improvement may be achieved in studies in which the FIM is administered as a performance test: the subject is required to perform all items, at least in simulated fashion (eg, toileting). A recent study 23 indicated that in an SCI sample such testing took, on average, 60 minutes.

The purpose of the present study was to replicate the earlier simulation study, 22 but using rehabilitation admission and discharge data in addition to follow-up information, and with other modifications to improve on the results and show the possibility of CAT in rehabilitation.

METHODS

Source of Data
The National Spinal Cord Injury Database (NSCID) is a data set developed by the Spinal Cord Injury Model Systems (SCIMS) funded by the National Institute on Disability and Rehabilitation Research. 24,25 Currently, the NSCID has 16 contributing systems; the data set used in the present study includes cases submitted by some Model Systems that are no longer funded. NSCID Form I is used to collect epidemiologic, demographic, medical, and functional information as well as data on initial hospitalization in SCIMS hospital(s) before the patient is definitively discharged to the community. It includes the motor component of the FIM, as rated by therapy staff at admission to and discharge from the rehabilitation hospital or unit. NSCID Form II is completed annually, approximately on the anniversary of injury. It is used to collect information on the person's status as of the anniversary and events occurring in the preceding year. The motor FIM 26 is part of the data collected; the information is self-reported or proxy-reported in an interview by a data collector who uses the FIM Guide 27 or some other standard approach to ask questions that produce the information needed to determine FIM item scores.

Table 1: Results of the 14-Item (Primary) Motor FIM Rasch Analysis: Item Weights and Step Weights
Columns: single time-point calibrations (admission, discharge, follow-up) and combined time-points calibration. Item measures are given as logit and SE; step average measures as logit.
FIM items (letter): Feeding (A); Grooming (B); Bathing (C); Dressing: upper body (D); Dressing: lower body (E); Toileting (F); Bladder management (G); Bowel management (H); Bed/chair/WC transfer (I); Toilet transfer (J); Tub/shower transfer (K); Walking (L1); Wheelchair mobility (L2); Stairs (M).
FIM steps: Total assistance; Maximal assistance; Moderate assistance; Minimal assistance; Supervision or set-up; Modified independence; Complete independence.
Abbreviation: WC, wheelchair.

Case Selection
The data set available in September 1999 was used. Case selection was performed separately for admission, discharge, and follow-up FIM analyses. Each case included for the admission data analysis had complete (nonmissing) and valid information on all 13 FIM items, as well as valid information on the common mode of locomotion. (The SCIMS database does not adhere to the Uniform Data System rule that a FIM item is scored 1 if the person is not tested; a separate missing code is used. However, because the admission and discharge data in the present study were derived from clinical files, it is possible that some values of 1 indicate "not tested," rather than "total assistance.") I followed the same rules for the discharge and follow-up analyses. Motor FIM information was available for 5968 admissions and 5964 discharges. Follow-up information was available for 5176 anniversaries; some persons contributed multiple anniversaries. The analysis for all time points combined included 17,108 records.

Creation of 2 Locomotion Variables
Ability to move about by the mode that is most commonly used by the person (walk or wheelchair or both) is recorded on the FIM, listed in variable L, locomotion.
An auxiliary variable is used to indicate for which specific mode information is recorded in the L variable. In the current analysis, item L was split into L1 (walking) and L2 (wheelchair use) based on the value of the auxiliary variable; persons who used both means about equally were assigned the score for walking, because that is the more difficult item. Because walking and wheelchair use are perforce alternatives to one another, a person who has a valid code for walking is coded missing for wheelchair use, and vice versa.

Primary Rasch Analysis
The rating scale model is a Rasch analysis technique for investigating (1) the difficulty level of a scale's ordinal level items, (2) the discriminating qualities of the individual items and of the scale as a whole, (3) the fit of each item within the scale, and (4) the spread of items along the dimension they define. 15,18,28,29 Rasch analysis calibrates the scale items according to difficulty level, and the individual calibrations can be combined to form meaningful interval-level measures for precise statistical analysis. Besides providing item difficulty estimates and their standard error (SE), Rasch analysis also produces estimates of each person's ability level, along with an SE for that estimate. All parameters are scaled and reported in logits (log odds), but can be translated into any desirable units. For purposes of simplicity, the logit was retained as the unit in all analyses reported here.

I performed a first Rasch analysis on the entire 14-item motor FIM for all subjects by using the rating scale model. Basing my decisions on the results of this primary analysis (table 1), I developed an algorithm to select, for each case, appropriate items to be included in the shortened motor FIM.

Treatment of Data Collected at Separate Times
If separate Rasch analyses are performed for data collected at different times for the same test and sample, discrepant calibrations may result.
In effect, the ruler changes and it is not clear whether the different scores for the same person reflect real change. In a reasonable proposal for a simple algorithm, Chang and Chan 30 described 4 possible approaches to Rasch analysis in situations in which data for multiple time points are available and investigators are interested in over-time comparisons: (1) perform separate analyses of the data

obtained on different occasions; (2) pool the data and perform a single Rasch analysis; (3) pool the data and perform 1 Rasch analysis that considers the parallel items for different occasions to be distinct items (eg, when there are admission and discharge FIM data, each motor FIM item would appear twice); and (4) consider time a facet, and perform a 3-facet analysis (of item, person, occasion). Chang and Chan suggest that analysts start with the first alternative, and move to the second if the item calibrations for the various time points are reasonably similar, as indicated, for example, by an intraclass correlation coefficient (ICC) of .90 or more. For more discrepant calibrations, the third approach is suggested, with the fourth reserved for special situations. This strategy should result in a parsimonious approach that gives a reasonable chance that a constant ruler is used.

In the present study, I observed a reasonable similarity among calibrations (table 1, fig 1), so only the first 2 approaches were necessary. I call calibrations based on data for 1 time point single time-point calibrations. The calibration based on data for the 3 time points combined I call the combined time-points calibration.

Fig 1. Item measures and step average measures (in logits) for admission, discharge, and follow-up, based on single time-point calibrations.

Item Selection Based on Estimated Ability by Using an Algorithm
Because previous research 22 found that a 7-item motor FIM is adequate, and even a 5-item motor scale may be satisfactory, I decided to use a 6-item FIM instrument in the present investigation. Because walking and wheelchair locomotion have such different levels of difficulty (see table 1), especially at follow-up, separate algorithms were developed for walkers and wheelchair users (table 2).
Even though the calibrations for the 4 primary analyses (admission, discharge, follow-up, combined) differed somewhat, the overall similarity of the profiles (fig 1) suggests that there is no need for separate item sequences for each time point in the algorithm used to select items for a shortened motor FIM.

My general procedure (a variation on what Weiss 6 termed the pyramidal adaptive test algorithm) was the same in both instances. It is shown here for walkers (table 2). Bladder management (item G), an item of average difficulty (as determined in the primary Rasch analyses), was selected as the starting point. If a subject had high ability (raw score of 5 or higher) on bladder management, then bowel management (item H), a fairly difficult item, was selected next. If he/she again showed high ability (supervision/setup or better), the 4 most difficult items (J, K, L, M) were selected in addition to the initial 2. On the other hand, if on bladder management a person had a score of 4 or lower, indicating low motor ability, the algorithm next selected a fairly easy item, dressing upper body (item D). Based on her/his performance on this item, either an easy or a somewhat more difficult set of items was selected. Thus, 4 ability groups were distinguished by the algorithm, and for each walker and each wheelchair user an optimal set of items was selected in this manner. The item selection for the medium low and medium high groups was very similar, differing mostly because of the second item selected.

Secondary Rasch Analysis
The set of 6 items selected in this way was next analyzed with the program Bigsteps a by using the rating scale model. In

Table 2: Algorithm for Selecting a Set of 6 Motor FIM Items
Group      Ability Subgroup   First Item   Second Item   Additional Items
WC users   Low                G            L2            A B D I
WC users   Medium low         G            L2            C D E I
WC users   Medium high        G            H             C F J K
WC users   High               G            H             F J K M
Walkers    Low                G            D             A B C I
Walkers    Medium low         G            D             C E F I
Walkers    Medium high        G            H             E F J K
Walkers    High               G            H             J K L M
Abbreviations: FIM items, see table 1.
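The selection rules in table 2 can be sketched as a small lookup, shown here as an illustration rather than as the code used in the study. Several details are assumptions: the cutoff of 5 (supervision or set-up) is applied to the second item in the low branch as well, which the text does not state explicitly; item L in the walkers' high set is taken to mean L1 (walking); and the function and variable names are hypothetical.

```python
# Item sets transcribed from table 2; keys are (locomotion mode, ability subgroup).
# "L1" for the walkers' high subgroup is an assumption (table 2 lists "L").
ITEM_SETS = {
    ("wheelchair", "low"):         ["G", "L2", "A", "B", "D", "I"],
    ("wheelchair", "medium low"):  ["G", "L2", "C", "D", "E", "I"],
    ("wheelchair", "medium high"): ["G", "H", "C", "F", "J", "K"],
    ("wheelchair", "high"):        ["G", "H", "F", "J", "K", "M"],
    ("walker", "low"):             ["G", "D", "A", "B", "C", "I"],
    ("walker", "medium low"):      ["G", "D", "C", "E", "F", "I"],
    ("walker", "medium high"):     ["G", "H", "E", "F", "J", "K"],
    ("walker", "high"):            ["G", "H", "J", "K", "L1", "M"],
}

# Raw FIM score treated as "high ability" (5 = supervision or set-up).
# Using the same cutoff on the second item of the low branch is an assumption.
HIGH_CUTOFF = 5

def select_items(scores, mode):
    """Return (subgroup, 6 item letters) for one record.

    scores maps item letters to raw FIM scores (1-7); mode is "walker" or
    "wheelchair". Only the first item (G) and one second item are inspected.
    """
    if scores["G"] >= HIGH_CUTOFF:
        # High on bladder management: probe with the harder item H.
        subgroup = "high" if scores["H"] >= HIGH_CUTOFF else "medium high"
    else:
        # Low on bladder management: probe with an easier item.
        second = "D" if mode == "walker" else "L2"
        subgroup = "medium low" if scores[second] >= HIGH_CUTOFF else "low"
    return subgroup, ITEM_SETS[(mode, subgroup)]
```

For example, a walker scoring 6 on both G and H falls in the high subgroup and receives items G, H, J, K, L1, and M.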

these secondary analyses (1 for each time point, 1 for the combined time-points calibration), each item included in the 6-item FIM motor scales was anchored by using as anchoring values the results obtained from the primary analysis of the full set of 14 items for the corresponding time point, as displayed in table 1. Anchoring imposes a set of parameters on the calibrations produced by Rasch analysis. In the present case, the program was forced to take, for each FIM item, the difficulty level (averaged across the 7 rating scale steps) estimated in the full-set (14 motor FIM items) analysis, and for the difficulty of the item steps relative to each other, the estimates produced in the same analysis (see table 1).

Table 3: Mean and SD for All Motor FIM Items (Raw Scores), by Time Period
Columns: mean and SD at admission, discharge, and follow-up.
Rows (item letter): Feeding (A); Grooming (B); Bathing (C); Dressing: upper body (D); Dressing: lower body (E); Toileting (F); Bladder management (G); Bowel management (H); Bed/chair/WC transfer (I); Toilet transfer (J); Tub/shower transfer (K); Walking (L1); Wheelchair mobility (L2); Stairs (M); Motor FIM raw total*; No. of cases (except for L1, L2).
                                   Admission     Discharge     Follow-Up
No. of cases for walking (%)       681 (11.4)    1357 (22.8)   1263 (24.4)
No. of cases for WC mobility (%)   5287 (88.6)   4607 (77.2)   3913 (75.6)
* Includes walking or wheelchair mobility.

Analysis of Agreement Between 14- and 6-Item Motor FIM
The item and person reliability estimates (based on the real root mean square error), item and person separation, and mean error of ability estimates for persons, as calculated by the Bigsteps program for the secondary analysis, were used as indices of the 6-item FIM's potential to reproduce the full 14-item FIM's estimate of the person's motor ability. The person ability estimates produced by the secondary analysis were also compared with the estimates produced in the analysis of the full set, using the following 2 sets of statistics.
Mean, standard deviation, kurtosis, and skewness. These characteristics of the distribution of motor ability should be minimally affected by substituting a FIM items subset for the 14-item motor FIM.

Intraclass correlation coefficient. ICC measures the agreement between 2 estimates of a characteristic of subjects (as opposed to the correlation coefficient, which measures congruence only). ICC(3,1) was the most appropriate of the 6 ICC variations for use here; it assumes that the individual subset of motor FIM items is the unit of analysis (rather than a mean of multiple subsets), and that there is no generalization from this subset to others that potentially could be used. The ICC was calculated for all subjects combined and for 4 subgroups. The latter were created by dividing the total group of subjects into quartiles of about equal size, based on their 14-item motor FIM ability measures. A measure based on a good subset of FIM items is expected to have a high agreement with the 14-item measure, not only overall but also in each of the quartiles distinguished.

As indicated earlier, the reason I used combined time-points calibration was to make sure that over-time comparisons were valid, that is, that the same measuring stick was used. To determine whether estimates of change were affected by substituting 6 FIM items for the standard 13 (14 with the mobility item split), I calculated over-time changes (for individuals with multiple records in the database) in 2 ways: as a difference between 14-item motor FIM estimates based on the combined time-points calibration and as a difference between 6-item estimates. Depending on the available data, I calculated for each case the score differences between admission and discharge, between discharge and first anniversary, and between first and second anniversary of the injury.

Bigsteps was used for the primary and secondary Rasch analyses. All other statistics were calculated by using SPSS, version 10.0, b and a macro written for SPSS.
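For readers who want to see what ICC(3,1) computes, the following sketch derives it from the mean squares of a two-way layout (subjects by estimates), using the standard formula (MS_subjects - MS_error) / (MS_subjects + (k - 1) MS_error). It is an illustration in plain Python, not the SPSS macro used in the study, and the function name is hypothetical.

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed model, single measure.

    ratings is a list of rows, one per subject, each holding k estimates
    (here k = 2: the full-set and the 6-item ability estimate).
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of estimates per subject
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between estimates
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Called with one row per person, for example `icc_3_1([[full, short] for full, short in pairs])`, it returns the coefficient for a pair of parallel ability estimates; `pairs` is a hypothetical list of (full-set, 6-item) measures.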
RESULTS

The means and standard deviations (SDs) of the subjects' scores on the 14 motor FIM items are listed in table 3. These raw scores are provided so that readers have some basis for linking logit values to the more familiar FIM categories. Considering the differences in item difficulty calibrations (see table 1), one should be cautious about comparing means across time. Also, the cases are not necessarily the same individuals from 1 time point to the next. However, on almost every item, the mean score was higher at discharge than at admission, and modest additional gains were reflected in the follow-up scores. At admission, 681 persons (11.4% of the total) used walking as their primary means of locomotion. They scored 2.6 on average, whereas the average score for wheelchair users was 2.3. The percentage of subjects who used walking for locomotion increased to 22.8% at discharge.

Primary Rasch analysis results are in table 1. Feeding was the easiest item, and stairs the most difficult, by far. Walking was relatively difficult (typically, only persons with a complete SCI at a very low level or with motor-functional incomplete

injury regain walking), and wheelchair locomotion was relatively easy. The divergence between the 2 grew over time. Because of the large sample size, all items had small SEs. The difficulty calibrations of the various FIM rating scale categories, which had the expected increase from the easiest (total assistance) to the most difficult (complete independence), also had a small SE.

The single time-point calibrations are in figure 1. Because of the separate calibrations, items can only be compared with each other at the same time point, and not across time points. Despite differences for specific items, especially at the time of follow-up, the overall trends are very similar, for both item and step calibrations. The ICC (model 2,1) for admission, discharge, and follow-up calibrations was .91 for the items and .93 for the steps. This similarity of calibrations was the reason for developing just 1 item-selection algorithm (table 2) instead of 4 separate algorithms (for admission, discharge, follow-up, combined time-points data). However, I used both the single time-point and the combined calibrations (see table 1) to calculate a motor ability estimate based on the 6 items selected for each case.

Mean ability measures for the 4 subgroups of persons who used a wheelchair and of those who walked are in figure 2. For walkers, not much difference existed between the 2 intermediate groups, as was expected given the algorithm. In all instances, however, the low group differed clearly from the 2 medium groups, which in turn, on average, had lower motor ability than individuals in the high group. Results were very similar when means for the 14-item FIM were plotted.

Fig 2. Mean measures (based on 6-item FIM) for the 4 ability subgroups among wheelchair users (left) and walkers (right). Error bars (for admission and follow-up only) indicate the SD for each subgroup.
Table 4 shows how the 6-item motor FIM subset performed on the specified criteria for adequacy of estimating person ability. When using single time-point calibrations, the mean and SD as well as skewness and kurtosis of the original (14-item) distribution of person ability were well reproduced (columns 2, 4, 6 vs 1, 3, 5, respectively). Person reliability as estimated by the Rasch analysis program was almost as good for the 6-item FIM as for the full-length instrument. The mean (across cases) error in estimating motor ability was somewhat higher when only 6 items were used, but not dramatically so. Item reliability in both instances was perfect (1.00), although item separation decreased. The agreement between the original measure and the 6-item estimate as quantified by the ICC was .95 or higher. The agreement between the 2 was lowest within the third quartile, at all 3 time points. This was because the third quartile was the narrowest, at 1.02 logits or less between the upper and lower cutoff points, compared with at least 2 logits for the first and fourth quartiles.

When the data sets for the 3 time points were joined for a single calibration, the 6-item FIM estimate of motor ability also reproduced the 14-item motor FIM very well (see columns 7 and 8, table 4). Person reliability and ICC were marginally higher because of a wider spread of abilities being included (compare the row of SDs).

Because what is optimal for all records combined may not be optimal for the records from a single time point (table 1 and fig 1 indicate discrepancies in calibrations), the Rasch analysis of the data for 3 separate time points was repeated by using anchoring at the values produced by the combined time-points calibration. (Thus, values in table 4, columns 10, 12, and 14 result from an analysis that used anchoring in the calibration underlying column 7.)
The ICC and other analyses were repeated for the 3 time points separately, but with the ability estimates based on the combined calibration. These results are in columns 9 through 14 of table 4. The estimates for the mean, SD, kurtosis, and skewness were very similar to those produced with the single time-point calibrations; the same can be said for the ICC values. For reliability for persons and items, relatively small decreases were observed. (The small increase in item separation for the admission data is not easily explainable.) In fact, the basis for calibration selected did not make much difference at all. The last 6 columns of table 4 are based on the step and item calibrations produced by an analysis to which the admission, discharge, and follow-up data, respectively, contributed. If the ICC is calculated for the correspondence between the motor ability estimates at admission based on 6 versus 14 FIM items, but the calibrations resulting from the analysis of the discharge data are used to anchor calculation of the estimates, the ICC is .98. In fact, the lowest ICC value observed when using another time point's calibration results was .96 (which is higher than the ICC for follow-up data based on follow-up calibrations). Thus, the calibration divergences in table 1 and figure 1 are not very important: the overall hierarchy of items is largely the same, and minor differences affected motor ability estimates minimally.

The agreements (ICC) between the over-time changes, calculated by using calibration estimates from 14-item and 6-item combined time points, were quite high: .93 for the change from admission to discharge (n=5962 cases), .94 for the change from discharge to the first anniversary (n=1412), and .89 for the change from the first to the second anniversary (n=722).

Table 4: Indicators of the Quality of the 6-Item Motor FIM Instruments, by Calibration Basis
Columns, single time-point calibrations: Admission, 14-item (1) and 6-item (2); Discharge, 14-item (3) and 6-item (4); Follow-Up, 14-item (5) and 6-item (6).
Columns, combined time-points calibration: Combined, 14-item (7) and 6-item (8); Admission, 14-item (9) and 6-item (10); Discharge, 14-item (11) and 6-item (12); Follow-Up, 14-item (13) and 6-item (14).
Rows (indicators): Mean; SD; Skewness; Kurtosis; Rasch reliability, persons (Separation, Reliability); Mean measurement error; Rasch reliability, items (Separation, Reliability); ICC (All cases combined, 1st quartile, 2nd quartile, 3rd quartile, 4th quartile); No. of cases.
(Cell values are not available in this transcription.)
* Statistic cannot be calculated. All cases have the same value for the ability estimate based on the 14-item FIM and the 6-item FIM.
Six-item estimates are based on anchoring in 14-item calibrations; see text.
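As a rough sketch of how agreement between the 14-item and 6-item ability estimates could be quantified, the Python below computes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1). Which ICC form the study used is not stated in this section, so that choice is an assumption; the function name `icc_2_1` and the logit values in the usage lines are likewise invented for illustration and are not study data.

```python
def icc_2_1(full, short):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    full, short: paired ability estimates (logits) for the same persons,
    e.g. from the full-length and the shortened instrument (assumed inputs).
    Returns None when the statistic is undefined (no variance at all).
    """
    if len(full) != len(short) or len(full) < 2:
        raise ValueError("need two equally long lists with at least 2 cases")
    n, k = len(full), 2  # n persons, k = 2 "raters" (the two instruments)
    rows = list(zip(full, short))
    grand = (sum(full) + sum(short)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(full) / n, sum(short) / n]

    # Sums of squares for the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for pair in rows for x in pair)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        return None  # every estimate is identical, so the ICC cannot be computed
    return (ms_rows - ms_error) / denom


# Hypothetical ability estimates in logits (made up for this example).
full_14 = [-1.8, -0.4, 0.3, 1.1, 2.6]
short_6 = [-1.6, -0.5, 0.4, 1.0, 2.4]
print(round(icc_2_1(full_14, short_6), 3))
```

Because this form measures absolute agreement, a constant offset between the two sets of estimates lowers the coefficient even when the rank order is identical. The `None` branch corresponds to the situation flagged in the table 4 footnote, in which all cases in a group share the same ability estimate.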
DISCUSSION

In a previous study,22 I evaluated 5 methods of selecting, from the existing 13 FIM motor items, a subset of items that can be used to estimate SCI patients' motor ability. That research appraised subsets of 5, 6, and 7 items by using only follow-up data from the NSCID, which at the time contained a somewhat smaller number of cases. Of the 5 item-selection methods, the algorithm approach (similar to that used in the present study) produced the best results overall. This method generally generated the least change in mean, SD, kurtosis, and skewness of the distribution of motor ability estimates. It offered the best reliability, whether ICC, the concordance correlation coefficient, or the limits of agreement approach was used to assess it. All 3 indicated an almost perfect agreement with the 13-item estimate of motor ability, especially for the 7-item set. The superior performance of the algorithm method (CAT) was the reason why I used CAT in the present study.

The present study differed from the earlier one in 3 basic ways: (1) it featured a simplified selection algorithm that used only 4 ability groups instead of 7 as in the earlier study; (2) it distinguished between mobility types (splitting of the walking/wheelchair item) to avoid a mismatch between item difficulty and person ability, which can result from combining these 2 into a single FIM item; and (3) it offered analysis of admission and discharge data in addition to analysis of follow-up information. The use of a simplified algorithm certainly seems feasible: the ICC value for follow-up (.98) found in the present study is the same as that for the previous study (.98). The Rasch person and item reliability measures (.89 and 1.00, respectively) were also similar in the 2 studies. The improvement resulting from the splitting of the walking/wheelchair item may have counteracted any loss of precision of estimates resulting from a simpler algorithm.
Use of information on each subject's typical mode of mobility made more precise estimation possible, especially for follow-up. In the follow-up data, the difference between walking and wheelchair in difficulty level (average over all item steps) was 1.96 logits, versus only .67 and 1.27 logits for admission and discharge, respectively. Further improvement in creating short versions of the motor FIM (and in Rasch analysis of the full-length FIM) may be possible by introducing similar splits for the transfer items. Although there is no auxiliary variable for these to indicate how a transfer is performed, it is reasonable to assume that persons who walk make a transfer by using a standing pivot or a similar easy method. Persons who get around by wheelchair, on the other hand, transfer exclusively by means of their arms, supplemented by a sliding board or similar aid if they are not completely dependent on a helper. The problem lies in the term "exclusively": the FIM auxiliary variable refers only to the common form of locomotion, and there is no claim that the persons marked as wheelchair users cannot walk or vice versa.

The algorithm approach to selecting a subset of motor FIM items worked about as well for admission and discharge data as for follow-up data (table 4), even though the relevant information was collected by completely different mechanisms: observer rating versus self-report. Generally, admission and discharge FIM data available in the NSCID (and other national rehabilitation databases, eg, those of the Traumatic Brain Injury Model Systems and the Burn Injury Model Systems) are a byproduct of the clinical process: they are collected for program evaluation purposes and borrowed from the institutional files for the research. In some institutions, the FIM is used for additional clinical purposes: patients' rehabilitation treatment outcome goals may be set in terms of FIM motor items. This adds to the institutional value of admission and discharge detail information. As long as clinicians need and administrators expect complete FIM data, shortening of the item set in clinical settings is not likely. Another issue arguing against using an algorithm approach in current clinical practice is the fact that completion of the FIM items is commonly split among the most knowledgeable disciplines; for instance, occupational therapists score feeding and dressing, and nurses score bowel and bladder management. Even with an integrated computer system, the timing needed to produce a 6-item set, with specific item selection based on prior results, may not be feasible in a typical clinical setting. However, instances exist in which the FIM is administered (as a performance measure of ADL ability) by research staff31,32; in these cases, the algorithm approach as a way to reduce the total time required makes sense. A compromise may be needed between the optimal sequence as dictated by an algorithm and the logical order of testing (eg, toileting tested in conjunction with toilet transfer), but the potential for shorter testing time is there.
The time saved can be used to reduce costs, to collect more information in other areas, or to more carefully collect FIM information on the items selected. As in the previous simulation research on the application of an algorithm approach,22 the present results lead to the recommendation that any research or program evaluation effort that uses the full motor FIM, collected by means of subject or proxy interview, should consider using the algorithm approach to reduce the data collection effort.

Rasch analysis has been claimed to produce results that are sample-independent and item-independent.33 That is to say, if a different sample is used, even one with a significantly higher or lower average ability, item calibrations should come out about the same. Vice versa, if a different set of items (measuring the same concept or latent trait) is used for the same group, person calibrations (ability estimates) should be unchanged. However, in the present case, item calibrations and step calibrations differed from 1 time point to another. The discrepancy was the reason for providing the 3 separate analyses in addition to the combined time-points analysis. Given the large sample sizes, sampling error is an unlikely explanation for the discrepancies shown in figure 1. It may be that unequal percentages of "not tested" masquerading as "total assistance" play a role. These likely are nonexistent for the follow-up data; however, as indicated, some may have found their way into the admission and discharge data. Alternatively, the discrepancies in item calibration at the 3 time points may be attributable to different behavior of the therapists (becoming easier raters for some items, tougher for others, from admission to discharge). The step calibration divergence for admission and discharge data offers an apparently simple explanation: it appears that therapists, on average, change their standards from admission to discharge.
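The item and step calibrations discussed above can be read against a model equation. As a hedged illustration, assuming a polytomous Rasch formulation with item-specific step terms (the symbols $B_n$, $D_i$, and $F_{ik}$ are notation introduced here, not taken from the article), the log-odds that person $n$ is rated in category $k$ rather than $k-1$ on item $i$ would be:

```latex
\begin{align}
\ln\frac{P_{nik}}{P_{ni(k-1)}} &= B_n - (D_i + F_{ik}) \\
% Two persons n and m compared on the same item step: item terms cancel.
\ln\frac{P_{nik}}{P_{ni(k-1)}} - \ln\frac{P_{mik}}{P_{mi(k-1)}} &= B_n - B_m \\
% Two items i and j compared for the same person: the person term cancels.
\ln\frac{P_{nik}}{P_{ni(k-1)}} - \ln\frac{P_{njk}}{P_{nj(k-1)}} &= (D_j + F_{jk}) - (D_i + F_{ik})
\end{align}
```

Under this sketch, $B_n$ is person ability, $D_i$ the item difficulty averaged over steps, and $F_{ik}$ the step calibration. The cancellations in the second and third lines are what sample independence and item independence refer to. Calibrations that differ by time point would show up as $D_i$ or $F_{ik}$ taking different values at admission, discharge, and follow-up.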
Rehabilitation professionals have suggested that the pressures on programs, especially clinical staff, to show results would create this situation, even though the FIM training and mastery testing process is aimed at standardizing how people evaluate patient performance.34 Rumors circulate that bonus systems in place in (for-profit) institutions almost guarantee that staff will engage in down-coding at admission and up-coding at discharge. The present data suggest the opposite. Every FIM step had a lower logit value (average over 14 items) for admission than for discharge (see table 1, fig 1). The item difficulties (average over scale steps) in table 1 and figure 1 indicate that for 6 of 14 items raters were less lenient at discharge than at admission. The difference was especially large (.30 logits) for stairs and dressing lower body. Previous studies have produced data that are in line with the doomsayers' predictions,30,35 and it is unclear why the situation is different in the current data set. It may be that SCI rehabilitation constitutes a special case. Alternative explanations for the divergent calibrations may be that patients had changes in the quality of their performance or differences in scoring methodology (professional rating vs self-report by persons with SCI). More research is needed to determine the cause(s) of these step- and item-calibration divergences. However, the order of steps vis-à-vis each other was robust, the order of items fairly so, and interchanging item calibrations did not affect the relative or even absolute estimates of subject motor ability: ICCs were in the .95 to .99 range, whatever basis for calibrations was used. If additional research indicates that combined time-points calibration of FIM items is not feasible, hard problems in assessing change in rehabilitation patients over time will need to be addressed.
Chang and Chan's third and fourth approaches30 to Rasch analysis of data for various time points do not truly offer an alternative. They constitute at best a patch to fix a calibration divergence. If analyses of other data sets, preferably with other diagnostic groups, indicate that calibration discrepancies are indeed more than a marginal issue, much additional research will be needed.

Previous authors who applied Rasch analysis to the FIM have noted how the bladder and bowel items are relatively misfitting. However, most of them did not see this as a reason to eliminate those 2 from their set of items used to estimate motor ability. The reason for the poor fit likely is the fact that the FIM as used in these studies combines into a single bladder (or bowel) score 2 discrepant characteristics: status of the sphincter (as indicated by the frequency of incontinence) and self-care ability (capacity to perform activities needed for hygienic elimination of waste). Although these were assessed separately (on 7-point scales), only the lowest (most impaired) score was recorded and available for analysis. The bowel and bladder items had a somewhat marginal fit in the current analysis (using traditional Rasch infit and outfit criteria), but they were not eliminated, to maintain continuity with the earlier research. Fit is always relative, and traditionally, in both research and clinical applications, bladder and bowel scores are added to the scores for the other motor FIM items to obtain a total score. The dilemma of how to deal with these 2 FIM items has been resolved by the federal Centers for Medicare and Medicaid Services (CMS; the former Health Care Financing Administration). In the prospective payment system for inpatient medical rehabilitation, introduced in January 2002, CMS requires sphincter status and self-care ability to be separately assessed and recorded on claims, although the ultimate total FIM score calculation still uses the lower of the 2.
Unfortunately, the definitions for the various levels of incontinence have been changed from the old FIM, somewhat handicapping analysis of pre-2002 FIM data jointly with more recent FIM data.

In the present study, all subjects had been administered all 13 (14) FIM items previously. The investigation focused on what would have happened if adaptive testing had been used instead, and as such is a simulation. The version of CAT simulated here is fairly simple, with fixed entry (all cases received item G first) and fixed length (all received 6 items in total). By using the 3-step algorithm in table 2, the FIM can easily be administered in paper-and-pencil format. More sophisticated procedures are possible, however,6 and may be implemented when a computer is used to select items. For instance, the starting item may be selected based on known characteristics4 (eg, neurologic status). For a person with complete tetraplegia, item D (dressing upper) may be a good starting point, and for someone with incomplete paraplegia, item F (toileting). The next items to be picked could be targeted based on a finer differentiation than scoring either "5 or more" or "4 or less" on the initial FIM item. In the previous CAT simulation study,22 three groups were used: 1 or 2; 3, 4, or 5; and 6 or 7. In fact, the optimal strategy is to calculate the subject's ability after each new item has been administered, and to select as the next item the one that provides the most information about the subject's true ability level, given the estimated level.4,6 Last, there is no need to have a fixed number of items. One common CAT strategy is to calculate for each person being tested the SE of measurement after administration of each new item and to terminate the test when the error drops below a prespecified level. In the case of the FIM, administration of items might in some instances go beyond 6. More items might be used for individuals who behave inconsistently, for instance, those who have unusual spinal injury syndromes, or who get confused when answering interviewer questions.
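The more sophisticated procedure described above (re-estimate ability after every item, pick the most informative remaining item, stop once the SE falls below a preset level) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm simulated in the study: it uses a dichotomous Rasch model rather than the 7-level FIM rating scale, an expected a posteriori (EAP) ability estimate on a grid with a normal prior, and a hypothetical item bank whose names and difficulties are invented.

```python
import math

# Ability grid in logits, symmetric around 0 (an assumption of this sketch).
GRID = [i / 10.0 for i in range(-40, 41)]


def p_pass(theta, b):
    """Dichotomous Rasch probability of passing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a dichotomous Rasch item at ability theta."""
    p = p_pass(theta, b)
    return p * (1.0 - p)


def eap_estimate(responses, prior_sd=2.0):
    """Posterior mean and SD of ability given (difficulty, score) pairs."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * (theta / prior_sd) ** 2)  # normal prior
        for b, x in responses:
            p = p_pass(theta, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, se_target=0.6, max_items=None):
    """Administer items until SE <= se_target, max_items, or bank is empty.

    bank: dict mapping item name -> difficulty (logits).
    answer: callable taking an item name and returning 0 or 1.
    """
    remaining = dict(bank)
    responses, administered = [], []
    theta, se = eap_estimate(responses)
    limit = len(bank) if max_items is None else max_items
    while remaining and len(administered) < limit and se > se_target:
        # Choose the remaining item that is most informative at current theta.
        name = max(remaining, key=lambda n: item_information(theta, remaining[n]))
        b = remaining.pop(name)
        responses.append((b, answer(name)))
        administered.append(name)
        theta, se = eap_estimate(responses)
    return theta, se, administered


# Hypothetical bank and a deterministic simulated respondent (true ability 1.2).
bank = {"item_%d" % i: d for i, d in enumerate(
    [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0])}
respondent = lambda name: 1 if bank[name] < 1.2 else 0

theta, se, used = run_cat(bank, respondent, se_target=0.0, max_items=6)
print(used, round(theta, 2), round(se, 2))
```

Setting `se_target=0.0` with `max_items=6` mimics a fixed-length test, whereas a positive `se_target` with no item limit gives the variable-length strategy. A production version for the FIM would need a polytomous information function and real item and step calibrations.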
Programming the computers used for data collection may be worthwhile for larger research projects, in which the FIM and other time-consuming instruments are administered to large numbers of subjects.

Researchers in the health-related quality of life (health status) area of research in recent years have discovered Rasch analysis, and have called for the development of item banks that make it possible to use tests that are customized to the person whose quality of life is being measured.3,4 Their primary interest is in having measures that do not have floor or ceiling effects for patient groups with either very high or very low health status, and that offer equiprecision of measurement across the spectrum. Time savings have not been offered as the only or even primary benefit of customized tests, but all 3 benefits of Rasch analysis and CAT are related. Rasch analysis enables us to determine whether our instrument or item bank includes items that are suitable for the cases at the extremes of the continuum.2 It also provides information by which an optimal set of items can be selected for all subjects, wherever along that continuum they may be located. Because there is no need to administer items that are too easy or too difficult, testing time can be reduced. Better, more efficient measurement is in our future, if rehabilitation researchers and clinicians accept the challenge of developing the item banks and software needed to implement CAT. The short version of the FIM is just the beginning.

Acknowledgments: Thanks to Gwyn Kropp, MS, for preparing a data analysis file, and to Wayne Gordon, PhD, Ralph Marino, MD, MS, and several anonymous reviewers for comments on earlier versions of this article.

REFERENCES
1. Nunnally JC. Psychometric theory. New York: McGraw Hill;
2. Bond TG, Fox CM. Applying the Rasch model. Fundamental measurement in the human sciences. Mahwah (NJ): Lawrence Erlbaum Associates;
3. McHorney CA. Generic health measurement: past accomplishments and a measurement paradigm for the 21st century. Ann Intern Med 1997;127:
4. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Qual Life Res 1997;6:
5. Drasgow F, Olson-Buchanan JB, editors. Innovations in computerized assessment. Mahwah (NJ): Lawrence Erlbaum Associates;
6. Weiss DJ. Adaptive testing by computer. J Consult Clin Psychol 1985;53:
7. Wainer H, editor. Computerized adaptive testing: a primer. 2nd ed. Mahwah (NJ): Lawrence Erlbaum Associates;
8. Luecht RM, Nungester RJ. Some practical examples of computer-adaptive sequential testing. J Educ Meas 1998;35:
9. Green BF. Computer-based adaptive testing in Psychol Marketing 1991;8:
10. Waller NG, Reise SP. Computerized adaptive personality assessment: an illustration with the Absorption scale. J Pers Soc Psychol 1989;57:
11. Weiss DJ. Improving measurement quality and efficiency with adaptive theory. Appl Psychol Meas 1982;6:
12. Kreiter CD, Ferguson K, Gruppen LD. Evaluating the usefulness of computerized adaptive testing for medical in-course assessment. Acad Med 1999;74:
13. Bergstrom BA, Lunz ME. CAT for certification and licensure. In: Drasgow F, Olson-Buchanan JB, editors. Innovations in computerized assessment. Mahwah (NJ): Lawrence Erlbaum Associates; p
14. McArthur DL, Cohen MJ, Schandler SL. Rasch analysis of functional assessment scales: an example using pain behaviors. Arch Phys Med Rehabil 1991;72:
15. Velozo CA, Kielhofner G, Lai JS. The use of Rasch analysis to produce scale-free measurement of functional ability. Am J Occup Ther 1999;53:
16. Andiel C. Rasch analysis: a description of the model and related issues. Can J Rehabil 1995;9:
17. Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of item response theory. Newbury Park (CA): Sage;
18. Andrich D. Rasch models for measurement. Vol 68. Newbury Park (CA): Sage;
19. Linacre JM, Heinemann AW, Wright BD, Granger CV, Hamilton BB. The structure and stability of the Functional Independence Measure. Arch Phys Med Rehabil 1994;75:
20. Velozo CA, Magalhaes LC, Pan AW, Leiter P. Functional scale discrimination at admission and discharge: Rasch analysis of the Level of Rehabilitation Scale-III. Arch Phys Med Rehabil 1995;76:
21. Kilgore KM, Fisher WP, Silverstein B, Harley JP, Harvey RF. Application of Rasch analysis to the Patient Evaluation and Conference System. Phys Med Rehabil Clin North Am 1993;4:
22. Dijkers MP, Yavuzer G. Short versions of the telephone motor Functional Independence Measure for use with persons with spinal cord injury. Arch Phys Med Rehabil 1999;80:
23. Karamehmetoglu SS, Karacan I, Elbasi N, Demirel G, Koyuncu H, Dosoglu M. The functional independence measure in spinal cord injured patients: comparison of questioning with observational rating. Spinal Cord 1997;35:
24. Richards JS, Go BK, Rutt RD, Lazarus PB. The national spinal cord injury collaborative database. In: Stover SL, DeLisa J, Whiteneck GG, editors. Spinal cord injury: clinical outcomes from the Model Systems. Rockville (MD): Aspen; p
25. Stover SL, DeVivo MJ, Go BK. History, implementation, and current status of the National Spinal Cord Injury Database. Arch Phys Med Rehabil 1999;80:
26. Hamilton BB, Granger CV, Sherwin FF, Zielezny M, Tashman JS. A uniform national data system for medical rehabilitation. In: Fuhrer M, editor. Rehabilitation outcomes: analysis and measurement. Baltimore: Brookes; p


More information

25. EXPLAINING VALIDITYAND RELIABILITY

25. EXPLAINING VALIDITYAND RELIABILITY 25. EXPLAINING VALIDITYAND RELIABILITY "Validity" and "reliability" are ubiquitous terms in social science measurement. They are prominent in the APA "Standards" (1985) and earn chapters in test theory

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

Centre for Education Research and Policy

Centre for Education Research and Policy THE EFFECT OF SAMPLE SIZE ON ITEM PARAMETER ESTIMATION FOR THE PARTIAL CREDIT MODEL ABSTRACT Item Response Theory (IRT) models have been widely used to analyse test data and develop IRT-based tests. An

More information

Author's response to reviews

Author's response to reviews Author's response to reviews Title: Comparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: A preliminary Monte Carlo study Authors: Barth B.

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Selection of Linking Items

Selection of Linking Items Selection of Linking Items Subset of items that maximally reflect the scale information function Denote the scale information as Linear programming solver (in R, lp_solve 5.5) min(y) Subject to θ, θs,

More information

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory Kate DeRoche, M.A. Mental Health Center of Denver Antonio Olmos, Ph.D. Mental Health

More information

A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model

A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model Gary Skaggs Fairfax County, Virginia Public Schools José Stevenson

More information

Troy Hillman Manager, Analytical Services Uniform Data System for Medical Rehabilitation

Troy Hillman Manager, Analytical Services Uniform Data System for Medical Rehabilitation Avoiding Confusion between Payment and Quality Items on the New IRF-PAI, Part II: Other Implications Troy Hillman Manager, Analytical Services Uniform Data System for Medical Rehabilitation 2016 Uniform

More information

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,

More information

The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective

The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Vol. 9, Issue 5, 2016 The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Kenneth D. Royal 1 Survey Practice 10.29115/SP-2016-0027 Sep 01, 2016 Tags: bias, item

More information

1 The conceptual underpinnings of statistical power

1 The conceptual underpinnings of statistical power 1 The conceptual underpinnings of statistical power The importance of statistical power As currently practiced in the social and health sciences, inferential statistics rest solidly upon two pillars: statistical

More information

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA Data Analysis: Describing Data CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA In the analysis process, the researcher tries to evaluate the data collected both from written documents and from other sources such

More information

Research Report. Key Words: Functional status; Orthopedics, general; Treatment outcomes. Neva J Kirk-Sanchez. Kathryn E Roach

Research Report. Key Words: Functional status; Orthopedics, general; Treatment outcomes. Neva J Kirk-Sanchez. Kathryn E Roach Research Report Relationship Between Duration of Therapy Services in a Comprehensive Rehabilitation Program and Mobility at Discharge in Patients With Orthopedic Problems Background and Purpose. The purpose

More information

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian Recovery of Marginal Maximum Likelihood Estimates in the Two-Parameter Logistic Response Model: An Evaluation of MULTILOG Clement A. Stone University of Pittsburgh Marginal maximum likelihood (MML) estimation

More information

Pattern of Functional Change During Rehabilitation of Patients With Hip Fracture

Pattern of Functional Change During Rehabilitation of Patients With Hip Fracture 111 ORIGINAL ARTICLE Pattern of Functional Change During Rehabilitation of Patients With Hip Fracture Nancy K. Latham, PhD, PT, Diane U. Jette, DSc, PT, Reg L. Warren, PhD, Christopher Wirtalla, BA ABSTRACT.

More information

The UK FAM items Self-serviceTraining Course

The UK FAM items Self-serviceTraining Course The UK FAM items Self-serviceTraining Course Course originator: Prof Lynne Turner-Stokes DM FRCP Regional Rehabilitation Unit Northwick Park Hospital Watford Road, Harrow, Middlesex. HA1 3UJ Background

More information

SPINAL CORD INDEPENDENCE MEASURE, VERSION III: APPLICABILITY TO THE UK SPINAL CORD INJURED POPULATION

SPINAL CORD INDEPENDENCE MEASURE, VERSION III: APPLICABILITY TO THE UK SPINAL CORD INJURED POPULATION J Rehabil Med 2009; 41: 723 728 ORIGINAL REPORT SPINAL CORD INDEPENDENCE MEASURE, VERSION III: APPLICABILITY TO THE UK SPINAL CORD INJURED POPULATION Clive A. Glass, PhD 1, Luigi Tesio, MD 2, Malka Itzkovich,

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

HEALTH CARE PROVIDERS are being challenged to

HEALTH CARE PROVIDERS are being challenged to 697 Rasch Analysis of the Gross Motor Function Measure: Validating the Assumptions of the Rasch Model to Create an Interval-Level Measure Lisa M. Avery, BEng, Dianne J. Russell, MSc, Parminder S. Raina,

More information

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, April 23-25, 2003 The Classification Accuracy of Measurement Decision Theory Lawrence Rudner University

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information

Chapter 20: Test Administration and Interpretation

Chapter 20: Test Administration and Interpretation Chapter 20: Test Administration and Interpretation Thought Questions Why should a needs analysis consider both the individual and the demands of the sport? Should test scores be shared with a team, or

More information

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation

More information

Proceedings of the 2011 International Conference on Teaching, Learning and Change (c) International Association for Teaching and Learning (IATEL)

Proceedings of the 2011 International Conference on Teaching, Learning and Change (c) International Association for Teaching and Learning (IATEL) EVALUATION OF MATHEMATICS ACHIEVEMENT TEST: A COMPARISON BETWEEN CLASSICAL TEST THEORY (CTT)AND ITEM RESPONSE THEORY (IRT) Eluwa, O. Idowu 1, Akubuike N. Eluwa 2 and Bekom K. Abang 3 1& 3 Dept of Educational

More information

Original Article. Client-centred assessment and the identification of meaningful treatment goals for individuals with a spinal cord injury

Original Article. Client-centred assessment and the identification of meaningful treatment goals for individuals with a spinal cord injury (2004) 42, 302 307 & 2004 International Society All rights reserved 1362-4393/04 $25.00 www.nature.com/sc Original Article Client-centred assessment and the identification of meaningful treatment goals

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

Examining Factors Affecting Language Performance: A Comparison of Three Measurement Approaches

Examining Factors Affecting Language Performance: A Comparison of Three Measurement Approaches Pertanika J. Soc. Sci. & Hum. 21 (3): 1149-1162 (2013) SOCIAL SCIENCES & HUMANITIES Journal homepage: http://www.pertanika.upm.edu.my/ Examining Factors Affecting Language Performance: A Comparison of

More information

Functional Independent Recovery among Stroke Patients at King Hussein Medical Center

Functional Independent Recovery among Stroke Patients at King Hussein Medical Center Functional Independent Recovery among Stroke Patients at King Hussein Medical Center Ali Al-Hadeed MD*, Amjad Banihani MD**, Tareq Al-Marabha MD* ABSTRACT Objective: To describe the functional independent

More information

ARE WE MAKING THE MOST OF THE STANFORD HEALTH ASSESSMENT QUESTIONNAIRE?

ARE WE MAKING THE MOST OF THE STANFORD HEALTH ASSESSMENT QUESTIONNAIRE? British Journal of Rheumatology 1996;35:574-578 ARE WE MAKING THE MOST OF THE STANFORD HEALTH ASSESSMENT QUESTIONNAIRE? A. TENNANT, M. HILLMAN, J. FEAR,* A. PICKERING* and M. A. CHAMBERLAIN Rheumatology

More information

RESEARCH ARTICLES. Brian E. Clauser, Polina Harik, and Melissa J. Margolis National Board of Medical Examiners

RESEARCH ARTICLES. Brian E. Clauser, Polina Harik, and Melissa J. Margolis National Board of Medical Examiners APPLIED MEASUREMENT IN EDUCATION, 22: 1 21, 2009 Copyright Taylor & Francis Group, LLC ISSN: 0895-7347 print / 1532-4818 online DOI: 10.1080/08957340802558318 HAME 0895-7347 1532-4818 Applied Measurement

More information

The Use of Rasch Wright Map in Assessing Conceptual Understanding of Electricity

The Use of Rasch Wright Map in Assessing Conceptual Understanding of Electricity Pertanika J. Soc. Sci. & Hum. 25 (S): 81-88 (2017) SOCIAL SCIENCES & HUMANITIES Journal homepage: http://www.pertanika.upm.edu.my/ The Use of Rasch Wright Map in Assessing Conceptual Understanding of Electricity

More information

COGNITIVE FUNCTION. PROMIS Pediatric Item Bank v1.0 Cognitive Function PROMIS Pediatric Short Form v1.0 Cognitive Function 7a

COGNITIVE FUNCTION. PROMIS Pediatric Item Bank v1.0 Cognitive Function PROMIS Pediatric Short Form v1.0 Cognitive Function 7a COGNITIVE FUNCTION A brief guide to the PROMIS Cognitive Function instruments: ADULT PEDIATRIC PARENT PROXY PROMIS Item Bank v1.0 Applied Cognition - Abilities* PROMIS Item Bank v1.0 Applied Cognition

More information

VARIABLES AND MEASUREMENT

VARIABLES AND MEASUREMENT ARTHUR SYC 204 (EXERIMENTAL SYCHOLOGY) 16A LECTURE NOTES [01/29/16] VARIABLES AND MEASUREMENT AGE 1 Topic #3 VARIABLES AND MEASUREMENT VARIABLES Some definitions of variables include the following: 1.

More information

Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching

Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching Kelly D. Bradley 1, Linda Worley, Jessica D. Cunningham, and Jeffery P. Bieber University

More information

Evaluation of the functional independence for stroke survivors in the community

Evaluation of the functional independence for stroke survivors in the community Asian J Gerontol Geriatr 2009; 4: 24 9 Evaluation of the functional independence for stroke survivors in the community ORIGINAL ARTICLE CKC Chan Bsc, DWC Chan Msc, SKM Wong MBA, MAIS, BA, PDOT ABSTRACT

More information

Comparing Vertical and Horizontal Scoring of Open-Ended Questionnaires

Comparing Vertical and Horizontal Scoring of Open-Ended Questionnaires A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to

More information

The Clinical Utility of the Relational Security Explorer. Verity Chester, Research and Projects Associate, St Johns House, Norfolk

The Clinical Utility of the Relational Security Explorer. Verity Chester, Research and Projects Associate, St Johns House, Norfolk The Clinical Utility of the Relational Security Explorer Verity Chester, Research and Projects Associate, St Johns House, Norfolk Overview of Presentation Reflections on definitions and measurement The

More information

Evaluation of the Family - Rated Kinder Infant Development Scale (KIDS) for Disabled Children

Evaluation of the Family - Rated Kinder Infant Development Scale (KIDS) for Disabled Children Jikeikai Med J 2012 ; 59 : 5-10 Evaluation of the Family - Rated Kinder Infant Development Scale (KIDS) for Disabled Children Keiji Hashimoto, Naoko Matsui, Hidemi Yakuwa, and Kohei Miyamura Division of

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

3/5/2014. Longitudinal Databases and Registries. The Spinal Cord Injury Model Systems of Care. Project Design. Project Design

3/5/2014. Longitudinal Databases and Registries. The Spinal Cord Injury Model Systems of Care. Project Design. Project Design The Spinal Cord Injury Model Systems of Care Longitudinal Databases and Registries International SCI Conference: Toward Better Quality of Life Sultan Bin Abdulaziz Humanitarian City Riyadh, KSA Tamara

More information

LEDYARD R TUCKER AND CHARLES LEWIS

LEDYARD R TUCKER AND CHARLES LEWIS PSYCHOMETRIKA--VOL. ~ NO. 1 MARCH, 1973 A RELIABILITY COEFFICIENT FOR MAXIMUM LIKELIHOOD FACTOR ANALYSIS* LEDYARD R TUCKER AND CHARLES LEWIS UNIVERSITY OF ILLINOIS Maximum likelihood factor analysis provides

More information

How are Journal Impact, Prestige and Article Influence Related? An Application to Neuroscience*

How are Journal Impact, Prestige and Article Influence Related? An Application to Neuroscience* How are Journal Impact, Prestige and Article Influence Related? An Application to Neuroscience* Chia-Lin Chang Department of Applied Economics and Department of Finance National Chung Hsing University

More information

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER Introduction, 639. Factor analysis, 639. Discriminant analysis, 644. INTRODUCTION

More information

INTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT. Basic Concepts, Parameters and Statistics

INTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT. Basic Concepts, Parameters and Statistics INTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT Basic Concepts, Parameters and Statistics The designations employed and the presentation of material in this information product

More information

Reliability and validity of the International Spinal Cord Injury Basic Pain Data Set items as self-report measures

Reliability and validity of the International Spinal Cord Injury Basic Pain Data Set items as self-report measures (2010) 48, 230 238 & 2010 International Society All rights reserved 1362-4393/10 $32.00 www.nature.com/sc ORIGINAL ARTICLE Reliability and validity of the International Injury Basic Pain Data Set items

More information

Functional outcome of patients with spinal cord injury: rehabilitation outcome study

Functional outcome of patients with spinal cord injury: rehabilitation outcome study Clinical Rehabilitation 1999; 13: 457 463 Functional outcome of patients with spinal cord injury: rehabilitation outcome study MC Schönherr Rehabilitation Centre Beatrixoord, Haren and Department of Rehabilitation,

More information

INTRODUCTION TO ASSESSMENT OPTIONS

INTRODUCTION TO ASSESSMENT OPTIONS ASTHMA IMPACT A brief guide to the PROMIS Asthma Impact instruments: PEDIATRIC PROMIS Pediatric Item Bank v2.0 Asthma Impact PROMIS Pediatric Item Bank v1.0 Asthma Impact* PROMIS Pediatric Short Form v2.0

More information

Responsiveness, construct and criterion validity of the Personal Care-Participation Assessment and Resource Tool (PC-PART)

Responsiveness, construct and criterion validity of the Personal Care-Participation Assessment and Resource Tool (PC-PART) Darzins et al. Health and Quality of Life Outcomes (2015) 13:125 DOI 10.1186/s12955-015-0322-5 RESEARCH Responsiveness, construct and criterion validity of the Personal Care-Participation Assessment and

More information

Construct Validity of Mathematics Test Items Using the Rasch Model

Construct Validity of Mathematics Test Items Using the Rasch Model Construct Validity of Mathematics Test Items Using the Rasch Model ALIYU, R.TAIWO Department of Guidance and Counselling (Measurement and Evaluation Units) Faculty of Education, Delta State University,

More information

so that a respondent may choose one of the categories to express a judgment about some characteristic of an object or of human behavior.

so that a respondent may choose one of the categories to express a judgment about some characteristic of an object or of human behavior. Effects of Verbally Labeled Anchor Points on the Distributional Parameters of Rating Measures Grace French-Lazovik and Curtis L. Gibson University of Pittsburgh The hypothesis was examined that the negative

More information

Evaluating the Effectiveness of Stroke Rehabilitation: Choosing a Discriminative Measure

Evaluating the Effectiveness of Stroke Rehabilitation: Choosing a Discriminative Measure 92 Evaluating the Effectiveness of Stroke Rehabilitation: Choosing a Discriminative Measure Kim A. Brock, PhD, Patricia A. Goldie, PhD, Kenneth M. Greenwood, PhD ABSTRACT. Brock KA, Goldie PA, Greenwood

More information

Computerized Adaptive Testing for Classifying Examinees Into Three Categories

Computerized Adaptive Testing for Classifying Examinees Into Three Categories Measurement and Research Department Reports 96-3 Computerized Adaptive Testing for Classifying Examinees Into Three Categories T.J.H.M. Eggen G.J.J.M. Straetmans Measurement and Research Department Reports

More information

The Uniform Data System for Medical Rehabilitation Report of Patients with Debility Discharged from Inpatient Rehabilitation Programs in 2000Y2010

The Uniform Data System for Medical Rehabilitation Report of Patients with Debility Discharged from Inpatient Rehabilitation Programs in 2000Y2010 Authors: Rebecca V. Galloway, PT, MPT Carl V. Granger, MD Amol M. Karmarkar, PhD, OTR James E. Graham, PhD, DC Anne Deutsch, RN, PhD, CRRN Paulette Niewczyk, PhD, MPH Margaret A. DiVita, MS Kenneth J.

More information

PLANNING THE RESEARCH PROJECT

PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm page 1 Part I PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm

More information

STROKE REHABILITATION: PREDICTING INPATIENT LENGTH OF STAY AND DISCHARGE PLACEMENT

STROKE REHABILITATION: PREDICTING INPATIENT LENGTH OF STAY AND DISCHARGE PLACEMENT STROKE HKJOT REHABILITATION 2004;14:3 11 STROKE REHABILITATION: PREDICTING INPATIENT LENGTH OF STAY AND DISCHARGE PLACEMENT Fung Mei Ling Background: Stroke is the third leading cause of death in Hong

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

Determining Differential Item Functioning in Mathematics Word Problems Using Item Response Theory

Determining Differential Item Functioning in Mathematics Word Problems Using Item Response Theory Determining Differential Item Functioning in Mathematics Word Problems Using Item Response Theory Teodora M. Salubayba St. Scholastica s College-Manila dory41@yahoo.com Abstract Mathematics word-problem

More information

The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven

The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven Introduction The Functional Outcome Questionnaire- Aphasia (FOQ-A) is a conceptually-driven outcome measure that was developed to address the growing need for an ecologically valid functional communication

More information

4 Diagnostic Tests and Measures of Agreement

4 Diagnostic Tests and Measures of Agreement 4 Diagnostic Tests and Measures of Agreement Diagnostic tests may be used for diagnosis of disease or for screening purposes. Some tests are more effective than others, so we need to be able to measure

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information