Techniques for Explaining Item Response Theory to Stakeholders

Kate DeRoche, Antonio Olmos, C.J. Mckinney
Mental Health Center of Denver
Presented on March 23, 2007, at the Eastern Evaluation Research Society Conference in Absecon, NJ

Overview of Presentation
- Current issues with applying IRT in evaluation
- Explaining the reasoning for using IRT in evaluation
- Explaining IRT to stakeholders and involving them in the analysis
Explaining IRT to Stakeholders

IRT in Evaluation
- There has been an increase in the application of IRT in evaluation, due to the advantages it provides (Hambleton, Swaminathan, & Rogers, 1991).
- IRT has multiple applications for evaluation purposes:
  - Psychometrics: measurement validation
  - Shortening a measurement tool
  - Ranking the elements of a dimension by difficulty
  - Equating instruments
  - Differential Item Functioning (DIF)

Limitations to Applying IRT in Evaluation
- Evaluators and researchers often have less training in measurement.
- IRT can be taught in a class, but it is difficult to explain to people who do not understand psychometrics and advanced statistics.
PROBLEM: It is difficult for evaluators to apply Item Response Theory techniques because of two challenges:
1. Explaining the advantages of IRT in simple terms
2. Explaining the results of IRT so that stakeholders can be involved in the analysis process

Explaining the Reasoning for Using IRT
Why do we want to use a more complex method?

Explaining the Benefits of IRT
- Hallmark: separation of item and person parameters (item/person invariance)
- Instead of leading with the hallmark, focus on two points:
1. Are all of our items equal? Should they all contribute equally to our score, or are some questions harder than others?
2. A test created with Classical Test Theory (CTT) can be very reliable (i.e., a person scores consistently when measured twice) and still not measure all of our participants well.

Are Items Equal?
Example of CTT assumptions:
- Question 1: I feel sad often (1 point)
- Question 2: I think about suicide often (1 point)
- Question 3: I sleep more than usual (1 point)
- Question 4: I want to be alone more than usual (1 point)
Example of IRT assumptions:
- Question 1: I feel sad often (1/2 point)
- Question 2: I think about suicide often (1 1/2 points)
- Question 3: I sleep more than usual (3/4 point)
- Question 4: I want to be alone more than usual (1 1/4 points)
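
The point-value analogy above can be made concrete with the Rasch (one-parameter logistic) model, where the chance that a person endorses an item depends only on the gap between the person's trait level θ and the item's difficulty b, both on the same logit scale. This is a minimal sketch; the difficulties are hypothetical values chosen only to mirror the ordering implied by the IRT point values, not estimates from real data.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of endorsing an item under the Rasch (1PL) model.

    theta: person trait level in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical item difficulties (logits); higher means harder to endorse.
items = {
    "I feel sad often": -1.0,
    "I sleep more than usual": -0.5,
    "I want to be alone more than usual": 0.5,
    "I think about suicide often": 1.5,
}

# One person with an average level of depression on this scale.
theta = 0.0
for text, b in items.items():
    print(f"{text:<40} P(endorse) = {rasch_probability(theta, b):.2f}")
```

For a person at θ = 0, the model gives roughly a 73% chance of endorsing the easiest item and about 18% for the hardest, which is the sense in which the items are not equal.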

[Figure: vertical scale of true depression level on the trait, from Less Depressed (bottom) to More Depressed (top); the MMPI Depression section sits higher on the scale and Beck's Depression Inventory sits lower.]
- A measure may be built, depending on who it is given to, for a specific population.
- The MMPI measures a more depressed sample better.
- Beck's Depression Inventory measures a less depressed sample better.
- Without IRT, we would not know the level of the trait that we are measuring (a.k.a. which measure is harder).

Explaining IRT Results to Stakeholders
You can use IRT analysis and involve stakeholders!

Scaling Example with IQ
Assume an order in which A is a very easy item (e.g., "What is your name?") and G is a very hard item (e.g., "What does floccinaucinihilipilification mean?"). It is assumed that if you answer a harder item correctly, you would also have answered all of the easier items correctly, even if you were not tested on them (e.g., if you get D correct, you are credited with C, B, and A). The harder the item you answer correctly, the higher your score.
Difficulty of response (logits; Hard = +4.0 at the top, Easy = -4.0 at the bottom):
- Response G (super-duper smart), 3.0: if you get G correct, your IQ score is 145
- Response F (super smart), 2.0: if you get F correct, your IQ score is 130
- Response E (smart), 1.0: if you get E correct, your IQ score is 115
- Response D (average), 0.0: if you get D correct, your IQ score is 100
- Response C (below average), -1.0: if you get C correct, your IQ score is 85
- Response B (borderline mental retardation), -2.0: if you get B correct, your IQ score is 70
- Response A (mentally retarded), -3.0: if you get A correct, your IQ score is 55
Ideally, this is what a scale should look like, with even intervals of potential responses across the scale.
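
The IQ values on this slide follow a simple linear mapping, IQ = 100 + 15 × logit. The sketch below reproduces that mapping together with the slide's simplifying assumption that the hardest item answered correctly determines the score. It is a teaching device only: actual IRT scoring estimates θ from the whole response pattern, and the function names here are hypothetical.

```python
# Item difficulties (logits) for the ideal scale on this slide.
IDEAL_SCALE = {"A": -3.0, "B": -2.0, "C": -1.0, "D": 0.0,
               "E": 1.0, "F": 2.0, "G": 3.0}


def logit_to_iq(theta: float) -> float:
    """Map a logit onto the IQ metric (mean 100, SD 15)."""
    return 100.0 + 15.0 * theta


def guttman_score(highest_correct: str, scale: dict[str, float]) -> float:
    """Score a respondent by the hardest item answered correctly.

    Under the slide's Guttman-style assumption, answering an item correctly
    implies all easier items are correct too, so only the hardest one matters.
    """
    return logit_to_iq(scale[highest_correct])


print(guttman_score("D", IDEAL_SCALE))  # 100.0
print(guttman_score("G", IDEAL_SCALE))  # 145.0
```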

Problem #1: Clumping
Example 2 (difficulty of response; Hard = +4.0 at the top, Easy at the bottom):
- Response G (super-duper smart), 3.0: IQ 145
- Response F (super smart), 2.8: IQ 142
- Response E (smart), 0.4: IQ 106
- Response D (average), 0.1: IQ 102
- Response C (below average), -0.2: IQ 97
- Response B (borderline mental retardation)
- Response A (mentally retarded)
The items are clumping together (E, D, and C). This means that responses E, D, and C are basically measuring the same thing and only discriminate between IQs of 106 and 102, and between 102 and 97. We could remove D, E, or C, but this should be determined based on context and other IRT outcomes.
WE DO NOT WANT CLUMPS OF RESPONSES
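
Clumping can be screened for by sorting items by difficulty and flagging neighbors that sit too close together. The 0.5-logit cutoff below is an assumption for illustration; the slides do not give one, and the final call should rest on context and other IRT output. The helper name is hypothetical.

```python
def find_clumps(difficulties: dict[str, float],
                min_spacing: float = 0.5) -> list[tuple[str, str, float]]:
    """Return adjacent item pairs whose difficulties are closer than min_spacing.

    min_spacing (logits) is an illustrative cutoff, not a published rule.
    """
    ordered = sorted(difficulties.items(), key=lambda kv: kv[1])
    clumps = []
    for (low_name, low_b), (high_name, high_b) in zip(ordered, ordered[1:]):
        spacing = high_b - low_b
        if spacing < min_spacing:
            clumps.append((low_name, high_name, round(spacing, 2)))
    return clumps


# Example 2 values from the slide; A and B are omitted (no values given).
example_2 = {"C": -0.2, "D": 0.1, "E": 0.4, "F": 2.8, "G": 3.0}
print(find_clumps(example_2))
```

With this cutoff the sketch flags C/D and D/E, and also F/G (0.2 logits apart), a pair the slide does not discuss.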

Problem #2: Full Range
Example (difficulty of response; Hard at the top, Easy at the bottom):
- Response F (super smart), 0.5: IQ 107
- Response G (super-duper smart), 0: IQ 100
- Response E (smart), -0.5: IQ 93
- Response D (average), -1.25: IQ 81
- Response C (below average), -2: IQ 70
- Response B (borderline mental retardation), -2.8: IQ 58
- Response A (mentally retarded), -3.5: IQ 48
The scale does not contain any responses above +0.5, suggesting that the highest IQ we can measure is 107. People with an IQ score higher than 107 (e.g., 130) would only learn that their IQ is above 107.
THE RESPONSES NEED TO RANGE FROM -3 TO +3
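
A range check only needs the easiest and hardest item difficulties compared against the -3 to +3 target stated on the slide. This is a minimal sketch with a hypothetical `check_range` helper, using the slide's difficulty values.

```python
def check_range(difficulties: dict[str, float],
                target_low: float = -3.0,
                target_high: float = 3.0) -> tuple[bool, bool]:
    """Report whether the items reach the bottom and top of the target range.

    Returns (covers_low, covers_high).
    """
    covers_low = min(difficulties.values()) <= target_low
    covers_high = max(difficulties.values()) >= target_high
    return covers_low, covers_high


example_3 = {"A": -3.5, "B": -2.8, "C": -2.0, "D": -1.25,
             "E": -0.5, "G": 0.0, "F": 0.5}
covers_low, covers_high = check_range(example_3)
print(f"covers bottom: {covers_low}, covers top: {covers_high}")
print(f"highest item: {max(example_3.values()):+.2f} logits")
```

Here the bottom of the range is covered but the top is not, since no item sits above +0.50.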

Problem #3: Ordering
Notice that the order from easy to hard goes A, B, C, D, E, G, then F, suggesting that F and G are out of order. A response that we think is hard (only for super-duper smart people) is really not that hard and will only produce an IQ score of 100, not 145 as assumed.
THE RESPONSES NEED TO BE IN CORRECT ORDER
Example (difficulty of response; Hard at the top, Easy at the bottom):
- Response F (super smart), 0.5: IQ 107
- Response G (super-duper smart), 0: IQ 100
- Response E (smart), -0.5: IQ 93
- Response D (average), -1.25: IQ 81
- Response C (below average), -2: IQ 70
- Response B (borderline mental retardation), -2.8: IQ 58
- Response A (mentally retarded), -3.5: IQ 48
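
Ordering problems can be found by walking the intended easy-to-hard sequence and flagging any consecutive pair whose estimated difficulties are reversed. The helper name is hypothetical; the data are the slide's values.

```python
def out_of_order(intended: list[str],
                 difficulties: dict[str, float]) -> list[tuple[str, str]]:
    """Return consecutive (easier, harder) pairs from the intended order
    whose estimated difficulties are reversed."""
    return [
        (easier, harder)
        for easier, harder in zip(intended, intended[1:])
        if difficulties[easier] > difficulties[harder]
    ]


intended = ["A", "B", "C", "D", "E", "F", "G"]
example_3 = {"A": -3.5, "B": -2.8, "C": -2.0, "D": -1.25,
             "E": -0.5, "G": 0.0, "F": 0.5}

print(out_of_order(intended, example_3))
# Empirical easy-to-hard order, sorted by estimated difficulty.
print(sorted(example_3, key=example_3.get))
```

For this example it returns [('F', 'G')], and sorting by estimated difficulty reproduces the A, B, C, D, E, G, F order described above.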

Problem #4: Large Gaps
Example 2 (difficulty of response; Hard = +4.0 at the top, Easy at the bottom):
- Response G (super-duper smart), 3.0: IQ 145
- Response F (super smart), 2.8: IQ 142
- Response E (smart), 0.4: IQ 106
- Response D (average), 0.1: IQ 102
- Response C (below average), -0.2: IQ 97
- Response A (mentally retarded)
Again we see gaps, but this time they are within the scale (not just at the top or bottom). This suggests that there are no items able to measure an IQ score between 107 and 141, and that people with IQs between 56 and 96 are not able to receive a score in that area either.
WE CANNOT HAVE LARGE GAPS
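
Gaps are the mirror image of clumps: neighbors that sit too far apart. The 1.0-logit cutoff is an assumption for illustration, not a rule from the slides, and response A is left out because its value is not reported on this slide. The helper name is hypothetical.

```python
def find_gaps(difficulties: dict[str, float],
              max_spacing: float = 1.0) -> list[tuple[str, str, float]]:
    """Return adjacent item pairs whose difficulties are farther apart
    than max_spacing (an illustrative cutoff in logits)."""
    ordered = sorted(difficulties.items(), key=lambda kv: kv[1])
    gaps = []
    for (low_name, low_b), (high_name, high_b) in zip(ordered, ordered[1:]):
        spacing = high_b - low_b
        if spacing > max_spacing:
            gaps.append((low_name, high_name, round(spacing, 2)))
    return gaps


# Example 2 values from the slide (A omitted: no value given).
example_2 = {"C": -0.2, "D": 0.1, "E": 0.4, "F": 2.8, "G": 3.0}
print(find_gaps(example_2))
```

Here only the E to F step (2.4 logits) is flagged.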

Activity with Stakeholders
- Keep in mind the four problems presented (C.R.O.G.): Clumping, Range, Order, and Gaps.
- Present the results of the IRT item map.
- Have the problems listed with potential reasons and solutions.
- You should also have previously reviewed the other IRT output from the software program (ICCs, information functions, a, b, or c parameters, infit, outfit, etc.).
Outcomes:
- Stakeholders were able to interpret the results in terms of their program.
- They added context to the results.
- Most importantly, stakeholders felt that they were involved in the process.

How to Present the Results
The item map was presented in two ways:
1. Display only the current item map
   - Beneficial for individuals familiar with looking at data or graphs
   - Beneficial for long measures when viewing individual items
2. Present the current item map and the ideal item map, with suggestions and explanations at the bottom
   - Beneficial for individuals who are less familiar with data or graphs, or who have number anxiety
   - Ideal for short measures when viewing individual items, because it takes up a lot of room

Example 1: Current Item Map (difficulty of item; Hard at the top, Easy at the bottom)
Employment:
- C: Active job search (3.29)
- D: Non-paid work/volunteer (2.64)
- H: Full time independent (1.59)
- G: Part time independent (0.76)
- E: Part time supported (-0.37)
- F: Full time supported (-1.22)
- B: Interest in work, no action (-1.99)
- A: No interest in work (-2.71)
Education:
- F: Full time college (4.77)
- E: Part time college (3.66)
- D: Noncredit training (-0.10)
- G: Recent grad (-0.20)
- C: Active education/training search (-1.00)
- B: Interest in education, no action (-1.05)
- A: No interest in education (-2.82)
Active/Growth:
- F: Very high (3.41)
- E: High (-0.75)
- D: Moderate out MH system (-1.62)
- B: Low (-2.28)
- C: Moderate in MH system (-2.37)
- A: Very low (-3.88)
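
For stakeholders who prefer reading data as text, an item map like the one above can be generated by sorting items from hardest to easiest. This is a minimal sketch with a hypothetical `item_map` function, using the Employment values from the slide; IRT software packages produce their own, richer item maps.

```python
def item_map(title: str, items: dict[str, float]) -> str:
    """Render a plain-text item map with the hardest item at the top."""
    lines = [title, "Hard"]
    for label, b in sorted(items.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"  {b:+6.2f}  {label}")
    lines.append("Easy")
    return "\n".join(lines)


employment = {
    "A: No interest in work": -2.71,
    "B: Interest in work, no action": -1.99,
    "C: Active job search": 3.29,
    "D: Non-paid work/volunteer": 2.64,
    "E: Part time supported": -0.37,
    "F: Full time supported": -1.22,
    "G: Part time independent": 0.76,
    "H: Full time independent": 1.59,
}

print(item_map("Employment", employment))
```

Printing the map this way makes ordering questions easy to raise with stakeholders, for example why an active job search sits above full time independent work.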

Example 2: Current & Ideal Item Maps (Active/Growth; Hard = +4.0 at the top, Easy = -4.0 at the bottom)
Current status:
- F: Very high (3.41)
- E: High (-0.75)
- D: Moderate out MH system (-1.62)
- B: Low (-2.28)
- C: Moderate in MH system (-2.37)
- A: Very low (-3.88)
Ideal:
- F: Very high (3.41)
- High (??)
- E: Moderate/high (-0.75)
- D: Moderate/low (-1.62)
- B & C: Low (-2.28 to -2.37)
- A: Very low (-3.88)
Problems: #1 Clumping, #3 Ordering, & #4 Gaps
Suggestions:
1. Combine "low" and "moderate in MH system," because their definitions are very similar and clinicians cannot differentiate between the two.
2. Since B and C are combined and called "low," "moderate out of MH system" should be called "moderate/low," and "high" should be called "moderate/high."
3. There is a very large gap between "high" and "very high." Looking at their definitions, the only difference is that "very high" involves participating in activities that are other-focused, while "high" involves self-focused activities. In terms of recovery, moving from self-focused to other-focused activities is a large step, so other levels should be added. Therefore, a new response should be added that includes participating in activities that are approximately 50% self-focused and 50% other-focused, called "high."
Any suggestions regarding these issues?
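
Before the measure is re-analyzed, suggestions 1 and 2 amount to recoding the existing Active/Growth ratings. The mapping below is a hypothetical sketch of that recode, and the ratings are made up for illustration. Suggestion 3 (the new "high" level) cannot be produced by recoding, because it requires a new response option and new data.

```python
from collections import Counter

# Hypothetical recode following suggestions 1 and 2: B ("Low") and
# C ("Moderate in MH system") collapse into one "Low" level, and D and E
# are relabeled. The proposed new "High" level is not created here.
RECODE = {
    "A": "Very low",
    "B": "Low",
    "C": "Low",
    "D": "Moderate/low",
    "E": "Moderate/high",
    "F": "Very high",
}


def recode_ratings(ratings: list[str]) -> list[str]:
    """Map original Active/Growth ratings (A-F) onto the collapsed levels."""
    return [RECODE[r] for r in ratings]


# Made-up ratings, for illustration only.
ratings = ["A", "B", "C", "C", "D", "E", "F", "B"]
print(Counter(recode_ratings(ratings)))
```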

Applications in Other Domains
Any commonly understood test that has a normed mean and standard deviation:
- Education: standardized test scores (SAT, GRE, K-12 state testing, t-scores)
- Health: any biological measure (heart rate, blood pressure, blood sugar)
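
The same trick used with IQ generalizes to any normed metric: standardize the trait estimate against a reference group, then rescale it to the familiar mean and standard deviation (for example 100 and 15 for IQ, 50 and 10 for T-scores). This sketch assumes the reference mean and SD of θ are known; the function name is hypothetical.

```python
def to_normed_scale(theta: float, mean: float, sd: float,
                    ref_mean: float = 0.0, ref_sd: float = 1.0) -> float:
    """Express a trait estimate on a familiar normed metric.

    theta is first standardized against a reference (norming) group with
    mean ref_mean and SD ref_sd, then rescaled to the target mean and sd.
    """
    z = (theta - ref_mean) / ref_sd
    return mean + sd * z


theta = 1.0  # one SD above the reference mean when ref_mean=0, ref_sd=1
print(to_normed_scale(theta, mean=100, sd=15))  # IQ-style metric: 115.0
print(to_normed_scale(theta, mean=50, sd=10))   # T-score metric: 60.0
```

Standardizing first matters because raw logits are not automatically centered at 0 with an SD of 1 in every population.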

Questions?
For future questions or comments: Kate DeRoche (303)

IRT Resources
IRT 101:
- Reise, S. P., Ainsworth, A. T., & Haviland, M. G. (2005). Item Response Theory: Fundamentals, applications, and promise in psychological research. Current Directions in Psychological Science, 14,
IRT books:
- Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Newbury Park, CA: Sage Publications.
- Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
And many more resources
