Selection of Linking Items

Myles Dixon
5 years ago

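1 Selection of Linking Items: select the subset of items whose combined information maximally reflects the scale information function. The subset is found with a linear programming solver (lp_solve 5.5, called from R) that solves min(y), subject to constraints evaluated at each θ = θ_s, where θ_s ∈ {-4, -3.95, ..., 3.95, 4}, the item-selection variables take values in {0, 1}, and y ≥ 0.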
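The minimax idea behind the linking-set selection can be sketched in a few lines. This is only an illustration under stated assumptions: it replaces the lp_solve linear program with brute-force enumeration over a tiny bank, it uses 2PL item information rather than the polytomous model of the study, and the item parameters, the rescaling ratio, and the function names are all hypothetical.

```python
import math
from itertools import combinations


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_linking_items(items, k, grid):
    """Return (subset, y): the k-item subset whose rescaled information
    curve has the smallest maximum absolute gap y to the full-scale curve."""
    scale = [sum(info_2pl(t, a, b) for a, b in items) for t in grid]
    ratio = len(items) / k  # assumption: rescale subset information to scale size
    best, best_y = None, math.inf
    for subset in combinations(range(len(items)), k):
        y = max(
            abs(ratio * sum(info_2pl(t, *items[j]) for j in subset) - s)
            for t, s in zip(grid, scale)
        )
        if y < best_y:
            best, best_y = subset, y
    return best, best_y


# Hypothetical (a, b) parameters for a six-item scale.
ITEMS = [(1.2, -1.5), (0.8, -0.5), (1.5, 0.0), (1.0, 0.4), (1.3, 1.0), (0.9, 2.0)]
GRID = [-4 + 0.05 * s for s in range(161)]  # -4, -3.95, ..., 3.95, 4

subset, y = select_linking_items(ITEMS, 3, GRID)
print(subset, round(y, 4))
```

Enumeration is only feasible for very small banks; an integer programming solver is the practical choice once the bank grows.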

2 An example: Subscale 2. Figure: sum of information functions for the 6-, 7-, and 8-item linking sets.

3 An example: Subscale 3 (figure).

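4 Why is Fisher information useful? In multidimensional CAT, the volume of the confidence ellipsoid around the estimate $\hat{\theta}$ is proportional to the square root of the determinant of its covariance matrix, that is, to $\det(I(\hat{\theta}))^{-1/2}$ (Anderson, 1984). Items are therefore selected to maximize the determinant of the Fisher information matrix (Segall, 1996; Wang & Chang, 2011). This is the D-optimal method.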
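A minimal sketch of D-optimal selection follows. It assumes a dichotomous multidimensional 2PL model, whose item information matrix is P(1 - P) a aᵀ, rather than the graded response model used in the project. The item bank, the current information matrix, and the function names are hypothetical.

```python
import math


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d


def item_info(theta, a, b):
    """Information matrix of one M2PL item at theta: P(1 - P) * a a^T."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) - b
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * ai * aj for aj in a] for ai in a]


def add(m1, m2):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def d_optimal_pick(theta, current_info, bank, used):
    """Index of the unused item that maximizes det(current_info + item_info)."""
    candidates = [j for j in range(len(bank)) if j not in used]
    return max(
        candidates,
        key=lambda j: det(add(current_info, item_info(theta, *bank[j]))),
    )


# Hypothetical state: dimension 2 has much less information than dimension 1.
current = [[4.0, 0.0], [0.0, 0.5]]
bank = [((1.5, 0.0), 0.0), ((0.0, 1.2), 0.0), ((0.8, 0.8), 0.0)]
pick = d_optimal_pick((0.0, 0.0), current, bank, used=set())
print(pick)
```

In this example the item that loads only on the weaker second dimension gives the largest determinant, even though the first item has the larger discrimination.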

5 Fisher information vs. confidence ellipse. Figure: confidence ellipse over two θ dimensions, labeled with the covariance matrix Σ (Wang et al., 2013).

7 Mini-max mechanism: assuming there are three dimensions, the criterion is written in terms of determinants (det) of the information matrix [equation not recoverable]. This criterion tends to pick the items that reduce the variance of the estimator that lags behind the most.

8 Item bank information (figure).

9 Domain/content balancing: constraint-weighted D-optimal (Wang et al., 2017). Suppose that for each domain k = 1, ..., d, the maximum and minimum numbers of items are set in advance. The weights are built from the number of items already administered from domain k, the current test length n, the maximum test length, and an indicator of whether item j belongs to domain k (Cheng et al., 2009).

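10 A simulation study: sample size N = 2,000. True θ values are drawn from a multivariate normal distribution with a mean vector of 0s and covariance matrix Σ. Maximum a posteriori (MAP) estimation is used, with a multivariate normal prior that also has a mean vector of 0s. Evaluation criterion: root mean squared error (RMSE), shown here for the first dimension: $\mathrm{RMSE}(\hat{\theta}_1) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{\theta}_{i1} - \theta_{i1})^2}$.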
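As a small worked example of the evaluation criterion, the sketch below computes the RMSE separately for each domain. The two simulees and their estimates are made-up numbers, and the function name is hypothetical.

```python
import math


def rmse_per_domain(theta_hat, theta_true):
    """RMSE for each dimension k: sqrt(mean_i (theta_hat[i][k] - theta_true[i][k])^2)."""
    n = len(theta_true)
    d = len(theta_true[0])
    return [
        math.sqrt(sum((theta_hat[i][k] - theta_true[i][k]) ** 2 for i in range(n)) / n)
        for k in range(d)
    ]


# Hypothetical true and estimated thetas for two simulees on two domains.
true_thetas = [(0.0, 0.0), (1.0, -1.0)]
estimates = [(0.3, 0.0), (0.6, -1.0)]

print(rmse_per_domain(estimates, true_thetas))
```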

11 Results: domain-level recovery, D-optimal vs. random selection (figure).

12 Results: domain-level recovery, D-optimal vs. constraint-weighted D-optimal (figure).

14 Reducing test length.

15 Figure for θ = (0, 0, 0): θ confidence interval plotted against test length.

16 Figure for θ = (2, 2, 2): θ confidence interval plotted against test length.

17 Variable-length CAT: stopping rule. Flowchart: Start, with an item bank of 300+ items.

18 Stopping rule: stop when the measurement precision criterion is satisfied (Dodd, Koch, & De Ayala, 1993; Boyd, Dodd, & Choi, 2010).

19 Stopping rule: candidate precision criteria are (a) the volume of the confidence ellipsoid (D rule), (b) the sum of the S.E. of θ per domain, (c) the maximum axis of the confidence ellipsoid, and (d) the Kullback-Leibler divergence between two consecutive posteriors (Wang et al., 2013).
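Criteria (a) to (c) are all functions of the covariance matrix of the θ estimate. The sketch below computes them for a two-dimensional case, where the largest eigenvalue has a closed form. The function name and the example matrix are hypothetical, and the slides concern three dimensions rather than two.

```python
import math


def precision_summaries(cov):
    """Summaries of a 2x2 symmetric covariance matrix of the theta estimate."""
    (s11, s12), (_, s22) = cov
    det_cov = s11 * s22 - s12 * s12
    mean = (s11 + s22) / 2.0
    half_diff = math.sqrt(((s11 - s22) / 2.0) ** 2 + s12 ** 2)
    largest_eig = mean + half_diff
    return {
        "sqrt_det": math.sqrt(det_cov),            # proportional to the ellipse area (D rule)
        "sum_se": math.sqrt(s11) + math.sqrt(s22),  # sum of S.E. per domain
        "max_axis": math.sqrt(largest_eig),         # proportional to the longest semi-axis
    }


summary = precision_summaries([[0.04, 0.0], [0.0, 0.09]])
print(summary)
```

A variable-length CAT would stop once the chosen summary falls below a preset threshold; the thresholds themselves are not given on the slide.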

20 Cumulated information growth. Figure: determinant of the Fisher information matrix plotted against test length.


23 Stopping rule: stop when the θ estimate no longer changes much between consecutive items. This is the theta convergence rule (T rule), with a threshold of 0.01 (Babcock & Weiss, 2012; Wang et al., 2017+).

24 Why is the T rule secondary? Under the 2PL model, the change in the θ estimate after each additional item lies within a bounded interval (Chang & Ying, 2008).

25 Why is the T rule secondary? That change does not decrease monotonically as test length increases, so the T rule can terminate the test prematurely (Wang et al., 2017+).

26 Why is the T rule secondary? It can also undermine test efficiency. The usual precision target is SE(θ) < .2 (Dodd et al., 1993), which corresponds to test information of 25, since SE = 1/√information. If the item discrimination is hypothetically 1, satisfying a change in θ below .01 corresponds to information of 50 (Wang et al., 2017+).

27 MGRM, simple structure (Wang et al., 2017+). The slide gives a piecewise expression involving exp terms [equation not recoverable].

28 MGRM, simple structure (Wang et al., 2017+). Same expression, annotated with the value .5 [equation not recoverable].

29 MGRM, complex structure (Wang et al., 2017+). Case 1: item j measures the pth trait. The expression involves the pth element of a vector and the amount of information carried by item j [equation not recoverable]. Case 2: item j measures multiple traits [equation not recoverable].

33 Primary vs. secondary stopping rules (Babcock & Weiss, 2012; Wang et al., 2017+). Flowchart: start with an item bank of 300+ items and administer the minimum test length. Then ask whether the D rule is satisfied. If yes, stop. If no, ask whether the T rule is satisfied. If yes, stop; if no, continue until the maximum test length is reached. Values shown on the flowchart: 94.9% and 28.5 next to the D rule, 5.1% and 61.5 next to the T rule. These appear to be the share of tests stopped by each rule and the corresponding mean test length.
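The flowchart logic can be sketched as a single decision function. This is an illustration only: the D-rule threshold, the way the change in θ is summarized across dimensions, the parameter names, and the return labels are all assumptions. Only the ordering of the checks and the 0.01 T-rule threshold come from the slides.

```python
def stop_decision(n_items, det_info, theta_change, *,
                  min_len, max_len, d_threshold, t_threshold=0.01):
    """Decide whether to stop after n_items, following the flowchart order.

    det_info: determinant of the accumulated Fisher information matrix
        (a large determinant means a small confidence ellipsoid).
    theta_change: summary of the change in the theta estimate since the
        previous item (for example, the largest absolute change).
    """
    if n_items < min_len:
        return "continue"                # minimum test length not yet reached
    if det_info >= d_threshold:
        return "stop: D rule"            # primary rule: precision reached
    if theta_change < t_threshold:
        return "stop: T rule"            # secondary rule: estimate has converged
    if n_items >= max_len:
        return "stop: maximum length"
    return "continue"


# Hypothetical settings and test states.
settings = dict(min_len=10, max_len=60, d_threshold=500.0)
print(stop_decision(5, 900.0, 0.001, **settings))
print(stop_decision(28, 900.0, 0.05, **settings))
print(stop_decision(45, 120.0, 0.004, **settings))
print(stop_decision(60, 120.0, 0.05, **settings))
```

Checking the D rule first means the T rule only ends tests that would otherwise run on without reaching the precision target.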

38 Stopping rule results. Figure panels: Applied Cognition, Daily Activity, Mobility; axes: SE and θ.

39 3D plot (figure).

40 Stopping rule (cont.). Table, values not shown. Columns: test length (mean, SD); overall precision (bias, RMSE, determinant); primary stop (actual, eventual, %).

41 Stopping rule (cont.). Second table, values not shown, reported for two sample sizes N. Columns: test length (mean, SD) at stop and at end; bias at stop and at end; RMSE at stop and at end.

42 Outline: brief introduction to computerized adaptive testing (CAT); multidimensional CAT; Computerized Adaptive Testing to Direct Delivery of Hospital-Based Rehabilitation (NIH R01HD079439); item bank calibration; item selection; stopping rules; ongoing projects.

43 Project I: Classification. Table 2:

AM-PAC color-coded stage             Level   FIM score   FIM stage
Independent (Green)                  High    7           Independent
                                     Low     6           Modified independent
Supervision/Contact Guard (Yellow)   High    5           Supervision
                                     Low     4           Contact guard
Assistance (Orange)                  High    2-3         Min/Mod Assist
                                     Low     1           Max Assist
Dependent (Red)                      Red     0           Dependent

44 Project I: Classification. Multidimensional CAT plus post hoc classification, or a multidimensional classification CAT?

45 Project II: Incorporating response time (Fan, Wang, et al., 2012; Wang et al., 2013a, 2013b; Wang & Xu, 2015). Exploratory data analysis, run per batch first. Figure: histogram of batch 1 response times for all person-item combinations (SD = 21.28, skew = 41.84). The red line marks the 97.5th percentile (25.85).

46 Project II: Exploratory data analysis. After cutting the upper 2.5% of the data: SD = 4.27, skew = 1.23 (figure).

47 Project II: Exploratory data analysis. After log transformation (figure).
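The cleaning steps in this exploratory analysis (trim at the 97.5th percentile, then log-transform) can be sketched as follows. The response times are synthetic, the percentile uses linear interpolation, and SD and skewness are population moments; all of these are assumptions, since the slides do not state which estimators were used.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def sd_and_skew(values):
    """Population standard deviation and moment-based skewness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return math.sqrt(m2), m3 / m2 ** 1.5


# Synthetic response times in seconds: 39 ordinary values and one extreme outlier.
rt = [2 + 0.25 * i for i in range(39)] + [400.0]

cut = percentile(rt, 97.5)
trimmed = [t for t in rt if t <= cut]      # drop the upper 2.5%
logged = [math.log(t) for t in trimmed]    # log transformation

raw_sd, raw_skew = sd_and_skew(rt)
trim_sd, trim_skew = sd_and_skew(trimmed)
print(round(raw_sd, 2), round(raw_skew, 2), round(trim_sd, 2), round(trim_skew, 2))
```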

48 Project II: Incorporating response time (Fan, Wang, et al., 2012; Wang et al., 2013a, 2013b; Wang & Xu, 2015). A hierarchical response time model (van der Linden, 2007). Diagram labels: Population (μ, σ); Item; Person θ; Item (φ, λ); Person τ.

49 Four different models, estimated with the EM algorithm: (1) following Molenaar et al. (2015), van der Linden's (2007) joint model is reparameterized as an MGRM, with a correlation between θ and τ; (2) interviewers are included as covariates, with interviewer effects that differ across items;

50 Four different models, continued: (3) interviewers are included as covariates, with interviewer effects that differ across items by the same proportion; (4) interviewers are included as fixed covariates.

51 Figure panels: Model 1, Model 2, Model 3, Model 4.

52 Model comparison and results. Table, values not shown: equation, number of free parameters, AIC, and BIC for each of the four batches. Model 3 results (batch 1), reported per θ dimension: one reported estimate is 0.591. Compared with the MGRM alone, adding response time yields higher item discrimination parameter estimates and smaller standard errors.

53 Concurrent calibration across 4 batches: adding response time information did not significantly affect the item parameter estimates or their standard errors. It did help reduce the standard errors of patients' multidimensional latent trait estimates, but adding interviewer as a covariate did not bring further improvement.

54 Next steps II: Incorporating response time (Fan, Wang, et al., 2012; Wang et al., 2013a, 2013b; Wang & Xu, 2015). Using a hierarchical response time model (van der Linden, 2007), select items to maximize item information per time unit.
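Selecting by information per time unit can be sketched as below. The sketch assumes a lognormal response time, so the expected time is exp(mean + variance / 2) on the log scale; the parameter names (`time_intensity`, `speed`, `log_sd`) and the candidate items are hypothetical, and a scalar `info` value stands in for whatever information criterion is used.

```python
import math


def expected_time(time_intensity, speed, log_sd):
    """Mean of a lognormal response time with ln T ~ N(time_intensity - speed, log_sd^2)."""
    return math.exp(time_intensity - speed + 0.5 * log_sd ** 2)


def pick_item(candidates, speed):
    """Return the id of the candidate with the largest information per expected second."""
    best = max(
        candidates,
        key=lambda c: c["info"] / expected_time(c["time_intensity"], speed, c["log_sd"]),
    )
    return best["id"]


# Hypothetical candidates: A is more informative but takes much longer to answer.
candidates = [
    {"id": "A", "info": 0.50, "time_intensity": 3.0, "log_sd": 0.5},
    {"id": "B", "info": 0.40, "time_intensity": 2.0, "log_sd": 0.5},
]

print(pick_item(candidates, speed=0.0))
```

Maximizing information alone would choose item A here; dividing by expected time favors item B.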

55 Next steps III: DIF CAT (Wang, Weiss, & Wang, 2017). Three factors to consider: gender (male/female), education (college and above/high school and below), and age (<65/65-90).

56 Example DIF items. Gender: "How much difficulty do you currently have making decisions, such as what clothes you want to wear?" (Applied Cognition), consistent with the expert hypothesis. Age: "How much difficulty do you currently have removing a plastic lid from a hot beverage cup?" (Daily Activity) and "How much difficulty do you currently have climbing stairs step over step without a handrail?" (Mobility).

57 How to deal with DIF in a CAT design? Items with extreme DIF: delete? Items with small DIF: keep? Alternative: a doubly adaptive CAT that uses subgroup information to improve measurement precision (Wang et al., 2017). It allows DIF items to have different parameters per subgroup and uses constraint-weighted D-optimal selection.

58 Project IV: Adaptive measure of change (Wang & Weiss, 2017; Wang, 2014). Specifying the MCAT to efficiently detect meaningful clinical change.

59 Study I.

60 Project IV: Adaptive measure of change (Wang & Weiss, 2017; Wang, 2014). Figure: θ at Time 1 and Time 2.

61 Project IV: Adaptive measure of change (Wang & Weiss, 2017; Wang, 2014). Item selection: select the item that best differentiates the null hypothesis (no individual change) from the alternative hypothesis, that is, maximize $KL_j(\hat{\theta}_{i2}, \hat{\theta}^{\text{pooled}}_{i(L+k-1)})$. Stopping rule: sequential hypothesis testing? Figure: θ at Time 1 and Time 2.

62 Algorithms; web-based delivery; data collection with MCAT; monitor item usage and routinely recalibrate item parameters if needed (Chen & Wang, 2016).

63 My collaborators and team: Dr. David Weiss (University of Minnesota); Dr. Andrea Cheville (Mayo Clinic). Research assistants: Zhuoran Shang, Shiyang Su.

Item Selection in Polytomous CAT

Bernard P. Veldkamp, Department of Educational Measurement and Data-Analysis, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. Summary: In polytomous CAT, items can be selected using Fisher Information