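Selection of Linking Items
Select the subset of items whose information functions maximally reflect the scale information function. Denote the scale information as $I(\theta)$ and the information of item $j$ as $I_j(\theta)$. Let $x_j = 1$ if item $j$ is selected and $0$ otherwise. The linking set is found with a linear programming solver (lp_solve 5.5, in R):
$$\min\; y \quad \text{subject to} \quad -y \le \sum_j x_j I_j(\theta_s) - I(\theta_s) \le y, \quad \theta_s \in \{-4, -3.95, \ldots, 3.95, 4\},$$
$$x_j \in \{0, 1\}, \qquad y \ge 0.$$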
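The following is a minimal Python sketch of the minimax criterion, for illustration only. It replaces the lp_solve 5.5 integer program with an exhaustive search over all n-item subsets, which is feasible only for small pools, and it assumes 2PL items. The item parameters, the planted target, and the names `info_2pl`, `max_deviation`, and `select_linking_items` are all hypothetical.

```python
import math
from itertools import combinations

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Grid theta_s in {-4, -3.95, ..., 3.95, 4}.
GRID = [round(-4 + 0.05 * s, 2) for s in range(161)]

def max_deviation(subset, items, target):
    """y for one subset: the largest |sum_j I_j(theta_s) - target(theta_s)| on the grid."""
    return max(
        abs(sum(info_2pl(t, *items[j]) for j in subset) - target(t))
        for t in GRID
    )

def select_linking_items(items, n, target):
    """Exhaustive search for the n-item subset minimizing y (small pools only)."""
    best, best_y = None, float("inf")
    for subset in combinations(range(len(items)), n):
        y = max_deviation(subset, items, target)
        if y < best_y:
            best, best_y = subset, y
    return best, best_y

# Hypothetical (a, b) parameters; the target is the information of items 1, 4, 6, 8.
items = [(1.2, -1.5), (0.8, -0.5), (1.5, 0.0), (1.0, 0.7), (2.0, 1.2),
         (0.9, -2.0), (1.7, 0.3), (1.1, 1.8), (1.4, -0.9), (0.7, 2.5)]
planted = (1, 4, 6, 8)
target = lambda t: sum(info_2pl(t, *items[j]) for j in planted)
print(select_linking_items(items, 4, target))
```

The target here is the information of a planted subset, so the search should recover those items with y = 0. With a real scale information function, the minimum y will be positive.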
An example: Subscale 2
[Figure: sum of information functions for the 6-, 7-, and 8-item linking sets]
An example: Subscale 3
[Figure]
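Why is Fisher information useful?
In multidimensional CAT, the volume of the confidence ellipsoid around $\hat{\boldsymbol\theta}$ is proportional to $\sqrt{\det[\mathbf{I}(\hat{\boldsymbol\theta})^{-1}]} = [\det \mathbf{I}(\hat{\boldsymbol\theta})]^{-1/2}$ (Anderson, 1984). Maximizing the determinant of the Fisher information matrix therefore shrinks the ellipsoid (Segall, 1996; Wang & Chang, 2011). This is the D-optimal method.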
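The D-optimal rule can be sketched in Python as follows. The sketch assumes M2PL items, whose information matrix is $P_j(1-P_j)\,\mathbf{a}_j\mathbf{a}_j'$. The item pool, the current information matrix, and the function names are hypothetical. Determinants are computed by Gaussian elimination so the example has no dependencies.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def m2pl_info(theta, a, intercept):
    """Fisher information matrix of an M2PL item: P(1 - P) a a'."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + intercept
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * ai * aj for aj in a] for ai in a]

def d_optimal(theta_hat, info_so_far, pool, administered=()):
    """Index of the unused item maximizing det(I_so_far + I_j) at theta_hat."""
    best, best_val = None, -math.inf
    for j, (a, intercept) in enumerate(pool):
        if j in administered:
            continue
        item = m2pl_info(theta_hat, a, intercept)
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(info_so_far, item)]
        val = det(total)
        if val > best_val:
            best, best_val = j, val
    return best

# Hypothetical: dimension 3 is the least precisely measured so far.
info_so_far = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 1.0]]
pool = [((1.5, 0.0, 0.0), 0.0), ((0.0, 1.5, 0.0), 0.0), ((0.0, 0.0, 1.5), 0.0)]
print(d_optimal((0.0, 0.0, 0.0), info_so_far, pool))
```

In this example dimension 3 carries the least information, so the item that loads on it raises the determinant the most.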
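Fisher information vs. confidence ellipse (Wang et al., 2013)
$\mathbf{I}(\boldsymbol\theta) = \begin{pmatrix} 15 & 0 \\ 0 & 10 \end{pmatrix}, \quad \boldsymbol\Sigma = \mathbf{I}(\boldsymbol\theta)^{-1} = \begin{pmatrix} 0.067 & 0 \\ 0 & 0.1 \end{pmatrix}$
Fisher information vs. confidence ellipse (Wang et al., 2013)
$\mathbf{I}(\boldsymbol\theta) = \begin{pmatrix} 50 & 0 \\ 0 & 25 \end{pmatrix}, \quad \boldsymbol\Sigma = \mathbf{I}(\boldsymbol\theta)^{-1} = \begin{pmatrix} 0.02 & 0 \\ 0 & 0.04 \end{pmatrix}$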
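The two examples can be checked numerically. The Python sketch below (helper names hypothetical) inverts each information matrix and compares the areas of the 95% confidence ellipses. It uses the closed-form chi-square quantile for 2 degrees of freedom, $-2\ln(0.05)$. Raising the information from diag(15, 10) to diag(50, 25) shrinks the area by a factor of $\sqrt{150/1250} \approx 0.35$.

```python
import math

def inverse_2x2(m):
    """Inverse of a symmetric 2 x 2 information matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    d = a * c - b * b
    return [[c / d, -b / d], [-b / d, a / d]]

def ellipse_area(cov, level=0.95):
    """Area of {x : x' cov^-1 x <= q}, where q is the chi-square(2) quantile at `level`."""
    q = -2.0 * math.log(1.0 - level)  # closed form, valid for 2 degrees of freedom only
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    return math.pi * q * math.sqrt(det)

sigma_a = inverse_2x2([[15, 0], [0, 10]])
sigma_b = inverse_2x2([[50, 0], [0, 25]])
print(round(sigma_a[0][0], 3), round(sigma_a[1][1], 3))
print(round(sigma_b[0][0], 3), round(sigma_b[1][1], 3))
print(ellipse_area(sigma_b) / ellipse_area(sigma_a))
```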
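Minimax mechanism
Assume there are three dimensions and that item $j$ has a rank-one information matrix $w_j\mathbf{a}_j\mathbf{a}_j'$ (as in the M2PL, where $w_j = P_j(1-P_j)$), with $\mathbf{a}_j = (a_{j1}, a_{j2}, a_{j3})'$. Then
$$\det(\mathbf{I} + w_j\mathbf{a}_j\mathbf{a}_j') = \det\mathbf{I} + w_j\big[a_{j1}^2 C_{11} + a_{j2}^2 C_{22} + a_{j3}^2 C_{33} + 2(a_{j1}a_{j2}C_{12} + a_{j1}a_{j3}C_{13} + a_{j2}a_{j3}C_{23})\big],$$
where $C_{kl}$ is the $(k,l)$ cofactor of $\mathbf{I}$; for example, $C_{11} = \det\begin{pmatrix} I_{22} & I_{23} \\ I_{32} & I_{33} \end{pmatrix}$. Because $\mathrm{Var}(\hat\theta_k) \approx C_{kk}/\det\mathbf{I}$, the D-optimal criterion gives the most weight to items that load on the dimension whose estimator currently has the largest variance. The criterion therefore tends to pick items that reduce the variance of the estimator lagging furthest behind.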
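The decomposition can be verified numerically. This Python sketch (function names hypothetical) computes $w\,\mathbf{a}'\operatorname{adj}(\mathbf{I})\,\mathbf{a}$ from the cofactors and compares it with the direct determinant difference, which is the matrix determinant lemma. With a diagonal $\mathbf{I}$, the largest gain comes from the item that loads on the dimension with the largest variance.

```python
import math

def minor(m, r, c):
    """Determinant of the 2 x 2 submatrix left after deleting row r and column c."""
    (r0, r1) = [i for i in range(3) if i != r]
    (c0, c1) = [j for j in range(3) if j != c]
    return m[r0][c0] * m[r1][c1] - m[r0][c1] * m[r1][c0]

def cofactor(m, r, c):
    return (-1) ** (r + c) * minor(m, r, c)

def det3(m):
    return sum(m[0][c] * cofactor(m, 0, c) for c in range(3))

def det_gain(info, a, w):
    """det(I + w a a') - det(I) = w * a' adj(I) a, via the cofactor expansion."""
    return w * sum(a[k] * a[l] * cofactor(info, k, l) for k in range(3) for l in range(3))

def variances(info):
    """Var(theta_k) ~ C_kk / det(I), the diagonal of I^-1."""
    d = det3(info)
    return [cofactor(info, k, k) / d for k in range(3)]

# Hypothetical information matrix and item.
info = [[6.0, 1.0, 0.5], [1.0, 4.0, 0.8], [0.5, 0.8, 2.0]]
a, w = (1.2, 0.5, 0.9), 0.2
updated = [[info[k][l] + w * a[k] * a[l] for l in range(3)] for k in range(3)]
print(math.isclose(det3(updated) - det3(info), det_gain(info, a, w)))

# Diagonal case: each unit item's gain is w times the product of the other two diagonals.
diag = [[6.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 2.0]]
gains = [det_gain(diag, e, 0.25) for e in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
print(gains, variances(diag))
```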
Item bank information
[Figure: item bank information]
Domain/content balancing: constraint-weighted D-optimal (Wang et al., 2017)
Suppose that, for each domain $k = 1, \ldots, d$, the minimum and maximum numbers of items $\{l_k, u_k\}$ are set in advance. Let $x_k$ be the number of items from domain $k$ administered so far, $n$ the current test length, $L$ the maximum test length, and $c_{jk}$ an indicator of whether item $j$ belongs to domain $k$ (Cheng et al., 2009). These quantities define the constraint weights that are combined with the D-optimal criterion.
A simulation study
- Sample size N = 2,000; true θ drawn from a multivariate normal distribution with mean vector 0 and covariance matrix Σ.
- θ is estimated by maximum a posteriori (MAP), with a multivariate normal prior with mean vector 0.
- Evaluation criterion: root mean squared error,
$$\mathrm{RMSE}(\theta_k) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(\hat\theta_{ik} - \theta_{ik}\big)^2}.$$
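The RMSE criterion, computed separately for each dimension, takes only a few lines of Python. The data layout (a list of per-examinee θ vectors) and the function name are assumptions.

```python
import math

def rmse_by_dimension(theta_true, theta_hat):
    """RMSE(theta_k) = sqrt((1/N) * sum_i (theta_hat_ik - theta_ik)^2) for each dimension k."""
    n = len(theta_true)
    dims = len(theta_true[0])
    return [
        math.sqrt(sum((theta_hat[i][k] - theta_true[i][k]) ** 2 for i in range(n)) / n)
        for k in range(dims)
    ]

# Hypothetical two-examinee, two-dimension example.
print(rmse_by_dimension([[0.0, 0.0], [1.0, 1.0]], [[0.3, 0.0], [1.0, 0.6]]))
```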
Results: Domain-level recovery, D-optimal vs. random selection
[Figure]
Results: Domain-level recovery, D-optimal vs. constraint-weighted D-optimal
[Figure]
Reducing Test Length
θ = (0, 0, 0): [Figure: confidence interval of θ̂ vs. test length]
θ = (2, 2, 2): [Figure: confidence interval of θ̂ vs. test length]
Variable-length CAT: stopping rule
Start → item bank (300+ items) → stop when the measurement precision criterion is satisfied (Dodd, Koch, & De Ayala, 1993; Boyd, Dodd, & Choi, 2010).
Candidate precision criteria (Wang et al., 2013):
(a) volume of the confidence ellipsoid (D rule);
(b) sum of the SEs of θ̂ across domains;
(c) maximum axis of the confidence ellipsoid;
(d) Kullback-Leibler divergence between two consecutive posteriors.
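Criteria (a) to (c) are all functions of the posterior (or asymptotic) covariance matrix Σ of θ̂. The Python sketch below uses hypothetical names. It computes the volume of the ellipsoid $\{\mathbf{x}: \mathbf{x}'\boldsymbol\Sigma^{-1}\mathbf{x} \le q\}$, the sum of the SEs, and the longest semi-axis $\sqrt{q\lambda_{\max}}$, obtained by power iteration. Criterion (d) is omitted because it needs the full posteriors, and the stopping thresholds are left to the user.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def largest_eigenvalue(m, iters=500):
    """Largest eigenvalue of a symmetric positive definite matrix by power iteration."""
    n = len(m)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-symmetric start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam

def precision_summaries(cov, q=1.0):
    """Criteria (a)-(c) for the ellipsoid {x : x' cov^-1 x <= q}."""
    d = len(cov)
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return {
        "volume": unit_ball * q ** (d / 2) * math.sqrt(det(cov)),  # (a) D rule
        "sum_se": sum(math.sqrt(cov[k][k]) for k in range(d)),     # (b)
        "max_axis": math.sqrt(q * largest_eigenvalue(cov)),        # (c)
    }

cov = [[0.04, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.01]]  # hypothetical
print(precision_summaries(cov))
```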
Cumulative information growth
[Figure: determinant of the Fisher information matrix vs. test length]
Stopping rule: theta convergence
Stop when θ̂ no longer changes much. This is the theta convergence rule (T rule): the change in θ̂ between consecutive items falls below 0.01 (Babcock & Weiss, 2012; Wang et al., 2017+).
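Why is the T rule secondary?
Under the 2PL, the change in the estimate after the $n$th item is approximately
$$\hat\theta_n - \hat\theta_{n-1} \approx \frac{a_n\,[u_n - P_n(\hat\theta_{n-1})]}{I_n}, \qquad I_n = \sum_{i=1}^{n} a_i^2 P_i Q_i,$$
so it lies in the interval $(-a_n/I_n,\ a_n/I_n)$ (Chang & Ying, 2008).
- It does not decrease monotonically as test length increases, because it depends on the discrimination $a_n$ and the residual of the latest item. The T rule can therefore terminate the test prematurely (Wang et al., 2017+).
- It can also undermine test efficiency. The usual criterion is SE(θ̂) < .2 (Dodd et al., 1993), i.e., $I_n > 25$. If, hypothetically, $a_n = 1$ and $|u_n - P_n| \approx .5$, satisfying $|\hat\theta_n - \hat\theta_{n-1}| < .01$ requires $I_n > 50$ (Wang et al., 2017+).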
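The arithmetic behind these two points can be sketched in Python under the approximation above. The function names and the two example items are hypothetical.

```python
import math

def se_from_information(info):
    """SE(theta-hat) ~ 1 / sqrt(I)."""
    return 1.0 / math.sqrt(info)

def info_needed_for_t_rule(a_n, residual, tol=0.01):
    """Test information I_n at which |a_n * residual / I_n| drops to tol."""
    return a_n * abs(residual) / tol

def change_bound(a_n, info_n):
    """Upper bound a_n / I_n on |theta_n - theta_(n-1)|, since |u_n - P_n| < 1."""
    return a_n / info_n

print(se_from_information(25.0))        # SE criterion .2 corresponds to I = 25
print(info_needed_for_t_rule(1.0, 0.5))  # T rule with a_n = 1, residual .5
# More information but a more discriminating item: the bound grows, not shrinks.
print(change_bound(0.5, 20.0), change_bound(2.0, 22.0))
```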
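MGRM: simple structure (Wang et al., 2017+)
For item $j$ with categories $x = 0, 1, \ldots, K_j - 1$,
$$P(X_j = x \mid \boldsymbol\theta) = \begin{cases} 1 - P^*_{j1}(\boldsymbol\theta), & x = 0,\\ P^*_{jx}(\boldsymbol\theta) - P^*_{j,x+1}(\boldsymbol\theta), & x = 1, \ldots, K_j - 2,\\ P^*_{j,K_j-1}(\boldsymbol\theta), & x = K_j - 1,\end{cases}$$
with $P^*_{jx}(\boldsymbol\theta) = \dfrac{\exp(\mathbf{a}_j'\boldsymbol\theta - b_{jx})}{1 + \exp(\mathbf{a}_j'\boldsymbol\theta - b_{jx})}$. Under simple structure, item $j$ measures a single trait $p$, so $\mathbf{a}_j'\boldsymbol\theta = a_{jp}\theta_p$.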
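The following is a minimal Python sketch of the MGRM category probabilities. It assumes the logit $\mathbf{a}_j'\boldsymbol\theta - b_{jx}$ with increasing thresholds $b_{jx}$. The function name and item parameters are hypothetical. For a simple-structure item only one element of `a` is nonzero, so the probabilities depend on that trait alone.

```python
import math

def mgrm_probs(theta, a, b):
    """Category probabilities P(X = 0), ..., P(X = K - 1) for one MGRM item.

    Assumes P*_x = logistic(a'theta - b_x) for x = 1..K-1 with increasing b,
    plus the conventions P*_0 = 1 and P*_K = 0.
    """
    z = sum(ak * tk for ak, tk in zip(a, theta))
    star = [1.0] + [1.0 / (1.0 + math.exp(-(z - bx))) for bx in b] + [0.0]
    return [star[x] - star[x + 1] for x in range(len(b) + 1)]

# Hypothetical 4-category item that measures only trait 1 (simple structure).
a, b = (1.5, 0.0, 0.0), (-1.0, 0.0, 1.0)
print(mgrm_probs((0.0, 0.0, 0.0), a, b))
```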
MGRM: complex structure (Wang et al., 2017+)
The information matrix of item $j$ is
$$\mathbf{I}_j(\boldsymbol\theta) = \mathbf{a}_j\mathbf{a}_j' \sum_{x=0}^{K_j-1} \frac{\big[P^*_{jx}(1-P^*_{jx}) - P^*_{j,x+1}(1-P^*_{j,x+1})\big]^2}{P(X_j = x \mid \boldsymbol\theta)}, \qquad P^*_{j0} = 1,\ P^*_{jK_j} = 0.$$
- If item $j$ measures only the $p$th trait, $\mathbf{a}_j$ has a single nonzero element. $\mathbf{I}_j$ is then zero except for its $p$th diagonal element, which is the amount of information carried by item $j$.
- If item $j$ measures multiple traits, $\mathbf{I}_j$ also has nonzero off-diagonal elements.
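The MGRM information matrix follows from $\mathbf{I}_j = \sum_x \nabla P_{jx}\nabla P_{jx}'/P_{jx}$, with $\nabla P_{jx} = \mathbf{a}_j\,[P^*_{jx}(1-P^*_{jx}) - P^*_{j,x+1}(1-P^*_{j,x+1})]$. Below is a Python sketch that assumes the logit $\mathbf{a}_j'\boldsymbol\theta - b_{jx}$. The function name and item parameters are hypothetical.

```python
import math

def mgrm_info(theta, a, b):
    """Fisher information matrix of one MGRM item (assumed logit a'theta - b_x)."""
    z = sum(ak * tk for ak, tk in zip(a, theta))
    star = [1.0] + [1.0 / (1.0 + math.exp(-(z - bx))) for bx in b] + [0.0]
    scalar = 0.0
    for x in range(len(b) + 1):
        p_x = star[x] - star[x + 1]
        # dP_x/dz; the gradient with respect to theta is this value times a.
        dp = star[x] * (1 - star[x]) - star[x + 1] * (1 - star[x + 1])
        scalar += dp * dp / p_x
    return [[ai * aj * scalar for aj in a] for ai in a]

simple = mgrm_info((0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-1.0, 0.0, 1.0))
complex_ = mgrm_info((0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (-1.0, 0.0, 1.0))
print(simple)
print(complex_)
```

As a sanity check, a binary item evaluated at its threshold returns $0.25a^2$, the familiar 2PL value.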
Primary vs. secondary stopping rules (Babcock & Weiss, 2012; Wang et al., 2017+)
Start → administer items from the bank (300+ items) until the minimum test length is reached.
Is the D rule (primary) satisfied? Yes → stop (94.9%; 28.5). No → is the T rule (secondary) satisfied? Yes → stop. No → continue, up to the maximum test length (5.1%; 61.5).
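The primary/secondary logic can be sketched in Python as a single decision function. The determinant target, the minimum and maximum lengths, and the use of the largest coordinate change for the T rule are assumptions for illustration, not the study's settings.

```python
def stopping_decision(n, det_info, theta_prev, theta_curr, *,
                      min_len, max_len, det_target, tol=0.01):
    """Return 'continue' or the reason for stopping, in the flowchart's order."""
    if n < min_len:
        return "continue"  # never stop before the minimum test length
    if det_info >= det_target:
        return "stop: D rule (primary)"
    # T rule: largest change in any coordinate of theta-hat (an assumption).
    change = max(abs(c - p) for p, c in zip(theta_prev, theta_curr))
    if change < tol:
        return "stop: T rule (secondary)"
    if n >= max_len:
        return "stop: maximum test length"
    return "continue"

settings = dict(min_len=10, max_len=120, det_target=500.0)  # hypothetical values
print(stopping_decision(20, 100.0, (0.1, 0.2), (0.105, 0.2), **settings))
```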
Stopping rule results
[Figure: SE of θ̂ vs. θ for Applied Cognition, Daily Activity, and Mobility]
[Figure: 3D plot]
Stopping rule (cont.)
Overall: test length mean 28.5 (SD 13.3); overall precision: bias 0.005, RMSE 0.303, determinant 514.7; primary stop: actual 0.949, eventual 0.965 (difference 1.6%).
Subgroups (values at stop / at end):
N = 31: test length 58.7 (SD 15.3) / 72.2 (SD 15.5); bias 0.162 / 0.136; RMSE 0.430 / 0.391
N = 71: test length 64.5 (SD 13.0) / 120 (SD 0); bias 0.207 / 0.204; RMSE 0.592 / 0.525
Outline
- Brief introduction to computerized adaptive testing (CAT)
- Multidimensional CAT
- Computerized Adaptive Testing to Direct Delivery of Hospital-Based Rehabilitation (NIH R01HD079439, 2015-2020)
  - Item bank calibration
  - Item selection
  - Stopping rules
  - Ongoing projects
Project I: Classification
Table 2. AM-PAC color-coded stages vs. FIM score and FIM stage
Independent (Green): High 7, Independent; Low 6, Modified independent
Supervision/Contact Guard (Yellow): High 5, Supervision; Low 4, Contact guard
Assistance (Orange): High 2-3, Min/Mod assist; Low 1, Max assist
Dependent (Red): 0, Dependent
Project I: Classification
Multidimensional CAT + post hoc classification, or multidimensional classification CAT?
Project II: Incorporating response time (Fan, Wang, et al., 2012; Wang, et al., 2013a, 2013b; Wang & Xu, 2015)
Exploratory data analysis (each batch analyzed separately first):
- Histogram of batch 1 response times for all person-item combinations (SD = 21.28, skewness = 41.84). The red line marks the 97.5th percentile (25.85).
- After removing the upper 2.5% of the data: SD = 4.27, skewness = 1.23.
- After log transformation: [Figure]
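The trimming and transformation steps can be sketched in Python with the standard `statistics` module. The percentile convention (the default 'exclusive' method of `statistics.quantiles`) and the moment-based skewness formula are assumptions, since the slides do not say which were used. The data are hypothetical.

```python
import math
import statistics

def skewness(x):
    """Moment-based skewness g1 = m3 / m2**1.5."""
    m = statistics.fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / len(x)
    m3 = sum((v - m) ** 3 for v in x) / len(x)
    return m3 / m2 ** 1.5

def trim_and_log(times):
    """Drop values above the 97.5th percentile, then log-transform the rest."""
    cut = statistics.quantiles(times, n=40)[-1]  # 39/40 = 97.5th percentile
    kept = [t for t in times if t <= cut]
    return cut, kept, [math.log(t) for t in kept]

times = [float(v) for v in range(1, 100)] + [1000.0]  # hypothetical, long right tail
cut, kept, logged = trim_and_log(times)
print(cut, statistics.stdev(times), skewness(times))
print(statistics.stdev(kept), skewness(kept))
```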
Project II: Incorporating response time (Fan, Wang, et al., 2012; Wang, et al., 2013a, 2013b; Wang & Xu, 2015)
A hierarchical response time model (van der Linden, 2007)
[Figure: hierarchical framework with population parameters (μ, σ), person parameters θ and τ, item parameters for responses, and response-time item parameters φ and λ]
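For reference, the lognormal response time model of van der Linden (2007) can be written in the slide's symbols as $\ln T_{ij} \sim N(\lambda_j - \tau_i, \phi_j^{-2})$. This reads $\lambda_j$ as time intensity and $\phi_j$ as time discrimination, which is an assumed mapping. Below is a Python sketch of the log density and of the maximum likelihood estimate of speed when item parameters are known. Names and parameter values are hypothetical.

```python
import math

def rt_log_density(t, tau, lam, phi):
    """Log density of response time t when ln T ~ Normal(lam - tau, 1 / phi**2).

    Assumed mapping: lam = time intensity, phi = time discrimination, tau = speed.
    """
    z = phi * (math.log(t) - (lam - tau))
    return math.log(phi) - math.log(t) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z

def speed_mle(times, lams, phis):
    """ML estimate of tau with known item parameters: precision-weighted mean of lam_j - ln t_j."""
    w = [p * p for p in phis]
    return sum(wj * (lj - math.log(tj)) for wj, lj, tj in zip(w, lams, times)) / sum(w)

lams, phis = [3.0, 2.5, 4.0], [1.5, 2.0, 1.0]  # hypothetical item parameters
times = [math.exp(l - 0.4) for l in lams]      # a person with tau = 0.4, no noise
print(speed_mle(times, lams, phis))
```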
Four different models (estimated with the EM algorithm)
(1) Following Molenaar et al. (2015), van der Linden's (2007) joint model can be reparameterized as an MGRM ( ), with a correlation between θ and τ.
(2) Including interviewers as covariates, with interviewer effects that differ across items.
(3) Including interviewers as covariates, with interviewer effects that differ across items by the same proportion.
(4) Including interviewers as fixed covariates.
[Figure: Model 1, Model 2, Model 3, Model 4]
Model comparison & results
Batch  Model  Free parameters  AIC     BIC
1      1      736              133566  136755
1      2      1281             133174  138725
1      3      741              133316  136527
1      4      741              133409  136620
2      1      652              102468  105202
2      2      940              102049  105992
2      3      655              102235  104982
2      4      655              102339  105086
3      1      656              111384  114149
3      2      1040             110613  114996
3      3      660              111001  113783
3      4      660              111323  114105
4      1      648              108550  111290
4      2      1028             107733  112080
4      3      652              108174  110931
4      4      652              108364  111121
Model 3 has the lowest BIC in every batch, and Model 2 has the lowest AIC.
Model 3 results (batch 1): correlations among the latent traits (lower triangle): 0.613; 0.466, 0.853. Estimates of ( ) are 0.591, 0.691, and 0.596.
Compared with the MGRM alone, adding response time yields higher item discrimination estimates and smaller standard errors.
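The model choice can be reproduced directly from the AIC and BIC values reported for the four batches. Here is a short Python sketch (names hypothetical):

```python
# (free parameters, AIC, BIC) for Models 1-4 in each batch, from the model comparison slide.
FITS = {
    1: [(736, 133566, 136755), (1281, 133174, 138725), (741, 133316, 136527), (741, 133409, 136620)],
    2: [(652, 102468, 105202), (940, 102049, 105992), (655, 102235, 104982), (655, 102339, 105086)],
    3: [(656, 111384, 114149), (1040, 110613, 114996), (660, 111001, 113783), (660, 111323, 114105)],
    4: [(648, 108550, 111290), (1028, 107733, 112080), (652, 108174, 110931), (652, 108364, 111121)],
}

def best_model(rows, column):
    """1-based model number with the smallest value in `column` (1 = AIC, 2 = BIC)."""
    return min(range(len(rows)), key=lambda m: rows[m][column]) + 1

for batch, rows in FITS.items():
    print(f"batch {batch}: AIC -> model {best_model(rows, 1)}, BIC -> model {best_model(rows, 2)}")
```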
Concurrent calibration across 4 batches
- Adding response time information did not significantly affect the item parameter estimates or their standard errors.
- Adding response time information helped reduce the standard errors of patients' multidimensional latent trait estimates, but adding interviewers as a covariate did not yield further improvement.
Next steps II: Incorporating response time (Fan, Wang, et al., 2012; Wang, et al., 2013a, 2013b; Wang & Xu, 2015)
Using the hierarchical response time model (van der Linden, 2007), select the item that maximizes information per time unit:
$$\max_j \; \frac{I_j(\hat\theta)}{E(T_j \mid \hat\tau)}.$$
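Below is a sketch of maximum information per time unit in Python. It is simplified to a unidimensional 2PL and assumes the lognormal model $\ln T_j \sim N(\lambda_j - \tau, \phi_j^{-2})$, so that $E(T_j \mid \tau) = \exp(\lambda_j - \tau + 1/(2\phi_j^2))$. The two-item pool and the function names are hypothetical.

```python
import math

def info_2pl(theta, a, b):
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def expected_time(tau, lam, phi):
    """E(T) when ln T ~ Normal(lam - tau, 1 / phi**2)."""
    return math.exp(lam - tau + 0.5 / phi ** 2)

def select_item(theta, tau, pool, per_time=True):
    """Index of the item maximizing information (per expected second if per_time)."""
    def score(item):
        a, b, lam, phi = item
        info = info_2pl(theta, a, b)
        return info / expected_time(tau, lam, phi) if per_time else info
    return max(range(len(pool)), key=lambda j: score(pool[j]))

# Hypothetical (a, b, lam, phi): item 0 is more informative but much slower.
pool = [(1.5, 0.0, 3.0, 2.0), (1.2, 0.0, 2.0, 2.0)]
print(select_item(0.0, 0.0, pool, per_time=False), select_item(0.0, 0.0, pool))
```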
Next steps III: DIF CAT (Wang, Weiss, & Wang, 2017)
Three factors to consider:
- Gender (male/female)
- Education (college or above / high school and below)
- Age (<65 / 65-90)
Example DIF items
- Gender: "How much difficulty do you currently have making decisions, such as what clothes you want to wear?" (Applied Cognition), consistent with the experts' hypothesis.
- Age: "How much difficulty do you currently have removing a plastic lid from a hot beverage cup?" (Daily Activity)
- Age: "How much difficulty do you currently have climbing stairs step over step without a handrail?" (Mobility)
How to deal with DIF in a CAT design?
- Items with extreme DIF: delete?
- Items with small DIF: keep?
- Doubly adaptive CAT: use subgroup information to improve measurement precision (Wang et al., 2017)
  - Allow DIF items to have different parameters per subgroup
  - Constraint-weighted D-optimal item selection
Project IV: Adaptive measure of change (Wang & Weiss, 2017; Wang, 2014)
Specifying the MCAT to efficiently detect meaningful clinical change.
Study I [Figure]
[Figure: θ at Time 1 and Time 2]
- Item selection: select the item that best differentiates the null hypothesis (no individual change) from the alternative hypothesis, i.e., maximize
$$KL_j\big(\hat{\boldsymbol\theta}_{i2},\ \hat{\boldsymbol\theta}^{\text{pooled}}_{i(L+k-1)}\big).$$
- Stopping rule: sequential hypothesis testing?
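Below is a Python sketch of KL-based item selection for change. It assumes binary M2PL items and takes the divergence of the response distribution at the Time 2 estimate from that at the pooled estimate. The pool, intercepts, and names are hypothetical, and the actual MCAT items are polytomous.

```python
import math

def m2pl_prob(theta, a, intercept=0.0):
    z = sum(ak * tk for ak, tk in zip(a, theta)) + intercept
    return 1.0 / (1.0 + math.exp(-z))

def kl_item(theta_1, theta_0, a, intercept=0.0):
    """KL divergence of an item's Bernoulli response distribution at theta_1 from that at theta_0."""
    p1, p0 = m2pl_prob(theta_1, a, intercept), m2pl_prob(theta_0, a, intercept)
    return p1 * math.log(p1 / p0) + (1 - p1) * math.log((1 - p1) / (1 - p0))

def select_change_item(theta_time2, theta_pooled, pool, used=()):
    """Unused item whose responses best separate 'changed' from 'no change'."""
    candidates = [j for j in range(len(pool)) if j not in used]
    return max(candidates, key=lambda j: kl_item(theta_time2, theta_pooled, *pool[j]))

# Hypothetical: change occurred on trait 1 only.
pool = [((0.0, 1.5), 0.0), ((1.5, 0.0), 0.0)]
print(select_change_item((0.5, 0.0), (0.0, 0.0), pool))
```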
- Algorithms
- Web-based delivery
- Data collection with MCAT
- Monitor item usage and routinely recalibrate item parameters if needed (Chen & Wang, 2016)
My collaborators and team
- Dr. David Weiss, University of Minnesota
- Dr. Andrea Cheville, Mayo Clinic
- Research assistants: Zhuoran Shang, Shiyang Su