
Diagnostic Measurement: Theory, Methods, and Applications
Jonathan Templin
The University of Georgia

Workshop Overview
Workshop Sessions:
- Session 1: Conceptual Foundations of Diagnostic Measurement
- Session 2: Diagnostic Modeling: Psychometric Models
- Session 3: Diagnostic Modeling in Educational and Psychological Settings
- Session 4: Advanced Concepts
- Session 5: Estimation of Diagnostic Classification Models with Mplus

Conceptual Foundations of Diagnostic Measurement
Session 1

Session Overview
- Key definitions
- Conceptual example
- Example uses of diagnostic models in education
  - Classroom use (formative assessment)
  - Large scale testing use (summative assessment)
- Why diagnostic models should be used instead of traditional classification methods
- Concluding remarks

Session 1: Conceptual Foundations of Diagnostic Measurement
DEFINITIONS

What are Diagnoses?
- The word diagnosis, and its meaning, are common in everyday language
- The meaning of diagnosis is deeply ingrained in our society
  - Seldom merits a second thought

Definitions
American Heritage Dictionary definition of diagnosis (p. 500):
- Generally
  (a) A critical analysis of the nature of something
  (b) The conclusion reached by such analysis
- Medicine
  (a) The act or process of identifying or determining the nature and cause of a disease or injury through evaluation of a patient's history, examination, and review of laboratory data
  (b) The opinion derived from such an evaluation
- Biology
  (a) A brief description of the distinguishing characteristics of an organism, as for taxonomic classification

Diagnosis: Defined
- A diagnosis is the decision that is being made based on information
- Within psychological testing, providing a test score gives the information that is used for a diagnosis
  - BUT, the score is not the diagnosis
- For this workshop, a diagnosis is by its nature discrete
  - Classification

Day to Day Diagnosis
- Decisions happen every day:
  - Decide to wear a coat or bring an umbrella
  - Decide to study
  - Decide what to watch on TV tonight
- In all cases:
  - Information (or data) is collected
  - Inferences are made from data based on what is likely to be the true state of reality

Diagnosis (Formalized)
- In diagnostic measurement, the procedures of diagnosis are formalized:
  - We make a set of observations
    - Usually through a set of test questions
  - Based on these questions we make a decision as to the underlying state (or states) of a person
  - The decision is the diagnosis

Diagnosis (Formalized)
- Diagnoses featured in this workshop:
  - Educational Measurement
    - The competencies (skills) that a person has or has not mastered
    - Leads to possible tailored instruction and remediation
  - Psychiatric Assessment
    - The DSM criteria that a person meets
    - Leads to a broader diagnosis of a disorder

Workshop Terminology
- Respondents: The people from whom behavioral data are collected
  - For this workshop, behavioral data are test item responses
  - Not limited to only item responses
- Items: Test items used to classify/diagnose respondents
- Diagnostic Assessment: The method used to elicit behavioral data
- Attributes: Unobserved dichotomous characteristics underlying the behaviors (i.e., diagnostic status)
  - Latent variables linked to behaviors through diagnostic classification models
- Psychometric Models: Models used to analyze item response data
  - Diagnostic Classification Models (DCMs) is the name of the models used to obtain classifications/diagnoses

Diagnostic Classification Model Names
- Diagnostic classification models (DCMs) have been called many different things:
  - Skills assessment models
  - Cognitive diagnosis models
  - Cognitive psychometric models
  - Latent response models
  - Restricted (constrained) latent class models
  - Multiple classification models
  - Structured located latent class models
  - Structured item response theory

Psychometric Soapbox
- DCMs are but a small set of tools that must be adapted for a common purpose
  - Part of a methodological toolbox that is used to classify respondents
  - Should also include content experts and end users of the diagnoses
- DCMs link empirical observations and respondents' characteristics
  - The models are only as good as underlying theories

Session 1: Conceptual Foundations of Diagnostic Measurement
CONCEPTUAL EXAMPLE

Diagnostic Modeling Concepts
- Imagine that an elementary teacher wants to test basic math ability
- Using traditional psychometric approaches, the teacher could estimate an ability or test score for each respondent
  - Classical Test Theory: Assign respondents a test score
  - Item Response Theory: Assign respondents a latent (scaled) score
- By knowing each respondent's score, the students are ordered along a continuum

Traditional Psychometrics
[Figure: respondents placed on a continuum of Mathematics Ability at UGA running from Low to High, in the order Jonathan, Allan Cohen, Seock-Ho Kim]
- What results is a (weak) ordering of respondents
  - Ordering is called weak because of error in estimates
  - Seock-Ho Kim > Allan Cohen > Jonathan Templin

Traditional Psychometrics
- Questions that traditional psychometrics cannot answer:
  - Why is Jonathan so low?
  - How can we get him some help?
  - How much ability is enough to pass?
  - How much is enough to be proficient?
  - What math skills have the students mastered?

Multiple Dimensions of Ability: Ability from a Diagnostic Perspective
- As an alternative, we could have expressed math ability as a set of basic skills
[Figure: the skills Addition, Subtraction, Multiplication, and Division, each listed under "Has Mastered" or "Has Not Mastered"]

Multiple Dimensions of Ability
- The set of skills represents the multiple dimensions of elementary mathematics ability
- Other psychometric approaches have been developed for multiple dimensions
  - Classical Test Theory Scale Subscores
  - Multidimensional Item Response Theory (MIRT)
- Yet, issues in application have remained:
  - Reliability of estimates is often poor for most practical test lengths
  - Dimensions are often very highly correlated
  - Large samples are needed to calibrate item parameters in MIRT

DCMs as an Alternative
- DCMs do not assign a single score
  - Instead, a profile of mastered attributes is given to respondents
  - Multidimensional models
- DCMs provide respondents valuable information with fewer data demands
  - Higher reliability than comparable IRT/MIRT models
  - Complex item structures possible

Path Diagram of Traditional Psychometrics
[Figure: path diagrams linking Basic Math Ability, and the attributes Addition, Subtraction, Multiplication, and Division, to the test items, including (4x2)+3]

Psychometric Model Comparison
- Using Traditional Models, a respondent:
  - Has a score of 20
  - Has a 75%, a grade of C
  - Is in the 60th percentile of math
  - Scored above the cut-off, passes math
- Using Diagnostic Models, a respondent:
  - Is proficient using addition
  - Is proficient using subtraction
  - Should work on multiplication
  - Should work on division

DCM Specifics
- Let's expand on the idea of the basic math test
- Possible items include 2+3-1 and (4 x 2) + 3
- Not all items measure all attributes
- A Q matrix is used to indicate the attributes measured by each item
  - It plays the role of the factor pattern matrix that assigns the loadings in confirmatory factor analysis

The Q Matrix
- An example of a Q matrix using our math test
[Table: one row per item, with columns Add, Sub, Mult, Div indicating which attributes the item measures]

Respondent Profiles
- Respondents are characterized by profiles specifying which attributes have been mastered
- Numeric values are arbitrary, but for our purposes:
  - Mastery is given a 1
  - Non-mastery is given a 0
[Table: example profile for Respondent A on Add, Sub, Mult, Div]
- Respondent profile estimates are in the form of probabilities of mastery

Expected Responses to Items
[Tables: the Q matrix (items by Add, Sub, Mult, Div) and the respondent mastery profiles (respondents by Add, Sub, Mult, Div), with the resulting probabilities of answering items #1, #2, and #3 correctly]
- By knowing which attributes are measured by each item and which attributes have been mastered by each respondent, we can determine the items that will likely be answered correctly by each respondent
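The way a Q matrix and a mastery profile combine into expected responses can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the Q matrix entries, the item labels (including "4/2"), the two respondent profiles, and the all-or-none rule that marks an item as expected correct only when every measured attribute is mastered. The models discussed later in the workshop give probabilities instead of this hard rule.

```python
# Hypothetical Q matrix: rows are items, columns are Add, Sub, Mult, Div.
# These entries are assumed for illustration, not taken from the slides.
Q_MATRIX = {
    "2+3-1":   [1, 1, 0, 0],
    "4/2":     [0, 0, 0, 1],
    "(4x2)+3": [1, 0, 1, 0],
}

# Hypothetical mastery profiles (1 = mastered, 0 = not mastered).
PROFILES = {
    "Respondent A": [1, 1, 0, 0],
    "Respondent B": [1, 1, 1, 1],
}


def expected_correct(profile, q_row):
    """True when every attribute the item measures has been mastered."""
    return all(alpha >= q for alpha, q in zip(profile, q_row))


def expected_pattern(profile):
    """Expected right/wrong pattern across all items for one profile."""
    return {item: expected_correct(profile, q_row)
            for item, q_row in Q_MATRIX.items()}


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(name, expected_pattern(profile))
```

Under these assumed values, Respondent A (addition and subtraction only) is expected to answer only the first item correctly, while Respondent B is expected to answer all three.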


DCM Scoring and Score Reporting
[Figure: example score report, from Templin (2007)]

DCM Conceptual Summary
- DCMs focus on WHY a respondent is not performing well as compared to only focusing on WHO
- The models define the chances of a correct response based on the respondent's attribute profile
- Many models have been created, ranging in complexity
  - In Session #2 we discuss a general DCM
  - The general model subsumes all other latent variable DCMs
- The model predicts how respondents will answer each item
  - Also allows for classification/diagnoses based on item responses

How do DCMs Produce Diagnoses?
- Diagnostic decisions come from comparing observed behaviors to two parts of the psychometric model:
  1. Measurement Model: item/variable information (item parameters)
     - How respondents with different diagnostic profiles perform on a set of test items
     - Helps determine which items are better at discriminating between respondents with differing diagnostic profiles
  2. Structural Model: respondent information pertaining to the base rate or proportion of respondents with diagnoses in the population
     - Provides frequency of diagnosis (or diagnostic profile)
     - Helps validate the plausibility of the observed diagnostic profiles

Conceptual Model Mapping in DCMs
[Figure: conceptual mapping of the measurement and structural models]
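One way to see how the two model parts combine is Bayes' rule: the measurement model supplies the likelihood of the observed responses under each profile, and the structural model supplies the base rates. The sketch below assumes a single attribute, three items, conditionally independent responses, and made-up probabilities; the function name and every number are hypothetical.

```python
def posterior_over_profiles(responses, p_correct, base_rates):
    """Posterior probability of each profile given 0/1 item responses.

    responses:  list of 0/1 scores, one per item
    p_correct:  dict profile -> list of P(correct) per item (measurement model)
    base_rates: dict profile -> population proportion (structural model)
    """
    unnormalized = {}
    for profile, prior in base_rates.items():
        likelihood = 1.0
        for x, p in zip(responses, p_correct[profile]):
            likelihood *= p if x == 1 else (1.0 - p)
        unnormalized[profile] = prior * likelihood
    total = sum(unnormalized.values())
    return {profile: value / total for profile, value in unnormalized.items()}


# Hypothetical values: profile (0,) is a non-master, (1,) is a master.
P_CORRECT = {(0,): [0.20, 0.30, 0.25], (1,): [0.90, 0.80, 0.85]}
BASE_RATES = {(0,): 0.6, (1,): 0.4}
POSTERIOR = posterior_over_profiles([1, 1, 0], P_CORRECT, BASE_RATES)

if __name__ == "__main__":
    print(POSTERIOR)
```

With these assumed numbers, two correct answers out of three are enough to make mastery the more probable diagnosis even though masters are the minority in the population.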

Session 1: Conceptual Foundations of Diagnostic Measurement
USES OF DIAGNOSTIC MODEL RESPONDENT ESTIMATES

DCMs In Practice
- To demonstrate the potential benefits of using DCMs, I present a brief example of their use
  - From Henson & Templin (2008); Templin & Henson (2008)
- An urban county in a southern state wanted to improve students' End-Of-Course (EOC) scores on the state's 10th grade Algebra 2 exam
- A benchmark test was given in the middle of a semester
  - Formative test designed to help teachers focus instruction
- Respondents and their teachers received DCM estimates
  - Used these to characterize student proficiency levels with respect to 5 state-specified goals for Algebra 2 (standards)

DCM Study
- The benchmark test was developed for use with a DCM
  - Characteristics of the test were fixed via standard setting
- Five attributes were measured
  - Mastery was defined as meeting the proficient level for each attribute
  - Attributes were those most heavily represented in the EOC exam
- Respondents then took the EOC exam
  - 50 item test: score of 33+ considered proficient
- Benchmark estimates linked to EOC estimates

Descriptive Statistics of Attribute Patterns
- First, the basic descriptive statistics for each possible pattern
  - What we expect a respondent with a given attribute pattern to score on the EOC test
- Next slides describe how DCMs can help guide instruction


Gain by Mastery of Each Attribute
- The difference in test score between masters and non-masters of an attribute can be quantified
- Correlation between attribute and EOC score indicates amount of gain in EOC score by mastery of attribute
- Note: 50 item test

Pathways to Proficiency
- DCMs can be used to form a learning path that a respondent can follow that would most quickly lead to proficiency on the EOC test
- The pathway tells the respondent and the teacher the sequence of attributes to learn next that will provide the biggest increase in test score
- This mechanism may help teachers decide what to focus on when teaching a course
  - Balances time spent on instruction with impact on test score
- Provides a practical implementation of DCMs in today's classroom testing environment

Proficiency Road Map
[Figure: proficiency road map]

Fast Path to Proficiency
[Figure: fast path to proficiency]

Harder Paths to Proficiency
- Some paths are less efficient at increasing EOC test scores

Session 1: Conceptual Foundations of Diagnostic Measurement
IMPLICATIONS FOR LARGE SCALE TESTING PROGRAMS

DCM Characteristics
- As mentioned previously, DCMs provide a higher level of reliability for their estimates than comparable IRT or CTT models (Templin & Bradshaw, in press)
  - It is easier to place a respondent into one of two groups (mastery or non-mastery) than to locate them on a scale
- Such characteristics allow DCMs to potentially change how large scale testing is conducted
  - Most EOC-type tests are for classification
    - Proficiency standards
  - DCMs provide direct link to classification
    - And direct access to standards

Theoretical Reliability Comparison
[Figure: reliability (y-axis) by number of items (x-axis) for DCM and IRT. Accompanying table: number of items needed by DCM and by IRT at three reliability levels; the IRT entries are 34, 48, and 77 items]

Uni- and Multidimensional Comparison
[Figure: reliability for DCM and IRT by dimensional model: 1-Dimension, 2-Dimension, and BiFactor]

DCMs for an EOC Test
[Figure: reliability by number of items; the IRT (PL) reference reliability is ρ_θ = .87. Legend: 2 Category: 24 Items; 3 Category: 42 Items; 4 Category: 50 Items; 5 Category: 54 Items]

Ramifications for Use of DCMs
- Reliable measurement of multiple dimensions is possible
  - Two-attribute DCM application to empirical data:
    - Reliabilities of 0.95 and 0.90 (compared to 0.72 and 0.70 for IRT)
- Multidimensional proficiency standards
  - Respondents must demonstrate proficiency on multiple areas to be considered proficient for an overall content domain
  - Teaching to the test would therefore represent covering more curricular content to best prepare respondents
- Shorter unidimensional tests
  - Two-category unidimensional DCM application to empirical data:
    - Test needed only 24 items to have same reliability as IRT with 73 items

The Paradox of DCMs
- DCMs are often pitched as models that allow for measurement of fine-grained skills (e.g., Rupp & Templin, 2008)
- Paradox of DCMs:
  - Sacrifice fine-grained measurement of a latent trait for only several categories
  - Increased capacity to measure ability multidimensionally

When Are DCMs Appropriate?
- Which situations lend themselves more naturally to such diagnosis?
- The purpose of the diagnostic assessment matters most
  - DCMs provide classifications directly
  - Optimally used when tests are used for classification
    - EOC tests
    - Licensure/certification
    - Clinical screening
    - College entrance
    - Placement tests
- DCMs can be used as coarse approximations to continuous latent variable models
  - e.g., the EOG example (2 to 5 category levels shown)

Session 1: Conceptual Foundations of Diagnostic Measurement
BENEFITS OF DCMS OVER TRADITIONAL CLASSIFICATION METHODS

Previous Methods for Classification
- Making diagnoses on the basis of test responses is not a new concept
  - Classical test theory
  - Item response theory
  - Factor analysis
- Process is a two-stage procedure
  1. Scale respondents
  2. Find appropriate cut scores
- Classify respondents based on cut scores

Problems with the Two-Stage Approach
- The two-stage procedure allows for multiple sources of error to affect the results
  1. The latent variable scores themselves: estimation error
     - Uncertainty (i.e., standard errors) is typically not accounted for in the subsequent classification of respondents
     - The classification of respondents at different locations on the score continuum with multiple cut scores is differentially precise
     - Uncertainty of the latent variable scores varies as a function of the location of the score

Problems with the Two-Stage Approach
  2. Latent variable assumptions: that latent variable scores follow a continuous, typically normal, distribution
     - Estimates reflect the assumed distribution
     - Can introduce errors if the assumption is incorrect
  3. Cut score determination
     - Standard setting is imprecise when used with general abilities
     - Standard setting methods can be directed to item performance
     - Some theoretical justification needs to be provided for such a cut-off

Why are DCMs Better for Classification?
- The need for a two-stage procedure to set cut scores for classification is eliminated when DCMs are used
  - Reduces classification error
- Quantifies and models the measurement error of the observable variables
  - Controlling for measurement error when producing the diagnosis
- DCMs have a natural and direct mechanism for incorporating base-rate information into the analysis
  - No direct way to do so objectively in two-stage procedures
- Item parameters provide information as to the diagnostic quality of each item
  - Not directly estimable in two-stage approaches
  - Can be used to build tests that optimally separate respondents

Session 1: Conceptual Foundations of Diagnostic Measurement
CONCLUDING REMARKS

Session 1 Take-home Points
- DCMs provide direct link between diagnosis and behavior
  - Provide diagnostic classifications directly
  - Diagnoses set by psychometric model parameters
- DCMs are effective if classification is the ultimate purpose
  - Reduce error by removing judgments necessary in two-stage approach
- DCMs can be used in many contexts
  - Can be used to create highly informative tests
  - Can be used to measure multiple dimensions
- DCMs are in their infancy
  - Time will tell their effectiveness

Diagnostic Modeling: Psychometric Models
Session 2

Development of Psychometric Models
- Over the past several years, numerous DCMs have been developed
  - We will focus on DCMs that use latent variables for attributes
- Each DCM makes assumptions about how mastered attributes combine/interact to produce an item response
  - Compensatory/disjunctive/additive models
  - Non-compensatory/conjunctive/non-additive models
- With so many models, analysts have been unsure which model would best fit their purpose
  - Difficult to imagine all items following same assumptions

General Models for Diagnosis
- Recent developments have produced very general diagnostic models
  - General Diagnostic Model (GDM; von Davier, 2005)
  - Loglinear Cognitive Diagnosis Model (LCDM; Henson, Templin, & Willse, 2009)
    - Focus of this session
- The general DCMs (GDM; LCDM) provide great flexibility
  - Subsume all other latent variable DCMs
  - Allow for both additive and non-additive relationships between attributes and items
  - Sync with other psychometric models, allowing for greater understanding of modeling process

Session Overview
- Background information
  - ANOVA models and the LCDM
  - Logits explained
- The LCDM
  - Parameter structure
  - One-item demonstration
  - LCDM general form
- Linking the LCDM to other earlier developed DCMs

Notation Used Throughout Session
- Attributes: a = 1, ..., A
- Respondents: r = 1, ..., R
- Attribute Profiles: α_r = [α_{r1}, α_{r2}, ..., α_{rA}]
  - α_{ra} is 0 or 1
- Latent Classes: c = 1, ..., C
  - We have C = 2^A latent classes, one for each possible attribute profile
- Items: i = 1, ..., I
  - Restricted to dichotomous item responses (X_{ri} is 0 or 1)
- Q matrix: Elements q_{ia} for an item i and attribute a
  - q_{ia} is 0 or 1

Session 2: Diagnostic Modeling Psychometric Models
BACKGROUND INFORMATION: ANOVA MODELS

Background Information: ANOVA
- The LCDM models the probability of a correct response to an item as a function of the latent attributes of a respondent
- The latent attributes are categorical, meaning a respondent can have only a few possible statuses
  - Each status corresponds to a predicted probability of a correct response
[Figure: predicted P(X = 1) for respondents with α = 0 and for respondents with α = 1]
- As such, the LCDM is very similar to an ANOVA model
  - Predicting a dependent variable as a function of the experimental group of a respondent

ANOVA Refresher
- As a refresher on ANOVA, let's imagine that we are interested in the factors that have an effect on work output (denoted by Y)
- We design a two-factor study where work output may be affected by:
  - Lighting of the workplace: High or Low
  - Temperature: Cold or Warm
- This experimental design is known as a 2-Way ANOVA
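Returning to the notation at the start of this section, the statement that there are C = 2^A latent classes can be checked by enumerating every 0/1 attribute profile. This is a minimal sketch; the function name is hypothetical and A = 4 is chosen only to match the four math skills of the Session 1 example.

```python
from itertools import product


def attribute_profiles(num_attributes):
    """All 0/1 attribute profiles, one per latent class."""
    return [list(profile) for profile in product([0, 1], repeat=num_attributes)]


PROFILES = attribute_profiles(4)

if __name__ == "__main__":
    print(len(PROFILES))  # number of latent classes for A = 4
    for profile in PROFILES:
        print(profile)
```

The count doubles with each added attribute, which is one reason higher-order terms become hard to estimate as tests measure more attributes.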

ANOVA Model
- Here is the 2 x 2 factorial design: Temperature (Cold, Warm) crossed with Lighting (Low, High)

ANOVA Model
- The ANOVA model allows us to test for the presence of:
  - A main effect associated with Temperature (A_t)
  - A main effect associated with Lighting (B_l)
  - An interaction effect associated with Temperature and Lighting ((AB)_{tl})
- The ANOVA model for a respondent's work output is:

$$Y_{rtl} = \mu + A_t + B_l + (AB)_{tl} + e_{rtl}$$

ANOVA with Dummy Coded Variables
- The ANOVA model can also be rewritten using two dummy coded variables, D_{rt} and D_{rl}
  - Becomes a linear model (i.e., regression model)
- D_{rt} (temperature)
  - D_{rt} = 0 for respondents in the cold temperature condition
  - D_{rt} = 1 for respondents in the warm temperature condition
- D_{rl} (lighting)
  - D_{rl} = 0 for respondents in the low lighting condition
  - D_{rl} = 1 for respondents in the high lighting condition

ANOVA with Dummy Coded Variables
- The ANOVA model then becomes:

$$Y_r = \beta_0 + \beta_t D_{rt} + \beta_l D_{rl} + \beta_{t*l} D_{rt} D_{rl} + e_r$$

[Table: the 2 x 2 design labeled by dummy codes, with rows D_{rt} = 0 (Cold Temperature) and D_{rt} = 1 (Warm Temperature) and columns D_{rl} = 0 (Low Lighting) and D_{rl} = 1 (High Lighting)]

ANOVA Effects Explained
- β_0 is the mean for the cold and low light condition (reference group)
  - The intercept
- β_t is the change of the mean when comparing cold to warm temperature for a business with low lights (Simple Main Effect)
- β_l is the change of the mean when comparing low to high lights for a business with a cold temperature (Simple Main Effect)
- β_{t*l} is the additional mean change that is not explained by the shift in temperature and the shift in lights, when both occur (2-Way Interaction)
- Respondents in the same condition have the same predicted value

ANOVA and the LCDM
- The ANOVA model and the LCDM take the same modeling approach
  - Predict a response using dummy coded variables
    - In the LCDM, the dummy coded variables are latent attributes
  - Using a set of main effects and interactions
    - Links attributes to item response
  - Where possible, we may look for ways to reduce the model
    - Removing non-significant interactions and/or main effects

Differences Between LCDM and ANOVA
- The LCDM and the ANOVA model differ in two ways:
  - Instead of a continuous outcome such as work output, the LCDM models a function of the probability of a correct response
    - The logit of a correct response (defined next)
  - Instead of observed factors as predictors, the LCDM uses discrete latent variables (the attributes being measured)
    - Attributes are given dummy codes (act as latent factors)
    - α_{ra} = 1 if respondent r has mastered attribute a
    - α_{ra} = 0 if respondent r has not mastered attribute a

Session 2: Diagnostic Modeling Psychometric Models
LOGITS EXPLAINED
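The interpretation of the dummy-coded ANOVA effects can be made concrete by computing the predicted mean of each of the four cells. The coefficient values below are hypothetical, chosen only to make the arithmetic easy to follow, and the function and variable names are illustrative.

```python
def predicted_mean(d_temp, d_light, b0, b_temp, b_light, b_inter):
    """Predicted work output for one cell of the 2 x 2 design.

    d_temp:  0 = cold, 1 = warm
    d_light: 0 = low lighting, 1 = high lighting
    """
    return b0 + b_temp * d_temp + b_light * d_light + b_inter * d_temp * d_light


# Hypothetical coefficients (not from the workshop).
BETAS = {"b0": 10.0, "b_temp": 2.0, "b_light": 3.0, "b_inter": 1.5}

CELL_MEANS = {
    (d_temp, d_light): predicted_mean(d_temp, d_light, **BETAS)
    for d_temp in (0, 1)
    for d_light in (0, 1)
}

if __name__ == "__main__":
    for cell, mean in CELL_MEANS.items():
        print(cell, mean)
```

The reference cell (cold, low light) equals the intercept, each simple main effect shifts one factor with the other held at its reference level, and the warm, high-light cell adds the interaction on top of both main effects.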

Model Background
- Just as in IRT models, the LCDM models the log-odds of a correct response conditional on a respondent's attribute pattern α_r
  - The log-odds is called a logit

$$\text{Logit}(X_{ri} = 1 \mid \alpha_r) = \log\left(\frac{P(X_{ri} = 1 \mid \alpha_r)}{1 - P(X_{ri} = 1 \mid \alpha_r)}\right)$$

- The logit is used because the responses are binary
  - Items are either answered correctly (1) or incorrectly (0)
- The linear model is inappropriate for categorical data
  - Can lead to impossible predictions (i.e., probabilities greater than 1 or less than 0)

More on Logits
[Figure: correspondence between probability and logit]

From Logits to Probabilities
- Whereas logits are useful because they are unbounded continuous variables, categorical data analyses rely on estimated probabilities
- The inverse logit function converts the unbounded logit to a probability
  - This is also the form of an IRT model (and logistic regression)

$$P(X_{ri} = 1 \mid \alpha_r) = \frac{\exp\left(\text{Logit}(X_{ri} = 1 \mid \alpha_r)\right)}{1 + \exp\left(\text{Logit}(X_{ri} = 1 \mid \alpha_r)\right)}$$

Session 2: Diagnostic Modeling Psychometric Models
THE LCDM
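The logit and its inverse described in the preceding slides are short enough to write out directly. This is a minimal sketch using only the Python standard library; the function names are illustrative.

```python
import math


def logit(probability):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(probability / (1.0 - probability))


def inverse_logit(value):
    """Map an unbounded logit back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, logit(p), inverse_logit(logit(p)))
```

A probability of 0.5 corresponds to a logit of 0, probabilities above 0.5 give positive logits, and the two functions undo each other.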

Building the LCDM
- To demonstrate the LCDM, consider the item 2+3-1=? from our basic math example
  - The item measured addition (attribute 1) and subtraction (attribute 2)
- Only attributes defined by the Q matrix are modeled for an item
- The LCDM provides the logit of a correct response as a function of the latent attributes mastered by a respondent:

$$\text{Logit}(X_{ri} = 1 \mid \alpha_r) = \lambda_{i,0} + \lambda_{i,1,(1)} \alpha_{r1} + \lambda_{i,1,(2)} \alpha_{r2} + \lambda_{i,2,(1,2)} \alpha_{r1} \alpha_{r2}$$

LCDM Explained
- Logit(X_{ri} = 1 | α_r) is the logit of a correct response to item i by respondent r
- λ_{i,0} is the intercept
  - The logit for non-masters of addition and subtraction
  - The reference group is respondents who have not mastered either attribute (α_{r1} = 0 and α_{r2} = 0)

LCDM Explained
- λ_{i,1,(1)} = main effect for addition (attribute 1)
  - The increase in the logit for mastering addition (in someone who has not also mastered subtraction)
- λ_{i,1,(2)} = main effect for subtraction (attribute 2)
  - The increase in the logit for mastering subtraction (in someone who has not also mastered addition)
- λ_{i,2,(1,2)} is the interaction between addition and subtraction (attributes 1 and 2)
  - Change in the logit for mastering both addition and subtraction

Understanding LCDM Notation
- The LCDM item parameters have several subscripts:
  - Subscript #1, i: the item to which parameters belong
  - Subscript #2, e: the level of the effect
    - 0 is the intercept
    - 1 is the main effect
    - 2 is the two-way interaction
    - 3 is the three-way interaction
  - Subscript #3, (a_1, ...): the attributes to which the effect applies
    - Same number of attributes listed as the number in Subscript #2

Session 2: Diagnostic Modeling Psychometric Models
LCDM: A NUMERICAL EXAMPLE

LCDM with Example Numbers
- Imagine we obtained the following estimates for the simple math item:

Parameter       Estimate   Effect Name
λ_{i,0}         -2         Intercept
λ_{i,1,(1)}      2         Addition Simple Main Effect
λ_{i,1,(2)}      1         Subtraction Simple Main Effect
λ_{i,2,(1,2)}    0         Addition/Subtraction Interaction

LCDM Predicted Logits and Probabilities

α_1  α_2  LCDM Logit Function                                                     Logit  Probability
0    0    λ_{i,0} + λ_{i,1,(1)}*(0) + λ_{i,1,(2)}*(0) + λ_{i,2,(1,2)}*(0)*(0)     -2     0.12
0    1    λ_{i,0} + λ_{i,1,(1)}*(0) + λ_{i,1,(2)}*(1) + λ_{i,2,(1,2)}*(0)*(1)     -1     0.27
1    0    λ_{i,0} + λ_{i,1,(1)}*(1) + λ_{i,1,(2)}*(0) + λ_{i,2,(1,2)}*(1)*(0)      0     0.50
1    1    λ_{i,0} + λ_{i,1,(1)}*(1) + λ_{i,1,(2)}*(1) + λ_{i,2,(1,2)}*(1)*(1)      1     0.73

[Figure: Logit Response Function and Probability Response Function for the example item, plotting Logit(X=1|α) and P(X=1|α) across the possible attribute patterns (α1=0, α2=0; α1=0, α2=1; α1=1, α2=0; α1=1, α2=1)]

LCDM Interaction Plots
- The LCDM interaction term can be investigated via plots
- No interaction: parallel lines for the logit
  - Compensatory RUM (Hartz, 2002)
[Figure: Logit Response Function, with Logit(X=1|α) plotted against α1 = 0 and α1 = 1 in separate lines for α2 = 0 and α2 = 1, and Probability Response Function, with P(X=1|α) across the possible attribute patterns]
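The predicted logits and probabilities for the example item can be reproduced from the four estimates given above (intercept -2, addition main effect 2, subtraction main effect 1, interaction 0). The sketch below is illustrative; the function and dictionary names are hypothetical, while the parameter values are the ones stated in the slide.

```python
import math

# Estimates from the numerical example in the slides.
LAMBDA = {"intercept": -2.0, "main_add": 2.0, "main_sub": 1.0, "interaction": 0.0}


def lcdm_logit(alpha_add, alpha_sub, lam=LAMBDA):
    """Logit of a correct response for a two-attribute LCDM item."""
    return (lam["intercept"]
            + lam["main_add"] * alpha_add
            + lam["main_sub"] * alpha_sub
            + lam["interaction"] * alpha_add * alpha_sub)


def lcdm_probability(alpha_add, alpha_sub, lam=LAMBDA):
    """Probability of a correct response via the inverse logit."""
    return 1.0 / (1.0 + math.exp(-lcdm_logit(alpha_add, alpha_sub, lam)))


# (alpha_add, alpha_sub) -> (logit, probability rounded to 3 decimals)
TABLE = {
    (a1, a2): (lcdm_logit(a1, a2), round(lcdm_probability(a1, a2), 3))
    for a1 in (0, 1)
    for a2 in (0, 1)
}

if __name__ == "__main__":
    for pattern, (logit_value, prob) in TABLE.items():
        print(pattern, logit_value, prob)
```

Because the interaction is zero, each mastered attribute adds its main effect to the logit regardless of the other attribute, which is what produces parallel lines in the logit plot.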

Strong Positive Interactions
- Positive interaction: over-additive logit model
  - Conjunctive model (i.e., all or none)
  - DINA model (Haertel, 1989; Junker & Sijtsma, 1999)
[Figure: Logit Response Function and Probability Response Function for a strong positive interaction, plotted across the possible attribute patterns]

Strong Negative Interactions
- Negative interaction: under-additive logit model
  - Disjunctive model (i.e., one or more)
  - DINO model (Templin & Henson, 2006)
[Figure: Logit Response Function and Probability Response Function for a strong negative interaction, plotted across the possible attribute patterns]

Less Extreme Interactions
- Extreme interactions are unlikely in practice
- Below: positive interaction with positive main effects
[Figure: Logit Response Function and Probability Response Function for a positive interaction with positive main effects, plotted across the possible attribute patterns]

Session 2: Diagnostic Modeling Psychometric Models
GENERAL FORM OF THE LCDM

More General Versions of the LCDM
- The LCDM is based on the General Diagnostic Model by von Davier (GDM; 2005)
  - The GDM allows for both categorical and continuous latent variables
- For items measuring more than two attributes, higher-level interactions are possible
  - Difficult to estimate in practice
- The LCDM appears in the psychometric literature in a more general form
  - See Henson, Templin, & Willse (2009)

General Form of the LCDM
- The LCDM specifies the probability of a correct response as a function of a set of attributes and a Q matrix:

$$P(X_{ri} = 1 \mid \alpha_r) = \frac{\exp\left(\text{Logit}(X_{ri} = 1 \mid \alpha_r)\right)}{1 + \exp\left(\text{Logit}(X_{ri} = 1 \mid \alpha_r)\right)}$$

$$\text{Logit}(X_{ri} = 1 \mid \alpha_r) = \underbrace{\lambda_{i,0}}_{\text{Intercept}} + \underbrace{\sum_{a=1}^{A} \lambda_{i,1,(a)} \alpha_{ra} q_{ia}}_{\text{Main Effects}} + \underbrace{\sum_{a=1}^{A-1} \sum_{b=a+1}^{A} \lambda_{i,2,(a,b)} \alpha_{ra} \alpha_{rb} q_{ia} q_{ib}}_{\text{Two-Way Interactions}} + \underbrace{\cdots}_{\text{Higher Interactions}}$$

- The term in the exponent is the logit we have been using all along
  - Logit(X_{ri} = 1 | α_r)

Session 2: Diagnostic Modeling Psychometric Models
SUBSUMED MODELS

Previously Popular DCMs
- Because the advent of the GDM and LCDM has been fairly recent, other earlier DCMs are still in use
- Such DCMs are much more restrictive than the LCDM
  - Not discussed at length here
  - It is anticipated that the field will adapt to more general forms
- Each of these models can be fit using the LCDM
  - Fixing certain model parameters
- Shown for reference purposes
  - See Henson, Templin, & Willse (2009) for more detail

Other DCMs with the LCDM
- The Big 6 DCMs with latent variables:
  - DINA (Deterministic Inputs, Noisy AND Gate): Haertel (1989); Junker and Sijtsma (1999)
  - NIDA (Noisy Inputs, Deterministic AND Gate): Maris (1995)
  - RUM (Reparameterized Unified Model): Hartz (2002)
  - DINO (Deterministic Inputs, Noisy OR Gate): Templin & Henson (2006)
  - NIDO (Noisy Inputs, Deterministic OR Gate): Templin (2006)
  - C-RUM (Compensatory Reparameterized Unified Model): Hartz (2002)

Other DCMs with the LCDM

                         Non-compensatory Models                      Compensatory Models
LCDM Parameters          DINA                NIDA           NC-RUM    DINO                NIDO           C-RUM
Main Effects             Zero                Positive       Positive  Positive            Positive       Positive
Interactions             Positive            Positive       Positive  Negative            Zero           Zero
Parameter Restrictions   Across Attributes   Across Items             Across Attributes   Across Items

Adapted from: Rupp, Templin, and Henson (forthcoming, 2010)

Compensatory RUM (Hartz, 2002)
- No interactions in model
- No interaction: parallel lines for the logit
[Figure: Logit Response Function and Probability Response Function for the Compensatory RUM, plotted across the possible attribute patterns]

DINA Model (Haertel, 1989; Junker & Sijtsma, 1999)
- Positive interaction: over-additive logit model
- Highest interaction parameter is non-zero
- All main effects (and lower interactions) zero
[Figure: Logit Response Function and Probability Response Function for the DINA model, plotted across the possible attribute patterns]
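The idea that earlier models are special cases of the LCDM can be illustrated for an item measuring two attributes by fixing parameters as the table describes. This is a sketch under assumptions: the numeric values are hypothetical, the helper names are invented for illustration, and the DINO constraint shown (interaction equal to the negative of the common main effect) is the two-attribute case only.

```python
# Attribute patterns (alpha1, alpha2) in a fixed order.
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def lcdm_logits(intercept, main1, main2, interaction):
    """Logits of a correct response for each pattern of a two-attribute item."""
    return [intercept + main1 * a1 + main2 * a2 + interaction * a1 * a2
            for a1, a2 in PATTERNS]


def crum_logits(intercept, main1, main2):
    """Compensatory RUM: main effects only, interaction fixed to zero."""
    return lcdm_logits(intercept, main1, main2, 0.0)


def dina_logits(intercept, interaction):
    """DINA: main effects fixed to zero, only the highest interaction is free."""
    return lcdm_logits(intercept, 0.0, 0.0, interaction)


def dino_logits(intercept, effect):
    """DINO (two attributes): equal main effects, interaction cancels one of them."""
    return lcdm_logits(intercept, effect, effect, -effect)


if __name__ == "__main__":
    print("C-RUM", crum_logits(-2.0, 2.0, 1.0))
    print("DINA ", dina_logits(-2.0, 4.0))
    print("DINO ", dino_logits(-2.0, 4.0))
```

With these values, DINA raises the logit only when both attributes are mastered (all or none), DINO raises it as soon as either attribute is mastered (one or more), and the C-RUM increases it additively.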

DINO Model (Templin & Henson, 2006)
- Negative interaction: under-additive logit model
- All main effects equal
- Interaction terms are fixed by the lower-order effects (for two attributes, the interaction equals the negative of the common main effect)
[Figure: Logit Response Function and Probability Response Function for the DINO model, plotted across the possible attribute patterns]

Session 2: Diagnostic Modeling Psychometric Models
CONCLUDING REMARKS

Session 2 Take-Home Points
- The LCDM uses an ANOVA-like approach to map latent attributes onto item responses
  - Uses main effects and interactions for each attribute
  - Uses a logit link function
- Multiple diagnostic models are subsumed by the LCDM

Diagnostic Modeling in Educational and Psychological Settings
Session 3

Session 3 Overview
- Examples of DCMs through applications
- Educational measurement: English proficiency
  - LCDM demonstration in practice
  - Sample results
  - Potential problems in analysis
- Psychological measurement: pathological gambling
  - Simplified LCDM (DINO model)
  - Demonstration of what is possible with DCMs

Session 3: Diagnostic Modeling in Educational and Psychological Settings
LARGE SCALE LANGUAGE ASSESSMENT USING THE LCDM

Introduction
- With the emphasis of today's academic environment on testing, the focus on formative assessment is growing
- Among possible formative settings, language assessment has received some attention (e.g., Buck & Tatsuoka, 1998; Jang, 2004; von Davier, 2005)
- The purpose of this study is to explore the possibility of using the LCDM for the evaluation of the Grammar section of the Examination for the Certificate of Proficiency in English (ECPE)
  - Also provides an example analysis using the LCDM

Examination for the Certificate of Proficiency in English (ECPE)
- The ECPE is a test developed and scored by the English Language Institute of the University of Michigan
- The ECPE was developed to measure advanced English ability in respondents for whom English is not their first language
- Analysis is for the grammar section of the test
  - 40 multiple choice items (28 items used in analysis)
    - 10 were non-operational
    - 2 had difficulties greater than

Example Item from Grammar Section
- An example written to resemble an item in the Grammar section of the ECPE is:
  I have always ______ snow.
  - to enjoy
  - enjoyed
  - enjoying
  - to enjoyed

Session 3: Diagnostic Modeling in Educational and Psychological Settings
ECPE ANALYSIS METHODS

Examinees and Data
- A total of 2922 examinees are used to analyze the ECPE Grammar section
- The average age of examinees was approximately 23 years old
- Approximately 50% of the examinees spoke Portuguese, and an additional 31% spoke Spanish, as a first language

Attributes Measured by Test
- Three attributes measured, representing knowledge of:
  - Morphosyntactic rules
  - Cohesive rules
  - Lexical rules
- The full LCDM was estimated using Mplus
  - Marginal maximum likelihood estimation
- Q matrix characteristics
  - 19 items measuring only one attribute (simple structure)
  - 9 items measuring two attributes
  - 0 items measuring all three attributes

28 ECPE Q matrix
  Here are the entries for several items from the ECPE Q matrix
  Columns: Item, Morphosyntactic Rules, Cohesive Rules, Lexical Rules

Session 3: Diagnostic Modeling in Educational and Psychological Settings
LCDM RESULTS

LCDM Results
  To further describe the parameters of the LCDM, several types of results will be presented:
    Model fit results
    Item parameter results
      Inspection of interactions
      Interpretation
    Structural parameter results
      Implied attribute hierarchy
    Respondent estimates/classifications

Model Fit Results
  Overall model fit
    Chi-square not computable
    AIC: ; BIC:
      Used to reduce model
  Bivariate model fit (Session 4)
    Compares model-predicted and observed frequencies of responses for all pairs of items
    Of 378 item pairs, 45 had p values less than 0.01
    Items most indicated:
      Item 13 (9 of 45 pairs)
      Item 4 (6 of 45 pairs)
      Item 5 (6 of 45 pairs)
    Indicates some items are not fit well by the model
    We will ignore this and continue with the analysis as an example

29 Example Item
  To demonstrate parameter interpretation, results from item 7 will be shown
  Attributes measured:
    Morphosyntactic rules (Attribute 1)
    Lexical rules (Attribute 3)
  Parameter estimates:
    Columns: Parameter, Estimate, SE, p-value
    Rows: λ_{7,0}, λ_{7,1,(1)}, λ_{7,1,(3)}, λ_{7,2,(1,3)}

LCDM Intercepts
  Estimated intercept: (0.095)
  Indicates the logit of a correct response for a non-master of all attributes
  Here, non-masters have an average probability of a correct response of:
    exp(-0.106) / (1 + exp(-0.106)) = 0.47
  Hypothesis test is not important
    Tests whether non-masters have a probability of a correct response of 0.5
  Problematic when very high
    Difficult to identify other parameters
    Indicates issues with test, Q matrix, or attributes

Higher Order Model Parameters
  Interpretation of main effects and interactions proceeds sequentially:
  If interactions are present:
    Examine highest level of interaction
    If significantly different from zero, leave in model
    If not, term can be omitted
  If interactions are not present:
    Examine how far main effect is from zero

Examining Interaction Parameters
  2-way interaction parameter: (0.144)
  P value for parameter was small (0.000)
    Indicates parameter is significantly different from zero
    Candidate to leave in model
  Value indicates that there is an under-additive effect of mastering both attributes
    Means mastery of one attribute is sufficient to have a high chance of getting the item correct
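The intercept-to-probability conversion on the intercept slide can be checked with a short Python sketch. The logit value -0.106 is taken from the slide's formula; its negative sign is an assumption, inferred from the reported probability of 0.47 being below 0.5. The function name `inv_logit` is illustrative only.

```python
import math


def inv_logit(x: float) -> float:
    """Convert a logit to a probability: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumed sign: a probability below 0.5 implies a negative logit.
intercept_logit = -0.106
p_nonmaster = inv_logit(intercept_logit)
print(round(p_nonmaster, 2))
```

A logit of exactly zero corresponds to a probability of 0.5, which is the null value the intercept's hypothesis test compares against.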

30 More on Interactions
  Interaction pattern for this item indicates that mastery of morphosyntactic rules is key to answering correctly
  Mastery of lexical rules helps, but not above that of mastery of morphosyntactic rules
  For why this is the case, stay tuned
  [Figure: Logit(X=1|α) by α1 (0, 1) and α3 (0, 1)]
  [Figure: P(X=1|α) for the possible attribute patterns (α1, α3) = (0,0), (0,1), (1,0), (1,1)]

Other Interactions
  Of 9 interaction parameters, 3 were significantly different from zero
  The 6 non-significant interactions are candidates to be removed from the model
  Of the 6 non-significant interactions, 4 had small main effects on one attribute
    Attribute not highly related to item response
    Indicates that Q matrix may be incorrect
    Have to re-fit with new Q matrix and look at information criteria (Session 4)

Interpreting Main Effects
  When significant interactions are present, main effects cannot be easily interpreted
    Sometimes called conditional main effects
    Need to know combination of attributes mastered to fully describe item response function
  Main effects in LCDM have added concern
    Lower bound is zero (for monotonicity)
    p values are inaccurate as estimates approach zero

ECPE Item 7 Lexical Main Effect
  Because of the significant interaction, interpretation is conditional
  When Morphosyntactic Rules have not been mastered:
    Lexical main effect λ_{7,1,(3)} =
    Respondents who have mastered Lexical Rules have an increase in logit of    over respondents who are non-masters
  [Figure: P(X=1|α) for the possible attribute patterns (α1, α3) = (0,0), (0,1), (1,0), (1,1)]
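To make the idea of conditional main effects concrete, here is a minimal Python sketch of the LCDM item response function for an item measuring two attributes (labelled α1 and α3, as for item 7). All parameter values are hypothetical, chosen only to show an under-additive interaction; they are not the ECPE estimates, which did not survive in this copy of the slides.

```python
import math
from itertools import product


def lcdm_logit(a1, a3, lam0, lam1, lam3, lam13):
    """Logit of a correct response: intercept + main effects + 2-way interaction."""
    return lam0 + lam1 * a1 + lam3 * a3 + lam13 * a1 * a3


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameters (NOT the ECPE estimates).
params = dict(lam0=-0.1, lam1=2.0, lam3=1.0, lam13=-0.9)

# Probability of a correct response for each attribute pattern (a1, a3).
prob_table = {
    (a1, a3): inv_logit(lcdm_logit(a1, a3, **params))
    for a1, a3 in product((0, 1), repeat=2)
}

# Conditional main effect of attribute 3: it depends on whether attribute 1 is mastered.
effect_a3_given_a1_0 = lcdm_logit(0, 1, **params) - lcdm_logit(0, 0, **params)
effect_a3_given_a1_1 = lcdm_logit(1, 1, **params) - lcdm_logit(1, 0, **params)

for pattern, p in prob_table.items():
    print(pattern, round(p, 3))
```

With a negative interaction, the gain in logit from mastering attribute 3 is `lam3` for non-masters of attribute 1 but only `lam3 + lam13` for masters of attribute 1, which is why the main effect can only be read conditionally.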


31 ECPE Item 7 Morphosyntactic Main Effect
  Because of the significant interaction, interpretation is conditional
  When Lexical Rules have not been mastered:
    Morphosyntactic main effect λ_{7,1,(1)} =
    Respondents who have mastered Morphosyntactic Rules have an increase in logit of    over respondents who are non-masters
  [Figure: P(X=1|α) for the possible attribute patterns (α1, α3) = (0,0), (0,1), (1,0), (1,1)]

General Modeling Tips
  High-level interactions are difficult to estimate in most samples
    More than 2-way interactions may not be possible
  Modeling strategy:
    Try all interactions
    If model does not converge, limit to only 2-way interactions
    Remove non-significant interactions from model
  If all interactions and main effects for an attribute are close to zero:
    Entry for attribute in Q matrix can be removed
    Double check with AIC/BIC, as hypothesis test is approximate

Attribute Pattern Probabilities
  Base rate pattern of profiles mastered in sample indicates an attribute hierarchy
    Lexical → Cohesive → Morphosyntactic
  Implications for Item 7
    Cannot have morphosyntactic without lexical
  Suggests information about second language acquisition

Example Respondent Estimates
  Respondent estimates are probabilities of mastery for each attribute
    Shown for 5 example respondents
    Test score given to provide comparison
  Columns: Respondent, Total Score, Morphosyntactic, Cohesive, Lexical

32 Session 3: Diagnostic Modeling in Educational and Psychological Settings
CONCLUDING REMARKS: EDUCATIONAL MEASUREMENT

Educational Measurement Wrap Up
  Demonstrated results from LCDM when applied to English language assessment
  Investigated model fit
    Very important, and as yet not well understood in DCMs
  Described item parameter estimates
    Interpreting interactions/main effects
    Modeling strategy
  Described structural parameter estimates
    Useful for understanding latent variables measured by test
  Described respondent parameter estimates
    Normally these help understand the knowledge state of a respondent
    Attribute hierarchy here limits utility of information

Session 3: Diagnostic Modeling in Educational and Psychological Settings
UNDERSTANDING PATHOLOGICAL GAMBLING

Gambling Application Overview
  Study of pathological gambling
    DSM criteria for pathological gambling
    Common methods for assessment
    How diagnostic models could help
  Psychometric Model
    Formulating the LCDM for Likert data (and smaller samples)
    Adapting structural (or hierarchical) models to evaluate the DSM definition of pathological gambling
  Pathological Gambling Application
    Model Development
    Estimation/Results

33 The Gambling Explosion
  Exponential increase in accessibility of gambling opportunities:
    State lotteries
    Native American tribal casinos
    Riverboat gambling
    Internet gambling
  Incidences of pathological gambling have increased (Volberg, 2002)
  In order to limit the detrimental effects of gambling on a community:
    Easily identify potential pathological gamblers and provide treatment interventions
    Understand the underlying causes of the disorder

DSM Definition of Pathological Gambling
  The DSM-IV-TR defines pathological gambling as an impulse control disorder (not elsewhere classified)
  To be classified as a pathological gambler, an individual must meet 5 of 10 defined criteria
    All are dichotomous: Meets/Does not meet

DSM CRITERIA
  C1 Is preoccupied with gambling
  C2 Needs to gamble with increasing amounts of money in order to achieve the desired excitement
  C3 Has repeated unsuccessful efforts to control, cut back, or stop gambling
  C4 Is restless or irritable when attempting to cut down or stop gambling
  C5 Gambles as a way of escaping from problems or of relieving a dysphoric mood
  C6 After losing money gambling, often returns another day to get even
  C7 Lies to family members, therapist, or others to conceal the extent of involvement with gambling
  C8 Has committed illegal acts such as forgery, fraud, theft, or embezzlement to finance gambling
  C9 Has jeopardized or lost a significant relationship, job, or educational or career opportunity because of gambling
  C10 Relies on others to provide money to relieve a desperate financial situation caused by gambling

Studying Pathological Gambling
  The DSM definition has several characteristics which make it seem somewhat implausible:
  All criteria are treated equally, in that any five will result in the diagnosis of pathological gambling
    It seems odd to have the following given equal weight:
      C8 Has committed illegal acts such as forgery, fraud, theft, or embezzlement to finance gambling
      C1 Is preoccupied with gambling
  If all criteria are treated equally, does the diagnostic criterion of five or more seem realistic?
  DCMs can help to answer both questions

Session 3: Diagnostic Modeling in Educational and Psychological Settings
METHODS

34 Take each of the 10 criteria to be the dichotomous latent attributes
  Applying a DCM would simultaneously provide:
    Diagnostic information for each individual
    Underlying structural model parameters
      Evaluation of the DSM rule of five or more criteria for a pathological gambling diagnosis
      Evaluation of whether all criteria should be treated equivalently

Gambling Instruments
  Study included 112 experienced gamblers
  Participants provided responses to two instruments
    Gambling research instrument (Henson, Feasel, & Jones, 2000)
      41 items; 6-point Likert scale
    South Oaks Gambling Screen (Lesieur & Blume, 1987)
      20 items; binary
      Used to validate result

Psychometric Model
  The full LCDM was not able to be estimated
    Small sample size
    Likert response data
  Instead, the DINO model was used
    One-or-more attribute model
    Two parameters per item (regardless of entries in Q matrix)
    Shown for a dichotomous item measuring two attributes:
      [Equation not recovered; annotation on the attribute effects: All Set to be Equal]
  Binomial link function used to model Likert responses
    Polytomous model assuming Binomial distribution conditional on attribute profile
  [Figures: Conditional Response Distributions; Marginal Response Distributions]
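The DINO structure with a binomial response model can be sketched in Python as follows. This is an illustration under stated assumptions rather than the parameterization used in the study: the names `lam0` and `lam1`, their values, and the choice to treat a 6-point Likert response as Binomial(5, p) on categories 0 to 5 are all hypothetical.

```python
import math


def dino_indicator(alpha, q):
    """1 if the respondent has at least one attribute measured by the item, else 0."""
    return int(any(a == 1 and qa == 1 for a, qa in zip(alpha, q)))


def likert_pmf(alpha, q, lam0, lam1, n_categories=6):
    """Category probabilities under an assumed binomial response model.

    The binomial success probability follows a two-parameter DINO-style logit:
    lam0 for respondents with none of the measured attributes, lam0 + lam1 otherwise.
    """
    logit = lam0 + lam1 * dino_indicator(alpha, q)
    p = 1.0 / (1.0 + math.exp(-logit))
    n = n_categories - 1
    return [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n_categories)]


# Hypothetical item measuring two attributes, with hypothetical parameters.
q_row = (1, 1)
pmf_none = likert_pmf((0, 0), q_row, lam0=-1.0, lam1=2.0)
pmf_one = likert_pmf((1, 0), q_row, lam0=-1.0, lam1=2.0)
pmf_both = likert_pmf((1, 1), q_row, lam0=-1.0, lam1=2.0)
```

Because only the "one or more" indicator enters the logit, `pmf_one` and `pmf_both` are identical, which is the sense in which the item has two parameters regardless of how many attributes its Q matrix row lists.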

35 GRI Structural Model
  The structural model provides a model for the correlational structure of the attributes (Session 4)
  A two-class mixture was used as the structural model
    Classes were meant to represent pathological gamblers (PG) and non-pathological gamblers (NPG)
    Helps determine how the latent criteria map onto pathological gamblers
  The mixture structural model allows us to:
    Calculate the probability that each criterion is met given an individual is a PG or an NPG
    Determine the criteria that best discriminate between PG and NPG
    Calculate the probability of being a PG based on the number of criteria met
    Evaluate the DSM-stated rule of 5 or more criteria to be diagnosed PG

Model Estimation
  Created a Markov Chain Monte Carlo estimation algorithm in Fortran
    Uniform prior for all item parameters
    Latent traits (α) modeled with empirical prior defined by structural model
    Uniform prior for all structural model parameters
  Chain length of 50,000 (burn-in of 40,000)
  Convergence check:
    Geweke test
    Visual inspection of time-series plots

Algorithm Convergence
  [Figure not recovered]

Session 3: Diagnostic Modeling in Educational and Psychological Settings
MODEL RESULTS
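One of the quantities listed for the mixture structural model, the probability of being a PG given the number of criteria met, follows from Bayes' rule. The Python sketch below assumes that criteria are conditionally independent within each class and uses made-up class-conditional probabilities and base rate; none of these numbers come from the study.

```python
from itertools import product


def pattern_prob(pattern, probs):
    """Probability of a met/not-met pattern, assuming independence within a class."""
    out = 1.0
    for met, p in zip(pattern, probs):
        out *= p if met else (1.0 - p)
    return out


def prob_pg_given_count(p_pg, p_npg, base_rate_pg):
    """P(PG | number of criteria met), summing over all criterion patterns."""
    k = len(p_pg)
    numer = [0.0] * (k + 1)
    denom = [0.0] * (k + 1)
    for pattern in product((0, 1), repeat=k):
        n_met = sum(pattern)
        joint_pg = base_rate_pg * pattern_prob(pattern, p_pg)
        joint_npg = (1.0 - base_rate_pg) * pattern_prob(pattern, p_npg)
        numer[n_met] += joint_pg
        denom[n_met] += joint_pg + joint_npg
    return [a / b for a, b in zip(numer, denom)]


# Hypothetical probabilities that each of 10 criteria is met in each class.
p_met_pg = [0.9, 0.6, 0.8, 0.7, 0.6, 0.7, 0.6, 0.4, 0.5, 0.8]
p_met_npg = [0.1, 0.3, 0.1, 0.1, 0.3, 0.2, 0.3, 0.05, 0.1, 0.05]
posterior_by_count = prob_pg_given_count(p_met_pg, p_met_npg, base_rate_pg=0.3)

for n_met, p in enumerate(posterior_by_count):
    print(n_met, round(p, 3))
```

Reading down the printed column shows where the posterior probability of PG crosses 0.5, which is one way a cut such as "5 or more" could be examined against a fitted structural model.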

36 Results To Be Presented
  Fit check: Model fit evaluation
  Usability: Diagnostic estimates of gamblers' DSM criteria profile
  Validation: How GRI/DCM diagnoses correspond to SOGS diagnoses
  Interpretation: Item parameter estimates
  Structural model estimates:
    Criteria with differential discrimination between PG and NPG
    How many criteria are indicative of PG

Checking Goodness of Fit
  Typical measures of goodness of fit were unreasonable due to a sparse contingency table of responses (6^41 possible response patterns)
  Monte Carlo fit index was constructed (based on Langeheine et al., 1996) for bivariate item statistics (Maydeu-Olivares & Joe, 2005)
  Root Mean Squared Residual (RMSR) of the Pearson correlation was used as a criterion
    Correlation RMSR = (p = 0.486)
    Indicates adequate fit

Respondent Diagnoses
  [Figures on two slides not recovered]


37 Criterion Validity
  Compared GRI/DCM classification with SOGS
    89.3% matching classifications
    Cohen's Kappa: 0.69
  Table layout: DCM Classification (NPG, PG, Total) by SOGS Classification (NPG, PG, Total)

Item Parameter Interpretation
  Bar graph:
    Red bar: Average response for PG
    Blue bar: Average response for NPG
  Item 5 [C2]: I find it necessary to gamble with larger amounts of money (than when I first gambled) for gambling to be exciting
  Item 13 [C3 or C4]: I find it difficult to stop gambling

Structural Model Estimates
  [Figure not recovered]

Evaluating the DSM 5 or More Rule
  [Figure not recovered]
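Percent agreement and Cohen's kappa are both simple functions of the cross-classification table. Since the table's cell counts did not survive here, the Python sketch below uses a hypothetical 2x2 table purely to show the computation; it does not reproduce the reported 89.3% or 0.69.

```python
def agreement_and_kappa(table):
    """Observed agreement and Cohen's kappa for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the two classifications were independent.
    p_exp = sum(r * c for r, c in zip(row_marg, col_marg))
    return p_obs, (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical counts: rows = DCM classification (NPG, PG), columns = SOGS (NPG, PG).
example_table = [[20, 5], [10, 15]]
p_agree, kappa = agreement_and_kappa(example_table)
print(round(p_agree, 3), round(kappa, 3))
```

Kappa discounts the agreement that would occur by chance given the marginal rates, which is why it is lower than the raw proportion of matching classifications.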

38 Session 3: Diagnostic Modeling in Educational and Psychological Settings
CONCLUDING REMARKS: PSYCHOLOGICAL MEASUREMENT

Concluding Remarks: Gambling Talk
  DCM respondent estimates give rich information about the pattern of satisfied criteria
    Could be used to tailor treatment strategies
  A better definition of PG would be one who meets at least FOUR criteria
  Results suggest that Criteria 1, 3, and 10 are more discriminating of PG than other criteria
  Criteria such as 2, 5, and 7 have a relatively high probability of being met by NPG (more than 20% chance)
    Weaker indicators of pathological gambling

Session 3: Diagnostic Modeling in Educational and Psychological Settings
CONCLUDING REMARKS: SESSION

Wrap Up and Take Home Points
  Session 3 demonstrated some potential uses of DCMs
  Applications of DCMs are rare
    Tests haven't been built to measure categorical attributes
    Item information is different in DCMs
    Users haven't had access to software
  To date, most applications use software built by researchers
    MCMC in Fortran or WinBUGS
    MML in Fortran
  This is about to change

39 Notes on Usefulness of DCMs
  Full utility of DCMs cannot be understood unless applications become more frequent
    For now, have to use sub-optimal data and problems
  Future applications coming soon
    Mathematical reasoning test under development (NSF funded)
    Assessment of readiness for first grade in kindergartners
  Funding opportunities exist and seem to review well
    Educational Measurement: NSF (DR K12); IES (Goals 2 and 5)
    Psychological Measurement: NIH (NIMH; NIDA; NIA; ...)
  Industry seems interested
    ETS/College Board/ACT/Measurement Inc.
    Typically proprietary: dangerous for academics

Advanced Topics: Structural Models, Model Fit, and Respondent Estimates
Session 4

Session Overview
  Session 4 will provide the advanced topics needed to apply DCMs
    Understanding structural models
      What they are
      How to summarize them
      Differing types
    Assessment of model fit
    How respondent diagnoses are made
  WARNING: Content can be very technical
    But fun, though

Notation Used Throughout Session
  Attributes: a = 1, ..., A
  Respondents: r = 1, ..., R
  Attribute Profiles: α_r = [α_r1, α_r2, ..., α_rA]
    α_ra is 0 or 1
  Latent Classes: c = 1, ..., C
    We have C = 2^A latent classes, one for each possible attribute profile
  Items: i = 1, ..., I
    Restricted to dichotomous item responses (X_ri is 0 or 1)
  Q matrix: Elements q_ia for an item i and attribute a
    q_ia is 0 or 1

40 DCM Structural Models
  Throughout the workshop, attribute profile base rates have been mentioned as being influential in DCMs
    Part of respondent diagnoses (to be shown)
    Describes nature of attribute profiles
      ECPE discovered apparent attribute hierarchy
      Gambling study provided feedback on DSM criteria rules

Session 4: Advanced Topics: Structural Models, Model Fit, and Respondent Estimates
STRUCTURAL MODELS

DCM Structural Models Defined
  The base rates represent the probability any respondent has a given attribute profile
  For a test measuring A attributes, 2^A profiles are possible
  The structural model provides the probability for each profile
  The parameter for the structural model is η_c
    Each attribute profile c has one
    η_c is the base rate probability of attribute profile c: η_c = P(α_r = α_c)

Interpreting the Structural Model
  The ECPE estimates of η_c are shown to the right
    Columns: c, η_c, α_1, α_2, α_3
  Because there are numerous η_c parameters, interpretation is difficult
    Useful for detecting attribute hierarchies
  Often, the η_c parameters are re-expressed as:
    The marginal probability an attribute is mastered in the population
    The correlation between any two attributes
  Both can be computed using a frequency analysis weighted by η_c


41 SAS Structural Model Summary
  SAS can be used to compute summaries of the structural model parameters
  For each attribute, marginally: Proportion of Masters
  For each pair of attributes: Tetrachoric Correlation

Attribute Summary
  For the ECPE data, we have the following attribute summary information
    Columns: Attribute, Prop. Masters, Tetrachoric Correlation
    Rows: 1. Morphosyntactic, 2. Cohesive, 3. Lexical
  Such information is helpful in determining the nature of attributes in a population of interest
    Analogous to information about latent variables in CFA/MIRT
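The weighted frequency analysis described for the structural model summary can be sketched in Python rather than SAS. The η_c values below are hypothetical, and the correlation computed is the Pearson (phi) correlation between binary attributes, used here as a simpler stand-in for the tetrachoric correlation reported on the slides.

```python
import math
from itertools import product


def summarize_structural_model(eta, n_attr):
    """Marginal mastery proportions and pairwise phi correlations from profile base rates.

    Profiles are ordered 000, 001, 010, ... (last attribute changing fastest).
    """
    profiles = list(product((0, 1), repeat=n_attr))
    marginals = [sum(e * p[a] for e, p in zip(eta, profiles)) for a in range(n_attr)]
    correlations = {}
    for a in range(n_attr):
        for b in range(a + 1, n_attr):
            joint = sum(e * p[a] * p[b] for e, p in zip(eta, profiles))
            cov = joint - marginals[a] * marginals[b]
            sd_a = math.sqrt(marginals[a] * (1 - marginals[a]))
            sd_b = math.sqrt(marginals[b] * (1 - marginals[b]))
            correlations[(a + 1, b + 1)] = cov / (sd_a * sd_b)
    return marginals, correlations


# Hypothetical base rates for 3 attributes (8 profiles); they sum to 1.
eta_example = [0.25, 0.15, 0.05, 0.15, 0.02, 0.03, 0.05, 0.30]
prop_masters, phi_corr = summarize_structural_model(eta_example, n_attr=3)
print([round(m, 2) for m in prop_masters])
print({pair: round(r, 2) for pair, r in phi_corr.items()})
```

The phi coefficient is generally smaller in magnitude than the tetrachoric correlation for the same 2x2 table, so the two should not be compared directly.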

42 Differing Structural Models
  The structural model of a DCM has the potential to have an overwhelming number of parameters
    For A attributes, total estimated: 2^A - 1
      All must sum to 1
    Saturated model
  Multiple structural models exist
    All reduce the number of parameters
    All use categorical data analysis techniques to model η_c
  Analogous to latent variable covariance structure in structural equation modeling
    Distribution of attributes is categorical, not continuous

Types of Structural Models
  Log-linear model
    Predicts the natural logarithm of η_c by the attributes in each profile
    Allows for varying levels of complexity
      Most: Saturated Model
      Least: Independent Attributes Model
    Implemented in Mplus (see Session 5) and main focus of discussion today
  Tetrachoric correlation model
    Provides an item factor model for latent attributes
    Uses only bivariate information for pairs of attributes
    Allows for covariance structures to be estimated
    Not available in any software packages (but also shown briefly today)
  Hierarchical factors model
    Special case of tetrachoric correlation model
  Mixture models
    Shown in gambling example
    Also given by von Davier (2008)

Log-Linear Structural Models
  The log-linear structural model is the easiest to implement
    Due to its availability in Mplus
  μ_c is the natural logarithm of η_c
    Columns: c, η_c, μ_c, α_1, α_2, α_3
  The structural model then uses an ANOVA-like model to predict the value of μ_c as a function of the attributes that are defined in attribute profile c

Log-Linear Model for μ_c
  Shown for 3-attribute model
    Includes main effects, 2-way, and 3-way interactions
    All parameters must sum to zero for identifiability
  Intercept and main effects:
    μ_c = γ_0 + γ_{1,(1)} α_c1 + γ_{1,(2)} α_c2 + γ_{1,(3)} α_c3
  2-way and 3-way interactions:
          + γ_{2,(1,2)} α_c1 α_c2 + γ_{2,(1,3)} α_c1 α_c3 + γ_{2,(2,3)} α_c2 α_c3 + γ_{3,(1,2,3)} α_c1 α_c2 α_c3

43 Log-Linear Structural Model Notation
  The log-linear structural model parameters have several subscripts:
  Subscript #1 (e): the level of the effect
    0 is the intercept
    1 is the main effect
    2 is the two-way interaction
    3 is the three-way interaction
  Subscript #2 (a_1, ...): the attributes the effect applies to
    The number of attributes listed equals the level given in Subscript #1

Log-Linear Model Explained
  Because not all attribute profiles include all attributes, only some terms get used to predict each value of μ_c
  For attribute profile 1: α_1 = [α_11 = 0; α_12 = 0; α_13 = 0]:
    Only the intercept applies

Log-Linear Model Explained
  For attribute profile 2: α_2 = [α_21 = 0; α_22 = 0; α_23 = 1]:
    The intercept and main effect of attribute 3 apply

Log-Linear Model Explained
  For attribute profile 6: α_6 = [α_61 = 1; α_62 = 0; α_63 = 1]:
    The intercept, main effects of attribute 1 and attribute 3, and interaction between attributes 1 and 3 apply

44 Log-Linear Model Explained
  For attribute profile 8: α_8 = [α_81 = 1; α_82 = 1; α_83 = 1]:
    All parameters apply

Interpretations of Model Parameters
  The log-linear model with ALL main effects and interactions is statistically equivalent to the saturated structural model
  Two-way interactions are analogous to bivariate correlations in categorical models
  Higher-level interactions represent higher-level characteristics of the attribute distribution (e.g., skewness, kurtosis)
  Models without interactions imply uncorrelated attributes
    Main effects are essentially attribute base rates
  Models without main effects or interactions assume all attribute profiles are equally likely
  Higher-order interactions can be removed if not significantly different from zero

Log-Linear Model for ECPE
  To demonstrate the log-linear model, we again present our ECPE data
  Full model (all parameters)
    Columns: Parameter, Estimate, SE, p-value
    Rows: γ_0, γ_{1,(1)}, γ_{1,(2)}, γ_{1,(3)}, γ_{2,(1,2)}, γ_{2,(1,3)}, γ_{2,(2,3)}, γ_{3,(1,2,3)}

Reductions in the Structural Model
  Because the three-way interaction was not significant, we can remove that parameter from the model without greatly affecting model fit
  New results:
    Columns: Parameter, Estimate, SE, p-value
    Rows: γ_0, γ_{1,(1)}, γ_{1,(2)}, γ_{1,(3)}, γ_{2,(1,2)}, γ_{2,(1,3)}, γ_{2,(2,3)}
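The way only some log-linear terms apply to each profile can be sketched in Python. All γ values below are hypothetical. The sketch also makes one identification assumption that differs from the sum-to-zero statement on the slides: the intercept is not free but is set so that the η_c sum to 1.

```python
import math
from itertools import product


def log_linear_eta(main, interactions):
    """Profile probabilities from a log-linear structural model.

    main: list of main effects, one per attribute.
    interactions: dict mapping a tuple of 0-based attribute indices to an effect.
    A term enters mu_c only when every attribute it names is mastered in profile c.
    """
    profiles = list(product((0, 1), repeat=len(main)))
    kernels = []
    for profile in profiles:
        mu = sum(g * a for g, a in zip(main, profile))
        for attrs, g in interactions.items():
            if all(profile[a] == 1 for a in attrs):
                mu += g
        kernels.append(math.exp(mu))
    total = sum(kernels)
    intercept = -math.log(total)  # assumed normalization: makes eta sum to 1
    return profiles, [k / total for k in kernels], intercept


# Hypothetical reduced model: main effects and 2-way interactions, no 3-way term.
profiles, eta, gamma0 = log_linear_eta(
    main=[-1.0, -0.5, 0.2],
    interactions={(0, 1): 1.5, (0, 2): 1.0, (1, 2): 0.8},
)
for profile, e in zip(profiles, eta):
    print(profile, round(e, 3))
```

Dropping every interaction from `interactions` yields the independent attributes model, in which each profile probability factors into a product of attribute-level base rates.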

45 New Results for Attribute Probabilities
  The reduced model only slightly modifies the attribute probabilities:
    Columns: c, Original η_c, New η_c

Session 4: Advanced Topics: Structural Models, Model Fit, and Respondent Estimates
TETRACHORIC STRUCTURAL MODELS

Tetrachoric Structural Models
  Because most summary information is given about attributes and pairs of attributes, tetrachoric models have been developed
  Such models use the tetrachoric correlation between attributes as a model for the probability for each attribute pattern
  Available in Arpeggio software
    Assessment Systems Corporation

Defining Tetrachoric Correlations
  The tetrachoric correlation is a measure of the association between two binary variables
  The correlation comes from mapping the binary variables onto two underlying continuous variables
  Each of the continuous variables is bisected by a threshold which transforms the continuous response into a categorical outcome
  The distribution of the underlying continuous variables is bivariate normal:
    [Density equation not recovered]
    ρ is the tetrachoric correlation coefficient

46 Tetrachoric Correlation Explained
  [Figure not recovered]

Technical Specifics: Multivariate Attributes
  The tetrachoric models use the following function to model the probability of an attribute profile:
    [Equation not recovered; annotated terms: Tetrachoric Correlation Matrix, Multivariate Normal Density]

Structured Matrices
  Placing a structure on the tetrachoric correlation matrix Ξ expands the model to mimic SEM (Templin & Henson, 2006)

Session 4: Advanced Topics: Structural Models, Model Fit, and Respondent Estimates
ASSESSMENT OF MODEL FIT
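For two attributes, the threshold idea behind the tetrachoric model can be illustrated with a Python sketch that integrates a standard bivariate normal density numerically. It assumes an attribute counts as mastered when its underlying variable exceeds its threshold; the function names, the Simpson's-rule integration, and the example thresholds are illustrative choices, not the workshop's implementation.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_both_mastered(tau1, tau2, rho, steps=2000, upper=10.0):
    """P(Z1 > tau1, Z2 > tau2) for a standard bivariate normal with correlation rho.

    Uses P(Z2 > tau2 | Z1 = z) = 1 - Phi((tau2 - rho*z) / sqrt(1 - rho^2))
    and Simpson's rule over z from tau1 to `upper` (steps must be even).
    """
    s = math.sqrt(1.0 - rho * rho)

    def integrand(z):
        return norm_pdf(z) * (1.0 - norm_cdf((tau2 - rho * z) / s))

    h = (upper - tau1) / steps
    total = integrand(tau1) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(tau1 + i * h)
    return total * h / 3.0


def profile_probs(tau1, tau2, rho):
    """Probabilities of the four attribute profiles for two attributes."""
    p11 = p_both_mastered(tau1, tau2, rho)
    p1 = 1.0 - norm_cdf(tau1)
    p2 = 1.0 - norm_cdf(tau2)
    return {(1, 1): p11, (1, 0): p1 - p11, (0, 1): p2 - p11, (0, 0): 1.0 - p1 - p2 + p11}


# Hypothetical thresholds and tetrachoric correlation.
example_probs = profile_probs(tau1=0.25, tau2=-0.4, rho=0.6)
print({k: round(v, 3) for k, v in example_probs.items()})
```

With more than two attributes the same idea requires a multivariate normal integral over the region defined by all thresholds, which is considerably harder to compute.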

47 Assessing Model Fit
  There is no one best way to assess fit in DCMs
  Techniques typically used can be put into several general categories:
    Absolute fit
      Model-based hypothesis tests (if available)
      Entropy
    Relative fit
      Information criteria
    Item fit
  Topics discussed here will focus on fit statistics available in Mplus (also discussed in Session 5)

Overall Model Fit: Chi-Squared Test
  For small numbers of items (10-15), the traditional Chi-Squared test of model fit can be used
  Test is invalid for too many items (sparse data)
    Shown for 28-item ECPE
  Mplus gives this automatically
    Omits when data are sparse
    Can omit extreme cells from an analysis
      Misleading

Overall Model Fit: (Relative) Entropy
  The entropy of a model is a measure of classification uncertainty
    It is an absolute fit statistic
  Mplus reports relative entropy
    Value of 1.00 means all respondents classified with complete certainty (good fit)
    Value of 0.00 means all respondents classified with equal probabilities for all classes (poor fit)
  ECPE (relative) entropy:
    Hard to interpret by itself

Relative Model Fit: Information Criteria
  Used when comparing between two models:
    Two DCMs (LCDM v. DINA)
    Two Q matrices (4 v. 5 attributes)
    Two different models (IRT v. DCM)
  Mplus reports:
    AIC and BIC
    Sample-size-adjusted BIC
  All can be used
    Smallest value is best
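The information criteria and relative entropy can be computed from a model's log-likelihood and its posterior class probabilities. The Python sketch below uses the standard textbook definitions, including the (N + 2)/24 form of the sample-size-adjusted BIC; that Mplus computes exactly these quantities is an assumption here, and the inputs are hypothetical.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC, BIC, and sample-size-adjusted BIC; smaller values are preferred."""
    return {
        "AIC": -2.0 * loglik + 2.0 * n_params,
        "BIC": -2.0 * loglik + n_params * math.log(n_obs),
        "SABIC": -2.0 * loglik + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def relative_entropy(posteriors):
    """Relative entropy in [0, 1] from respondents' posterior class probabilities.

    1 means every respondent is classified with certainty; 0 means every
    respondent has equal probability for all classes.
    """
    n_resp = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * log(0) as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n_resp * math.log(n_classes))


# Hypothetical comparison of two models fit to the same 500 respondents.
model_a = information_criteria(loglik=-4200.0, n_params=30, n_obs=500)
model_b = information_criteria(loglik=-4195.0, n_params=45, n_obs=500)
preferred_by_bic = "A" if model_a["BIC"] < model_b["BIC"] else "B"
print(preferred_by_bic, round(relative_entropy([[0.9, 0.1], [0.2, 0.8]]), 3))
```

In this made-up comparison the larger model has the higher log-likelihood but the smaller model has the lower BIC, because the penalty for 15 extra parameters outweighs the gain in fit.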


48 Item Fit Statistics
  The TECH10 option reports a degree of misfit for each:
    Item individually (Univariate)
    Pair of two items (Bivariate)
  Uses Chi-squared test for misfit
    Values for each item are distributed as Chi-square with 1 df (for binary items)
  Misfitting items can be investigated
    Q matrix can be changed
    Items can be removed

Item Fit Statistics: Univariate Fit
  Univariate fit attempts to determine if the model fits each item marginally
    A limited-information statistic
  Not useful in DCMs
    Model is for probability
    Will always fit perfectly

Item Fit Statistics: Bivariate Fit
  Bivariate fit is an index of fit for a pair of items
    Compares observed data with frequency expected under DCM
    Produces a 1 df Chi-Squared test
  Can help identify items that do not fit model
    Rough approximation

Concluding Remarks: Model Fit
  Assessment of model fit in DCMs is currently a difficult task
    Easily accessible options are limited
    Can quickly find options that take longer to assess fit than to estimate model
    Mplus options are adequate for initial screening
  DCMs share this problem with:
    IRT models
    General categorical data analyses
  Other model fit options are available and forthcoming
    Based on limited information (i.e., Templin, 2007)
    Need further testing
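The bivariate fit comparison can be sketched in Python as a Pearson chi-squared statistic on the 2x2 table for a pair of items, with expected counts implied by the class base rates and class-conditional response probabilities. All inputs are hypothetical, and the 1 df reference distribution simply follows the slides' description; this is not a reproduction of the TECH10 computation.

```python
import math


def expected_pair_table(eta, pi_i, pi_j, n_resp):
    """Expected 2x2 counts for items i and j, assuming local independence within class.

    eta: class base rates; pi_i, pi_j: P(correct | class) for each item.
    """
    table = [[0.0, 0.0], [0.0, 0.0]]
    for e, p_i, p_j in zip(eta, pi_i, pi_j):
        for a in (0, 1):
            for b in (0, 1):
                prob_a = p_i if a else 1.0 - p_i
                prob_b = p_j if b else 1.0 - p_j
                table[a][b] += n_resp * e * prob_a * prob_b
    return table


def pearson_chi2(observed, expected):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    return sum(
        (observed[a][b] - expected[a][b]) ** 2 / expected[a][b]
        for a in (0, 1)
        for b in (0, 1)
    )


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical two-class example with 1000 respondents.
expected = expected_pair_table(
    eta=[0.6, 0.4], pi_i=[0.3, 0.9], pi_j=[0.2, 0.8], n_resp=1000
)
observed = [[370, 110], [90, 430]]  # hypothetical counts, rows = item i, cols = item j
chi2 = pearson_chi2(observed, expected)
print(round(chi2, 2), round(p_value_1df(chi2), 4))
```

Repeating this for every pair of items gives one statistic per pair, which is how counts such as "45 of 378 pairs with p values less than 0.01" arise.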
