Diagnostic Measurement: Theory, Methods, and Applications
Jonathan Templin, The University of Georgia

Workshop Overview
Workshop sessions:
- Session 1: Conceptual Foundations of Diagnostic Measurement
- Session 2: Diagnostic Modeling: Psychometric Models
- Session 3: Diagnostic Modeling in Educational and Psychological Settings
- Session 4: Advanced Concepts
- Session 5: Estimation of Diagnostic Classification Models with Mplus

Session 1: Conceptual Foundations of Diagnostic Measurement

Session Overview
- Key definitions
- Conceptual example
- Example uses of diagnostic models in education
  - Classroom use (formative assessment)
  - Large scale testing use (summative assessment)
- Why diagnostic models should be used instead of traditional classification methods
- Concluding remarks

Session 1: DEFINITIONS

What are Diagnoses?
- The word diagnosis and its meaning are common in everyday language
- The meaning of diagnosis is deeply ingrained in our society
- It seldom merits a second thought

Definitions
American Heritage Dictionary definition of diagnosis (p. 500):
- Generally: (a) a critical analysis of the nature of something; (b) the conclusion reached by such analysis
- Medicine: (a) the act or process of identifying or determining the nature and cause of a disease or injury through evaluation of a patient's history, examination, and review of laboratory data; (b) the opinion derived from such an evaluation
- Biology: (a) a brief description of the distinguishing characteristics of an organism, as for taxonomic classification

Diagnosis: Defined
- A diagnosis is the decision that is made based on information
- Within psychological testing, a test score provides the information that is used for a diagnosis
  - BUT the score is not the diagnosis
- For this workshop, a diagnosis is by its nature discrete: a classification

Day to Day Diagnosis
- Decisions happen every day:
  - Decide to wear a coat or bring an umbrella
  - Decide to study
  - Decide what to watch on TV tonight
- In all cases:
  - Information (or data) is collected
  - Inferences are made from the data about what is likely to be the true state of reality

Diagnosis (Formalized)
- In diagnostic measurement, the procedures of diagnosis are formalized:
  - We make a set of observations, usually through a set of test questions
  - Based on these questions we make a decision as to the underlying state (or states) of a person
  - The decision is the diagnosis
- Diagnoses featured in this workshop:
  - Educational measurement: the competencies (skills) that a person has or has not mastered; leads to possible tailored instruction and remediation
  - Psychiatric assessment: the DSM criteria that a person meets; leads to a broader diagnosis of a disorder

Workshop Terminology
- Respondents: the people from whom behavioral data are collected
  - Behavioral data are taken to be test item responses for this workshop, but are not limited to item responses
- Items: test items used to classify/diagnose respondents
- Diagnostic assessment: the method used to elicit behavioral data
- Attributes: unobserved dichotomous characteristics underlying the behaviors (i.e., diagnostic status)
  - Latent variables linked to behaviors through diagnostic classification models
- Psychometric models: models used to analyze item response data
  - Diagnostic classification models (DCMs) are the models used to obtain classifications/diagnoses

Diagnostic Classification Model Names
Diagnostic classification models (DCMs) have been called many different things:
- Skills assessment models
- Cognitive diagnosis models
- Cognitive psychometric models
- Latent response models
- Restricted (constrained) latent class models
- Multiple classification models
- Structured located latent class models
- Structured item response theory

Psychometric Soapbox
- DCMs are but a small set of tools that must be adapted for a common purpose
  - Part of a methodological toolbox that is used to classify respondents
  - The toolbox should also include content experts and end users of the diagnoses
- DCMs link empirical observations and respondents' characteristics
  - The models are only as good as the underlying theories

Session 1: CONCEPTUAL EXAMPLE

Diagnostic Modeling Concepts
- Imagine that an elementary teacher wants to test basic math ability
- Using traditional psychometric approaches, the teacher could estimate an ability or test score for each respondent
  - Classical Test Theory: assign respondents a test score
  - Item Response Theory: assign respondents a latent (scaled) score
- By knowing each respondent's score, the students are ordered along a continuum

Traditional Psychometrics
[Figure: continuum of Mathematics Ability at UGA running from Low to High, locating Jonathan, Allan Cohen, and Seock Ho Kim]
- What results is a (weak) ordering of respondents
  - The ordering is called weak because of error in the estimates
  - Seock Ho Kim > Allan Cohen > Jonathan Templin
- Questions that traditional psychometrics cannot answer:
  - Why is Jonathan so low?
  - How can we get him some help?
  - How much ability is enough to pass?
  - How much is enough to be proficient?
  - What math skills have the students mastered?

Multiple Dimensions of Ability: Ability from a Diagnostic Perspective
- As an alternative, we could have expressed math ability as a set of basic skills
[Figure: each skill (Addition, Subtraction, Multiplication, Division) is marked as either Has Mastered or Has Not Mastered]

Multiple Dimensions of Ability
- The set of skills represents the multiple dimensions of elementary mathematics ability
- Other psychometric approaches have been developed for multiple dimensions:
  - Classical Test Theory: scale subscores
  - Multidimensional Item Response Theory (MIRT)
- Yet issues in application have remained:
  - Reliability of estimates is often poor for most practical test lengths
  - Dimensions are often very highly correlated
  - Large samples are needed to calibrate item parameters in MIRT

DCMs as an Alternative
- DCMs do not assign a single score
  - Instead, a profile of mastered attributes is given to respondents
  - They are multidimensional models
- DCMs provide respondents valuable information with fewer data demands
  - Higher reliability than comparable IRT/MIRT models
  - Complex item structures are possible

Path Diagram of Traditional Psychometrics
[Figure: path diagram; recoverable labels are Basic Math Ability, Addition, Subtraction, Multiplication, Division, and the items /2 and (4x2)+3]

Psychometric Model Comparison
- Using traditional models, a respondent:
  - Has a score of 20
  - Has a 75%, a grade of C
  - Is in the 60th percentile of math
  - Scored above the cut-off, passes math
- Using diagnostic models, a respondent:
  - Is proficient using addition
  - Is proficient using subtraction
  - Should work on multiplication
  - Should work on division

DCM Specifics
- Let's expand on the idea of the basic math test
- Possible items (partly lost in transcription) include an item ending in /2 and the item (4 x 2) + 3
- Not all items measure all attributes
  - A Q matrix is used to indicate the attributes measured by each item
  - It plays the same role as the factor pattern matrix that assigns the loadings in confirmatory factor analysis

The Q Matrix
- An example of a Q matrix using our math test
[Table: one row per item, columns Add, Sub, Mult, Div; entries not recovered]

Respondent Profiles
- Respondents are characterized by profiles specifying which attributes have been mastered
- Numeric values are arbitrary, but for our purposes:
  - Mastery is given a 1
  - Non-mastery is given a 0
[Table: example profile for Respondent A over Add, Sub, Mult, Div; entries not recovered]
- Respondent profile estimates are in the form of probabilities of mastery

Expected Responses to Items
- By knowing which attributes are measured by each item and which attributes have been mastered by each respondent, we can determine the items that will likely be answered correctly by each respondent
[Tables: the Q matrix, the mastery profiles of four respondents, and the items each respondent will probably answer correctly; entries not recovered]
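The link between a Q matrix, respondent profiles, and expected responses can be sketched in a few lines of Python. Everything concrete here is a hypothetical stand-in, since the slide's table entries did not survive transcription: the item labels, the Q-matrix entries, and the three respondent profiles are all assumed for illustration. The all-or-nothing rule (an item is expected correct only when every attribute it measures has been mastered) is also a simplifying assumption; an actual DCM assigns probabilities rather than certainties.

```python
# Column order for every tuple below (assumed): Add, Sub, Mult, Div.
ATTRIBUTES = ("Add", "Sub", "Mult", "Div")

# Hypothetical Q matrix: 1 means the item measures the attribute.
Q_MATRIX = {
    "2+3-1":   (1, 1, 0, 0),
    "4/2":     (0, 0, 0, 1),
    "(4x2)+3": (1, 0, 1, 0),
}

# Hypothetical respondent profiles: 1 = mastered, 0 = not mastered.
PROFILES = {
    "Respondent A": (1, 1, 0, 0),
    "Respondent B": (1, 1, 1, 1),
    "Respondent C": (0, 0, 0, 0),
}


def expected_correct(profile, q_row):
    """Idealized rule: correct only if every measured attribute is mastered."""
    return all(mastered >= measured for mastered, measured in zip(profile, q_row))


expected = {
    name: {item: expected_correct(profile, q_row) for item, q_row in Q_MATRIX.items()}
    for name, profile in PROFILES.items()
}

for name, row in expected.items():
    print(name, row)
```

Under these assumed values, a respondent who has mastered only addition and subtraction is expected to answer only the addition/subtraction item correctly.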

DCM Scoring and Score Reporting
[Figure: example score report, from Templin (2007)]

DCM Conceptual Summary
- DCMs focus on WHY a respondent is not performing well, rather than only on WHO
- The models define the chances of a correct response based on the respondent's attribute profile
- Many models have been created, ranging in complexity
  - In Session 2 we discuss a general DCM
  - The general model subsumes all other latent variable DCMs
- The model predicts how respondents will answer each item
  - It also allows for classification/diagnoses based on item responses

How do DCMs Produce Diagnoses?
Diagnostic decisions come from comparing observed behaviors to two parts of the psychometric model:
1. Measurement model: item/variable information (item parameters)
   - How respondents with different diagnostic profiles perform on a set of test items
   - Helps determine which items are better at discriminating between respondents with differing diagnostic profiles
2. Structural model: respondent information pertaining to the base rate, or proportion of respondents with diagnoses in the population
   - Provides the frequency of each diagnosis (or diagnostic profile)
   - Helps validate the plausibility of the observed diagnostic profiles

Conceptual Model Mapping in DCMs
[Figure: conceptual mapping of the measurement model and the structural model]
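One way to see how item parameters and base rates combine into a diagnosis is Bayes' rule over latent classes. The sketch below is an illustration under stated assumptions, not the workshop's estimation procedure: it uses a single attribute (two classes), and the base rates, the per-class probabilities of a correct response, and the response pattern are all hypothetical numbers. It also assumes item responses are independent given class membership.

```python
# Hypothetical one-attribute example with two latent classes.
BASE_RATES = {"non-master": 0.6, "master": 0.4}   # structural model (assumed values)
P_CORRECT = {                                      # measurement model (assumed values)
    "non-master": (0.2, 0.2, 0.2),
    "master": (0.8, 0.8, 0.8),
}
responses = (1, 1, 0)  # observed item responses: 1 = correct, 0 = incorrect


def likelihood(probs, observed):
    """Probability of the response pattern for one class, assuming local independence."""
    result = 1.0
    for p, x in zip(probs, observed):
        result *= p if x == 1 else 1.0 - p
    return result


# Posterior is proportional to base rate times likelihood of the observed pattern.
joint = {c: BASE_RATES[c] * likelihood(P_CORRECT[c], responses) for c in BASE_RATES}
total = sum(joint.values())
posterior = {c: joint[c] / total for c in joint}
diagnosis = max(posterior, key=posterior.get)

print(posterior, diagnosis)
```

With these numbers, the posterior probability of mastery is about 0.73 even though masters are the less common class, because two of three items were answered correctly.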

Session 1: USES OF DIAGNOSTIC MODEL RESPONDENT ESTIMATES

DCMs In Practice
- To demonstrate the potential benefits of using DCMs, I present a brief example of their use
  - From Henson & Templin (2008); Templin & Henson (2008)
- An urban county in a southern state wanted to improve students' End-of-Course (EOC) scores on the state's 10th grade Algebra 2 exam
- A benchmark test was given in the middle of a semester
  - Formative test designed to help teachers focus instruction
- Respondents and their teachers received DCM estimates
  - These were used to characterize student proficiency levels with respect to 5 state-specified goals for Algebra 2 (standards)

DCM Study
- The benchmark test was developed for use with a DCM
  - Characteristics of the test were fixed via standard setting
  - Five attributes were measured
  - Mastery was defined as meeting the proficient level for each attribute
  - Attributes were those largely represented in the EOC exam
- Respondents then took the EOC exam
  - 50 item test: a score of 33+ was considered proficient
- Benchmark estimates were linked to EOC estimates

Descriptive Statistics of Attribute Patterns
- First, the basic descriptive statistics for each possible pattern
  - What we expect a respondent with a given attribute pattern to score on the EOC test
- The next slides describe how DCMs can help guide instruction

Gain by Mastery of Each Attribute
- The difference in test score between masters and non-masters of an attribute can be quantified
- The correlation between attribute and EOC score indicates the amount of gain in EOC score by mastery of the attribute
- Note: 50 item test

Pathways to Proficiency
- DCMs can be used to form a learning path that a respondent can follow that would most quickly lead to proficiency on the EOC test
- The pathway tells the respondent and the teacher the sequence of attributes to learn next that will provide the biggest increase in test score
- This mechanism may help teachers decide what to focus on when teaching a course
  - Balances time spent on instruction with impact on test score
- Provides a practical implementation of DCMs in today's classroom testing environment

Proficiency Road Map
[Figure: proficiency road map]

Fast Path to Proficiency
[Figure: fast path to proficiency]

Harder Paths to Proficiency
- Some paths are less efficient at increasing EOC test scores
[Figure: harder paths to proficiency]

Session 1: IMPLICATIONS FOR LARGE SCALE TESTING PROGRAMS

DCM Characteristics
- As mentioned previously, DCMs provide a higher level of reliability for their estimates than comparable IRT or CTT models (Templin & Bradshaw, in press)
  - It is easier to place a respondent into one of two groups (mastery or non-mastery) than to locate them on a scale
- Such characteristics allow DCMs to potentially change how large scale testing is conducted
  - Most EOC-type tests are for classification (proficiency standards)
  - DCMs provide a direct link to classification, and direct access to standards

Theoretical Reliability Comparison
[Figure and table: reliability as a function of the number of items for DCM and IRT. The table lists the number of items each model needs to reach given reliability levels; only the IRT counts of 34, 48, and 77 items were recovered]

Uni and Multidimensional Comparison
[Figure: reliability of DCM and IRT by number of items for the 1-dimension, 2-dimension, and bifactor dimensional models]

DCMs for an EOC Test
[Figure: reliability by number of items for 2, 3, 4, and 5 category DCMs against an IRT (PL) model with ρ_θ = .87. Items needed: 2 category, 24 items; 3 category, 42 items; 4 category, 50 items; 5 category, 54 items]

Ramifications for Use of DCMs
- Reliable measurement of multiple dimensions is possible
  - Two attribute DCM application to empirical data: reliabilities of 0.95 and 0.90 (compared to 0.72 and 0.70 for IRT)
- Multidimensional proficiency standards
  - Respondents must demonstrate proficiency in multiple areas to be considered proficient for an overall content domain
  - Teaching to the test would therefore mean covering more curricular content to best prepare respondents
- Shorter unidimensional tests
  - Two category unidimensional DCM application to empirical data: the test needed only 24 items to have the same reliability as IRT with 73 items

The Paradox of DCMs
- DCMs are often pitched as models that allow for measurement of fine-grained skills (e.g., Rupp & Templin, 2008)
- Paradox of DCMs:
  - Sacrifice fine-grained measurement of a latent trait for only several categories
  - Gain increased capacity to measure ability multidimensionally

When Are DCMs Appropriate?
- Which situations lend themselves more naturally to such diagnosis?
  - The purpose of the diagnostic assessment matters most
- DCMs provide classifications directly
  - Optimally used when tests are used for classification: EOC tests, licensure/certification, clinical screening, college entrance, placement tests
- DCMs can be used as coarse approximations to continuous latent variable models
  - i.e., EOG example (2 to 5 category levels shown)

Session 1: BENEFITS OF DCMS OVER TRADITIONAL CLASSIFICATION METHODS

Previous Methods for Classification
- Making diagnoses on the basis of test responses is not a new concept
  - Classical test theory
  - Item response theory
  - Factor analysis
- The process is a two stage procedure:
  1. Scale respondents
  2. Find appropriate cut scores
- Respondents are then classified based on the cut scores

Problems with the Two Stage Approach
The two stage procedure allows for multiple sources of error to affect the results:
1. The latent variable scores themselves: estimation error
   - Uncertainty (i.e., standard errors) is typically not accounted for in the subsequent classification of respondents
   - The classification of respondents at different locations on the score continuum with multiple cut scores is differentially precise
   - Uncertainty of the latent variable scores varies as a function of the location of the score

Problems with the Two Stage Approach
2. Latent variable assumptions: that latent variable scores follow a continuous, typically normal, distribution
   - Estimates reflect the assumed distribution
   - Can introduce errors if the assumption is incorrect
3. Cut score determination
   - Standard setting is imprecise when used with general abilities
   - Standard setting methods can be directed to item performance
   - Some theoretical justification needs to be provided for such a cut-off

Why are DCMs Better for Classification?
- The need for a two stage procedure to set cut scores for classification is eliminated when DCMs are used
  - Reduces classification error
- DCMs quantify and model the measurement error of the observable variables
  - Controlling for measurement error when producing the diagnosis
- DCMs have a natural and direct mechanism for incorporating base rate information into the analysis
  - There is no direct way to do so objectively in two stage procedures
- Item parameters provide information as to the diagnostic quality of each item
  - Not directly estimable in two stage approaches
  - Can be used to build tests that optimally separate respondents

Session 1: CONCLUDING REMARKS

Session 1 Take-Home Points
- DCMs provide a direct link between diagnosis and behavior
  - Provide diagnostic classifications directly
  - Diagnoses are set by psychometric model parameters
- DCMs are effective if classification is the ultimate purpose
  - Reduce error by removing judgments necessary in the two stage approach
- DCMs can be used in many contexts
  - Can be used to create highly informative tests
  - Can be used to measure multiple dimensions
- DCMs are in their infancy
  - Time will tell their effectiveness

Session 2: Diagnostic Modeling: Psychometric Models

Development of Psychometric Models
- Over the past several years, numerous DCMs have been developed
  - We will focus on DCMs that use latent variables for attributes
- Each DCM makes assumptions about how mastered attributes combine/interact to produce an item response
  - Compensatory/disjunctive/additive models
  - Non-compensatory/conjunctive/non-additive models
- With so many models, analysts have been unsure which model would best fit their purpose
  - It is difficult to imagine all items following the same assumptions

General Models for Diagnosis
- Recent developments have produced very general diagnostic models
  - General Diagnostic Model (GDM; von Davier, 2005)
  - Loglinear Cognitive Diagnosis Model (LCDM; Henson, Templin, & Willse, 2009), the focus of this session
- The general DCMs (GDM; LCDM) provide great flexibility
  - Subsume all other latent variable DCMs
  - Allow for both additive and non-additive relationships between attributes and items
  - Sync with other psychometric models, allowing for greater understanding of the modeling process

Session Overview
- Background information: ANOVA models and the LCDM
- Logits explained
- The LCDM
  - Parameter structure
  - One item demonstration
  - LCDM general form
- Linking the LCDM to other earlier developed DCMs

Notation Used Throughout Session
- Attributes: $a = 1, \ldots, A$
- Respondents: $r = 1, \ldots, R$
- Attribute profiles: $\boldsymbol{\alpha}_r = [\alpha_{r1}, \alpha_{r2}, \ldots, \alpha_{rA}]$, where each $\alpha_{ra}$ is 0 or 1
- Latent classes: $c = 1, \ldots, C$
  - We have $C = 2^A$ latent classes, one for each possible attribute profile
- Items: $i = 1, \ldots, I$
  - Restricted to dichotomous item responses ($X_{ri}$ is 0 or 1)
- Q matrix: elements $q_{ia}$ for an item $i$ and attribute $a$, where $q_{ia}$ is 0 or 1

Session 2: BACKGROUND INFORMATION: ANOVA MODELS

Background Information: ANOVA
- The LCDM models the probability of a correct response to an item as a function of the latent attributes of a respondent
- The latent attributes are categorical, meaning a respondent can have only a few possible statuses
  - Each status corresponds to a predicted probability of a correct response
- As such, the LCDM is very similar to an ANOVA model
  - Predicting a dependent variable as a function of the experimental group of a respondent
[Figure: predicted P(X = 1) for respondents with α = 0 and α = 1]

ANOVA Refresher
- As a refresher on ANOVA, let's imagine that we are interested in the factors that have an effect on work output (denoted by Y)
- We design a two factor study where work output may be affected by:
  - Lighting of the workplace: High or Low
  - Temperature: Cold or Warm
- This experimental design is known as a 2-Way ANOVA

ANOVA Model
- Here is the 2 x 2 factorial design:
[Table: rows Cold Temperature and Warm Temperature, columns Low Lighting and High Lighting]
- The ANOVA model allows us to test for the presence of:
  - A main effect associated with Temperature ($A_t$)
  - A main effect associated with Lighting ($B_l$)
  - An interaction effect associated with Temperature and Lighting ($(AB)_{tl}$)
- The ANOVA model for a respondent's work output is
  $Y_{rtl} = \mu + A_t + B_l + (AB)_{tl} + e_{rtl}$

ANOVA with Dummy Coded Variables
- The ANOVA model can also be rewritten using two dummy coded variables, $D_{rt}$ and $D_{rl}$
  - It becomes a linear model (i.e., a regression model)
- $D_{rt}$ codes temperature:
  - $D_{rt} = 0$ for respondents in the cold temperature condition
  - $D_{rt} = 1$ for respondents in the warm temperature condition
- $D_{rl}$ codes lighting:
  - $D_{rl} = 0$ for respondents in the low lighting condition
  - $D_{rl} = 1$ for respondents in the high lighting condition
- The ANOVA model then becomes:
  $Y_r = \beta_0 + \beta_t D_{rt} + \beta_l D_{rl} + \beta_{t*l} D_{rt} D_{rl} + e_r$
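As a worked step, substituting the four combinations of dummy codes into the regression form of the model gives the predicted mean for each cell of the 2 x 2 design. The notation below ($\beta_0$, $\beta_t$, $\beta_l$, $\beta_{t*l}$, and the error term $e_r$) follows the coefficient names used on the slides; the exact form of the original equation is assumed, since it did not survive transcription.

```latex
\begin{align*}
Y_r &= \beta_0 + \beta_t D_{rt} + \beta_l D_{rl} + \beta_{t*l} D_{rt} D_{rl} + e_r \\[4pt]
\text{Cold, low light } (D_{rt}=0,\ D_{rl}=0): \quad E[Y_r] &= \beta_0 \\
\text{Warm, low light } (D_{rt}=1,\ D_{rl}=0): \quad E[Y_r] &= \beta_0 + \beta_t \\
\text{Cold, high light } (D_{rt}=0,\ D_{rl}=1): \quad E[Y_r] &= \beta_0 + \beta_l \\
\text{Warm, high light } (D_{rt}=1,\ D_{rl}=1): \quad E[Y_r] &= \beta_0 + \beta_t + \beta_l + \beta_{t*l}
\end{align*}
```

The interaction coefficient appears only in the cell where both dummy codes equal 1, which is why it captures the change not explained by the two simple main effects.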

ANOVA Effects Explained
- $\beta_0$ is the mean for the cold and low light condition (the reference group)
  - The intercept
- $\beta_t$ is the change in the mean when comparing cold to warm temperature for a business with low lights (simple main effect)
- $\beta_l$ is the change in the mean when comparing low to high lights for a business with a cold temperature (simple main effect)
- $\beta_{t*l}$ is the additional mean change that is not explained by the shift in temperature and the shift in lights, when both occur (2-way interaction)
- Respondents in the same condition have the same predicted value

ANOVA and the LCDM
- The ANOVA model and the LCDM take the same modeling approach
  - Predict a response using dummy coded variables; in the LCDM, the dummy coded variables are latent attributes
  - Use a set of main effects and interactions, which link attributes to the item response
  - Where possible, we may look for ways to reduce the model by removing non-significant interactions and/or main effects

Differences Between LCDM and ANOVA
The LCDM and the ANOVA model differ in two ways:
- Instead of a continuous outcome such as work output, the LCDM models a function of the probability of a correct response
  - The logit of a correct response (defined next)
- Instead of observed factors as predictors, the LCDM uses discrete latent variables (the attributes being measured)
  - Attributes are given dummy codes (they act as latent factors)
  - $\alpha_{ra} = 1$ if respondent $r$ has mastered attribute $a$
  - $\alpha_{ra} = 0$ if respondent $r$ has not mastered attribute $a$

Session 2: LOGITS EXPLAINED

Model Background
- Just as in IRT models, the LCDM models the log odds of a correct response conditional on a respondent's attribute pattern $\boldsymbol{\alpha}_r$
  - The log odds is called a logit:
    $\text{logit}(X_{ri} = 1 \mid \boldsymbol{\alpha}_r) = \log\left(\frac{P(X_{ri} = 1 \mid \boldsymbol{\alpha}_r)}{1 - P(X_{ri} = 1 \mid \boldsymbol{\alpha}_r)}\right)$
- The logit is used because the responses are binary
  - Items are either answered correctly (1) or incorrectly (0)

More on Logits
[Figure: correspondence between probability and logit]
- The linear model is inappropriate for categorical data
  - It can lead to impossible predictions (i.e., probabilities greater than 1 or less than 0)

From Logits to Probabilities
- Whereas logits are useful as they are unbounded continuous variables, categorical data analyses rely on estimated probabilities
- The inverse logit function converts the unbounded logit to a probability:
  $P(X_{ri} = 1 \mid \boldsymbol{\alpha}_r) = \frac{\exp\left(\text{logit}(X_{ri} = 1 \mid \boldsymbol{\alpha}_r)\right)}{1 + \exp\left(\text{logit}(X_{ri} = 1 \mid \boldsymbol{\alpha}_r)\right)}$
- This is also the form of an IRT model (and logistic regression)

Session 2: THE LCDM

Building the LCDM
- To demonstrate the LCDM, consider the item 2 + 3 - 1 = ? from our basic math example
  - The item measures addition (attribute 1) and subtraction (attribute 2)
- Only attributes defined by the Q matrix are modeled for an item
- The LCDM provides the logit of a correct response as a function of the latent attributes mastered by a respondent:
  $\text{logit}(X_{ri} = 1 \mid \boldsymbol{\alpha}_r) = \lambda_{i,0} + \lambda_{i,1,(1)} \alpha_{r1} + \lambda_{i,1,(2)} \alpha_{r2} + \lambda_{i,2,(1,2)} \alpha_{r1} \alpha_{r2}$

LCDM Explained
- $\text{logit}(X_{ri} = 1 \mid \boldsymbol{\alpha}_r)$ is the logit of a correct response to item $i$ by respondent $r$
- $\lambda_{i,0}$ is the intercept
  - The logit for non-masters of addition and subtraction
  - The reference group is respondents who have not mastered either attribute ($\alpha_{r1} = 0$ and $\alpha_{r2} = 0$)
- $\lambda_{i,1,(1)}$ is the main effect for addition (attribute 1)
  - The increase in the logit for mastering addition (in someone who has not also mastered subtraction)
- $\lambda_{i,1,(2)}$ is the main effect for subtraction (attribute 2)
  - The increase in the logit for mastering subtraction (in someone who has not also mastered addition)
- $\lambda_{i,2,(1,2)}$ is the interaction between addition and subtraction (attributes 1 and 2)
  - The change in the logit for mastering both addition and subtraction

Understanding LCDM Notation
The LCDM item parameters have several subscripts:
- Subscript #1, $i$: the item to which the parameters belong
- Subscript #2, $e$: the level of the effect
  - 0 is the intercept
  - 1 is the main effect
  - 2 is the two-way interaction
  - 3 is the three-way interaction
- Subscript #3, $(a_1, \ldots)$: the attributes to which the effect applies
  - The number of attributes listed equals the number in Subscript #2

Session 2: LCDM: A NUMERICAL EXAMPLE

LCDM with Example Numbers
Imagine we obtained the following estimates for the simple math item:
- $\lambda_{i,0} = -2$: intercept
- $\lambda_{i,1,(1)} = 2$: addition simple main effect
- $\lambda_{i,1,(2)} = 1$: subtraction simple main effect
- $\lambda_{i,2,(1,2)} = 0$: addition/subtraction interaction

LCDM Predicted Logits and Probabilities
For each attribute pattern, the LCDM logit function, the resulting logit, and the probability computed from the estimates above:
- $\alpha_1 = 0, \alpha_2 = 0$: $\lambda_{i,0} + \lambda_{i,1,(1)}(0) + \lambda_{i,1,(2)}(0) + \lambda_{i,2,(1,2)}(0)(0)$; logit -2; probability 0.12
- $\alpha_1 = 0, \alpha_2 = 1$: $\lambda_{i,0} + \lambda_{i,1,(1)}(0) + \lambda_{i,1,(2)}(1) + \lambda_{i,2,(1,2)}(0)(1)$; logit -1; probability 0.27
- $\alpha_1 = 1, \alpha_2 = 0$: $\lambda_{i,0} + \lambda_{i,1,(1)}(1) + \lambda_{i,1,(2)}(0) + \lambda_{i,2,(1,2)}(1)(0)$; logit 0; probability 0.50
- $\alpha_1 = 1, \alpha_2 = 1$: $\lambda_{i,0} + \lambda_{i,1,(1)}(1) + \lambda_{i,1,(2)}(1) + \lambda_{i,2,(1,2)}(1)(1)$; logit 1; probability 0.73
[Figure: logit response function and probability response function across the possible attribute patterns]

LCDM Interaction Plots
- The LCDM interaction term can be investigated via plots
- No interaction: parallel lines for the logit
  - Compensatory RUM (Hartz, 2002)
[Figure: logit response function (lines for α2 = 0 and α2 = 1 plotted against α1) and probability response function across the possible attribute patterns]
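The predicted logits and probabilities for the example estimates can be reproduced with a short Python sketch. It reads the intercept as -2 (the minus sign appears to have been dropped in transcription), and the function and variable names are illustrative rather than taken from any package.

```python
import math
from itertools import product

# Estimates from the numerical example for the item 2 + 3 - 1 = ?
LAMBDA_0 = -2.0     # intercept (sign assumed; see note above)
LAMBDA_ADD = 2.0    # simple main effect of addition (attribute 1)
LAMBDA_SUB = 1.0    # simple main effect of subtraction (attribute 2)
LAMBDA_INT = 0.0    # addition/subtraction interaction


def lcdm_logit(a1, a2):
    """Logit of a correct response for a two-attribute item."""
    return LAMBDA_0 + LAMBDA_ADD * a1 + LAMBDA_SUB * a2 + LAMBDA_INT * a1 * a2


def inv_logit(x):
    """Convert an unbounded logit to a probability."""
    return math.exp(x) / (1.0 + math.exp(x))


# One entry per attribute pattern: (logit, probability).
table = {
    (a1, a2): (lcdm_logit(a1, a2), inv_logit(lcdm_logit(a1, a2)))
    for a1, a2 in product((0, 1), repeat=2)
}

for (a1, a2), (logit, prob) in table.items():
    print(f"alpha1={a1} alpha2={a2} logit={logit:+.1f} P(X=1)={prob:.2f}")
```

Because the interaction is zero, mastering subtraction raises the logit by the same amount (1) whether or not addition has been mastered, which is what the parallel lines in the logit plot express.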

Strong Positive Interactions
- Positive interaction: over-additive logit model
  - Conjunctive model (i.e., all or none)
  - DINA model (Haertel, 1989; Junker & Sijtsma, 1999)
[Figure: logit response function and probability response function across the possible attribute patterns]

Strong Negative Interactions
- Negative interaction: under-additive logit model
  - Disjunctive model (i.e., one or more)
  - DINO model (Templin & Henson, 2006)
[Figure: logit response function and probability response function across the possible attribute patterns]

Less Extreme Interactions
- Extreme interactions are unlikely in practice
- Below: positive interaction with positive main effects
[Figure: logit response function and probability response function across the possible attribute patterns]

Session 2: GENERAL FORM OF THE LCDM

More General Versions of the LCDM
- The LCDM is based on the General Diagnostic Model by von Davier (GDM; 2005)
  - The GDM allows for both categorical and continuous latent variables
- For items measuring more than two attributes, higher level interactions are possible
  - These are difficult to estimate in practice

General Form of the LCDM
- The LCDM specifies the probability of a correct response as a function of a set of attributes and a Q matrix:
  $P(X_{ri} = 1 \mid \boldsymbol{\alpha}_r) = \frac{\exp\left(\lambda_{i,0} + \boldsymbol{\lambda}_i^T \mathbf{h}(\boldsymbol{\alpha}_r, \mathbf{q}_i)\right)}{1 + \exp\left(\lambda_{i,0} + \boldsymbol{\lambda}_i^T \mathbf{h}(\boldsymbol{\alpha}_r, \mathbf{q}_i)\right)}$
- The term in the exponent is the logit we have been using all along, $\text{logit}(X_{ri} = 1 \mid \boldsymbol{\alpha}_r)$
- The LCDM appears in the psychometric literature in a more general form (see Henson, Templin, & Willse, 2009), with intercept, main effects, two-way interactions, and higher interactions:
  $\lambda_{i,0} + \boldsymbol{\lambda}_i^T \mathbf{h}(\boldsymbol{\alpha}_r, \mathbf{q}_i) = \lambda_{i,0} + \sum_{a=1}^{A} \lambda_{i,1,(a)} \alpha_{ra} q_{ia} + \sum_{a=1}^{A-1} \sum_{b>a} \lambda_{i,2,(a,b)} \alpha_{ra} \alpha_{rb} q_{ia} q_{ib} + \cdots$

Session 2: SUBSUMED MODELS

Previously Popular DCMs
- Because the advent of the GDM and LCDM has been fairly recent, other earlier DCMs are still in use
- Such DCMs are much more restrictive than the LCDM
  - Not discussed at length here
  - It is anticipated that the field will adapt to more general forms
- Each of these models can be fit using the LCDM
  - By fixing certain model parameters
  - Shown for reference purposes
- See Henson, Templin, & Willse (2009) for more detail

Other DCMs with the LCDM
The Big 6 DCMs with latent variables:
- DINA (Deterministic Inputs, Noisy AND Gate): Haertel (1989); Junker and Sijtsma (1999)
- NIDA (Noisy Inputs, Deterministic AND Gate): Maris (1995)
- RUM (Reparameterized Unified Model): Hartz (2002)
- DINO (Deterministic Inputs, Noisy OR Gate): Templin & Henson (2006)
- NIDO (Noisy Inputs, Deterministic OR Gate): Templin (2006)
- C-RUM (Compensatory Reparameterized Unified Model): Hartz (2002)

Other DCMs with the LCDM
LCDM parameters for each model (adapted from Rupp, Templin, and Henson, forthcoming, 2010):
- Non-compensatory models
  - DINA: main effects zero; interactions positive; parameter restrictions across attributes
  - NIDA: main effects positive; interactions positive; parameter restrictions across items
  - NC-RUM: main effects positive; interactions positive; no parameter restrictions listed
- Compensatory models
  - DINO: main effects positive; interactions negative; parameter restrictions across attributes
  - NIDO: main effects positive; interactions zero; parameter restrictions across items
  - C-RUM: main effects positive; interactions zero; no parameter restrictions listed

Compensatory RUM (Hartz, 2002)
- No interactions in the model
- No interaction: parallel lines for the logit
[Figure: logit response function and probability response function across the possible attribute patterns]

DINA Model (Haertel, 1989; Junker & Sijtsma, 1999)
- Positive interaction: over-additive logit model
- Highest interaction parameter is non-zero
- All main effects (and lower interactions) are zero
[Figure: logit response function and probability response function across the possible attribute patterns]
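The way parameter restrictions turn the LCDM into earlier models can be sketched for a single two-attribute item. The numeric values below are hypothetical and chosen only to make the patterns visible; the constraint patterns follow the slides (C-RUM: no interaction; DINA: zero main effects with a positive interaction; DINO: equal positive main effects with a negative interaction, here set to the negative of the common main effect so that mastering a second attribute adds nothing).

```python
import math
from itertools import product


def inv_logit(x):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def lcdm_logit(alpha, lam0, main, inter):
    """LCDM logit for an item measuring two attributes."""
    a1, a2 = alpha
    return lam0 + main[0] * a1 + main[1] * a2 + inter * a1 * a2


# Hypothetical parameter values illustrating each restriction pattern.
MODELS = {
    "C-RUM": dict(lam0=-2.0, main=(2.0, 1.0), inter=0.0),
    "DINA": dict(lam0=-2.0, main=(0.0, 0.0), inter=4.0),
    "DINO": dict(lam0=-2.0, main=(4.0, 4.0), inter=-4.0),
}

# Probability of a correct response for every attribute pattern, per model.
table = {
    name: {
        alpha: round(inv_logit(lcdm_logit(alpha, **params)), 2)
        for alpha in product((0, 1), repeat=2)
    }
    for name, params in MODELS.items()
}

for name, probs in table.items():
    print(name, probs)
```

Under these values, the DINA item separates respondents who have mastered both attributes from everyone else, while the DINO item separates respondents who have mastered neither attribute from everyone else.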

DINO Model (Templin & Henson, 2006)
- Negative interaction: under-additive logit model
- All main effects are equal
- Interaction terms are fixed by the lower-order effects so that mastering additional attributes adds nothing (for a two-attribute item, the interaction is the negative of the common main effect)
[Figure: logit response function and probability response function across the possible attribute patterns]

Session 2: CONCLUDING REMARKS

Session 2 Take-Home Points
- The LCDM uses an ANOVA-like approach to map latent attributes onto item responses
  - Uses main effects and interactions for each attribute
  - Uses a logit link function
- Multiple diagnostic models are subsumed by the LCDM

Session 3: Diagnostic Modeling in Educational and Psychological Settings

Session 3 Overview
- Examples of DCMs through applications
- Educational measurement: English proficiency
  - LCDM demonstration in practice
  - Sample results
  - Potential problems in analysis
- Psychological measurement: pathological gambling
  - Simplified LCDM (DINO model)
  - Demonstration of what is possible with DCMs

Session 3: LARGE SCALE LANGUAGE ASSESSMENT USING THE LCDM

Introduction
- With the emphasis of today's academic environment on testing, the focus on formative assessment is growing
- Among possible formative settings, language assessment has received some attention (e.g., Buck & Tatsuoka, 1998; Jang, 2004; von Davier, 2005)
- The purpose of this study is to explore the possibility of using the LCDM for the evaluation of the Grammar section of the Examination for the Certificate of Proficiency in English (ECPE)
  - It also provides an example analysis using the LCDM

Examination for the Certificate of Proficiency in English (ECPE)
- The ECPE is a test developed and scored by the English Language Institute of the University of Michigan
- The ECPE was developed to measure advanced English ability in respondents for whom English is not their first language
- The analysis is for the grammar section of the test
  - 40 multiple choice items (28 items used in the analysis)
  - 10 were non-operational
  - 2 had difficulties greater than [value missing]

Example Item from Grammar Section
An example written to resemble an item in the Grammar section of the ECPE is:
  I have always ___ snow.
  - to enjoy
  - enjoyed
  - enjoying
  - to enjoyed

Session 3: ECPE ANALYSIS METHODS

Examinees and Data
- A total of 2922 examinees are used to analyze the ECPE Grammar section
- The average age of examinees was approximately 23 years old
- Approximately 50% of the examinees spoke Portuguese and an additional 31% spoke Spanish as a first language

Attributes Measured by Test
- Three attributes were measured, representing knowledge of:
  - Morphosyntactic rules
  - Cohesive rules
  - Lexical rules
- The full LCDM was estimated using Mplus
  - Marginal maximum likelihood estimation
- Q matrix characteristics:
  - 19 items measuring only one attribute (simple structure)
  - 9 items measuring two attributes
  - 0 items measuring all three attributes

ECPE Q matrix
- Here are the entries for several items from the ECPE Q matrix
[Table: one row per item, columns Morphosyntactic Rules, Cohesive Rules, Lexical Rules; entries not recovered]

Session 3: LCDM RESULTS

LCDM Results
To further describe the parameters of the LCDM, several types of results will be presented:
- Model fit results
- Item parameter results
  - Inspection of interactions
  - Interpretation
- Structural parameter results
  - Implied attribute hierarchy
- Respondent estimates/classifications

Model Fit Results
- Overall model fit
  - Chi-square not computable
  - AIC and BIC (values not recovered): used to reduce the model
- Bivariate model fit (Session 4)
  - Compares model-predicted and observed frequencies of responses for all pairs of items
  - Of 378 item pairs, 45 had p-values less than 0.01
  - Items most indicated: Item 13 (9 of 45 pairs), Item 4 (6 of 45 pairs), Item 5 (6 of 45 pairs)
- Indicates some items are not fit well by the model
  - We will ignore this and continue with the analysis as an example
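The bivariate fit counts can be put in context with a little arithmetic: 28 items form 378 distinct pairs, and at a 0.01 threshold only a handful of pairs would be flagged by chance alone. The sketch below uses the numbers reported on the slide; treating the pairwise tests as independent is a simplifying assumption (pairs share items), so the chance figure is only a rough guide.

```python
import math

n_items = 28          # items used in the ECPE analysis
alpha_level = 0.01    # p-value threshold reported on the slide
flagged_pairs = 45    # item pairs reported with p < 0.01

# Number of distinct item pairs: 28 choose 2.
n_pairs = math.comb(n_items, 2)

# Rough number of pairs expected below the threshold by chance alone,
# assuming (unrealistically) that the pairwise tests are independent.
expected_by_chance = alpha_level * n_pairs

print(n_pairs, expected_by_chance, flagged_pairs / n_pairs)
```

Roughly 4 flagged pairs would be expected by chance under that assumption, against the 45 observed, which is consistent with the slide's remark that some items are not fit well.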

Example Item
  To demonstrate parameter interpretation, results from item 7 will be shown
  Attributes measured:
    Morphosyntactic rules (Attribute 1)
    Lexical rules (Attribute 3)
  Parameter estimates (table columns: Parameter | Estimate | SE | p-value)
    Parameters: $\lambda_{7,0}$, $\lambda_{7,1,(1)}$, $\lambda_{7,1,(3)}$, $\lambda_{7,2,(1,3)}$

LCDM Intercepts
  Estimated intercept: -0.106 (0.095)
  Indicates the logit of a correct response for a non-master of all attributes
  Here, non-masters have an average probability of a correct response of
    $\exp(-0.106) / (1 + \exp(-0.106)) = 0.47$
  Hypothesis test is not important
    Tests whether non-masters have a probability of a correct response of 0.5
  Problematic when very high
    Difficult to identify other parameters
    Indicates issues with test, Q-matrix, or attributes

Higher-Order Model Parameters
  Interpretation of main effects and interactions proceeds sequentially:
    If interactions are present:
      Examine highest level of interaction
      If significantly different from zero, leave in model
      If not, term can be omitted
    If interactions are not present:
      Examine how far main effect is from zero

Examining Interaction Parameters
  2-way interaction parameter: (0.144)
  p-value for parameter was small (0.000)
    Indicates parameter is significantly different from zero
    Candidate to leave in model
  Value indicates that there is an under-additive effect of mastering both attributes
    Means mastery of one attribute is sufficient to have a high chance of getting the item correct
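The logit-to-probability step on the intercept slide can be checked with a few lines of Python. This is a minimal sketch: the only number taken from the slides is the intercept of -0.106 that appears in the slide's own calculation, and `inv_logit` is a helper name introduced here for illustration.

```python
import math


def inv_logit(x: float) -> float:
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Intercept for ECPE item 7, as used in the slide's worked calculation.
intercept_item7 = -0.106

# Probability of a correct response for a non-master of all measured attributes.
p_nonmaster = inv_logit(intercept_item7)
print(round(p_nonmaster, 2))  # 0.47
```

An intercept of exactly zero would give a probability of 0.5, which is why the hypothesis test on the intercept amounts to a test against 0.5.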

More on Interactions
  Interaction pattern for this item indicates that mastery of morphosyntactic rules is key to answering correctly
  Mastery of lexical rules helps, but not above that of mastery of morphosyntactic rules
  For why this is the case, stay tuned
  [Figures: Logit(X=1 | α) by α1 and α3; P(X=1 | α) for each possible attribute pattern (α1, α3)]

Other Interactions
  Of 9 interaction parameters, 3 were significantly different from zero
  The 6 non-significant interactions are candidates to be removed from the model
    4 had small main effects on one attribute
      Attribute not highly related to item response
      Indicates that Q-matrix may be incorrect
      Have to re-fit with new Q-matrix and look at information criteria (Session 4)

Interpreting Main Effects
  When significant interactions are present, main effects cannot be easily interpreted
    Sometimes called conditional main effects
    Need to know combination of attributes mastered to fully describe item response function
  Main effects in LCDM have added concern
    Lower bound is zero (for monotonicity)
    p-values are inaccurate as estimates approach the zero bound

ECPE Item 7: Lexical Main Effect
  Because of the significant interaction, interpretation is conditional
  When Morphosyntactic Rules have not been mastered:
    The lexical main effect is $\lambda_{7,1,(3)}$
    Respondents who have mastered Lexical Rules have a logit that is higher by this amount than that of non-masters
  [Figure: P(X=1 | α) for each possible attribute pattern (α1, α3)]
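The conditional interpretation of main effects and interactions is easier to see when the item response function is evaluated for all four attribute patterns. The sketch below assumes the usual two-attribute LCDM form (intercept, two main effects, one interaction on the logit scale). Only the intercept of -0.106 comes from the slides; the main effects and the interaction are hypothetical values chosen to show an under-additive pattern, not the ECPE estimates.

```python
import math
from itertools import product


def lcdm_prob(a1, a3, lam0, lam1, lam3, lam13):
    """P(X = 1 | alpha1, alpha3) for an item measuring attributes 1 and 3."""
    logit = lam0 + lam1 * a1 + lam3 * a3 + lam13 * a1 * a3
    return 1.0 / (1.0 + math.exp(-logit))


lam0 = -0.106   # intercept from the slides
lam1 = 2.0      # HYPOTHETICAL morphosyntactic main effect
lam3 = 1.0      # HYPOTHETICAL lexical main effect
lam13 = -0.8    # HYPOTHETICAL negative (under-additive) interaction

probs = {
    (a1, a3): lcdm_prob(a1, a3, lam0, lam1, lam3, lam13)
    for a1, a3 in product([0, 1], repeat=2)
}
for pattern, p in probs.items():
    print(pattern, round(p, 3))

# Logit gain from mastering attribute 3, conditional on attribute 1.
gain_without_a1 = lam3           # applies when alpha1 = 0
gain_with_a1 = lam3 + lam13      # applies when alpha1 = 1
```

With a negative interaction, the gain from the second attribute is smaller once the first is mastered, which is what "mastery of one attribute is sufficient" describes.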

ECPE Item 7: Morphosyntactic Main Effect
  Because of the significant interaction, interpretation is conditional
  When Lexical Rules have not been mastered:
    The morphosyntactic main effect is $\lambda_{7,1,(1)}$
    Respondents who have mastered Morphosyntactic Rules have a logit that is higher by this amount than that of non-masters
  [Figure: P(X=1 | α) for each possible attribute pattern (α1, α3)]

General Modeling Tips
  High-level interactions are difficult to estimate in most samples
    More than 2-way interactions may not be possible
  Modeling strategy:
    Try all interactions
    If model does not converge, limit to only 2-way interactions
    Remove non-significant interactions from model
  If all interactions and main effects for an attribute are close to zero:
    Entry for attribute in Q-matrix can be removed
    Double-check with AIC/BIC, as the hypothesis test is approximate

Attribute Pattern Probabilities
  Base rate pattern of profiles mastered in sample indicates an attribute hierarchy
    Lexical → Cohesive → Morphosyntactic
  Implications for Item 7
    Cannot have morphosyntactic without lexical
  Suggests information about second language acquisition

Example Respondent Estimates
  Respondent estimates are probabilities of mastery for each attribute
  Shown for 5 example respondents
  Test score given to provide comparison
  Table columns: Respondent | Total Score | Morphosyntactic | Cohesive | Lexical

CONCLUDING REMARKS: EDUCATIONAL MEASUREMENT

Educational Measurement Wrap-Up
  Demonstrated results from LCDM when applied to English language assessment
  Investigated model fit
    Very important; as of yet not well understood in DCMs
  Described item parameter estimates
    Interpreting interactions/main effects
    Modeling strategy
  Described structural parameter estimates
    Useful for understanding latent variables measured by test
  Described respondent parameter estimates
    Normally these help understand the knowledge state of a respondent
    Attribute hierarchy here limits utility of information

UNDERSTANDING PATHOLOGICAL GAMBLING

Gambling Application Overview
  Study of pathological gambling
    DSM criteria for pathological gambling
    Common methods for assessment
    How diagnostic models could help
  Psychometric Model
    Formulating the LCDM for Likert data (and smaller samples)
    Adapting structural (or hierarchical) models to evaluate the DSM definition of pathological gambling
  Pathological Gambling Application
    Model Development
    Estimation/Results

The Gambling Explosion
  Exponential increase in accessibility of gambling opportunities:
    State lotteries
    Native American tribal casinos
    Riverboat gambling
    Internet gambling
  Incidence of pathological gambling has increased (Volberg, 2002)
  In order to limit the detrimental effects of gambling on a community:
    Easily identify potential pathological gamblers and provide treatment interventions
    Understand the underlying causes of the disorder

DSM Definition of Pathological Gambling
  The DSM-IV-TR defines pathological gambling as an impulse control disorder (not elsewhere classified)
  To be classified as a pathological gambler, an individual must meet 5 of 10 defined criteria
    All are dichotomous: Meets/Does not meet

DSM CRITERIA
  C1  Is preoccupied with gambling
  C2  Needs to gamble with increasing amounts of money in order to achieve the desired excitement
  C3  Has repeated unsuccessful efforts to control, cut back, or stop gambling
  C4  Is restless or irritable when attempting to cut down or stop gambling
  C5  Gambles as a way of escaping from problems or of relieving a dysphoric mood
  C6  After losing money gambling, often returns another day to get even
  C7  Lies to family members, therapist, or others to conceal the extent of involvement with gambling
  C8  Has committed illegal acts such as forgery, fraud, theft, or embezzlement to finance gambling
  C9  Has jeopardized or lost a significant relationship, job, or educational or career opportunity because of gambling
  C10 Relies on others to provide money to relieve a desperate financial situation caused by gambling

Studying Pathological Gambling
  The DSM definition has several characteristics which make it seem somewhat implausible:
    All criteria are treated equally, in that meeting any five results in the diagnosis of pathological gambling
      It seems odd to have the following given equal weight:
        C8 Has committed illegal acts such as forgery, fraud, theft, or embezzlement to finance gambling
        C1 Is preoccupied with gambling
    If all criteria are treated equally, does the diagnostic criterion of five or more seem realistic?
  DCMs can help to answer both questions

METHODS

  Take each of the 10 criteria to be the dichotomous latent attributes
  Applying a DCM would simultaneously provide:
    Diagnostic information for each individual
    Underlying structural model parameters
      Evaluation of the above/below five DSM criteria rule for pathological diagnosis
      Evaluation of whether all criteria should be treated equivalently

Gambling Instruments
  Study included 112 experienced gamblers
  Participants provided responses to two instruments
    Gambling research instrument (Henson, Feasel, & Jones, 2000)
      41 items; 6-point Likert scale
    South Oaks Gambling Screen (Lesieur & Blume, 1987)
      20 items; binary
      Used to validate result

Psychometric Model
  The full LCDM could not be estimated
    Small sample size
    Likert response data
  Instead, the DINO model was used
    One-or-more-attribute model
    Two parameters per item (regardless of entries in Q-matrix)
    Shown for a dichotomous item measuring two attributes, with the attribute effects all set to be equal
  Binomial link function used to model Likert responses
    Polytomous model assuming Binomial distribution conditional on attribute profile
  [Figures: Conditional Response Distributions; Marginal Response Distributions]
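The combination of a DINO ("one or more attribute") condensation rule with a binomial response distribution can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: it assumes the 6-point Likert responses are scored 0 to 5 and follow a Binomial(5, p) distribution, where p depends only on whether the respondent has at least one attribute the item measures. The function names, the intercept, and the effect size are all hypothetical.

```python
import math


def dino_gate(alpha, q):
    """Return 1 if the respondent has at least one attribute the item measures."""
    return int(any(a == 1 and qa == 1 for a, qa in zip(alpha, q)))


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def likert_distribution(alpha, q, intercept, effect, n_categories=6):
    """Response category probabilities under a DINO gate with a binomial link.

    Two item parameters (intercept, effect), regardless of the Q-matrix row.
    """
    gate = dino_gate(alpha, q)
    p = 1.0 / (1.0 + math.exp(-(intercept + effect * gate)))
    n_trials = n_categories - 1  # ASSUMPTION: categories scored 0..5
    return [binomial_pmf(k, n_trials, p) for k in range(n_categories)]


q_row = (1, 1)                    # item measures two attributes
intercept, effect = -1.5, 2.5     # HYPOTHETICAL item parameters

dist_none = likert_distribution((0, 0), q_row, intercept, effect)
dist_one = likert_distribution((1, 0), q_row, intercept, effect)
dist_both = likert_distribution((1, 1), q_row, intercept, effect)

mean_none = sum(k * pk for k, pk in enumerate(dist_none))
mean_one = sum(k * pk for k, pk in enumerate(dist_one))
```

Because the gate is either 0 or 1, having one of the measured attributes gives the same response distribution as having both, which is the defining feature of the DINO model.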

GRI Structural Model
  The structural model provides a model for the correlational structure of the attributes (Session 4)
  A two-class mixture was used as the structural model
    Classes were meant to represent pathological gamblers (PG) and non-pathological gamblers (NPG)
    Helps determine how the latent criteria map onto pathological gamblers
  The mixture structural model allows us to:
    Calculate the probability that each criterion is met given an individual is a PG or an NPG
      Determine the criteria that best discriminate between PG and NPG
    Calculate the probability of being a PG based on the number of criteria met
      Evaluate the DSM-stated criterion of 5 or more to be diagnosed PG

Model Estimation
  Created a Markov chain Monte Carlo estimation algorithm in Fortran
    Uniform prior for all item parameters
    Latent traits (α) modeled with empirical prior defined by structural model
    Uniform prior for all structural model parameters
  Chain length of 50,000 (burn-in of 40,000)
  Convergence check:
    Geweke test
    Visual inspection of time-series plots

Algorithm Convergence
  [Figure]

MODEL RESULTS
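How a two-class mixture turns a pattern of met criteria into a probability of being a PG can be illustrated with Bayes' rule. The sketch below assumes the criteria are conditionally independent given class membership, which is a simplifying assumption made here for illustration. The base rate and the per-criterion probabilities are hypothetical values, not estimates from the study.

```python
def posterior_pg(pattern, base_rate_pg, p_met_pg, p_met_npg):
    """P(PG | pattern of met criteria) under a two-class mixture.

    ASSUMPTION: criteria are conditionally independent given class.
    """
    like_pg = base_rate_pg
    like_npg = 1.0 - base_rate_pg
    for met, p_pg, p_npg in zip(pattern, p_met_pg, p_met_npg):
        like_pg *= p_pg if met else (1.0 - p_pg)
        like_npg *= p_npg if met else (1.0 - p_npg)
    return like_pg / (like_pg + like_npg)


n_criteria = 10
base_rate_pg = 0.3                  # HYPOTHETICAL proportion of PG
p_met_pg = [0.8] * n_criteria       # HYPOTHETICAL P(criterion met | PG)
p_met_npg = [0.1] * n_criteria      # HYPOTHETICAL P(criterion met | NPG)

# Posterior probability of PG as a function of the number of criteria met.
# With equal probabilities across criteria, only the count matters.
posterior_by_count = []
for k in range(n_criteria + 1):
    pattern = [1] * k + [0] * (n_criteria - k)
    posterior_by_count.append(
        posterior_pg(pattern, base_rate_pg, p_met_pg, p_met_npg)
    )

# Smallest number of met criteria for which PG is the more likely class.
cutoff = next(k for k, p in enumerate(posterior_by_count) if p > 0.5)
```

A cutoff found this way depends entirely on the parameter values supplied. Allowing the per-criterion probabilities to differ is what lets a model of this kind show that some criteria discriminate better than others.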

Results To Be Presented
  Fit check: model fit evaluation
  Usability: diagnostic estimates of gamblers' DSM criteria profiles
  Validation: how GRI/DCM diagnoses correspond to SOGS diagnoses
  Interpretation:
    Item parameter estimates
    Structural model estimates:
      Criteria with differential discrimination between PG and NPG
      How many criteria are indicative of PG

Checking Goodness of Fit
  Typical measures of goodness of fit were unreasonable due to a sparse contingency table of responses ($6^{41}$ possible response patterns)
  A Monte Carlo fit index was constructed (based on Langeheine et al., 1996) for bivariate item statistics (Maydeu-Olivares & Joe, 2005)
  Root Mean Squared Residual (RMSR) of the Pearson correlation was used as a criterion
    Correlation RMSR: p = 0.486
    Indicates adequate fit

Respondent Diagnoses
  [Figures on two slides]

Criterion Validity
  Compared GRI/DCM classification with SOGS
  Cross-classification table: DCM classification (NPG, PG, Total) by SOGS classification (NPG, PG, Total)
  89.3% matching classifications
  Cohen's kappa: 0.69

Item Parameter Interpretation
  Bar graph:
    Red bar: average response for PG
    Blue bar: average response for NPG
  Item 5 [C2]: I find it necessary to gamble with larger amounts of money (than when I first gambled) for gambling to be exciting
  Item 13 [C3 or C4]: I find it difficult to stop gambling

Structural Model Estimates
  [Figure]

Evaluating the DSM "5 or More" Rule
  [Figure]
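Percent agreement and Cohen's kappa are both computed from the cross-classification table. The sketch below shows the calculation on an invented 2 x 2 table. The counts are hypothetical and are not the study's table; they were chosen only so that the total is 112 and 100 classifications agree (100/112 = 89.3%), so the resulting kappa is not expected to equal the reported 0.69.

```python
def agreement_and_kappa(table):
    """Return (proportion agreeing, Cohen's kappa) for a square count table."""
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_margins = [sum(table[i]) / total for i in range(size)]
    col_margins = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    # Agreement expected by chance if the two classifications were independent.
    expected = sum(r * c for r, c in zip(row_margins, col_margins))
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# HYPOTHETICAL counts: rows = DCM (NPG, PG), columns = SOGS (NPG, PG).
table = [
    [70, 5],
    [7, 30],
]

observed, kappa = agreement_and_kappa(table)
print(f"agreement = {observed:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than raw agreement because it discounts the agreement that the marginal base rates would produce by chance alone.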

CONCLUDING REMARKS: PSYCHOLOGICAL MEASUREMENT

Concluding Remarks: Gambling Talk
  DCM respondent estimates give rich information about the pattern of satisfied criteria
    Could be used to tailor treatment strategies
  A better definition of PG would be one who meets at least FOUR criteria
  Results suggest that Criteria 1, 3, and 10 are more discriminating of PG than other criteria
  Criteria such as 2, 5, and 7 have a relatively high probability of being met by NPG (more than 20% chance)
    Weaker indicators of pathological gambling

CONCLUDING REMARKS: SESSION

Wrap-Up and Take-Home Points
  Session 3 demonstrated some potential uses of DCMs
  Applications of DCMs are rare
    Tests haven't been built to measure categorical attributes
      Item information is different in DCMs
    Users haven't had access to software
  To date, most applications use software built by researchers
    MCMC in Fortran or WinBUGS
    MML in Fortran
  This is about to change

Notes on Usefulness of DCMs
  Full utility of DCMs cannot be understood unless applications become more frequent
    For now, have to use sub-optimal data and problems
  Future applications coming soon
    Mathematical reasoning test under development (NSF funded)
    Assessment of readiness for first grade in kindergartners
  Funding opportunities exist and seem to review well
    Educational Measurement: NSF (DR K12); IES (Goals 2 and 5)
    Psychological Measurement: NIH (NIMH; NIDA; NIA; ...)
  Industry seems interested
    ETS/College Board/ACT/Measurement Inc.
    Typically proprietary: dangerous for academics

Session 4
Advanced Topics: Structural Models, Model Fit, and Respondent Estimates

Session Overview
  Session 4 will provide the advanced topics needed to apply DCMs
    Understanding structural models
      What they are
      How to summarize them
      Differing types
    Assessment of model fit
    How respondent diagnoses are made
  WARNING: Content can be very technical
    But fun, though

Notation Used Throughout Session
  Attributes: $a = 1, \ldots, A$
  Respondents: $r = 1, \ldots, R$
  Attribute profiles: $\boldsymbol{\alpha}_r = [\alpha_{r1}, \alpha_{r2}, \ldots, \alpha_{rA}]$
    $\alpha_{ra}$ is 0 or 1
  Latent classes: $c = 1, \ldots, C$
    We have $C = 2^A$ latent classes, one for each possible attribute profile
  Items: $i = 1, \ldots, I$
    Restricted to dichotomous item responses ($X_{ri}$ is 0 or 1)
  Q-matrix: elements $q_{ia}$ for an item $i$ and attribute $a$
    $q_{ia}$ is 0 or 1
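The link between attributes and latent classes in the notation can be made concrete by enumerating every attribute profile. This is a small sketch; the function name is introduced here, and the ordering (last attribute changing fastest) is an assumption chosen because it agrees with the profile numbering used in the log-linear slides of this session (profile 2 = [0, 0, 1], profile 6 = [1, 0, 1], profile 8 = [1, 1, 1]).

```python
from itertools import product


def attribute_profiles(n_attributes):
    """List all 2**A attribute profiles, last attribute changing fastest."""
    return list(product([0, 1], repeat=n_attributes))


profiles = attribute_profiles(3)
for c, alpha in enumerate(profiles, start=1):
    print(c, alpha)

n_classes = len(profiles)  # C = 2**A
```

Each additional attribute doubles the number of latent classes, which is why structural models that reduce the number of base-rate parameters become attractive quickly.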

STRUCTURAL MODELS

DCM Structural Models
  Throughout the workshop, attribute profile base rates have been mentioned as being influential in DCMs
    Part of respondent diagnoses (to be shown)
    Describes nature of attribute profiles
      ECPE discovered apparent attribute hierarchy
      Gambling study provided feedback on DSM criteria rules
  The base rates represent the probability any respondent has a given attribute profile
    For a test measuring $A$ attributes, $2^A$ profiles are possible
    The structural model provides the probability for each profile

DCM Structural Models Defined
  The parameter for the structural model is $\eta_c$
    Each attribute profile $c$ has one
    $\eta_c$ is the base rate probability of attribute profile $c$: $\eta_c = P(\boldsymbol{\alpha}_r = \boldsymbol{\alpha}_c)$
  The ECPE estimates of $\eta_c$ are shown in a table (columns: $c$ | $\eta_c$ | $\alpha_1$ | $\alpha_2$ | $\alpha_3$)

Interpreting the Structural Model
  Because there are numerous $\eta_c$ parameters, interpretation is difficult
    Useful for detecting attribute hierarchies
  Often, the $\eta_c$ parameters are re-expressed as:
    The marginal probability an attribute is mastered in the population
    The correlation between any two attributes
  Both can be computed using a frequency analysis weighted by $\eta_c$
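The "frequency analysis weighted by η_c" can be sketched directly: each profile contributes its base rate to the attributes it includes. The base rates below are hypothetical, not the ECPE estimates. The correlation computed here is the Pearson (phi) correlation between binary attributes, which is swapped in for simplicity; the tetrachoric correlation named in the workshop requires an additional bivariate normal assumption and generally gives a different value.

```python
import math
from itertools import product


def summarize_structural_model(eta, n_attributes):
    """Marginal mastery proportions and pairwise phi correlations from eta."""
    profiles = list(product([0, 1], repeat=n_attributes))
    marginals = [
        sum(e * alpha[a] for e, alpha in zip(eta, profiles))
        for a in range(n_attributes)
    ]
    correlations = {}
    for a in range(n_attributes):
        for b in range(a + 1, n_attributes):
            joint = sum(e * alpha[a] * alpha[b] for e, alpha in zip(eta, profiles))
            cov = joint - marginals[a] * marginals[b]
            sd = math.sqrt(
                marginals[a] * (1 - marginals[a]) * marginals[b] * (1 - marginals[b])
            )
            correlations[(a + 1, b + 1)] = cov / sd  # keys use 1-based attribute labels
    return marginals, correlations


# HYPOTHETICAL base rates for the 8 profiles of a 3-attribute test,
# ordered (0,0,0), (0,0,1), (0,1,0), ..., (1,1,1). They sum to 1.
eta = [0.25, 0.15, 0.05, 0.15, 0.05, 0.05, 0.05, 0.25]

marginals, correlations = summarize_structural_model(eta, 3)
print([round(m, 2) for m in marginals])
print({pair: round(r, 3) for pair, r in correlations.items()})
```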

SAS Structural Model Summary
  SAS can be used to compute summaries of the structural model parameters
    For each attribute, marginally: proportion of masters
    For each pair of attributes: tetrachoric correlation

Attribute Summary
  For the ECPE data, we have the following attribute summary information
  Table columns: Attribute | Prop. Masters | Tetrachoric Correlation
    1. Morphosyntactic
    2. Cohesive
    3. Lexical
  Such information is helpful in determining the nature of attributes in a population of interest
    Analogous to information about latent variables in CFA/MIRT

Differing Structural Models
  The structural model of a DCM has the potential to have an overwhelming number of parameters
    For $A$ attributes, total estimated: $2^A - 1$
      All must sum to 1
    Saturated model
  Multiple structural models exist
    All reduce the number of parameters
    All use categorical data analysis techniques to model $\eta_c$
  Analogous to latent variable covariance structure in structural equation modeling
    Distribution of attributes is categorical, not continuous

Types of Structural Models
  Log-linear model
    Predicts the natural logarithm of $\eta_c$ by the attributes in each profile
    Allows for varying levels of complexity
      Most: saturated model
      Least: independent attributes model
    Implemented in Mplus (see Session 5) and main focus of discussion today
  Tetrachoric correlation model
    Provides an item factor model for latent attributes
    Uses only bivariate information for pairs of attributes
    Allows for covariance structures to be estimated
    Not available in any software packages (but also shown briefly today)
  Hierarchical factors model
    Special case of tetrachoric correlation model
  Mixture models
    Shown in gambling example
    Also given by von Davier (2008)

Log-Linear Structural Models
  The log-linear structural model is the easiest to implement
    Due to its availability in Mplus
  The structural model uses an ANOVA-like model to predict the value of $\mu_c$ as a function of the attributes that are defined in attribute profile $c$

Log-Linear Model for $\mu_c$
  $\mu_c$ is the natural logarithm of $\eta_c$
  Table columns: $c$ | $\eta_c$ | $\mu_c$ | $\alpha_1$ | $\alpha_2$ | $\alpha_3$
  Shown for 3-attribute model
    Includes main effects, 2-way, and 3-way interactions
    All parameters must sum to zero for identifiability
  Intercept and main effects, then 2-way and 3-way interactions:
    $\mu_c = \gamma_0 + \gamma_{1,(1)}\alpha_{c1} + \gamma_{1,(2)}\alpha_{c2} + \gamma_{1,(3)}\alpha_{c3} + \gamma_{2,(1,2)}\alpha_{c1}\alpha_{c2} + \gamma_{2,(1,3)}\alpha_{c1}\alpha_{c3} + \gamma_{2,(2,3)}\alpha_{c2}\alpha_{c3} + \gamma_{3,(1,2,3)}\alpha_{c1}\alpha_{c2}\alpha_{c3}$

Log-Linear Structural Model Notation
  The log-linear structural model parameters have several subscripts:
    Subscript #1, $e$: the level of the effect
      0 is the intercept
      1 is the main effect
      2 is the two-way interaction
      3 is the three-way interaction
    Subscript #2, $(a_1, \ldots)$: the attributes the effect applies to
      Same number of attributes listed as the number in Subscript #1

Log-Linear Model Explained
  Because not all attribute profiles include all attributes, only some terms get used to predict each value of $\mu_c$
  For attribute profile 1: $\boldsymbol{\alpha}_1 = [\alpha_{11} = 0; \alpha_{12} = 0; \alpha_{13} = 0]$
    Only the intercept applies: $\mu_1 = \gamma_0$
  For attribute profile 2: $\boldsymbol{\alpha}_2 = [\alpha_{21} = 0; \alpha_{22} = 0; \alpha_{23} = 1]$
    The intercept and main effect of attribute 3 apply: $\mu_2 = \gamma_0 + \gamma_{1,(3)}$
  For attribute profile 6: $\boldsymbol{\alpha}_6 = [\alpha_{61} = 1; \alpha_{62} = 0; \alpha_{63} = 1]$
    The intercept, main effects of attribute 1 and attribute 3, and interaction between attributes 1 and 3 apply: $\mu_6 = \gamma_0 + \gamma_{1,(1)} + \gamma_{1,(3)} + \gamma_{2,(1,3)}$

Log-Linear Model Explained
  For attribute profile 8: $\boldsymbol{\alpha}_8 = [\alpha_{81} = 1; \alpha_{82} = 1; \alpha_{83} = 1]$
    All parameters apply

Interpretations of Model Parameters
  The log-linear model with ALL main effects and interactions is statistically equivalent to the saturated structural model
  Two-way interactions are analogous to bivariate correlations in categorical models
  Higher-level interactions represent higher-level characteristics of the attribute distribution (e.g., skewness, kurtosis)
  Models without interactions imply uncorrelated attributes
    Main effects are essentially attribute base rates
  Models without main effects or interactions assume all attribute profiles are equally likely
  Higher-order interactions can be removed if not significantly different from zero

Log-Linear Model for ECPE
  To demonstrate the log-linear model, we again present our ECPE data
  Full model (all parameters)
  Table columns: Parameter | Estimate | SE | p-value
    Parameters: $\gamma_0$, $\gamma_{1,(1)}$, $\gamma_{1,(2)}$, $\gamma_{1,(3)}$, $\gamma_{2,(1,2)}$, $\gamma_{2,(1,3)}$, $\gamma_{2,(2,3)}$, $\gamma_{3,(1,2,3)}$

Reductions in the Structural Model
  Because the three-way interaction was not significant, we can remove that parameter from the model without greatly affecting model fit
  New results (table columns: Parameter | Estimate | SE | p-value)
    Parameters: $\gamma_0$, $\gamma_{1,(1)}$, $\gamma_{1,(2)}$, $\gamma_{1,(3)}$, $\gamma_{2,(1,2)}$, $\gamma_{2,(1,3)}$, $\gamma_{2,(2,3)}$
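The way log-linear terms switch on and off across attribute profiles, and two of the interpretations listed above, can be checked numerically. The sketch below is an illustration, not the Mplus parameterization: it assumes the profile probabilities are obtained by exponentiating each linear predictor and normalizing so they sum to 1 (so the intercept is absorbed by the normalization), and all γ values are hypothetical.

```python
import math
from itertools import product


def loglinear_eta(gamma, n_attributes):
    """Profile probabilities from log-linear effects.

    gamma maps a tuple of 1-based attribute labels to an effect, e.g.
    (1,) is a main effect and (1, 3) a two-way interaction. An effect
    applies to a profile only if every listed attribute is mastered.
    ASSUMPTION: probabilities are normalized to sum to 1.
    """
    profiles = list(product([0, 1], repeat=n_attributes))
    mu = []
    for alpha in profiles:
        mastered = {a + 1 for a, value in enumerate(alpha) if value == 1}
        mu.append(sum(g for attrs, g in gamma.items() if set(attrs) <= mastered))
    denom = sum(math.exp(m) for m in mu)
    return profiles, [math.exp(m) / denom for m in mu]


# No main effects or interactions: all profiles equally likely.
profiles, eta_equal = loglinear_eta({}, 3)

# HYPOTHETICAL main effects only: attributes should be uncorrelated.
main_only = {(1,): 0.5, (2,): -0.3, (3,): 1.0}
_, eta_main = loglinear_eta(main_only, 3)

p1 = sum(e for e, a in zip(eta_main, profiles) if a[0] == 1)
p2 = sum(e for e, a in zip(eta_main, profiles) if a[1] == 1)
p12 = sum(e for e, a in zip(eta_main, profiles) if a[0] == 1 and a[1] == 1)

# Adding a HYPOTHETICAL positive two-way interaction induces association.
with_interaction = dict(main_only)
with_interaction[(1, 2)] = 1.2
_, eta_int = loglinear_eta(with_interaction, 3)
q1 = sum(e for e, a in zip(eta_int, profiles) if a[0] == 1)
q2 = sum(e for e, a in zip(eta_int, profiles) if a[1] == 1)
q12 = sum(e for e, a in zip(eta_int, profiles) if a[0] == 1 and a[1] == 1)
```

In the main-effects-only model the joint mastery probability factors into the product of the marginals, while the added interaction pushes the joint probability above that product.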

New Results for Attribute Probabilities
  The reduced model only slightly modifies the attribute probabilities
  Table columns: $c$ | Original $\eta_c$ | New $\eta_c$

TETRACHORIC STRUCTURAL MODELS

Tetrachoric Structural Models
  Because most summary information is given about attributes and pairs of attributes, tetrachoric models have been developed
  Such models use the tetrachoric correlation between attributes as a model for the probability of each attribute pattern
  Available in Arpeggio software
    Assessment Systems Corporation

Defining Tetrachoric Correlations
  The tetrachoric correlation is a measure of the association between two binary variables
  The correlation comes from mapping the binary variables onto two underlying continuous variables
  Each of the continuous variables is bisected by a threshold which transforms the continuous response into a categorical outcome
  The distribution of the underlying continuous variables is bivariate normal
    $\rho$ is the tetrachoric correlation coefficient

Tetrachoric Correlation Explained
  [Figure]

Technical Specifics: Multivariate Attributes
  The tetrachoric models use the following function to model the probability of an attribute profile:
  [Equation: integral of the multivariate normal density, with tetrachoric correlation matrix $\Xi$]

Structured Matrices
  Placing a structure on the tetrachoric correlation matrix $\Xi$ expands the model to mimic SEM (Templin & Henson, 2006)

ASSESSMENT OF MODEL FIT
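For two attributes, the idea of thresholds cutting an underlying bivariate normal distribution can be sketched with one-dimensional numerical integration. This is an illustration only: it assumes an attribute is mastered when its underlying variable exceeds its threshold, uses a simple trapezoid rule rather than a production-quality routine, and the function names, thresholds, and correlation are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_both_mastered(tau1, tau2, rho, steps=20000):
    """P(Z1 > tau1, Z2 > tau2) for a standard bivariate normal, |rho| < 1.

    Integrates over z1 using P(Z2 > tau2 | Z1 = z1)
    = Phi((rho * z1 - tau2) / sqrt(1 - rho**2)).
    """
    scale = math.sqrt(1.0 - rho * rho)
    upper = tau1 + 10.0  # the density beyond this point is negligible
    h = (upper - tau1) / steps
    total = 0.0
    for k in range(steps + 1):
        z = tau1 + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * norm_pdf(z) * norm_cdf((rho * z - tau2) / scale)
    return total * h


def profile_probabilities(tau1, tau2, rho):
    """Probabilities of the four attribute patterns (alpha1, alpha2)."""
    p11 = prob_both_mastered(tau1, tau2, rho)
    p1 = 1.0 - norm_cdf(tau1)  # marginal P(attribute 1 mastered)
    p2 = 1.0 - norm_cdf(tau2)  # marginal P(attribute 2 mastered)
    return {
        (0, 0): 1.0 - p1 - p2 + p11,
        (0, 1): p2 - p11,
        (1, 0): p1 - p11,
        (1, 1): p11,
    }


# HYPOTHETICAL thresholds and tetrachoric correlation.
pattern_probs = profile_probabilities(tau1=0.0, tau2=0.0, rho=0.5)
```

With both thresholds at zero, the probability of mastering both attributes has the closed form 1/4 + arcsin(ρ)/(2π), which gives 1/3 for ρ = 0.5 and serves as a check on the integration.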

Assessing Model Fit
  There is no one best way to assess fit in DCMs
  Techniques typically used can be put into several general categories:
    Absolute fit
      Model-based hypothesis tests (if available)
      Entropy
    Relative fit
      Information criteria
    Item fit
  Topics discussed here will focus on fit statistics available in Mplus (also discussed in Session 5)

Overall Model Fit: Chi-Squared Test
  For small numbers of items (10 to 15), the traditional chi-squared test of model fit can be used
  Test is invalid for too many items (sparse data)
    Shown for 28-item ECPE
  Mplus gives this automatically
    Omits when data are sparse
    Can omit extreme cells from an analysis
      Misleading

Overall Model Fit: (Relative) Entropy
  The entropy of a model is a measure of classification uncertainty
    It is an absolute fit statistic
  Mplus reports relative entropy
    Value of 1.00 means all respondents classified with complete certainty (good fit)
    Value of 0.00 means all respondents classified with equal probabilities for all classes (poor fit)
  ECPE (relative) entropy is hard to interpret by itself

Relative Model Fit: Information Criteria
  Used when comparing between two models:
    Two DCMs (LCDM v. DINA)
    Two Q-matrices (4 v. 5 attributes)
    Two different models (IRT v. DCM)
  Mplus reports:
    AIC and BIC
    Sample-size-adjusted BIC
  All can be used
    Smallest value is best
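The two ends of the relative entropy scale and the "smallest value is best" rule for information criteria can be illustrated with a short calculation. This sketch uses the commonly cited formulas: relative entropy as one minus the summed classification entropy divided by N ln C, AIC = -2 logL + 2p, BIC = -2 logL + p ln N, and a sample-size-adjusted BIC that replaces N with (N + 2)/24. Treat the exact correspondence with any particular Mplus output as an assumption. The log-likelihoods and parameter counts are hypothetical; only the sample size of 2922 comes from the ECPE analysis.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy from an N x C matrix of posterior class probabilities."""
    n_resp = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # 0 * log(0) is taken to be 0
                uncertainty += -p * math.log(p)
    return 1.0 - uncertainty / (n_resp * math.log(n_classes))


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC, sample-size-adjusted BIC)."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    sabic = -2.0 * loglik + n_params * math.log((n_obs + 2.0) / 24.0)
    return aic, bic, sabic


certain = relative_entropy([[1.0, 0.0], [0.0, 1.0]])   # complete certainty
uniform = relative_entropy([[0.5, 0.5], [0.5, 0.5]])   # equal probabilities

# HYPOTHETICAL comparison of a larger and a reduced model on the same data.
full_model = information_criteria(loglik=-42700.0, n_params=81, n_obs=2922)
reduced_model = information_criteria(loglik=-42715.0, n_params=70, n_obs=2922)
preferred_by_bic = "reduced" if reduced_model[1] < full_model[1] else "full"
```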

Item Fit Statistics
  The TECH10 option reports a degree of misfit for each
    Item individually (univariate)
    Pair of two items (bivariate)
  Uses chi-squared test for misfit
    Values for each item are distributed as chi-square with 1 df (for binary items)
  Misfitting items can be investigated
    Q-matrix can be changed
    Items can be removed

Item Fit Statistics: Univariate Fit
  Univariate fit attempts to determine if the model fits each item marginally
    A limited-information statistic
  Not useful in DCMs
    Model is for probability
    Will always fit perfectly

Item Fit Statistics: Bivariate Fit
  Bivariate fit is an index of fit for a pair of items
  Compares observed data with frequency expected under DCM
    Produces a 1 df chi-squared test
  Can help identify items that do not fit model
    Rough approximation

Concluding Remarks: Model Fit
  Assessment of model fit in DCMs is currently a difficult task
    Easily accessible options are limited
    Can quickly find options that take longer to assess fit than to estimate model
    Mplus options are adequate for initial screening
  DCMs share this problem with
    IRT models
    General categorical data analyses
  Other model fit options are available and forthcoming
    Based on limited information (e.g., Templin, 2007)
    Need further testing
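The bivariate fit comparison for one pair of binary items can be sketched as a Pearson chi-squared statistic over the four response combinations, referred to a chi-square distribution with 1 df as the slides describe. This is an illustration only and is not claimed to reproduce the TECH10 output; the observed counts and model-implied proportions are hypothetical.

```python
import math


def bivariate_chi_square(observed, expected_props):
    """Pearson chi-squared statistic for one item pair.

    observed: counts keyed by (response to item i, response to item j)
    expected_props: model-implied proportions for the same four cells
    """
    n = sum(observed.values())
    return sum(
        (observed[cell] - n * expected_props[cell]) ** 2 / (n * expected_props[cell])
        for cell in observed
    )


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# HYPOTHETICAL observed counts and model-implied proportions.
observed = {(0, 0): 30, (0, 1): 20, (1, 0): 25, (1, 1): 25}
expected_props = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

chi_square = bivariate_chi_square(observed, expected_props)
p_value = p_value_1df(chi_square)
flagged = p_value < 0.01  # same cutoff used to count misfitting ECPE item pairs
```

With 28 items there are 378 such pairs, so some small p-values are expected by chance alone, which is one reason the statistic is best treated as a rough screen.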


Chapter 1. Introductory Information for Therapists. Background Information and Purpose of This Program Chapter 1 Introductory Information for Therapists Background Information and Purpose of This Program Changes in gaming legislation have led to a substantial expansion of gambling opportunities in America,

More information

SHA, SHUYING, Ph.D. Nonparametric Diagnostic Classification Analysis for Testlet Based Tests. (2016) Directed by Dr. Robert. A. Henson. 121pp.

SHA, SHUYING, Ph.D. Nonparametric Diagnostic Classification Analysis for Testlet Based Tests. (2016) Directed by Dr. Robert. A. Henson. 121pp. SHA, SHUYING, Ph.D. Nonparametric Diagnostic Classification Analysis for Testlet Based Tests. (2016) Directed by Dr. Robert. A. Henson. 121pp. Diagnostic classification Diagnostic Classification Models

More information

The application of Big Data in the prevention of Problem Gambling

The application of Big Data in the prevention of Problem Gambling The application of Big Data in the prevention of Problem Gambling DR. MICHAEL AUER neccton m.auer@neccton.com DR. MARK GRIFFITHS Nottingham Trent University United Kingdom mark.griffiths@ntu.ac.uk 1 Biographie

More information

Analyzing data from educational surveys: a comparison of HLM and Multilevel IRT. Amin Mousavi

Analyzing data from educational surveys: a comparison of HLM and Multilevel IRT. Amin Mousavi Analyzing data from educational surveys: a comparison of HLM and Multilevel IRT Amin Mousavi Centre for Research in Applied Measurement and Evaluation University of Alberta Paper Presented at the 2013

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

André Cyr and Alexander Davies

André Cyr and Alexander Davies Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

What is Gambling? Gambling or ludomania is an urge to continuously gamble despite harmful negative consequences or a desire to stop.

What is Gambling? Gambling or ludomania is an urge to continuously gamble despite harmful negative consequences or a desire to stop. By Benjamin Bunker What is Gambling? Gambling or ludomania is an urge to continuously gamble despite harmful negative consequences or a desire to stop. What is Gambling? Pt. 2 Gambling is an Impulse Control

More information

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,

More information

Item Analysis: Classical and Beyond

Item Analysis: Classical and Beyond Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

Does factor indeterminacy matter in multi-dimensional item response theory?

Does factor indeterminacy matter in multi-dimensional item response theory? ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional

More information

You must answer question 1.

You must answer question 1. Research Methods and Statistics Specialty Area Exam October 28, 2015 Part I: Statistics Committee: Richard Williams (Chair), Elizabeth McClintock, Sarah Mustillo You must answer question 1. 1. Suppose

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison Group-Level Diagnosis 1 N.B. Please do not cite or distribute. Multilevel IRT for group-level diagnosis Chanho Park Daniel M. Bolt University of Wisconsin-Madison Paper presented at the annual meeting

More information

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

3 CONCEPTUAL FOUNDATIONS OF STATISTICS 3 CONCEPTUAL FOUNDATIONS OF STATISTICS In this chapter, we examine the conceptual foundations of statistics. The goal is to give you an appreciation and conceptual understanding of some basic statistical

More information

Quantifying Problem Gambling: Explorations in measurement. Nigel E. Turner, Ph.D. Centre for Addiction and Mental Health

Quantifying Problem Gambling: Explorations in measurement. Nigel E. Turner, Ph.D. Centre for Addiction and Mental Health Quantifying Problem Gambling: Explorations in measurement Nigel E. Turner, Ph.D. Centre for Addiction and Mental Health Original abstract Abstract: Over the past few years I had conducted several studies

More information

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth

More information

A Case Study: Two-sample categorical data

A Case Study: Two-sample categorical data A Case Study: Two-sample categorical data Patrick Breheny January 31 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/43 Introduction Model specification Continuous vs. mixture priors Choice

More information

Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis. Russell W. Smith Susan L. Davis-Becker

Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis. Russell W. Smith Susan L. Davis-Becker Detecting Suspect Examinees: An Application of Differential Person Functioning Analysis Russell W. Smith Susan L. Davis-Becker Alpine Testing Solutions Paper presented at the annual conference of the National

More information

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Journal of Social and Development Sciences Vol. 4, No. 4, pp. 93-97, Apr 203 (ISSN 222-52) Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Henry De-Graft Acquah University

More information

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati. Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions

More information

Sheila Barron Statistics Outreach Center 2/8/2011

Sheila Barron Statistics Outreach Center 2/8/2011 Sheila Barron Statistics Outreach Center 2/8/2011 What is Power? When conducting a research study using a statistical hypothesis test, power is the probability of getting statistical significance when

More information

Endorsement of Criminal Behavior Amongst Offenders: Implications for DSM-5 Gambling Disorder

Endorsement of Criminal Behavior Amongst Offenders: Implications for DSM-5 Gambling Disorder J Gambl Stud (2016) 32:35 45 DOI 10.1007/s10899-015-9540-3 ORIGINAL PAPER Endorsement of Criminal Behavior Amongst Offenders: Implications for DSM-5 Gambling Disorder Nigel E. Turner 1,2 Randy Stinchfield

More information

Fundamental Clinical Trial Design

Fundamental Clinical Trial Design Design, Monitoring, and Analysis of Clinical Trials Session 1 Overview and Introduction Overview Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics, University of Washington February 17-19, 2003

More information

Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS. Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models

Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS. Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models Running head: ATTRIBUTE CODING FOR RETROFITTING MODELS Comparison of Attribute Coding Procedures for Retrofitting Cognitive Diagnostic Models Amy Clark Neal Kingston University of Kansas Corresponding

More information

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION

More information

Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models

Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models Detection of Differential Test Functioning (DTF) and Differential Item Functioning (DIF) in MCCQE Part II Using Logistic Models Jin Gong University of Iowa June, 2012 1 Background The Medical Council of

More information

Michael Hallquist, Thomas M. Olino, Paul A. Pilkonis University of Pittsburgh

Michael Hallquist, Thomas M. Olino, Paul A. Pilkonis University of Pittsburgh Comparing the evidence for categorical versus dimensional representations of psychiatric disorders in the presence of noisy observations: a Monte Carlo study of the Bayesian Information Criterion and Akaike

More information

Computerized Mastery Testing

Computerized Mastery Testing Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating

More information

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions.

EPI 200C Final, June 4 th, 2009 This exam includes 24 questions. Greenland/Arah, Epi 200C Sp 2000 1 of 6 EPI 200C Final, June 4 th, 2009 This exam includes 24 questions. INSTRUCTIONS: Write all answers on the answer sheets supplied; PRINT YOUR NAME and STUDENT ID NUMBER

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

The Evolving Definition of Pathological Gambling in the DSM-5

The Evolving Definition of Pathological Gambling in the DSM-5 The Evolving Definition of Pathological Gambling in the DSM-5 By Christine Reilly and Nathan Smith National Center for Responsible Gaming One of the most anticipated events in the mental health field is

More information

REPORT. Technical Report: Item Characteristics. Jessica Masters

REPORT. Technical Report: Item Characteristics. Jessica Masters August 2010 REPORT Diagnostic Geometry Assessment Project Technical Report: Item Characteristics Jessica Masters Technology and Assessment Study Collaborative Lynch School of Education Boston College Chestnut

More information

12/30/2017. PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2

12/30/2017. PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2 PSY 5102: Advanced Statistics for Psychological and Behavioral Research 2 Selecting a statistical test Relationships among major statistical methods General Linear Model and multiple regression Special

More information

Centre for Education Research and Policy

Centre for Education Research and Policy THE EFFECT OF SAMPLE SIZE ON ITEM PARAMETER ESTIMATION FOR THE PARTIAL CREDIT MODEL ABSTRACT Item Response Theory (IRT) models have been widely used to analyse test data and develop IRT-based tests. An

More information

Selection of Linking Items

Selection of Linking Items Selection of Linking Items Subset of items that maximally reflect the scale information function Denote the scale information as Linear programming solver (in R, lp_solve 5.5) min(y) Subject to θ, θs,

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Today s Class: Features of longitudinal data Features of longitudinal models What can MLM do for you? What to expect in this

More information

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Vs. 2 Background 3 There are different types of research methods to study behaviour: Descriptive: observations,

More information

Turning Output of Item Response Theory Data Analysis into Graphs with R

Turning Output of Item Response Theory Data Analysis into Graphs with R Overview Turning Output of Item Response Theory Data Analysis into Graphs with R Motivation Importance of graphing data Graphical methods for item response theory Why R? Two examples Ching-Fan Sheu, Cheng-Te

More information

multilevel modeling for social and personality psychology

multilevel modeling for social and personality psychology 1 Introduction Once you know that hierarchies exist, you see them everywhere. I have used this quote by Kreft and de Leeuw (1998) frequently when writing about why, when, and how to use multilevel models

More information

Pathological Gambling Report by Sean Quinn

Pathological Gambling Report by Sean Quinn Pathological Gambling Report by Sean Quinn Signs of pathological gambling A persistent and recurrent maladaptive gambling behavior is indicated by five or more of the following: Is preoccupied with gambling

More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

Re-Examining the Role of Individual Differences in Educational Assessment

Re-Examining the Role of Individual Differences in Educational Assessment Re-Examining the Role of Individual Differences in Educational Assesent Rebecca Kopriva David Wiley Phoebe Winter University of Maryland College Park Paper presented at the Annual Conference of the National

More information

IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS

IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS A Dissertation Presented to The Academic Faculty by HeaWon Jun In Partial Fulfillment of the Requirements

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

Having your cake and eating it too: multiple dimensions and a composite

Having your cake and eating it too: multiple dimensions and a composite Having your cake and eating it too: multiple dimensions and a composite Perman Gochyyev and Mark Wilson UC Berkeley BEAR Seminar October, 2018 outline Motivating example Different modeling approaches Composite

More information

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

More information

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational

More information

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Introduction to Multilevel Models for Longitudinal and Repeated Measures Data Today s Class: Features of longitudinal data Features of longitudinal models What can MLM do for you? What to expect in this

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

ABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING. John Michael Clark III

ABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING. John Michael Clark III ABERRANT RESPONSE PATTERNS AS A MULTIDIMENSIONAL PHENOMENON: USING FACTOR-ANALYTIC MODEL COMPARISON TO DETECT CHEATING BY John Michael Clark III Submitted to the graduate degree program in Psychology and

More information

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia 1 Introduction The Teacher Test-English (TT-E) is administered by the NCA

More information

VR for pathological gambling

VR for pathological gambling CYBERTHERAPY 2006 VIRTUAL REALITY IN THE TREATMENT OF PATHOLOGICAL GAMBLING A. Garcia-Palacios, N. Lasso de la Vega, C. Botella,, R.M. Baños & S. Quero Universitat Jaume I. Universidad de Valencia. Universidad

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Midterm project due next Wednesday at 2 PM

Midterm project due next Wednesday at 2 PM Course Business Midterm project due next Wednesday at 2 PM Please submit on CourseWeb Next week s class: Discuss current use of mixed-effects models in the literature Short lecture on effect size & statistical

More information

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Timothy N. Rubin (trubin@uci.edu) Michael D. Lee (mdlee@uci.edu) Charles F. Chubb (cchubb@uci.edu) Department of Cognitive

More information

A MONTE CARLO STUDY OF MODEL SELECTION PROCEDURES FOR THE ANALYSIS OF CATEGORICAL DATA

A MONTE CARLO STUDY OF MODEL SELECTION PROCEDURES FOR THE ANALYSIS OF CATEGORICAL DATA A MONTE CARLO STUDY OF MODEL SELECTION PROCEDURES FOR THE ANALYSIS OF CATEGORICAL DATA Elizabeth Martin Fischer, University of North Carolina Introduction Researchers and social scientists frequently confront

More information

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling Olli-Pekka Kauppila Daria Kautto Session VI, September 20 2017 Learning objectives 1. Get familiar with the basic idea

More information

IAPT: Regression. Regression analyses

IAPT: Regression. Regression analyses Regression analyses IAPT: Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a student project

More information

Small-area estimation of mental illness prevalence for schools

Small-area estimation of mental illness prevalence for schools Small-area estimation of mental illness prevalence for schools Fan Li 1 Alan Zaslavsky 2 1 Department of Statistical Science Duke University 2 Department of Health Care Policy Harvard Medical School March

More information

Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol.

Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol. Ho (null hypothesis) Ha (alternative hypothesis) Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol. Hypothesis: Ho:

More information

Measurement Models for Behavioral Frequencies: A Comparison Between Numerically and Vaguely Quantified Reports. September 2012 WORKING PAPER 10

Measurement Models for Behavioral Frequencies: A Comparison Between Numerically and Vaguely Quantified Reports. September 2012 WORKING PAPER 10 WORKING PAPER 10 BY JAMIE LYNN MARINCIC Measurement Models for Behavioral Frequencies: A Comparison Between Numerically and Vaguely Quantified Reports September 2012 Abstract Surveys collecting behavioral

More information

The University of North Carolina at Chapel Hill School of Social Work

The University of North Carolina at Chapel Hill School of Social Work The University of North Carolina at Chapel Hill School of Social Work SOWO 918: Applied Regression Analysis and Generalized Linear Models Spring Semester, 2014 Instructor Shenyang Guo, Ph.D., Room 524j,

More information

Influences of IRT Item Attributes on Angoff Rater Judgments

Influences of IRT Item Attributes on Angoff Rater Judgments Influences of IRT Item Attributes on Angoff Rater Judgments Christian Jones, M.A. CPS Human Resource Services Greg Hurt!, Ph.D. CSUS, Sacramento Angoff Method Assemble a panel of subject matter experts

More information

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Impact and adjustment of selection bias. in the assessment of measurement equivalence Impact and adjustment of selection bias in the assessment of measurement equivalence Thomas Klausch, Joop Hox,& Barry Schouten Working Paper, Utrecht, December 2012 Corresponding author: Thomas Klausch,

More information