Exploring dimensionality of scores for mixed-format tests


University of Iowa
Iowa Research Online
Theses and Dissertations
Summer 2016

Exploring dimensionality of scores for mixed-format tests

Mengyao Zhang, University of Iowa
Copyright 2016 Mengyao Zhang
This dissertation is available at Iowa Research Online.

Recommended Citation: Zhang, Mengyao. "Exploring dimensionality of scores for mixed-format tests." PhD (Doctor of Philosophy) thesis, University of Iowa, 2016.

Part of the Educational Psychology Commons.

EXPLORING DIMENSIONALITY OF SCORES FOR MIXED-FORMAT TESTS

by

Mengyao Zhang

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of The University of Iowa

August 2016

Thesis Supervisors: Professor Michael J. Kolen, Associate Professor Won-Chan Lee

Copyright by
MENGYAO ZHANG
2016
All Rights Reserved

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of Mengyao Zhang has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) at the August 2016 graduation.

Thesis Committee: Michael J. Kolen, Thesis Supervisor; Won-Chan Lee, Thesis Supervisor; Robert L. Brennan; Timothy N. Ansley; Kate Cowles

To Shilan, Li, Mingjie, and Aaron

ACKNOWLEDGEMENTS

I would first like to thank Dr. Michael J. Kolen, my advisor and thesis supervisor, for seeing my potential and always encouraging me to pursue research that was of interest to me and of use to the profession. Without his great insights, invaluable feedback, and constant support and patience throughout my study in Iowa, my dissertation and many other works would not have gone so smoothly. I would also like to thank Dr. Won-Chan Lee, who served as one of the co-chairs of this dissertation and has continuously guided and encouraged me. He was always willing to discuss research ideas with me, and he motivated me to see the big picture while paying attention to the details.

I sincerely thank my other committee members: Dr. Robert Brennan, Dr. Timothy Ansley, and Dr. Kate Cowles. Their knowledge and advice greatly improved this dissertation. Dr. Robert Brennan shared many insightful comments concerning the issues dealt with in this dissertation, which greatly enriched it. I am also indebted to him for his support and encouragement at various stages of my study. A special thanks goes to Jane Persons, whose edits and comments were invaluable. I also wish to thank the College Board for allowing me to use the Advanced Placement Exam data in my dissertation.

Finally, I would like to thank the many friends and colleagues from whom I received help and support. Min Wang, especially: our friendship means a lot to me. I am also truly thankful to my parents, Shilan and Li, for their support and understanding. To my husband, Mingjie: your unconditional love and support always carry me through. To my lovely son, Aaron: you are always my sunshine.

ABSTRACT

Dimensionality assessment provides test developers and users with a better understanding of how test scores make human abilities concrete. Issues dealt with by dimensionality assessment include, but are not restricted to, (a) whether unidimensionality holds; (b) the number of dimensions influencing test scores; and (c) the relationships among items, among underlying dimensions, and between items and dimensions. Results from dimensionality assessment allow test developers and users to carefully validate specific interpretations and uses of test scores. The widespread use of mixed-format tests complicates dimensionality assessment both conceptually and methodologically. This dissertation is the first to propose a framework tailored to exploratory dimensionality assessment for mixed-format tests. Based on real data from three large-scale mixed-format tests, this dissertation examined the performance of a number of popular and promising dimensionality assessment methods and procedures. Major findings were summarized, along with more extensive descriptions of the similarities and dissimilarities among methods and across different test subject areas, forms, and sample sizes. Limitations and possible further research topics were also discussed.

PUBLIC ABSTRACT

Dimensionality assessment provides test developers and users with a better understanding of how test scores make human abilities concrete. Issues dealt with by dimensionality assessment include, but are not restricted to, (a) whether unidimensionality holds; (b) the number of dimensions influencing test scores; and (c) the relationships among items, among underlying dimensions, and between items and dimensions. Results from dimensionality assessment allow test developers and users to carefully validate specific interpretations and uses of test scores. The widespread use of mixed-format tests complicates dimensionality assessment both conceptually and methodologically. This dissertation is the first to propose a framework tailored to exploratory dimensionality assessment for mixed-format tests. Based on real data from three large-scale mixed-format tests, this dissertation examined the performance of a number of popular and promising dimensionality assessment methods and procedures. Major findings were summarized, along with more extensive descriptions of the similarities and dissimilarities among methods and across different test subject areas, forms, and sample sizes. Limitations and possible further research topics were also discussed.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
NOTATION

CHAPTER I. INTRODUCTION
  Defining Dimensionality
  Related Concepts
  Assessing Dimensionality
    Exploratory versus Confirmatory Assessment
    Operational Definitions of Dimensionality
    Assessment Outcomes
    Types of Statistical Procedure
    Remarks
  Challenges Introduced by Mixed-Format Tests
    Definitions of MC and CR Items
    Potential Sources of Multidimensionality
    Methodological Challenges in Dimensionality Assessment
  Overview of Data Sources
  Research Objectives and Questions

CHAPTER II. LITERATURE REVIEW
  A Framework for Exploring Dimensionality of Mixed-Format Test Scores
  Operational Definitions of Dimensionality
    Defining Dimensionality in EFA
    Defining Dimensionality in IRT
  Description of Dimensionality Assessment Methods
    Methods Using the EFA Definition of Dimensionality
    Methods Using the IRT Definition of Dimensionality
  Comparisons of Dimensionality Assessment Methods
    Earlier Research
    Hattie's and Tate's Comparative Studies
    Review of Relevant Literature by Method
  Item Format and Content Effects in Mixed-Format Tests
    Item Format Effects
    Item Content Effects
  Summary

CHAPTER III. METHODOLOGY
  Data Sources
    Selection of Tests
    Test Blueprints

    Data Preparation
    Data Description
  Procedure
    Methods Using the EFA Definition of Dimensionality
    Methods Using the IRT Definition of Dimensionality
  Evaluation Criteria

CHAPTER IV. RESULTS
  Descriptive Statistics
  Whether Unidimensionality Holds
  Number of Dimensions
  Dimensional Structure
    Results of Item-Level EFA
    Results of MIRT Cluster Analysis
    Summary
  Item Format and Content Effects

CHAPTER V. DISCUSSION
  Summary of Findings
    Research Question 1
    Research Question 2
    Research Question 3
    Research Question 4
    Research Question 5
    Research Question 6
  Limitations and Future Research
  Conclusions and Implications

REFERENCES
APPENDIX A. RESCORING CHEMISTRY TEST ITEMS
APPENDIX B. SAMPLE MPLUS ITEM-LEVEL EFA SYNTAX
APPENDIX C. SAMPLE FLEXMIRT MODEL ESTIMATION SYNTAX

LIST OF TABLES

Table 3.1. Test Blueprints for the English Test, Form A and Form B
Table 3.2. Test Blueprints for the Spanish Test, Form A and Form B
Table 3.3. Test Blueprints for the Chemistry Test, Form A and Form B
Table 3.4. Item and Total Score Points for Selected Test Forms
Table 3.5. Dimensionality Assessment Methods Investigated
Table 4.1. Total Score Scale, Moments, and Reliability Coefficients
Table 4.2. Disattenuated Correlations for the English Datasets
Table 4.3. Disattenuated Correlations for the Spanish Datasets
Table 4.4. Disattenuated Correlations for the Chemistry Datasets
Table 4.5. Item Difficulty, Discrimination, and Irregular Items for MC Items
Table 4.6. Item Difficulty, Discrimination, and Irregular Items for CR Items
Table 4.7. Skewness of Frequency Distribution of Response Categories for CR Items
Table 4.8. Results of Kaiser's Rule, Percent Variance, Ratio of Eigenvalues, and EFA-Based Poly-DIMTEST Methods
Table 4.9. Claims of Unidimensionality Using Kaiser's Rule, Percent Variance, Ratio of Eigenvalues, and EFA-Based Poly-DIMTEST Methods
Table 4.10. Results of MAP and PA Methods
Table 4.11. Values of Model-Fit Statistics for the English Datasets
Table 4.12. Number of Items Substantially Related to Factors for the English Datasets
Table 4.13. Interfactor Correlations for the English Datasets
Table 4.14. Descriptive Statistics of Correlation Residuals for the English Datasets
Table 4.15. A 3-Factor Solution for the English Test, Baseline (Form A, N = 3,000)
Table 4.16. Item Clusters Produced by Item-Level EFA for the English Datasets

Table 4.17. Values of Model-Fit Statistics for the Spanish Datasets
Table 4.18. Number of Items Substantially Related to Factors for the Spanish Datasets
Table 4.19. Interfactor Correlations for the Spanish Datasets
Table 4.20. Descriptive Statistics of Correlation Residuals for the Spanish Datasets
Table 4.21. A 4-Factor Solution for the Spanish Test, Baseline (Form A, N = 3,000)
Table 4.22. Item Clusters Produced by Item-Level EFA for the Spanish Datasets
Table 4.23. Values of Model-Fit Statistics for the Chemistry Datasets
Table 4.24. Number of Items Substantially Related to Factors for the Chemistry Datasets
Table 4.25. Interfactor Correlations for the Chemistry Datasets
Table 4.26. Descriptive Statistics of Correlation Residuals for the Chemistry Datasets
Table 4.27. Estimated Slope Parameters for the English Test, Baseline (Form A, N = 3,000)
Table 4.28. Stopping Rules and Suggested Number of Clusters for the English Datasets
Table 4.29. Item Clusters Produced by MIRT Cluster Analysis for the English Datasets
Table 4.30. Estimated Slope Parameters for the Spanish Test, Baseline (Form A, N = 3,000)
Table 4.31. Stopping Rules and Suggested Number of Clusters for the Spanish Datasets
Table 4.32. Item Clusters Produced by MIRT Cluster Analysis for the Spanish Datasets
Table 4.33. Estimated Slope Parameters for the Chemistry Test, Baseline (Form A, N = 3,000)
Table 4.34. Stopping Rules and Suggested Number of Clusters for the Chemistry Datasets

Table 4.35. Item Clusters Produced by MIRT Cluster Analysis for the Chemistry Datasets
Table 4.36. Consistency in Dimensionality-Based Item Clusters between Item-Level EFA and MIRT Cluster Analysis
Table 4.37. Results of Format- and Content-Based Poly-DIMTEST Method, Baseline (Form A, N = 3,000)
Table 4.38. Dimensionality-Based Item Clusters Produced by Item-Level EFA and MIRT Cluster Analysis for the English Test, Baseline (Form A, N = 3,000)
Table 4.39. Dimensionality-Based Item Clusters Produced by Item-Level EFA and MIRT Cluster Analysis for the Spanish Test, Baseline (Form A, N = 3,000)
Table A.1. Total Score Scale, Moments, and Reliability Coefficients for the Chemistry Datasets, Before and After Rescoring
Table A.2. Disattenuated Correlations for the Chemistry Datasets, Before and After Rescoring

LIST OF FIGURES

Figure 2.1. An Example Framework for Exploring Dimensionality of Mixed-Format Test Scores
Figure 3.1. Proportions of Consistent and Inconsistent Structural Identifications
Figure 3.2. Example Proportions of Consistent and Inconsistent Structural Identifications
Figure 4.1. Summed MC Score by Proportion Correct for Two MC Items
Figure 4.2. Scree Plots of the First 10 Eigenvalues of the Original and Smoothed Polychoric Correlation Matrices
Figure 4.3. Scree Plots of the First 10 Eigenvalues of the Polychoric Correlation Matrices
Figure 4.4. MAP Plots for the First 10 Principal Components, Baseline (Form A, N = 3,000)
Figure 4.5. MAP Plots for the First 10 Principal Components, Comparisons Between Forms
Figure 4.6. MAP Plots for the First 10 Principal Components, Comparisons Among Sample Sizes
Figure 4.7. Plots of Eigenvalues Minus PA Threshold for the First 10 Principal Components, Baseline (Form A, N = 3,000)
Figure 4.8. Plots of Eigenvalues Minus PA Threshold for the First 10 Principal Components, Comparisons Between Forms
Figure 4.9. Plots of Eigenvalues Minus PA Threshold for the First 10 Principal Components, Comparisons Among Sample Sizes
Figure 4.10. R-Squares of 1-Factor Model and R-Square Improvements of 2-, 3-, and 4-Factor Models for the English Datasets
Figure 4.11. R-Squares of 1-Factor Model and R-Square Improvements of 2-, 3-, and 4-Factor Models for the Spanish Datasets
Figure 4.12. R-Squares of 1-Factor Model and R-Square Improvements of 2-, 3-, and 4-Factor Models for the Chemistry Datasets
Figure 4.13. Cluster Dendrogram for the English Test, Baseline (Form A, N = 3,000)

Figure 4.14. Cluster Dendrogram for the Spanish Test, Baseline (Form A, N = 3,000)
Figure 4.15. Cluster Dendrogram for the Chemistry Test, Baseline (Form A, N = 3,000)

NOTATION

1PL: One-parameter logistic model in IRT
2PL: Two-parameter logistic model in IRT
3PL: Three-parameter logistic model in IRT
AP: College Board Advanced Placement Exams
CH: The Caliński and Harabasz (1974) index
CINEG: Common-item nonequivalent groups design in equating
CFA: Confirmatory factor analysis
CR: Constructed-response item format
DH: The Duda and Hart (1973) index
EFA: Exploratory factor analysis
ES: Common-item effect size
GR: The graded-response model in IRT
IRF: Item response function in IRT
IRT: Item response theory
LS: Least squares estimation method
M3PL: Multidimensional extension of the 3PL model in IRT
MAP: Minimum average partial procedure
MAP-R2: MAP with Pearson partial correlations squared
MAP-R4: MAP with Pearson partial correlations to the fourth power
MAP-P2: MAP with polychoric partial correlations squared
MAP-P4: MAP with polychoric partial correlations to the fourth power
MC: Multiple-choice item format
MDIFF: Multidimensional item difficulty parameter in IRT
MDISC: Multidimensional item discrimination parameter in IRT

MGR: Multidimensional extension of the GR model in IRT
MIRT: Multidimensional IRT
ML: Maximum likelihood estimation method
O1: Assessment outcome 1: whether unidimensionality holds
O2: Assessment outcome 2: the number of dimensions
O3: Assessment outcome 3: specific dimensional structure
O4: Assessment outcome 4: explanations of dimensions
PA: Parallel analysis
PA-MCm: PA based on Monte Carlo simulation with a mean threshold
PA-MC95: PA based on Monte Carlo simulation with a 95th-percentile threshold
PA-Pm: PA based on permutations with a mean threshold
PA-P95: PA based on permutations with a 95th-percentile threshold
PCA: Principal component analysis
PD: Positive definite matrix
PFA: Principal factor analysis
RMSEA: Root mean square error of approximation calculated by Mplus
RMSR: Root mean square residual calculated by Mplus
SEM: Structural equation model
SLI: Strong local independence
TOEFL: Test of English as a Foreign Language
TOEFL-iBT: Internet-based version of TOEFL
UIRT: Unidimensional IRT
WLI: Weak local independence

Symbols are also defined for the following quantities:

Principal component loading matrix in PCA
Factor loading (factor pattern) matrix in PFA
Vector of item discriminations or slopes
Item discrimination or slope parameter in IRT
Item or category difficulty parameter in IRT
Item pseudo-guessing parameter in IRT
Unique factor loading matrix in PFA
Matrix scaling a covariance matrix into a correlation matrix
Number of common factors in PFA; number of latent traits in IRT
Item or category intercept parameter in IRT
Number of essential dimensions in IRT
Vector of common factors
Identity matrix
Item indices
Number of items
Lower bound used in defining weak monotonicity in IRT
Item response category
Number of item response categories
Number of items in AT1/AT2 when conducting Poly-DIMTEST
Sample mean of common-item scores
Number of examinees
The polychoric correlation matrix of item scores
Probability of a correct response in IRT
Cumulative response function in IRT
The Pearson correlation matrix of item scores
Partial correlation matrix used when conducting the MAP procedure
Reduced correlation matrix in PFA
The Pearson correlation
Section scores
Sample variance of common-item scores
Stout's statistic used when conducting Poly-DIMTEST
Thresholds used in defining the polychoric correlation
Vector of unique factors in PFA
Linear transformation matrix defining the combination of item scores
Eigenvector of a matrix
Vector of item scores
Item score
Vector of linear combinations of item scores
Vector of principal components
Linear combination of item scores
Principal component
Direction between the item vector and the first latent trait
Angular distance between two items
Vector of latent traits
Latent trait
Variance-covariance matrix of principal components
Eigenvalue of a matrix
Latent variable used in defining the polychoric correlation
Feldt's reliability coefficient estimate
Disattenuated correlation between two section scores
Matrix of correlations between common factors in PFA
Consistency index used to compare dimensional solutions from two methods
Proportion of item pairs assigned to different dimensions by both methods
Proportion of item pairs assigned to different dimensions by the first method but to the same dimension by the second
Proportion of item pairs assigned to the same dimension by both methods
Proportion of item pairs assigned to the same dimension by the first method but to different dimensions by the second

CHAPTER I
INTRODUCTION

An effective educational test reflects important differences among examinees on one or more dimensions it intends to measure (Yen & Fitzpatrick, 2006). Depending on the specific purposes and uses of a test, such dimensions could be reading comprehension, mathematical reasoning, or chemistry laboratory skills, to name a few. Dimensionality assessment provides empirical knowledge of how these dimensions are reflected in the actual test data and whether or not some unexpected dimensions emerge. The dimensional structure of the data is one of the critical pieces of evidence used to validate interpretations and uses of test scores.

Defining Dimensionality

Examination of dimensionality is more challenging than it might seem. Perhaps the most fundamental problem facing researchers and practitioners is: what is meant by dimensionality? Hattie (1981, 1984, 1985) first distinguished dimensionality from reliability, homogeneity, and internal consistency. He defined dimensionality in a straightforward way: "a unidimensional test is one that has one latent trait underlying the data" (Hattie, 1985, p. 157). Following his rationale, a multidimensional test has two or more distinct latent traits underlying the data. However, since "latent trait" is itself an abstract and ambiguous concept, such definitions are more theoretical than practical, and as such they shed little light on how dimensionality could be evaluated. As a result, various operational definitions of dimensionality have been developed, serving as the basis for different dimensionality assessment methods. Among these operational definitions, unfortunately, no consensus exists on which one best describes the nature of dimensionality. In effect, researchers and practitioners implicitly decide on the lens through which they view dimensionality as they choose one or more methods to analyze their own data.

The term "dimensionality" has also been used somewhat loosely in the existing literature, appearing at times as a feature of a test and at others as a feature of test scores (Reckase, 2009). Even when authors state literally that a test is unidimensional (or multidimensional), they may imply that dimensionality is some kind of underlying characteristic of the given data (e.g., see Hattie's definition of unidimensionality: "a unidimensional test is one that has one latent trait underlying the data"). This dissertation adopted the viewpoint that dimensionality is a characteristic of test scores, rather than of the test or test form. More specifically, dimensionality represents the dynamic interaction between a test and an examinee population who respond to the test items (Ackerman, 1994; Reckase, 2009; Tate, 2002). In other words, dimensionality is population dependent by definition, although the dimensional structure of test scores does not necessarily change when different examinee populations are involved.

Related Concepts

Two pairs of contrasting concepts have been mentioned frequently in previous research and application studies on dimensionality assessment, though the names of the specific concepts vary: (a) strict versus essential dimensionality (e.g., McDonald, 1981; Nandakumar, 1991; Stout, 1987, 1990; Tate, 2002, 2003), and (b) fixed versus random dimensionality (e.g., Wainer & Thissen, 1996).

For the first pair of concepts, mathematical definitions of strict and essential dimensionality have been proposed using item response theory (IRT) (Stout, 1987); these are explained in Chapter II. Roughly stated here, the former considers all possible dimensions reflected by the data, whereas the latter is concerned only with a limited number of major or dominant dimensions. For operational mixed-format tests, perfect unidimensionality, where all items strictly measure a single dimension, likely rarely occurs. Therefore, this dissertation evaluates essential dimensionality of data and sometimes uses the two terms "dimensionality" and "essential dimensionality"

interchangeably, without explicit distinction being made. Some questions of interest could be: Are the data (essentially) unidimensional? How many (major or dominant) dimensions are reflected in the data? It should be noted that essential dimensionality of data is examined in different ways across methods, some of which rely on strong theoretical foundations, whereas others rely greatly on subjective judgments.

For the second pair of concepts, the essential distinction between fixed and random dimensionality lies in the intention of test developers (Wainer & Thissen, 1996). In particular, when test developers build a test to cover a broad range of content subdomains or cognitive skills, they might expect to see some degree of multidimensionality in the scores. On the other hand, after the test is administered and responses are scored, test developers might encounter some unexpected dimensions, for example, due to inappropriate reading load or speededness. Wainer and Thissen (1996) named these concepts fixed and random multidimensionality, respectively. However, it is sometimes difficult to determine whether or not dimensionality is as expected. Take item format as an example. When test developers decide to combine multiple-choice (MC) items with constructed-response (CR) items to create a test form, they may or may not expect to find multidimensionality due to the item format, depending on their conceptual framework regarding whether or not different traits are related to MC and CR items. Furthermore, as will be discussed in greater detail in the literature review in Chapter II, data from a test that consists of different item formats or content subdomains may not show clear multidimensionality at all.

Nevertheless, this pair of concepts conveys an important aspect of the essence of dimensionality: it is not solely about statistics. Final decisions on the dimensionality of test scores are generally made based on both statistical results and substantive judgments (Ackerman, Gierl, & Walker, 2003). Especially when unidimensionality appears untenable,

efforts are needed to find plausible interpretations of dimensions, which could be associated with aspects of test development (the intention of test developers), administration, scoring, and other related policies and practices. Dimensionality assessment is as much an art as it is a science.

Dimensionality is a characteristic of test scores, and it is thereby affected by the specific scoring method used to produce those scores. For example, a special issue in dimensionality assessment for mixed-format tests stems from the different types of item scores used with MC and CR items, which is discussed later in this chapter. Even for an MC-only test, either number-correct or various correction-for-guessing scoring methods could be used to score item responses. The choice of scoring method often affects how different dimensionality assessment methods behave and what results they provide. Thus, it becomes necessary to describe how scores are calculated when reporting specific dimensional solutions. When the scoring method is changed, it is prudent to replicate the dimensionality assessment to determine whether or not different scoring methods lead to substantially different conclusions about dimensional structure.

Score reliability is interrelated with dimensionality. Reliability refers to the consistency of scores of specified examinees over replicate measurements (Feldt & Brennan, 1989; Haertel, 2006). When scores are not sufficiently reliable, examinees' scores vary substantially over repeated measurements, and any dimensional solution based on scores from one replication appears less convincing. Dimensionality in turn affects the evaluation of reliability. Compared with unidimensional data, if the dimensional structure of the data is complex, creating replications of measurements of the same quality demands considerably more effort. Selecting a proper way to express reliability (e.g., which type of reliability coefficient to calculate) also needs careful consideration.

Dimensionality also shares a close relationship with equating, a statistical process that enables comparability of scores on multiple forms of a test (Kolen & Brennan, 2014).
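Before turning to the equating examples, the reliability link can be made concrete with a statistic used repeatedly in later chapters: the disattenuated correlation between MC and CR section scores (see the notation list and Tables 4.2 through 4.4). A standard correction-for-attenuation form, sketched here rather than taken from this dissertation's own equations, is

$$\hat{\rho}_{T_X T_Y} \;=\; \frac{r_{XY}}{\sqrt{\hat{\rho}_{XX'}\,\hat{\rho}_{YY'}}},$$

where $r_{XY}$ is the observed correlation between the two section scores and $\hat{\rho}_{XX'}$ and $\hat{\rho}_{YY'}$ are section reliability estimates (e.g., Feldt's coefficient). A disattenuated correlation near 1 is consistent with the two sections measuring essentially the same dimension, whereas values well below 1 suggest format-related multidimensionality.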

Dimensionality assessment can help inform various decisions in equating, such as the construction of test forms and the selection of equating models and procedures. One example is how to build a common-item set for equating mixed-format test forms under a common-item nonequivalent groups (CINEG; Kolen & Brennan, 2014) design. In general, the common items should represent the content and statistical characteristics of the total test, so that performance on the common items can be generalized to performance on the rest of the test. But for mixed-format tests that contain both MC and CR items, is there an additional need to balance the item format of the common items? Previous research has revealed that the composition of the common-item set, either MC-only or format-representative, can lead to acceptable or unacceptable equating results under certain conditions (Kolen & Lee, 2014). Dimensionality is an important factor to consider, because no such concern arises when there is no substantial multidimensionality due to the item format.

Another example is whether to apply unidimensional IRT (UIRT) or multidimensional IRT (MIRT) models for equating. As their names suggest, UIRT models assume unidimensionality of data (or essential unidimensionality in most practical situations), whereas MIRT models are capable of handling multidimensionality. Evaluation of the unidimensionality assumption is a prerequisite for using UIRT models. Assessment of the degree of multidimensionality and the particular dimensional structure helps in deciding whether the benefit of using an MIRT model exceeds the price paid (e.g., heavy computation) and which MIRT models would be appropriate.

Assessing Dimensionality

This section introduces a rough classification of dimensionality assessment and gives an overview of three major aspects of dimensionality assessment: operational definition, assessment outcome, and type of statistical procedure. These aspects later

serve as components for developing a framework to classify and discuss a variety of dimensionality assessment methods, which will be delineated in Chapter II.

Exploratory versus Confirmatory Assessment

Dimensionality can be evaluated in both exploratory and confirmatory manners (Reckase, 2009; Svetina & Levy, 2014). Exploratory dimensionality assessment, the focus of this dissertation, is typically employed when there is no clear hypothesis or evidence concerning the dimensional structure of the given data. It has been routinely conducted, either alone or combined with confirmatory dimensionality assessment, in operational testing programs to check the alignment of actual dimensionality with intended dimensionality (e.g., Fu, Chung, & Wise, 2013; Jang & Roussos, 2007; Wilson, 2000; Zwick, 1987). Exploratory dimensionality assessment also often serves as an integrated part of a preliminary analysis before studying other psychometric procedures (e.g., MIRT equating; see Brossman & Lee, 2013). However, the borderline between the two types of dimensionality assessment is blurry, and any given method more likely lies somewhere on the exploratory-confirmatory spectrum (Svetina & Levy, 2014). The methods investigated in this dissertation are those that are more exploratory in nature, although the findings from these methods might also be useful in some confirmatory scenarios.

Operational Definitions of Dimensionality

As noted previously, there is no generally agreed-upon operational definition of dimensionality. Discrepancies might occur among results from methods driven by different perspectives on dimensionality. Two perspectives are considered in this dissertation: an exploratory factor analysis (EFA) perspective and an IRT perspective. A more in-depth description of the two perspectives will be provided in Chapter II.

Dimensionality has long been analyzed using factor analysis; see Hattie (1985), Reckase (2009), Stone and Yeh (2006), and Velicer, Eaton, and Fava (2000) for overviews of the application of factor analysis methods to dimensionality assessment. EFA, in a broad sense, stands for a class of statistical procedures used to explain the observed variances and covariances (Kline, 2010). Unlike confirmatory factor analysis (CFA), EFA does not require the use of a hypothesized dimensional structure, which seems advantageous for exploratory purposes. From the perspective of EFA, the number of dimensions equals the number of components or factors to retain. The data are considered to be unidimensional when only one component or factor is kept; otherwise, some degree of multidimensionality emerges. The dimensional structure underlying the data corresponds to a particular factor solution produced by EFA.

Alternatively, dimensionality can be defined in the context of IRT. Advocates of this perspective believe that IRT allows for a clear and precise way to understand the concepts of unidimensionality and multidimensionality (Hattie, 1985; Nandakumar, 1991; Stout, 1987; Stout et al., 1996; Zhang & Stout, 1999a, 1999b). A traditional IRT definition of dimensionality is the minimum number of latent traits required for a locally independent and monotone model, which was presented in Stout (1990) and later elaborated in Nandakumar (1991), Stout et al. (1996), and Zhang and Stout (1999a, 1999b). Specifically, local independence implies that, after controlling for the underlying latent traits, item responses are either mutually independent or pairwise uncorrelated, depending on whether strong local independence (SLI; Lord, 1980) or weak local independence (WLI; McDonald, 1981) is considered. Monotonicity indicates that the probability of correctly answering an item changes monotonically with values of the latent traits. When a single latent trait is sufficient to produce such a model, the data are considered to be unidimensional. If the data are not unidimensional, the number of latent traits required defines the number of dimensions. Based on IRT, Stout (1987) also defined essential dimensionality as the number of major or dominant latent traits, which

turns out to be one of the most substantial concepts in designing some nonparametric dimensionality assessment procedures. Similar ideas were conveyed in some of the factor analysis literature (for a brief overview, see Stout, 1990), but a rigorous mathematical definition of essential dimensionality was first provided in Stout (1987) using IRT.

Assessment Outcomes

Dimensionality assessment, in general, is intended to assist researchers and practitioners in gaining a more accurate understanding of a test's internal structure. In earlier research, the primary purpose of dimensionality assessment was to check whether unidimensionality holds, because unidimensionality was considered one of the most desirable characteristics of a test (Hattie, 1984, 1985). As more and more complex educational tests have been developed, issues of multidimensionality have arisen more frequently; increasingly, dimensionality assessment has been applied to investigate multidimensional structure when unidimensionality looks doubtful (Svetina & Levy, 2014; Tate, 2003; Zhang & Stout, 1999a). Furr and Bacharach (2003) characterized the purposes of general dimensionality assessment as (a) deciding on the number of dimensions, (b) estimating the correlations between multiple dimensions (if any), and (c) mapping statistical dimensions to psychological attributes. Taking into account the different perspectives of previous studies (e.g., Furr & Bacharach, 2003; Hattie, 1985; Svetina & Levy, 2014; Tate, 2003; Zhang & Stout, 1999a) and personal experience in analyzing dimensionality of operational mixed-format test data, this dissertation addresses the following major outcomes for exploratory dimensionality assessment:

O1. Whether unidimensionality holds;
O2. The number of dimensions; and
O3. Specific dimensional structure (i.e., how items cluster around multiple underlying dimensions).
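Before turning to a possible fourth outcome, the EFA perspective on O1 and O2 can be made concrete. The sketch below illustrates three classical eigenvalue-based retention rules that reappear in later chapters (Kaiser's rule, percent variance, and the ratio of eigenvalues); it assumes a precomputed item correlation matrix, and the toy data are made up for illustration rather than drawn from this dissertation's analyses.

```python
import numpy as np

def retention_summary(R):
    """Illustrative eigenvalue-based retention statistics for an item
    correlation matrix R (Pearson or polychoric)."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # eigenvalues, descending

    return {
        # Kaiser's rule: retain components with eigenvalues greater than 1
        "kaiser_count": int(np.sum(eigvals > 1.0)),
        # Percent variance: share of total variance on the first component
        "first_eigenvalue_share": eigvals[0] / eigvals.sum(),
        # Ratio of eigenvalues: a large first-to-second ratio suggests
        # one dominant dimension
        "ratio_first_to_second": eigvals[0] / eigvals[1],
    }

# Toy example: three items correlating 0.5 with one another
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
print(retention_summary(R))  # kaiser_count = 1, consistent with one dimension
```

Under the EFA definition, the retained count estimates the number of dimensions (O2), and a count of one corresponds to a claim of unidimensionality (O1).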

Information and documentation collected in addition to the test scores might also facilitate one further outcome (O4): possible and meaningful explanations of emerging dimensions. Ackerman et al. (2003), among other researchers and practitioners, have discussed some useful practices for making judgments on how and why a certain dimensional pattern is reflected by the data. For example, they suggested that practitioners at least consider test specifications and conduct additional content and psychological analyses to inform the dimensionality assessment. Because the primary purpose of this dissertation is to compare different dimensionality assessment methods using real data, attention was mostly paid to the first three outcomes, O1, O2, and O3; a brief discussion related to O4 is included but not emphasized.

Types of Statistical Procedure

Depending on the statistical procedure used, dimensionality assessment methods can be viewed as members of two different families: parametric and nonparametric. Parametric methods rely on certain psychometric and statistical models and assumptions, whereas nonparametric methods are more data-oriented and require few or weak models and assumptions.

It is misleading to establish hasty connections between the type of procedure (i.e., parametric or nonparametric) and the definition of dimensionality (i.e., EFA or IRT) while ignoring this essential difference between parametric and nonparametric methods. In particular, not all EFA-based methods are parametric, nor are all IRT-based methods nonparametric. For EFA, an example is Buja and Eyuboglu's (1992) version of parallel analysis (PA). Unlike the original PA (Horn, 1965), which is a parametric procedure, their version possesses nonparametric properties by replacing the normality assumption with the permutation principle. In IRT, although some nonparametric methods based on conditional covariances have been widely used, such as DIMTEST

(Nandakumar & Stout, 1993; Stout, 1987), Miller and Hirsch's (1992) method appears to be parametric, as it greatly depends on MIRT models.

Remarks

Variations in all of these aspects directly and indirectly impact the performance of dimensionality assessment methods. Different methods often yield dissimilar dimensional solutions, but no method has been clearly demonstrated to be universally satisfactory (Hakstian & Muller, 1973; Hambleton & Rovinelli, 1986; Hattie, 1984; Mroch & Bolt, 2006; Nandakumar, 1994; Stone & Yeh, 2006; Svetina & Levy, 2014; Tate, 2003; van Abswoude, van der Ark, & Sijtsma, 2004). Ease of implementation, and especially software accessibility, also plays a subtle role in dimensionality assessment. Some seemingly promising methods are not accompanied by any widely available software, which can compromise their practical application. In contrast, the default methods of some popular software packages might distort understanding of dimensionality to a considerable extent. Researchers and practitioners have to make several decisions to weigh potential gains and losses when selecting specific methods.

Challenges Introduced by Mixed-Format Tests

The goal of this dissertation is to understand the performance of some popular and promising dimensionality assessment methods for mixed-format tests containing MC and CR items. In this section, definitions of MC and CR items are provided, followed by discussions of potential sources of multidimensionality and challenges of dimensionality assessment in the context of mixed-format tests.

Definitions of MC and CR Items

An MC item requires examinees to select one correct answer from a list of possible answers (Downing, 2006). Typically, an MC item is composed of a stem and four or five options, including one best answer and several distractors. In order to provide more context for an item, a stimulus such as a passage, table, or graph might also be

presented. Such a stimulus is either linked to a single item or shared by a set of items known as a testlet (Wainer & Kiely, 1987; Wainer & Lewis, 1990). Although different scoring methods could be used for scoring MC items, the number-correct scoring method is considered in this dissertation. Specifically, responses are dichotomously scored as 0 for an incorrect answer and 1 for a correct answer, without penalty for guessing.

A CR item, in contrast, requires examinees to construct their own answers. Various types of CR items have been used in operational tests, such as short answer, essay, and speaking prompts. CR items are typically scored on an integer scale according to predetermined scoring rubrics. Human raters, machine raters, or both could be involved in scoring CR items; when multiple raters are used, the number of raters and the methods for coordinating ratings vary across tests.

Advantages and limitations of MC and CR items have been discussed extensively in the literature (for a review, see Schmeiser & Welch, 2006). To combine the strengths of different item formats and mitigate the possible weaknesses of using either MC or CR items alone, many state and national educational testing programs have adopted mixed-format tests. The widespread use of mixed-format tests, however, complicates dimensionality assessment both conceptually and methodologically, as discussed next.

Potential Sources of Multidimensionality

Conceptually, data from mixed-format tests tend to have a more complex dimensional structure than those from single-format tests. A unique source of multidimensionality associated with mixed-format tests is the item format. However, previous research studies on whether varying item format introduces extra dimensions, known as the item format effect (e.g., Bennett, Rock, & Wang, 1991; Bridgeman & Rock, 1993; Hohensinn & Kubinger, 2011; Lissitz, Hou, & Slater, 2012; Manhart, 1996; Perkhounkova & Dunbar, 1999; Thissen, Wainer, & Wang, 1994; Traub, 1993; Traub & Fisher, 1977), have shown mixed findings.
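To picture the two item-score types just defined, here is a small, purely hypothetical scoring sketch: MC responses are keyed and scored number-correct (0/1, no guessing penalty), CR ratings are kept as integer rubric scores, and both are assembled into the item-level score matrix that the dimensionality assessment methods discussed later take as input. The key, responses, and rubric range are all made up for illustration.

```python
import numpy as np

mc_key = np.array(["B", "D", "A"])                # hypothetical key for 3 MC items
mc_responses = np.array([["B", "C", "A"],
                         ["B", "D", "A"]])        # two examinees' choices
mc_scores = (mc_responses == mc_key).astype(int)  # dichotomous 0/1 scores

cr_scores = np.array([[3, 5],
                      [4, 2]])                    # two CR items on a 0-5 rubric

# Dichotomous MC columns sit next to polytomous CR columns in one matrix
item_scores = np.hstack([mc_scores, cr_scores])
print(item_scores)  # [[1 0 1 3 5]
                    #  [1 1 1 4 2]]
```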

Data from mixed-format tests are also vulnerable to the same potential sources of multidimensionality that may affect single-format tests. One major source is the item content. As indicated in Wainer and Thissen (1996), when a test is developed to measure several content subdomains, there is a natural view that some degree of multidimensionality might occur. However, such a view has not always been supported by empirical studies of single-format and mixed-format educational tests (e.g., Perkhounkova & Dunbar, 1999; Wilson, 2000; Zwick, 1987). Further, for some mixed-format tests, item content and format become entangled in their influence on the dimensional structure of the data, making dimensionality assessment even more difficult (Perkhounkova & Dunbar, 1999).

In addition to item format and content, a number of factors may cause the presence of multidimensionality in data from mixed-format tests. For example, Yen (1993) included passage or stimulus dependence within a testlet, speededness, rater effects, and inappropriate external interference among the potential causes of local dependence; given the close relationship between dimensionality and local dependence from the IRT perspective, those factors likely lead to multidimensional data. This dissertation, however, focuses on two of the major sources: item format and content.

Methodological Challenges in Dimensionality Assessment

A methodological challenge facing dimensionality assessment of mixed-format test data stems from the different types of item scores used with MC and CR items. Typically, MC items are dichotomously scored, whereas CR items are polytomously scored. Thus, an ideal method for analyzing mixed-format test data should be able to handle dichotomous and polytomous item data simultaneously, which eliminates some widely used procedures developed for MC-only tests (Svetina & Levy, 2012). Those methods might still be used after polytomous data are dichotomized by collapsing multiple

categories into two categories (Svetina & Levy, 2014). However, unique information provided by CR items is lost during the dichotomization process (a minimal illustration appears at the end of this subsection). Even for those methods that appear technically feasible for mixed-format tests, their performance in this new context has not been studied as fully as would be desired.

Fundamentally, there is a need to devise a general framework to categorize the variety of methods currently available. Some rough classifications have been suggested over the past few decades. For instance, methods have been classified according to their theoretical underpinnings (Hattie, 1985), types of procedures (Tate, 2003; Zhang & Stout, 1999a), or levels of complexity in assessment (Zhang & Stout, 1999a). Svetina and Levy (2012, 2014) put forward more elaborately structured frameworks for dimensionality assessment methods and software packages. But most of these frameworks were proposed based on dichotomous data (or required dichotomization prior to dimensionality assessment), so their conclusions are not directly applicable to the context of mixed-format tests. To the best of the author's knowledge, there is currently no framework specifically for mixed-format tests built with a clear focus on exploratory dimensionality assessment.

There is also a need for more empirical comparisons of methods. Whereas a number of comparative studies have been done for MC-only tests (e.g., Hattie, 1984; Mroch & Bolt, 2006; Nandakumar, 1994; Stone & Yeh, 2006; Tate, 2003; van Abswoude et al., 2004; Svetina & Levy, 2014; Velicer et al., 2000), few studies have been based on data from operational mixed-format tests. One consequence is a lack of recommendations and guidelines for researchers and practitioners who want to select the methods that would be most beneficial for their own mixed-format test data, needs, and resources.
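As promised above, here is a minimal sketch of the dichotomization just described, with a made-up CR score vector and an arbitrary cutpoint; the point is simply that collapsing categories discards the score gradations that CR items are designed to provide.

```python
import numpy as np

cr = np.array([0, 1, 2, 2, 3, 4, 4])      # hypothetical 0-4 rubric scores

cut = 2                                    # arbitrary illustrative cutpoint
dichotomized = (cr >= cut).astype(int)     # collapse five categories into two

print(np.unique(cr))             # [0 1 2 3 4] -> five observed categories
print(np.unique(dichotomized))   # [0 1]       -> two categories remain
```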

Overview of Data Sources

This section provides an overview of the College Board Advanced Placement (AP) Exams, which are examples of mixed-format tests. Data used in this dissertation were from several AP Exams. The AP Exam is a critical component of the AP Program, which offers high school students the opportunity to receive college-level course credit and/or advanced placement in many colleges and universities in the United States (The College Board, 2014). There are over thirty AP Exams assessing students' proficiency in subjects from six general areas: Arts, English, History & Social Science, Math & Computer Science, Sciences, and World Languages & Cultures. Information and documents concerning the courses and exams are available on the College Board website.

A typical AP Exam comprises both an MC and a CR section and covers multiple content subdomains or cognitive skills aligned with the proposed test specifications (The College Board, 2014). Yet test length, test configuration (e.g., content and format distribution), and testing time can vary greatly across examinations. In the MC section, MC items may be stand-alone or grouped into testlets sharing common passages, audio scripts, tables, problem-solving settings, or other types of stimuli. Typically, MC items have four or five options each. Starting with the 2011 administration, number-correct scoring has been used for MC items, meaning that only correct answers count and no penalty is imposed for wrong answers (Kolen & Lee, 2011). In the CR section, various types of CR items are employed to best fulfill the proposed test purposes. Some typical types of CR items include short answer, long answer, synthesis essay, and speaking prompts under different scenarios. CR items are scored by human raters following standardized scoring rubrics. The number of response categories and the resulting score ranges of CR items vary considerably across tests and items.

Operationally, a composite score is calculated from weighted MC and CR section scores. In order to obtain interchangeable scores from different forms of a test, equating is conducted under the CINEG design (Kolen & Lee, 2011). A set of MC items resembling the total test in terms of content and statistical characteristics is treated as the common-item set for operational equating. The composite score is later converted into an AP Grade for each examinee. AP Grades of 1 to 5 indicate how well qualified an examinee is to receive college-level credit in a specific subject area (The College Board, 2014).

Three AP Exams are used as illustrative examples in this dissertation: AP English Language and Composition (English), AP Spanish Language and Culture (Spanish), and AP Chemistry (Chemistry). It is important to note that a sequence of modifications was made to the operational data in order to better address the research objectives and questions of this dissertation. As a result, the characteristics of these tests and samples are no longer similar to those in operational settings, and the results and findings of the empirical study have no direct implications for operational AP Exams. The primary focus of this dissertation is on the performance of different exploratory dimensionality assessment methods under varying conditions, not on the characteristics of operational AP Exams.

Research Objectives and Questions

The intent of this dissertation is to achieve two main research objectives. First, a framework is built for considering various dimensionality assessment methods. Second, empirical method comparisons are conducted to investigate how different methods behave relative to each other under realistic conditions.

Both real and simulated data are useful for the method comparison, but the former is of particular interest in this dissertation. The use of real data is important for several reasons. First, without general knowledge of how specific dimensionality assessment

methods perform for operational mixed-format tests, it is difficult to design simulations. Unfortunately, such knowledge is scarce in the literature. It is real data analysis that allows patterns of similarities and dissimilarities among methods to be observed and understood. Second, the biasing influence that a data-generation method inevitably exerts on findings from simulated data vanishes when real data are used. With real data studies, no dimensionality assessment method is unfairly advantaged or disadvantaged by the way artificial multidimensional data were created. Although empirical studies alone cannot lead to definitive conclusions about which method is best to use, the findings can still be valuable for researchers and practitioners when selecting among different methods.

Not all possible methods are examined in this dissertation, but the most popular and promising methods for exploring dimensionality of mixed-format test data are included. Specific methods include Kaiser's rule (Kaiser, 1960), percent variance (Reckase, 1979), ratio of eigenvalues (Hattie, 1985; Lord, 1980), the scree test (Cattell, 1966), original and refined versions of the minimum average partial (MAP) procedure (Velicer, 1976; Velicer et al., 2000), original and refined versions of PA (Buja & Eyuboglu, 1992; Glorfeld, 1995; Horn, 1965), item-level EFA (Muthén & Muthén), exploratory and confirmatory modes of Poly-DIMTEST (Nandakumar, Yu, Li, & Stout, 1998), and MIRT cluster analysis (Miller & Hirsch, 1992). These methods are described in Chapter II of this dissertation.

Given the large number of methods available for examining dimensionality of mixed-format test data, the intent of the empirical study is to evaluate the consistency of their results, that is, the extent to which these methods provide similar dimensional solutions. Considering the possible assessment outcomes, results are compared with regard to (O1) whether unidimensionality holds, (O2) the number of dimensions, and (O3) specific dimensional structure. Based on previous work on MC-only tests, discrepancies are likely.
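Agreement on specific dimensional structure (O3) can be quantified at the item-pair level, in the spirit of the consistency index and the same-dimension/different-dimension pair proportions listed in the notation section. The sketch below computes a simple pair-agreement proportion (equivalent to the Rand index); this dissertation's own index may be defined differently, so the code should be read as an illustration under that assumption.

```python
from itertools import combinations

def pairwise_consistency(labels_a, labels_b):
    """Proportion of item pairs on which two dimensional solutions agree:
    a pair agrees when both methods place its items in the same cluster,
    or both place them in different clusters."""
    n_agree = n_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        n_agree += together_a == together_b
        n_pairs += 1
    return n_agree / n_pairs

# Hypothetical cluster assignments of six items by two methods
print(pairwise_consistency([1, 1, 2, 2, 3, 3], [1, 1, 1, 2, 2, 2]))  # 11/15
```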


More information

UNIDIMENSIONAL VERTICAL SCALING OF MIXED FORMAT TESTS IN THE PRESENCE OF ITEM FORMAT EFFECT. Debra White Moore

UNIDIMENSIONAL VERTICAL SCALING OF MIXED FORMAT TESTS IN THE PRESENCE OF ITEM FORMAT EFFECT. Debra White Moore UNIDIMENSIONAL VERTICAL SCALING OF MIXED FORMAT TESTS IN THE PRESENCE OF ITEM FORMAT EFFECT by Debra White Moore B.M.Ed., University of North Carolina, Greensboro, 1989 M.A., University of Pittsburgh,

More information

Does factor indeterminacy matter in multi-dimensional item response theory?

Does factor indeterminacy matter in multi-dimensional item response theory? ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional

More information

VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2)

VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2) 1 VERDIN MANUSCRIPT REVIEW HISTORY REVISION NOTES FROM AUTHORS (ROUND 2) Thank you for providing us with the opportunity to revise our paper. We have revised the manuscript according to the editors and

More information

linking in educational measurement: Taking differential motivation into account 1

linking in educational measurement: Taking differential motivation into account 1 Selecting a data collection design for linking in educational measurement: Taking differential motivation into account 1 Abstract In educational measurement, multiple test forms are often constructed to

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

How Do We Assess Students in the Interpreting Examinations?

How Do We Assess Students in the Interpreting Examinations? How Do We Assess Students in the Interpreting Examinations? Fred S. Wu 1 Newcastle University, United Kingdom The field of assessment in interpreter training is under-researched, though trainers and researchers

More information

Evaluating the quality of analytic ratings with Mokken scaling

Evaluating the quality of analytic ratings with Mokken scaling Psychological Test and Assessment Modeling, Volume 57, 2015 (3), 423-444 Evaluating the quality of analytic ratings with Mokken scaling Stefanie A. Wind 1 Abstract Greatly influenced by the work of Rasch

More information

was also my mentor, teacher, colleague, and friend. It is tempting to review John Horn s main contributions to the field of intelligence by

was also my mentor, teacher, colleague, and friend. It is tempting to review John Horn s main contributions to the field of intelligence by Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179 185. (3362 citations in Google Scholar as of 4/1/2016) Who would have thought that a paper

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

Graphical Representation of Multidimensional

Graphical Representation of Multidimensional Graphical Representation of Multidimensional Item Response Theory Analyses Terry Ackerman University of Illinois, Champaign-Urbana This paper illustrates how graphical analyses can enhance the interpretation

More information

Methodological Issues in Measuring the Development of Character

Methodological Issues in Measuring the Development of Character Methodological Issues in Measuring the Development of Character Noel A. Card Department of Human Development and Family Studies College of Liberal Arts and Sciences Supported by a grant from the John Templeton

More information

Nonparametric DIF. Bruno D. Zumbo and Petronilla M. Witarsa University of British Columbia

Nonparametric DIF. Bruno D. Zumbo and Petronilla M. Witarsa University of British Columbia Nonparametric DIF Nonparametric IRT Methodology For Detecting DIF In Moderate-To-Small Scale Measurement: Operating Characteristics And A Comparison With The Mantel Haenszel Bruno D. Zumbo and Petronilla

More information

AN EXPLORATORY STUDY OF LEADER-MEMBER EXCHANGE IN CHINA, AND THE ROLE OF GUANXI IN THE LMX PROCESS

AN EXPLORATORY STUDY OF LEADER-MEMBER EXCHANGE IN CHINA, AND THE ROLE OF GUANXI IN THE LMX PROCESS UNIVERSITY OF SOUTHERN QUEENSLAND AN EXPLORATORY STUDY OF LEADER-MEMBER EXCHANGE IN CHINA, AND THE ROLE OF GUANXI IN THE LMX PROCESS A Dissertation submitted by Gwenda Latham, MBA For the award of Doctor

More information

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns The Influence of Test Characteristics on the Detection of Aberrant Response Patterns Steven P. Reise University of California, Riverside Allan M. Due University of Minnesota Statistical methods to assess

More information

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data Item Response Theory: Methods for the Analysis of Discrete Survey Response Data ICPSR Summer Workshop at the University of Michigan June 29, 2015 July 3, 2015 Presented by: Dr. Jonathan Templin Department

More information

Paul Irwing, Manchester Business School

Paul Irwing, Manchester Business School Paul Irwing, Manchester Business School Factor analysis has been the prime statistical technique for the development of structural theories in social science, such as the hierarchical factor model of human

More information

A Comparison of DIMTEST and Generalized Dimensionality Discrepancy. Approaches to Assessing Dimensionality in Item Response Theory. Ray E.

A Comparison of DIMTEST and Generalized Dimensionality Discrepancy. Approaches to Assessing Dimensionality in Item Response Theory. Ray E. A Comparison of DIMTEST and Generalized Dimensionality Discrepancy Approaches to Assessing Dimensionality in Item Response Theory by Ray E. Reichenberg A Thesis Presented in Partial Fulfillment of the

More information

Computerized Mastery Testing

Computerized Mastery Testing Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Ecological Statistics

Ecological Statistics A Primer of Ecological Statistics Second Edition Nicholas J. Gotelli University of Vermont Aaron M. Ellison Harvard Forest Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Brief Contents

More information

2 Types of psychological tests and their validity, precision and standards

2 Types of psychological tests and their validity, precision and standards 2 Types of psychological tests and their validity, precision and standards Tests are usually classified in objective or projective, according to Pasquali (2008). In case of projective tests, a person is

More information

IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS

IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS A Dissertation Presented to The Academic Faculty by HeaWon Jun In Partial Fulfillment of the Requirements

More information

Linking Mixed-Format Tests Using Multiple Choice Anchors. Michael E. Walker. Sooyeon Kim. ETS, Princeton, NJ

Linking Mixed-Format Tests Using Multiple Choice Anchors. Michael E. Walker. Sooyeon Kim. ETS, Princeton, NJ Linking Mixed-Format Tests Using Multiple Choice Anchors Michael E. Walker Sooyeon Kim ETS, Princeton, NJ Paper presented at the annual meeting of the American Educational Research Association (AERA) and

More information

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere

More information

PLANNING THE RESEARCH PROJECT

PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm page 1 Part I PLANNING THE RESEARCH PROJECT Van Der Velde / Guide to Business Research Methods First Proof 6.11.2003 4:53pm

More information

A Bayesian Nonparametric Model Fit statistic of Item Response Models

A Bayesian Nonparametric Model Fit statistic of Item Response Models A Bayesian Nonparametric Model Fit statistic of Item Response Models Purpose As more and more states move to use the computer adaptive test for their assessments, item response theory (IRT) has been widely

More information

Development, Standardization and Application of

Development, Standardization and Application of American Journal of Educational Research, 2018, Vol. 6, No. 3, 238-257 Available online at http://pubs.sciepub.com/education/6/3/11 Science and Education Publishing DOI:10.12691/education-6-3-11 Development,

More information

Research Questions and Survey Development

Research Questions and Survey Development Research Questions and Survey Development R. Eric Heidel, PhD Associate Professor of Biostatistics Department of Surgery University of Tennessee Graduate School of Medicine Research Questions 1 Research

More information

Development and Psychometric Properties of the Relational Mobility Scale for the Indonesian Population

Development and Psychometric Properties of the Relational Mobility Scale for the Indonesian Population Development and Psychometric Properties of the Relational Mobility Scale for the Indonesian Population Sukaesi Marianti Abstract This study aims to develop the Relational Mobility Scale for the Indonesian

More information

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling Olli-Pekka Kauppila Daria Kautto Session VI, September 20 2017 Learning objectives 1. Get familiar with the basic idea

More information

Assessing the Validity and Reliability of the Teacher Keys Effectiveness. System (TKES) and the Leader Keys Effectiveness System (LKES)

Assessing the Validity and Reliability of the Teacher Keys Effectiveness. System (TKES) and the Leader Keys Effectiveness System (LKES) Assessing the Validity and Reliability of the Teacher Keys Effectiveness System (TKES) and the Leader Keys Effectiveness System (LKES) of the Georgia Department of Education Submitted by The Georgia Center

More information

Professional Counseling Psychology

Professional Counseling Psychology Professional Counseling Psychology Regulations for Case Conceptualization Preparation Manual Revised Spring 2015 Table of Contents Timeline... 3 Committee Selection and Paperwork... 3 Selection of Client

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

isc ove ring i Statistics sing SPSS

isc ove ring i Statistics sing SPSS isc ove ring i Statistics sing SPSS S E C O N D! E D I T I O N (and sex, drugs and rock V roll) A N D Y F I E L D Publications London o Thousand Oaks New Delhi CONTENTS Preface How To Use This Book Acknowledgements

More information

Supplementary Material*

Supplementary Material* Supplementary Material* Lipner RS, Brossman BG, Samonte KM, Durning SJ. Effect of Access to an Electronic Medical Resource on Performance Characteristics of a Certification Examination. A Randomized Controlled

More information

Fundamental Concepts for Using Diagnostic Classification Models. Section #2 NCME 2016 Training Session. NCME 2016 Training Session: Section 2

Fundamental Concepts for Using Diagnostic Classification Models. Section #2 NCME 2016 Training Session. NCME 2016 Training Session: Section 2 Fundamental Concepts for Using Diagnostic Classification Models Section #2 NCME 2016 Training Session NCME 2016 Training Session: Section 2 Lecture Overview Nature of attributes What s in a name? Grain

More information

Comprehensive Statistical Analysis of a Mathematics Placement Test

Comprehensive Statistical Analysis of a Mathematics Placement Test Comprehensive Statistical Analysis of a Mathematics Placement Test Robert J. Hall Department of Educational Psychology Texas A&M University, USA (bobhall@tamu.edu) Eunju Jung Department of Educational

More information

The effects of ordinal data on coefficient alpha

The effects of ordinal data on coefficient alpha James Madison University JMU Scholarly Commons Masters Theses The Graduate School Spring 2015 The effects of ordinal data on coefficient alpha Kathryn E. Pinder James Madison University Follow this and

More information

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

EFFECTS OF OUTLIER ITEM PARAMETERS ON IRT CHARACTERISTIC CURVE LINKING METHODS UNDER THE COMMON-ITEM NONEQUIVALENT GROUPS DESIGN

EFFECTS OF OUTLIER ITEM PARAMETERS ON IRT CHARACTERISTIC CURVE LINKING METHODS UNDER THE COMMON-ITEM NONEQUIVALENT GROUPS DESIGN EFFECTS OF OUTLIER ITEM PARAMETERS ON IRT CHARACTERISTIC CURVE LINKING METHODS UNDER THE COMMON-ITEM NONEQUIVALENT GROUPS DESIGN By FRANCISCO ANDRES JIMENEZ A THESIS PRESENTED TO THE GRADUATE SCHOOL OF

More information

Description of components in tailored testing

Description of components in tailored testing Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

TECHNICAL REPORT. The Added Value of Multidimensional IRT Models. Robert D. Gibbons, Jason C. Immekus, and R. Darrell Bock

TECHNICAL REPORT. The Added Value of Multidimensional IRT Models. Robert D. Gibbons, Jason C. Immekus, and R. Darrell Bock 1 TECHNICAL REPORT The Added Value of Multidimensional IRT Models Robert D. Gibbons, Jason C. Immekus, and R. Darrell Bock Center for Health Statistics, University of Illinois at Chicago Corresponding

More information

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida and Oleksandr S. Chernyshenko University of Canterbury Presented at the New CAT Models

More information

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus

More information

Progressive Matrices

Progressive Matrices Seeing Reason: Visuospatial Ability, Sex Differences and the Raven s Progressive Matrices Nicolette Amanda Waschl School of Psychology, University of Adelaide A thesis submitted in fulfillment of the requirements

More information

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia

Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia Analyzing Teacher Professional Standards as Latent Factors of Assessment Data: The Case of Teacher Test-English in Saudi Arabia 1 Introduction The Teacher Test-English (TT-E) is administered by the NCA

More information

LUO, XIAO, Ph.D. The Optimal Design of the Dual-purpose Test. (2013) Directed by Dr. Richard M. Luecht. 155 pp.

LUO, XIAO, Ph.D. The Optimal Design of the Dual-purpose Test. (2013) Directed by Dr. Richard M. Luecht. 155 pp. LUO, XIAO, Ph.D. The Optimal Design of the Dual-purpose Test. (2013) Directed by Dr. Richard M. Luecht. 155 pp. Traditional test development focused on one purpose of the test, either ranking test-takers

More information

Michael Hallquist, Thomas M. Olino, Paul A. Pilkonis University of Pittsburgh

Michael Hallquist, Thomas M. Olino, Paul A. Pilkonis University of Pittsburgh Comparing the evidence for categorical versus dimensional representations of psychiatric disorders in the presence of noisy observations: a Monte Carlo study of the Bayesian Information Criterion and Akaike

More information

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz This study presents the steps Edgenuity uses to evaluate the reliability and validity of its quizzes, topic tests, and cumulative

More information

Alternative Methods for Assessing the Fit of Structural Equation Models in Developmental Research

Alternative Methods for Assessing the Fit of Structural Equation Models in Developmental Research Alternative Methods for Assessing the Fit of Structural Equation Models in Developmental Research Michael T. Willoughby, B.S. & Patrick J. Curran, Ph.D. Duke University Abstract Structural Equation Modeling

More information

Manifestation Of Differences In Item-Level Characteristics In Scale-Level Measurement Invariance Tests Of Multi-Group Confirmatory Factor Analyses

Manifestation Of Differences In Item-Level Characteristics In Scale-Level Measurement Invariance Tests Of Multi-Group Confirmatory Factor Analyses Journal of Modern Applied Statistical Methods Copyright 2005 JMASM, Inc. May, 2005, Vol. 4, No.1, 275-282 1538 9472/05/$95.00 Manifestation Of Differences In Item-Level Characteristics In Scale-Level Measurement

More information

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational

More information

Re-Examining the Role of Individual Differences in Educational Assessment

Re-Examining the Role of Individual Differences in Educational Assessment Re-Examining the Role of Individual Differences in Educational Assesent Rebecca Kopriva David Wiley Phoebe Winter University of Maryland College Park Paper presented at the Annual Conference of the National

More information

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Optimism in child development: Conceptual issues and methodological approaches. Edwina M. Farrall

Optimism in child development: Conceptual issues and methodological approaches. Edwina M. Farrall Optimism in child development: Conceptual issues and methodological approaches. Edwina M. Farrall School of Psychology University of Adelaide South Australia October, 2007 ii TABLE OF CONTENTS ABSTRACT

More information

Measuring and Assessing Study Quality

Measuring and Assessing Study Quality Measuring and Assessing Study Quality Jeff Valentine, PhD Co-Chair, Campbell Collaboration Training Group & Associate Professor, College of Education and Human Development, University of Louisville Why

More information

A critical look at the use of SEM in international business research

A critical look at the use of SEM in international business research sdss A critical look at the use of SEM in international business research Nicole F. Richter University of Southern Denmark Rudolf R. Sinkovics The University of Manchester Christian M. Ringle Hamburg University

More information

PharmaSUG Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching

PharmaSUG Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching PharmaSUG 207 - Paper HA-04 Two Roads Diverged in a Narrow Dataset...When Coarsened Exact Matching is More Appropriate than Propensity Score Matching Aran Canes, Cigna Corporation ABSTRACT Coarsened Exact

More information

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis?

How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? Richards J. Heuer, Jr. Version 1.2, October 16, 2005 This document is from a collection of works by Richards J. Heuer, Jr.

More information

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA

On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation in CFA STRUCTURAL EQUATION MODELING, 13(2), 186 203 Copyright 2006, Lawrence Erlbaum Associates, Inc. On the Performance of Maximum Likelihood Versus Means and Variance Adjusted Weighted Least Squares Estimation

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory Kate DeRoche, M.A. Mental Health Center of Denver Antonio Olmos, Ph.D. Mental Health

More information

College Student Self-Assessment Survey (CSSAS)

College Student Self-Assessment Survey (CSSAS) 13 College Student Self-Assessment Survey (CSSAS) Development of College Student Self Assessment Survey (CSSAS) The collection and analysis of student achievement indicator data are of primary importance

More information

The Influence of Conditioning Scores In Performing DIF Analyses

The Influence of Conditioning Scores In Performing DIF Analyses The Influence of Conditioning Scores In Performing DIF Analyses Terry A. Ackerman and John A. Evans University of Illinois The effect of the conditioning score on the results of differential item functioning

More information

Reveal Relationships in Categorical Data

Reveal Relationships in Categorical Data SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction

More information

During the past century, mathematics

During the past century, mathematics An Evaluation of Mathematics Competitions Using Item Response Theory Jim Gleason During the past century, mathematics competitions have become part of the landscape in mathematics education. The first

More information

CYRINUS B. ESSEN, IDAKA E. IDAKA AND MICHAEL A. METIBEMU. (Received 31, January 2017; Revision Accepted 13, April 2017)

CYRINUS B. ESSEN, IDAKA E. IDAKA AND MICHAEL A. METIBEMU. (Received 31, January 2017; Revision Accepted 13, April 2017) DOI: http://dx.doi.org/10.4314/gjedr.v16i2.2 GLOBAL JOURNAL OF EDUCATIONAL RESEARCH VOL 16, 2017: 87-94 COPYRIGHT BACHUDO SCIENCE CO. LTD PRINTED IN NIGERIA. ISSN 1596-6224 www.globaljournalseries.com;

More information

1. Evaluate the methodological quality of a study with the COSMIN checklist

1. Evaluate the methodological quality of a study with the COSMIN checklist Answers 1. Evaluate the methodological quality of a study with the COSMIN checklist We follow the four steps as presented in Table 9.2. Step 1: The following measurement properties are evaluated in the

More information

Diagnostic Classification Models

Diagnostic Classification Models Diagnostic Classification Models Lecture #13 ICPSR Item Response Theory Workshop Lecture #13: 1of 86 Lecture Overview Key definitions Conceptual example Example uses of diagnostic models in education Classroom

More information

Item Analysis Explanation

Item Analysis Explanation Item Analysis Explanation The item difficulty is the percentage of candidates who answered the question correctly. The recommended range for item difficulty set forth by CASTLE Worldwide, Inc., is between

More information

SEMINAR ON SERVICE MARKETING

SEMINAR ON SERVICE MARKETING SEMINAR ON SERVICE MARKETING Tracy Mary - Nancy LOGO John O. Summers Indiana University Guidelines for Conducting Research and Publishing in Marketing: From Conceptualization through the Review Process

More information

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria Thakur Karkee Measurement Incorporated Dong-In Kim CTB/McGraw-Hill Kevin Fatica CTB/McGraw-Hill

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information