An Alternative Way of Establishing Measurement in Marketing Research: Its Implications for Scale Development and Validity


Thomas Salzberger
University of Economics and Business Administration, Vienna (WU-Wien)

Abstract

Quantitative consumer and marketing research looks back on an era of construct operationalization predominantly based on classical test theory as the technical framework of scale development. Rasch measurement theory provides an alternative framework of measurement. Previous studies demonstrated the potential of Rasch measurement for marketing research from a theoretical viewpoint and reported applications of Rasch measurement models to existing marketing scales. This paper focuses on the fact that the Rasch model explicitly accounts for the different amounts of the construct that respondents need in order to agree with different items. Each item is characterized by an item parameter, i.e. the item location, that expresses the amount of the property to be measured that the item stands for. Whereas the foundation of the Rasch model, i.e. specific objectivity, provides evidence of the construct validity of a scale that fits the model, the range of item locations spans the latent dimension and gives insight into the meaning of different levels of the construct, and thereby adds to content validity. An empirical example shows that applying Rasch models to existing scales does not reveal the full potential of the model, even if a comprehensive, albeit classical, item pool is referred to. Consequently, only newly generated items are likely to span a wide range of item locations and thus provide content validity.

Introduction

When developing instruments to measure latent constructs in marketing and consumer behaviour research, the procedure suggested by Churchill (1979) has been routinely applied, and it has been adopted by most textbooks in marketing research. This procedure rests on the basics of classical test theory (CTT, Lord and Novick 1968).
During the last decade, an alternative framework of measurement has been introduced to marketing research (e.g. Soutar et al. 1990, Soutar and Cornish-Ward 1997, Soutar and Ryan 1999, Salzberger et al. 1999, Balasubramian and Kamakura 1989, Singh et al. 1990, Singh 1996). From a general perspective, this alternative measurement theory may be referred to as item response theory (IRT, Lord 1980) or latent trait theory (LTT). A special class of models, the family of Rasch models, stands out, though, by featuring special properties which more general models do not share. This paper does not offer a comprehensive introduction to LTT models in general or Rasch models in particular (see, e.g., Andrich 1988a) but focuses on specific consequences for the process of scale development in line with Rasch measurement theory (RMT).

The Principles of Latent Trait Theory

Since RMT is beyond the mainstream paradigm of construct operationalization in marketing research and, consequently, most marketing scholars are not fully familiar with it, a short introduction is provided highlighting the most fundamental differences between RMT and classical approaches.

The classical approach is based essentially on the principle of correlation. From a comprehensive pool of items, those are retained that show high loadings in factor analysis and contribute to reliability, i.e. their exclusion would decrease reliability. Both criteria require high item inter-correlations. This approach entails some theoretical drawbacks. First of all, it is not explained how an item score actually comes about. Rather, the item score is treated immediately as an error-contaminated measurement. Secondly, to be meaningful, a correlation coefficient requires scale properties of the item scores, i.e. interval scale level in most cases, that are more than questionable and not testable in practice. Thirdly, correlations are affected by the distribution of the respondents. Consequently, a different sample of respondents is very likely to yield a different picture. Finally, the limited range of actually possible item scores, e.g. 1 to 5, has an important impact on the correlation of two items: due to floor and ceiling effects, only those items that have similar means may show high item inter-correlations.

LTT models proceed on a totally different rationale. Rather than correlating manifest item scores, LTT models attempt to explain how a particular item score comes about. While classical approaches focus on summary statistics, i.e. variances and correlations, LTT refers primarily to individual responses. Which parameters are conceptualized in order to explain the respondents' answer behaviour depends on the particular type of LTT model.

However, there are two parameters all models have in common. The first one refers to the respondent's amount of the property to be measured, which is the ultimate goal of measuring. The second one parallels the first one but stands for the item's amount of the property. These parameters, along with others depending on the model, govern the answer behaviour. Classical approaches simply treat manifest item scores as meaningful data provided the scores from items measuring the same dimension show substantial correlations. LTT models examine whether empirical response patterns make sense and whether these patterns may be explained by item and person parameters, in other words, whether these patterns constitute measurement. To this end, items have to vary in the amount of the property to allow for determining likely and unlikely patterns of response. Furthermore, a wide range of items provides insight into what various levels of the property actually mean.

Foundations of Rasch Measurement Theory (RMT)

Notwithstanding the fact that the Rasch model (Rasch 1960/1980, see figure 1) shares some features with other LTT models, it has some unique properties, i.e. specific objectivity and raw score sufficiency and their consequences, that other models do not have, e.g. Birnbaum's logistic models for dichotomous items (Birnbaum 1968) and generalizations for polytomous items like the graded response model (Samejima 1969). It is these unique features that make up the measurement theory underlying the Rasch model, and all models adhering to these principles are termed Rasch models.
Figure 1: The Dichotomous Rasch Model (Rasch 1960/80, p.187): Parametrization and Graphical Depiction

$$P(a_{vi} = 1) = \frac{e^{\beta_v - \delta_i}}{e^{\beta_v - \delta_i} + 1}, \qquad P(a_{vi} = 0) = \frac{1}{e^{\beta_v - \delta_i} + 1}$$

β_v ... person location parameter
δ_i ... item location parameter
a_vi ... answer of person v to item i (0 = disagree, 1 = agree)

[Graph: item characteristic curve (ICC) of an item with δ_i = 0; vertical axis P(a_vi = x | δ_i, β_v), horizontal axis β_v]

All LTT models follow the concept of a latent dimension of the respondents' degree of the property to be measured. Respondents are scaled onto this dimension in terms of their attitude, satisfaction, propensity to buy, or whatever property is to be measured. The items are scaled onto this dimension as well, i.e. a common dimension of respondents and items is established. The parameter characterizing the item's location on the latent dimension expresses the amount of the property the item stands for. The Rasch measurement model then defines the probability that a given respondent agrees with a given item characterized by its location. Each item may be represented graphically by a curve, the item characteristic curve (ICC), depicting the probability of agreement depending on the respondent's location (see figure 1). In contrast to CTT based parameters of item location (the simple proportion of people agreeing to an item, or the more sophisticated item intercept in factor analysis), the item location within the Rasch model is sample independent provided the data fit the model.

The basic principle of the Rasch model is the principle of objectivity. Objectivity in this context means that a respondent's location must not depend on the specific items answered and, vice versa, an item's location must not depend on the specific respondents. Rasch (1960/1980) called this principle specific objectivity and deduced the model that follows necessarily (see also Fischer 1995). Only under the Rasch model is the unweighted raw score for respondents and items a sufficient statistic, i.e. the specific response patterns do not provide additional information. Consequently, maximum likelihood estimation of the parameters may be conditioned on these scores, and any assumptions concerning the distribution of the respondents are no longer necessary (see, e.g., Molenaar 1995, for parameter estimation techniques). The person and item parameters have interval scale properties. The unit of the scale is defined by the common item discrimination implicitly set to one, while the origin of the scale is usually defined by constraining the mean of the item parameters to zero.
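The two properties just described, the ICC and raw score sufficiency, can be illustrated with a minimal Python sketch. It is not part of the original paper: the function names and the three item locations are hypothetical, and the pattern probability assumes local stochastic independence. The sketch evaluates the dichotomous Rasch probability and then checks numerically that the probability of a response pattern, conditional on its raw score, does not change with the person location.

```python
import math
from itertools import product

def rasch_prob(beta, delta):
    """P(a_vi = 1) under the dichotomous Rasch model."""
    return math.exp(beta - delta) / (1.0 + math.exp(beta - delta))

def pattern_prob(pattern, beta, deltas):
    """Probability of a full 0/1 response pattern (assumes local independence)."""
    p = 1.0
    for x, delta in zip(pattern, deltas):
        p_agree = rasch_prob(beta, delta)
        p *= p_agree if x == 1 else 1.0 - p_agree
    return p

def conditional_pattern_prob(pattern, beta, deltas):
    """P(pattern | raw score): divide by the total probability of all
    patterns that share the same raw score."""
    raw_score = sum(pattern)
    same_score = [q for q in product((0, 1), repeat=len(deltas))
                  if sum(q) == raw_score]
    total = sum(pattern_prob(q, beta, deltas) for q in same_score)
    return pattern_prob(pattern, beta, deltas) / total

# Hypothetical item locations in log-units (not taken from the paper).
deltas = [-1.0, 0.0, 1.5]
pattern = (1, 1, 0)

for beta in (-2.0, 0.0, 2.0):
    print(beta,
          pattern_prob(pattern, beta, deltas),              # changes with beta
          conditional_pattern_prob(pattern, beta, deltas))  # does not
```

The unconditional pattern probability varies with the person location, whereas the conditional probability stays the same for every value of beta. This is the property that allows item parameters to be estimated without assumptions about the distribution of the respondents.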

As marketing research mostly employs multicategorical item scales (i.e., widely applied rating scales), the dichotomous Rasch model (as in figure 1) cannot be applied directly. However, the Rasch model may be generalized for polytomous items in a straightforward way without losing its key property, i.e. specific objectivity. The two most important models are the rating scale model (Andrich 1978) and the partial credit model (Masters 1982, Andrich 1988b, see figure 2). In the numerator, there is, as in the dichotomous model, the difference between the person location β_v and the item mean location δ_i. A positive difference contributes to a higher probability of agreement. The difference is multiplied by the score of the category because, e.g., choosing category 3 requires passing threshold 1 and threshold 2, which are theoretically independent. Furthermore, there is the negative of the sum of thresholds τ_ij in the numerator. Thus, the higher the thresholds, the lower the numerator and, consequently, the lower the probability of choosing an affirmative category. The denominator is simply the sum of all numerators, i.e. the numerators of all category probabilities, to ensure that all probabilities add up to one. Both the partial credit model and the rating scale model may be derived by applying the dichotomous Rasch model repeatedly to adjacent categories of polytomous items. Between any pair of adjacent categories a threshold parameter is modelled. Consequently, k answer categories call for k-1 threshold parameters. In the following we will concentrate on the rating scale model, which assumes a uniform scale across items, i.e. equal threshold distances across items but not necessarily within items.
Figure 2: General Polytomous Rasch Model (Andrich, 1988b, p.366)

$$P(a_{vi} = x \mid \beta_v, \delta_i, \tau_{ij},\ j = 1 \dots m,\ 0 < x \le m) = \frac{e^{-\sum_{j=1}^{x} \tau_{ij} + x(\beta_v - \delta_i)}}{\Upsilon}$$

with:

$$\Upsilon = 1 + \sum_{k=1}^{m} e^{-\sum_{j=1}^{k} \tau_{ij} + k(\beta_v - \delta_i)}$$

β_v ... person v location parameter
δ_i ... item i location parameter
τ_ij ... threshold j of item i parameter
m ... maximum score, number of categories - 1
a_vi ... answer of person v to item i (item score)

Andrich (1995a, 1995b) pointed out that, because polytomous Rasch models estimate the threshold parameters independently of each other, the empirical threshold parameters may or may not reflect the order that is hypothesized when setting up a polytomous answer scale. If the empirical threshold estimates are not properly ordered, i.e. they are reversed, the scale does not really work as intended and, in fact, lacks ordinal properties. In this case, adjacent categories should be collapsed, i.e. the scoring function assigns the same numbers to adjacent categories. However, further data have to be collected in order to cross-validate the new scale format.

The most important features of the Rasch model may be summarized as follows:
- the model provides a theory of how measurement is accomplished based on the principle of specific objectivity, namely by a comparison of an item and a person in the empirical domain, thereby establishing an interval scale for item and person parameters,
- the model may be falsified empirically, assessed by various tests of fit (which go beyond the scope of this paper),
- the model defines only one dimension, i.e. it rests on the prerequisite of unidimensionality; this prerequisite is subject to empirical falsification, however; the principle of local stochastic independence, i.e. the answer to one item is independent of the answer to a different item given the person parameter, is closely related to unidimensionality (see, e.g., Gustafsson 1980),
- the answer scale of a polytomous item is hypothesized to have ordinal properties which are subject to empirical falsification (reversed thresholds),
- for any person a specific answer pattern is expected to occur most likely (i.e. agreement with all items standing for less of the property than the person has, disagreement with all other items), offering opportunities to test for person fit.
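As an illustration of the polytomous model in figure 2, the following Python sketch computes the category probabilities of a single item and checks whether a set of thresholds is properly ordered. It is a sketch under assumptions rather than the estimation routine used in the paper: the function names and all numeric values are hypothetical, and the thresholds are taken as given instead of being estimated from data.

```python
import math

def category_probs(beta, delta, taus):
    """Category probabilities P(a_vi = x), x = 0..m, for one polytomous item.

    taus holds the m threshold parameters tau_i1..tau_im of the item.
    """
    # Exponent for x = 0 is 0; for x > 0 it is
    # -(tau_i1 + ... + tau_ix) + x * (beta - delta).
    exponents = [0.0]
    cumulative_tau = 0.0
    for x, tau in enumerate(taus, start=1):
        cumulative_tau += tau
        exponents.append(-cumulative_tau + x * (beta - delta))
    numerators = [math.exp(e) for e in exponents]
    denominator = sum(numerators)  # the normalising term of figure 2
    return [n / denominator for n in numerators]

def thresholds_ordered(taus):
    """True if every threshold is higher than the one before it."""
    return all(low < high for low, high in zip(taus, taus[1:]))

# Hypothetical thresholds of a five-category rating scale (m = 4).
taus = [-1.5, -0.5, 0.5, 1.5]
print(category_probs(beta=0.0, delta=0.0, taus=taus))
print(thresholds_ordered(taus))                    # properly ordered
print(thresholds_ordered([-1.5, 0.5, -0.5, 1.5]))  # reversed thresholds
```

With a single threshold set to zero the function reduces to the dichotomous model of figure 1. In the rating scale model the same list of thresholds would be shared by all items, whereas the partial credit model allows a separate list per item.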

The Application of the Rasch Model and Its Consequences For The Scale Development Process

The question arises whether the application of the Rasch model may simply be seen as a technique of analysis to be carried out instead of, or parallel to, classical techniques of scale analysis. From the perspective of item generation, the Rasch model and classical analysis differ substantially. While the classical approach requires covering as many facets as possible, the Rasch model additionally requires considering different levels of the construct to be measured. That is why the classical approach to scale development usually does not account for varying degrees of the property and is not very likely to provide a foundation for establishing a useful Rasch scale. Thus, the application of the Rasch model is more than an alternative way of mere data analysis. Rasch measurement represents a different philosophy of construct operationalization. It aims at developing a type of ruler, with the items representing the marks which the respondents are checked against. It provides a superior foundation for assessing content validity as well as construct validity since it gives insight into what various levels of the construct actually mean. A mere re-analysis of an existing scale is a priori not very likely to yield a Rasch scale with a wide range of item locations. The empirical example examines whether, in order to establish a Rasch scale, it is sufficient to go back to the original comprehensive item pool underlying the development of a widely used marketing scale, the CETSCALE, and, if it is not, how the content of the scale may be extended by including additional items.

Empirical Example

The CETSCALE (Shimp and Sharma 1987) has gained much popularity in consumer research since its introduction, as is demonstrated by the multiplicity of applications in national as well as cross-national marketing research (e.g. Herche 1992, Netemeyer et al. 1991, Durvasula et al. 1997, Good and Huddleston 1995, Steenkamp and Baumgartner 1998). The idea of consumer ethnocentric tendencies transfers the sociological concept of ethnocentrism to marketing and consumer research in that it focuses on the attitude towards foreign economies and their products as opposed to one's own domestic economy. Both the general level of a nation's consumer ethnocentric tendencies and the level within segments of consumers relevant to a company are obviously important for corporate location policy, product mix decisions, and corporate communication strategy.

The CETSCALE Data Set

Data were collected in Austria (n=974 listwise nonmissing respondents, self-administered interviews) based on a translated version of the whole set of 100 items that remained in the item pool after a judgmental panel screening of the originally 180 items generated to develop the CETSCALE (Shimp and Sharma 1987, Sinkovics 1999). The items' seven-point scale provides categories labelled as follows: fully disagree, partly disagree, somewhat disagree, neither disagree nor agree, somewhat agree, partly agree, and fully agree.

Rasch Based Analysis

Using a data set restricted to the original 17 CETSCALE items, Salzberger (1999) showed that six of these items may indeed be scaled successfully applying the Rasch model for polytomous data (rating scale model). Some of the thresholds were reversed, however, so two pairs of adjacent categories had to be collapsed, leading to a five-point rating scale. The range of item parameters amounted to a mere log-units with five items within approximately 0.2 log-units. Consequently, these items do not yield a profound understanding of the latent construct that goes beyond the expectation that the higher the ethnocentric tendencies, the higher the probability of agreement with nearly all items in the same way. The current analysis built upon these results.
It started with a conventional factor analysis (principal axis factoring) in order to ensure unidimensionality. As a cutoff criterion, a factor loading of .3 was chosen, which is rather small compared to CTT standards. The reason is that the correlation of the item and the factor may be reduced due to scale bounding effects, especially if the item is extraordinarily easy or hard to endorse. The remaining 65 items were analysed using the partial credit model implemented in RUMM 2.7 (Sheridan et al. 1997) as a rough screening of suitability. 25 items were retained. Subsequently, these items were analysed using the rating scale model. In line with the results of Salzberger (1999), the original seven-point Likert scale had to be transformed to a five-point rating scale in order to achieve a proper order of thresholds. At each step of parameter estimation, the worst significantly misfitting item (alpha = .001) in terms of a chi-square test of fit provided by RUMM 2.7, which compares model predicted probability and actual response behaviour, was deleted. Ultimately, a scale was derived containing ten items fitting the model. The established rating scale is very similar to that reported by Salzberger (1999), i.e. the threshold distances are almost identical. The same applies to the item location parameters as the mean of the thresholds of each item (detailed results are available upon request from the first author).

The striking outcome of the current analysis, however, is the fact that widening the base of the analysis from 17 to 100 items resulted in a mere increase of four additional items fitting the model, yielding only a small increase in the range of item locations from to log-units. While ten items might in principle suffice for most applications, the small range of item locations leads to two different problems. First, it increases the measurement error for respondents who do not fall into this small area, i.e. the area of the item locations considering the thresholds. (The measurement error for a specific person depends on the item information, which reaches a maximum when person and item location coincide.) Second, content validity is limited to the number of facets of the construct covered by the items. It remains unclear, however, what a certain degree of ethnocentric tendencies actually means. If there were a broader range of item locations, any non-extreme area on the scale would be associated with specific items agreed with and others disagreed with.

It should be noted that from the viewpoint of CTT the small range of item locations does not represent a severe problem at all. In fact, the whole item pool has proved to be designed for CTT based analyses. Consequently, following the LTT approach of measurement means more than (re-)analysing a data set whose creation has been guided by a different measurement paradigm, i.e. CTT. The measurement theory adhered to has a significant impact on the items generated and, eventually, on the data collected.
In other words, data are, at least in part, determined by the measurement paradigm chosen.

Extending the CETSCALE

Given the small range of item locations, a preliminary follow-up study aimed at widening the range of the items in terms of the amount of the property they stand for. To this end, 14 additional items were generated as an extension of the CETSCALE to cover the positive and negative extremes of the construct. The answer scale was confined to five categories. Based on a small convenience sample (n=80), these items were analysed together with 19 items stemming from the CETSCALE item pool to evaluate their locations. 26 items proved to fit the model. 16 items came from the CETSCALE item pool, ensuring that the basic concept to be measured stays the same. The other ten items, which were newly generated, successfully widened the range of item locations. The items of the extended CETSCALE differ as much as log-units in their locations (detailed results and a list of the items are available upon request from the author).

A Classical Re-analysis of the Extended Scale

In order to contrast the results with those based on the classical paradigm of scale development, a re-analysis using principal components analysis was carried out. The 26 items of the final Rasch scale yield a unidimensional set of indicators with a mean loading of .687 (ranging from .55 to .86) and a scale reliability of .96 (Cronbach's alpha). Thus, the scale developed by Rasch modelling proves tenable from the classical perspective of scale development. Certainly, the number of items has to be reduced for practical purposes. However, the approaches differ significantly in the way the number of items would be reduced. The Rasch approach would drop items based on their locations, i.e. for any region of the latent dimension at least one item has to be retained. Thereby the range of item locations would not be reduced since the extreme items would certainly be kept in the instrument. In contrast, the classical approach would discard items showing loadings below average. Not surprisingly, almost all of the extreme items show loadings below average. Consequently, the classical approach of item selection would lead to a narrower instrument in terms of item locations.

Implications and Conclusions

From a theoretical viewpoint, the Rasch measurement approach has the potential to lift measurement in consumer and marketing research to a higher level and provide a better foundation for managerial decision making. It provides a powerful foundation for assessing content and construct validity. The application of Rasch models to existing marketing scales is a good starting point for further dissemination. However, in the long run Rasch measurement should guide us from the beginning of scale development. The first step of the scale development process as outlined by Churchill (1979) should not be restricted to domain specification in terms of aspects to be considered but should also aim at covering a range of the construct as wide as possible.
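The link between the range of item locations and measurement error, noted in the empirical example, can be made concrete with a small Python sketch. It rests on simplifying assumptions that are not part of the paper: it uses the dichotomous Rasch model instead of the rating scale model, the item locations are invented for illustration, and the standard error is approximated by the inverse square root of the summed item information.

```python
import math

def item_information(beta, delta):
    """Information of a dichotomous Rasch item at person location beta: P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-(beta - delta)))
    return p * (1.0 - p)

def standard_error(beta, deltas):
    """Approximate standard error of the person estimate at beta."""
    total_information = sum(item_information(beta, d) for d in deltas)
    return 1.0 / math.sqrt(total_information)

# Hypothetical instruments with five items each (locations in log-units).
narrow = [-0.2, -0.1, 0.0, 0.1, 0.2]  # items clustered closely together
wide = [-3.0, -1.5, 0.0, 1.5, 3.0]    # items spread over the dimension

for beta in (0.0, 3.0):
    print(beta, standard_error(beta, narrow), standard_error(beta, wide))
```

Under these assumptions the narrow instrument is the more precise one for a respondent located in the middle of the item cluster, but its standard error grows quickly for a respondent located three log-units away, where the wide instrument still has an on-target item.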

A Rasch scale with widely varying item locations provides a deeper understanding of the construct, i.e. what it means for respondents to be located at a certain position on the latent dimension. Moreover, this is the prerequisite of precise measurement over a wide range, since measurement error increases strongly when items are off-target for the respondents.

References

Andrich, David (1978), A Rating Formulation for Ordered Response Categories, Psychometrika, 43 (4).
---- (1988a), Rasch Models for Measurement, Sage University Paper Series on Quantitative Applications in the Social Sciences 68, Beverly Hills: Sage.
---- (1988b), A General Form of Rasch's Extended Logistic Model for Partial Credit Scoring, Applied Measurement in Education, 1 (4).
---- (1995a), Models for Measurement, Precision and the Non-Dichotomization of Graded Responses, Psychometrika, 60 (1).
---- (1995b), Further Remarks on the Non-Dichotomization of Graded Responses, Psychometrika, 60 (1).
Balasubramian, Siva K. and Wagner A. Kamakura (1989), Measuring Consumer Attitudes Toward the Marketplace With Tailored Interviews, Journal of Marketing Research, 26 (3).
Birnbaum, Allan (1968), Some Latent Trait Models and Their Use in Inferring an Examinee's Ability, in: Statistical Theories of Mental Test Scores, Chapters 17-20, Eds. Frederic M. Lord and Melvin R. Novick, Reading (Mass.): Addison-Wesley.
Churchill, Gilbert A. (1979), A Paradigm for Developing Better Measures of Marketing Constructs, Journal of Marketing Research, 16.
Durvasula, Srinivas, Craig J. Andrews and Richard G. Netemeyer (1997), A Cross-Cultural Comparison of Consumer Ethnocentrism in the United States and Russia, Journal of International Consumer Marketing, 9 (4).
Fischer, Gerhard H. (1995), Derivations of the Rasch Model, in: Rasch Models: Foundations, Recent Developments, and Applications, Eds. Gerhard H. Fischer and Ivo W. Molenaar, New York: Springer.
Good, Linda and Patricia Huddleston (1995), Ethnocentrism of Polish and Russian Consumers: Are Feelings and Intentions Related?, International Marketing Review, 12 (5).
Gustafsson, Jan-Eric (1980), Testing and Obtaining Fit of Data to the Rasch Model, British Journal of Mathematical and Statistical Psychology, 32.
Herche, Joel (1992), A Note on the Predictive Validity of the CETSCALE, Journal of the Academy of Marketing Science, 20 (3).
Lord, Frederic M. (1980), Applications of Item Response Theory to Practical Testing Problems, Hillsdale, New Jersey: Lawrence Erlbaum Associates.
----, and Melvin R. Novick (1968), Statistical Theories of Mental Test Scores, Reading (Mass.): Addison-Wesley.
Masters, Geoffrey N. (1982), A Rasch Model for Partial Credit Scoring, Psychometrika, 47 (2).
Molenaar, Ivo W. (1995), Estimation of Item Parameters, in: Rasch Models: Foundations, Recent Developments, and Applications, Eds. Gerhard H. Fischer and Ivo W. Molenaar, New York: Springer.
Netemeyer, Richard G., Srinivas Durvasula and Donald R. Lichtenstein (1991), A Cross-National Assessment of the Reliability and Validity of the CETSCALE, Journal of Marketing Research, 28 (3).
Rasch, Georg (1960/1980), Probabilistic Models for Some Intelligence and Attainment Tests, Chicago: MESA Press. Reprint of the original publication in 1960 by the Danish Institute for Educational Research.
Salzberger, Thomas (1999), How the Rasch Model May Shift Our Perspective of Measurement in Marketing Research, Paper presented at the 1999 Australia and New Zealand Marketing Academy Conference (ANZMAC), Sydney.
----, Rudolf Sinkovics and Bodo B. Schlegelmilch (1999), Data Equivalence in Cross-Cultural Research: A Comparison of Classical Test Theory and Latent Trait Theory Based Approaches, Australasian Marketing Journal, 7 (2).
Samejima, Fumiko (1969), Estimation of Latent Ability Using a Response Pattern of Graded Responses, Psychometric Monograph, 17, Iowa City (IA): Psychometric Society.
Sheridan, Barry, David Andrich and Guanzhong Luo (1997), User's Guide to RUMM: Rasch Unidimensional Measurement Models, Perth: RUMM Laboratory.
Shimp, Terence A. and Subhash Sharma (1987), Consumer Ethnocentrism: Construction and Validation of the CETSCALE, Journal of Marketing Research, 24 (3).
Singh, Jagdip (1996), A Latent Trait Theory Approach to Measurement Issues in Marketing Research: Principles, Relevance and Application, Proceedings of the EMAC Annual Conference, Budapest University of Economic Sciences, Vol. 1, Eds. József Berács, András Bauer and Judith Simon.
----, Roy D. Howell and Gary K. Rhoads (1990), Adaptive Designs for Likert-Type Data: An Approach for Implementing Marketing Surveys, Journal of Marketing Research, 27 (3).
Sinkovics, Rudolf R. (1999), Ethnozentrismus und Konsumentenverhalten [Ethnocentrism and Consumer Behaviour], Wiesbaden: Deutscher Universitätsverlag.
Soutar, Geoffrey N., Richard Bell and Yvonne Wallis (1990), Consumer Acquisition Patterns for Durable Goods: A Rasch Analysis, Asia Pacific International Journal of Marketing, 2 (1).
----, and Steven P. Cornish-Ward (1997), Ownership Patterns for Durable Goods and Financial Assets: A Rasch Analysis, Applied Economics, 29.
----, and Maria M. Ryan (1999), People's Leisure Activities: A Logistic Modelling Approach, Paper presented at the 1999 Australia and New Zealand Marketing Academy Conference (ANZMAC), Sydney.
Steenkamp, Jan-Benedict E.M. and Hans Baumgartner (1998), Assessing Measurement Invariance in Cross-National Consumer Research, Journal of Consumer Research, 25.


More information

AN ALTERNATE APPROACH TO ASSESSING CROSS-CULTURAL MEASUREMENT EQUIVALENCE IN ADVERTISING RESEARCH

AN ALTERNATE APPROACH TO ASSESSING CROSS-CULTURAL MEASUREMENT EQUIVALENCE IN ADVERTISING RESEARCH AN ALTERNATE APPROACH TO ASSESSING CROSS-CULTURAL MEASUREMENT EQUIVALENCE IN ADVERTISING RESEARCH Michael T. Ewing, Thomas Salzberger, and Rudolf R. Sinkovics ABSTRACT: This paper offers a new methodological

More information

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION Iweka Fidelis (Ph.D) Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt,

More information

A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model

A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model A Comparison of Pseudo-Bayesian and Joint Maximum Likelihood Procedures for Estimating Item Parameters in the Three-Parameter IRT Model Gary Skaggs Fairfax County, Virginia Public Schools José Stevenson

More information

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that

More information

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University.

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University. Running head: ASSESS MEASUREMENT INVARIANCE Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies Xiaowen Zhu Xi an Jiaotong University Yanjie Bian Xi an Jiaotong

More information

Raschmätning [Rasch Measurement]

Raschmätning [Rasch Measurement] Raschmätning [Rasch Measurement] Forskarutbildningskurs vid Karlstads universitet höstterminen 2014 Kursen anordnas av Centrum för forskning om barns och ungdomars psykiska hälsa och avdelningen för psykologi.

More information

MEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS

MEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS MEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS Farzilnizam AHMAD a, Raymond HOLT a and Brian HENSON a a Institute Design, Robotic & Optimizations (IDRO), School of Mechanical

More information

A TEST OF A MULTI-FACETED, HIERARCHICAL MODEL OF SELF-CONCEPT. Russell F. Waugh. Edith Cowan University

A TEST OF A MULTI-FACETED, HIERARCHICAL MODEL OF SELF-CONCEPT. Russell F. Waugh. Edith Cowan University A TEST OF A MULTI-FACETED, HIERARCHICAL MODEL OF SELF-CONCEPT Russell F. Waugh Edith Cowan University Paper presented at the Australian Association for Research in Education Conference held in Melbourne,

More information

AND ITS VARIOUS DEVICES. Attitude is such an abstract, complex mental set. up that its measurement has remained controversial.

AND ITS VARIOUS DEVICES. Attitude is such an abstract, complex mental set. up that its measurement has remained controversial. CHAPTER III attitude measurement AND ITS VARIOUS DEVICES Attitude is such an abstract, complex mental set up that its measurement has remained controversial. Psychologists studied attitudes of individuals

More information

Evaluating the quality of analytic ratings with Mokken scaling

Evaluating the quality of analytic ratings with Mokken scaling Psychological Test and Assessment Modeling, Volume 57, 2015 (3), 423-444 Evaluating the quality of analytic ratings with Mokken scaling Stefanie A. Wind 1 Abstract Greatly influenced by the work of Rasch

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories Kamla-Raj 010 Int J Edu Sci, (): 107-113 (010) Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories O.O. Adedoyin Department of Educational Foundations,

More information

The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective

The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Vol. 9, Issue 5, 2016 The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Kenneth D. Royal 1 Survey Practice 10.29115/SP-2016-0027 Sep 01, 2016 Tags: bias, item

More information

ch1 1. What is the relationship between theory and each of the following terms: (a) philosophy, (b) speculation, (c) hypothesis, and (d) taxonomy?

ch1 1. What is the relationship between theory and each of the following terms: (a) philosophy, (b) speculation, (c) hypothesis, and (d) taxonomy? ch1 Student: 1. What is the relationship between theory and each of the following terms: (a) philosophy, (b) speculation, (c) hypothesis, and (d) taxonomy? 2. What is the relationship between theory and

More information

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns The Influence of Test Characteristics on the Detection of Aberrant Response Patterns Steven P. Reise University of California, Riverside Allan M. Due University of Minnesota Statistical methods to assess

More information

Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching

Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching Kelly D. Bradley 1, Linda Worley, Jessica D. Cunningham, and Jeffery P. Bieber University

More information

INTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT. Basic Concepts, Parameters and Statistics

INTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT. Basic Concepts, Parameters and Statistics INTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT Basic Concepts, Parameters and Statistics The designations employed and the presentation of material in this information product

More information

Examining Factors Affecting Language Performance: A Comparison of Three Measurement Approaches

Examining Factors Affecting Language Performance: A Comparison of Three Measurement Approaches Pertanika J. Soc. Sci. & Hum. 21 (3): 1149-1162 (2013) SOCIAL SCIENCES & HUMANITIES Journal homepage: http://www.pertanika.upm.edu.my/ Examining Factors Affecting Language Performance: A Comparison of

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

By Hui Bian Office for Faculty Excellence

By Hui Bian Office for Faculty Excellence By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 1001 Joyner Library, room 1006 Office hours: 8:00am-5:00pm, Monday-Friday 2 Educational tests and regular surveys

More information

Shiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p )

Shiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p ) Rasch Measurementt iin Language Educattiion Partt 2:: Measurementt Scalles and Invariiance by James Sick, Ed.D. (J. F. Oberlin University, Tokyo) Part 1 of this series presented an overview of Rasch measurement

More information

Measuring the External Factors Related to Young Alumni Giving to Higher Education. J. Travis McDearmon, University of Kentucky

Measuring the External Factors Related to Young Alumni Giving to Higher Education. J. Travis McDearmon, University of Kentucky Measuring the External Factors Related to Young Alumni Giving to Higher Education Kathryn Shirley Akers 1, University of Kentucky J. Travis McDearmon, University of Kentucky 1 1 Please use Kathryn Akers

More information

Using the Partial Credit Model

Using the Partial Credit Model A Likert-type Data Analysis Using the Partial Credit Model Sun-Geun Baek Korean Educational Development Institute This study is about examining the possibility of using the partial credit model to solve

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

Does factor indeterminacy matter in multi-dimensional item response theory?

Does factor indeterminacy matter in multi-dimensional item response theory? ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional

More information

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati. Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions

More information

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data Item Response Theory: Methods for the Analysis of Discrete Survey Response Data ICPSR Summer Workshop at the University of Michigan June 29, 2015 July 3, 2015 Presented by: Dr. Jonathan Templin Department

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION

More information

A typology of polytomously scored mathematics items disclosed by the Rasch model: implications for constructing a continuum of achievement

A typology of polytomously scored mathematics items disclosed by the Rasch model: implications for constructing a continuum of achievement A typology of polytomously scored mathematics items 1 A typology of polytomously scored mathematics items disclosed by the Rasch model: implications for constructing a continuum of achievement John van

More information

Item Analysis: Classical and Beyond

Item Analysis: Classical and Beyond Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides

More information

Jason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the

Jason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the Performance of Ability Estimation Methods for Writing Assessments under Conditio ns of Multidime nsionality Jason L. Meyers Ahmet Turhan Steven J. Fitzpatrick Pearson Paper presented at the annual meeting

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

Meeting Feynman: Bringing light into the black box of social measurement

Meeting Feynman: Bringing light into the black box of social measurement Journal of Physics: Conference Series PAPER OPEN ACCESS Meeting Feynman: Bringing light into the black box of social measurement To cite this article: Thomas Salzberger 2018 J. Phys.: Conf. Ser. 1065 072035

More information

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus

More information

Latent Trait Standardization of the Benzodiazepine Dependence. Self-Report Questionnaire using the Rasch Scaling Model

Latent Trait Standardization of the Benzodiazepine Dependence. Self-Report Questionnaire using the Rasch Scaling Model Chapter 7 Latent Trait Standardization of the Benzodiazepine Dependence Self-Report Questionnaire using the Rasch Scaling Model C.C. Kan 1, A.H.G.S. van der Ven 2, M.H.M. Breteler 3 and F.G. Zitman 1 1

More information

Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan

Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Connexion of Item Response Theory to Decision Making in Chess Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Acknowledgement A few Slides have been taken from the following presentation

More information

THE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS

THE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS THE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS Russell F. Waugh Edith Cowan University Key words: attitudes, graduates, university, measurement Running head: COURSE EXPERIENCE

More information

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES IN GHANA.

THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES IN GHANA. Africa Journal of Teacher Education ISSN 1916-7822. A Journal of Spread Corporation Vol. 6 No. 1 2017 Pages 56-64 THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES

More information

André Cyr and Alexander Davies

André Cyr and Alexander Davies Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander

More information

Exploring rater errors and systematic biases using adjacent-categories Mokken models

Exploring rater errors and systematic biases using adjacent-categories Mokken models Psychological Test and Assessment Modeling, Volume 59, 2017 (4), 493-515 Exploring rater errors and systematic biases using adjacent-categories Mokken models Stefanie A. Wind 1 & George Engelhard, Jr.

More information

Introduction to Measurement

Introduction to Measurement This is a chapter excerpt from Guilford Publications. The Theory and Practice of Item Response Theory, by R. J. de Ayala. Copyright 2009. 1 Introduction to Measurement I often say that when you can measure

More information

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida

Research and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality

More information

Psychological testing

Psychological testing Psychological testing Lecture 12 Mikołaj Winiewski, PhD Test Construction Strategies Content validation Empirical Criterion Factor Analysis Mixed approach (all of the above) Content Validation Defining

More information

Construct Validity of Mathematics Test Items Using the Rasch Model

Construct Validity of Mathematics Test Items Using the Rasch Model Construct Validity of Mathematics Test Items Using the Rasch Model ALIYU, R.TAIWO Department of Guidance and Counselling (Measurement and Evaluation Units) Faculty of Education, Delta State University,

More information

Development, Standardization and Application of

Development, Standardization and Application of American Journal of Educational Research, 2018, Vol. 6, No. 3, 238-257 Available online at http://pubs.sciepub.com/education/6/3/11 Science and Education Publishing DOI:10.12691/education-6-3-11 Development,

More information

CHAPTER - III METHODOLOGY CONTENTS. 3.1 Introduction. 3.2 Attitude Measurement & its devices

CHAPTER - III METHODOLOGY CONTENTS. 3.1 Introduction. 3.2 Attitude Measurement & its devices 102 CHAPTER - III METHODOLOGY CONTENTS 3.1 Introduction 3.2 Attitude Measurement & its devices 3.2.1. Prior Scales 3.2.2. Psychophysical Scales 3.2.3. Sigma Scales 3.2.4. Master Scales 3.3 Attitude Measurement

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 39 Evaluation of Comparability of Scores and Passing Decisions for Different Item Pools of Computerized Adaptive Examinations

More information

UvA-DARE (Digital Academic Repository)

UvA-DARE (Digital Academic Repository) UvA-DARE (Digital Academic Repository) Standaarden voor kerndoelen basisonderwijs : de ontwikkeling van standaarden voor kerndoelen basisonderwijs op basis van resultaten uit peilingsonderzoek van der

More information

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, The Western Aphasia Battery (WAB) (Kertesz, 1982) is used to classify aphasia by classical type, measure overall severity, and measure change over time. Despite its near-ubiquitousness, it has significant

More information

how good is the Instrument? Dr Dean McKenzie

how good is the Instrument? Dr Dean McKenzie how good is the Instrument? Dr Dean McKenzie BA(Hons) (Psychology) PhD (Psych Epidemiology) Senior Research Fellow (Abridged Version) Full version to be presented July 2014 1 Goals To briefly summarize

More information

Factors Influencing Undergraduate Students Motivation to Study Science

Factors Influencing Undergraduate Students Motivation to Study Science Factors Influencing Undergraduate Students Motivation to Study Science Ghali Hassan Faculty of Education, Queensland University of Technology, Australia Abstract The purpose of this exploratory study was

More information

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida

Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida and Oleksandr S. Chernyshenko University of Canterbury Presented at the New CAT Models

More information

Linking Assessments: Concept and History

Linking Assessments: Concept and History Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.

More information

Centre for Education Research and Policy

Centre for Education Research and Policy THE EFFECT OF SAMPLE SIZE ON ITEM PARAMETER ESTIMATION FOR THE PARTIAL CREDIT MODEL ABSTRACT Item Response Theory (IRT) models have been widely used to analyse test data and develop IRT-based tests. An

More information

Development and psychometric evaluation of scales to measure professional confidence in manual medicine: a Rasch measurement approach

Development and psychometric evaluation of scales to measure professional confidence in manual medicine: a Rasch measurement approach Hecimovich et al. BMC Research Notes 2014, 7:338 RESEARCH ARTICLE Open Access Development and psychometric evaluation of scales to measure professional confidence in manual medicine: a Rasch measurement

More information

Nearest-Integer Response from Normally-Distributed Opinion Model for Likert Scale

Nearest-Integer Response from Normally-Distributed Opinion Model for Likert Scale Nearest-Integer Response from Normally-Distributed Opinion Model for Likert Scale Jonny B. Pornel, Vicente T. Balinas and Giabelle A. Saldaña University of the Philippines Visayas This paper proposes that

More information

1. Evaluate the methodological quality of a study with the COSMIN checklist

1. Evaluate the methodological quality of a study with the COSMIN checklist Answers 1. Evaluate the methodological quality of a study with the COSMIN checklist We follow the four steps as presented in Table 9.2. Step 1: The following measurement properties are evaluated in the

More information

Computerized Mastery Testing

Computerized Mastery Testing Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating

More information

Validity and reliability of measurements

Validity and reliability of measurements Validity and reliability of measurements 2 3 Request: Intention to treat Intention to treat and per protocol dealing with cross-overs (ref Hulley 2013) For example: Patients who did not take/get the medication

More information

Item Response Theory. Author's personal copy. Glossary

Item Response Theory. Author's personal copy. Glossary Item Response Theory W J van der Linden, CTB/McGraw-Hill, Monterey, CA, USA ã 2010 Elsevier Ltd. All rights reserved. Glossary Ability parameter Parameter in a response model that represents the person

More information

APPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH

APPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH APPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH Margaret Wu & Ray Adams Documents supplied on behalf of the authors by Educational Measurement Solutions TABLE OF CONTENT CHAPTER

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

MEASURING SUBJECTIVE HEALTH AMONG ADOLESCENTS IN SWEDEN A Rasch-analysis of the HBSC Instrument

MEASURING SUBJECTIVE HEALTH AMONG ADOLESCENTS IN SWEDEN A Rasch-analysis of the HBSC Instrument CURT HAGQUIST and DAVID ANDRICH MEASURING SUBJECTIVE HEALTH AMONG ADOLESCENTS IN SWEDEN A Rasch-analysis of the HBSC Instrument (Accepted 27 June 2003) ABSTRACT. The cross-national WHO-study Health Behaviour

More information

Measurement Invariance (MI): a general overview

Measurement Invariance (MI): a general overview Measurement Invariance (MI): a general overview Eric Duku Offord Centre for Child Studies 21 January 2015 Plan Background What is Measurement Invariance Methodology to test MI Challenges with post-hoc

More information

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Mantel-Haenszel Procedures for Detecting Differential Item Functioning A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of

More information

Information Structure for Geometric Analogies: A Test Theory Approach

Information Structure for Geometric Analogies: A Test Theory Approach Information Structure for Geometric Analogies: A Test Theory Approach Susan E. Whitely and Lisa M. Schneider University of Kansas Although geometric analogies are popular items for measuring intelligence,

More information

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation

More information

Recent advances in analysis of differential item functioning in health research using the Rasch model

Recent advances in analysis of differential item functioning in health research using the Rasch model Hagquist and Andrich Health and Quality of Life Outcomes (2017) 15:181 DOI 10.1186/s12955-017-0755-0 RESEARCH Open Access Recent advances in analysis of differential item functioning in health research

More information

Agreement Coefficients and Statistical Inference

Agreement Coefficients and Statistical Inference CHAPTER Agreement Coefficients and Statistical Inference OBJECTIVE This chapter describes several approaches for evaluating the precision associated with the inter-rater reliability coefficients of the

More information

Thriving in College: The Role of Spirituality. Laurie A. Schreiner, Ph.D. Azusa Pacific University

Thriving in College: The Role of Spirituality. Laurie A. Schreiner, Ph.D. Azusa Pacific University Thriving in College: The Role of Spirituality Laurie A. Schreiner, Ph.D. Azusa Pacific University WHAT DESCRIBES COLLEGE STUDENTS ON EACH END OF THIS CONTINUUM? What are they FEELING, DOING, and THINKING?

More information

[3] Coombs, C.H., 1964, A theory of data, New York: Wiley.

[3] Coombs, C.H., 1964, A theory of data, New York: Wiley. Bibliography [1] Birnbaum, A., 1968, Some latent trait models and their use in inferring an examinee s ability, In F.M. Lord & M.R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479),

More information
