An Alternative Way of Establishing Measurement in Marketing Research: Its Implications for Scale Development and Validity
Thomas Salzberger
University of Economics and Business Administration, Vienna (WU-Wien)

Abstract

Quantitative consumer and marketing research looks back on an era of construct operationalization predominantly based on classical test theory as the technical framework of scale development. Rasch measurement theory provides an alternative framework of measurement. Previous studies demonstrated the potential of Rasch measurement for marketing research from a theoretical viewpoint and reported applications of Rasch measurement models to existing marketing scales. This paper focuses on the fact that the Rasch model explicitly accounts for the different amounts of the construct that respondents need in order to agree with different items. Each item is characterized by an item parameter, i.e. the item location, that expresses the amount of the property to be measured that the item stands for. Whereas the foundation of the Rasch model, i.e. specific objectivity, provides evidence of construct validity of a scale that fits the model, the range of item locations spans the latent dimension and gives insight into the meaning of different levels of the construct and thereby adds to content validity. An empirical example shows that applying Rasch models to existing scales does not reveal the full potential of the model, even if a comprehensive, albeit classical, item pool is referred to. Consequently, only newly generated items are likely to span a wide range of item locations and thereby provide content validity.

Introduction

When developing instruments to measure latent constructs in marketing and consumer behaviour research, the procedure suggested by Churchill (1979) has been routinely applied, and it has been adopted by most textbooks in marketing research. This procedure rests on the basics of classical test theory (CTT, Lord and Novick 1968).
During the last decade, an alternative framework of measurement has been introduced to marketing research (e.g. Soutar et al. 1990, Soutar and Cornish-Ward 1997, Soutar and Ryan 1999, Salzberger et al. 1999, Balasubramian and Kamakura 1989, Singh et al. 1990, Singh 1996). From a general perspective, this alternative measurement theory may be referred to as item response theory (IRT, Lord 1980) or latent trait theory (LTT). A special class of models, the family of Rasch models, stands out, though, by featuring special properties which more general models do not share. This paper does not offer a comprehensive introduction to LTT models in general or Rasch models in particular (see, e.g., Andrich 1988a) but focuses on the specific consequences of Rasch measurement theory (RMT) for the process of scale development.

The Principles of Latent Trait Theory

Since RMT lies outside the mainstream paradigm of construct operationalization in marketing research and, consequently, most marketing scholars are not fully familiar with it, a short introduction is provided that highlights the most fundamental differences between RMT and classical approaches.

The classical approach is based essentially on the principle of correlation. From a comprehensive pool of items, those are retained that show high loadings in factor analysis and contribute to reliability, i.e. their exclusion would decrease reliability. Both criteria require high item inter-correlations. This approach entails some theoretical drawbacks. First of all, it does not explain how an item score actually comes about. Rather, the item score is treated immediately as an error-contaminated measurement. Secondly, to be meaningful, a correlation coefficient requires scale properties of the item scores, i.e. interval scale level in most cases, that are more than questionable and not testable in practice. Thirdly, correlations are affected by the distribution of the respondents.
Consequently, a different sample of respondents is very likely to yield a different picture. Finally, the limited range of possible item scores, e.g. 1 to 5, has an important impact on the correlation of two items: due to floor and ceiling effects, only items that have similar means can show high item inter-correlations.

LTT models proceed on a totally different rationale. Rather than correlating manifest item scores, LTT models attempt to explain how a particular item score comes about. While classical approaches focus on summary statistics, i.e. variances and correlations, LTT refers primarily to individual responses. Which parameters are conceptualized in order to explain the respondents' answer behaviour depends on the particular type of LTT model.
However, there are two parameters all models have in common. The first one refers to the respondent's amount of the property to be measured, which is the ultimate goal of measuring. The second one parallels the first one but stands for the item's amount of the property. These parameters, along with others depending on the model, govern the answer behaviour. Classical approaches simply treat manifest item scores as meaningful data provided the scores from items measuring the same dimension show substantial correlations. LTT models examine whether empirical response patterns make sense and whether these patterns may be explained by item and person parameters, in other words, whether these patterns constitute measurement. To this end, items have to vary in the amount of the property so that likely and unlikely patterns of response can be determined. Furthermore, a wide range of items provides insight into what various levels of the property actually mean.

Foundations of Rasch Measurement Theory (RMT)

Notwithstanding the fact that the Rasch model (Rasch 1960/1980, see figure 1) shares some features with other LTT models, it has some unique properties, i.e. specific objectivity and raw score sufficiency and their consequences, that other models do not have, e.g. Birnbaum's logistic models for dichotomous items (Birnbaum 1968) and generalizations for polytomous items such as the graded response model (Samejima 1969). It is these unique features that make up the measurement theory underlying the Rasch model, and all models adhering to these principles are termed Rasch models.
Figure 1: The Dichotomous Rasch Model (Rasch 1960/80, p.187): Parametrization and Graphical Depiction

$$P(a_{vi} = 1) = \frac{e^{\beta_v - \delta_i}}{1 + e^{\beta_v - \delta_i}}, \qquad P(a_{vi} = 0) = \frac{1}{1 + e^{\beta_v - \delta_i}}$$

with:
$\beta_v$ ... person location parameter
$\delta_i$ ... item location parameter
$a_{vi}$ ... answer of person v to item i (0 = disagree, 1 = agree)

[Plot: item characteristic curve (ICC), showing $P(a_{vi} = 1 \mid \delta_i = 0, \beta_v)$ as a function of the person location $\beta_v$]

All LTT models follow the concept of a latent dimension of the respondents' degree of the property to be measured. Respondents are scaled onto this dimension in terms of their attitude, satisfaction, propensity to buy, or whatever property is of interest. The items are scaled onto this dimension as well, i.e. a common dimension of respondents and items is established. The parameter characterizing the item's location on the latent dimension expresses the amount of the property the item stands for. The Rasch measurement model then defines the probability that a given respondent agrees with a given item characterized by its location. Each item may be represented graphically by a curve, the item characteristic curve (ICC), depicting the probability of agreement depending on the respondent's location (see figure 1). In contrast to CTT based parameters of item location (the simple proportion of people agreeing with an item, or the more sophisticated item intercept in factor analysis), the item location within the Rasch model is sample independent provided the data fit the model.

The basic principle of the Rasch model is the principle of objectivity. Objectivity in this context means that the respondent's location must not depend on the specific items answered and, vice versa, the item's location must not depend on the specific respondents. Rasch (1960/1980) called this principle specific objectivity and deduced the model that follows necessarily (see also Fischer 1995). Only under the Rasch model is the unweighted raw score for respondents and items a sufficient statistic, i.e. the specific response patterns do not provide additional information.
Consequently, maximum likelihood estimation of the parameters may be conditioned on these scores, and any assumptions concerning the distribution of the respondents are no longer necessary (see, e.g., Molenaar 1995, for parameter estimation techniques). The person and item parameters have interval scale properties. The unit of the scale is defined by the common item discrimination implicitly set to one, while the origin of the scale is usually defined by constraining the mean of the item parameters to zero.
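The dichotomous model in figure 1 and the idea of specific objectivity can be illustrated with a minimal Python sketch. The item and person locations below are hypothetical values chosen for illustration, not estimates from this paper, and the function names are assumptions of the sketch. It shows that the comparison of two items on the log-odds scale comes out the same for every respondent, whatever the respondent's location.

```python
import math


def rasch_probability(beta: float, delta: float) -> float:
    """Probability of agreement, P(a_vi = 1), in the dichotomous Rasch model."""
    return math.exp(beta - delta) / (1.0 + math.exp(beta - delta))


def log_odds(p: float) -> float:
    """Log-odds (logit) of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical item and person locations in log-units (not from the paper).
ITEM_LOCATIONS = {"easy_item": -1.0, "hard_item": 1.5}
PERSON_LOCATIONS = [-2.0, 0.0, 2.0]


def item_comparison(beta: float) -> float:
    """Log-odds difference between the two items for one respondent."""
    p_easy = rasch_probability(beta, ITEM_LOCATIONS["easy_item"])
    p_hard = rasch_probability(beta, ITEM_LOCATIONS["hard_item"])
    return log_odds(p_easy) - log_odds(p_hard)


if __name__ == "__main__":
    for beta in PERSON_LOCATIONS:
        # The difference equals delta_hard - delta_easy for every beta.
        print(f"beta = {beta:+.1f}: item comparison = {item_comparison(beta):.3f}")
```

In this sketch the person location cancels out of the item comparison, which is the property that allows item parameters to be estimated without assumptions about the distribution of the respondents.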
As marketing research mostly employs multicategorical item scales (i.e., the widely applied rating scales), the dichotomous Rasch model (as in figure 1) is not applicable. However, the Rasch model may be generalized for polytomous items in a straightforward way without losing its key property, i.e. specific objectivity. The two most important models are the rating scale model (Andrich 1978) and the partial credit model (Masters 1982, Andrich 1988b, see figure 2). In the numerator, there is, as in the dichotomous model, the difference between the person location $\beta_v$ and the item mean location $\delta_i$. A positive difference contributes to a higher probability of agreement. The difference is multiplied by the score of the category because, e.g., choosing category 3 requires passing threshold 1 and threshold 2, which are theoretically independent. Furthermore, there is the negative of the sum of the thresholds $\tau_{ij}$ in the numerator. Thus, the higher the thresholds, the lower the numerator and, consequently, the lower the probability of choosing an affirmative category. The denominator is simply the sum of the numerators of all category probabilities, which ensures that all probabilities add up to one.

Both the partial credit model and the rating scale model may be derived by applying the dichotomous Rasch model repeatedly to adjacent categories of polytomous items. Between any pair of adjacent categories a threshold parameter is modelled. Consequently, k answer categories call for k-1 threshold parameters. In the following we will concentrate on the rating scale model, which assumes a uniform scale across items, i.e. equal threshold distances across items but not necessarily within items.
Figure 2: General Polytomous Rasch Model (Andrich, 1988b, p.366)

$$P(a_{vi} = x \mid \beta_v, \tau_{ij},\ j = 1, \dots, m,\ 0 < x \le m) = \frac{e^{-\sum_{j=1}^{x} \tau_{ij} + x(\beta_v - \delta_i)}}{\gamma}$$

with:

$$\gamma = 1 + \sum_{k=1}^{m} e^{-\sum_{j=1}^{k} \tau_{ij} + k(\beta_v - \delta_i)}$$

$\beta_v$ ... person v location parameter
$\delta_i$ ... item i location parameter
$\tau_{ij}$ ... threshold j of item i parameter
$m$ ... maximum score, number of categories - 1
$a_{vi}$ ... answer of person v to item i (item score)

Andrich (1995a, 1995b) pointed out that, because polytomous Rasch models estimate the threshold parameters independently of each other, the empirical threshold parameters may or may not reflect the order that is hypothesized when setting up a polytomous answer scale. If the empirical threshold estimates are not properly ordered, i.e. they are reversed, the scale does not really work as intended and, in fact, lacks ordinal properties. In this case, adjacent categories should be collapsed, i.e. the scoring function assigns the same numbers to adjacent categories. However, further data have to be collected in order to cross-validate the new scale format.

The most important features of the Rasch model may be summarized as follows:
- the model provides a theory of how measurement is accomplished, based on the principle of specific objectivity, namely by comparing an item and a person in the empirical domain, thereby establishing an interval scale for item and person parameters;
- the model may be falsified empirically, as assessed by various tests of fit (which go beyond the scope of this paper);
- the model defines only one dimension, i.e. it rests on the prerequisite of unidimensionality; this prerequisite is, however, subject to empirical falsification;
- the principle of local stochastic independence, i.e. the answer to one item is independent of the answer to a different item given the person parameter, is closely related to unidimensionality (see, e.g., Gustafsson 1980);
- the answer scale of a polytomous item is hypothesized to have ordinal properties, which are subject to empirical falsification (reversed thresholds);
- for any person, a specific answer pattern is expected to occur most likely (i.e. agreement with all items standing for less of the property than the person has, and disagreement with all other items), which offers opportunities to test for person fit.
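The general polytomous model of figure 2 and the check for reversed thresholds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the threshold values are hypothetical, the function names are invented for the sketch, and the probability of the lowest category (score 0) is taken to be 1 divided by the denominator, as implied by the requirement that all category probabilities add up to one.

```python
import math
from typing import List


def category_probabilities(beta: float, delta: float, taus: List[float]) -> List[float]:
    """Probabilities of the scores 0..m for one person and one item.

    taus holds the m threshold parameters tau_i1..tau_im of the item.
    """
    numerators = [1.0]  # score 0: empty threshold sum, exponent 0
    cumulative_tau = 0.0
    for score, tau in enumerate(taus, start=1):
        cumulative_tau += tau
        numerators.append(math.exp(-cumulative_tau + score * (beta - delta)))
    denominator = sum(numerators)  # sum of the numerators of all categories
    return [n / denominator for n in numerators]


def thresholds_ordered(taus: List[float]) -> bool:
    """True if every threshold is higher than the preceding one."""
    return all(lower < upper for lower, upper in zip(taus, taus[1:]))


if __name__ == "__main__":
    # Hypothetical thresholds of a five-category item (m = 4).
    taus = [-1.5, -0.5, 0.5, 1.5]
    probs = category_probabilities(beta=0.8, delta=0.0, taus=taus)
    print([round(p, 3) for p in probs])
    print("thresholds ordered:", thresholds_ordered(taus))
    # A reversed pair of thresholds would suggest collapsing adjacent categories.
    print("thresholds ordered:", thresholds_ordered([-1.5, 0.5, -0.5, 1.5]))
```

With a single threshold set to zero the function reduces to the dichotomous model of figure 1, which reflects the derivation of the polytomous models from the dichotomous model applied to adjacent categories.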
The Application of the Rasch Model and Its Consequences for the Scale Development Process

The question arises whether the application of the Rasch model may simply be seen as a technique of analysis to be carried out instead of, or parallel to, classical techniques of scale analysis. From the perspective of item generation, the Rasch model and classical analysis differ substantially. While the classical approach requires covering as many facets as possible, the Rasch model additionally requires considering different levels of the construct to be measured. Because the classical approach to scale development usually does not account for varying degrees of the property, it is not very likely to provide a foundation for establishing a useful Rasch scale. Thus, the application of the Rasch model is more than an alternative way of mere data analysis. Rasch measurement represents a different philosophy of construct operationalization. It aims at developing a type of ruler, with the items representing the marks against which the respondents are checked. It provides a superior foundation for assessing content validity as well as construct validity since it gives insight into what various levels of the construct actually mean. A mere re-analysis of an existing scale is a priori not very likely to yield a Rasch scale with a wide range of item locations.

The empirical example examines whether, in order to establish a Rasch scale, it is sufficient to go back to the original comprehensive item pool underlying the development of a widely used marketing scale, the CETSCALE, and, if it is not, how the content of the scale may be extended by including additional items.

Empirical Example

The CETSCALE (Shimp and Sharma 1987) has gained much popularity in consumer research since its introduction, as is demonstrated by the multiplicity of applications in national as well as cross-national marketing research (e.g. Herche 1992, Netemeyer et al. 1991, Durvasula et al. 1997, Good and Huddleston 1995, Steenkamp and Baumgartner 1998). The idea of consumer ethnocentric tendencies transfers the sociological concept of ethnocentrism to marketing and consumer research in that it focuses on the attitude towards foreign economies and their products as opposed to one's own domestic economy. Both the general level of a nation's consumer ethnocentric tendencies and the level within segments of consumers relevant to a company are obviously important for corporate location policy, product mix decisions, and corporate communication strategy.

The CETSCALE Data Set

Data were collected in Austria (n=974 listwise nonmissing respondents, self-administered interviews) based on a translated version of the whole set of 100 items that remained in the item pool after a judgmental panel screening of the originally 180 items generated to develop the CETSCALE (Shimp and Sharma 1987, Sinkovics 1999). The items' seven-point scale provides categories labelled as follows: fully disagree, partly disagree, somewhat disagree, neither disagree nor agree, somewhat agree, partly agree, and fully agree.

Rasch Based Analysis

Using a data set restricted to the original 17 CETSCALE items, Salzberger (1999) showed that six of these items may indeed be scaled successfully by applying the Rasch model for polytomous data (rating scale model). Some of the thresholds were reversed, however, so two pairs of adjacent categories had to be collapsed, leading to a five-point rating scale. The range of item parameters amounted to a mere log-units with five items within approximately 0.2 log-units. Consequently, these items do not yield a profound understanding of the latent construct that goes beyond the expectation that the higher the ethnocentric tendencies, the higher the probability of agreement with nearly all items in the same way. The current analysis built upon these results.
It started with a conventional factor analysis (principal axis factoring) in order to ensure unidimensionality. As a cutoff criterion, a factor loading of .3 was chosen, which is rather small compared to CTT standards. The reason is that the correlation between the item and the factor may be reduced by scale bounding effects, especially if the item is extraordinarily easy or hard to endorse. The remaining 65 items were analysed using the partial credit model implemented in RUMM 2.7 (Sheridan et al. 1997) as a rough screening of suitability. 25 items were retained. Subsequently, these items were analysed using the rating scale model. In line with the results of Salzberger (1999), the original seven-point Likert scale had to be transformed into a five-point rating scale in order to achieve a proper order of thresholds. At each step of parameter estimation, the worst significantly misfitting item (alpha = .001) in terms of a chi-square test of fit provided by RUMM 2.7, which compares model predicted probability and actual response behaviour, was deleted. Ultimately, a scale was derived containing ten items fitting the model.

The established rating scale is very similar to that reported by Salzberger (1999), i.e. the threshold distances are almost identical. The same applies to the item location parameters as the mean of the thresholds of each item (detailed results are available upon request from the first author). The striking outcome of the current analysis, however, is the fact that widening the base of the analysis from 17 to 100 items resulted in a mere four additional items fitting the model, yielding only a small increase in the range of item locations from to log-units. While ten items might in principle suffice for most applications, the small range of item locations leads to two different problems. First, it increases the measurement error for respondents who do not fall into this small area, i.e. the area of the item locations considering the thresholds. (The measurement error for a specific person depends on the item information, which reaches a maximum when person and item location coincide.) Second, content validity is limited to the number of facets of the construct covered by the items. It remains unclear, however, what a certain degree of ethnocentric tendencies actually means. If there were a broader range of item locations, any non-extreme area on the scale would be associated with specific items agreed with and others disagreed with.

It should be noted that from the viewpoint of CTT the small range of item locations does not represent a severe problem at all. In fact, the whole item pool has proved to be designed for CTT based analyses. Consequently, following the LTT approach to measurement means more than (re-)analysing a data set whose creation has been guided by a different measurement paradigm, i.e. CTT. The measurement theory adhered to has a significant impact on the items generated and, eventually, on the data collected.
In other words, data are, at least in part, determined by the measurement paradigm chosen.

Extending the CETSCALE

Given the small range of item locations, a preliminary follow-up study aimed at widening the range of the items in terms of the amount of the property they stand for. To this end, 14 additional items were generated as an extension of the CETSCALE to cover the positive and negative extremes of the construct. The answer scale was confined to five categories. Based on a small convenience sample (n=80), these items were analysed together with 19 items stemming from the CETSCALE item pool to evaluate their locations. 26 items proved to fit the model. 16 items came from the CETSCALE item pool, ensuring that the basic concept to be measured stays the same. The other ten items, which were newly generated, successfully widened the range of item locations. The items of the extended CETSCALE differ as much as log-units in their locations (detailed results and a list of the items are available upon request from the author).

A Classical Re-analysis of the Extended Scale

In order to contrast the results with those based on the classical paradigm of scale development, a re-analysis using principal components analysis was carried out. The 26 items of the final Rasch scale yield a unidimensional set of indicators with a mean loading of .687 (ranging from .55 to .86) and a scale reliability of .96 (Cronbach's alpha). Thus, the scale developed by Rasch modelling proves tenable from the classical perspective of scale development. Certainly, the number of items has to be reduced for practical purposes. However, the approaches differ significantly in the way the number of items would be reduced. The Rasch approach would drop items based on their locations, i.e. for any region of the latent dimension at least one item has to be retained. Thereby the range of item locations would not be reduced since the extreme items would certainly be kept in the instrument.
In contrast, the classical approach would discard items showing loadings below average. Not surprisingly, almost all of the extreme items show loadings below average. Consequently, the classical approach to item selection would lead to a narrower instrument in terms of item locations.

Implications and Conclusions

From a theoretical viewpoint, the Rasch measurement approach has the potential to lift measurement in consumer and marketing research to a higher level and to provide a better foundation for managerial decision making. It provides a powerful foundation for assessing content and construct validity. The application of Rasch models to existing marketing scales is a good starting point for further dissemination. However, in the long run Rasch measurement should guide us from the beginning of scale development. The first step of the scale development process as outlined by Churchill (1979) should not be restricted to domain specification in terms of aspects to be considered but should also aim at covering a range of the construct as wide as possible.
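The relation between item targeting and measurement error noted in the empirical example can be illustrated with a short Python sketch. It rests on simplifying assumptions that are not part of the study: items are treated as dichotomous, the information of an item is taken as p(1 - p), the standard error of a person estimate is approximated by the inverse square root of the summed item information, and the two sets of item locations are hypothetical.

```python
import math
from typing import List


def rasch_probability(beta: float, delta: float) -> float:
    """Probability of agreement in the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


def item_information(beta: float, delta: float) -> float:
    """Information of one dichotomous item; largest (0.25) when beta == delta."""
    p = rasch_probability(beta, delta)
    return p * (1.0 - p)


def standard_error(beta: float, deltas: List[float]) -> float:
    """Approximate standard error of the person estimate at location beta."""
    total_information = sum(item_information(beta, d) for d in deltas)
    return 1.0 / math.sqrt(total_information)


# Hypothetical item locations in log-units (not the CETSCALE estimates).
NARROW_SCALE = [-0.2, -0.1, 0.0, 0.1, 0.2]
WIDE_SCALE = [-2.0, -1.0, 0.0, 1.0, 2.0]

if __name__ == "__main__":
    for beta in (0.0, 3.0):
        print(
            f"beta = {beta:.1f}: "
            f"SE narrow = {standard_error(beta, NARROW_SCALE):.2f}, "
            f"SE wide = {standard_error(beta, WIDE_SCALE):.2f}"
        )
```

Under these assumptions the narrow scale is slightly more precise for a respondent located at its centre, while the wide scale is more precise for a respondent located far from the centre, which mirrors the argument for items that span the latent dimension.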
A Rasch scale with widely varying item locations provides a deeper understanding of the construct, i.e. of what it means for respondents to be located at a certain position on the latent dimension. Moreover, this is the prerequisite of precise measurement over a wide range, because measurement error increases strongly when items are off-target for the respondents.

References

Andrich, David (1978), A Rating Formulation for Ordered Response Categories, Psychometrika, 43 (4).
---- (1988a), Rasch Models for Measurement, Sage University Paper Series on Quantitative Applications in the Social Sciences 68, Beverly Hills: Sage.
---- (1988b), A General Form of Rasch's Extended Logistic Model for Partial Credit Scoring, Applied Measurement in Education, 1 (4).
---- (1995a), Models for Measurement, Precision and the Non-Dichotomization of Graded Responses, Psychometrika, 60 (1).
---- (1995b), Further Remarks on the Non-Dichotomization of Graded Responses, Psychometrika, 60 (1).
Balasubramian, Siva K. and Wagner A. Kamakura (1989), Measuring Consumer Attitudes Toward the Marketplace With Tailored Interviews, Journal of Marketing Research, 26 (3).
Birnbaum, Allan (1968), Some Latent Trait Models and Their Use in Inferring an Examinee's Ability, in: Statistical Theories of Mental Test Scores, Chapters 17-20, Eds. Frederic Lord and Melvin R. Novick, Reading (Mass.): Addison-Wesley.
Churchill, Gilbert A. (1979), A Paradigm for Developing Better Measures of Marketing Constructs, Journal of Marketing Research, 16.
Durvasula, Srinivas, Craig J. Andrews and Richard G. Netemeyer (1997), A Cross-Cultural Comparison of Consumer Ethnocentrism in the United States and Russia, Journal of International Consumer Marketing, 9 (4).
Fischer, Gerhard H. (1995), Derivations of the Rasch Model, in: Rasch Models: Foundations, Recent Developments, and Applications, Eds. Gerhard H. Fischer and Ivo W. Molenaar, New York: Springer.
Good, Linda and Patricia Huddleston (1995), Ethnocentrism of Polish and Russian Consumers: Are Feelings and Intentions Related?, International Marketing Review, 12 (5).
Gustafsson, Jan-Eric (1980), Testing and Obtaining Fit of Data to the Rasch Model, British Journal of Mathematical and Statistical Psychology, 32.
Herche, Joel (1992), A Note on the Predictive Validity of the CETSCALE, Journal of the Academy of Marketing Science, 20 (3).
Lord, Frederic M. (1980), Applications of Item Response Theory to Practical Testing Problems, Hillsdale, New Jersey: Lawrence Erlbaum Associates.
----, and Melvin R. Novick (1968), Statistical Theories of Mental Test Scores, Reading (Mass.): Addison-Wesley.
Masters, Geoffrey N. (1982), A Rasch Model for Partial Credit Scoring, Psychometrika, 47 (2).
Molenaar, Ivo W. (1995), Estimation of Item Parameters, in: Rasch Models: Foundations, Recent Developments, and Applications, Eds. Gerhard H. Fischer and Ivo W. Molenaar, New York: Springer.
Netemeyer, Richard G., Srinivas Durvasula and Donald R. Lichtenstein (1991), A Cross-National Assessment of the Reliability and Validity of the CETSCALE, Journal of Marketing Research, 28 (3).
Rasch, Georg (1960/1980), Probabilistic Models for Some Intelligence and Attainment Tests, Chicago: MESA Press. Reprint of the original publication in 1960 by the Danish Institute for Educational Research.
Salzberger, Thomas (1999), How the Rasch Model May Shift Our Perspective of Measurement in Marketing Research, Paper presented at the 1999 Australia and New Zealand Marketing Academy Conference (ANZMAC), Sydney.
----, Rudolf Sinkovics and Bodo B. Schlegelmilch (1999), Data Equivalence in Cross-Cultural Research: A Comparison of Classical Test Theory and Latent Trait Theory Based Approaches, Australasian Marketing Journal, 7 (2).
Samejima, Fumiko (1969), Estimation of Latent Ability Using a Response Pattern of Graded Responses, Psychometric Monograph, 17, Iowa City (IA): Psychometric Society.
Sheridan, Barry, David Andrich and Guanzhong Luo (1997), User's Guide to RUMM: Rasch Unidimensional Measurement Models, Perth: RUMM Laboratory.
Shimp, Terence A. and Subhash Sharma (1987), Consumer Ethnocentrism: Construction and Validation of the CETSCALE, Journal of Marketing Research, 24 (3).
Singh, Jagdip (1996), A Latent Trait Theory Approach to Measurement Issues in Marketing Research: Principles, Relevance and Application, Proceedings of the EMAC Annual Conference, Budapest University of Economic Sciences, Vol. 1, Eds. József Berács, András Bauer and Judith Simon.
----, Roy D. Howell and Gary K. Rhoads (1990), Adaptive Designs for Likert-Type Data: An Approach for Implementing Marketing Surveys, Journal of Marketing Research, 27 (3).
Sinkovics, Rudolf R. (1999), Ethnozentrismus und Konsumentenverhalten [Ethnocentrism and Consumer Behaviour], Wiesbaden: Deutscher Universitätsverlag.
Soutar, Geoffrey N., Richard Bell and Yvonne Wallis (1990), Consumer Acquisition Patterns for Durable Goods: A Rasch Analysis, Asia Pacific International Journal of Marketing, 2 (1).
----, and Steven P. Cornish-Ward (1997), Ownership Patterns for Durable Goods and Financial Assets: A Rasch Analysis, Applied Economics, 29.
----, and Maria M. Ryan (1999), People's Leisure Activities: A Logistic Modelling Approach, Paper presented at the 1999 Australia and New Zealand Marketing Academy Conference (ANZMAC), Sydney.
Steenkamp, Jan-Benedict E.M. and Hans Baumgartner (1998), Assessing Measurement Invariance in Cross-National Consumer Research, Journal of Consumer Research, 25.
Running head: ASSESS MEASUREMENT INVARIANCE Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies Xiaowen Zhu Xi an Jiaotong University Yanjie Bian Xi an Jiaotong
More informationRaschmätning [Rasch Measurement]
Raschmätning [Rasch Measurement] Forskarutbildningskurs vid Karlstads universitet höstterminen 2014 Kursen anordnas av Centrum för forskning om barns och ungdomars psykiska hälsa och avdelningen för psykologi.
More informationMEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS
MEASURING AFFECTIVE RESPONSES TO CONFECTIONARIES USING PAIRED COMPARISONS Farzilnizam AHMAD a, Raymond HOLT a and Brian HENSON a a Institute Design, Robotic & Optimizations (IDRO), School of Mechanical
More informationA TEST OF A MULTI-FACETED, HIERARCHICAL MODEL OF SELF-CONCEPT. Russell F. Waugh. Edith Cowan University
A TEST OF A MULTI-FACETED, HIERARCHICAL MODEL OF SELF-CONCEPT Russell F. Waugh Edith Cowan University Paper presented at the Australian Association for Research in Education Conference held in Melbourne,
More informationAND ITS VARIOUS DEVICES. Attitude is such an abstract, complex mental set. up that its measurement has remained controversial.
CHAPTER III attitude measurement AND ITS VARIOUS DEVICES Attitude is such an abstract, complex mental set up that its measurement has remained controversial. Psychologists studied attitudes of individuals
More informationEvaluating the quality of analytic ratings with Mokken scaling
Psychological Test and Assessment Modeling, Volume 57, 2015 (3), 423-444 Evaluating the quality of analytic ratings with Mokken scaling Stefanie A. Wind 1 Abstract Greatly influenced by the work of Rasch
More informationCHAPTER VI RESEARCH METHODOLOGY
CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the
More informationInvestigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories
Kamla-Raj 010 Int J Edu Sci, (): 107-113 (010) Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories O.O. Adedoyin Department of Educational Foundations,
More informationThe Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective
Vol. 9, Issue 5, 2016 The Impact of Item Sequence Order on Local Item Dependence: An Item Response Theory Perspective Kenneth D. Royal 1 Survey Practice 10.29115/SP-2016-0027 Sep 01, 2016 Tags: bias, item
More informationch1 1. What is the relationship between theory and each of the following terms: (a) philosophy, (b) speculation, (c) hypothesis, and (d) taxonomy?
ch1 Student: 1. What is the relationship between theory and each of the following terms: (a) philosophy, (b) speculation, (c) hypothesis, and (d) taxonomy? 2. What is the relationship between theory and
More informationThe Influence of Test Characteristics on the Detection of Aberrant Response Patterns
The Influence of Test Characteristics on the Detection of Aberrant Response Patterns Steven P. Reise University of California, Riverside Allan M. Due University of Minnesota Statistical methods to assess
More informationEvaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching
Evaluating and restructuring a new faculty survey: Measuring perceptions related to research, service, and teaching Kelly D. Bradley 1, Linda Worley, Jessica D. Cunningham, and Jeffery P. Bieber University
More informationINTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT. Basic Concepts, Parameters and Statistics
INTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT Basic Concepts, Parameters and Statistics The designations employed and the presentation of material in this information product
More informationExamining Factors Affecting Language Performance: A Comparison of Three Measurement Approaches
Pertanika J. Soc. Sci. & Hum. 21 (3): 1149-1162 (2013) SOCIAL SCIENCES & HUMANITIES Journal homepage: http://www.pertanika.upm.edu.my/ Examining Factors Affecting Language Performance: A Comparison of
More informationUsing the Rasch Modeling for psychometrics examination of food security and acculturation surveys
Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,
More informationalternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over
More informationBy Hui Bian Office for Faculty Excellence
By Hui Bian Office for Faculty Excellence 1 Email: bianh@ecu.edu Phone: 328-5428 Location: 1001 Joyner Library, room 1006 Office hours: 8:00am-5:00pm, Monday-Friday 2 Educational tests and regular surveys
More informationShiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p )
Rasch Measurementt iin Language Educattiion Partt 2:: Measurementt Scalles and Invariiance by James Sick, Ed.D. (J. F. Oberlin University, Tokyo) Part 1 of this series presented an overview of Rasch measurement
More informationMeasuring the External Factors Related to Young Alumni Giving to Higher Education. J. Travis McDearmon, University of Kentucky
Measuring the External Factors Related to Young Alumni Giving to Higher Education Kathryn Shirley Akers 1, University of Kentucky J. Travis McDearmon, University of Kentucky 1 1 Please use Kathryn Akers
More informationUsing the Partial Credit Model
A Likert-type Data Analysis Using the Partial Credit Model Sun-Geun Baek Korean Educational Development Institute This study is about examining the possibility of using the partial credit model to solve
More informationMCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2
MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts
More informationDoes factor indeterminacy matter in multi-dimensional item response theory?
ABSTRACT Paper 957-2017 Does factor indeterminacy matter in multi-dimensional item response theory? Chong Ho Yu, Ph.D., Azusa Pacific University This paper aims to illustrate proper applications of multi-dimensional
More informationLikelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.
Likelihood Ratio Based Computerized Classification Testing Nathan A. Thompson Assessment Systems Corporation & University of Cincinnati Shungwon Ro Kenexa Abstract An efficient method for making decisions
More informationItem Response Theory: Methods for the Analysis of Discrete Survey Response Data
Item Response Theory: Methods for the Analysis of Discrete Survey Response Data ICPSR Summer Workshop at the University of Michigan June 29, 2015 July 3, 2015 Presented by: Dr. Jonathan Templin Department
More informationBruno D. Zumbo, Ph.D. University of Northern British Columbia
Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.
More informationITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE
California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION
More informationA typology of polytomously scored mathematics items disclosed by the Rasch model: implications for constructing a continuum of achievement
A typology of polytomously scored mathematics items 1 A typology of polytomously scored mathematics items disclosed by the Rasch model: implications for constructing a continuum of achievement John van
More informationItem Analysis: Classical and Beyond
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides
More informationJason L. Meyers. Ahmet Turhan. Steven J. Fitzpatrick. Pearson. Paper presented at the annual meeting of the
Performance of Ability Estimation Methods for Writing Assessments under Conditio ns of Multidime nsionality Jason L. Meyers Ahmet Turhan Steven J. Fitzpatrick Pearson Paper presented at the annual meeting
More informationEmpowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison
Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological
More informationBasic concepts and principles of classical test theory
Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must
More informationMeeting Feynman: Bringing light into the black box of social measurement
Journal of Physics: Conference Series PAPER OPEN ACCESS Meeting Feynman: Bringing light into the black box of social measurement To cite this article: Thomas Salzberger 2018 J. Phys.: Conf. Ser. 1065 072035
More informationIssues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy
Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus
More informationLatent Trait Standardization of the Benzodiazepine Dependence. Self-Report Questionnaire using the Rasch Scaling Model
Chapter 7 Latent Trait Standardization of the Benzodiazepine Dependence Self-Report Questionnaire using the Rasch Scaling Model C.C. Kan 1, A.H.G.S. van der Ven 2, M.H.M. Breteler 3 and F.G. Zitman 1 1
More informationConnexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan
Connexion of Item Response Theory to Decision Making in Chess Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Acknowledgement A few Slides have been taken from the following presentation
More informationTHE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS
THE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS Russell F. Waugh Edith Cowan University Key words: attitudes, graduates, university, measurement Running head: COURSE EXPERIENCE
More informationUsing Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items
University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth
More informationUsing Analytical and Psychometric Tools in Medium- and High-Stakes Environments
Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session
More informationTHE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES IN GHANA.
Africa Journal of Teacher Education ISSN 1916-7822. A Journal of Spread Corporation Vol. 6 No. 1 2017 Pages 56-64 THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES
More informationAndré Cyr and Alexander Davies
Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander
More informationExploring rater errors and systematic biases using adjacent-categories Mokken models
Psychological Test and Assessment Modeling, Volume 59, 2017 (4), 493-515 Exploring rater errors and systematic biases using adjacent-categories Mokken models Stefanie A. Wind 1 & George Engelhard, Jr.
More informationIntroduction to Measurement
This is a chapter excerpt from Guilford Publications. The Theory and Practice of Item Response Theory, by R. J. de Ayala. Copyright 2009. 1 Introduction to Measurement I often say that when you can measure
More informationResearch and Evaluation Methodology Program, School of Human Development and Organizational Studies in Education, University of Florida
Vol. 2 (1), pp. 22-39, Jan, 2015 http://www.ijate.net e-issn: 2148-7456 IJATE A Comparison of Logistic Regression Models for Dif Detection in Polytomous Items: The Effect of Small Sample Sizes and Non-Normality
More informationPsychological testing
Psychological testing Lecture 12 Mikołaj Winiewski, PhD Test Construction Strategies Content validation Empirical Criterion Factor Analysis Mixed approach (all of the above) Content Validation Defining
More informationConstruct Validity of Mathematics Test Items Using the Rasch Model
Construct Validity of Mathematics Test Items Using the Rasch Model ALIYU, R.TAIWO Department of Guidance and Counselling (Measurement and Evaluation Units) Faculty of Education, Delta State University,
More informationDevelopment, Standardization and Application of
American Journal of Educational Research, 2018, Vol. 6, No. 3, 238-257 Available online at http://pubs.sciepub.com/education/6/3/11 Science and Education Publishing DOI:10.12691/education-6-3-11 Development,
More informationCHAPTER - III METHODOLOGY CONTENTS. 3.1 Introduction. 3.2 Attitude Measurement & its devices
102 CHAPTER - III METHODOLOGY CONTENTS 3.1 Introduction 3.2 Attitude Measurement & its devices 3.2.1. Prior Scales 3.2.2. Psychophysical Scales 3.2.3. Sigma Scales 3.2.4. Master Scales 3.3 Attitude Measurement
More informationCenter for Advanced Studies in Measurement and Assessment. CASMA Research Report
Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 39 Evaluation of Comparability of Scores and Passing Decisions for Different Item Pools of Computerized Adaptive Examinations
More informationUvA-DARE (Digital Academic Repository)
UvA-DARE (Digital Academic Repository) Standaarden voor kerndoelen basisonderwijs : de ontwikkeling van standaarden voor kerndoelen basisonderwijs op basis van resultaten uit peilingsonderzoek van der
More informationReferences. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,
The Western Aphasia Battery (WAB) (Kertesz, 1982) is used to classify aphasia by classical type, measure overall severity, and measure change over time. Despite its near-ubiquitousness, it has significant
More informationhow good is the Instrument? Dr Dean McKenzie
how good is the Instrument? Dr Dean McKenzie BA(Hons) (Psychology) PhD (Psych Epidemiology) Senior Research Fellow (Abridged Version) Full version to be presented July 2014 1 Goals To briefly summarize
More informationFactors Influencing Undergraduate Students Motivation to Study Science
Factors Influencing Undergraduate Students Motivation to Study Science Ghali Hassan Faculty of Education, Queensland University of Technology, Australia Abstract The purpose of this exploratory study was
More informationAdaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida
Adaptive Testing With the Multi-Unidimensional Pairwise Preference Model Stephen Stark University of South Florida and Oleksandr S. Chernyshenko University of Canterbury Presented at the New CAT Models
More informationLinking Assessments: Concept and History
Linking Assessments: Concept and History Michael J. Kolen, University of Iowa In this article, the history of linking is summarized, and current linking frameworks that have been proposed are considered.
More informationCentre for Education Research and Policy
THE EFFECT OF SAMPLE SIZE ON ITEM PARAMETER ESTIMATION FOR THE PARTIAL CREDIT MODEL ABSTRACT Item Response Theory (IRT) models have been widely used to analyse test data and develop IRT-based tests. An
More informationDevelopment and psychometric evaluation of scales to measure professional confidence in manual medicine: a Rasch measurement approach
Hecimovich et al. BMC Research Notes 2014, 7:338 RESEARCH ARTICLE Open Access Development and psychometric evaluation of scales to measure professional confidence in manual medicine: a Rasch measurement
More informationNearest-Integer Response from Normally-Distributed Opinion Model for Likert Scale
Nearest-Integer Response from Normally-Distributed Opinion Model for Likert Scale Jonny B. Pornel, Vicente T. Balinas and Giabelle A. Saldaña University of the Philippines Visayas This paper proposes that
More information1. Evaluate the methodological quality of a study with the COSMIN checklist
Answers 1. Evaluate the methodological quality of a study with the COSMIN checklist We follow the four steps as presented in Table 9.2. Step 1: The following measurement properties are evaluated in the
More informationComputerized Mastery Testing
Computerized Mastery Testing With Nonequivalent Testlets Kathleen Sheehan and Charles Lewis Educational Testing Service A procedure for determining the effect of testlet nonequivalence on the operating
More informationValidity and reliability of measurements
Validity and reliability of measurements 2 3 Request: Intention to treat Intention to treat and per protocol dealing with cross-overs (ref Hulley 2013) For example: Patients who did not take/get the medication
More informationItem Response Theory. Author's personal copy. Glossary
Item Response Theory W J van der Linden, CTB/McGraw-Hill, Monterey, CA, USA ã 2010 Elsevier Ltd. All rights reserved. Glossary Ability parameter Parameter in a response model that represents the person
More informationAPPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH
APPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH Margaret Wu & Ray Adams Documents supplied on behalf of the authors by Educational Measurement Solutions TABLE OF CONTENT CHAPTER
More informationINVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form
INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement
More informationMEASURING SUBJECTIVE HEALTH AMONG ADOLESCENTS IN SWEDEN A Rasch-analysis of the HBSC Instrument
CURT HAGQUIST and DAVID ANDRICH MEASURING SUBJECTIVE HEALTH AMONG ADOLESCENTS IN SWEDEN A Rasch-analysis of the HBSC Instrument (Accepted 27 June 2003) ABSTRACT. The cross-national WHO-study Health Behaviour
More informationMeasurement Invariance (MI): a general overview
Measurement Invariance (MI): a general overview Eric Duku Offord Centre for Child Studies 21 January 2015 Plan Background What is Measurement Invariance Methodology to test MI Challenges with post-hoc
More informationMantel-Haenszel Procedures for Detecting Differential Item Functioning
A Comparison of Logistic Regression and Mantel-Haenszel Procedures for Detecting Differential Item Functioning H. Jane Rogers, Teachers College, Columbia University Hariharan Swaminathan, University of
More informationInformation Structure for Geometric Analogies: A Test Theory Approach
Information Structure for Geometric Analogies: A Test Theory Approach Susan E. Whitely and Lisa M. Schneider University of Kansas Although geometric analogies are popular items for measuring intelligence,
More informationISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology
ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation
More informationRecent advances in analysis of differential item functioning in health research using the Rasch model
Hagquist and Andrich Health and Quality of Life Outcomes (2017) 15:181 DOI 10.1186/s12955-017-0755-0 RESEARCH Open Access Recent advances in analysis of differential item functioning in health research
More informationAgreement Coefficients and Statistical Inference
CHAPTER Agreement Coefficients and Statistical Inference OBJECTIVE This chapter describes several approaches for evaluating the precision associated with the inter-rater reliability coefficients of the
More informationThriving in College: The Role of Spirituality. Laurie A. Schreiner, Ph.D. Azusa Pacific University
Thriving in College: The Role of Spirituality Laurie A. Schreiner, Ph.D. Azusa Pacific University WHAT DESCRIBES COLLEGE STUDENTS ON EACH END OF THIS CONTINUUM? What are they FEELING, DOING, and THINKING?
More information[3] Coombs, C.H., 1964, A theory of data, New York: Wiley.
Bibliography [1] Birnbaum, A., 1968, Some latent trait models and their use in inferring an examinee s ability, In F.M. Lord & M.R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479),
More informationCOMPUTING READER AGREEMENT FOR THE GRE
RM-00-8 R E S E A R C H M E M O R A N D U M COMPUTING READER AGREEMENT FOR THE GRE WRITING ASSESSMENT Donald E. Powers Princeton, New Jersey 08541 October 2000 Computing Reader Agreement for the GRE Writing
More informationMarc J. Tassé, PhD Nisonger Center UCEDD
FINALLY... AN ADAPTIVE BEHAVIOR SCALE FOCUSED ON PROVIDING PRECISION AT THE DIAGNOSTIC CUT-OFF. How Item Response Theory Contributed to the Development of the DABS Marc J. Tassé, PhD UCEDD The Ohio State
More informationStudents' perceived understanding and competency in probability concepts in an e- learning environment: An Australian experience
University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2016 Students' perceived understanding and competency
More informationAN ANALYSIS OF THE ITEM CHARACTERISTICS OF THE CONDITIONAL REASONING TEST OF AGGRESSION
AN ANALYSIS OF THE ITEM CHARACTERISTICS OF THE CONDITIONAL REASONING TEST OF AGGRESSION A Dissertation Presented to The Academic Faculty by Justin A. DeSimone In Partial Fulfillment of the Requirements
More informationLinking Errors in Trend Estimation in Large-Scale Surveys: A Case Study
Research Report Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study Xueli Xu Matthias von Davier April 2010 ETS RR-10-10 Listening. Learning. Leading. Linking Errors in Trend Estimation
More informationSensitivity of DFIT Tests of Measurement Invariance for Likert Data
Meade, A. W. & Lautenschlager, G. J. (2005, April). Sensitivity of DFIT Tests of Measurement Invariance for Likert Data. Paper presented at the 20 th Annual Conference of the Society for Industrial and
More informationBuilding Evaluation Scales for NLP using Item Response Theory
Building Evaluation Scales for NLP using Item Response Theory John Lalor CICS, UMass Amherst Joint work with Hao Wu (BC) and Hong Yu (UMMS) Motivation Evaluation metrics for NLP have been mostly unchanged
More informationKersten, P. and N. M. Kayes (2011). "Outcome measurement and the use of Rasch
Kersten, P. and N. M. Kayes (2011). "Outcome measurement and the use of Rasch analysis, a statistics-free introduction." New Zealand Journal of Physiotherapy 39(2): 92-99. Abstract Outcome measures, which
More informationINTERPRETING IRT PARAMETERS: PUTTING PSYCHOLOGICAL MEAT ON THE PSYCHOMETRIC BONE
The University of British Columbia Edgeworth Laboratory for Quantitative Educational & Behavioural Science INTERPRETING IRT PARAMETERS: PUTTING PSYCHOLOGICAL MEAT ON THE PSYCHOMETRIC BONE Anita M. Hubley,
More informationESTABLISHING VALIDITY AND RELIABILITY OF ACHIEVEMENT TEST IN BIOLOGY FOR STD. IX STUDENTS
International Journal of Educational Science and Research (IJESR) ISSN(P): 2249-6947; ISSN(E): 2249-8052 Vol. 4, Issue 4, Aug 2014, 29-36 TJPRC Pvt. Ltd. ESTABLISHING VALIDITY AND RELIABILITY OF ACHIEVEMENT
More information