Introduction to Measurement


This is a chapter excerpt from Guilford Publications. The Theory and Practice of Item Response Theory, by R. J. de Ayala. Copyright

"I often say that when you can measure what you are speaking about and express it in numbers you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the state of science, whatever the matter may be."
Sir William Thomson (Lord Kelvin) (1891, p. 80)

This book is about a particular measurement perspective called item response theory (IRT), latent trait theory, or item characteristic curve theory. To understand this measurement perspective, we need to address what we mean by the concept of measurement. Measurement can be defined in many different ways. A classic definition is that measurement is "the assignment of numerals to objects or events according to rules. The fact that numerals can be assigned under different rules leads to different kinds of scales and different kinds of measurement" (Stevens, 1946, p. 677). Although commonly used in introductory measurement and statistics texts, this definition reflects a rather limited view. Measurement is more than just the assignment of numbers according to rules (i.e., labeling); it is a process by which an attempt is made to understand the nature of a variable (cf. Bridgman, 1928). Moreover, whether the process results in numeric values with inherent properties or the identification of different classes depends on whether we conceptualize the variable of interest as continuous or categorical. IRT provides one particular mathematical technique for performing measurement in which the variable is considered to be continuous in nature.

Measurement

For a simple example of measurement as a process, imagine that a researcher is interested in measuring generalized anxiety. Anxiety may be loosely defined as feelings that may range from general uneasiness to incapacitating attacks of terror. Because the very nature of anxiety involves feelings, it is not possible to directly observe anxiety. As such, anxiety is an unobservable or latent variable or construct.

The measurement process involves deciding whether our latent variable, anxiety, should be conceptualized as categorical, continuous, or both. In the categorical case we would classify individuals into qualitatively different latent groups so that, for example, one group may be interpreted as representing individuals with incapacitating anxiety and another group as representing individuals without anxiety. In this conceptualization the persons differ from one another in kind on the latent variable. Typically, these latent categories are referred to as latent classes. Alternatively, anxiety could be conceptualized as continuous. From this perspective, individuals differ from one another in their quantity of the latent variable. Thus, we might label the ends of the latent continuum as, say, high anxiety and low anxiety.

When the latent variable is conceptualized as having categorical and continuous facets, then we have a combination of one or more latent classes and one or more latent continua. In this case, the latent classes are subpopulations that are homogeneous with respect to the variable of interest, but differ from one another in kind. Within each of these classes there is a latent continuum on which the individuals within the class may be located. For example, assume that our sample of respondents consists of two classes. One class could consist of individuals whose anxiety is so severe that they suffer from incapacitating attacks of terror. As such, these individuals are so qualitatively different from other persons that they need to be addressed separately from those whose anxiety is not so severe. Therefore, the second class contains individuals who do not suffer from incapacitating attacks of terror.
Within each of these classes we have a latent continuum on which we locate the class's respondents.

Although we cannot observe our latent variable, its existence may be inferred from behavioral manifestations or manifest variables (e.g., restlessness, sleeping difficulties, headaches, trembling, muscle tension, item responses, self-reports). These manifestations allow for several different approaches to measuring generalized anxiety. For example, one approach may involve physiological assessment via an electromyogram of the degree of muscle tension. Other approaches might involve recording the number of hours spent sleeping or the frequency and duration of headaches, using a galvanic skin response (GSR) feedback device to assess sweat gland activity, or more psychological approaches, such as asking a series of questions. These approaches, either individually or collectively, provide our operational definition of generalized anxiety (Bridgman, 1928). That is, our operational definition specifies how we go about collecting our observations (i.e., the latent variable's manifestations). Stated concisely, our interest is in our latent variable, and its operational definition is a means to that end.

The measurement process, so far, has involved our conceptualization of the latent variable's nature and its operational definition. We also need to decide on the correspondence between our observations of the individuals' anxiety levels and their locations on the continuum and/or in a class. In general, scaling is the process of establishing the correspondence between the observation data and the persons' locations on the latent variable. Once we have our individuals located on the latent variable, we can then compare them to one another. IRT is one approach to establishing this correspondence between the observation data and the persons' locations on the latent variable. Examples of other relevant scaling processes are Guttman Scalogram analysis (Guttman, 1950), Coombs Unfolding (Coombs), and the various Thurstone approaches (Thurstone, 1925, 1928, 1938). Alternative scaling approaches may be found in Dunn-Rankin, Knezek, Wallace, and Zhang (2004), Gulliksen (1987), Maranell (1974), and Nunnally and Bernstein (1994).

Some Measurement Issues

Before proceeding to discuss various latent variable methods for scaling our observations, we need to discuss four issues. The first issue involves the consistency of the measures. By way of analogy, assume that we are measuring the length of a box. If our repeated measurements of the length of the box were constant, then these measurements would be considered to be highly consistent or to have high reliability. However, if these repeated measurements varied wildly from one another, then they would be considered to have low consistency or to have low reliability. In the former case, our measurements would have a small amount of error, whereas in the latter they would have a comparatively larger amount of error. The consistency (or lack thereof) would affect our confidence in the measurements. That is, in the first scenario we would have greater confidence in our measurements than in the second scenario.

The second issue concerns the validity of the measures. Although there are various types of validity, we define validity as the degree to which our measures are actually manifestations of the latent variable. As a contrarian example, assume we use the frequency and duration of headaches approach for measuring anxiety. Although some persons may recognize that there might be a relationship between frequency and duration of headaches and anxiety level, they may not consider this approach, in and of itself, to be an accurate representation of anxiety. In short, simply because we make a measure does not mean that the measure necessarily results in an accurate reflection of the variable of theoretical interest (i.e., our measurements may or may not have validity).
A necessary, but not sufficient, condition for our measurements to have validity is that they possess a high degree of reliability. Therefore, it is necessary to be concerned not only with the consistency of our measurements, but also with their validity. Obtaining validity evidence is part of the measurement process.

The third issue concerns a desirable property we would like our measurements to possess. Thurstone (1928) noted that "a measuring instrument must not be seriously affected in its measuring function by the object of measurement." In other words, we would like our measurement instrument to be independent of what it is we are measuring. If this is true, then the instrument possesses the property of invariance. For instance, if we measure the size of a shoe box by using a meter stick, then the measurement instrument (i.e., the meter stick) is not affected by and is independent of which box is measured. Contrast this with the situation in which measuring a shoe box's size is done not by using a meter stick, but by stretching a string along the shortest dimension of the box and cutting the string so that its length equals the shortest dimension. This string would serve as our measurement instrument and we would use it to measure the other two dimensions of the box. In short, the measurements would be multiples of the shortest dimension. Then suppose we use this approach to measure a cereal box. That is, for the cereal box its shortest dimension is used to define the measurement instrument. Obviously, the box we are measuring affects our measurement instrument and our measurements would not possess the invariance property. Without invariance our comparisons across different boxes would have limited utility.

The final issue we present brings us back to the classic definition of measurement mentioned above. Depending on which approach we use to measure anxiety (i.e., GSR, duration of headache, item responses, etc.), the measurements have certain inherent properties that affect how we interpret their information. For instance, the duration of headaches approach produces measurements that cannot be negative and that allow us to make comparative statements among people as well as to determine whether a person has a headache. These properties are a reflection of the fact that the measurements have not only a constant unit, but also an (absolute) zero point that reflects the absence of what is being measured. Invoking Stevens's (1946) levels of measurement taxonomy or Coombs's (1974) taxonomy, these numbers would reflect a ratio scale. In contrast, if we use a GSR device for measuring anxiety we would need to establish a baseline or a zero point by canceling out an individual's normal skin resistance static level before we measure the person's GSR. As a result, and unlike that of the ratio scale, this zero point is not an absolute zero, but rather a relative one. However, all of our measurements would still have a constant unit and would be considered to be on an interval scale. Another approach to measuring anxiety is to ask an individual to rate his or her anxiety in terms of severity. This ratings approach would produce numbers that are on an ordinal scale.
These approaches allow us to make comparative statements, such as "This person's anxiety level is greater than (or less than) that of another," or in the case of the ratio scale, "This person's anxiety level is half as severe as that person's anxiety level." Alternatively, if our question simply requires the respondent to reply "yes," he or she is experiencing a symptom, or "no," he or she is not, then the yes/no responses would reflect a nominal scale. These various scenarios show that how we interpret and use our data needs to take into account the different types of information that the observations carry.

In the following discussion we present three approaches for establishing a correspondence between our observations and our latent variable. We begin by briefly introducing IRT, followed by classical test theory (CTT). Both of these approaches assume that the latent variable is continuous. The last approach discussed, latent class analysis (LCA), is appropriate for categorical latent variables. Appendix E, Mixture Models, addresses the situation when a latent variable is conceptualized as having categorical and continuous facets.

Item Response Theory

Theory is used here in the sense that it is a paradigm that attempts to explain all the facts with which it can be confronted (Kuhn, 1970, p. 18). IRT is, in effect, a system of models that defines one way of establishing the correspondence between latent variables and their manifestations. It is not a theory in the traditional sense because it does not explain why a person provides a particular response to an item or how the person decides what to answer (cf. Falmagne, 1989). Instead, IRT is like the theory of statistical estimation. IRT uses latent characterizations of individuals and items as predictors of observed responses. Although some researchers (e.g., Embretson, 1984; Fischer & Formann, 1982) have attempted to use item characteristics to explain why an item is located at a particular point, for the most part IRT, like other scaling methods (e.g., Guttman Scalogram, Coombs Unfolding), treats the individual as a black box. (See Appendix E, Linear Logistic Test Model [LLTM], for a brief presentation of one of these explanatory approaches, as well as De Boeck & Wilson [2004] for alternative approaches.) The cognitive processes used by an individual to respond to an item are not modeled in the commonly used IRT models. In short, this approach is analogous to measuring the speed of an automobile without understanding how an automobile moves (see Note 1).

In IRT persons and items are located on the same continuum. Most IRT models assume that the latent variable is represented by a unidimensional continuum. In addition, for an item to have any utility it must be able to differentiate among persons located at different points along a continuum. An item's capacity to differentiate among persons reduces our uncertainty about their locations. This capacity to differentiate among people with different locations may be held constant or allowed to vary across an instrument's items. Therefore, individuals are characterized in terms of their locations on the latent variable and, at a minimum, items are characterized with respect to their locations and capacity to discriminate among persons. The gist of IRT is the (logistic or multinomial) regression of observed item responses on the persons' locations on the latent variable and the item's latent characterizations.

Classical Test Theory

Like IRT, classical test theory (CTT) or true score theory also assumes that the latent variable is continuous. CTT is the approach that most readers have been exposed to throughout their education.
In contrast to IRT, in which the item is the unit of focus, in CTT the respondent's observed score on a whole instrument is the unit of focus. The individual's observed score, $X$, is (typically) the unweighted sum of the person's responses to an instrument's items. In ability or achievement assessment this sum reflects the number of correct responses.

CTT is based on the true score model. This model relates the individual's observed score to his or her location on the latent variable. To understand this model, assume that an individual is administered an instrument an infinite number of independent times. On each of these administrations we calculate the individual's observed score. The mean of the infinite number of observed scores is the expectation of the observed scores (i.e., $\mu_i = \mathcal{E}(X_i)$). On any given administration of the instrument the person's observed score will not exactly agree with the mean, $\mu_i$, of the observed scores. This difference between the observed score and the mean is considered to be error. Symbolically, we may write the relationship between person $i$'s observed score, the expectation, and error as

$$X_i = \mu_i + \mathrm{E}_i \tag{1.1}$$

where $\mathrm{E}_i$ is the error score or the error of measurement (i.e., $\mathrm{E}_i = X_i - \mu_i$); $\mathrm{E}$ is the capital Greek letter epsilon. Equation 1.1 is known as the true score model. In words, this model states that person $i$'s observed performance on an instrument is a function of his or her expected performance on the instrument plus error. Given that the error scores are considered to be random and that $\mu_i = \mathcal{E}(X_i)$, it follows that the mean error for an individual across the infinite number of independent administrations of the instrument is zero.

By convention $\mu_i$ is typically represented by the Latin or Roman letter $T$. However, to be consistent with our use of Greek letters to symbolize parameters, we use the capital Greek letter tau, $\mathrm{T}$. The symbol $\mathrm{T}$ represents the person's true score (i.e., $\mathrm{T}_i = \mathcal{E}(X_i)$). The term true score should not be interpreted as indicating truth in any way. As such, true score is a misnomer. To avoid this possible misinterpretation, we refer to $\mathrm{T}_i$ (or $\mu_i$) as individual $i$'s trait score (see Note 2). A person's trait score represents his or her location on the latent variable of interest and is fixed for an individual and instrument. The common representation of the model in Equation 1.1 is

$$X_i = \mathrm{T}_i + \mathrm{E}_i \tag{1.2}$$

Although Equation 1.1 may be considered more informative than Equation 1.2, we follow the convention of using $\mathrm{T}$ for the trait score.

There is a functional relationship between the IRT person latent trait ($\theta$) and the CTT person trait characterization. This relationship is based on the assumption of parallel forms for an instrument. That is, each item has the same response function on all the forms. Following Lord and Novick (1968), assume that we administer an infinite number of independent parallel forms of an instrument to an individual. Then the expected proportion of 1s or expected trait score, $\mathcal{E}\mathrm{T}$, across these parallel forms is equal to the average probability of a response of 1 on the instrument, given the person's latent trait and an IRT model. As a consequence, the IRT $\theta$ is the same as the expected proportion $\mathcal{E}\mathrm{T}$ except for the difference in their scales of measurement.
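The relationship between the latent trait and the expected proportion of 1s can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the procedure developed later in the book: it assumes a logistic response function with an item location and a discrimination value, and the names `prob_of_one` and `expected_proportion`, as well as the five item locations, are hypothetical choices made for illustration.

```python
import math


def prob_of_one(theta, location, discrimination=1.0):
    """Probability of a response of 1 under an assumed logistic response function."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - location)))


def expected_proportion(theta, locations):
    """Average probability of a response of 1 across an instrument's items."""
    return sum(prob_of_one(theta, b) for b in locations) / len(locations)


# Hypothetical item locations for a five-item instrument (illustrative only).
item_locations = [-1.5, -0.5, 0.0, 0.5, 1.5]

for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    proportion = expected_proportion(theta, item_locations)
    print(f"theta = {theta:+.1f}   expected proportion = {proportion:.3f}")
```

In this sketch the printed proportions rise as the person location increases and always stay strictly between 0 and 1, so the two quantities order persons identically and differ only in their scales.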
The latent trait $\theta$ has a range of $-\infty$ to $\infty$, whereas for the expected trait score $\mathcal{E}\mathrm{T}$ the range is 0 to 1. The expected trait score $\mathcal{E}\mathrm{T}$ is related to the IRT latent trait by a monotonic increasing transformation. This transformation is discussed in Chapters 4 and 10.

In addition to the true score model, CTT is based on a set of assumptions. These assumptions are that, in the population, (1) the errors are uncorrelated with the trait scores for an instrument, (2) the errors on one instrument are uncorrelated with the trait scores on a different instrument, and (3) the errors on one instrument are uncorrelated with the error scores on a different instrument. These assumptions are considered to be weak assumptions because they are likely to be met by the data. In contrast, IRT is based on strong assumptions (see Note 3). These IRT assumptions are discussed in the following chapter.

These CTT assumptions and the model given in Equation 1.1 (or Equation 1.2) form the basis of the psychometric concept of reliability and the validity coefficient. For example, the correlation of the observed scores on an instrument and the corresponding trait scores is the index of reliability for the instrument. Moreover, using the variances of the trait scores ($\sigma_{\mathrm{T}}^2$) and observed scores ($\sigma_X^2$) we can obtain the population reliability of an instrument's scores:

$$\rho_{XX} = \frac{\sigma_{\mathrm{T}}^2}{\sigma_X^2} \tag{1.3}$$

Because trait score variance is unknown, we can only estimate $\rho_{XX}$. Some of the traditional approaches for estimating reliability are KR-20, KR-21, and coefficient alpha. An assessment of the variance of the errors of measurement in any set of observed scores may be obtained by substituting $\sigma_{\mathrm{T}}^2 = \sigma_X^2 - \sigma_{\mathrm{E}}^2$ into Equation 1.3 to get

$$\sigma_{\mathrm{E}}^2 = \sigma_X^2 (1 - \rho_{XX}) \tag{1.4}$$

The square root of $\sigma_{\mathrm{E}}^2$ (i.e., $\sigma_{\mathrm{E}}$) is referred to as the standard error of measurement. The standard error of measurement is the standard deviation of the errors of measurement associated with the observed scores for a particular group of respondents.

From the foregoing it should be clear that because an individual's trait score is latent and unknown, the error associated with an observed score is also unknown. Therefore, Equations 1.1 and 1.2 have two unknown quantities, an individual's trait score and error score. Lord (1980) points out that the model given in Equations 1.1 or 1.2 cannot be disproved by any set of data. As a result, one difference between IRT and CTT is that with IRT we can engage in model-data fit analysis, whereas in CTT we do not examine model-data fit and simply assume the model to be true.

As may be obvious, the observed score $X$ is influenced by the instrument's characteristics. For example, assume a proficiency testing situation. An easy test administered an infinite number of independent times to an individual will yield a different value of $\mathrm{T}_i$ than a difficult test administered an infinite number of independent times to the same individual. This is analogous to the example of measuring the shoe and cereal boxes by using the shortest dimension of each box. In short, as is the case with the box example, in CTT person measurement is dependent on the instrument's characteristics.
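The true score model and Equations 1.3 and 1.4 can be made concrete with a small simulation. What follows is a sketch under assumptions that are ours, not the text's: trait scores and errors are drawn from normal distributions, errors are generated independently of the trait scores, and the means, standard deviations, and sample sizes are arbitrary illustrative values. In practice the trait scores are unknown, so reliability can only be estimated; the simulation sidesteps this by generating them directly.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Illustrative population values (assumptions, not taken from the text).
n_persons = 20000
trait_sd = 4.0
error_sd = 3.0

# True score model: each observed score is a trait score plus a random error.
trait_scores = [random.gauss(20.0, trait_sd) for _ in range(n_persons)]
errors = [random.gauss(0.0, error_sd) for _ in range(n_persons)]
observed = [t + e for t, e in zip(trait_scores, errors)]

var_t = statistics.pvariance(trait_scores)
var_x = statistics.pvariance(observed)

reliability = var_t / var_x            # Equation 1.3
var_e = var_x * (1.0 - reliability)    # Equation 1.4
sem = var_e ** 0.5                     # standard error of measurement

# One person measured on many independent administrations:
# the mean of the errors should be close to zero.
n_administrations = 20000
person_trait = 22.0
person_scores = [person_trait + random.gauss(0.0, error_sd)
                 for _ in range(n_administrations)]
mean_error = statistics.fmean(person_scores) - person_trait

print(f"reliability = {reliability:.3f}")
print(f"standard error of measurement = {sem:.3f}")
print(f"mean error for one person = {mean_error:.4f}")
```

With these assumed values the population reliability is 16/25 = 0.64 and the standard error of measurement is 3, so the simulated quantities should fall close to those figures. Changing `trait_sd` while holding `error_sd` fixed changes the reliability, which previews the point that these indices reflect the group measured as well as the instrument.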
Moreover, because the variance of the sample's observed scores appears in both Equations 1.3 and 1.4, one may deduce that the heterogeneity (or lack thereof) of the observed scores affects both reliability and the standard error of measurement. Moreover, Equations 1.3 and 1.4 cannot be considered to solely be properties of the instrument, but rather also reflect the sample's characteristics. In short, the instrument's characteristics affect the person scores and sample characteristics affect the quantitative indices of the instrument (e.g., item difficulty and discrimination, reliability, etc.). Thus, Thurstone's (1928) idea of invariance does not exist in CTT. In contrast, with IRT it is possible to have invariance of both person and item characterizations. See Appendix E, Dependency in Traditional Analysis Statistics and Observed Scores, for a demonstration of this lack of invariance with CTT. In addition, Gulliksen (1987) contains detailed information on CTT and Engelhard (1994, 2008) presents a historical view of invariance.

Latent Class Analysis

Unlike IRT's premise of a continuous latent variable, in latent class analysis (LCA) the latent variable is assumed to be categorical. That is, the latent variable consists of a set of mutually exclusive and exhaustive latent classes (see Note 4). To be more specific, "there exists a set of latent classes, such that the manifest relationship between any two or more items on a test can be accounted for by the existence of these basic classes and by these alone" (Stouffer, 1950, p. 6). In LCA the comparison of individuals involves comparing their latent class memberships rather than comparing their locations on a continuous latent variable.

For an understanding of the nature of a categorical latent variable, we turn to two empirical studies. The first is a study of the nosologic structure of psychotic illness by Kendler, Karkowski, and Walsh (1998). In their study, these authors conceptualized this latent variable as categorical. Their LCA showed that their participants belonged to one of six classes: (1) classic schizophrenia, (2) major depression, (3) schizophreniform disorder, (4) bipolar-schizomania, (5) schizodepression, and (6) hebephrenia. The second example involves cheating on academic examinations (Dayton & Scheers, 1997); the latent variable is cheating. The LCA of the investigators' data revealed a structure with two latent classes. One class consisted of persons who were persistent cheaters, whereas the second class consisted of individuals who would either exhibit opportunistic cheating or might not cheat at all. In both of these examples, we can see that respondents differ from one another on the latent variable in terms of their class membership rather than in terms of their locations on a continuum.

In Appendix E, Mixture Models, we discuss an approach in which we combine LCA and IRT. That is, we can conceptualize academic performance as involving both latent classes and continua. For example, we can have one class of persistent cheaters and another class of noncheaters. Within each class there is a proficiency variable continuum. Therefore, the cheater class has its own continuum on which we can compare individual performances. Similarly, the noncheater class has a separate continuum that we use to compare the noncheaters' performances with one another.
These latter comparisons are not contaminated by the cheaters' performances and the noncheaters are not disadvantaged by the presence of the cheaters.

In general, LCA determines the number of latent classes that best explains the data (i.e., determining the latent class structure). This process involves comparing models that vary in their respective number of classes (e.g., a one-class model, a two-class model, and so on). Determining the latent class structure involves not only statistical tests of fit, but also the interpretability of the solution. With each class structure one has estimates of the items' characteristics. Based on these item characteristics and the individuals' responses, the respondents are assigned to one of the latent classes. Subsequent to this assignment, we obtain estimates of the relative size of each class. These relative sizes are known as the latent class proportions ($\pi_\nu$'s, where $\nu$ is the latent class index). The sum of the latent class proportions across the latent classes is constrained to 1.0. For example, if the latent variable, say algebra proficiency, has a two-class structure, then $\pi_1$ might equal 0.65 and $\pi_2 = 1 - 0.65 = 0.35$. Moreover, our latent classes' interpretation may reveal that the larger class (i.e., $\pi_1 = 0.65$) consists of persons who have mastered algebraic problems, whereas the other class consists of individuals who have not mastered the problems. In short, the data's latent structure consists of masters and nonmasters.

One may conceive of a situation in which if one had a large number of latent classes and if they were ordered, then there would be little difference between conceptualizing the latent variable as continuous or as categorical. In point of fact, a latent class model with a sufficient number of latent classes is equivalent to an IRT model. For example, for a data set with four items, a latent class model with at least three latent classes would provide equivalent item characterizations as an IRT model that uses only item location parameters. Appendix E, Mixture Models, contains additional information about LCA.

Summary

Typically, measurement is viewed as analogous to using a ruler to measure the length of an object. In effect, this is analogous to Stevens's (1946) definition of measurement in that the ruler provides the rules and the numeric values associated with the ruler's tick marks provide the numeric labels. However, Stevens's definition invites misinterpretation. Although one can infer from his definition that he is describing an act or a process, this aspect of the definition is not made salient. Moreover, by focusing only on the assignment of numbers, one is left with the impression that measurement results in only a set of numeric labels.

We consider measurement to be a process by which one attempts to understand the nature of a variable by applying mathematical techniques. The result may or may not be numeric labels and may or may not involve a continuous variable. For example, LCA is a measurement paradigm that allows one to understand the nature of a latent variable, such as ethnocentrism or test anxiety, without resulting in numeric labels. The use of LCA involves the application of mathematical techniques that results in individuals being classified into latent classes and an assessment of how well the class structure describes the manifest data. The term manifest data refers to the information obtained by direct observation, whereas the term latent refers to the information obtained on the basis of additional assumptions and/or by making inferences from the original (manifest) data (Lazarsfeld, 1950). Presumably, one or more latent variables can account for the patterns or relationships that are evident in the manifest data. Therefore, a manifest variable is an observed manifestation of one or more latent (i.e., unobservable) variables.
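The inference from manifest responses to a latent class can be sketched with Bayes' rule. This is a minimal illustration rather than a full LCA: it takes the latent class proportions and the class-specific item probabilities as already known, whereas in an actual analysis they are estimated from the data. It reuses the two-class masters/nonmasters proportions of 0.65 and 0.35 from the algebra example; the four item probabilities per class, the function name `class_posteriors`, and the assumption that responses are independent within a class are ours.

```python
def class_posteriors(responses, class_proportions, item_probs):
    """Posterior probability of each latent class given a 0/1 response pattern.

    Assumes responses are independent of one another within a class.
    """
    joint = []
    for proportion, probs in zip(class_proportions, item_probs):
        likelihood = 1.0
        for response, p in zip(responses, probs):
            likelihood *= p if response == 1 else (1.0 - p)
        joint.append(proportion * likelihood)
    total = sum(joint)
    return [value / total for value in joint]


# Latent class proportions from the two-class algebra example.
class_proportions = [0.65, 0.35]

# Hypothetical probabilities of a correct response to four items, by class.
item_probs = [
    [0.90, 0.85, 0.80, 0.75],  # masters
    [0.30, 0.25, 0.20, 0.15],  # nonmasters
]

for pattern in ([1, 1, 1, 0], [0, 0, 0, 0]):
    masters, nonmasters = class_posteriors(pattern, class_proportions, item_probs)
    print(pattern, f"P(master) = {masters:.3f}", f"P(nonmaster) = {nonmasters:.3f}")
```

A respondent would then be assigned to the class with the larger posterior probability. Under these illustrative values the pattern with three correct responses points strongly to the masters class and the all-incorrect pattern to the nonmasters class, so the outcome is a class membership rather than a numeric location.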
We outlined different paradigms that allow the tools of mathematics to be applied to explaining manifest observations from a latent variable perspective. A latent variable may be conceptualized as continuous, categorical, or some combination of the two. When the variable is conceptualized as continuous, then the use of CTT or IRT may be the appropriate mathematical technique. However, if the variable is conceptualized as categorical, then LCA may be the most appropriate psychometric method to use. It is possible to conceptualize the latent space as a set of latent classes, within each of which there is a continuum, or as a combination of latent classes and a latent continuum. In this situation a mixture of IRT and LCA may be considered an appropriate representation of the latent space. As part of measurement it is necessary to operationalize the variable(s) of interest (i.e., provide operational definitions). The measurement process also involves assessing how much information the measures yield about the participants (e.g., reliability) as well as how well the measures reflect the latent variable(s) (i.e., validity). When IRT is appropriate and when there is model data fit, then IRT offers advantages over CTT. For instance, with IRT our person location estimates are invariant with respect to the instrument, the precision of these estimates is known at the individual level and not just at the group level (as is the case with Equation 1.4), and the item parameter estimates

10 10 THE THEORY AND PRACTICE OF ITEM RESPONSE THEORY transcend the particular sample used in their estimation. Moreover, unlike CTT, with IRT we are able to make predictive statements about respondents performance as well as examine the tenability of the model vis-à-vis the data. In the next chapter the simplest of the IRT models is presented. This model, the Rasch or one-parameter logistic model, contains a single parameter that characterizes the item s location on the latent variable continuum. We show how this single-item parameter can be used to estimate a respondent s location on a latent variable. Notes 1. Although understanding how the automobile moves is not necessary in order to measure the speed with which it moves, nonetheless fully understanding how the automobile moves can lead to an improved measurement process. 2. A trait is exemplified primarily in the things that a person can do (Thurstone, 1947) and is any distinguishable, relatively enduring way in which one individual varies from another (Guilford, 1959, p. 6). However, we do not consider traits to be rigidly fixed or predetermined (see Anastasi, 1983). 3. There is an implicit unidimensionality assumption in CTT. That is, for observed scores to have any meaning they need to represent the sum of responses to items that measure the same thing. For instance, assume that an examination consists of five spelling questions and five single-digit addition problems. Presumably our examination data would consist of two dimensions representing spelling and addition proficiencies. If a person had an observed score of 5, it would not be possible to determine whether he or she is perfect in spelling, perfect in addition, or in some combination of spelling and addition proficiencies. In this case, the observed score has no intrinsic meaning. 
In contrast, if the examination consists of only spelling questions, then the score would indicate how well a person could spell the questions on the test and would have intrinsic meaning.

4. Both IRT and LCA can be considered to be special instances of the general theoretical framework for modeling categorical variables, known as latent structure analysis (LSA; Lazarsfeld, 1950). Moreover, both linear and nonlinear factor analysis may be regarded as special cases of LSA (McDonald, 1967).

Copyright 2009 The Guilford Press. All rights reserved under International Copyright Convention. No part of this text may be reproduced, transmitted, downloaded, or stored in or introduced into any information storage or retrieval system, in any form or by any means, whether electronic or mechanical, now known or hereinafter invented, without the written permission of The Guilford Press. Guilford Publications, 72 Spring Street, New York, NY 10012,
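The ambiguity described in note 3 can be made concrete with a short enumeration. The following is only an illustrative sketch: the function name `score_decompositions` and its parameters are hypothetical and do not come from the text. It lists every way an observed score could arise on an examination with five spelling questions and five addition problems.

```python
def score_decompositions(total, n_spelling=5, n_addition=5):
    """Return every (spelling_correct, addition_correct) pair summing to `total`.

    Hypothetical helper for illustration; the test layout (five spelling and
    five addition items) follows the example in note 3.
    """
    return [
        (s, total - s)
        for s in range(n_spelling + 1)
        if 0 <= total - s <= n_addition
    ]


# An observed score of 5 is compatible with several proficiency profiles.
for spelling, addition in score_decompositions(5):
    print(f"spelling correct = {spelling}, addition correct = {addition}")
```

Under these assumptions an observed score of 5 corresponds to six different profiles, ranging from perfect addition with no spelling items correct to perfect spelling with no addition items correct, so the total alone does not identify the respondent's standing on either dimension.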

VARIABLES AND MEASUREMENT

VARIABLES AND MEASUREMENT ARTHUR SYC 204 (EXERIMENTAL SYCHOLOGY) 16A LECTURE NOTES [01/29/16] VARIABLES AND MEASUREMENT AGE 1 Topic #3 VARIABLES AND MEASUREMENT VARIABLES Some definitions of variables include the following: 1.

More information

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses

Item Response Theory. Steven P. Reise University of California, U.S.A. Unidimensional IRT Models for Dichotomous Item Responses Item Response Theory Steven P. Reise University of California, U.S.A. Item response theory (IRT), or modern measurement theory, provides alternatives to classical test theory (CTT) methods for the construction,

More information

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories Kamla-Raj 010 Int J Edu Sci, (): 107-113 (010) Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories O.O. Adedoyin Department of Educational Foundations,

More information

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS Michael J. Kolen The University of Iowa March 2011 Commissioned by the Center for K 12 Assessment & Performance Management at

More information

Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan

Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Connexion of Item Response Theory to Decision Making in Chess Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Acknowledgement A few Slides have been taken from the following presentation

More information

Bruno D. Zumbo, Ph.D. University of Northern British Columbia

Bruno D. Zumbo, Ph.D. University of Northern British Columbia Bruno Zumbo 1 The Effect of DIF and Impact on Classical Test Statistics: Undetected DIF and Impact, and the Reliability and Interpretability of Scores from a Language Proficiency Test Bruno D. Zumbo, Ph.D.

More information

IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS

IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS IDENTIFYING DATA CONDITIONS TO ENHANCE SUBSCALE SCORE ACCURACY BASED ON VARIOUS PSYCHOMETRIC MODELS A Dissertation Presented to The Academic Faculty by HeaWon Jun In Partial Fulfillment of the Requirements

More information

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA Data Analysis: Describing Data CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA In the analysis process, the researcher tries to evaluate the data collected both from written documents and from other sources such

More information

computation and interpretation of indices of reliability. In

computation and interpretation of indices of reliability. In THE CONCEPTS OF RELIABILITY AND HOMOGENEITY C. H. COOMBS 1 University of Michigan I. Introduction THE literature of test theory is replete with articles on the computation and interpretation of indices

More information

Introduction On Assessing Agreement With Continuous Measurement

Introduction On Assessing Agreement With Continuous Measurement Introduction On Assessing Agreement With Continuous Measurement Huiman X. Barnhart, Michael Haber, Lawrence I. Lin 1 Introduction In social, behavioral, physical, biological and medical sciences, reliable

More information

1 The conceptual underpinnings of statistical power

1 The conceptual underpinnings of statistical power 1 The conceptual underpinnings of statistical power The importance of statistical power As currently practiced in the social and health sciences, inferential statistics rest solidly upon two pillars: statistical

More information

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE

ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION SCALE California State University, San Bernardino CSUSB ScholarWorks Electronic Theses, Projects, and Dissertations Office of Graduate Studies 6-2016 ITEM RESPONSE THEORY ANALYSIS OF THE TOP LEADERSHIP DIRECTION

More information

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form INVESTIGATING FIT WITH THE RASCH MODEL Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form of multidimensionality. The settings in which measurement

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

Chapter 1 Introduction. Measurement Theory. broadest sense and not, as it is sometimes used, as a proxy for deterministic models.

Chapter 1 Introduction. Measurement Theory. broadest sense and not, as it is sometimes used, as a proxy for deterministic models. Ostini & Nering - Chapter 1 - Page 1 POLYTOMOUS ITEM RESPONSE THEORY MODELS Chapter 1 Introduction Measurement Theory Mathematical models have been found to be very useful tools in the process of human

More information

Statistical Methods and Reasoning for the Clinical Sciences

Statistical Methods and Reasoning for the Clinical Sciences Statistical Methods and Reasoning for the Clinical Sciences Evidence-Based Practice Eiki B. Satake, PhD Contents Preface Introduction to Evidence-Based Statistics: Philosophical Foundation and Preliminaries

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

On the purpose of testing:

On the purpose of testing: Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase

More information

Structural Equation Modeling (SEM)

Structural Equation Modeling (SEM) Structural Equation Modeling (SEM) Today s topics The Big Picture of SEM What to do (and what NOT to do) when SEM breaks for you Single indicator (ASU) models Parceling indicators Using single factor scores

More information

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies PART 1: OVERVIEW Slide 1: Overview Welcome to Qualitative Comparative Analysis in Implementation Studies. This narrated powerpoint

More information

Fixed-Effect Versus Random-Effects Models

Fixed-Effect Versus Random-Effects Models PART 3 Fixed-Effect Versus Random-Effects Models Introduction to Meta-Analysis. Michael Borenstein, L. V. Hedges, J. P. T. Higgins and H. R. Rothstein 2009 John Wiley & Sons, Ltd. ISBN: 978-0-470-05724-7

More information

UvA-DARE (Digital Academic Repository) Statistical evaluation of binary measurement systems Erdmann, T.P. Link to publication

UvA-DARE (Digital Academic Repository) Statistical evaluation of binary measurement systems Erdmann, T.P. Link to publication UvA-DARE (Digital Academic Repository) Statistical evaluation of binary measurement systems Erdmann, T.P. Link to publication Citation for published version (APA): Erdmann, T. P. (2012). Statistical evaluation

More information

Development, Standardization and Application of

Development, Standardization and Application of American Journal of Educational Research, 2018, Vol. 6, No. 3, 238-257 Available online at http://pubs.sciepub.com/education/6/3/11 Science and Education Publishing DOI:10.12691/education-6-3-11 Development,

More information

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that

More information

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE 1. When you assert that it is improbable that the mean intelligence test score of a particular group is 100, you are using. a. descriptive

More information

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky Validating Measures of Self Control via Rasch Measurement Jonathan Hasford Department of Marketing, University of Kentucky Kelly D. Bradley Department of Educational Policy Studies & Evaluation, University

More information

Variability. After reading this chapter, you should be able to do the following:

Variability. After reading this chapter, you should be able to do the following: LEARIG OBJECTIVES C H A P T E R 3 Variability After reading this chapter, you should be able to do the following: Explain what the standard deviation measures Compute the variance and the standard deviation

More information

Evaluating the quality of analytic ratings with Mokken scaling

Evaluating the quality of analytic ratings with Mokken scaling Psychological Test and Assessment Modeling, Volume 57, 2015 (3), 423-444 Evaluating the quality of analytic ratings with Mokken scaling Stefanie A. Wind 1 Abstract Greatly influenced by the work of Rasch

More information

Basic Concepts in Research and DATA Analysis

Basic Concepts in Research and DATA Analysis Basic Concepts in Research and DATA Analysis 1 Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...2 The Research Question...3 The Hypothesis...3 Defining the

More information

Introduction to Test Theory & Historical Perspectives

Introduction to Test Theory & Historical Perspectives Introduction to Test Theory & Historical Perspectives Measurement Methods in Psychological Research Lecture 2 02/06/2007 01/31/2006 Today s Lecture General introduction to test theory/what we will cover

More information

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1 Nested Factor Analytic Model Comparison as a Means to Detect Aberrant Response Patterns John M. Clark III Pearson Author Note John M. Clark III,

More information

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns The Influence of Test Characteristics on the Detection of Aberrant Response Patterns Steven P. Reise University of California, Riverside Allan M. Due University of Minnesota Statistical methods to assess

More information

Having your cake and eating it too: multiple dimensions and a composite

Having your cake and eating it too: multiple dimensions and a composite Having your cake and eating it too: multiple dimensions and a composite Perman Gochyyev and Mark Wilson UC Berkeley BEAR Seminar October, 2018 outline Motivating example Different modeling approaches Composite

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1: Research Methods 1 Handouts, Graham Hole,COGS - version 10, September 000: Page 1: T-TESTS: When to use a t-test: The simplest experimental design is to have two conditions: an "experimental" condition

More information

Latent Variable Modeling - PUBH Latent variable measurement models and path analysis

Latent Variable Modeling - PUBH Latent variable measurement models and path analysis Latent Variable Modeling - PUBH 7435 Improved Name: Latent variable measurement models and path analysis Slide 9:45 - :00 Tuesday and Thursday Fall 2006 Melanie M. Wall Division of Biostatistics School

More information

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016 The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016 This course does not cover how to perform statistical tests on SPSS or any other computer program. There are several courses

More information

MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION

MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION Variables In the social sciences data are the observed and/or measured characteristics of individuals and groups

More information

CHAPTER 3 RESEARCH METHODOLOGY

CHAPTER 3 RESEARCH METHODOLOGY CHAPTER 3 RESEARCH METHODOLOGY 3.1 Introduction 3.1 Methodology 3.1.1 Research Design 3.1. Research Framework Design 3.1.3 Research Instrument 3.1.4 Validity of Questionnaire 3.1.5 Statistical Measurement

More information

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items

Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations May 215 Using Differential Item Functioning to Test for Inter-rater Reliability in Constructed Response Items Tamara Beth

More information

(entry, )

(entry, ) http://www.eolss.net (entry, 6.27.3.4) Reprint of: THE CONSTRUCTION AND USE OF PSYCHOLOGICAL TESTS AND MEASURES Bruno D. Zumbo, Michaela N. Gelin, & Anita M. Hubley The University of British Columbia,

More information

The effects of ordinal data on coefficient alpha

The effects of ordinal data on coefficient alpha James Madison University JMU Scholarly Commons Masters Theses The Graduate School Spring 2015 The effects of ordinal data on coefficient alpha Kathryn E. Pinder James Madison University Follow this and

More information

ADMS Sampling Technique and Survey Studies

ADMS Sampling Technique and Survey Studies Principles of Measurement Measurement As a way of understanding, evaluating, and differentiating characteristics Provides a mechanism to achieve precision in this understanding, the extent or quality As

More information

APPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH

APPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH APPLYING THE RASCH MODEL TO PSYCHO-SOCIAL MEASUREMENT A PRACTICAL APPROACH Margaret Wu & Ray Adams Documents supplied on behalf of the authors by Educational Measurement Solutions TABLE OF CONTENT CHAPTER

More information

Denny Borsboom Jaap van Heerden Gideon J. Mellenbergh

Denny Borsboom Jaap van Heerden Gideon J. Mellenbergh Validity and Truth Denny Borsboom Jaap van Heerden Gideon J. Mellenbergh Department of Psychology, University of Amsterdam ml borsboom.d@macmail.psy.uva.nl Summary. This paper analyzes the semantics of

More information

CHAPTER VI RESEARCH METHODOLOGY

CHAPTER VI RESEARCH METHODOLOGY CHAPTER VI RESEARCH METHODOLOGY 6.1 Research Design Research is an organized, systematic, data based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Basic concepts and principles of classical test theory

Basic concepts and principles of classical test theory Basic concepts and principles of classical test theory Jan-Eric Gustafsson What is measurement? Assignment of numbers to aspects of individuals according to some rule. The aspect which is measured must

More information

A simulation study of person-fit in the Rasch model

A simulation study of person-fit in the Rasch model Psychological Test and Assessment Modeling, Volume 58, 2016 (3), 531-563 A simulation study of person-fit in the Rasch model Richard Artner 1 Abstract The validation of individual test scores in the Rasch

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy Industrial and Organizational Psychology, 3 (2010), 489 493. Copyright 2010 Society for Industrial and Organizational Psychology. 1754-9426/10 Issues That Should Not Be Overlooked in the Dominance Versus

More information

C-1: Variables which are measured on a continuous scale are described in terms of three key characteristics central tendency, variability, and shape.

C-1: Variables which are measured on a continuous scale are described in terms of three key characteristics central tendency, variability, and shape. MODULE 02: DESCRIBING DT SECTION C: KEY POINTS C-1: Variables which are measured on a continuous scale are described in terms of three key characteristics central tendency, variability, and shape. C-2:

More information

Item Analysis: Classical and Beyond

Item Analysis: Classical and Beyond Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013 Why is item analysis relevant? Item analysis provides

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction Whatever exists at all exists in some amount. To know it thoroughly involves knowing its quantity as well as its quality. Education is concerned with changes in human beings; a change

More information

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory Kate DeRoche, M.A. Mental Health Center of Denver Antonio Olmos, Ph.D. Mental Health

More information

Shiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p )

Shiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p ) Rasch Measurementt iin Language Educattiion Partt 2:: Measurementt Scalles and Invariiance by James Sick, Ed.D. (J. F. Oberlin University, Tokyo) Part 1 of this series presented an overview of Rasch measurement

More information

Comprehensive Statistical Analysis of a Mathematics Placement Test

Comprehensive Statistical Analysis of a Mathematics Placement Test Comprehensive Statistical Analysis of a Mathematics Placement Test Robert J. Hall Department of Educational Psychology Texas A&M University, USA (bobhall@tamu.edu) Eunju Jung Department of Educational

More information

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling Olli-Pekka Kauppila Daria Kautto Session VI, September 20 2017 Learning objectives 1. Get familiar with the basic idea

More information

Introduction. 1.1 Facets of Measurement

Introduction. 1.1 Facets of Measurement 1 Introduction This chapter introduces the basic idea of many-facet Rasch measurement. Three examples of assessment procedures taken from the field of language testing illustrate its context of application.

More information

Chapter 1: Introduction to Statistics

Chapter 1: Introduction to Statistics Chapter 1: Introduction to Statistics Variables A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship

More information

9 research designs likely for PSYC 2100

9 research designs likely for PSYC 2100 9 research designs likely for PSYC 2100 1) 1 factor, 2 levels, 1 group (one group gets both treatment levels) related samples t-test (compare means of 2 levels only) 2) 1 factor, 2 levels, 2 groups (one

More information

Diagnostic Classification Models

Diagnostic Classification Models Diagnostic Classification Models Lecture #13 ICPSR Item Response Theory Workshop Lecture #13: 1of 86 Lecture Overview Key definitions Conceptual example Example uses of diagnostic models in education Classroom

More information

Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX

Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX Paper 1766-2014 Parameter Estimation of Cognitive Attributes using the Crossed Random- Effects Linear Logistic Test Model with PROC GLIMMIX ABSTRACT Chunhua Cao, Yan Wang, Yi-Hsin Chen, Isaac Y. Li University

More information

Description of components in tailored testing

Description of components in tailored testing Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of

More information

Lecture 1 An introduction to statistics in Ichthyology and Fisheries Science

Lecture 1 An introduction to statistics in Ichthyology and Fisheries Science Lecture 1 An introduction to statistics in Ichthyology and Fisheries Science What is statistics and why do we need it? Statistics attempts to make inferences about unknown values that are common to a population

More information

Chapter 14: More Powerful Statistical Methods

Chapter 14: More Powerful Statistical Methods Chapter 14: More Powerful Statistical Methods Most questions will be on correlation and regression analysis, but I would like you to know just basically what cluster analysis, factor analysis, and conjoint

More information

IAPT: Regression. Regression analyses

IAPT: Regression. Regression analyses Regression analyses IAPT: Regression Regression is the rather strange name given to a set of methods for predicting one variable from another. The data shown in Table 1 and come from a student project

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Basic SPSS for Postgraduate

Basic SPSS for Postgraduate Basic SPSS for Postgraduate Dr. Shamshuritawati Sharif School of Quantitative Science Email : shamshurita@uum.edu.my Office : +6049286336 Mobile :+60194248001 In the process of carrying out the research,

More information

On cows and test construction

On cows and test construction On cows and test construction NIELS SMITS & KEES JAN KAN Research Institute of Child Development and Education University of Amsterdam, The Netherlands SEM WORKING GROUP AMSTERDAM 16/03/2018 Looking at

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models Int J Comput Math Learning (2009) 14:51 60 DOI 10.1007/s10758-008-9142-6 COMPUTER MATH SNAPHSHOTS - COLUMN EDITOR: URI WILENSKY* Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based

More information

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc. Chapter 23 Inference About Means Copyright 2010 Pearson Education, Inc. Getting Started Now that we know how to create confidence intervals and test hypotheses about proportions, it d be nice to be able

More information

25. EXPLAINING VALIDITYAND RELIABILITY

25. EXPLAINING VALIDITYAND RELIABILITY 25. EXPLAINING VALIDITYAND RELIABILITY "Validity" and "reliability" are ubiquitous terms in social science measurement. They are prominent in the APA "Standards" (1985) and earn chapters in test theory

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN. This chapter contains explanations that become a basic knowledge to create a good

CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN. This chapter contains explanations that become a basic knowledge to create a good CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN This chapter contains explanations that become a basic knowledge to create a good questionnaire which is able to meet its objective. Just like the thesis

More information

THE ROLE OF THE COMPUTER IN DATA ANALYSIS

THE ROLE OF THE COMPUTER IN DATA ANALYSIS CHAPTER ONE Introduction Welcome to the study of statistics! It has been our experience that many students face the prospect of taking a course in statistics with a great deal of anxiety, apprehension,

More information

Long Term: Systematically study children s understanding of mathematical equivalence and the ways in which it develops.

Long Term: Systematically study children s understanding of mathematical equivalence and the ways in which it develops. Long Term: Systematically study children s understanding of mathematical equivalence and the ways in which it develops. Short Term: Develop a valid and reliable measure of students level of understanding

More information

Carrying out an Empirical Project

Carrying out an Empirical Project Carrying out an Empirical Project Empirical Analysis & Style Hint Special program: Pre-training 1 Carrying out an Empirical Project 1. Posing a Question 2. Literature Review 3. Data Collection 4. Econometric

More information

Decision consistency and accuracy indices for the bifactor and testlet response theory models

Decision consistency and accuracy indices for the bifactor and testlet response theory models University of Iowa Iowa Research Online Theses and Dissertations Summer 2014 Decision consistency and accuracy indices for the bifactor and testlet response theory models Lee James LaFond University of

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Name Date Per Key Vocabulary: response variable explanatory variable independent variable dependent variable scatterplot positive association negative association linear correlation r-value regression

More information

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys Jill F. Kilanowski, PhD, APRN,CPNP Associate Professor Alpha Zeta & Mu Chi Acknowledgements Dr. Li Lin,

11/24/2017. Do not imply a cause-and-effect relationship

Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

Measuring noncompliance in insurance benefit regulations with randomized response methods for multiple items

Measuring noncompliance in insurance benefit regulations with randomized response methods for multiple items Ulf Böckenholt 1 and Peter G.M. van der Heijden 2 1 Faculty of Management, McGill University,

On the Many Claims and Applications of the Latent Variable

On the Many Claims and Applications of the Latent Variable Science is an attempt to exploit this contact between our minds and the world, and science is also motivated by the limitations that result from

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

TECHNICAL REPORT Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data CONTENTS Executive Summary; Introduction; Overview of Data Analysis Concepts

Chapter 4: Defining and Measuring Variables

Chapter 4: Defining and Measuring Variables A. LEARNING OUTCOMES. After studying this chapter students should be able to: Distinguish between qualitative and quantitative, discrete and continuous, and

Differential Item Functioning

Differential Item Functioning Lecture #11 ICPSR Item Response Theory Workshop Lecture Overview Detection of Differential Item Functioning (DIF) Distinguish Bias from DIF Test vs. Item

GMAC. Scaling Item Difficulty Estimates from Nonequivalent Groups

GMAC Scaling Item Difficulty Estimates from Nonequivalent Groups Fanmin Guo, Lawrence Rudner, and Eileen Talento-Miller GMAC Research Reports RR-09-03 April 3, 2009 Abstract By placing item statistics

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER Introduction, 639. Factor analysis, 639. Discriminant analysis, 644. INTRODUCTION

Empirical Research Methods for Human-Computer Interaction. I. Scott MacKenzie Steven J. Castellucci

Empirical Research Methods for Human-Computer Interaction I. Scott MacKenzie Steven J. Castellucci 1 Topics The what, why, and how of empirical research Group participation in a real experiment Observations

Reliability, validity, and all that jazz

Reliability, validity, and all that jazz Dylan Wiliam King's College London Introduction No measuring instrument is perfect. The most obvious problems relate to reliability. If we use a thermometer to

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure Rob Cavanagh Len Sparrow Curtin University R.Cavanagh@curtin.edu.au Abstract The study sought to measure mathematics anxiety

André Cyr and Alexander Davies

Item Response Theory and Latent variable modeling for surveys with complex sampling design The case of the National Longitudinal Survey of Children and Youth in Canada Background André Cyr and Alexander
