Tilburg University. Mokken scale analysis for polychotomous items Sijtsma, K.; Debets, P.; Molenaar, I.W. Published in: Quality & Quantity


Tilburg University

Mokken scale analysis for polychotomous items
Sijtsma, K.; Debets, P.; Molenaar, I.W.

Published in: Quality & Quantity
Publication date: 1990

Citation for published version (APA): Sijtsma, K., Debets, P., & Molenaar, I. W. (1990). Mokken scale analysis for polychotomous items: Theory, a computer program and an empirical application. Quality & Quantity, 24(2),

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
- Users may download and print one copy of any publication from the public portal for the purpose of private study or research
- You may not further distribute the material or use it for any profit-making activity or commercial gain
- You may freely distribute the URL identifying the publication in the public portal

Take down policy
If you believe that this document breaches copyright, please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 29. Jun. 2018

Quality & Quantity 24: , Kluwer Academic Publishers. Printed in the Netherlands.

Mokken scale analysis for polychotomous items: theory, a computer program and an empirical application

K. SIJTSMA¹, P. DEBETS² & I.W. MOLENAAR³
¹ Free University of Amsterdam, The Netherlands (Correspondence address: Sectie Arbeids- en Organisatiepsychologie, Vrije Universiteit, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands)
² University of Amsterdam, The Netherlands
³ State University of Groningen, The Netherlands

Abstract. This paper contains three subjects. First, an extension of Mokken's nonparametric item response models from dichotomous items to items with two or more ordered answer categories is proposed. Second, a computer program to analyze multicategory item scores is presented. This program is called MSP. The analyses by means of MSP are based on the multicategory extension of Mokken's theory. Finally, an application of MSP to empirical multicategory test data is presented in order to illuminate its possibilities.

Introduction

Several computer programs exist for the analysis of dichotomously scored test items according to the nonparametric item response models proposed by Mokken (1971); also see Mokken & Lewis (1982). Among these programs are MOKKEN SCALE and MOKKEN TEST in the package STAP and their stand alone versions MOKSC and MOKTST that were designed for mainframe computers during the seventies in The Netherlands (Niemöller, 1980; Niemöller & Van Schuur, 1980). Kingma and Reuvekamp (1986a, b) recently presented programs for microcomputers. The test models by Mokken and the programs based on these models are restricted to dichotomous item responses. Unless the researcher is willing to dichotomize his/her multicategory responses, which means throwing away information provided by examinees, another test model must be used to analyze the test data.
Molenaar (1982, 1986) and Molenaar & Sijtsma (1988) have extended Mokken's theory to the general case of multicategory items with ordered response categories, dichotomous items being a special case. Based on this generalization, a computer program has been developed (Debets & Brouwer, 1986), which is suited both for mainframe and micro computers. This program, which is called MSP (Mokken Scale analysis for

Polychotomous items), can be regarded as a generalization as well as a successor of the MOKKEN SCALE program mentioned above. In this paper we first give a brief introduction to the dichotomous Mokken approach, followed by a review of its multicategory extension. Second, relevant technical information is provided with respect to the program MSP. Finally, an example of an application of MSP to thirteen items having four response categories each is given. These data were originally analyzed by Middel & Van Schuur (1981) after a proper dichotomization. Results from their study are compared with our results.

Mokken's approach for dichotomously scored items

Mokken (1971) proposed his approach to item response theory for dichotomously scored test items only. Response categories are usually interpreted as either positive or negative with respect to the attribute measured by the test. It is assumed that the probability of a positive response on an item is a monotonely nondecreasing function of the attribute measured. This function is usually called the Item Characteristic Curve (ICC). A set of items having monotonely nondecreasing ICC's, and each being a measure of the same attribute, is called a monotonely homogeneous set. It can be shown that apart from ties, the expected order of the examinees on the latent attribute is the same for each selection of items from a monotonely homogeneous set. An additional restriction on the set of items is that their ICC's do not intersect. Such a set of items is called doubly monotone. Analogously to the "item free" order of subjects, it can be shown that the order of items according to their difficulty is the same within each subpopulation of persons from the population for which the model of double monotonicity holds. It may be noted that neither of the models discussed assumes a parametric definition of the ICC or of the distribution of the attribute across persons. Hence these models are called nonparametric.
Mokken (1971) derived observable properties of both models. These properties can be used to check the empirical fit of these models to test data. Results based on these properties are output of the programs mentioned in the preceding section. Introductions to Mokken's approach are provided by, e.g., Stokman and Van Schuur (1980), Mokken and Lewis (1982), Niemöller and Van Schuur (1983) and Sijtsma (1988b). Empirical applications of the dichotomous Mokken models in test construction research are presented by, e.g., Mokken (1971), Henning and Six (1977), Middel and Van Schuur (1981) and by Gillespie et al. (1987).

Mokken's approach extended to multicategory items

Attitudes and personality traits are often measured by means of items with ordered answer categories. Rating scales are well known examples of such ordered categories. Molenaar (1982, 1986) has proposed an extension of Mokken's model from the dichotomous to the multicategory ordered case. This extension is discussed in the next three subsections.

Nonparametric models for multicategory items

The notion of the item step is central in the multicategory approach. As an example we mention an attitude item having three ordered response categories, e.g., disagree, neutral and agree. This item may be thought to consist of two dichotomous item steps. The first item step consists of the imaginary question whether the respondent agrees enough to the positively phrased attitude statement to take the step from disagree to neutral. If the imaginary answer is affirmative, the item step score equals one, and in the negative case it equals zero. The second step consists of the imaginary question whether the step can be made from neutral to agree. A positive answer again yields an item step score equal to one and a score equal to zero if the response is negative. The item score equals the sum of the two separate item step scores. It may be tempting to think that the multicategory models actually can be replaced by the dichotomous models for item steps instead of items. Item steps belonging to the same item are dependent however: a negative response on the first step implies negative responses on the subsequent steps of a fixed item, and a positive response on a specific step implies positive responses on the preceding steps. Such dependencies are not allowed in item response theories. Molenaar's (1982, 1986) approach is, therefore, based on the multicategory items.
We introduce some notation to formalize the definitions of item steps, their difficulties and the Item Step Characteristic Curve (ISCC). First, we assume that each of $k$ items in a test consists of $m + 1$ ordered answer categories. This means that each item can be viewed as a sequence of $m$ imaginary dichotomous item steps ordered along the latent continuum. The attribute values on this continuum are denoted by $\xi$. Furthermore the score on item $i$ is denoted by $X_i$, and the score on step $g$ of item $i$ is denoted by $Y_{gi}$. The item score takes integer values ranging from 0 to $m$, while $Y_{gi}$ takes the values zero (item step is failed) or one (item step is passed). The relation between $X_i$ and $Y_{gi}$ is

$$X_i = \sum_{g=1}^{m} Y_{gi}. \tag{1}$$

Second, we assume that each person in a population of interest is characterized by a position on the latent continuum $\xi$, and that the cumulative distribution function of $\xi$ within this population is denoted by $G(\xi)$. Third, we assume that unidimensionality of measurement holds across the items in a test, as well as local stochastic independence of all item responses of fixed persons. The ISCC may now be defined as follows, denoting probabilities by $\pi$:

$$\pi_{gi}(\xi) = \mathrm{Prob}(Y_{gi} = 1 \mid \xi) = \mathrm{Prob}(X_i \geq g \mid \xi). \tag{2}$$

The ISCC thus gives the conditional probability that the item step score equals one, or that the item score equals or exceeds the value $g$. In the multicategory model of monotone homogeneity, the ISCC is a nondecreasing function of $\xi$. In the multicategory model of double monotonicity, in addition the ISCC's do not intersect. The difficulty of an item step can be obtained from

$$\pi_{gi} = \int_{\xi} \pi_{gi}(\xi)\, dG(\xi) = \mathrm{Prob}(Y_{gi} = 1) = \mathrm{Prob}(X_i \geq g). \tag{3}$$

This is the proportion of persons having an item step score equal to one. These persons have item scores equal to $X_i = g, g + 1, \ldots, m$. In the sample, $\pi_{gi}$ is estimated by means of

$$\hat{\pi}_{gi} = \sum_{h=g}^{m} n_{hi} / n, \tag{4}$$

where $n$ denotes the sample size, and $n_{hi}$ denotes the number of persons in the sample having an item score $X_i = h$. Since $\xi$ can not be estimated, we may replace it by the true score from classical test theory, which has the same ordering as $\xi$. The true score is estimated by the test score $X$ which is the unweighted sum of all item scores $X_i$ in a test. For dichotomous items it follows from Grayson (1988) that apart from sampling errors, $X$ and $\xi$ are monotonely related. A heuristic method for checking monotone homogeneity may thus be obtained in the following way. First, all score groups are constructed, where each score group contains all persons having the same test score $X$. Second, for a specific item step the difficulty $\hat{\pi}_{gi}$ is determined within each score group. A check on monotone

homogeneity may then consist of inspecting whether the magnitude of the estimated item step difficulties $\hat{\pi}_{gi}$ is nondecreasing across increasing values of the test score $X$. This check is carried out for all item step difficulties. Since the mathematical proof of a nondecreasing relation between $\pi_{gi}$ and the test score $X$ has not been supplied, this check on monotone homogeneity should be used with caution. MSP provides a table with all estimated item step difficulties for all score groups. This table can be used to perform the check on monotone homogeneity.

For two items $i$ and $j$ and response categories $g$ and $h$, the bivariate proportion $\pi_{gi,hj}$ can be obtained:

$$\pi_{gi,hj} = \int_{\xi} \pi_{gi}(\xi)\, \pi_{hj}(\xi)\, dG(\xi) = \mathrm{Prob}(Y_{gi} = 1, Y_{hj} = 1) = \mathrm{Prob}(X_i \geq g, X_j \geq h). \tag{5}$$

In the sample, this proportion is estimated by

$$\hat{\pi}_{gi,hj} = \sum_{e=g}^{m} \sum_{f=h}^{m} n_{ei,fj} / n. \tag{6}$$

In (6) $e$ and $f$ denote category numbers, and $n_{ei,fj}$ denotes the number of persons in the sample who have item scores $X_i = e$ and $X_j = f$. Based on Mokken's results (1971, p. 132) for dichotomous items it can be shown that under double monotonicity the proportions $\pi_{gi,hj}$ and the item step difficulties $\pi_{gi}$ are ordered identically within the group of examinees who have a fixed item step score on step $hj$. This means that within the group of persons all characterized by a score equal to one on item step $hj$, the order of the difficulties $\pi_{gi}$ ($g = 1, \ldots, m$; $i = 1, \ldots, k$) is identical to the order of the bivariate proportions $\pi_{gi,hj}$ ($g = 1, \ldots, m$; $i = 1, \ldots, k$). The output of MSP provides a table where the proportions $\hat{\pi}_{gi}$ are arranged according to increasing magnitude along the marginals. Inside the table are the bivariate proportions $\hat{\pi}_{gi,hj}$. Apart from sampling fluctuations rows and columns of the table should be monotonely nondecreasing. This table thus makes possible an empirical check of double monotonicity.
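The sample estimates in (4) and (6) and the score-group table used for the heuristic check can be sketched in a few lines. This is a minimal illustration, not the MSP implementation: the function names are hypothetical, and item scores are assumed to be coded 0 to m in an (n, k) NumPy array (data coded 1 to m + 1 would need 1 subtracted first).

```python
import numpy as np

def step_difficulties(scores, m):
    """Estimate (4): entry [g-1, i] is the proportion of persons with X_i >= g."""
    g = np.arange(1, m + 1)
    # Broadcast to shape (m, n, k), then average over persons.
    return (scores[None, :, :] >= g[:, None, None]).mean(axis=1)

def bivariate_proportion(scores, i, g, j, h):
    """Estimate (6): proportion of persons with X_i >= g and X_j >= h."""
    return float(np.mean((scores[:, i] >= g) & (scores[:, j] >= h)))

def step_proportions_by_score_group(scores, i, m):
    """For item i, the step proportions within each group of equal test score X."""
    total = scores.sum(axis=1)
    table = {}
    for t in np.unique(total):
        group = scores[total == t, i]
        table[int(t)] = [float(np.mean(group >= g)) for g in range(1, m + 1)]
    return table
```

Reading the last table down the score groups for a fixed step gives the sequence that should be nondecreasing under monotone homogeneity, subject to the caution stated above.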
A scalability coefficient for multicategory items

The scalability coefficient for dichotomous items (Mokken, 1971, p. 148) has

been generalized (Molenaar, 1982) to multicategory items. The definition of an error pattern needed for this coefficient is adapted to the level of item steps: an error pattern is obtained when an examinee succeeds on a relatively difficult item step but does not succeed on an easier item step concerning another item. The multicategory scalability coefficient expresses the presence of such error patterns compared with the situation in which the null model of statistical independence holds among the items. To define the scalability coefficient for two items $i$ and $j$, as an example we take two arbitrary item steps characterized by difficulties $\pi_{gi}$ and $\pi_{hj}$. We assume that $\pi_{gi} < \pi_{hj}$, so that step $g$ of item $i$ is the most difficult of the two steps. An error thus occurs if a person passes step $g$ of item $i$ but fails step $h$ of item $j$. If $\pi_{gi} = \pi_{hj}$ the definition of an error is arbitrary. For items $i$ and $j$ the crosstabulation of item scores is considered next. Each cell of the table contains the number of persons having a specific score pattern on $i$ and $j$. Given the orderings of the steps of these items according to increasing difficulty as defined in (3) and estimated by (4), it can be deduced that some score patterns on items $i$ and $j$ are permitted while others are errors as defined above. The total frequency of error patterns on $i$ and $j$ is obtained by summation of the frequencies across the error cells of the table. This total sum is denoted $O_{ij}$ (Observed sum of errors). The total sum to be expected when marginal independence of the items holds, can be obtained by means of the marginals of the cross table of items $i$ and $j$. This sum is denoted by $E_{ij}$ (Expected sum of errors). For two items $i$ and $j$ the scalability coefficient is defined as

$$H_{ij} = 1 - O_{ij} / E_{ij}. \tag{7}$$

Based on this definition are coefficients for the scalability of one specific item with respect to the other items in the test, as well as a coefficient for a set of k items.
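The pairwise coefficient in (7) can be illustrated with a short sketch. It rests on one assumption that the text does not spell out: errors are accumulated per pair of item steps, so a score pattern that violates several step orderings contributes once for each violation. The names `step_scores` and `h_pair` are hypothetical, and item scores are assumed to be coded 0 to m.

```python
import numpy as np

def step_scores(x, m):
    """Turn item scores 0..m into an (n, m) array of step scores Y_g = 1 if x >= g."""
    g = np.arange(1, m + 1)
    return (x[:, None] >= g[None, :]).astype(int)

def h_pair(xi, xj, m):
    """Sketch of H_ij = 1 - O_ij / E_ij with errors counted per step pair."""
    yi, yj = step_scores(xi, m), step_scores(xj, m)
    n = len(xi)
    p_i, p_j = yi.mean(axis=0), yj.mean(axis=0)   # step proportions, as in (4)
    joint = yi.T @ yj / n                         # bivariate proportions, as in (6)
    # For each step pair the smaller proportion belongs to the harder step.
    harder = np.minimum(p_i[:, None], p_j[None, :])
    # Observed errors: harder step passed, easier step failed.
    observed = n * (harder - joint).sum()
    # Errors expected under marginal independence of the two items.
    expected = n * (harder - np.outer(p_i, p_j)).sum()
    return 1.0 - observed / expected
```

Under this step-pair counting the result coincides with the ratio of the observed covariance of the two item scores to the largest covariance attainable with the same marginals, which gives a convenient cross-check on small data sets.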
The sums $O_{ij}$ and $E_{ij}$ are determined for all pairs of items, and then summed across item pairs:

$$H_i = 1 - \frac{\sum_{j \neq i} O_{ij}}{\sum_{j \neq i} E_{ij}}; \tag{8}$$

$$H = 1 - \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} O_{ij}}{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} E_{ij}}. \tag{9}$$

It may be noted that the maximum values of these coefficients equal unity. Their minimum values are not fixed. A value equal to zero is obtained when

the number of observed errors equals the number to be expected given marginal independence. According to Mokken the following rules of thumb can be used for the interpretation of the overall scalability coefficient $H$: the items form a weak scale in case of $0.30 \leq H < 0.40$, a medium scale in case of $0.40 \leq H < 0.50$ and a strong scale when $H \geq 0.50$. For the use of $H$ as a measure of monotone homogeneity in the dichotomous case see Mokken (1971) and Mokken et al. (1986). MSP contains an algorithm for the selection of items from a larger pool using the $H$ coefficient, exactly as in the earlier programs for dichotomous items. This algorithm selects items, starting with the pair having the highest $H_{ij}$ coefficient, followed by stepwise selection of those items which maximize in each separate step the total scalability coefficient $H$ of the items selected thus far. A detailed explanation of this algorithm is given in the results section.

Reliability of test scores

Molenaar and Sijtsma (1988) have proposed an extension of the reliability estimate called MS by Sijtsma (1988a) and Sijtsma and Molenaar (1987) to multicategory items. This extension is based on the order of the item steps according to their difficulty, as well as on the assumption of double monotonicity on the level of ISCC's. We refer to Molenaar and Sijtsma (1988) for a technical explanation of the method of reliability estimation. In this paper we briefly review the method. Molenaar and Sijtsma (1988) show that the reliability of the test score $X$ equals

$$\rho_{XX'} = \frac{\sum_{i=1}^{k} \sum_{g=1}^{m} \sum_{j=1}^{k} \sum_{h=1}^{m} \left[ \pi_{gi,hj} - \pi_{gi} \pi_{hj} \right]}{\sigma^2(X)}. \tag{10}$$

The variance $\sigma^2(X)$ can be estimated directly from empirical data. Furthermore, the item step difficulties as well as the bivariate proportions $\pi_{gi,hj}$ ($i \neq j$) can also be estimated directly. The only problem arises for bivariate proportions if $i = j$, because a direct estimate would require independent replications of the same item.
Based on theory proposed by Mokken (1971, pp ), the method presented by Molenaar and Sijtsma (1988) amounts to an approximation of such proportions without the need for independent replications. An estimate of the reliability is obtained by insertion of these approximations in formula (10) together with the insertion of the other statistics.

For dichotomous items, Sijtsma and Molenaar (1987) found that $\rho_{XX'}$, estimated according to (10), is only biased to a negligible extent. MSP provides a reliability estimate of the test score according to the method outlined in this subsection.

The program MSP

Given a sample of n individuals measured on a set of k items, each with at least two ordered answer categories, MSP offers the following possibilities:
-- evaluation of a set of items as an a-priori scale;
-- stepwise construction of one or more scales from a given pool of items, with optionally one or more items specified as a startset;
-- a check of the assumptions underlying the models of monotone homogeneity and double monotonicity;
-- estimation of the reliability of a scale.

MSP can handle up to 100 items. For each item the maximum number of answer categories is 10. Subjects with item scores outside a specified range are deleted from the analysis. There is no restriction to the number of subjects. The data consist of the score matrix for the subjects on all items, the bivariate tables with frequencies for all item pairs or the response patterns with their frequency counts. The score matrix has to be a "raw data file"; system files from statistical packages are not allowed. The program MSP has no facilities for transformations of variables, recoding variables, selection of respondents or handling of missing values other than by listwise deletion. Such manipulations have to be done before data are entered. Options and statistics control the output of the program. Through the statistics practically all essential results can be obtained; options can be used to instruct the program that the second and next scales that are found in a set of items will be used as a startset in a later stage of the analysis.
The mainframe version (CDC, IBM or VAX) of MSP works comparably to a program like SPSS: a setup with control commands (keyword expressions) is read from a file, data are read from a file and the output is written on a file. A micro computer version of MSP is available for IBM (or compatibles) with 640k memory. This version works for the most part in the same way as the mainframe version. Instead of input and output files, input and output windows on the screen can be used. The input window acts as screen-editor

for preparing the setup for MSP. After execution the output window shows up with the output of the analysis. Switching from output window to input window is possible in order to change (edit) the setup, to correct errors, to specify different values for some of the parameters or to prepare a new analysis for a new dataset.

An application of MSP to multicategory test data

Procedure

As an example of an analysis of empirical test data by means of MSP we use data from an investigation of the attitudes of delegates of Dutch political parties (Middel & van Schuur, 1981). The delegates were asked the following question:

We would like to ask about how much you would trust people from different countries. For each country please indicate whether in your opinion, they are in general very trustworthy, fairly trustworthy, not particularly trustworthy, or not at all trustworthy.

The countries to be judged were the European Common Market countries Italy, West Germany, England, Ireland, Belgium, Luxembourg, The Netherlands, Denmark and France, and furthermore, Switzerland, The United States, Russia and China. Middel and Van Schuur analyzed the judgments of 1213 delegates with the dichotomous Mokken model. They used a dichotomized version of the originally polychotomous item scores. In our example pertaining to the original data we excluded respondents with a missing value on one of the items and respondents with the same response on all items. This resulted in a dataset of 806 respondents. In the analysis the search procedure was used: in the set of items one or more scales are constructed by means of a stepwise bottom up item selection procedure. A more detailed explanation of this procedure is given in the next subsection.
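Because MSP offers no data handling beyond listwise deletion, the two exclusions described above have to be applied before the data are entered. A minimal sketch of that preprocessing, assuming missing values are coded as NaN in an (n, k) array (the coding in the original data file is not stated) and using the hypothetical name `filter_respondents`:

```python
import numpy as np

def filter_respondents(scores):
    """Drop respondents with any missing item and those with identical answers on all items."""
    scores = np.asarray(scores, dtype=float)
    # Listwise deletion: keep only rows without missing values.
    complete = scores[~np.isnan(scores).any(axis=1)]
    # Keep only rows whose responses vary across items.
    varied = complete[complete.max(axis=1) != complete.min(axis=1)]
    return varied.astype(int)
```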
The parameters to control the selection steps were specified such that the maximum number of scales to be formed equals k/2, the procedure starts with the best item pair (largest $H_{ij}$ value) and the minimum value of $H$ permitted is 0.30 (lower bound for a weak scale).

Results

In Table 1 as a first result of MSP the frequency distributions for each item are given. The items are ordered according to increasing means (the higher

the mean, the greater the trust in the inhabitants). In all output of MSP the items are ordered according to increasing means, unless otherwise specified by means of an option. In the table it can, e.g., be seen that the Dutch party delegates consider the Russians least trustworthy and their fellow countrymen most trustworthy. Given the frequency distributions for these two items it can furthermore be concluded that the consensus of opinion is stronger for the Dutch than for the Russians.

Table 1. Frequency distributions for all items: 1 = "not trustworthy at all", 2 = "not particularly trustworthy", 3 = "fairly trustworthy" and 4 = "very trustworthy"

Name      Label            Mean   Values
Item 12   Russians
Item 13   Chinese
Item 1    Italians
Item 9    French
Item 11   Americans
Item 2    Germans
Item 4    Irish
Item 3    British
Item 10   Swiss
Item 5    Belgians
Item 6    Luxembourgeois
Item 8    Danish
Item 7    Dutch

Next, MSP provides tables containing the $H_{ij}$ values for all item pairs, as well as the statistics for testing whether $H_{ij}$ is significantly larger than zero. It is also possible to obtain the frequency tables for each item pair containing the data on which $H_{ij}$ for that pair is based. This output is very voluminous, however. In the search procedure from all item pairs the pair of items is selected that has the highest $H_{ij}$ coefficient. This coefficient must be larger than the lowerbound specified by the user (usually 0.30) and should differ significantly from zero at a given level of significance. To avoid chance capitalization the significance level is adapted to the number of tests carried out. For the test of significance the asymptotically standard normal distributed statistic DELTA STAR (based on theory by Mokken, 1971, pp ) is used. The "two-item scale" is then extended in a stepwise manner. Each subsequent item must fulfil the following conditions:
-- the item must have positive $H_{ij}$ with all other items already admitted to the scale;

-- the item scalability coefficient $H_i$ for the item with respect to the items already selected is greater than or equal to the lowerbound specified by the user and the item coefficient differs significantly from zero;
-- the general scalability coefficient $H$ for the resulting scale is largest given all choices of items from the pool of items not yet selected.

In the example the search procedure starts with item 5 (Belgians) and item 6 (Luxembourgeois) with $H_{56}$ = . Successively the following items are added: item 7 (Dutch), item 8 (Danish), item 4 (Irish) and item 3 (British). The final result (see Table 2) is given in terms of:
-- the general scalability coefficient $H$. In this case $H = 0.41$, indicating a medium scale;
-- the mean item score (the higher the mean, the higher the trust) which is also provided in Table 1;
-- the item scalability coefficient $H_i$ for each item (also for those not selected) with respect to the items in the scale;
-- the asymptotically standard normal distributed statistic Delta Star, which is used to test the hypothesis that $H$ or $H_i$ is larger than zero.

Table 2. Final result of the search procedure

Final scale 1, number of variables: 6
Scale coefficient H = 0.41, scale delta star =

Variable   Label            Mean   H_i   Delta Star
Item 4     Irish            3.03
Item 3     British          3.15
Item 5     Belgians         3.28
Item 6     Luxembourgeois   3.42
Item 8     Danish           3.54
Item 7     Dutch            3.56

Coefficients of non-scale variables with respect to this scale:

Variable   Label            Mean   H_i   Delta Star
Item 12    Russians         1.99
Item 13    Chinese          2.20
Item 1     Italians         2.38
Item 9     French           2.66
Item 11    Americans        2.96
Item 2     Germans          2.98
Item 10    Swiss            3.26
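The stepwise search described above can be sketched as a greedy loop over precomputed pairwise error sums. This is an illustration under stated assumptions rather than the MSP algorithm itself: `O` and `E` are assumed to be symmetric k by k arrays holding the observed and expected error sums for every item pair, the significance tests based on Delta Star are omitted, and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def scale_h(O, E, items):
    """Overall H for a set of items: 1 minus summed O over summed E across pairs."""
    pairs = list(combinations(items, 2))
    return 1.0 - sum(O[i, j] for i, j in pairs) / sum(E[i, j] for i, j in pairs)

def item_h(O, E, item, others):
    """H_i for one item with respect to the items in `others`."""
    return 1.0 - sum(O[item, j] for j in others) / sum(E[item, j] for j in others)

def search_scale(O, E, pool, lower_bound=0.30):
    """Greedy bottom-up selection of one scale from `pool` (no significance tests)."""
    pool = list(pool)
    if len(pool) < 2:
        return []
    # Start with the pair that has the highest H_ij.
    best = max(combinations(pool, 2), key=lambda p: 1.0 - O[p[0], p[1]] / E[p[0], p[1]])
    if scale_h(O, E, best) <= lower_bound:
        return []
    scale = list(best)
    while True:
        candidates = []
        for c in pool:
            if c in scale:
                continue
            # Condition 1: positive H_ij with every item already in the scale.
            if any(1.0 - O[c, j] / E[c, j] <= 0 for j in scale):
                continue
            # Condition 2: H_i with respect to the scale reaches the lower bound.
            if item_h(O, E, c, scale) < lower_bound:
                continue
            # Condition 3 is applied below by taking the candidate with largest H.
            candidates.append((scale_h(O, E, scale + [c]), c))
        if not candidates:
            return scale
        scale.append(max(candidates)[1])
```

Calling `search_scale` again on the items left over after a first scale has been found would give a simple form of repeated scale construction on the remaining pool.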

Furthermore the following results can be requested:
-- tables containing all frequencies of persons having a specific test score as well as a specific score on a specific item in the scale;
-- tables containing the proportions of positive responses on each item step within each separate test score group;
-- tables containing the mean of each item within each separate test score group.

The assumption of monotonely nondecreasing ISCC's (Molenaar, 1986) can be checked by means of a heuristic method using the table with the proportions of positive responses on each item step within each separate test score group (Table 3). For reasons of efficiency the results are only given for the items 3, 4 and 5. In the columns of the table we are looking for monotonely nondecreasing sequences of proportions. Except for some minor deviations, the selected items in our example fulfil this condition. As was pointed out before this method does not yet have a mathematical basis and it should thus be applied with caution.

Table 3. Proportions of positive responses on each item step within each separate test score group
(Columns: sum score, N, and the item steps ≥2, ≥3 and ≥4 for Item 4, Item 3 and Item 5.)

Under the assumption of the model of double monotonicity, the ISCC's may not intersect. This assumption can be investigated by means of the so called P-matrix, containing pairwise for all item steps, the probabilities of a positive response for both item steps involved. In this matrix the item steps are ordered according to increasing popularities, and so from left to right and from top to bottom the probabilities should be monotonely nondecreasing. The P-matrix in our example pertaining to six items has the order 18 by 18. It thus takes much space to display this matrix. In Table 4 only the part

with respect to the nine most difficult or least popular item steps is shown. The items in our example seem to fulfil the condition of double monotonicity. The P-matrix is also the basis for the computation of the reliability coefficient (Molenaar and Sijtsma, 1988). The reliability estimate for the six items selected in the first scale is shown at the bottom of Table 4.

Table 4. P-matrix and reliability coefficient. For i ≠ j proportions are estimated directly from the data. For i = j proportions are approximated by means of extrapolation methods (Molenaar & Sijtsma, 1988)

P-matrix, estimates of ++ probabilities for all paired item steps. Item steps are ordered according to their sample difficulties.
(Item steps shown: Item 4 ≥4, Item 3 ≥4, Item 5 ≥4, Item 6 ≥4, Item 8 ≥4, Item 7 ≥4, Item 4 ≥3, Item 3 ≥3, Item 5 ≥3.)
Reliability RHO

After the construction of the first scale, the search procedure restarts the analysis with the items not yet selected in this scale. This procedure is called "multiple scaling". If two or more scales can be constructed based on a set of items, this could be an indication of multi-dimensionality. In our example three additional scales can be constructed (see Table 5), whereupon no more items are left. Once an item is included in a scale, it can not be selected in another scale. It could be possible however that the composition of the scales has been affected by the order in which they were constructed. MSP contains an option which offers the possibility to use the second and following scales as startset for scale construction based on the complete set of items in a second phase of the analysis. In our example the use of the second, the third and the fourth scale as startset, respectively, yields the following results:
-- the second scale obtained in phase 1 remains the same;
-- the third scale (Swiss, Germans and Americans) is extended with Dutch, Belgians, Luxembourgeois and Danish. This scale has a scalability coefficient of 0.37;
-- the fourth scale (French and Italians) is extended with all items from the first scale (Irish, British, Belgians, Luxembourgeois, Danish and Dutch)

resulting in a scale with a scalability coefficient equal to . In the resulting scale French and Italians have an item scalability coefficient less than 0.30, however. The results of this analysis can be compared with the results in Table 2, where the results for the first scale are given; the items of the fourth scale (French and Italians) are not included in this scale because of item scalability coefficients which are too low.

Table 5. Results after multiple scaling

Final scale 2, number of variables: 2
Scale coefficient H = 0.69, Scale delta star =

Variable   Label       Mean   H_i   Delta Star
Item 12    Russians    1.99
Item 13    Chinese     2.20

Final scale 3, number of variables: 3
Scale coefficient H = 0.38, Scale delta star =

Variable   Label       Mean   H_i   Delta Star
Item 11    Americans   2.96
Item 2     Germans     2.98
Item 10    Swiss       3.26

Final scale 4, number of variables: 2
Scale coefficient H = 0.39, Scale delta star = 9.69

Variable   Label       Mean   H_i   Delta Star
Item 1     Italians
Item 9     French

Interpretation and comparison with the dichotomous analysis

With respect to the judgment of the trustworthiness of inhabitants of countries several factors are important. The first scale contains countries which are neighbours of The Netherlands, which is the native country of the party delegates. The second scale contains the communist countries, and so on. Therefore, vicinity and political system partly account for the scales. In the original investigation (Middel and Van Schuur, 1981) a dichotomous analysis was performed. The polychotomous item scores were dichotomized in no trust (original scores 1 and 2) versus trust (original scores 3 and 4). The result was one scale containing all thirteen items. A possible explanation for this result could be the fact that Middel and Van Schuur did not exclude subjects with the same response to all items. The response patterns of

these subjects do not contain errors and, consequently, scalability coefficients are inflated. A dichotomous analysis of the reduced dataset of 806 respondents results in an essential loss of information. If we look at the frequency distributions of each item (Table 1) we see that after dichotomization of the items 5, 6, 7 and 8 over 90% of the subjects would belong to a single category. The analysis of the dichotomized data results in two scales; the first consists of Russians, Chinese, Italians, French, Irish, Belgians and Luxembourgeois and the second consists of Americans, Germans, Swiss and Dutch.

Discussion

The extension of the Mokken model to multicategory item scores by Molenaar makes it possible to avoid the loss of information resulting from a dichotomization of originally multicategory data. This dichotomization usually produces a less differentiated measurement scale, as well as smaller reliability of measurement. Furthermore, the results obtained in the previous section show that the composition of scales based on polychotomous and dichotomized data may differ. For some of the countries judgment of the trustworthiness of inhabitants means a choice between fairly or very trustworthy, while for other countries it means a choice between trust or distrust, and sometimes a finer differentiation is possible. The multicategory Mokken model offers the possibility to look in more detail at polychotomously scored items, some of which would be very popular or unpopular after a plausible dichotomization.

Availability of the program MSP

The program MSP is available for IBM-compatible micro computers and IBM, CDC or VAX mainframe computers. The price of the micro computer version including a user manual is $200 (price is without taxes, mailing, shipping and administration costs and may be subject to changes). Orders should be sent to: iec ProGAMMA, Kraneweg 8, 9718 JP Groningen, The Netherlands.

References

Debets, P.
& Brouwer, E. (1989). MSP: a program for Mokken Scale analysis for Polychotomous items. Groningen: iec ProGAMMA.

Gillespie, M., TenVergert, E.M. & Kingma, J. (1987). "Using Mokken scale analysis to develop unidimensional scales", Quality & Quantity 21:
Grayson, D.A. (1988). "Two-group classification in latent trait theory: scores with monotone likelihood ratio", Psychometrika 53:
Henning, H.J. & Six, B. (1977). "Konstruktion einer Machiavellismus-Skala", Zeitschrift für Sozialpsychologie 8:
Kingma, J. & Reuvekamp, J. (1986a). "Mokken Scale: a pascal program for nonparametric stochastic scaling", Educational and Psychological Measurement 46:
Kingma, J. & Reuvekamp, J. (1986b). "Mokken test for the robustness of nonparametric stochastic Mokken scales", Educational and Psychological Measurement 46:
Middel, B.P. & Schuur, W.H. van (1981). "Dutch Party Delegates", Acta Politica 16: 241
Mokken, R.J. (1971). A Theory and Procedure of Scale Analysis. The Hague: Mouton.
Mokken, R.J. & Lewis, C. (1982). "A nonparametric approach to the analysis of dichotomous item responses", Applied Psychological Measurement 6:
Mokken, R.J., Lewis, C. & Sijtsma, K. (1986). Rejoinder to "The Mokken Scale: A critical discussion", Applied Psychological Measurement 10:
Molenaar, I.W. (1982). "Mokken Scaling Revisited", Kwantitatieve Methoden 3, no. 8:
Molenaar, I.W. (1986). "Een vingeroefening in item response theorie voor drie geordende antwoordcategorieën", pp in G.F. Pikkemaat & J.J.A. Moors (eds), Liber Amicorum Jaap Muilwijk. Groningen: Econometrisch Instituut.
Molenaar, I.W. & Sijtsma, K. (1988). "Mokken's approach to reliability estimation extended to multicategory items", Kwantitatieve Methoden 9, no. 28:
Niemöller, B. (1980). Mokken Scale. STAP user's manual, vol. 4, part 2. Amsterdam: Technisch Centrum FSW, University of Amsterdam.
Niemöller, B. & Van Schuur, W.H. (1980). Mokken Test. STAP user's manual, vol. 4, part 3. Amsterdam: Technisch Centrum FSW, University of Amsterdam.
Niemöller, B. & Van Schuur, W.H. (1983). "Stochastic Models for Unidimensional Scaling: Mokken and Rasch", pp in D. McKay, N. Schofield & P. Whiteley (eds), Data Analysis and the Social Sciences. London: Francis Pinter Ltd.
Sijtsma, K. (1988a). "Reliability estimation in Mokken's nonparametric item response model", pp in W.E. Saris & I.N. Gallhofer (eds), Sociometric Research, Vol. 1: Data Collection and Scaling. London: MacMillan Press Ltd.
Sijtsma, K. (1988b). Contributions to Mokken's Nonparametric Item Response Theory. Amsterdam: Free University Press.
Sijtsma, K. & Molenaar, I.W. (1987). "Reliability of test scores in nonparametric item response theory", Psychometrika 52:
Stokman, F.N. & Van Schuur, W. (1980). "Basic Scaling", Quality & Quantity 14: 5-30.

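Note on the comparison with the dichotomous analysis. The recoding used in the dichotomous analysis (original scores 1 and 2 versus 3 and 4), the exclusion of subjects with the same response to all items, and the inflation of the scalability coefficient by such subjects can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the response data are hypothetical and not taken from the party delegates study, the function names are invented here, and the scale coefficient is computed as the sum of the item-pair covariances divided by the sum of their maxima given the marginals, a form of H that is assumed here and not spelled out in this section. The sketch is not the MSP program.

```python
from statistics import mean


def dichotomize(scores, cutoff=3):
    """Recode 1-4 scores: below cutoff -> 0 (no trust), otherwise 1 (trust)."""
    return [[1 if s >= cutoff else 0 for s in row] for row in scores]


def drop_constant_patterns(scores):
    """Remove respondents who gave the same response to every item."""
    return [row for row in scores if len(set(row)) > 1]


def modal_share(scores, item):
    """Proportion of respondents in the most frequent category of one item."""
    column = [row[item] for row in scores]
    return max(column.count(v) for v in set(column)) / len(column)


def covariance(x, y):
    """Population covariance of two equally long score lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def scale_h(scores):
    """Scale H (assumed form): summed covariances over summed maximum covariances.

    The maximum covariance given the marginals is obtained by sorting
    both item columns, which pairs low scores with low scores.
    """
    n_items = len(scores[0])
    cols = [[row[j] for row in scores] for j in range(n_items)]
    num = den = 0.0
    for i in range(n_items):
        for j in range(i + 1, n_items):
            num += covariance(cols[i], cols[j])
            den += covariance(sorted(cols[i]), sorted(cols[j]))
    return num / den


# Hypothetical responses of seven subjects to three four-category items.
data = [
    [1, 2, 3],
    [2, 3, 4],
    [2, 2, 2],  # same response to all items
    [4, 4, 4],  # same response to all items
    [3, 2, 4],
    [1, 3, 3],
    [2, 1, 3],
]

reduced = drop_constant_patterns(data)
binary = dichotomize(reduced)

print(round(scale_h(data), 2))     # 0.55, constant patterns included
print(round(scale_h(reduced), 2))  # 0.23, constant patterns excluded
print(modal_share(binary, 2))      # 1.0, the third item no longer differentiates
```

In this toy example the coefficient drops once the two constant response patterns are removed, and after dichotomization all remaining subjects fall into one category of the third item, which mirrors the loss of information described for items 5, 6, 7 and 8.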

Nearest-Integer Response from Normally-Distributed Opinion Model for Likert Scale Nearest-Integer Response from Normally-Distributed Opinion Model for Likert Scale Jonny B. Pornel, Vicente T. Balinas and Giabelle A. Saldaña University of the Philippines Visayas This paper proposes that

More information

Empirical Formula for Creating Error Bars for the Method of Paired Comparison

Empirical Formula for Creating Error Bars for the Method of Paired Comparison Empirical Formula for Creating Error Bars for the Method of Paired Comparison Ethan D. Montag Rochester Institute of Technology Munsell Color Science Laboratory Chester F. Carlson Center for Imaging Science

More information

An Introduction to Research Statistics

An Introduction to Research Statistics An Introduction to Research Statistics An Introduction to Research Statistics Cris Burgess Statistics are like a lamppost to a drunken man - more for leaning on than illumination David Brent (alias Ricky

More information

Chapter 3 Software Packages to Install How to Set Up Python Eclipse How to Set Up Eclipse... 42

Chapter 3 Software Packages to Install How to Set Up Python Eclipse How to Set Up Eclipse... 42 Table of Contents Preface..... 21 About the Authors... 23 Acknowledgments... 24 How This Book is Organized... 24 Who Should Buy This Book?... 24 Where to Find Answers to Review Questions and Exercises...

More information

SAMPLING ERROI~ IN THE INTEGRATED sysrem FOR SURVEY ANALYSIS (ISSA)

SAMPLING ERROI~ IN THE INTEGRATED sysrem FOR SURVEY ANALYSIS (ISSA) SAMPLING ERROI~ IN THE INTEGRATED sysrem FOR SURVEY ANALYSIS (ISSA) Guillermo Rojas, Alfredo Aliaga, Macro International 8850 Stanford Boulevard Suite 4000, Columbia, MD 21045 I-INTRODUCTION. This paper

More information

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University.

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi an Jiaotong University. Running head: ASSESS MEASUREMENT INVARIANCE Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies Xiaowen Zhu Xi an Jiaotong University Yanjie Bian Xi an Jiaotong

More information

University of Huddersfield Repository

University of Huddersfield Repository University of Huddersfield Repository Whitaker, Simon Some problems with the definition of intellectual disability and their implications Original Citation Whitaker, Simon (2013) Some problems with the

More information

CHAPTER 3 RESEARCH METHODOLOGY

CHAPTER 3 RESEARCH METHODOLOGY CHAPTER 3 RESEARCH METHODOLOGY 3.1 Introduction 3.1 Methodology 3.1.1 Research Design 3.1. Research Framework Design 3.1.3 Research Instrument 3.1.4 Validity of Questionnaire 3.1.5 Statistical Measurement

More information

Psychometric Details of the 20-Item UFFM-I Conscientiousness Scale

Psychometric Details of the 20-Item UFFM-I Conscientiousness Scale Psychometric Details of the 20-Item UFFM-I Conscientiousness Scale Documentation Prepared By: Nathan T. Carter & Rachel L. Williamson Applied Psychometric Laboratory at The University of Georgia Last Updated:

More information

Community structure in resting state complex networks

Community structure in resting state complex networks Downloaded from orbit.dtu.dk on: Dec 20, 2017 Community structure in resting state complex networks Andersen, Kasper Winther; Madsen, Kristoffer H.; Siebner, Hartwig Roman; Schmidt, Mikkel Nørgaard; Mørup,

More information

To open a CMA file > Download and Save file Start CMA Open file from within CMA

To open a CMA file > Download and Save file Start CMA Open file from within CMA Example name Effect size Analysis type Level Tamiflu Hospitalized Risk ratio Basic Basic Synopsis The US government has spent 1.4 billion dollars to stockpile Tamiflu, in anticipation of a possible flu

More information

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL JOURNAL OF EDUCATIONAL MEASUREMENT VOL. II, NO, 2 FALL 1974 THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL SUSAN E. WHITELY' AND RENE V. DAWIS 2 University of Minnesota Although it has been claimed that

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

2008 Ohio State University. Campus Climate Study. Prepared by. Student Life Research and Assessment

2008 Ohio State University. Campus Climate Study. Prepared by. Student Life Research and Assessment 2008 Ohio State University Campus Climate Study Prepared by Student Life Research and Assessment January 22, 2009 Executive Summary The purpose of this report is to describe the experiences and perceptions

More information

CHAPTER ONE CORRELATION

CHAPTER ONE CORRELATION CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to

More information

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

3 CONCEPTUAL FOUNDATIONS OF STATISTICS 3 CONCEPTUAL FOUNDATIONS OF STATISTICS In this chapter, we examine the conceptual foundations of statistics. The goal is to give you an appreciation and conceptual understanding of some basic statistical

More information

Testing Means. Related-Samples t Test With Confidence Intervals. 6. Compute a related-samples t test and interpret the results.

Testing Means. Related-Samples t Test With Confidence Intervals. 6. Compute a related-samples t test and interpret the results. 10 Learning Objectives Testing Means After reading this chapter, you should be able to: Related-Samples t Test With Confidence Intervals 1. Describe two types of research designs used when we select related

More information

STOCHASTIC CHOICE HEURISTICS *

STOCHASTIC CHOICE HEURISTICS * Acta Psychologica 56 (1984) 153-166 North-Holland STOCHASTIC CHOICE HEURISTICS * K. Michael ASCHENBRENNER, Dietrich ALBERT and Franz SCHMALHOFER University of Heidelberg, West-Germany A class of stochastic

More information

STATS Relationships between variables: Correlation

STATS Relationships between variables: Correlation STATS 1060 Relationships between variables: Correlation READINGS: Chapter 7 of your text book (DeVeaux, Vellman and Bock); on-line notes for correlation; on-line practice problems for correlation NOTICE:

More information