Tilburg University. Mokken scale analysis for polychotomous items Sijtsma, K.; Debets, P.; Molenaar, I.W. Published in: Quality & Quantity


Publication date: 1990

Citation for published version (APA): Sijtsma, K., Debets, P., & Molenaar, I. W. (1990). Mokken scale analysis for polychotomous items: Theory, a computer program and an empirical application. Quality & Quantity, 24(2).

Quality & Quantity 24. Kluwer Academic Publishers. Printed in the Netherlands.

Mokken scale analysis for polychotomous items: theory, a computer program and an empirical application

K. SIJTSMA (1), P. DEBETS (2) & I.W. MOLENAAR (3)

(1) Free University of Amsterdam, The Netherlands (Correspondence address: Sectie Arbeids- en Organisatiepsychologie, Vrije Universiteit, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands)
(2) University of Amsterdam, The Netherlands
(3) State University of Groningen, The Netherlands

Abstract. This paper addresses three subjects. First, an extension of Mokken's nonparametric item response models from dichotomous items to items with two or more ordered answer categories is proposed. Second, a computer program for the analysis of multicategory item scores, called MSP, is presented. The analyses carried out by MSP are based on the multicategory extension of Mokken's theory. Finally, an application of MSP to empirical multicategory test data is presented in order to illustrate its possibilities.

Introduction

Several computer programs exist for the analysis of dichotomously scored test items according to the nonparametric item response models proposed by Mokken (1971); also see Mokken & Lewis (1982). Among these programs are MOKKEN SCALE and MOKKEN TEST in the package STAP and their stand-alone versions MOKSC and MOKTST, which were designed for mainframe computers during the seventies in The Netherlands (Niemöller, 1980; Niemöller & Van Schuur, 1980). Kingma and Reuvekamp (1986a, b) recently presented programs for microcomputers.

The test models by Mokken and the programs based on these models are restricted to dichotomous item responses. Unless the researcher is willing to dichotomize his/her multicategory responses, which means throwing away information provided by examinees, another test model must be used to analyze the test data.
Molenaar (1982, 1986) and Molenaar & Sijtsma (1988) have extended Mokken's theory to the general case of multicategory items with ordered response categories, dichotomous items being a special case. Based on this generalization, a computer program has been developed (Debets & Brouwer, 1986), which is suited to both mainframe and micro computers. This program, which is called MSP (Mokken Scale analysis for Polychotomous items), can be regarded as a generalization as well as a successor of the MOKKEN SCALE program mentioned above.

In this paper we first give a brief introduction to the dichotomous Mokken approach, followed by a review of its multicategory extension. Second, relevant technical information is provided with respect to the program MSP. Finally, an example of an application of MSP to thirteen items having four response categories each is given. These data were originally analyzed by Middel & Van Schuur (1981) after a proper dichotomization. Results from their study are compared with our results.

Mokken's approach for dichotomously scored items

Mokken (1971) proposed his approach to item response theory for dichotomously scored test items only. Response categories are usually interpreted as either positive or negative with respect to the attribute measured by the test. It is assumed that the probability of a positive response on an item is a monotonely nondecreasing function of the attribute measured. This function is usually called the Item Characteristic Curve (ICC). A set of items having monotonely nondecreasing ICC's, and each being a measure of the same attribute, is called a monotonely homogeneous set. It can be shown that, apart from ties, the expected order of the examinees on the latent attribute is the same for each selection of items from a monotonely homogeneous set.

An additional restriction on the set of items is that their ICC's do not intersect. Such a set of items is called doubly monotone. Analogously to the "item free" order of subjects, it can be shown that the order of items according to their difficulty is the same within each subpopulation of persons from the population for which the model of double monotonicity holds. It may be noted that neither of the models discussed assumes a parametric definition of the ICC or of the distribution of the attribute across persons. Hence these models are called nonparametric.
Mokken (1971) derived observable properties of both models. These properties can be used to check the empirical fit of these models to test data. Results based on these properties are output of the programs mentioned in the preceding section. Introductions to Mokken's approach are provided by, e.g., Stokman and Van Schuur (1980), Mokken and Lewis (1982), Niemöller and Van Schuur (1983) and Sijtsma (1988b). Empirical applications of the dichotomous Mokken models in test construction research are presented by, e.g., Mokken (1971), Henning and Six (1977), Middel and Van Schuur (1981) and Gillespie et al. (1987).


Mokken's approach extended to multicategory items

Attitudes and personality traits are often measured by means of items with ordered answer categories. Rating scales are well known examples of such ordered categories. Molenaar (1982, 1986) has proposed an extension of Mokken's model from the dichotomous to the multicategory ordered case. This extension is discussed in the next three subsections.

Nonparametric models for multicategory items

The notion of the item step is central in the multicategory approach. As an example we mention an attitude item having three ordered response categories, e.g., disagree, neutral and agree. This item may be thought to consist of two dichotomous item steps. The first item step consists of the imaginary question whether the respondent agrees enough with the positively phrased attitude statement to take the step from disagree to neutral. If the imaginary answer is affirmative, the item step score equals one, and in the negative case it equals zero. The second step consists of the imaginary question whether the step can be made from neutral to agree. A positive answer again yields an item step score equal to one, and a negative answer yields a score equal to zero. The item score equals the sum of the two separate item step scores.

It may be tempting to think that the multicategory models can actually be replaced by the dichotomous models applied to item steps instead of items. Item steps belonging to the same item are dependent, however: a negative response on the first step implies negative responses on the subsequent steps of the same item, and a positive response on a specific step implies positive responses on the preceding steps. Such dependencies are not allowed in item response theories. Molenaar's (1982, 1986) approach is, therefore, based on the multicategory items.
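The decomposition of an item score into item steps can be made concrete with a small sketch. It assumes that the ordered categories are coded 0, 1, ..., m (disagree = 0, neutral = 1, agree = 2 in the example above); the function name `item_step_scores` is hypothetical and is not part of MSP.

```python
def item_step_scores(item_score, m):
    """Return the m dichotomous item step scores implied by an item score in 0..m."""
    if not 0 <= item_score <= m:
        raise ValueError("item score must lie between 0 and m")
    # Step g is passed (score 1) exactly when the item score is at least g.
    return [1 if item_score >= g else 0 for g in range(1, m + 1)]


# Three ordered categories give m = 2 item steps.
for score in range(3):
    steps = item_step_scores(score, m=2)
    assert sum(steps) == score  # the item score is the sum of its step scores
    print(score, steps)
```

The output also shows the dependency noted in the text: a failed step is never followed by a passed step of the same item, which is why the steps cannot be treated as independent dichotomous items.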
We introduce some notation to formalize the definitions of item steps, their difficulties and the Item Step Characteristic Curve (ISCC). First, we assume that each of $k$ items in a test consists of $m + 1$ ordered answer categories. This means that each item can be viewed as a sequence of $m$ imaginary dichotomous item steps ordered along the latent continuum. The attribute values on this continuum are denoted by $\xi$. Furthermore, the score on item $i$ is denoted by $X_i$, and the score on step $g$ of item $i$ is denoted by $Y_{gi}$. The item score takes integer values ranging from 0 to $m$, while $Y_{gi}$ takes the values zero (item step is failed) or one (item step is passed). The relation between $X_i$ and $Y_{gi}$ is

$$X_i = \sum_{g=1}^{m} Y_{gi}. \tag{1}$$

Second, we assume that each person in a population of interest is characterized by a position on the latent continuum $\xi$, and that the cumulative distribution function of $\xi$ within this population is denoted by $G(\xi)$. Third, we assume that unidimensionality of measurement holds across the items in a test, as well as local stochastic independence of all item responses of fixed persons. The ISCC may now be defined as follows, denoting probabilities by $\pi$:

$$\pi_{gi}(\xi) = \mathrm{Prob}(Y_{gi} = 1 \mid \xi) = \mathrm{Prob}(X_i \geq g \mid \xi). \tag{2}$$

The ISCC thus gives the conditional probability that the item step score equals one, or that the item score equals or exceeds the value $g$. In the multicategory model of monotone homogeneity, the ISCC is a nondecreasing function of $\xi$. In the multicategory model of double monotonicity, in addition, the ISCC's do not intersect. The difficulty of an item step can be obtained from

$$\pi_{gi} = \int_{\xi} \pi_{gi}(\xi)\, dG(\xi) = \mathrm{Prob}(Y_{gi} = 1) = \mathrm{Prob}(X_i \geq g). \tag{3}$$

This is the proportion of persons having an item step score equal to one. These persons have item scores equal to $X_i = g, g + 1, \ldots, m$. In the sample, $\pi_{gi}$ is estimated by means of

$$\hat{\pi}_{gi} = \sum_{h=g}^{m} n_{hi} / n, \tag{4}$$

where $n$ denotes the sample size, and $n_{hi}$ denotes the number of persons in the sample having an item score $X_i = h$.

Since $\xi$ can not be estimated, we may replace it by the true score from classical test theory, which has the same ordering as $\xi$. The true score is estimated by the test score $X$, which is the unweighted sum of all item scores $X_i$ in a test. For dichotomous items it follows from Grayson (1988) that, apart from sampling errors, $X$ and $\xi$ are monotonely related. A heuristic method for checking monotone homogeneity may thus be obtained in the following way. First, all score groups are constructed, where each score group contains all persons having the same test score $X$. Second, for a specific item step the difficulty $\hat{\pi}_{gi}$ is determined within each score group. A check on monotone homogeneity may then consist of inspecting whether the magnitude of the estimated item step difficulties $\hat{\pi}_{gi}$ is nondecreasing across increasing values of the test score $X$. This check is carried out for all item step difficulties. Since the mathematical proof of a nondecreasing relation between $\pi_{gi}$ and the test score $X$ has not been supplied, this check on monotone homogeneity should be used with caution. MSP provides a table with all estimated item step difficulties for all score groups. This table can be used to perform the check on monotone homogeneity.

For two items $i$ and $j$ and response categories $g$ and $h$, the bivariate proportion $\pi_{gi,hj}$ can be obtained:

$$\pi_{gi,hj} = \int_{\xi} \pi_{gi}(\xi)\, \pi_{hj}(\xi)\, dG(\xi) = \mathrm{Prob}(Y_{gi} = 1, Y_{hj} = 1) = \mathrm{Prob}(X_i \geq g, X_j \geq h). \tag{5}$$

In the sample, this proportion is estimated by

$$\hat{\pi}_{gi,hj} = \sum_{e=g}^{m} \sum_{f=h}^{m} n_{ei,fj} / n. \tag{6}$$

In (6) $e$ and $f$ denote category numbers, and $n_{ei,fj}$ denotes the number of persons in the sample who have item scores $X_i = e$ and $X_j = f$.

Based on Mokken's results (1971, p. 132) for dichotomous items it can be shown that under double monotonicity the proportions $\pi_{gi,hj}$ and the item step difficulties $\pi_{gi}$ are ordered identically within the group of examinees who have a fixed item step score on step $hj$. This means that within the group of persons all characterized by a score equal to one on item step $hj$, the order of the difficulties $\pi_{gi}$ ($g = 1, \ldots, m$; $i = 1, \ldots, k$) is identical to the order of the bivariate proportions $\pi_{gi,hj}$ ($g = 1, \ldots, m$; $i = 1, \ldots, k$). The output of MSP provides a table where the proportions $\hat{\pi}_{gi}$ are arranged according to increasing magnitude along the marginals. Inside the table are the bivariate proportions $\hat{\pi}_{gi,hj}$. Apart from sampling fluctuations, rows and columns of the table should be monotonely nondecreasing. This table thus makes possible an empirical check of double monotonicity.
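The estimates in (4) and (6) and the heuristic score-group check can be sketched as follows. This is a minimal illustration, not the MSP implementation: it assumes item scores coded 0..m in an (n, k) NumPy array, all function names are hypothetical, and the five response patterns are made up for the example.

```python
import numpy as np


def step_difficulties(scores, m):
    """Estimate Prob(X_i >= g) as in (4); row g-1 holds step g of every item."""
    scores = np.asarray(scores)
    return np.array([(scores >= g).mean(axis=0) for g in range(1, m + 1)])


def bivariate_proportion(scores, i, g, j, h):
    """Estimate Prob(X_i >= g, X_j >= h) as in (6)."""
    scores = np.asarray(scores)
    return float(np.mean((scores[:, i] >= g) & (scores[:, j] >= h)))


def difficulties_by_score_group(scores, m):
    """Step difficulties within each group of persons sharing a test score X."""
    scores = np.asarray(scores)
    total = scores.sum(axis=1)
    return {int(x): step_difficulties(scores[total == x], m)
            for x in np.unique(total)}


def decreases(groups):
    """List (step, item, lower X, higher X) where a proportion goes down."""
    xs = sorted(groups)
    found = []
    for lo, hi in zip(xs, xs[1:]):
        drop = groups[hi] < groups[lo]
        found += [(int(g) + 1, int(i), lo, hi) for g, i in zip(*np.nonzero(drop))]
    return found


# Made-up scores of five persons on two items with m = 2.
data = np.array([[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]])
print(step_difficulties(data, m=2))
print(bivariate_proportion(data, i=0, g=2, j=1, h=1))
print(decreases(difficulties_by_score_group(data, m=2)))
```

The sketch compares adjacent score groups only and applies no allowance for sampling error, so with small score groups a reported decrease need not indicate a real violation. This is in line with the caution expressed in the text.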
A scalability coefficient for multicategory items

The scalability coefficient for dichotomous items (Mokken, 1971, p. 148) has been generalized (Molenaar, 1982) to multicategory items. The definition of an error pattern needed for this coefficient is adapted to the level of item steps: an error pattern is obtained when an examinee succeeds on a relatively difficult item step but does not succeed on an easier item step concerning another item. The multicategory scalability coefficient expresses the presence of such error patterns compared with the situation in which the null model of statistical independence holds among the items.

To define the scalability coefficient for two items $i$ and $j$, as an example we take two arbitrary item steps characterized by difficulties $\pi_{gi}$ and $\pi_{hj}$. We assume that $\pi_{gi} < \pi_{hj}$, so that step $g$ of item $i$ is the more difficult of the two steps. An error thus occurs if a person passes step $g$ of item $i$ but fails step $h$ of item $j$. If $\pi_{gi} = \pi_{hj}$ the definition of an error is arbitrary.

For items $i$ and $j$ the cross-tabulation of item scores is considered next. Each cell of the table contains the number of persons having a specific score pattern on $i$ and $j$. Given the orderings of the steps of these items according to increasing difficulty as defined in (3) and estimated by (4), it can be deduced that some score patterns on items $i$ and $j$ are permitted while others are errors as defined above. The total frequency of error patterns on $i$ and $j$ is obtained by summation of the frequencies across the error cells of the table. This total sum is denoted $O_{ij}$ (Observed sum of errors). The total sum to be expected when marginal independence of the items holds can be obtained by means of the marginals of the cross table of items $i$ and $j$. This sum is denoted by $E_{ij}$ (Expected sum of errors). For two items $i$ and $j$ the scalability coefficient is defined as

$$H_{ij} = 1 - O_{ij} / E_{ij}. \tag{7}$$

Based on this definition are coefficients for the scalability of one specific item with respect to the other items in the test, as well as a coefficient for a set of $k$ items.
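As an illustration of (7), the sketch below builds the cross table of two items, marks the error cells from the ordering of the item steps, and sums the observed and expected frequencies over those cells. It follows the wording above literally, so each error cell is counted once; whether MSP weights cells in any way is not stated here, and the unweighted count is an assumption. Ties between step difficulties are broken by item and step number, one of the arbitrary choices the text allows. The function name `pair_scalability` is hypothetical.

```python
import numpy as np


def pair_scalability(x_i, x_j, m):
    """H_ij = 1 - O_ij / E_ij for two items scored 0..m, as in (7)."""
    x_i = np.asarray(x_i, dtype=int)
    x_j = np.asarray(x_j, dtype=int)
    n = x_i.size

    # Cross-tabulation of the two item scores.
    table = np.zeros((m + 1, m + 1))
    np.add.at(table, (x_i, x_j), 1)

    # Order all 2m item steps from easiest (largest proportion) to hardest.
    steps = []
    for item, x in enumerate((x_i, x_j)):
        for g in range(1, m + 1):
            steps.append((-np.mean(x >= g), item, g))
    steps.sort()

    def is_error(a, b):
        # Error: some step is passed although an easier step was failed.
        failed_easier = False
        for _, item, g in steps:
            passed = g <= (a if item == 0 else b)
            if passed and failed_easier:
                return True
            if not passed:
                failed_easier = True
        return False

    rows, cols = table.sum(axis=1), table.sum(axis=0)
    observed = expected = 0.0
    for a in range(m + 1):
        for b in range(m + 1):
            if is_error(a, b):
                observed += table[a, b]
                expected += rows[a] * cols[b] / n  # marginal independence
    return 1.0 - observed / expected if expected > 0 else float("nan")


# Made-up data without any error pattern, so the coefficient reaches its maximum.
print(pair_scalability([0, 1, 1, 2, 2], [0, 0, 1, 1, 2], m=2))
```

When the cross table equals the product of its marginals, the observed and expected sums coincide and the function returns zero, which matches the interpretation of the coefficient given in the text.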
The sums $O_{ij}$ and $E_{ij}$ are determined for all pairs of items, and then summed across item pairs:

$$H_i = 1 - \frac{\sum_{j \neq i} O_{ij}}{\sum_{j \neq i} E_{ij}}; \tag{8}$$

$$H = 1 - \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} O_{ij}}{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} E_{ij}}. \tag{9}$$

It may be noted that the maximum values of these coefficients equal unity. Their minimum values are not fixed. A value equal to zero is obtained when the number of observed errors equals the number to be expected given marginal independence.

According to Mokken the following rules of thumb can be used for the interpretation of the overall scalability coefficient $H$: the items form a weak scale in case of $0.30 \leq H < 0.40$, a medium scale in case of $0.40 \leq H < 0.50$ and a strong scale when $H \geq 0.50$. For the use of $H$ as a measure of monotone homogeneity in the dichotomous case see Mokken (1971) and Mokken et al. (1986).

MSP contains an algorithm for the selection of items from a larger pool using the $H$ coefficient, exactly as in the earlier programs for dichotomous items. This algorithm selects items, starting with the pair having the highest $H_{ij}$ coefficient, followed by stepwise selection of those items which maximize in each separate step the total scalability coefficient $H$ of the items selected thus far. A detailed explanation of this algorithm is given in the results section.

Reliability of test scores

Molenaar and Sijtsma (1988) have proposed an extension to multicategory items of the reliability estimate called MS by Sijtsma (1988a) and Sijtsma and Molenaar (1987). This extension is based on the order of the item steps according to their difficulty, as well as on the assumption of double monotonicity on the level of ISCC's. We refer to Molenaar and Sijtsma (1988) for a technical explanation of the method of reliability estimation. In this paper we briefly review the method.

Molenaar and Sijtsma (1988) show that the reliability of the test score $X$ equals

$$\rho_{XX'} = \frac{\sum_{i=1}^{k} \sum_{g=1}^{m} \sum_{j=1}^{k} \sum_{h=1}^{m} \left[ \pi_{gi,hj} - \pi_{gi} \pi_{hj} \right]}{\sigma^2(X)}. \tag{10}$$

The variance $\sigma^2(X)$ can be estimated directly from empirical data. Furthermore, the item step difficulties as well as the bivariate proportions $\pi_{gi,hj}$ ($i \neq j$) can also be estimated directly. The only problem arises for bivariate proportions if $i = j$, because a direct estimate would require independent replications of the same item.
Based on theory proposed by Mokken (1971), the method presented by Molenaar and Sijtsma (1988) approximates such proportions without the need for independent replications. An estimate of the reliability is obtained by inserting these approximations, together with the other statistics, in formula (10).


For dichotomous items, Sijtsma and Molenaar (1987) found that the reliability $\rho_{XX'}$, estimated according to (10), is biased only to a negligible extent. MSP provides a reliability estimate of the test score according to the method outlined in this subsection.

The program MSP

Given a sample of n individuals measured on a set of k items, each with at least two ordered answer categories, MSP offers the following possibilities:
-- evaluation of a set of items as an a-priori scale;
-- stepwise construction of one or more scales from a given pool of items, with optionally one or more items specified as a startset;
-- a check of the assumptions underlying the models of monotone homogeneity and double monotonicity;
-- estimation of the reliability of a scale.

MSP can handle up to 100 items. For each item the maximum number of answer categories is 10. Subjects with item scores outside a specified range are deleted from the analysis. There is no restriction on the number of subjects.

The data consist of the score matrix for the subjects on all items, the bivariate tables with frequencies for all item pairs, or the response patterns with their frequency counts. The score matrix has to be a "raw data file"; system files from statistical packages are not allowed. The program MSP has no facilities for transformations of variables, recoding variables, selection of respondents or handling of missing values other than by listwise deletion. Such manipulations have to be done before data are entered.

Options and statistics control the output of the program. Through the statistics practically all essential results can be obtained; options can be used to instruct the program that the second and subsequent scales that are found in a set of items will be used as a startset in a later stage of the analysis.
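The range check and listwise deletion described above can be mimicked before data entry, for instance to see how many subjects will remain. The sketch below is only an illustration under stated assumptions: missing values are coded as NaN, the data are already in memory as a subjects-by-items array, and the name `listwise_clean` is hypothetical. Reading the raw data file is outside its scope.

```python
import numpy as np


def listwise_clean(raw, low, high):
    """Keep subjects whose item scores are all present and within [low, high].

    Returns the retained rows as integers and a boolean mask over all subjects.
    """
    raw = np.asarray(raw, dtype=float)  # assumption: missing values are NaN
    with np.errstate(invalid="ignore"):
        in_range = (raw >= low) & (raw <= high)  # NaN compares as False
    keep = in_range.all(axis=1)
    return raw[keep].astype(int), keep


# Made-up records: the second has a missing value, the third is out of range.
records = [[1, 2, 3], [4, float("nan"), 2], [1, 5, 2], [4, 4, 4]]
clean, keep = listwise_clean(records, low=1, high=4)
print(clean)
print(int(keep.sum()), "of", keep.size, "subjects retained")
```

Any recoding or selection of respondents beyond this would still have to be carried out separately, as the text notes.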
The mainframe version (CDC, IBM or VAX) of MSP works comparably to a program like SPSS: a setup with control commands (keyword expressions) is read from a file, data are read from a file and the output is written to a file.

A micro computer version of MSP is available for IBM (or compatibles) with 640k memory. This version works for the most part in the same way as the mainframe version. Instead of input and output files, input and output windows on the screen can be used. The input window acts as a screen editor for preparing the setup for MSP. After execution the output window shows the output of the analysis. Switching from the output window to the input window is possible in order to change (edit) the setup, to correct errors, to specify different values for some of the parameters or to prepare a new analysis for a new dataset.

An application of MSP to multicategory test data

Procedure

As an example of an analysis of empirical test data by means of MSP we use data from an investigation of the attitudes of delegates of Dutch political parties (Middel & Van Schuur, 1981). The delegates were asked the following question:

We would like to ask about how much you would trust people from different countries. For each country please indicate whether in your opinion, they are in general very trustworthy, fairly trustworthy, not particularly trustworthy, or not at all trustworthy.

The countries to be judged were the European Common Market countries Italy, West Germany, England, Ireland, Belgium, Luxembourg, The Netherlands, Denmark and France, and furthermore Switzerland, The United States, Russia and China.

Middel and Van Schuur analyzed the judgments of 1213 delegates with the dichotomous Mokken model. They used a dichotomized version of the originally polychotomous item scores. In our example pertaining to the original data we excluded respondents with a missing value on one of the items and respondents with the same response on all items. This resulted in a dataset of 806 respondents.

In the analysis the search procedure was used: in the set of items one or more scales are constructed by means of a stepwise bottom-up item selection procedure. A more detailed explanation of this procedure is given in the next subsection.
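The stepwise bottom-up selection can be sketched as follows, starting from the pairwise error sums of (7). This is a simplified illustration and not the MSP algorithm itself: the significance tests are omitted, only a single scale is built, the matrices `O` and `E` are made-up numbers, and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def item_h(O, E, item, others):
    """H_i of one item with respect to a set of other items, as in (8)."""
    return 1.0 - sum(O[item, j] for j in others) / sum(E[item, j] for j in others)


def scale_h(O, E, items):
    """Overall H of a set of items, as in (9)."""
    pairs = list(combinations(items, 2))
    return 1.0 - sum(O[p] for p in pairs) / sum(E[p] for p in pairs)


def search_scale(O, E, lower_bound=0.30):
    """Greedy selection: best pair first, then the item that maximizes H."""
    k = O.shape[0]
    best = max(combinations(range(k), 2), key=lambda p: 1.0 - O[p] / E[p])
    if 1.0 - O[best] / E[best] <= lower_bound:
        return []
    scale = list(best)
    while True:
        candidates = []
        for c in range(k):
            if c in scale:
                continue
            # The candidate must have a positive H_ij with every selected item.
            if any(1.0 - O[c, j] / E[c, j] <= 0 for j in scale):
                continue
            # Its H_i with respect to the selected items must reach the bound.
            if item_h(O, E, c, scale) < lower_bound:
                continue
            candidates.append((scale_h(O, E, scale + [c]), c))
        if not candidates:
            return scale
        scale.append(max(candidates)[1])


# Made-up symmetric error sums for four items.
O = np.array([[0, 4, 5, 9], [4, 0, 6, 11], [5, 6, 0, 8], [9, 11, 8, 0]], float)
E = np.full((4, 4), 10.0)
selected = search_scale(O, E)
print(selected, scale_h(O, E, selected))
```

In this made-up example the fourth item is rejected because its coefficient with one of the selected items is negative, so the search stops with three items. Repeating the search on the items left over would correspond to building further scales from the remaining pool.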
The parameters to control the selection steps were specified such that the maximum number of scales to be formed equals k/2, the procedure starts with the best item pair (largest $H_{ij}$ value) and the minimum value of $H$ permitted is 0.30 (lower bound for a weak scale).

Results

In Table 1, as a first result of MSP, the frequency distributions for each item are given. The items are ordered according to increasing means (the higher the mean, the greater the trust in the inhabitants). In all output of MSP the items are ordered according to increasing means, unless otherwise specified by means of an option. In the table it can, e.g., be seen that the Dutch party delegates consider the Russians least trustworthy and their fellow countrymen most trustworthy. Given the frequency distributions for these two items it can furthermore be concluded that the consensus of opinion is stronger for the Dutch than for the Russians.

Table 1. Frequency distributions for all items: 1 = "not trustworthy at all", 2 = "not particularly trustworthy", 3 = "fairly trustworthy" and 4 = "very trustworthy"

Name      Label            Mean
Item 12   Russians         1.99
Item 13   Chinese          2.20
Item 1    Italians         2.38
Item 9    French           2.66
Item 11   Americans        2.96
Item 2    Germans          2.98
Item 4    Irish            3.03
Item 3    British          3.15
Item 10   Swiss            3.26
Item 5    Belgians         3.28
Item 6    Luxembourgeois   3.42
Item 8    Danish           3.54
Item 7    Dutch            3.56

(Means as reported in Tables 2 and 5; the frequency counts per response value are missing from this copy.)

Next, MSP provides tables containing the $H_{ij}$ values for all item pairs, as well as the statistics for testing whether $H_{ij}$ is significantly larger than zero. It is also possible to obtain the frequency tables for each item pair containing the data on which $H_{ij}$ for that pair is based. This output is very voluminous, however.

In the search procedure the pair of items that has the highest $H_{ij}$ coefficient is selected from all item pairs. This coefficient must be larger than the lower bound specified by the user (usually 0.30) and should differ significantly from zero at a given level of significance. To avoid chance capitalization the significance level is adapted to the number of tests carried out. For the test of significance the asymptotically standard normal distributed statistic DELTA STAR (based on theory by Mokken, 1971) is used.

The "two-item scale" is then extended in a stepwise manner. Each subsequent item must fulfil the following conditions:
-- the item must have positive $H_{ij}$ with all other items already admitted to the scale;
-- the item scalability coefficient $H_i$ for the item with respect to the items already selected is greater than or equal to the lower bound specified by the user and the item coefficient differs significantly from zero;
-- the general scalability coefficient $H$ for the resulting scale is largest given all choices of items from the pool of items not yet selected.

In the example the search procedure starts with item 5 (Belgians) and item 6 (Luxembourgeois) with $H_{56}$ = [...]. Successively the following items are added: item 7 (Dutch), item 8 (Danish), item 4 (Irish) and item 3 (British). The final result (see Table 2) is given in terms of:
-- the general scalability coefficient $H$. In this case $H = 0.41$, indicating a medium scale;
-- the mean item score (the higher the mean, the higher the trust), which is also provided in Table 1;
-- the item scalability coefficient $H_i$ for each item (also for those not selected) with respect to the items in the scale;
-- the asymptotically standard normal distributed statistic Delta Star, which is used to test the hypothesis that $H$ or $H_i$ is larger than zero.

Table 2. Final result of the search procedure

Final scale 1, number of variables: 6
Scale coefficient H = 0.41, scale delta star = [...]

Variable   Label            Mean
Item 4     Irish            3.03
Item 3     British          3.15
Item 5     Belgians         3.28
Item 6     Luxembourgeois   3.42
Item 8     Danish           3.54
Item 7     Dutch            3.56

Coefficients of non-scale variables with respect to this scale:

Variable   Label            Mean
Item 12    Russians         1.99
Item 13    Chinese          2.20
Item 1     Italians         2.38
Item 9     French           2.66
Item 11    Americans        2.96
Item 2     Germans          2.98
Item 10    Swiss            3.26

(The Hi and Delta Star columns of this table are missing from this copy.)

13 - - tables - - tables - - tables 184 K. Sijtsma et al. Furthermore the following results can be requested: containing all frequencies of persons having a specific test score as well as a specific score on a specific item in the scale; containing the proportions of positive responses on each item step within each separate test score group; containing the mean of each item within each separate test score group. The assumption of monotonely nondecreasing ISCC's (Molenaar, 1986) can be checked by means of a heuristic method using the table with the pro- portions of positive responses on each item step within each separate test score group (Table 3). For reasons of efficiency the results are only given for the items 3, 4 and 5. In the columns of the table we are looking for monotonely nondecreasing sequences of proportions. Except for some minor deviations, the selected items in our example fulfil this condition. As was pointed out before this method does not yet have a mathematical basis and it should thus be applied with caution. Under the assumption of the model of double monotonicity, the ISCC's may not intersect. This assumption can be investigated by means of the so called P-matrix, containing pairwise for all item steps, the probabilities of a positive response for both item steps involved. In this matrix the item steps are ordered according to increasing popularities, and so from left to right and from top to bottom the probabilities should be monotonely nondecreas- ing. The P-matrix in our example pertaining to six items has the order 18 by 18. It thus takes much space to display this matrix. In Table 4 only the part Table 3. Proportions of positive responses on each item step within each separate test score group N Item 4 Item 3 Item 5 Sum score ~2 ~3 ~4 ~2 ~3 ~4 )2 ~3 ~ , ,

14 - - the - - the - - the Mokken scale analysis for polychotomous items 185 with respect to the nine most difficult or least popular item steps is shown. The items in our example seem to fulfil the condition of double monotonicity. The P-matrix is also the basis for the computation of the reliability coefficient (Molenaar and Sijtsma, 1988). The reliability estimate for the six items selected in the first scale is shown at the bottom of Table 4. After the construction of the first scale, the search procedure restarts the analysis with the items not yet selected in this scale. This procedure is called "multiple scaling". If two or more scales can be constructed based on a set of items, this could be an indication of multi-dimensionality. In our example three additional scales can be constructed (see Table 5), whereupon no more items are left. Once an item is included in a scale, it can not be selected in another scale. It could be possible however that the composition of the scales has been affected by the order in which they were constructed. MSP contains an option which offers the possibility to use the second and following scales as startset for scale construction based on the complete set of items in a second phase of the analysis. In our example the use of the second, the third and the fourth scale as startset, respectively, yields the following results: second scale obtained in phase 1 remains the same; third scale (Swiss, Germans and Americans) is extended with Dutch, Belgians, Luxembourgeois and Danish. This scale has a scalability coefficient of 0.37; fourth scale (French and Italians) is extended with all items from the first scale (Irish, British, Belgians, Luxembourgeois, Danish and Dutch) Table 4. P-matrix and reliability coefficient. For i =~ j proportions are estimated directly from the data. 
For i = j proportions are approximated~by means of extrapolation methods (Molenaar & Sijtsma, 1988) P-matrix, estimates of + + probabilities for all paired item steps. Item steps are ordered according to their sample difficulties. Item 4 Item 3 Item 5 Item 6 Item 8 Item 7 Item 4 Item 3 Item 5 ~>4 />4 />4 />4 />4 />4 ~>3 />3 ~>3 Variable/> P Item Item Item Item Item Item Item Item Item Reliability RHO

resulting in a scale with a scalability coefficient equal to [...]. In the resulting scale French and Italians have an item scalability coefficient less than 0.30, however. The results of this analysis can be compared with the results in Table 2, where the results for the first scale are given; the items of the fourth scale (French and Italians) are not included in this scale because their item scalability coefficients are too low.

Table 5. Results after multiple scaling

Final scale 2, number of variables: 2
Scale coefficient H = 0.69, Scale delta star = [...]
Variable   Label       Mean   Hi   Delta Star
Item 12    Russians    1.99
Item 13    Chinese     2.20

Final scale 3, number of variables: 3
Scale coefficient H = 0.38, Scale delta star = [...]
Variable   Label       Mean   Hi   Delta Star
Item 11    Americans   2.96
Item 2     Germans     2.98
Item 10    Swiss       3.26

Final scale 4, number of variables: 2
Scale coefficient H = 0.39, Scale delta star = 9.69
Variable   Label       Mean   Hi   Delta Star
Item 1     Italians
Item 9     French

Interpretation and comparison with the dichotomous analysis

Several factors are important in the judgment of the trustworthiness of inhabitants of countries. The first scale contains countries that are neighbours of The Netherlands, the native country of the party delegates. The second scale contains the communist countries, and so on. Vicinity and political system therefore partly account for the scales.

In the original investigation (Middel and Van Schuur, 1981) a dichotomous analysis was performed. The polychotomous item scores were dichotomized into no trust (original scores 1 and 2) versus trust (original scores 3 and 4). The result was one scale containing all thirteen items. A possible explanation for this result is that Middel and Van Schuur did not exclude subjects with the same response to all items. The response patterns of
these subjects do not contain errors and, consequently, scalability coefficients are inflated. A dichotomous analysis of the reduced dataset of 806 respondents results in a substantial loss of information. The frequency distributions of the items (Table 1) show that after dichotomization of items 5, 6, 7 and 8 over 90% of the subjects would belong to a single category. The analysis of the dichotomized data results in two scales; the first consists of Russians, Chinese, Italians, French, Irish, Belgians and Luxembourgeois and the second consists of Americans, Germans, Swiss and Dutch.

Discussion

The extension of the Mokken model to multicategory item scores by Molenaar makes it possible to avoid the loss of information that results from dichotomizing originally multicategory data. Such dichotomization usually produces a less differentiated measurement scale, as well as lower reliability of measurement. Furthermore, the results obtained in the previous section show that the composition of scales based on polychotomous and dichotomized data may differ. For some of the countries, judging the trustworthiness of inhabitants means a choice between fairly or very trustworthy, while for other countries it means a choice between trust or distrust, and sometimes a finer differentiation is possible. The multicategory Mokken model makes it possible to look in more detail at polychotomously scored items, some of which would be very popular or unpopular after a plausible dichotomization.

Availability of the program MSP

The program MSP is available for IBM-compatible microcomputers and for IBM, CDC or VAX mainframe computers. The price of the microcomputer version, including a user manual, is $200 (excluding taxes, mailing, shipping and administration costs; the price may be subject to change). Orders should be sent to: iec ProGAMMA, Kraneweg 8, 9718 JP Groningen, The Netherlands.

References

Debets, P. & Brouwer, E. (1989). MSP: a program for Mokken Scale analysis for Polychotomous items. Groningen: iec ProGAMMA.

Gillespie, M., TenVergert, E.M. & Kingma, J. (1987). "Using Mokken scale analysis to develop unidimensional scales", Quality & Quantity 21: 393-408.
Grayson, D.A. (1988). "Two-group classification in latent trait theory: scores with monotone likelihood ratio", Psychometrika 53.
Henning, H.J. & Six, B. (1977). "Konstruktion einer Machiavellismus-Skala", Zeitschrift für Sozialpsychologie 8.
Kingma, J. & Reuvekamp, J. (1986a). "Mokken Scale: a Pascal program for nonparametric stochastic scaling", Educational and Psychological Measurement 46.
Kingma, J. & Reuvekamp, J. (1986b). "Mokken test for the robustness of nonparametric stochastic Mokken scales", Educational and Psychological Measurement 46.
Middel, B.P. & Schuur, W.H. van (1981). "Dutch Party Delegates", Acta Politica 16: 241.
Mokken, R.J. (1971). A Theory and Procedure of Scale Analysis. The Hague: Mouton.
Mokken, R.J. & Lewis, C. (1982). "A nonparametric approach to the analysis of dichotomous item responses", Applied Psychological Measurement 6.
Mokken, R.J., Lewis, C. & Sijtsma, K. (1986). Rejoinder to "The Mokken Scale: A critical discussion", Applied Psychological Measurement 10.
Molenaar, I.W. (1982). "Mokken Scaling Revisited", Kwantitatieve Methoden 3, no. 8.
Molenaar, I.W. (1986). "Een vingeroefening in item response theorie voor drie geordende antwoordcategorieën", in G.F. Pikkemaat & J.J.A. Moors (eds), Liber Amicorum Jaap Muilwijk. Groningen: Econometrisch Instituut.
Molenaar, I.W. & Sijtsma, K. (1988). "Mokken's approach to reliability estimation extended to multicategory items", Kwantitatieve Methoden 9, no. 28.
Niemöller, B. (1980). Mokken Scale. STAP user's manual, vol. 4, part 2. Amsterdam: Technisch Centrum FSW, University of Amsterdam.
Niemöller, B. & Van Schuur, W.H. (1980). Mokken Test. STAP user's manual, vol. 4, part 3. Amsterdam: Technisch Centrum FSW, University of Amsterdam.
Niemöller, B. & Van Schuur, W.H. (1983). "Stochastic Models for Unidimensional Scaling: Mokken and Rasch", in D. McKay, N. Schofield & P. Whiteley (eds), Data Analysis and the Social Sciences. London: Francis Pinter Ltd.
Sijtsma, K. (1988a). "Reliability estimation in Mokken's nonparametric item response model", in W.E. Saris & I.N. Gallhofer (eds), Sociometric Research, Vol. 1: Data Collection and Scaling. London: MacMillan Press Ltd.
Sijtsma, K. (1988b). Contributions to Mokken's Nonparametric Item Response Theory. Amsterdam: Free University Press.
Sijtsma, K. & Molenaar, I.W. (1987). "Reliability of test scores in nonparametric item response theory", Psychometrika 52.
Stokman, F.N. & Van Schuur, W. (1980). "Basic Scaling", Quality & Quantity 14: 5-30.
