International Journal of Epidemiology
Oxford University Press 1981
Vol. 10, No. 1
Printed in Great Britain

Weighting the Seriousness of Perceived Health Problems Using Thurstone's Method of Paired Comparisons

SP McKENNA,* SM HUNT* and J McEWEN*

McKenna SP [Department of Community Health, The University of Nottingham, University Hospital, Queen's Medical Centre, Clifton Boulevard, Nottingham NG7 2UH], Hunt SM and McEwen J. Weighting the seriousness of perceived health problems using Thurstone's method of paired comparisons. International Journal of Epidemiology 1981, 10: 93-97.

An exercise carried out to weight the seriousness of perceived health problems using Thurstone's method of paired comparisons is reported. For the purposes of illustration, the sleep section of the Nottingham Health Profile, containing 5 statements, is used. The calculation of the scale values associated with each statement is described and a new method of deriving weights from the scale values is proposed. The judgement of comparisons was found acceptable to interviewees. The applicability of the paired comparison technique to the field of health is discussed.

One reason for collecting information about health is to help determine the services required by the community. Traditionally, health has been measured by mortality and morbidity rates. While such measures are routinely collected and readily available, they are relatively crude, and there have been many calls for the development of alternative measures of service provision [1,2] and decision making. One such measure is that of 'quality of life', which may be more influential in determining demand on services, since objective medical criteria cannot give an insight into the patient's feelings of wellbeing. The need for some measure of 'quality of life' has led to the development of subjective and sociomedical indicators.

A problem in the development of a subjective health index is that of measurement.
Whereas deaths and episodes of diagnosed illness are relatively easy to count, descriptions of wellbeing are not. The Nottingham Health Profile, whose development is reported in detail in Martini and McDowell, [3] McDowell et al, [4] and Hunt and McEwen, [5] is designed to give standardised assessments of subjective health status. An instrument which purports to measure perceived health status must be standardised for the relevant populations and provide for differences in the severity of subjective states. The Profile is made up of 6 sections: energy, pain, sleep, social isolation, emotional reactions and physical mobility. Each section consists of statements describing problems in that domain. The respondent is required to indicate whether or not each statement applies to him or her. For the purpose of comparison between sections, it was decided that the maximum score on each section should be 100.

* Department of Community Health, The University of Nottingham, University Hospital, Queen's Medical Centre, Clifton Boulevard, Nottingham NG7 2UH. This research was supported by Grant number HR615 7/1 awarded by the Social Sciences Research Council.

The problems described by the statements vary in seriousness. For example, the section on sleep (which will be used for illustration in this paper) includes the following statements:
'I'm waking up in the early hours of the morning'.
'I lie awake for most of the night'.
There is clearly a disparity in the magnitude of the problems described. For this reason it was considered inappropriate to give equal 'weight' to each statement.

Again, within sections, statements do not describe different degrees of the same problem. For example, the sleep section includes the statement 'I take tablets to help me sleep'. This statement cannot be placed on a single concept continuum defined by the 2 statements above. However, it is possible to conceive of all the statements in the section as being on a continuum from minor to serious sleep problems.
Whereas it would be a difficult task for a respondent to place several statements in rank order of seriousness, it is an easier task for him to consider 2 statements and make a judgement about which represents a greater problem. Thus, it was decided that Thurstone's Judgement Scaling Model would be the one most appropriate for determining the weight which should be given to each statement.

The model underlying the Thurstone scaling methods is described in Thurstone [6] and Torgerson. [7] The task is to locate the stimuli (statements) on a psychological continuum. The psychological continuum can be considered to be a continuum of subjective or psychological magnitudes. Thurstone argues that each psychological magnitude is mediated by a 'discriminal process' which is 'that process by which the organism identifies, distinguishes, or reacts to stimuli'. Each stimulus, when presented to a respondent, gives rise to a discriminal process. Because of momentary fluctuations in the organism, a given stimulus will not always excite the same discriminal process. It is postulated that if the same stimulus is presented to the respondent several times, the resulting discriminal processes will be normally distributed on the psychological continuum. The scale value of the stimulus on the psychological continuum is taken to be the modal, median or mean discriminal process, as all 3 measures are the same for a normal distribution. An alternative to presenting the same stimulus a large number of times to the same subject is to present the same stimulus to a large number of subjects.

It is possible to deduce equations relating judgements of relations among stimuli to the scale values of the stimuli on the psychological continuum. One of these sets of equations is the law of comparative judgement, which is concerned with paired-comparison judgements. The equations relate the proportion of times any stimulus k is judged greater on a given attribute (in this case sleep problems) than any other stimulus j to the scale values of the two stimuli on the psychological continuum.
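As a hedged illustration of the relation just described, the law of comparative judgement is commonly written in the following simplified form (Thurstone's Case V, which assumes equal discriminal dispersions and equal correlations, the assumptions named in the titles of Mosteller's papers in the reference list). The symbols p_{kj}, x_{kj}, S_k, S_j and \Phi are notation chosen here for illustration, and scale differences are taken to be expressed in units of the standard deviation of the difference between two discriminal processes.

```latex
% Assumed notation (not taken from the paper):
%   p_{kj} : proportion of times stimulus k is judged greater than stimulus j
%   x_{kj} : unit normal deviate corresponding to p_{kj}
%   S_k, S_j : scale values of stimuli k and j on the psychological continuum
%   \Phi : cumulative distribution function of the standard normal distribution
\begin{align}
  p_{kj} &= \Phi\left(S_k - S_j\right), \\
  x_{kj} &= \Phi^{-1}\left(p_{kj}\right) = S_k - S_j .
\end{align}
```

Under this form, p_{kj} + p_{jk} = 1 and x_{kj} + x_{jk} = 0, so a proportion of 0.5 corresponds to two stimuli with equal scale values.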
METHOD

The section on sleep contains the following 5 statements:
'I'm waking up in the early hours of the morning'
'It takes me a long time to get to sleep'
'I sleep badly at night'
'I take tablets to help me sleep'
'I lie awake for most of the night'.

The method of obtaining empirical estimates of the proportions discussed above is known as the method of paired comparisons. Each statement was compared with each other statement, giving n(n - 1)/2, or 10, pairs. Each pair was presented to the subject, who was required to indicate which of the statements represented the greater sleep problem. Each statement was typed onto a 4 inch by 3 inch card with an identifying letter (A to E) in the top left hand corner. The respondent was required to respond with the letter corresponding to the selected statement. Pairs of cards were presented in a randomised order, with the order reversed for 50% of the trials. 204 interviews were carried out on the section on sleep.

Subjects
A pilot study had been conducted to test the acceptability of the experimental method and to investigate the effect of demographic variables. The pilot study showed that respondents found the task very interesting and that few people had any difficulties with it. No associations were found between age, sex or the respondents' perceived health status and results obtained by the paired comparison method.

Subjects for the main study were obtained from the following places: a family planning clinic, a large local industrial company, the Nottingham Medical School library, a cafeteria run by the Women's Royal Volunteer Service, and the following outpatient clinics: dermatology, orthopaedic surgery and ear, nose and throat. At the outpatient clinics, interviews were conducted with both patients and non-patients who were accompanying friends, relations, etc.

Interviews
The weighting of the 6 sections involved over 1200 interviews and 9 interviewers were employed.
Each interviewer received training designed to standardise the interview. The interviewer introduced herself and explained that research was being conducted into the problems people have in their daily lives when they are unwell. Interviewees were invited to participate and no attempt to pressure them was made. It was explained that the same statements would be seen a number of times but that the same two statements would not be shown together. They were asked not to concentrate on their own experiences, but to consider the statements from the viewpoint of people in general. They were also told that there were no 'right' or 'wrong' answers.

During the interview, the interviewer checked that the respondent understood the task by saying after the first few answers 'So you think... would be a worse problem for someone to have?' Explanations were avoided during the interview but general questions were answered at the end of the exercise. If the interview was interrupted (for example, when the patient was called in for consultation) it was abandoned. Where it was thought that the respondent did not fully understand the task, that he was unable to read the statement, or if there was any doubt about the quality of the responses, the interview data were excluded.

RESULTS AND CALCULATION OF SCALE VALUES

The results consisted of the number of times each statement in a pair was considered a greater sleep problem than the other statement. The results are shown in the raw frequency matrix F in Table 1.

TABLE 1 Matrix F: frequency of response

             Statement judged a greater problem
             A       B       C       D       E
    A        -     134     156     123     169
    B       70       -     132     137     160
    C       48      72       -     110     140
    D       81      67      94       -     106
    E       35      44      64      98       -

TABLE 2 Matrix P: proportion matrix for the sleep section weighting

             A       B       C       D       E
    A        -    0.66    0.76    0.60    0.83
    B     0.34       -    0.65    0.67    0.78
    C     0.24    0.35       -    0.54    0.69
    D     0.40    0.33    0.46       -    0.52
    E     0.17    0.22    0.31    0.48       -
    SUM   1.65    2.06    2.68    2.79    3.32
    MEAN  0.33    0.41    0.54    0.56    0.66

    (SUM and MEAN include a diagonal value of 0.5.)

Matrix P (Table 2) is constructed from matrix F. The values in the cells are the proportion of times statement k is judged a greater sleep problem than statement j. Symmetric cells sum to one. For example, P_CB (0.65) + P_BC (0.35) = 1. The diagonal cells can be left blank or the value 0.5 can be entered, because it is assumed that if respondents were required to compare a statement with itself the first and second statement cards would be selected an equal number of times.

TABLE 3 Matrix X: unit normal deviate matrix for the sleep section weighting

              A        B        C        D        E
    A         -     0.41     0.71     0.25     0.95
    B     -0.41        -     0.39     0.44     0.77
    C     -0.71    -0.39        -     0.10     0.50
    D     -0.25    -0.44    -0.10        -     0.05
    E     -0.95    -0.77    -0.50    -0.05        -
    SUM   -2.32    -1.19     0.50     0.74     2.27
    MEAN  -0.464   -0.240    0.100    0.148    0.454

From matrix P, matrix X (Table 3), the basic transformation matrix, is constructed. The cell values are the unit normal deviates corresponding to the proportions given in matrix P. The values are positive when the proportion is greater than 0.5, and negative when the proportion is less than 0.5. In matrix X symmetric cells sum to zero.
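The construction of matrices P and X from matrix F, and the summing and averaging shown in Table 3, can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function and variable names are hypothetical, the orientation of F (columns as the statement judged the greater problem) is an assumption read from Table 1, and the standard-library `statistics.NormalDist` is used in place of printed tables of the normal distribution. Because the proportions are not rounded to two decimal places before conversion, the results agree with Table 3 only to within about 0.01.

```python
from statistics import NormalDist

N_RESPONDENTS = 204  # interviews carried out on the sleep section

# Matrix F (Table 1). Assumed orientation: F[row][col] is the number of
# respondents who judged the COLUMN statement a greater sleep problem than
# the ROW statement. Diagonal cells are left empty (None).
F = [
    [None, 134, 156, 123, 169],
    [70, None, 132, 137, 160],
    [48, 72, None, 110, 140],
    [81, 67, 94, None, 106],
    [35, 44, 64, 98, None],
]


def proportion_matrix(freq, n):
    """Matrix P: each count divided by n, with 0.5 entered on the diagonal."""
    return [[0.5 if count is None else count / n for count in row] for row in freq]


def deviate_matrix(props):
    """Matrix X: unit normal deviate corresponding to each proportion."""
    inverse_cdf = NormalDist().inv_cdf
    return [[inverse_cdf(p) for p in row] for row in props]


def column_means(matrix):
    """Scale value of each statement: the mean of its column of deviates."""
    size = len(matrix)
    return [sum(row[col] for row in matrix) / size for col in range(size)]


P = proportion_matrix(F, N_RESPONDENTS)
X = deviate_matrix(P)
scale_values = column_means(X)

for label, value in zip("ABCDE", scale_values):
    print(f"{label}: {value:+.3f}")
```

Entering 0.5 on the diagonal gives a deviate of zero there, so the column mean is the column sum divided by 5, as in the MEAN row of Table 3.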
Mosteller [8] and Horst [9] have shown that the usual procedure for obtaining estimates of scale values from a matrix X with no vacant cells is a least-squares solution. A derivation following Mosteller's treatment is given by Torgerson. [7] In practice, the scale values are calculated by summing and averaging the normal deviate values for each statement. These calculations for the sleep section are shown in Table 3.

The above description has shown how scale values have been estimated from the observed proportions. Given these scale values, the procedure can be reversed, so that derived proportions can be obtained from the estimates of the scale values. This procedure will not be discussed here, but the applicability of the model to the data can be tested by seeing how well the derived proportions correspond to the observed proportions. One such procedure is simply to obtain the average absolute deviation. If this average discrepancy is 'small' it can be concluded that the model fits adequately. For the sleep section the average absolute deviation is 0.05. An overall test of goodness of fit is given by Mosteller. [10] This test uses the inverse-sine transformation on the observed and derived proportions. The value of χ² for the sleep section derived from Mosteller's formula is almost zero, suggesting that the goodness of fit is extremely high.

DEVELOPMENT OF WEIGHTS FROM THE SCALE VALUES

For most purposes the weights can be taken to be the scale values derived from the observed proportions. For example, the scale values of stimuli developed from the responses of an experimental group can be compared with the scale values developed from the responses of a control group.

For the purpose of the Nottingham Health Profile the scale values could not be used as weights, since the sum of these values would always equal zero. It was convenient that a score of zero should indicate no sleep problems, while a high score (say 100) should indicate that all the problems applied to the respondent.

Scale values derived from the law of comparative judgement locate the stimuli on the psychological continuum with respect to one another only. Torgerson suggests that the zero point should be chosen arbitrarily by adding a constant to each of the scale values, but this raises the problem of what constant to add. Adding one to each scale value would maintain a reasonably large spread in the weights given to each statement, whereas adding ten to each scale value would produce weights with a minimal range. Torgerson also argues that an absolute zero point can be equated with the absolute limen as determined by standard psychophysical methods. Unfortunately, the absolute limen, while being applicable to many continua, particularly in the sensory domain, cannot be applied to the domain of sleep problems.

[FIGURE 1 Cumulative proportion against unit normal deviate, showing positions of the five statements in the sleep section. Horizontal axis: unit normal deviate; fitted line: y = 0.4x + 0.5.]

It is possible to postulate a new method for determining a rational zero point from which weightings can be derived. Figure 1 shows the relationship between cumulative proportion and unit normal deviate. Theoretically, the proportion 0 will have a unit normal deviate value of minus infinity. Hence, the absolute zero value is also minus infinity.
However, it is possible to calculate the regression line about the point where the cumulative proportion = 0.5. Figure 1 shows that this line (y = 0.4x + 0.5) is a good approximation to the plotted curve between the values of x = ±0.5. For the sleep section all scale values fall within this range. The same is true in general for the scale values of statements in the other sections of the Nottingham Health Profile.

The scale values and associated mean proportions of the statements in the sleep section (shown in Tables 3 and 2 respectively) are plotted in Figure 1. It can be seen that all the points lie close to the line y = 0.4x + 0.5. This line crosses the unit normal deviate (x) axis at the point -1.25. It is postulated that this point is the rational zero point for the scale values and that the constant 1.25 should be added to each scale value in order to determine the weight allocated to each statement.

The addition of 1.25 to the scale values of the statements gives the following weights: A = 0.786, B = 1.010, C = 1.350, D = 1.398, E = 1.704. The total of the weights of the 5 statements is 6.248. In order to standardise the weights so that a respondent affirming all 5 statements will receive a score of 100, each weight is multiplied by 100/6.248, or 16.005, giving the weights: A = 12.58, B = 16.17, C = 21.61, D = 22.37, E = 27.27. These are the weights used in the sleep section of the Nottingham Health Profile.

It should be noticed that the proportional distances between the scale values of the statements are the same as the proportional distances between the weights of the statements. For example, the proportional distance between statements B and C is
(a) for scale values, $\frac{0.100 - (-0.240)}{0.918} \times 100 = 37\%$, and
(b) for weights, $\frac{21.61 - 16.17}{14.69} \times 100 = 37\%$,
where 0.918 and 14.69 are the distances between the lowest (A) and highest (E) scale values and weights respectively.
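The arithmetic of this section can be retraced with a short sketch. It assumes the scale values from Table 3 and the line y = 0.4x + 0.5 quoted above; the function and variable names are hypothetical. The standardising factor is applied unrounded, so individual weights may differ from the published values in the second decimal place (the paper uses the rounded factor 16.005).

```python
# Scale values of the five sleep statements (MEAN row of Table 3).
SCALE_VALUES = {"A": -0.464, "B": -0.240, "C": 0.100, "D": 0.148, "E": 0.454}

# Line fitted about the point where the cumulative proportion is 0.5.
SLOPE, INTERCEPT = 0.4, 0.5


def rational_zero_offset(slope, intercept):
    """Constant to add to each scale value.

    The line y = slope * x + intercept crosses the x axis at
    -intercept / slope; the constant is the size of that shift.
    """
    return intercept / slope


def derive_weights(scale_values, offset, max_score=100.0):
    """Shift the scale values by the offset, then rescale to sum to max_score."""
    shifted = {label: value + offset for label, value in scale_values.items()}
    factor = max_score / sum(shifted.values())
    return {label: value * factor for label, value in shifted.items()}


offset = rational_zero_offset(SLOPE, INTERCEPT)
weights = derive_weights(SCALE_VALUES, offset)

for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")

# Shifting and rescaling are linear, so proportional distances are preserved.
scale_range = SCALE_VALUES["E"] - SCALE_VALUES["A"]
weight_range = weights["E"] - weights["A"]
print((SCALE_VALUES["C"] - SCALE_VALUES["B"]) / scale_range)
print((weights["C"] - weights["B"]) / weight_range)
```

Because the transformation is a shift followed by a multiplication, any other choice of constant would change the spread of the weights but not the ordering of the statements.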
DISCUSSION

This paper has discussed the development of weights for statements on the Nottingham Health Profile referring to sleep problems. Thurstone's method of paired comparisons was used to provide the data from which the scale values and weights were developed.

One possible criticism of Thurstone's model is that the discriminal processes may not be normally distributed on the attribute continuum. However, the continuum and the distributions are hypothetical constructs and thus cannot be measured directly. Nunnally [11] argues that, 'If discriminal processes for two stimuli are normally distributed, as most errors of this type tend to be for many different phenomena, the distribution of differences in discriminal processes will be normally distributed'.

A check on the goodness of fit of the model to the data is provided by Mosteller's test (discussed above). This test involves the construction of a matrix of normal deviates corresponding to the estimates of the scale values. From this matrix, a matrix of fitted proportions is constructed. The test depends on how well the fitted proportions correspond to the observed proportions. As was stated above, there was extremely good agreement between the observed and fitted proportions, indicating that the model, based on the assumption of normal distribution of the discriminal processes, is a good fit to the data.

Thus the method of paired comparisons was found to be a satisfactory method of placing statements with differing specific concepts onto a general psychological continuum. It is argued that the final weights for statements on the sleep section are an accurate reflection of the relative seriousness of each statement, as judged by the experimental subjects.

Respondents completing the section on sleep in the Nottingham Health Profile can score from zero (no statements apply) to 100 (all 5 statements apply).
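A section score of this kind can be sketched as the sum of the weights of the statements a respondent affirms. The snippet below is illustrative only: `section_score` is a hypothetical helper, the weights are the published sleep-section values, and the letters A to E stand for the identifying letters on the statement cards.

```python
# Published sleep-section weights, keyed by the identifying letter of each statement.
SLEEP_WEIGHTS = {"A": 12.58, "B": 16.17, "C": 21.61, "D": 22.37, "E": 27.27}


def section_score(affirmed, weights=SLEEP_WEIGHTS):
    """Sum the weights of the affirmed statements, rounded to 2 decimal places.

    `affirmed` is any iterable of statement letters; repeats are counted once.
    """
    chosen = set(affirmed)
    unknown = chosen - set(weights)
    if unknown:
        raise ValueError(f"unknown statement letters: {sorted(unknown)}")
    return round(sum(weights[letter] for letter in chosen), 2)


print(section_score([]))          # no statements apply
print(section_score("ABCDE"))     # all 5 statements apply
print(section_score(["A", "E"]))  # two statements apply
```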
It is possible to argue that section scores are at least at the interval level of measurement. Whether or not such scores can be considered to be on a ratio scale is a matter for debate.

The method of paired comparisons is a particularly useful method of scaling in the field of health. Its use should be considered where problems of rating the seriousness of symptoms, diseases or injuries arise. The method has the great advantage of simplifying the judgement task for the respondents, who found the task relatively straightforward to perform, few being unable to complete it. One disadvantage is that where large numbers of items require ranking or weighting, the number of paired comparisons increases dramatically. In every case n(n - 1)/2 paired comparisons are needed. For example, if the experimenter wished to determine the relative unpleasantness of 15 symptoms, 105 paired comparisons would be necessary, although Torgerson suggests methods by which the experimental labour may be reduced.

Burroughs [12] considers the technique to be both statistically elegant and powerful. He counters the problem of the need for large numbers of comparison judgements with the following comment: 'if this forces us to explore with the rapier of 5 items rather than the bludgeon of 100, it may be no bad thing'.

REFERENCES

1. World Health Organization. Statistical Indicators for the Planning and Evaluation of Public Health Programmes. Technical Report Series No. 472, 1971.
2. White KL. Contemporary Epidemiology. Int J Epidemiol 1974; 3: 295-303.
3. Martini CJ and McDowell I. Health Status: patient and physician judgements. Health Services Research 1976; Winter: 508-515.
4. McDowell I, Martini CJ, Waugh W. A method for self-assessment of disability, before and after hip replacement operation. Br Med J 1978; 2: 875-879.
5. Hunt Sonja M and McEwen J. The development of a subjective health indicator. Sociol of Health and Illness 1980; 2: 231-246.
6. Thurstone LL. A law of comparative judgement. Psychol Rev 1927; 34: 273-286.
7. Torgerson WS. Theory and Methods of Scaling. New York: John Wiley, 1958.
8. Mosteller F. Remarks on the method of paired comparisons. I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika 1951; 16: 3-11.
9. Horst P. The prediction of personal adjustment. Soc Sci Res Coun Bull No. 48. 1941.
10. Mosteller F. Remarks on the method of paired comparisons. III. A test of significance for paired comparisons when equal standard deviations and equal correlations are assumed. Psychometrika 1951; 16: 207-218.
11. Nunnally JC. Psychometric Theory. (2nd Edition) New York: McGraw-Hill, 1978.
12. Burroughs GER. Design and Analysis in Educational Research (2nd Edition). Educational Monograph No. 8. Educational Review, Oxford, 1975.

(Revised version received 30 October 1980)