Weighting the Seriousness of Perceived Health Problems Using Thurstone's Method of Paired Comparisons


International Journal of Epidemiology
Oxford University Press 1981
Vol. 10, No. 1
Printed in Great Britain

Weighting the Seriousness of Perceived Health Problems Using Thurstone's Method of Paired Comparisons

SP McKENNA,* SM HUNT* and J McEWEN*

McKenna SP [Department of Community Health, The University of Nottingham, University Hospital, Queen's Medical Centre, Clifton Boulevard, Nottingham NG7 2UH], Hunt SM and McEwen J. Weighting the seriousness of perceived health problems using Thurstone's method of paired comparisons. International Journal of Epidemiology 1981, 10: 93-97.
An exercise carried out to weight the seriousness of perceived health problems using Thurstone's method of paired comparisons is reported. For the purposes of illustration, the sleep section of the Nottingham Health Profile containing 5 statements is used. The calculation of the scale values associated with each statement is described and a new method of deriving weights from the scale values is proposed. The judgement of comparisons was found acceptable to interviewees. The applicability of the paired comparison technique to the field of health is discussed.

One reason for collecting information about health is to help determine the services required by the community. Traditionally, health has been measured by mortality and morbidity rates. While such measures are routinely collected and readily available they are relatively crude and there have been many calls for the development of alternative measures of service provision [1,2] and decision making. One such measure is that of 'quality of life', which may be more influential in determining demand on services since objective medical criteria cannot give an insight into the patient's feelings of wellbeing. The need for some measure of 'quality of life' has led to the development of subjective and sociomedical indicators.

A problem in the development of a subjective health index is that of measurement.
Whereas deaths and episodes of diagnosed illness are relatively easy to count, descriptions of wellbeing are not. The Nottingham Health Profile, whose development is reported in detail in Martini and McDowell [3], McDowell et al [4], and Hunt and McEwen [5], is designed to give standardised assessments of subjective health status. An instrument which purports to measure perceived health status must be standardised for the relevant populations and provide for differences in the severity of subjective states.

The Profile is made up of 6 sections: energy, pain, sleep, social isolation, emotional reactions and physical mobility. Each section consists of statements describing problems in that domain. The respondent is required to indicate whether or not each statement applies to him or her. For the purpose of comparison between sections, it was decided that the maximum score on each section should be 100.

The problems described by the statements vary in seriousness. For example, the section on sleep (which will be used for illustration in this paper) includes the following statements:
'I'm waking up in the early hours of the morning'.
'I lie awake for most of the night'.
There is clearly a disparity in the magnitude of the problems described. For this reason it was considered inappropriate to give equal 'weight' to each statement. Again, within sections statements do not describe different degrees of the same problem. For example, the sleep section includes the statement 'I take tablets to help me sleep'. This statement cannot be placed on a single concept continuum defined by the 2 statements above. However, it is possible to conceive of all the statements in the section as being on a continuum from minor to serious sleep problems.

* Department of Community Health, The University of Nottingham, University Hospital, Queen's Medical Centre, Clifton Boulevard, Nottingham NG7 2UH. This research was supported by Grant number HR615 7/1 awarded by the Social Sciences Research Council.
Whereas it would be a difficult task for a respondent to place several statements in rank order of seriousness, it is an easier task for him to consider 2 statements and make a judgement about which represents a greater problem. Thus, it was decided that Thurstone's Judgement Scaling Model would be the one most appropriate for determining the weight which should be given to each statement.

Downloaded from http://ije.oxfordjournals.org/ at Pennsylvania State University on March 6, 2016

The model underlying the Thurstone scaling methods is described in Thurstone [6] and Torgerson [7]. The task is to locate the stimuli (statements) on a psychological continuum. The psychological continuum can be considered to be a continuum of subjective or psychological magnitudes. Thurstone argues that each psychological magnitude is mediated by a 'discriminal process' which is 'that process by which the organism identifies, distinguishes, or reacts to stimuli'. Each stimulus when presented to a respondent gives rise to a discriminal process. Because of momentary fluctuations in the organism a given stimulus will not always excite the same discriminal process. It is postulated that if the same stimulus is presented to the respondent several times, the discriminal processes it excites will be normally distributed on the psychological continuum. The scale value of the stimulus on the psychological continuum is taken to be the modal, median or mean discriminal process, as all 3 measures are the same for a normal distribution. An alternative to presenting the same stimulus a large number of times to the same subject is to present the same stimulus to a large number of subjects.

It is possible to deduce equations relating judgements of relations among stimuli to the scale values of the stimuli on the psychological continuum. One of these sets of equations is the law of comparative judgement, which is concerned with paired-comparison judgements. The equations relate the proportion of times any stimulus k is judged greater on a given attribute (in this case sleep problems) than any other stimulus j to the scale values of the two stimuli.
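For orientation, the law of comparative judgement is usually written as sketched below. The paper does not print the equations, so the notation is an assumption taken from the standard presentation of Thurstone's model: S denotes a scale value, sigma a discriminal dispersion, r the correlation between discriminal processes, p the observed proportion and z its unit normal deviate. The second form assumes equal dispersions and equal correlations, with the unit of the scale chosen so that the square root equals 1.

```latex
% General form: p_{kj} is the proportion of times k is judged greater than j
S_k - S_j = z_{kj}\,\sqrt{\sigma_k^2 + \sigma_j^2 - 2\,r_{kj}\,\sigma_k\,\sigma_j},
\qquad z_{kj} = \Phi^{-1}(p_{kj})

% Equal dispersions and equal correlations, unit chosen so the root equals 1
S_k - S_j = z_{kj}
\quad\Longleftrightarrow\quad
p_{kj} = \Phi(S_k - S_j)
```

Under the simplified form, each observed proportion gives a direct estimate of the distance between two scale values.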
METHOD
The section on sleep contains the following 5 statements:
'I'm waking up in the early hours of the morning'
'It takes me a long time to get to sleep'
'I sleep badly at night'
'I take tablets to help me sleep'
'I lie awake for most of the night'.

The method of obtaining empirical estimates of the proportions discussed above is known as the method of paired comparisons. Each statement was compared with each other statement, giving n(n - 1)/2, or 10, pairs. Each pair was presented to the subject, who was required to indicate which of the statements represented the greater sleep problem. Each statement was typed onto a 4 inch by 3 inch card with an identifying letter (A to E) in the top left hand corner. The respondent was required to respond with the letter corresponding to the selected statement. Pairs of cards were presented in a randomised order with the order reversed for 50% of the trials. In all, 204 interviews were carried out on the section on sleep.

Subjects
A pilot study had been conducted to test the acceptability of the experimental method and to investigate the effect of demographic variables. The pilot study showed that respondents found the task very interesting and that few people had any difficulties with it. No associations were found between age, sex or the respondents' perceived health status and results obtained by the paired comparison method.

Subjects for the main study were obtained from the following places: a family planning clinic, a large local industrial company, the Nottingham Medical School library, a cafeteria run by the Women's Royal Voluntary Service, and the following outpatient clinics: dermatology, orthopaedic surgery, and ear, nose and throat. At the outpatient clinics interviews were conducted with both patients and non-patients who were accompanying friends, relations, etc.

Interviews
The weighting of the 6 sections involved over 1200 interviews and 9 interviewers were employed.
Each interviewer received training designed to standardise the interview. The interviewer introduced herself and explained that research was being conducted into the problems people have in their daily lives when they are unwell. Interviewees were invited to participate and no attempt to pressure them was made. It was explained that the same statements would be seen a number of times but that the same two statements would not be shown together more than once. They were asked not to concentrate on their own experiences, but to consider the statements from the viewpoint of people in general. They were also told that there were no 'right' or 'wrong' answers.

During the interview, the interviewer checked that the respondent understood the task by saying after the first few answers 'So you think ... would be a worse problem for someone to have?' Explanations were avoided during the interview but general questions were answered at the end of the exercise. If the interview was interrupted (for example, when the patient was called in for consultation) it was abandoned. Where it was thought that the respondent did not fully understand the task, that he was unable to read the statement, or if there was any doubt about the quality of the responses, the interview data were excluded.

RESULTS AND CALCULATION OF SCALE VALUES
The results consisted of the number of times each statement in a pair was considered a greater sleep problem than the other statement. The results are shown in the raw frequency matrix F shown in Table 1.

TABLE 1 Matrix F: frequency of response

                 Statement judged greater problem
Statement        A       B       C       D       E
    A            -     134     156     123     169
    B           70       -     132     137     160
    C           48      72       -     110     140
    D           81      67      94       -     106
    E           35      44      64      98       -

TABLE 2 Matrix P: proportion matrix for the sleep section weighting

                 Statement judged greater problem
Statement        A       B       C       D       E
    A            -    0.66    0.76    0.60    0.83
    B         0.34       -    0.65    0.67    0.78
    C         0.24    0.35       -    0.54    0.69
    D         0.40    0.33    0.46       -    0.52
    E         0.17    0.22    0.31    0.48       -
  SUM         1.65    2.06    2.68    2.79    3.32
  MEAN        0.33    0.41    0.54    0.56    0.66

SUM and MEAN include the value 0.5 for the diagonal cell of each column.

Matrix P (Table 2) is constructed from matrix F. The values in the cells are the proportion of times statement k (column) is judged a greater sleep problem than statement j (row). Symmetric cells sum to one. For example, the proportion for C judged greater than B (0.65) plus the proportion for B judged greater than C (0.35) equals 1. The diagonal cells can be left blank or the value 0.5 can be entered because it is assumed that if respondents were required to compare a statement with itself the first and second statement cards would be selected an equal number of times.

TABLE 3 Matrix X: unit normal deviate matrix for the sleep section weighting

Statement        A       B       C       D       E
    A            -    0.41    0.71    0.25    0.95
    B        -0.41       -    0.39    0.44    0.77
    C        -0.71   -0.39       -    0.10    0.50
    D        -0.25   -0.44   -0.10       -    0.05
    E        -0.95   -0.77   -0.50   -0.05       -
  SUM        -2.32   -1.19    0.50    0.74    2.27
  MEAN      -0.464  -0.240   0.100   0.148   0.454

From matrix P, matrix X (Table 3), the basic transformation matrix, is constructed. The cell values are the unit normal deviates corresponding to the proportions given in matrix P. The values are positive when the proportion is greater than 0.5, and negative when the proportion is less than 0.5. In matrix X symmetric cells sum to zero.
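The construction of matrices P and X from matrix F, and the column averaging that yields the scale values, can be sketched as follows. This is an illustrative sketch rather than the authors' procedure: the function names are hypothetical, the frequencies are those of Table 1, and the unit normal deviates are computed from unrounded proportions with Python's standard library instead of being read from printed tables.

```python
from statistics import NormalDist

LABELS = ["A", "B", "C", "D", "E"]
N_RESPONDENTS = 204

# F[j][k]: number of respondents who judged column statement k a greater
# problem than row statement j (Table 1). Diagonal cells are not observed.
F = [
    [None, 134, 156, 123, 169],
    [70, None, 132, 137, 160],
    [48, 72, None, 110, 140],
    [81, 67, 94, None, 106],
    [35, 44, 64, 98, None],
]


def proportion_matrix(freq, n):
    """Matrix P: divide each frequency by n and enter 0.5 on the diagonal."""
    return [[0.5 if cell is None else cell / n for cell in row] for row in freq]


def deviate_matrix(prop):
    """Matrix X: unit normal deviate of each proportion (0.5 maps to 0)."""
    inverse_cdf = NormalDist().inv_cdf
    return [[inverse_cdf(p) for p in row] for row in prop]


def column_means(matrix):
    """Average each column; applied to X this gives the scale values."""
    size = len(matrix)
    return [sum(row[k] for row in matrix) / size for k in range(size)]


P = proportion_matrix(F, N_RESPONDENTS)
X = deviate_matrix(P)
scale_values = column_means(X)

for label, value in zip(LABELS, scale_values):
    print(f"{label}: {value:+.3f}")
```

Because the deviates are not rounded to two decimals before averaging, the printed scale values can differ from the MEAN row of Table 3 in the third decimal place.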
Mosteller [8] and Horst [9] have shown that the usual procedure for obtaining estimates of scale values from a matrix X with no vacant cells is a least-squares solution. A derivation following Mosteller's treatment is given by Torgerson [7]. In practice, the scale values are calculated by summing and averaging the normal deviate values for each statement. These calculations for the sleep section are shown in Table 3.

The above description has shown how scale values have been estimated from the observed proportions. Given these scale values the procedure can be reversed, so that derived proportions can be obtained from the estimates of the scale values. This procedure will not be discussed here, but the applicability of the model used to the data can be tested by seeing how well the derived proportions correspond to the observed proportions. One such procedure is simply to obtain the average absolute deviation. If this average discrepancy is 'small' it can be concluded that the model fits adequately. For the sleep section the average absolute deviation is 0.05. An overall test of goodness of fit is given by Mosteller [10]. This test uses the inverse-sine transformation on the observed and derived proportions. The value of χ² for the sleep section derived from Mosteller's formula is almost zero, suggesting that the goodness of fit is extremely high.

DEVELOPMENT OF WEIGHTS FROM THE SCALE VALUES
For most purposes the weights can be taken to be the scale values derived from the observed proportions. For example, the scale values of stimuli developed from the responses of an experimental group can be compared with the scale values developed from the responses of a control group. For the purpose of the Nottingham Health Profile the scale values could not be used as weights since the sum of these values would always equal zero. It was convenient that a score of zero should indicate no sleep problems, while a high score (say 100) should indicate that all the problems applied to the respondent.

Scale values derived from the law of comparative judgement locate the stimuli on the psychological continuum with respect to one another only. Torgerson suggests that the zero point should be chosen arbitrarily by adding a constant to each of the scale values, but this raises the problem of what constant to add. Adding one to each scale value would maintain a reasonably large spread in the weights given to each statement, whereas adding ten to each scale value would produce weights whose range is minimal relative to their size. Torgerson also argues that an absolute zero point can be equated with the absolute limen as determined by standard psychophysical methods. Unfortunately, the absolute limen, while being applicable to many continua, particularly in the sensory domain, cannot be applied to the domain of sleep problems.

FIGURE 1 Cumulative proportion against unit normal deviate, showing positions of the five statements in the sleep section. Horizontal axis: unit normal deviate; plotted line: y = 0.4x + 0.5.

It is possible to postulate a new method for determining a rational zero point from which weightings can be derived. Figure 1 shows the relationship between cumulative proportion and unit normal deviate. Theoretically, the proportion 0 will have a unit normal deviate value of minus infinity. Hence, the absolute zero value is also minus infinity.
However, it is possible to calculate the regression line about the point where the cumulative proportion = 0.5. Figure 1 shows that this line (y = 0.4x + 0.5) is a good approximation to the plotted curve between the values of x = ±0.5. For the sleep section all scale values fall within this range. The same is true in general for the scale values of statements in the other sections of the Nottingham Health Profile. The scale values and associated mean proportions of the statements in the sleep section (shown in Table 2) are plotted in Figure 1. It can be seen that all the points lie close to the line y = 0.4x + 0.5. This line crosses the unit normal deviate (x) axis at the point -1.25. It is postulated that this point is the rational zero point for the scale values and that the constant 1.25 should be added to each scale value in order to determine the weight allocated to each statement.

The addition of 1.25 to the scale values of the statements gives the following weights: A = 0.786, B = 1.010, C = 1.350, D = 1.398, E = 1.704. The total of the weights of the 5 statements is 6.248. In order to standardise the weights so that a respondent affirming all 5 statements will receive a score of 100, each weight is multiplied by 100/6.248, or 16.005, giving the weights: A = 12.58, B = 16.17, C = 21.61, D = 22.37, E = 27.27. These are the weights used in the sleep section of the Nottingham Health Profile.

It should be noticed that the proportional distances between the scale values of the statements are the same as the proportional distances between the weights of the statements. For example, the proportional distance between statements B and C, expressed as a percentage of the distance between A and E, is

(a) for scale values, $\frac{0.100 - (-0.240)}{0.454 - (-0.464)} \times 100 = \frac{0.340}{0.918} \times 100 = 37\%$, and

(b) for weights, $\frac{21.61 - 16.17}{27.27 - 12.58} \times 100 = \frac{5.44}{14.69} \times 100 = 37\%$.
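The weighting arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming the scale values of Table 3 and the line y = 0.4x + 0.5 quoted in the text; the function names are hypothetical and not part of the paper.

```python
LABELS = ["A", "B", "C", "D", "E"]
SCALE_VALUES = [-0.464, -0.240, 0.100, 0.148, 0.454]  # MEAN row of Table 3

SLOPE, INTERCEPT = 0.4, 0.5  # line y = 0.4x + 0.5 quoted in the text


def rational_zero(slope, intercept):
    """x value at which the line crosses the x axis (y = 0)."""
    return -intercept / slope


def weights_from_scale_values(scale_values, maximum=100.0):
    """Shift the scale values so the rational zero maps to 0, then rescale
    so that the weights sum to `maximum`."""
    shift = -rational_zero(SLOPE, INTERCEPT)  # the constant 1.25
    raw = [value + shift for value in scale_values]
    factor = maximum / sum(raw)
    return raw, [weight * factor for weight in raw]


def proportional_distance(values, i, j):
    """Distance between items i and j as a percentage of the full range."""
    return 100 * (values[j] - values[i]) / (max(values) - min(values))


raw_weights, weights = weights_from_scale_values(SCALE_VALUES)
for label, raw, weight in zip(LABELS, raw_weights, weights):
    print(f"{label}: raw {raw:.3f}, weight {weight:.2f}")

# Statements B and C are at indices 1 and 2; a shift followed by a rescale
# leaves this percentage unchanged, so both lines print the same value.
print(round(proportional_distance(SCALE_VALUES, 1, 2)))
print(round(proportional_distance(weights, 1, 2)))
```

The sketch multiplies by the unrounded factor 100/6.248 rather than by 16.005, so a printed weight can differ from the published figure in the last digit.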

DISCUSSION
This paper has discussed the development of weights for statements on the Nottingham Health Profile referring to sleep problems. Thurstone's method of paired comparisons was used to provide the data from which the scale values and weights were developed.

One possible criticism of Thurstone's model is that the discriminal processes may not be normally distributed on the attribute continuum. However, the continuum and the distributions are hypothetical constructs and thus cannot be measured directly. Nunnally [11] argues that, 'If discriminal processes for two stimuli are normally distributed, as most errors of this type tend to be for many different phenomena, the distribution of differences in discriminal processes will be normally distributed'. A check on the goodness of fit of the model to the data is provided by Mosteller's test (discussed above). This test involves the construction of a matrix of normal deviates corresponding to the estimates of the scale values. From this matrix, a matrix of fitted proportions is constructed. The test depends on how well the fitted proportions correspond to the observed proportions. As was stated above, there was extremely good agreement between the observed and fitted proportions, indicating that the model, based on the assumption of normal distribution of the discriminal processes, is a good fit to the data.

Thus the method of paired comparisons was found to be a satisfactory method of placing statements with differing specific concepts onto a general psychological continuum. It is argued that the final weights for statements on the sleep section are an accurate reflection of the relative seriousness of each statement, as judged by the experimental subjects. Respondents completing the section on sleep in the Nottingham Health Profile can score from zero (no statements apply) to 100 (all 5 statements apply).
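The comparison of fitted and observed proportions referred to in this discussion can be sketched as below. This is the simple average absolute deviation check, not Mosteller's chi-squared test. It assumes that the fitted proportion for a pair is the normal cumulative probability of the difference between the two scale values (the equal-dispersion form of the model), and all names are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist

LABELS = ["A", "B", "C", "D", "E"]
N_RESPONDENTS = 204
SCALE_VALUES = dict(zip(LABELS, [-0.464, -0.240, 0.100, 0.148, 0.454]))

# Upper triangle of matrix F: number of times the second statement of each
# pair was judged a greater problem than the first.
GREATER_COUNTS = {
    ("A", "B"): 134, ("A", "C"): 156, ("A", "D"): 123, ("A", "E"): 169,
    ("B", "C"): 132, ("B", "D"): 137, ("B", "E"): 160,
    ("C", "D"): 110, ("C", "E"): 140,
    ("D", "E"): 106,
}


def fitted_proportion(s_first, s_second):
    """Proportion expected to judge the second statement the greater
    problem, given the two scale values."""
    return NormalDist().cdf(s_second - s_first)


def average_absolute_deviation(counts, scale_values, n):
    """Mean of |observed - fitted| over all distinct pairs of statements."""
    deviations = []
    for first, second in combinations(sorted(scale_values), 2):
        observed = counts[(first, second)] / n
        fitted = fitted_proportion(scale_values[first], scale_values[second])
        deviations.append(abs(observed - fitted))
    return sum(deviations) / len(deviations)


aad = average_absolute_deviation(GREATER_COUNTS, SCALE_VALUES, N_RESPONDENTS)
print(f"average absolute deviation: {aad:.3f}")
```

Only the 10 distinct pairs are used, since each lower-triangle cell of the matrix is the complement of its upper-triangle counterpart and would contribute the same deviation.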
It is possible to argue that section scores are at least at the interval level of measurement. Whether or not such scores can be considered to be on a ratio scale is a matter for debate.

The method of paired comparisons is a particularly useful method of scaling in the field of health. Its use should be considered where problems of rating the seriousness of symptoms, diseases or injuries arise. The method has the great advantage of simplifying the judgement task for the respondents, who found the task relatively straightforward to perform, few being unable to complete it. One disadvantage is that where large numbers of items require ranking or weighting, the number of paired comparisons increases dramatically. In every case n(n - 1)/2 paired comparisons are needed. For example, if the experimenter wished to determine the relative unpleasantness of 15 symptoms, 105 paired comparisons would be necessary, although Torgerson suggests methods by which the experimental labour may be reduced. Burroughs [12] considers the technique to be both statistically elegant and powerful. He counters the problem of the need for large numbers of comparison judgements with the following comment: 'if this forces us to explore with the rapier of 5 items rather than the bludgeon of 100, it may be no bad thing'.

REFERENCES
1. World Health Organization. Statistical Indicators for the Planning and Evaluation of Public Health Programmes. Technical Report Series No. 472, 1971.
2. White KL. Contemporary Epidemiology. Int J Epidemiol 1974; 3: 295-303.
3. Martini CJ and McDowell I. Health Status: patient and physician judgements. Health Services Research 1976; Winter: 508-515.
4. McDowell I, Martini CJ, Waugh W. A method for self-assessment of disability, before and after hip replacement operation. Br Med J 1978; ii: 875-879.
5. Hunt Sonja M and McEwen J. The development of a subjective health indicator. Sociol of Health and Illness 1980; 2: 231-246.
6. Thurstone LL. A law of comparative judgement. Psychol Rev 1927; 34: 273-286.
7. Torgerson WS. Theory and Methods of Scaling. New York: John Wiley, 1958.
8. Mosteller F. Remarks on the method of paired comparisons. I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika 1951; 16: 3-11.
9. Horst P. The prediction of personal adjustment. Soc Sci Res Coun Bull No. 48, 1941.
10. Mosteller F. Remarks on the method of paired comparisons. III. A test of significance for paired comparisons when equal standard deviations and equal correlations are assumed. Psychometrika 1951; 16: 207-218.
11. Nunnally JC. Psychometric Theory (2nd Edition). New York: McGraw-Hill, 1978.
12. Burroughs GER. Design and Analysis in Educational Research (2nd Edition). Educational Monograph No. 8. Educational Review, Oxford, 1975.

(Revised version received 30 October 1980)