TWO STUDIES OF INTERVIEWER VARIANCE OF SOCIO-PSYCHOLOGICAL VARIABLES

By: Leslie Kish and Carol W. Slater, Survey Research Center, University of Michigan

Introduction

We report results obtained in two surveys in which respondents were randomized among interviewers to permit valid estimation of the interviewer variance as a component of survey errors. In each study, done by the Survey Research Center of the University of Michigan, the blue-collar workers of a plant were asked many sociopsychological questions about their jobs, the company, and related matters.

We suspect that few people worry at all about the interviewer variance. They are, however, apt to fear that for "vague" psychological and attitudinal questions the effects must indeed be large. Our results may hold at least one surprise for everybody. On the one hand, the interviewer effects are not very great: they compare well with effects on "factual" items, and because of this we were unable to separate different classes of items, such as the "soft" psychological items from the "hard" factual items. On the other hand, even these small or moderate effects on individual interviews can have important effects on the sample means. As a final dramatic effect, a happy ending: even great effects on the means of the entire sample are reduced for subclasses, and the effects usually seem to disappear completely from the comparisons of subclasses.

Here we present a summary of our findings; details and references to related literature will appear in an article already submitted for publication.

In the First Study, in 1948, at a large unionized auto plant in the Midwest, we selected with equal probability a stratified random sample of individual employees, of whom 162 gave interviews. The names and addresses of the selected employees were typed on cards, which were then shuffled and assigned randomly to interviewers at the beginning of each day. The interviews were taken in the respondents' homes and lasted an average of an hour and a half.
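The daily randomization just described (shuffling each day's cards and handing them to the interviewers working that day) can be sketched as follows. This is a minimal illustration under stated assumptions, not a record of the Center's procedure: the helper name `assign_daily`, its input dictionaries, and the round-robin dealing of the shuffled cards (which keeps each day's workloads within one interview of each other) are our own choices.

```python
import random
from collections import defaultdict


def assign_daily(respondents_by_day, interviewers_by_day, seed=None):
    """Hypothetical helper: randomize each day's respondents among that day's interviewers.

    respondents_by_day: dict mapping a day label to a list of respondent IDs (the "cards").
    interviewers_by_day: dict mapping the same day label to the interviewers working that day.
    Returns a dict mapping each interviewer to the respondents assigned to him.
    """
    rng = random.Random(seed)
    workloads = defaultdict(list)
    for day, cards in respondents_by_day.items():
        deck = list(cards)
        rng.shuffle(deck)  # shuffle the day's cards
        staff = interviewers_by_day[day]
        # Assumption: deal the shuffled deck round-robin, so workloads differ by at most one.
        for position, respondent in enumerate(deck):
            workloads[staff[position % len(staff)]].append(respondent)
    return dict(workloads)


if __name__ == "__main__":
    days = {"day 1": ["r1", "r2", "r3", "r4", "r5"], "day 2": ["r6", "r7", "r8"]}
    staff = {"day 1": ["A", "B"], "day 2": ["A", "C"]}
    print(assign_daily(days, staff, seed=1))
```

Because respondents are allocated at random within each day's set, systematic differences among interviewers' workloads reflect interviewer effects plus ordinary sampling error, which is what permits valid estimation of the interviewer variance.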
Open-ended questions were used to gather information about attitudes towards foremen, stewards, the union, higher management and various aspects of their jobs. The 20 interviewers were selected, screened and hired specifically for this study. All had had some previous experience in interviewing, though not necessarily in survey work. A week of training was carried out before the study began and was augmented, as needed, by individual supervision and group sessions.

For the Second Study we selected in 1958, with equal probability, a stratified random sample of individual employees, with a final n = 489. The interviews were conducted in 1958 in offices provided by the company and lasted an hour on the average. After the interview, each respondent was also asked to fill out a paper-and-pencil questionnaire in the presence of the interviewer, which took roughly an additional three-quarters of an hour. From the list of respondents available during each week, random assignments were allocated to the interviewers working during that period. Completely open-ended items were few: almost all questions in the written questionnaire, and many of those used in the interview, asked the respondent to choose from a prepared and pretested list the alternative coming closest to his own viewpoint. The nine interviewers were members of the Center's field staff with several years of interviewing experience.

The Measurement of Interviewer Variance

Besides sampling errors proper (those arising in selection or estimation procedures), survey results are also affected by errors that occur in the course of the observation (measurement), recording and processing of the data. These errors fall into two broad types, having very different effects on the summary results (such as means or totals) of a survey. The first includes the "biases" or "systematic errors" imposed by the "essential survey conditions": the average or "expected" deviations of sample estimators from their estimands, the population values. These, although important, were not the subject of our research.
The second type consists of variable errors: those not fixed by the "essential survey conditions." Some variable errors are uncorrelated among the elements and, unless replicate measurements are taken on the respondents, cannot be distinguished from sampling error among respondents. We are not concerned with them here; generally they can be regarded as random errors that increase the variance of estimators, with contributions that enter automatically into the estimate of the variance. Other variable errors, however, involve the correlated effect that each interviewer's bias can impose upon the respondents (the elements) making up his workload. Insofar as the individual interviewers have different average effects on their workloads, this "interviewer variance" contributes to the variance of the sample mean. This contribution of the interviewer variance to the sampling variance is our present concern. The contribution, as we shall see, can be large, and its neglect can lead to serious underestimation of the total survey variation.

Our model assumes the random selection of a sample of interviewers from a large pool of potential interviewers, that pool being defined by the "essential survey conditions." Each interviewer has an individual average "interviewer bias" on the responses in his workload; we estimate the effect of a "random sample" of these biases on the variance of the sample mean. This effect is expressed as an "interviewer variance" which decreases in inverse proportion to the number ($a$) of interviewers. Its contribution to the variance of sample means ($s_b^2/a$) resembles other variance terms, being directly proportional to the variance per interviewer and inversely proportional to the number of interviewers. This increase in the variance may be substantial; failing to take it into account (as when estimating the variance simply by $s^2/n$) means neglecting a potentially important source of variation actually present in the design, introduced by the sampling of interviewers' biases.

The interviewer variance should be viewed as a component of the total variance, denoted as

$$s^2 = s_w^2 + s_b^2,$$

where $s_w^2$ is the variance without any interviewer effect, and all three terms are measured per element. It is convenient to take the interviewer component relative to the total variance, and to denote this ratio by the ratio of homogeneity, often called the coefficient of intraclass correlation:

$$\text{roh} = \frac{s_b^2}{s^2} = \frac{s_b^2}{s_w^2 + s_b^2}.$$

The individual roh's are subject to very great variability; the values of roh are computed with 9 degrees of freedom in the First Study and 19 in the Second Study. As a rough guide we consider the values of the First Study subject to coefficients of variation of 0.5 and those of the Second Study to about 0.3. Nevertheless, the results are useful, particularly when considered in the aggregate over many items.

[Chart 1: Three distributions of relative frequencies of roh's for different variables, each class divided into "critical", "ambiguous" and other variables. Panels: (1a) First Study interviews; (1b) Second Study interviews, 25 variables; (1c) Second Study questionnaire, 23 variables.]

Primary Results and Implications

How are these values useful in planning surveys? First, they show that it is feasible to obtain responses with rather low interviewer effects on what appear to be ambiguous and emotionally loaded attitudinal items, if the interviewers are carefully selected and well trained. The low values of roh on these items speak well for the prospects of obtaining attitudinal, sociopsychological data of this kind with reasonable reliability.
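The variance decomposition defined above can be checked numerically. The sketch below assumes equal workloads of $n/a$ interviews per interviewer and uses illustrative variance components that are not taken from either study. It computes roh and confirms that the element term $s_w^2/n$ plus the interviewer term $s_b^2/a$ equals $(s^2/n)[1 + \text{roh}(n/a - 1)]$, the form used later in the text. The function names are our own.

```python
def roh(s_w2, s_b2):
    """Ratio of homogeneity: interviewer component s_b^2 over total variance s^2 = s_w^2 + s_b^2."""
    return s_b2 / (s_w2 + s_b2)


def var_mean_components(s_w2, s_b2, n, a):
    """Variance of the sample mean as element term plus interviewer term (equal workloads assumed)."""
    return s_w2 / n + s_b2 / a


def var_mean_deff(s_w2, s_b2, n, a):
    """The same variance written as (s^2/n) * [1 + roh * (n/a - 1)]."""
    s2 = s_w2 + s_b2
    return s2 / n * (1 + roh(s_w2, s_b2) * (n / a - 1))


if __name__ == "__main__":
    # Illustrative values: roh = .02, with n and a loosely modeled on the Second Study.
    s_w2, s_b2, n, a = 0.98, 0.02, 489, 9
    naive = (s_w2 + s_b2) / n  # the simple s^2/n that ignores interviewers
    print(var_mean_components(s_w2, s_b2, n, a), var_mean_deff(s_w2, s_b2, n, a))
    print("inflation over s^2/n:", var_mean_components(s_w2, s_b2, n, a) / naive)
```

With these made-up values the interviewer term roughly doubles the variance of the mean (a factor of about 2.07), although roh is only .02.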
The variability for these attitudinal interview items appears to be generally not much, if any, higher than for responses to "factual" items obtained in a good Census, except probably for the simplest items like age and sex. They compare favorably with some other results relating to "factual" items. The primary results appear on Chart 1: the First Study in 1a, and the interviews and questionnaires of the Second Study in 1b and 1c, respectively. Each of these presents a distribution of the relative frequencies (percentages) of occurrence of roh in size classes of .01. (The total height of each class is divided into three parts to separate "critical", "ambiguous" and other items.)

Second, this kind of analysis can distinguish
items for which the interviewer variances appear unexpectedly high and, by so doing, lead to corrective actions, either through better training or by changing the survey operations. Extensions of this kind of analysis may also be used to single out interviewers who make undue contributions to the variances.

Third, we can distinguish in the three distributions of Chart 1 the concomitants of different interviewing situations. The results of the First Study (1a) came from newly hired and trained interviewers taking open-ended interviews; the roh's range, in the main, from zero to .07, with an average of .02 or .03. In the Second Study we see expert interviewers taking a more structured interview (1b); the roh's vary mostly from zero to .04, with an average of .01 to .02. For written questionnaires we find (1c) that the a priori hypothesis of zero effect is generally acceptable (with the exception of three puzzling items).

Fourth, our results indicate the difficulty of judging beforehand the degree of interviewer variance associated with what may seem a priori to be different kinds of items. In each of the three parts of Chart 1 the areas corresponding to "critical", to "ambiguous" and to "other" items do not appear to have very distinct distributions. Even informed intuition, it seems, needs considerably more conceptual and empirical tools than are now available to evaluate the relative susceptibility of survey items to interviewer bias.

Fifth, we find that interviewer variance, although it appears small, definitely exists. Furthermore, it can exert an important influence on the total variability of survey results, since even a small roh, when multiplied by moderate or large interviewer workloads, can have large effects. This effect on the variance is about $[1 + \text{roh}(n/a - 1)]$. Let us consider an increase in the variance by a factor of 1.5 as "serious" and by 2 as "critical"; these correspond to increases in the standard errors of $\sqrt{1.5} = 1.22$ and $\sqrt{2} = 1.41$. With $n/a = 22$ in the First Study, roh becomes serious at .025 and critical at .045, categories which include 16 and 8 items respectively.
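The "serious" and "critical" thresholds follow from setting $1 + \text{roh}(n/a - 1)$ equal to 1.5 or 2 and solving for roh. A minimal sketch of that arithmetic, with function names of our own choosing, for the First Study's workload of 22 and the Second Study's workload of 52:

```python
import math


def deff(roh, workload):
    """Approximate factor by which interviewer variance inflates the variance of a mean."""
    return 1 + roh * (workload - 1)


def roh_threshold(factor, workload):
    """roh at which the variance inflation reaches `factor` for a given workload n/a."""
    return (factor - 1) / (workload - 1)


if __name__ == "__main__":
    for label, workload in [("First Study", 22), ("Second Study", 52)]:
        serious = roh_threshold(1.5, workload)
        critical = roh_threshold(2.0, workload)
        print(f"{label}: serious at roh = {serious:.4f}, critical at roh = {critical:.4f}")
    # Corresponding inflation of the standard errors
    print(f"{math.sqrt(1.5):.2f} {math.sqrt(2.0):.2f}")
```

The exact solutions (about .024 and .048 for $n/a = 22$; about .0098 and .0196 for $n/a = 52$) are close to the rounded values quoted in the text.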
In the Second Study, with $n/a = 52$, roh = .01 is serious and roh = .02 is critical, thus including 13 and 10 items respectively. In the case of element sampling, these effects can and should be included in the variance by computing the interviewer's load as if it were a "cluster." (In the case of actually clustered samples, where the interviewer is confined to a single primary selection, such as a county in a national sample, the usual computation automatically includes this effect.)

Sixth, analysis of this type makes it possible to include interviewer effects in considering the economic aspects of survey designs. If the ratio of the cost of hiring and training an interviewer to the cost of a single interview is $C_b$, then the most economical plan (the least total variance for a fixed total cost) results from an optimum workload size of

$$\frac{n}{a} = \sqrt{\frac{C_b}{\text{roh}} - C_b} = \sqrt{\frac{C_b(1 - \text{roh})}{\text{roh}}}.$$

For example, if it costs 180 dollars to train an interviewer and 10 dollars to take an interview, then $C_b = 18$. For roh's of .02 this gives an optimum workload per interviewer of $n/a = \sqrt{882} \approx 30$. The actual workloads in our two studies were in this neighborhood.

[Chart 2: The effects of interviewer variability on subclasses (x) and on their comparisons (0), plotted against the effects on the entire sample. The effects are measured as ratios to the total variance per interview, as synthetic equivalents of roh's. Vertical axis: synthetic *roh's for subclasses (x) and for their comparisons (0); horizontal axis: synthetic roh's for the entire sample.]

Effects on Subclasses and Their Comparisons

Current models of response errors deal mostly with the effects on the mean for the entire sample, but applying the models and methods to the means of subclasses is straightforward. The data support our hypothesis that the effects of interviewer variance on the variances
of subclass means tend to decrease in the same proportion as the average workloads of the subclasses per interviewer decrease. The effect on the variance is approximately $[1 + \text{roh}(n^*/a - 1)]$, where $n^*$ is the sample size of the subclass. This effect decreases with $n^*$ if roh remains constant, where roh expresses the interviewer contribution per element. That roh remains fairly constant for the subclasses is evidenced by the proximity of the values marked by x to the 45-degree line on Chart 2; this line denotes equality of the roh's of the subclasses and of the entire sample. The x points mark the values of roh in subclasses against the roh for the entire sample for the same variable. Actually, the ordinates denote the average of the roh's for the two subclasses into which the entire sample was divided.

We also investigated the effects of interviewer variance on the comparisons of pairs of subclass means. These are even more important, research workers often say, than the estimation of individual means. Because of the considerable effort required we had to limit the extent of this investigation. For strategic reasons we chose, for investigating both the subclasses and their comparisons, seven of the most critical variables from the two studies: those for which the effects on the means were greatest. For each variable we used two different ways of forming subclasses, and this gives rise to the fourteen comparisons marked 0 on Chart 2 (as well as the fourteen subclass averages marked x).

The results show that the effect of interviewer variance on comparisons between subclass means is reduced drastically, to the neighborhood of zero. This important result seems to hold roughly and on the average in our investigation. As evidence, note on Chart 2 that the 0 marks for interviewer effects on comparisons fluctuate around the horizontal line denoting zero effect. These come from plotting the effects per element of the comparisons against those of the entire sample.
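Both findings can be illustrated with a small sketch. The first function evaluates the approximate subclass effect $1 + \text{roh}(n^*/a - 1)$. The second assumes the "no interaction" case discussed in the text: each interviewer adds the same hypothetical bias $A_i$ to every one of his respondents regardless of subclass. Under that assumption the subclass difference shifts by $\sum_i (n_{1i}/n_1 - n_{2i}/n_2)A_i$, which vanishes when each interviewer holds the same share of both subclasses; in real workloads those shares agree only approximately, so the cancellation is approximate too. All names and data are illustrative.

```python
def deff_subclass(roh, n_sub, a):
    """Approximate variance inflation for the mean of a subclass of size n_sub spread over a interviewers."""
    return 1 + roh * (n_sub / a - 1)


def biased_mean(groups, effects):
    """Mean of one subclass after adding each interviewer's additive bias A_i to his respondents.

    groups: dict interviewer -> list of bias-free responses y'_ij in this subclass.
    effects: dict interviewer -> hypothetical bias A_i.
    """
    total = sum(sum(ys) + effects[i] * len(ys) for i, ys in groups.items())
    return total / sum(len(ys) for ys in groups.values())


def subclass_difference(sub1, sub2, effects):
    """Difference between the two subclass means, with interviewer biases included."""
    return biased_mean(sub1, effects) - biased_mean(sub2, effects)


if __name__ == "__main__":
    # Whole sample vs. a half-sample subclass, roh = .02, a = 9 interviewers.
    print(deff_subclass(0.02, 489, 9), deff_subclass(0.02, 245, 9))
    # Each interviewer holds half of each subclass, so the biases cancel in the difference.
    sub1 = {"A": [3, 4], "B": [5, 6]}
    sub2 = {"A": [2, 2], "B": [4, 3]}
    no_bias = subclass_difference(sub1, sub2, {"A": 0.0, "B": 0.0})
    with_bias = subclass_difference(sub1, sub2, {"A": 1.5, "B": -0.7})
    print(no_bias, with_bias)  # equal up to floating-point rounding
```

With roh = .02 the inflation factor falls from about 2.07 for the whole sample to about 1.52 for the half-sample subclass, while the added biases leave the subclass difference unchanged.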
The comparison points on Chart 2 show a great deal of fluctuation, the causes of which should be sought in later investigations; nevertheless, the tentative hypothesis of zero average effect appears to be a good working hypothesis, and better than any alternative we could form. This should apply also to comparisons of any two (or more) samples that have been randomized over the same set of interviewers. Important examples arise from comparisons of periodic samples assigned to the same set of interviewers; such comparisons of periodic surveys should also tend to be free of the effects of the interviewer variance that affects a single sample. (Similar results were obtained on very different data, with very high initial roh's.)

This result gains added significance in combination with the likelihood that the systematic biases of comparisons are often also smaller than the biases of the individual means. In other words, if the interviewers' biases affect the subclasses equally (corresponding to a lack of "interaction" between interviewer bias and subclass), then both the systematic bias and the interviewer variance tend to disappear from the comparisons of subclasses.

Some Remarks on Research Strategy

Research on interviewer variability may be designed to different degrees of symmetry and completeness. A very complete design might call for simple random selection of equal workloads; the effects of other sources of error, especially coding error, would be included in a symmetrical, clean (orthogonal) design; and the questions could be chosen to test various hypotheses about them. Our studies lack these virtues. Our randomization procedures were designed to minimize costs and interference with field operations.
We sacrificed chiefly: (a) equal-size workloads, which would have resulted in somewhat simpler computations and more efficient estimates; (b) elimination of the complications arising because the randomized set (the workload for a day in the First Study and for a week in the Second Study) is difficult to treat exactly in the analysis; and (c) the possibility of separating the components of the variance due to coding variability by randomizing coders in a neat design. Perhaps we most regret lacking the means and persuasiveness to achieve the last of these three improvements, which a modest disposal of means could have brought. In defense we plead that the choice was between little or nothing, as it often is. The procedure for assigning interviewers to coders does not depart enough from random to interfere seriously with our analysis: the distribution of coders against interviewers was checked and found about as even as random assignment would have made it.

With relatively modest extra means it is possible to get a little closer than we did to a more symmetrical and complete design. Nevertheless, we are convinced of the desirability and economy of allocating, near the lower end of the scale, the limited resources available for research in interviewer variability. This is not merely a post hoc justification of our research, but a belief based on the expectation that a few "crucial experiments" (few because they are expensive) will not yield definitive evidence about a small set of "basic parameters", because that small set does not exist. It is more likely that interviewer errors differ greatly for various characteristics, populations, designs and resources, the last including questionnaires, the nature and training of interviewers, etc. Therefore, knowledge about this source of variation, as with sampling variability, can be accumulated only from a great deal of empirical work spread over the length and breadth of survey work. This, together with the necessarily limited total means for this kind of research, implies that most research in this area must be done at marginal cost, as appendages to the main aims and designs of surveys.
Therefore, a general strategy should call for many investigations of modest scope, and for these to be widely communicated.
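APPENDIX

The response from the $j$-th individual to the $i$-th interviewer is expressed as $y_{ij} = y'_{ij} + A_i$, where $A_i$ is the average "effect" of the $i$-th interviewer. Any constant (or "systematic") biases of the interviewers are not distinguished, and we assume that the sum of the interviewer effects is zero over the population of interviewers, from which the actual $a$ interviewers are a random sample. The variance of each response is viewed as composed of two components, the sampling variance of the individual response and the component due to the variable interviewer bias, or interviewer variance: $s^2 = s_w^2 + s_b^2$. The "ratio of homogeneity" is

$$\text{roh} = \frac{s_b^2}{s_w^2 + s_b^2}. \qquad (1)$$

Assuming that, of the $n$ respondents, $n_i$ were assigned with simple random sampling to the $i$-th interviewer, we have in terms of the usual "anova" table for computations:

$$
\begin{array}{lcccc}
\text{Source of variation} & \text{Degrees of freedom} & \text{Sum of squares (SS)} & \text{Mean square} & \text{Components of the mean squares} \\
\hline
\text{Among interviewers} & a - 1 & SS(a) = \sum_i y_i^2/n_i - y^2/n & V_a = SS(a)/(a-1) & s_w^2 + k\,s_b^2 \\
\text{Within interviewers} & n - a & SS(b) = \sum_i \sum_j y_{ij}^2 - \sum_i y_i^2/n_i & V_b = SS(b)/(n-a) & s_w^2
\end{array}
$$

Here $y_i = \sum_j y_{ij}$ and $y = \sum_i y_i$, so that $\hat{s}_w^2 = V_b$ and

$$\hat{s}_b^2 = \frac{V_a - V_b}{k}, \qquad k = \frac{1}{a-1}\left(n - \frac{\sum_i n_i^2}{n}\right).$$

To measure the effects on the differences between the means of two subclasses we had to improvise approximate methods. To compare with the preceding, we began by computing the "effect" of interviewer variance as the ratio of the actual variance to the simple random variance for the entire sample:

$$e(\bar{y}) = \frac{\operatorname{var}(\bar{y})}{s^2/n},$$

where $s^2$ is the total variance per element and

$$\operatorname{var}(\bar{y}) = \frac{a}{(a-1)n^2}\left(\sum_i y_i^2 + \bar{y}^2 \sum_i n_i^2 - 2\bar{y}\sum_i y_i n_i\right).$$

This last is the variance of the ratio estimator $\bar{y} = y/n$ of randomly selected clusters. The computed effect on the variance is then equated with $[1 + {}^*\text{roh}(n/a - 1)]$, and this yields the synthetic

$${}^*\text{roh} = \frac{e(\bar{y}) - 1}{n/a - 1}. \qquad (2)$$

We expected and found that formulas (1) and (2) gave very similar results. We then computed, using (2), the values of $e(\bar{y}_1)$ and $e(\bar{y}_2)$, the effects on the variances of the means of two subclasses. We actually computed $e(\bar{y}_1) = \operatorname{var}(\bar{y}_1)/(s^2/n_1)$, and similarly for the other subclass; that is, we did not bother to compute $s_1^2$ and $s_2^2$ separately, because they would not have differed enough from $s^2$ to make that extra labor worthwhile. The synthetic *roh's for the two subclasses were averaged, and these were plotted as x in Chart 2 against the roh's for the entire sample.

The variance of the difference was computed, taking into account the correlations within the workloads of the interviewers, as

$$\operatorname{var}(\bar{y}_1 - \bar{y}_2) = \operatorname{var}(\bar{y}_1) + \operatorname{var}(\bar{y}_2) - 2\operatorname{cov}(\bar{y}_1, \bar{y}_2).$$

As with the variances, the covariance is computed for the ratio estimator of randomly selected clusters:

$$\operatorname{cov}(\bar{y}_1, \bar{y}_2) = \frac{a}{(a-1)n_1 n_2}\left(\sum_i y_{1i}y_{2i} + \bar{y}_1\bar{y}_2\sum_i n_{1i}n_{2i} - \bar{y}_1\sum_i n_{1i}y_{2i} - \bar{y}_2\sum_i y_{1i}n_{2i}\right).$$

From these we computed the effects on the difference,

$$e(\bar{y}_1 - \bar{y}_2) = \frac{\operatorname{var}(\bar{y}_1 - \bar{y}_2)}{s^2(1/n_1 + 1/n_2)}.$$

Finally, we computed the "synthetic *roh" as

$${}^*\text{roh} = \frac{\operatorname{var}(\bar{y}_1 - \bar{y}_2)/s^2 - (1/n_1 + 1/n_2)}{2/a - (1/n_1 + 1/n_2)}.$$

These values appear as the 0 points on Chart 2, plotted against the values of roh for the entire sample.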
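The appendix computations can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' code: the function names are hypothetical, each interviewer's workload is a list of responses, and the total element variance $s^2$ used in $e(\bar{y})$ is assumed here to be the ordinary sample variance with divisor $n - 1$, since the appendix does not spell out its estimator. The ratio-estimator variance uses the equivalent compact form $\sum_i (y_i - \bar{y}n_i)^2$. All functions need at least two interviewers.

```python
def anova_roh(workloads):
    """Formula (1): roh from the one-way "anova" of responses by interviewer.

    workloads: list with one list of responses y_ij per interviewer i.
    """
    a = len(workloads)
    n_i = [len(w) for w in workloads]
    y_i = [sum(w) for w in workloads]
    n, y = sum(n_i), sum(y_i)
    among_term = sum(t * t / m for t, m in zip(y_i, n_i))  # sum of y_i^2 / n_i
    ss_a = among_term - y * y / n
    ss_b = sum(v * v for w in workloads for v in w) - among_term
    v_a, v_b = ss_a / (a - 1), ss_b / (n - a)
    k = (n - sum(m * m for m in n_i) / n) / (a - 1)
    s_b2 = (v_a - v_b) / k  # estimated interviewer variance
    return s_b2 / (v_b + s_b2)


def ratio_var(y_i, n_i):
    """Ratio mean y/n and its variance, treating workloads as randomly selected clusters."""
    a, n = len(y_i), sum(n_i)
    ybar = sum(y_i) / n
    var = a / ((a - 1) * n * n) * sum((t - ybar * m) ** 2 for t, m in zip(y_i, n_i))
    return ybar, var


def ratio_cov(y1_i, n1_i, y2_i, n2_i):
    """Covariance of two subclass ratio means over the same a interviewers."""
    a, n1, n2 = len(y1_i), sum(n1_i), sum(n2_i)
    ybar1, ybar2 = sum(y1_i) / n1, sum(y2_i) / n2
    cross = sum((t1 - ybar1 * m1) * (t2 - ybar2 * m2)
                for t1, m1, t2, m2 in zip(y1_i, n1_i, y2_i, n2_i))
    return a / ((a - 1) * n1 * n2) * cross


def synthetic_roh(workloads):
    """Formula (2): *roh = [e(ybar) - 1] / (n/a - 1), with e(ybar) = var(ybar) / (s^2/n)."""
    a = len(workloads)
    y_i = [sum(w) for w in workloads]
    n_i = [len(w) for w in workloads]
    n = sum(n_i)
    ybar, var = ratio_var(y_i, n_i)
    s2 = sum((v - ybar) ** 2 for w in workloads for v in w) / (n - 1)  # assumed estimator of s^2
    return (var / (s2 / n) - 1) / (n / a - 1)


def synthetic_roh_difference(w1, w2, s2):
    """*roh for ybar1 - ybar2; w1[i] and w2[i] hold interviewer i's responses in each subclass.

    s2 is the total element variance of the entire sample, as in the appendix.
    """
    a = len(w1)
    y1_i, n1_i = [sum(w) for w in w1], [len(w) for w in w1]
    y2_i, n2_i = [sum(w) for w in w2], [len(w) for w in w2]
    _, v1 = ratio_var(y1_i, n1_i)
    _, v2 = ratio_var(y2_i, n2_i)
    var_diff = v1 + v2 - 2 * ratio_cov(y1_i, n1_i, y2_i, n2_i)
    inv = 1 / sum(n1_i) + 1 / sum(n2_i)
    return (var_diff / s2 - inv) / (2 / a - inv)
```

For example, `anova_roh([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns 26/29, about 0.90. On such a tiny, extreme example the synthetic value from formula (2) need not agree closely with formula (1); the close agreement reported above refers to the authors' survey data.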