INTERVIEWER RATINGS OF RESPONDENT POLITICAL KNOWLEDGE: CALIBRATING A USEFUL MEASUREMENT INSTRUMENT


William G. Jacoby
Michigan State University
jacoby@msu.edu

Prepared for presentation at the 2015 Annual Meetings of the Midwest Political Science Association. Chicago, IL: April 17, 2015.

ABSTRACT

The American National Election Studies (ANES) interviewer ratings of respondents' levels of political information are widely used in the field of mass political behavior. But recent research calls their measurement characteristics into question. In this paper I use data from the 2004 ANES to create a set of optimally-scaled (OS) scores for the interviewer ratings. These OS scores simultaneously maximize the relationship with objective measures of respondent political information and strictly respect a set of explicit measurement assumptions regarding the ratings. I argue that the OS scores overcome the problems associated with standard usage of the interviewer ratings and comprise better measurement of political knowledge within the mass public.

Political knowledge is widely regarded as an important variable in research on public opinion and political behavior. However, there is little consensus among scholars regarding the best way to measure the knowledge possessed by individual citizens. One strategy relies upon survey interviewers' assessments of respondents' political information and awareness. Despite numerous advantages, there are potentially serious problems with this approach resulting from systematic differences in judgments across interviewers. In this paper, I draw from measurement theory to propose a simple approach for taking these interviewer biases into account, thereby effectively calibrating the measurement instrument for political knowledge. This approach is tested using the interviewer assessments and other items from the 2004 ANES. The measurement calibration strategy produces more detailed and cleaner measures of political knowledge that exhibit theoretically reasonable relationships with other variables. Overall, the empirical results provide an optimistic perspective on the measurement characteristics and theoretical utility of interviewer assessments of individual political knowledge.

BACKGROUND

Political knowledge, or information, is one of the central variables in the field of mass political behavior. In fact, one of the most consistent findings to emerge from more than seventy-five years of empirical research is the low level of political information possessed throughout most of the mass public (e.g., Neuman 1986). This finding is troublesome because information and knowledge are widely believed to be preconditions for individuals to fulfill the requirements of democratic citizenship (Delli Carpini and Keeter 1996). Information is necessary for the public to evaluate elements of the political world, and informed citizens have been found to behave and think about politics differently from their less-informed counterparts (e.g., Bartels 1996; Althaus 2003). At the same time, knowledge and information are strongly related to political engagement and overt participation (Verba and Nie 1972; Verba, Schlozman, and Brady 1995). Information also helps people connect general orientations to specific policy stimuli and candidate choices (e.g., Campbell, Converse, Miller, and Stokes 1960; Lewis-Beck, Jacoby, Norpoth, and Weisberg 2008). Thus, political knowledge is a key factor in understanding citizens' political attitudes and behavior.

In some contexts, knowledge and information could refer to conceptually distinct phenomena. For example, information could be regarded as an incoming stream of stimuli, while knowledge could represent that subset of the information that a person actually retains. Here, however, I will not make such distinctions. The two terms are used interchangeably throughout the study.

It probably is an understatement to say that the concept of political information or knowledge has received an enormous amount of scholarly attention (e.g., Barabas, Jerit, Pollock, and Rainey 2014). This is particularly interesting because, as an empirical variable, political information is a relative newcomer to the field. Up through the early 1990s, researchers used other variables to stratify the mass public, including political conceptualization (Converse 1964), cognitive ability (Stimson 1975), sophistication (Luskin 1987), and education (Sniderman, Brody, and Tetlock 1991). While the specific operationalizations varied from one investigator to the next, the general objective in each case was to tap individual differences in the comprehension of, and engagement with, the political world.

Fact-Based Political Knowledge

Attention shifted quickly, starting in the late 1980s. The Center for Political Studies (CPS) American National Election Studies (ANES) began including batteries of factual items about politics on their regular interview schedules in 1988, after initial testing on a mid-1980s NES Pilot Study. Starting that year, and continuing to the present day, respondents in the post-election wave of the ANES have been presented with the following question: "Now we have a set of questions concerning various public figures. We want to see how much information about them gets out to the public from television, newspapers and the like. The first name is ____. What job or political office does he NOW hold?" Of course, the blank in the question is replaced with the name of an actual political figure, and the question is repeated for several more political figures. The specific political figures vary from one year to the next, corresponding to reasonable differences in their public visibility. For example, the 2004 ANES respondents were asked about the offices held by Dennis Hastert, Dick Cheney, Tony Blair, and William Rehnquist. Typically, these items are not employed as conceptually distinct empirical variables. Instead, researchers usually assign each survey respondent a score corresponding to the number of correct identifications. And, that score is interpreted as the respondent's level of political knowledge. Shortly after the introduction of the ANES factual item battery, political knowledge became a very popular concept. For example, Delli Carpini and Keeter (1993, page 1180) state:

"A common conclusion... is that factual knowledge is the single best indicator of sophistication and its related concepts of expertise, awareness, political engagement, and even media exposure..."

Certainly the scale based upon the office identification items has very attractive properties: It is easily administered (Zaller 1986), highly reliable (Delli Carpini and Keeter 1993), and has strong predictive validity (Delli Carpini and Keeter 1996). So, the popularity of the fact-based knowledge scale and its simultaneous widespread use in empirical research are readily understandable.

Despite its apparent strengths, researchers have also identified several serious weaknesses in the factual knowledge scale. Mondak (2001) has shown that the ANES respondents' varying propensities to guess the correct answers to the knowledge questions introduce systematic biases into the measure. Gibson and Caldeira (2009) examined the verbatim responses to the office identification items and found highly arbitrary coding decisions about what constitutes correct or incorrect answers. DeBell (2013) identifies issues associated with the differing sets of public figures used from one year to the next, and the potential for systematically varying difficulties in identifying figures within a given year due to such factors as differing levels of visibility across the figures. From a somewhat different perspective, the factual item scale may be an incomplete measure of knowledge. Gilens (2001) points out that it does not tap domain-specific knowledge regarding substantive political matters (e.g., specific issue controversies). Similarly, the fact-based scale does not capture operative knowledge, as opposed to textbook knowledge about politics (Lupia 2006; Abrajano 2015). Barabas et al. (2014) identify two dimensions of political knowledge and point out that the office-identification items only tap a subset of the resultant classification system for different types of information.

Using Interviewer Ratings to Measure Knowledge

The various issues with factual knowledge batteries highlight the potential utility of an alternative approach to measuring political knowledge: interviewer assessments. At the completion of each face-to-face session with the respondent, the ANES interviewer assesses the respondent's general level of information about politics and public affairs relative to five ordered categories. The most common practice is to assign the categories successive integer scores as follows:

1. Very low
2. Fairly low
3. Average
4. Fairly high
5. Very high

This variable can be regarded as a quantification of the interviewer assessments of political knowledge, with larger scores indicating more informed respondents.

Using interviewer assessments to measure political knowledge has several advantages. For one thing, the interviewers seem to comprise a high quality measurement instrument. They are carefully trained at the Survey Research Center, in the University of Michigan's Institute for Social Research. And, most of the interviewers have quite a bit of experience administering surveys to respondents. The ANES interview schedules have included interviewer ratings in every survey since 1968, enabling comparisons of public information levels over time. The rating scores the interviewers assign appear to be highly reliable. For example, the test-retest correlation of interviewers' information ratings (using the one-to-five quantification) from the pre- and post-election waves of the 2004 ANES is substantial, which in itself suggests that the characteristic under observation is quite stable. And, if we are willing to assume that an individual's level of political knowledge does not change across a relatively short period of time (like the interval between the two waves of the 2004 survey), then combining the separate rating scores into a single scale produces an even higher reliability coefficient. According to the standard interpretation of reliability (e.g., Carmines and Zeller 1980), this means that the bulk of the observed variance in the two-item scale is shared with the unobservable variance in true political knowledge. Furthermore, the interviewer assessments show high levels of convergent validity, in the form of strong correlations with a sizable number of other variables that, from a theoretical perspective, should be closely related to political knowledge (Zaller 1986; Delli Carpini and Keeter 1993). For all of these reasons, it is not surprising that the ANES interviewer ratings have been widely accepted as excellent measures of individual political information levels and used in many studies of American political behavior (e.g., Zaller 1992; Bartels 1996; Baum and Kernell 1999; Goren 2000; Althaus 2003; Jacoby 2009).

In fact, the ANES Codebook scores the five categories in the opposite direction from what is shown here. That is, the one-to-five scale runs from the highest to the lowest levels of knowledge. The reflected scoring scheme shown here makes more sense from a substantive perspective (again, larger values indicate more knowledge), so it is used throughout this analysis.
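The paper does not spell out exactly how the two-wave reliability figure was computed; a standard way to obtain the reliability of a two-item scale from the pre/post test-retest correlation is the Spearman-Brown step (equivalent to Cronbach's alpha for two items). A minimal sketch in R, with hypothetical variable names:

    ## Hedged sketch: two-wave reliability of the interviewer ratings.
    ## pre_rating and post_rating are hypothetical vectors of the reflected
    ## one-to-five ratings from the pre- and post-election waves.
    two_wave_reliability <- function(pre_rating, post_rating) {
      r  <- cor(pre_rating, post_rating, use = "complete.obs")  # test-retest correlation
      sb <- (2 * r) / (1 + r)                                   # Spearman-Brown: reliability of the summed scale
      c(test_retest = r, scale_reliability = sb)
    }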

Interviewer ratings of political knowledge definitely are not a panacea for the problems of fact-based measures; instead, they have some serious problems of their own. One troubling issue is that various interviewers may not be assessing knowledge the same way, and they may not even be measuring political information or knowledge at all. Lupia (2015) points out that, even though the SRC interviewers are highly trained in the various details of administering a survey, they are not given specific guidance about evaluating respondents' political knowledge. He writes: "After years of searching and dozens of interviews with staff, no one has been able to produce any written instructions given to any ANES interviewers about how to assess any respondent's information level other than a single line of text that reads 'use your best judgment.'" Thus, interviewers are required to evaluate respondents according to a criterion that they apparently are supposed to define for themselves. Given the complicated and multifaceted nature of political knowledge (e.g., Delli Carpini and Keeter 1996; Barabas et al. 2014), it seems very likely that different interviewers will focus on varying elements of the overall concept when they try to apply it to individual survey respondents. Lupia shows that this can have pernicious consequences when knowledge is used as a variable that affects political orientations, attitudes, and behavior: "These assessments are not the product of practices that are likely to provide accurate and consistent evidence from interviewer to interviewer or from year to year. Existing interviewer assessments are unreliable" (Lupia 2015).

At least part of Lupia's pessimistic conclusion about the qualities of the interviewer ratings is based upon earlier work by Levendusky and Jackman (2008). The latter authors use data from the 1998 ANES to construct an item response theory model of individual political knowledge, and then examine its relationship with the interviewer assessments. Levendusky and Jackman find huge interviewer effects in the rating measures of political knowledge, saying that "a respondent with the same level of political knowledge will most likely be assigned different interviewer ratings when scored by two different interviewers" (2008, abstract). As a summary demonstration of the severity of the problem, they perform a simple ANOVA and show that a substantial percentage of the variance in interviewer ratings of political knowledge is due to the differences across the interviewers alone. This cross-interviewer variability is not unique to that particular dataset. A similar ANOVA carried out on the interviewer ratings from the 2004 ANES (the dataset used in the empirical analysis below) shows that the 81 interviewers account for a sizable percentage of the variance in the rating scores.
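The interviewer-effects check just described amounts to a one-way ANOVA of the five-point ratings on interviewer identifiers, with the between-interviewer share of variance summarized by eta-squared. A minimal sketch, assuming a data frame anes with hypothetical columns rating and interviewer_id:

    ## Hedged sketch: how much of the rating variance lies between interviewers?
    interviewer_variance_share <- function(anes) {
      fit <- aov(rating ~ factor(interviewer_id), data = anes)
      tab <- summary(fit)[[1]]                  # ANOVA table: between- and within-interviewer sums of squares
      tab[1, "Sum Sq"] / sum(tab[, "Sum Sq"])   # eta-squared: between-interviewer share of total variance
    }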

Thus, the ANES interviewer ratings do not appear to provide an objective measure of political knowledge, with scores that are comparable across both respondents and interviewers.

So, what is to be done about the shortcomings of the interviewer ratings? Levendusky and Jackman (2008) say that "First, scholars should accept once and for all that the interviewer rating is a flawed measure of political knowledge." Nevertheless, discarding or ignoring this variable does not seem to be a wise course of action. The interviewers bring insights based upon a fairly lengthy social interaction with the respondents (Tourangeau, Rips, and Rasinski 2000) in which the focus of discussion is politics. The interviewer records answers to a sizable number of questions, including the factual items. He or she also is exposed to the additional verbal and nonverbal cues that respondents provide during the session, many of which reflect the person's level of political understanding (e.g., Converse 1974). The readiness and ease with which respondents converse about political phenomena certainly seems like a reasonable expression of their awareness of, and facility in dealing with, the elements of the political world; in other words, precisely what social scientists generally mean when they refer to political knowledge. And, it seems very likely that the interviewers take these factors into account when generating their own rating of a respondent's political information. As we have already seen, the resultant scores are reliable and correlated with a variety of other variables. So, it seems more reasonable to fix the problems in the interviewer ratings than to abandon them.

Of course, the strategy we follow depends upon the source of the problems in the measure. Here, there do not appear to be systematic biases due to the race, gender, or social class of the interviewer relative to the respondent (Zaller 1986; Levendusky and Jackman 2008). Instead, there are pronounced individual differences in the ways the interviewers use the five rating categories to represent political knowledge. This generates a problem known as inter-personally incomparable scores (Brady 1985) or differential item functioning (King, Murray, Salomon, and Tandon 2004). The quantifications of political knowledge obtained by assigning successive integers to the five response categories are not comparable across the interviewers. And, while serious, this problem definitely is not insurmountable. Lupia (2015) argues that it should be handled through better training of the interviewers. Levendusky and Jackman recommend that "... the interviewer rating should ... be used ... in a measurement model that can explicitly correct for this cross-interviewer heterogeneity" (2009).

That is precisely what I will do in the remainder of this study, although I use a different approach from the anchoring vignettes that they mention. The potential remedies to differential item functioning proposed by Lupia and by Levendusky and Jackman can only be used with new data. The strategy I will employ has an important advantage in that it can be applied to any existing ANES data that contain both the interviewer ratings of political knowledge and the factual questions for the same respondents. In effect, I will calibrate the former, using their relationship to the latter.

MEASUREMENT THEORY

Let us begin by considering the basic nature of measurement (e.g., Hand 2004), and how it pertains to political knowledge. This discussion is based upon a theory of measurement originally laid out by Young (1981) and developed in the political science context by Jacoby (1991; 1999). Regardless of the specific context, measurement begins by sorting a set of objects into observationally distinct categories, based upon some specified characteristic of those objects. Numbers are assigned to the objects based upon their category membership. This classification process can be considered measurement if and only if the differences between the numbers assigned to the objects correspond to the substantive differences between the observational categories, according to some specified rule. Here, the objects are the ANES survey respondents. The observational categories are the five levels of political information that the interviewers use to classify the respondents ("very high," "fairly high," "average," and so on). The assigned numbers are the integers one through five, and the rule is to assign larger numbers to categories corresponding to higher information levels.

Measurement Properties

There are three important properties of measurement that are relevant in the present context. First, measurement level refers to the nature of the function mapping from the observational categories to the assigned numbers. Measurement at the nominal level implies that an identity-preserving function is used, but no further restrictions are placed on the specific numbers assigned to the objects in the respective categories. Ordinal measurement implies a monotonic function in which observational asymmetries across the objects correspond to a non-decreasing array of numeric values.

Interval measurement implies that there is some parametric function, often linear, mapping from the objects to the numbers. And, ratio-level measurement indicates a parametric function with a non-alterable intercept of zero (usually signaling the absence of the property being measured). In any case, measurement level refers to variability in the assigned numbers across the observational categories.

The second property, measurement process, involves the variability of the numbers assigned within the observational categories. Specifically, measurement process can either be discrete or continuous. A discrete process implies a measurement scheme in which all of the objects within a common category are assigned the same number. With a continuous measurement process, objects within a single category can be assigned different numbers, as long as the values within each category correspond to a closed interval of real numbers. In other words, adjacent intervals of values representing two observational categories can share a common end point but, beyond that, they cannot overlap.

The third property, measurement conditionality, pertains to differences across the objects, regardless of their category membership. Specifically, measurement conditionality determines which kinds of comparisons are legitimate and meaningful within the given measurement scheme. The immediate consequence of measurement is that the numbers assigned to two distinct objects can be compared to each other to determine whether one object possesses more, less, or an equal amount of the property being measured, relative to the other object. If such comparisons are possible for all distinct pairs of objects being measured, then the measurement is unconditional. But, if such comparisons are only meaningful within distinct subsets of observations (i.e., each object's numerical value can only be compared to the numerical values of other objects within the same subset), then the measurement is conditional.

Now, the one-to-five scale usually constructed from the interviewer ratings is a quantification of political knowledge. But, what are its measurement properties? In the research literature, the interviewer ratings are usually treated as interval-level, discrete-process, and unconditional measures. This quantification implies that the differences between adjacent categories are identical across the entire range of the scale. In other words, the scores are regarded as a linear function of observed differences in political knowledge, making the ratings an interval-level measure.

Similarly, the survey respondents placed in a given category by the interviewer are all assigned the same number, again, a value ranging from one to five. This means that the ratings are a discrete measure. Finally, the scoring scheme used for the interviewer ratings is not affected by which interviewer rated which respondent. The scores assigned to the respondents who were categorized by any given interviewer are treated as exactly equivalent to the scores assigned to respondents by any other interviewer. If two NES respondents are assigned a score of, say, 2, then they are both treated as if they have an identical, fairly low, level of political knowledge, even if two different interviewers assigned the respective scores. Thus, the usual measurement of political knowledge is treated as if it is unconditional.

It is almost certainly unrealistic to treat the interviewer ratings of political knowledge as interval-level, discrete-process, and unconditional. Instead, it probably is more appropriate to assume that the ratings are ordinal-level, continuous-process, and interviewer-conditional. First, the ordinal level of measurement means that the numbers assigned to the objects within the categories comprise a monotonic function of observed political knowledge (i.e., as evaluated by the interviewer). That is, as respondents' levels of political knowledge increase, the numbers assigned to the respondents should never decrease. This implies that, say, respondents in the "fairly high" category possess more knowledge than those in the "average" category, but we cannot say a priori how much more. Further, the difference between these two categories need not be the same as the difference between the "very high" category and the "fairly high" category (or any other pair of adjacent categories). Second, continuous process means that each of the observational categories corresponds to a closed interval of numbers, rather than to a single value. In substantive terms, this implies that each category contains individuals with varying degrees of political knowledge, although it still is the case that respondents in a category with lower assigned numbers possess less knowledge than those in a category with higher assigned numbers. This should be very reasonable if interviewers employ several observational criteria (e.g., answers to various questions along with other cues) to rate the respondents. Third, interviewer-conditional measurement means that the scores assigned by one interviewer are not necessarily comparable to those assigned by any other interviewer. It is reasonable to assume that each interviewer has his or her own idea of what constitutes each level of political knowledge, that is, what a respondent must say or do in order to get placed into the "average" category, or any of the other four categories.

But, one interviewer's idea of "average" may not be equivalent to another interviewer's (and similar non-equivalence probably exists for the other four categories). Taken together, these assumptions imply that the scores assigned to the five knowledge categories can be meaningfully compared across the respondents interviewed by any given interviewer. But, the numeric scores for respondents from one interviewer cannot be compared legitimately to the scores for respondents interviewed by someone else.

Thus, the usual quantification generated by the interviewer ratings of respondent political knowledge is based upon a set of unrealistically stringent assumptions about its measurement characteristics. Therefore, it is important to develop a different quantification of the interviewer ratings which corresponds to less stringent, but more realistic, measurement assumptions. A strategy for doing so is presented in the next section.

ESTIMATION STRATEGY: ALSOS

The interviewer makes his or her rating of the respondent's knowledge at the end of the interview, and it presumably is based (in some way) on the full interaction that the interviewer has had with the respondent. As such, it should incorporate more than just factual knowledge, hopefully including such things as the domain-specific and operative forms of knowledge that are missing from responses to factual questions about officeholders and other general textbook aspects of American government. Nevertheless, there is no reason to expect that the interviewer ratings of political knowledge would be inconsistent with the kinds of factual knowledge that are tapped by the latter survey questions. All of the ANES interviewers pose the factual questions from the interview schedule to all of their respondents. The correct answers to the factual questions (obviously) do not vary across respondents or interviewers. Therefore, they can be regarded as a fixed standard from which to evaluate variability across these two sets of actors. In effect, we will generate a new quantification of the interviewer ratings of respondent political knowledge that is calibrated according to the responses on the factual items. In order to do so, we will assign a set of scores to the interviewer ratings that optimize two criteria simultaneously:

1. The assigned scores will maximize the squared multiple correlation between the interviewer ratings and the objective measures of respondent information.

2. The assigned scores will conform strictly to the pre-specified assumptions about the measurement properties of the interviewer ratings (i.e., ordinal level, continuous process, and interviewer-conditional).

Numeric values that possess the preceding two characteristics are called optimally-scaled values, or OS scores. For present purposes, the OS scores represent a quantification of the interviewer ratings of political knowledge that conforms to a realistic set of measurement assumptions. Hopefully, that will make them more useful tools for research than the traditional one-to-five quantification that is beset with the problems discussed earlier. The specific procedure for obtaining the OS scores is called multiple optimal regression via alternating least squares (MORALS; see Young, de Leeuw, and Takane 1976), and it is a manifestation of a more general strategy for the quantitative analysis of qualitative data called alternating least squares, optimal scaling, or ALSOS (Young 1981; Jacoby 1999).

MORALS is an iterative procedure. Each iteration begins by regressing the current quantification of the interviewer ratings on the objective measures of factual knowledge from the ANES. On the first iteration, the current quantification is just the usual (reflected) one-to-five scale assigned in the ANES codebook. On all subsequent iterations, it is an updated set of scores that provide the optimal quantification based upon the estimates so far. On each iteration, the procedure has two separate phases. In the first phase, the procedure obtains ordinary least squares estimates of the regression coefficients relating the factual knowledge measures to the current quantification of the knowledge ratings. These coefficients and the independent variable values are used to produce predicted values of political knowledge for the respondents in the usual manner. By definition, these predicted values are the linear combination of the independent variables that is maximally correlated with the dependent variable (again, the current quantification of the knowledge ratings). In the second phase of each iteration, Kruskal's (1964) monotonic transformation is applied to the predicted values. This produces a set of scores that are as similar as possible (in the least-squares sense) to the predicted values from the regression, but are still perfectly monotonic with the ordered categories of the original interviewer ratings. Kruskal's primary treatment of ties is used in the transformation, which means that respondents in the same rating category need not be assigned the same scores.

Furthermore, the transformation is carried out separately for each subset of respondents interviewed by the respective NES interviewers; therefore, the ways the numbers are assigned to the respondents will vary across the interviewers.

A complete iteration of the MORALS procedure consists of these two phases. Each time the first phase is carried out, the latest R-squared value is compared to the R-squared from the previous iteration (on the first iteration, the previous R-squared is initialized at zero). If the goodness of fit has increased over the previous iteration, the MORALS algorithm goes on to the second phase of the iteration and calculates new optimal scores based upon the latest set of regression coefficients and predicted values. If the goodness of fit has stabilized (i.e., R-squared does not increase over that from the previous iteration), the procedure terminates and the most recent transformations of the predicted values (those calculated in the second phase of the previous iteration) are taken as the OS scores.

Again, the OS scores from the MORALS algorithm represent a quantification of the interviewer ratings that is maximally correlated with the factual items on the ANES, and consistent with the pre-specified measurement assumptions about the ratings. It is important to emphasize that this quantification is an interval-level representation; any changes in the relative sizes of the OS scores would degrade the goodness of fit between the scores and the predictor variables. Hence, the OS scores can be used in statistical models that require interval-level variables. Nevertheless, the OS scores are completely legitimate representations of the interviewer ratings because they always comprise a weakly monotonic function of the original ordered categories assigned by the respective interviewers. Beyond the monotonicity requirement, there are no restrictions on the differences between the numeric values assigned to respondents in different categories. In this manner, the quantification reflects the ordinal nature of the characteristic being measured. Different monotonic transformations are used for the respondents rated by different interviewers. This explicitly allows for the differential item functioning that Levendusky and Jackman identified in the traditional five-point measure. But, unlike the original one-to-five integer values, the OS scores can be compared across interviewers; they are the monotonic transformations of the categories for each interviewer that, when combined into a single set of values, produce the highest squared multiple correlation with the fact-based knowledge measures. For example, two respondents that an interviewer rated as showing "average" levels of political information would not have to be assigned the same optimal scores, as long as their scores are greater than or equal to the scores assigned to all respondents that the interviewer rated as showing "fairly low" political information, and less than or equal to the scores assigned to all respondents that the interviewer judged as possessing "fairly high" levels of political information.
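The two-phase iteration just described can be sketched schematically in R. This is only an illustration of the MORALS logic under stated assumptions, not the optiscale routine used for the actual estimates; all object names are hypothetical, and Kruskal's primary treatment of ties is approximated here with base R's isotonic regression:

    ## Hedged sketch of a MORALS-style iteration: alternate (1) OLS regression of the
    ## current quantification on the factual-knowledge regressors with (2) a separate
    ## weakly monotone rescaling of the predictions within each interviewer.
    morals_sketch <- function(rating, interviewer, X, tol = 1e-8, max_iter = 100) {
      dat     <- as.data.frame(X)        # factual-knowledge items plus education dummies
      os      <- as.numeric(rating)      # start from the reflected one-to-five codebook scores
      r2_prev <- 0
      for (it in seq_len(max_iter)) {
        fit <- lm(os ~ ., data = dat)    # phase 1: OLS on the current quantification
        r2  <- summary(fit)$r.squared
        if (r2 - r2_prev < tol) break    # fit has stabilized; keep the scores from the last phase 2
        r2_prev <- r2
        pred <- fitted(fit)
        for (j in unique(interviewer)) { # phase 2: per-interviewer monotone transformation
          idx <- which(interviewer == j)
          ord <- idx[order(rating[idx])] # that interviewer's respondents, ordered by rating category
          os[ord] <- isoreg(seq_along(ord), pred[ord])$yf  # weakly monotone fit to the predictions
        }
      }
      1 + 4 * (os - min(os)) / (max(os) - min(os))  # rescale the OS scores to the one-to-five range
    }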

Finally, each of the five original observational categories of political knowledge is represented by a closed interval of OS scores for each interviewer. This is consistent with the likely possibility that each interviewer grouped together respondents with somewhat differing levels of political knowledge into single categories, although we still assume that the categories for each interviewer are ordered properly.

In summary, the OS scores possess exactly the characteristics that most researchers assume to exist in the interviewer ratings of political knowledge. From a measurement theory perspective, the OS scores comprise a better quantification of the political knowledge ratings than the quantification provided by the five-point successive-integer scale.

DATA AND ALSOS ANALYSIS

The empirical analysis uses data from the 2004 ANES. In that year, there were 81 interviewers with sufficient information to include in the MORALS estimation routine. For present purposes, interviewers have to interview more than one respondent, because the transformation to the OS scores cannot be carried out with a single observation. Fortunately, only two interviewers had to be dropped for this reason (along with their respective respondents). At the same time, respondents must answer all of the factual questions and be rated by the interviewers in order to be included in the regression equation. There were 909 respondents with usable information.

The independent variables comprise all of the factual questions included on the 2004 ANES. First, there is the number of political figures whose offices the respondent identified correctly; this variable ranges from zero to four. Second, respondents were asked which party currently held the majority in the U.S. House and the Senate; a variable identifying how many of these they got correct ranges from zero to two. Third, a dummy variable is scored one if the respondent correctly stated that the rich-poor gap has been increasing, and zero otherwise. Fourth, another dummy variable is scored one if the respondent correctly identifies the national Republican party as more conservative, and zero otherwise. Two additional dummy variables are included in the regression equation for high school graduates and college graduates.
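For concreteness, the regressors just described could be assembled along the following lines; every raw variable name here is a hypothetical placeholder for the corresponding 2004 ANES item, not an actual ANES mnemonic:

    ## Hedged sketch: building the six regressors described in the text.
    build_predictors <- function(raw) {
      data.frame(
        offices_correct    = rowSums(raw[, c("hastert_ok", "cheney_ok", "blair_ok", "rehnquist_ok")]),  # 0-4
        majorities_correct = raw$house_majority_ok + raw$senate_majority_ok,                            # 0-2
        gap_increasing     = as.numeric(raw$gap_answer == "increasing"),       # 1 if correct, 0 otherwise
        gop_conservative   = as.numeric(raw$ideology_answer == "republican"),  # 1 if correct, 0 otherwise
        hs_grad            = as.numeric(raw$years_school >= 12),               # high school graduate dummy
        college_grad       = as.numeric(raw$years_school >= 16)                # college graduate dummy
      )
    }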

OLS and ALSOS Estimates

Let us begin by examining the relationship between the traditional, but problematic, version of the interviewer rating of political knowledge and the other variables. Table 1 shows the ordinary least squares estimates obtained when the original five-point scale is regressed on the factual knowledge items and the two education dummy variables. Even with a knowledge measure that is based upon extremely unrealistic assumptions, these independent variables still account for a sizable share of the observed variance. Four of the six independent variables have statistically significant effects. The only exceptions are the dummy variables for the increasing gap between rich and poor, and the ideology of the national parties. Thus, despite attributing properties to the interviewer ratings that they almost certainly do not possess (i.e., equal differences between all adjacent categories, homogeneity within categories, and comparability across interviewers), the resultant quantification appears to be moderately related to more objective measures of political knowledge. The entries in Table 1 are also the estimates obtained in the first phase of the first iteration of the MORALS routine.

Table 2 presents the ALSOS estimates for the same equation; the MORALS routine was carried out using Jacoby's optiscale package in R, and it took three iterations to produce the final estimates. Here, the values of the dependent variable conform to much more realistic assumptions about the interviewer ratings. That is, the scores assigned to the respondents are monotonic to the five ordered categories that the interviewers used to classify their political knowledge, the scores can vary within each of those categories, and the scores are assigned separately for each of the 81 interviewers. And, the scores represent the best fit to the statistical model, subject to the preceding measurement constraints. These OS scores for the interviewer ratings are set to range from one to five, so their values are at least roughly comparable to the original five-point variable.

The most obvious feature in Table 2 is the very high level of variance explained. When we assign scores via the MORALS routine, the quantification of the interviewer ratings is a nearly perfect linear function of the factual knowledge measures; the latter account for about 90% of the variance in the former, about half again more than with the traditional variable. It is remarkable that this improvement in fit is achieved solely by using more realistic assumptions about the measurement properties of the information ratings.

The individual coefficient estimates in Table 2 are not of particular interest for present purposes. In effect, the independent variables function as the calibration instrument for adjusting the information in the interviewer ratings to their optimal scores (i.e., making the latter the best-fitting linear function of the former). But, it still is gratifying to see that any substantive interpretations about the impact of objective knowledge measures on the interviewer ratings would remain largely unchanged. While the specific numerical values differ, the relative sizes of the respective coefficients in Table 2 are very similar to those in Table 1. Once again, college graduation shows the largest effect, followed by the ability to identify officeholders, high school graduation, the ability to identify congressional majority parties, correct ideological placement of the Republican party, and finally, correctly saying that the gap between rich and poor is increasing. Note that it is difficult to assess the statistical significance of the individual variables' effects because the standard errors from the MORALS routine are almost certainly too small (Young et al. 1976).

Interviewer Effects and Measurement Assumptions

The large difference in fit between the OLS and ALSOS results confirms the presence of interviewer effects in the knowledge ratings. We can be more specific about the nature of these effects by looking at the relationship between the original five-point scale and the OS scores for the individual ANES interviewers. Figure 1 contains a trellis display with 81 separate panels, one for each interviewer, in three subsets of 27 panels each. The panels are arrayed by the sequential numbers assigned to the respective interviewers; therefore, the specific ordering of the panels has no substantive meaning. In each panel, the horizontal axis represents the original five categories used in the rating. They are spaced at equal intervals to correspond to the successive-integer scoring scheme usually applied to the categories. The vertical axis in each panel plots the mean OS score for the respondents that the interviewer placed into each of the five rating categories. Note that many interviewers did not use the full range of categories, so there are fewer than five points plotted in those cases. The curve formed by using line segments to connect the adjacent points in each panel can be regarded as the estimated measurement function for that interviewer, in the sense that it summarizes the mapping from the observational categories to the optimally scaled scores assigned to the respondents.

The OS scores are not unique representations of the interviewer ratings. For each interviewer, any set of numbers that is weakly monotonic to the five rating categories would provide equally valid scores for the respondents in the categories. But, these scores are uniquely optimal in the sense that they provide the best possible least-squares fit to the factual items on the ANES interview schedule. There are, in fact, an infinite number of such optimal scores, but they all would be linear transformations of one another, so any other set of values would provide exactly the same information. This specific set of values is obtained by transforming the OS scores to range across the interval from one to five.
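A display along the lines of Figure 1 could be reconstructed roughly as follows, given vectors of ratings, OS scores, and interviewer identifiers. This is a sketch using the lattice package (a natural fit for the trellis layout described above); the object names are hypothetical:

    ## Hedged sketch: per-interviewer measurement functions -- the mean OS score
    ## within each original rating category, one panel per interviewer.
    library(lattice)

    plot_measurement_functions <- function(rating, os_score, interviewer_id) {
      means <- aggregate(os_score,
                         by = list(rating = rating, interviewer = interviewer_id),
                         FUN = mean)
      xyplot(x ~ rating | factor(interviewer), data = means,
             type = c("p", "l"),             # plot the category means and connect them with line segments
             xlim = c(0.5, 5.5), ylim = c(0.5, 5.5),
             xlab = "Original rating category (1 = very low, 5 = very high)",
             ylab = "Mean optimally scaled (OS) score",
             as.table = TRUE)                # fill the panels from the top row down
    }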

First, consider the shapes of the measurement functions. They are all constrained to be (weakly) monotonically increasing with respect to the political information categories. But, most of them are nearly linear, such as the function for interviewer 1 (lower left corner of the first subset of panels) and the function shown fourth from the left in the same row of panels. For the many interviewers for which this is the case, the assumption of interval measurement level is not problematic because the differences in the mean OS values between adjacent observational categories are identical (or nearly so) across the range of values that they used to rate the respondents' political information. Of course, there are also a non-trivial number of panels that exhibit non-linear measurement functions, so, in these cases, the average differences in the optimal scores are not equal across all adjacent pairs of categories. For those interviewers, the interval-level assumption clearly is violated. As a specific example, consider the plot for interviewer 79, the third from the right in the top row of the third subset of panels. This interviewer only used three of the rating categories, and the difference between the mean OS scores assigned to one pair of adjacent categories is much larger than the difference between the mean OS scores assigned to the other pair.

More severe violations of the interval-level assumption occur when segments of the measurement function are flat. This means that the mean OS scores are identical across the two categories which, in turn, implies that the interviewer did not differentiate the respondents in those categories effectively with respect to their observed levels of factual knowledge. For example, consider interviewer 81, in the upper right corner of the third subset of panels, who shows one of the more pronounced examples of this problem. Even though this interviewer used four of the five rating categories (from "fairly low" through "very high"), he or she did not distinguish the three lower categories according to their factual knowledge. Therefore, it is more effective to treat the ratings from this interviewer as representing only two distinct levels of political information, corresponding to "very high" and everything else below that.

Second, consider the differences in the specific measurement functions across the 81 panels of Figure 1. The varying shapes of the curves connecting the adjacent mean OS scores for the five rating categories confirm that the assumption of unconditional measurement across interviewers is violated severely.

This implies that the level of political knowledge represented within each of the observational categories varies markedly from one interviewer to the next. As one example, consider the two adjacent panels that fall fourth and fifth from the left in the bottom row of the first subset of panels. Both of these interviewers placed respondents into the "very high" information category. But, the mean OS score assigned to this category in the right-hand panel is close to, but slightly lower than, the mean score for the "fairly high" category in the left-hand panel. This shows that one interviewer's perception of very high political knowledge corresponds, on average, to what the other interviewer means by fairly high political knowledge.

It is important to emphasize that the conditionality problems in the interviewer ratings are distinct from the property of measurement level. As already discussed, many of the curves in Figure 1 are nearly linear in shape. And, as explained, this is evidence for interval-level measurement of the distinctions between the observational categories. But, the precise linear functions, that is, the slopes and intercepts for the nearly-linear curves, do vary from one panel to the next. And, that confirms that interviewers map the rating categories onto actual levels of political knowledge differently from each other, even if they each do so in ways that imply equal differences between categories.

While Figure 1 enables an assessment of measurement level and conditionality, it cannot be used to evaluate measurement process. The panels in the figure only plot the central tendency of the OS scores for each of the five rating categories, for each interviewer. But, the ALSOS analysis allowed the OS scores to vary within the categories, so that each of the latter is quantified by a range of scores, rather than a single value. In principle, this could be shown in a trellis display like Figure 1, but using the actual OS values for the individual respondents, rather than the category means. This display is shown in the Appendix. But, it is difficult to interpret, due to the small sizes of the individual panels. And, there probably is not much to be gained from close inspection of that display, beyond the general observation that there is, indeed, variability in the OS scores within each of the observational categories used by the respective interviewers.

Figure 2 shows the more detailed measurement functions for two interviewers, taken from the overall trellis display in the Appendix. In each panel, the points and line segments plotted in blue represent the OS scores for the individual respondents rated by that interviewer. The green points and line segments plot the means of the OS scores for the observational categories (i.e., the same information that was shown in Figure 1). Within each panel, the amount of variability within a rating category is represented by the length of the vertical line segment connecting the OS scores within that category.

So, for interviewer 6, the range of OS values is greater within the three middle categories than in either of the two end categories (note that one of the end categories actually contains two respondents with identical OS scores, so their points are overplotted). In contrast, the other interviewer shown in Figure 2 displays much more variation in the amount of political knowledge subsumed within one rating category than within the other categories that were used; this interviewer placed only one respondent in one of those categories, and none at all in another. The general conclusion to be drawn from these plots, and the full trellis display in the Appendix, is that there clearly is variability in the levels of political knowledge subsumed within each of the rating categories. Of course, this is ignored entirely if the usual five-point variable is used in its standard form.

To summarize the results so far, the ALSOS analysis of the ANES interviewer ratings of respondent political knowledge indicates that the typical usage of this variable is highly problematic. The five-point version of this variable, based on the successive-integer scores in the ANES codebook, is severely affected by measurement error. The ALSOS analysis provides more insight into the nature of that error. Specifically, it stems from three distinct sources. Measurement conditionality (differences in the ways the respective interviewers sort observed levels of political knowledge into the five categories) and continuous measurement process (the existence of nontrivial variability in the actual levels of political knowledge within each rating category) probably represent the most troublesome issues. Measurement level is problematic for some interviewers, who either fail to differentiate observational categories by actual levels of knowledge or vary in the extent to which they differentiate the amount of political knowledge in adjacent categories across the range of the scale. But, for many of the interviewers, their ratings do suggest that their average distinctions about political knowledge between categories are fairly uniform.

OS SCORES AND MEASUREMENT QUALITY

The optimally-scaled scores examined in the previous section produce the best quantification of the interviewer ratings of political knowledge, in the sense that they are maximally correlated with the available objective information while still conforming strictly to a set of explicit measurement assumptions about the nature of the variable. So far, we have used these OS scores to gain insights about the types of problems that arise in the typical quantification of the interviewer ratings (i.e., the usual five-point equal-interval scale).

Hopefully, we can go significantly further than that, and use the OS scores to improve our measurement of respondents' political information. We will use two criteria for evaluating our success in achieving this objective. The OS scores can be regarded as better measurement from a substantive perspective if they (1) provide more detailed resolution of the property being measured; and (2) exhibit clearer relationships with other theoretically relevant variables. As we will see, the OS scores meet both of these criteria quite easily.

Measurement Resolution

The interviewer ratings of political knowledge sort the ANES respondents into five ordered categories. If we conceptualize knowledge as a continuum, then the standard variable only allows us to empirically distinguish five positions along that continuum. In contrast, there are far more distinct values in the set of OS scores. This enables much more fine-grained placement of individual respondents according to their political knowledge. Again, these latter placements are fully consistent with the information in the original ratings. If we focus on the OS scores for respondents rated by any given interviewer, then the locations along the continuum will be monotonic with respect to the categorical ratings.

Figure 3 shows the histograms for the interviewer ratings, separately for the original five-point scale and for the OS scores. The two displays tell somewhat different stories about the distribution of political knowledge among the ANES respondents. The histogram for the five-point scale (i.e., the original rating scores from the ANES codebook) shows a distribution with a negative skew. The modal score is 4, corresponding to a "fairly high" level of political information, although there are almost as many respondents in the middle category (scored as 3), or "average" political knowledge. The proportion of respondents in the highest knowledge category is much larger than the proportion that fall into the two lowest knowledge categories combined. On the five-point scale, the mean and the median both fall at or above the scale midpoint, and the standard deviation is about 1.0. Overall, the traditional five-point scale suggests a fairly optimistic view of citizens' political knowledge, with the typical individual placed somewhere from the middle to the more knowledgeable end of the continuum.
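The resolution comparison and the two histograms described above could be generated along these lines (again with hypothetical object names):

    ## Hedged sketch: compare the resolution of the two quantifications and draw
    ## side-by-side histograms of the original ratings and the OS scores.
    compare_resolution <- function(rating, os_score) {
      cat("Distinct values, five-point scale:", length(unique(rating)), "\n")
      cat("Distinct values, OS scores:", length(unique(os_score)), "\n")
      op <- par(mfrow = c(1, 2))
      hist(rating, breaks = seq(0.5, 5.5, by = 1), main = "Original five-point ratings",
           xlab = "Interviewer rating")
      hist(os_score, breaks = 20, main = "Optimally scaled scores",
           xlab = "OS score (one-to-five range)")
      par(op)
    }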


More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Measuring the User Experience

Measuring the User Experience Measuring the User Experience Collecting, Analyzing, and Presenting Usability Metrics Chapter 2 Background Tom Tullis and Bill Albert Morgan Kaufmann, 2008 ISBN 978-0123735584 Introduction Purpose Provide

More information

Reveal Relationships in Categorical Data

Reveal Relationships in Categorical Data SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction

More information

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA Data Analysis: Describing Data CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA In the analysis process, the researcher tries to evaluate the data collected both from written documents and from other sources such

More information

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

12/31/2016. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

More information

alternate-form reliability The degree to which two or more versions of the same test correlate with one another. In clinical studies in which a given function is going to be tested more than once over

More information

The Impact of Relative Standards on the Propensity to Disclose. Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX

The Impact of Relative Standards on the Propensity to Disclose. Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX The Impact of Relative Standards on the Propensity to Disclose Alessandro Acquisti, Leslie K. John, George Loewenstein WEB APPENDIX 2 Web Appendix A: Panel data estimation approach As noted in the main

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

Analysis and Interpretation of Data Part 1

Analysis and Interpretation of Data Part 1 Analysis and Interpretation of Data Part 1 DATA ANALYSIS: PRELIMINARY STEPS 1. Editing Field Edit Completeness Legibility Comprehensibility Consistency Uniformity Central Office Edit 2. Coding Specifying

More information

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE 1. When you assert that it is improbable that the mean intelligence test score of a particular group is 100, you are using. a. descriptive

More information

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Research Methods Posc 302 ANALYSIS OF SURVEY DATA

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Research Methods Posc 302 ANALYSIS OF SURVEY DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Research Methods Posc 302 ANALYSIS OF SURVEY DATA I. TODAY S SESSION: A. Second steps in data analysis and interpretation 1. Examples and explanation

More information

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2

MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and. Lord Equating Methods 1,2 MCAS Equating Research Report: An Investigation of FCIP-1, FCIP-2, and Stocking and Lord Equating Methods 1,2 Lisa A. Keller, Ronald K. Hambleton, Pauline Parker, Jenna Copella University of Massachusetts

More information

Sum of Neurally Distinct Stimulus- and Task-Related Components.

Sum of Neurally Distinct Stimulus- and Task-Related Components. SUPPLEMENTARY MATERIAL for Cardoso et al. 22 The Neuroimaging Signal is a Linear Sum of Neurally Distinct Stimulus- and Task-Related Components. : Appendix: Homogeneous Linear ( Null ) and Modified Linear

More information

bivariate analysis: The statistical analysis of the relationship between two variables.

bivariate analysis: The statistical analysis of the relationship between two variables. bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for

More information

Student name: SOCI 420 Advanced Methods of Social Research Fall 2017

Student name: SOCI 420 Advanced Methods of Social Research Fall 2017 SOCI 420 Advanced Methods of Social Research Fall 2017 EXAM 1 RUBRIC Instructor: Ernesto F. L. Amaral, Assistant Professor, Department of Sociology Date: October 12, 2017 (Thursday) Section 903: 9:35 10:50am

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Group Assignment #1: Concept Explication. For each concept, ask and answer the questions before your literature search.

Group Assignment #1: Concept Explication. For each concept, ask and answer the questions before your literature search. Group Assignment #1: Concept Explication 1. Preliminary identification of the concept. Identify and name each concept your group is interested in examining. Questions to asked and answered: Is each concept

More information

Readings: Textbook readings: OpenStax - Chapters 1 4 Online readings: Appendix D, E & F Online readings: Plous - Chapters 1, 5, 6, 13

Readings: Textbook readings: OpenStax - Chapters 1 4 Online readings: Appendix D, E & F Online readings: Plous - Chapters 1, 5, 6, 13 Readings: Textbook readings: OpenStax - Chapters 1 4 Online readings: Appendix D, E & F Online readings: Plous - Chapters 1, 5, 6, 13 Introductory comments Describe how familiarity with statistical methods

More information

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison Empowered by Psychometrics The Fundamentals of Psychometrics Jim Wollack University of Wisconsin Madison Psycho-what? Psychometrics is the field of study concerned with the measurement of mental and psychological

More information

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data TECHNICAL REPORT Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data CONTENTS Executive Summary...1 Introduction...2 Overview of Data Analysis Concepts...2

More information

A Comparison of Several Goodness-of-Fit Statistics

A Comparison of Several Goodness-of-Fit Statistics A Comparison of Several Goodness-of-Fit Statistics Robert L. McKinley The University of Toledo Craig N. Mills Educational Testing Service A study was conducted to evaluate four goodnessof-fit procedures

More information

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj Statistical Techniques Masoud Mansoury and Anas Abulfaraj What is Statistics? https://www.youtube.com/watch?v=lmmzj7599pw The definition of Statistics The practice or science of collecting and analyzing

More information

Writing Reaction Papers Using the QuALMRI Framework

Writing Reaction Papers Using the QuALMRI Framework Writing Reaction Papers Using the QuALMRI Framework Modified from Organizing Scientific Thinking Using the QuALMRI Framework Written by Kevin Ochsner and modified by others. Based on a scheme devised by

More information

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions

More information

Section 3.2 Least-Squares Regression

Section 3.2 Least-Squares Regression Section 3.2 Least-Squares Regression Linear relationships between two quantitative variables are pretty common and easy to understand. Correlation measures the direction and strength of these relationships.

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

Chapter 2--Norms and Basic Statistics for Testing

Chapter 2--Norms and Basic Statistics for Testing Chapter 2--Norms and Basic Statistics for Testing Student: 1. Statistical procedures that summarize and describe a series of observations are called A. inferential statistics. B. descriptive statistics.

More information

Color naming and color matching: A reply to Kuehni and Hardin

Color naming and color matching: A reply to Kuehni and Hardin 1 Color naming and color matching: A reply to Kuehni and Hardin Pendaran Roberts & Kelly Schmidtke Forthcoming in Review of Philosophy and Psychology. The final publication is available at Springer via

More information

Does momentary accessibility influence metacomprehension judgments? The influence of study judgment lags on accessibility effects

Does momentary accessibility influence metacomprehension judgments? The influence of study judgment lags on accessibility effects Psychonomic Bulletin & Review 26, 13 (1), 6-65 Does momentary accessibility influence metacomprehension judgments? The influence of study judgment lags on accessibility effects JULIE M. C. BAKER and JOHN

More information

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

3 CONCEPTUAL FOUNDATIONS OF STATISTICS 3 CONCEPTUAL FOUNDATIONS OF STATISTICS In this chapter, we examine the conceptual foundations of statistics. The goal is to give you an appreciation and conceptual understanding of some basic statistical

More information

How to interpret results of metaanalysis

How to interpret results of metaanalysis How to interpret results of metaanalysis Tony Hak, Henk van Rhee, & Robert Suurmond Version 1.0, March 2016 Version 1.3, Updated June 2018 Meta-analysis is a systematic method for synthesizing quantitative

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

More information

ADMS Sampling Technique and Survey Studies

ADMS Sampling Technique and Survey Studies Principles of Measurement Measurement As a way of understanding, evaluating, and differentiating characteristics Provides a mechanism to achieve precision in this understanding, the extent or quality As

More information

multilevel modeling for social and personality psychology

multilevel modeling for social and personality psychology 1 Introduction Once you know that hierarchies exist, you see them everywhere. I have used this quote by Kreft and de Leeuw (1998) frequently when writing about why, when, and how to use multilevel models

More information

Exploring the Impact of Missing Data in Multiple Regression

Exploring the Impact of Missing Data in Multiple Regression Exploring the Impact of Missing Data in Multiple Regression Michael G Kenward London School of Hygiene and Tropical Medicine 28th May 2015 1. Introduction In this note we are concerned with the conduct

More information

CHAPTER III RESEARCH METHODOLOGY

CHAPTER III RESEARCH METHODOLOGY CHAPTER III RESEARCH METHODOLOGY Research methodology explains the activity of research that pursuit, how it progress, estimate process and represents the success. The methodological decision covers the

More information

Basic Concepts in Research and DATA Analysis

Basic Concepts in Research and DATA Analysis Basic Concepts in Research and DATA Analysis 1 Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...2 The Research Question...3 The Hypothesis...3 Defining the

More information

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models Int J Comput Math Learning (2009) 14:51 60 DOI 10.1007/s10758-008-9142-6 COMPUTER MATH SNAPHSHOTS - COLUMN EDITOR: URI WILENSKY* Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based

More information

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when.

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when. INTRO TO RESEARCH METHODS: Empirical Knowledge: based on observations. Answer questions why, whom, how, and when. Experimental research: treatments are given for the purpose of research. Experimental group

More information

The Regression-Discontinuity Design

The Regression-Discontinuity Design Page 1 of 10 Home» Design» Quasi-Experimental Design» The Regression-Discontinuity Design The regression-discontinuity design. What a terrible name! In everyday language both parts of the term have connotations

More information

INDIVIDUAL VALUE CHOICES: HIERARCHICAL STRUCTURE VERSUS AMBIVALENCE AND INDIFFERENCE

INDIVIDUAL VALUE CHOICES: HIERARCHICAL STRUCTURE VERSUS AMBIVALENCE AND INDIFFERENCE INDIVIDUAL VALUE CHOICES: HIERARCHICAL STRUCTURE VERSUS AMBIVALENCE AND INDIFFERENCE William G. Jacoby Michigan State University David J. Ciuk Franklin and Marshall College January 2015 We would like to

More information

On the purpose of testing:

On the purpose of testing: Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase

More information

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide George B. Ploubidis The role of sensitivity analysis in the estimation of causal pathways from observational data Improving health worldwide www.lshtm.ac.uk Outline Sensitivity analysis Causal Mediation

More information

Statistics Mathematics 243

Statistics Mathematics 243 Statistics Mathematics 243 Michael Stob February 2, 2005 These notes are supplementary material for Mathematics 243 and are not intended to stand alone. They should be used in conjunction with the textbook

More information

DATA GATHERING. Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings.

DATA GATHERING. Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings. DATA GATHERING Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings. 2012 John Wiley & Sons Ltd. Measurement Measurement: the assignment

More information

Child Mental Health: A Review of the Scientific Discourse

Child Mental Health: A Review of the Scientific Discourse Child Mental Health: A Review of the Scientific Discourse Executive Summary and Excerpts from A FrameWorks Research Report Prepared for the FrameWorks Institute by Nat Kendall-Taylor and Anna Mikulak February

More information

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned?

Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? Comparing Direct and Indirect Measures of Just Rewards: What Have We Learned? BARRY MARKOVSKY University of South Carolina KIMMO ERIKSSON Mälardalen University We appreciate the opportunity to comment

More information

o^ &&cvi AL Perceptual and Motor Skills, 1965, 20, Southern Universities Press 1965

o^ &&cvi AL Perceptual and Motor Skills, 1965, 20, Southern Universities Press 1965 Ml 3 Hi o^ &&cvi AL 44755 Perceptual and Motor Skills, 1965, 20, 311-316. Southern Universities Press 1965 m CONFIDENCE RATINGS AND LEVEL OF PERFORMANCE ON A JUDGMENTAL TASK 1 RAYMOND S. NICKERSON AND

More information

Chapter 4: Defining and Measuring Variables

Chapter 4: Defining and Measuring Variables Chapter 4: Defining and Measuring Variables A. LEARNING OUTCOMES. After studying this chapter students should be able to: Distinguish between qualitative and quantitative, discrete and continuous, and

More information

VARIABLES AND MEASUREMENT

VARIABLES AND MEASUREMENT ARTHUR SYC 204 (EXERIMENTAL SYCHOLOGY) 16A LECTURE NOTES [01/29/16] VARIABLES AND MEASUREMENT AGE 1 Topic #3 VARIABLES AND MEASUREMENT VARIABLES Some definitions of variables include the following: 1.

More information

SUPPLEMENTARY INFORMATION. Table 1 Patient characteristics Preoperative. language testing

SUPPLEMENTARY INFORMATION. Table 1 Patient characteristics Preoperative. language testing Categorical Speech Representation in the Human Superior Temporal Gyrus Edward F. Chang, Jochem W. Rieger, Keith D. Johnson, Mitchel S. Berger, Nicholas M. Barbaro, Robert T. Knight SUPPLEMENTARY INFORMATION

More information

Appendix III Individual-level analysis

Appendix III Individual-level analysis Appendix III Individual-level analysis Our user-friendly experimental interface makes it possible to present each subject with many choices in the course of a single experiment, yielding a rich individual-level

More information

Online Appendix. According to a recent survey, most economists expect the economic downturn in the United

Online Appendix. According to a recent survey, most economists expect the economic downturn in the United Online Appendix Part I: Text of Experimental Manipulations and Other Survey Items a. Macroeconomic Anxiety Prime According to a recent survey, most economists expect the economic downturn in the United

More information

STAT 503X Case Study 1: Restaurant Tipping

STAT 503X Case Study 1: Restaurant Tipping STAT 503X Case Study 1: Restaurant Tipping 1 Description Food server s tips in restaurants may be influenced by many factors including the nature of the restaurant, size of the party, table locations in

More information

A Comparison of Three Measures of the Association Between a Feature and a Concept

A Comparison of Three Measures of the Association Between a Feature and a Concept A Comparison of Three Measures of the Association Between a Feature and a Concept Matthew D. Zeigenfuse (mzeigenf@msu.edu) Department of Psychology, Michigan State University East Lansing, MI 48823 USA

More information

25. EXPLAINING VALIDITYAND RELIABILITY

25. EXPLAINING VALIDITYAND RELIABILITY 25. EXPLAINING VALIDITYAND RELIABILITY "Validity" and "reliability" are ubiquitous terms in social science measurement. They are prominent in the APA "Standards" (1985) and earn chapters in test theory

More information

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations) Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations) After receiving my comments on the preliminary reports of your datasets, the next step for the groups is to complete

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Name Date Per Key Vocabulary: response variable explanatory variable independent variable dependent variable scatterplot positive association negative association linear correlation r-value regression

More information

The Pretest! Pretest! Pretest! Assignment (Example 2)

The Pretest! Pretest! Pretest! Assignment (Example 2) The Pretest! Pretest! Pretest! Assignment (Example 2) May 19, 2003 1 Statement of Purpose and Description of Pretest Procedure When one designs a Math 10 exam one hopes to measure whether a student s ability

More information

A Case Study: Two-sample categorical data

A Case Study: Two-sample categorical data A Case Study: Two-sample categorical data Patrick Breheny January 31 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/43 Introduction Model specification Continuous vs. mixture priors Choice

More information

Appendix D: Statistical Modeling

Appendix D: Statistical Modeling Appendix D: Statistical Modeling Cluster analysis Cluster analysis is a method of grouping people based on specific sets of characteristics. Often used in marketing and communication, its goal is to identify

More information

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES

Sawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,

More information

Announcement. Homework #2 due next Friday at 5pm. Midterm is in 2 weeks. It will cover everything through the end of next week (week 5).

Announcement. Homework #2 due next Friday at 5pm. Midterm is in 2 weeks. It will cover everything through the end of next week (week 5). Announcement Homework #2 due next Friday at 5pm. Midterm is in 2 weeks. It will cover everything through the end of next week (week 5). Political Science 15 Lecture 8: Descriptive Statistics (Part 1) Data

More information

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies PART 1: OVERVIEW Slide 1: Overview Welcome to Qualitative Comparative Analysis in Implementation Studies. This narrated powerpoint

More information

Chapter Eight: Multivariate Analysis

Chapter Eight: Multivariate Analysis Chapter Eight: Multivariate Analysis Up until now, we have covered univariate ( one variable ) analysis and bivariate ( two variables ) analysis. We can also measure the simultaneous effects of two or

More information

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1 Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by Mario F. Triola 1.1-1 Chapter 1 Introduction to Statistics 1-1 Review and Preview 1-2 Statistical Thinking 1-3

More information

Statistical Methods and Reasoning for the Clinical Sciences

Statistical Methods and Reasoning for the Clinical Sciences Statistical Methods and Reasoning for the Clinical Sciences Evidence-Based Practice Eiki B. Satake, PhD Contents Preface Introduction to Evidence-Based Statistics: Philosophical Foundation and Preliminaries

More information

Chapter Eight: Multivariate Analysis

Chapter Eight: Multivariate Analysis Chapter Eight: Multivariate Analysis Up until now, we have covered univariate ( one variable ) analysis and bivariate ( two variables ) analysis. We can also measure the simultaneous effects of two or

More information

Likert Scaling: A how to do it guide As quoted from

Likert Scaling: A how to do it guide As quoted from Likert Scaling: A how to do it guide As quoted from www.drweedman.com/likert.doc Likert scaling is a process which relies heavily on computer processing of results and as a consequence is my favorite method

More information

Lecture (chapter 1): Introduction

Lecture (chapter 1): Introduction Lecture (chapter 1): Introduction Ernesto F. L. Amaral January 17, 2018 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015. Statistics: A Tool for Social Research. Stamford:

More information

Types of Variables. Chapter Introduction. 3.2 Measurement

Types of Variables. Chapter Introduction. 3.2 Measurement Contents 3 Types of Variables 61 3.1 Introduction............................ 61 3.2 Measurement........................... 61 3.2.1 Nominal Scale of Measurement.............. 62 3.2.2 Ordinal Scale of

More information