
An Examination of the Quality and Utility of Interviewer Estimates of Household Characteristics in the National Survey of Family Growth

Brady T. West
Michigan Program in Survey Methodology
Institute for Social Research
University of Michigan - Ann Arbor
bwest@umich.edu

NSFG Survey Methodology Working Papers, April 2010

ABSTRACT

Effective methods for repairing nonresponse error are of primary interest to the field of survey methodology, given declining response rates in household surveys of nearly all formats. Post-survey methods for repairing nonresponse error rely on the presence of auxiliary variables on a sampling frame for both respondents and non-respondents, and much methodological work has shown that the best auxiliary variables for repairing nonresponse errors are related to both the survey variables of interest and response propensity. Unfortunately, auxiliary variables having these optimal properties are rare in survey research practice. In Cycle 7 of the National Survey of Family Growth (NSFG), female interviewers performing household screening operations were asked to record their best guesses as to whether there were children under the age of 15 in the household (35,258 guesses by 96 interviewers, prior to the screening questions), and whether the selected respondent was in a sexually active relationship with a member of the opposite sex (13,495 guesses by 94 interviewers, after the completed screening questions). Given that correct values on these two indicators can be derived from completed household listings and responses to the main NSFG interview, this study sought to examine the amount of error in the two interviewer estimates, and the associations of the interviewer estimates with both key NSFG variables and the propensity to respond to the main NSFG interview, given a completed screening interview. Several significant associations were found, suggesting that these interviewer estimates may be useful for repairing nonresponse errors. However, a small simulation study shows that the level of estimation error in the NSFG may have a negative impact on potential reductions in nonresponse error. The study concludes with a discussion of estimation techniques used by highly accurate interviewers and future research in this area aimed at improving the quality of the interviewer estimates.

INTRODUCTION

This paper presents an initial examination of the error properties associated with interviewer judgments of household characteristics in the National Survey of Family Growth (NSFG), and evaluates the utility of these paradata (Couper, 1998) for constructing post-survey nonresponse adjustments to NSFG estimates. The paper also simulates the implications of these errors for the bias properties of nonresponse adjustments, and reports the observational techniques used by interviewers who tend to be more accurate in their judgments.

Effective and inexpensive methods for repairing nonresponse error are of primary interest to the field of survey methodology, given declining response rates in large household surveys of nearly all formats (De Leeuw and De Heer, 2002). One relatively inexpensive method for repairing unit nonresponse errors is to make post-survey adjustments to base sampling weights (if applicable) by grouping respondents and nonrespondents into weighting classes, and adjusting the weights based on inverses of estimated response rates (or response propensities when using logistic regression modeling) within the weighting classes. This adjustment method, however, relies on auxiliary variables (or covariates) measured for both respondents and nonrespondents, and methodological work has shown that the best auxiliary variables for repairing nonresponse errors are related to both the survey variables of interest and response propensity (Groves, 2006; Little and Vartivarian, 2005). Further, gains in the precision of survey estimates are also possible when using auxiliary variables that are correlates of survey variables of interest (Kreuter et al., 2010; Little and Vartivarian, 2005). Unfortunately, auxiliary variables having these optimal properties are rare in survey research practice (Kreuter et al., 2010). As a result, large survey research programs have turned to the collection of paradata (Couper, 1998), or variables describing interviewer observations and other measurements about the survey data collection process, from both respondents and nonrespondents.
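To make the weighting-class idea concrete, the following minimal R sketch adjusts base weights by the inverse of the weighted response rate within each class. The toy data frame and the variable names (base_wt, resp, wt_class) are hypothetical and are not taken from the NSFG files; the sketch only illustrates the general form of the adjustment described above.

```r
# Minimal sketch of a weighting-class nonresponse adjustment (toy data).
dat <- data.frame(
  base_wt  = c(100, 100, 150, 150, 200, 200, 200, 250),
  resp     = c(1, 0, 1, 1, 0, 1, 1, 0),      # 1 = responded to the main interview
  wt_class = c("A", "A", "A", "B", "B", "B", "C", "C")
)

# Weighted response rate within each class: respondent base weight total
# divided by the total base weight in the class.
num <- tapply(dat$base_wt * dat$resp, dat$wt_class, sum)
den <- tapply(dat$base_wt, dat$wt_class, sum)
rr  <- num / den

# Nonresponse-adjusted weight: respondents carry the weight of the
# nonrespondents in their class; nonrespondents get weight zero.
dat$nr_adj_wt <- ifelse(dat$resp == 1, dat$base_wt / rr[dat$wt_class], 0)
dat
```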

For example, working with National Health Interview Survey (NHIS) data collected in 2006 and 2007 and NHIS sample families contacted at least once, Maitland et al. (2009) describe an analysis of such variables from the U.S. Census Bureau's Contact History Instrument (CHI), which are available on the NHIS paradata files. Analyzing both individual variables from the paradata files (measuring cooperation and contactability of the families) and factor scores derived from the variables, these authors showed that the variables tended to have much stronger correlations with survey participation than with the NHIS variables of interest. Given that the variables measured in the CHI are not specifically designed to measure health status but rather to manage field operations, variables having a stronger theoretical relationship with the survey variables of interest would seem to be of more importance when collecting paradata. Indeed, Maitland et al. presented evidence of a health-related variable in the available NHIS paradata (breaking off an interview for health reasons) having a stronger correlation with both survey participation and selected health variables measured in the survey than other theoretically unrelated CHI variables. This work suggested that collecting paradata on variables having a theoretical relationship with survey variables of interest would certainly be recommended for making nonresponse adjustments based on the paradata. Kreuter et al. (2010) present examples of other large survey research programs attempting to collect paradata from both respondents and nonrespondents on other theoretical correlates of key survey variables, including the European Social Survey (ESS) and the American National Election Study (ANES).

The Continuous National Survey of Family Growth (NSFG) is another example of a large survey research operation that has used paradata extensively for production and estimation work (Groves et al., 2009). Beginning in Cycle 7 of the NSFG (2006 to present), interviewers were requested during screening operations to estimate whether children under the age of 15 were present in the household and whether or not selected respondents were in sexually active relationships. These two variables were thus collected from both respondents and nonrespondents to the eventual main NSFG interview during household screening visits, as a part of a larger paradata-driven responsive survey design

(Groves and Heeringa, 2006). The two variables have the important property of being theoretically (assuming no measurement error) correlated with a variety of key variables in the NSFG. In fact, a recent study (Kreuter et al., 2010) has demonstrated that these variables have better correlations with NSFG survey variables of interest than similar paradata collected in four other large-scale personal interview surveys, and that these correlations can lead to moderate changes in NSFG estimates after applying nonresponse adjustments based on the auxiliary variables. The present study was motivated in part by previous attempts to use similar forms of paradata for making nonresponse adjustments in a national transportation survey (Yan and Raghunathan, 2007), the National Election Study (Peytchev and Olson, 2007), and the European Social Survey (Kreuter, Lemay and Casas-Cordero, 2007).

Unfortunately, the correlations of these interviewer observations with both response propensity and survey variables, along with the corresponding nonresponse adjustments, may be attenuated by measurement error in the auxiliary variables. Potential reductions in nonresponse error may not be realized if these variables are measured with too much error. The impact of measurement error in auxiliary variables on the bias of estimated regression coefficients in linear regression models has been well established (e.g., Fuller, 1987), and this bias carries over to logistic regression models used for response propensity modeling when making nonresponse adjustments based on predicted response propensities (Stefanski and Carroll, 1985). To date, only one study has directly examined the error associated with the measurement of these types of auxiliary variables (Groves et al., 2007), based on survey self-reports. These authors analyzed data from the first four quarters of Cycle 7 of the NSFG, and found only 70-80% accuracy on the sexual activity judgment, with evidence of over-estimation of sexual activity. One can therefore assume that the measurement error on these observations will not be negligible, especially given that the data are based on interviewer judgments.
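The attenuation that such measurement error can induce is easy to illustrate with simulated data. The short R sketch below (not based on NSFG data) generates a binary outcome related to a true binary auxiliary variable, misclassifies roughly a quarter of the auxiliary values to mimic interviewer estimation error, and compares the two fitted logistic regression coefficients; all variable names here are hypothetical.

```r
set.seed(42)
n <- 50000
x_true <- rbinom(n, 1, 0.7)                           # true binary auxiliary variable
y      <- rbinom(n, 1, plogis(-0.5 + 1.5 * x_true))   # outcome related to the true value

# Misclassify about 25% of the auxiliary values (error-prone observation).
flip  <- rbinom(n, 1, 0.25)
x_obs <- ifelse(flip == 1, 1 - x_true, x_true)

coef(glm(y ~ x_true, family = binomial))["x_true"]    # near the true coefficient of 1.5
coef(glm(y ~ x_obs,  family = binomial))["x_obs"]     # attenuated toward zero
```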

Two other, more indirect studies of measurement error in these types of observations, focusing on item-missing data and inter-rater reliability, also provide empirical support for the presence of errors (Casas-Cordero and Kreuter, 2008; Kreuter et al., 2007). No studies to date, however, have considered the impacts of these errors on the bias properties of subsequent nonresponse adjustments, and this study aims to extend the initial work of Groves et al. (2007) in this manner. The NSFG affords the opportunity to study this error and its implications for nonresponse adjustments in more detail, given that measures on these two variables are collected from respondents as a part of the main NSFG interview. The objectives of this study are to examine the amount of error associated with these types of observations, consider the implications of this error for nonresponse adjustments based on the paradata, present observation techniques used by the interviewers with the most accurate observations, and discuss methods that may be useful for reducing the measurement error in future investigations.

METHODS

Data

Data collected during the first 10 completed quarters of the NSFG (July 2006 through December 2008) were analyzed in this study, building on the work of Groves et al. (2007). Screening interviews are necessary in the NSFG to determine the eligibility of individuals in randomly selected households, given that the target population is non-institutionalized U.S. males and females aged 15-44. Additional details on the design of the NSFG, which has a primary goal of collecting nationally representative data on factors affecting birth and pregnancy rates, family formation, and the risks of HIV and other STDs, can be found elsewhere (Groves et al., 2009). Prior to the first face-to-face contact attempt with a randomly selected household for screening purposes, female interviewers [1] were instructed to first locate the household and then estimate whether the selected household contained any children under the age of 15 (yes / no).

[1] The NSFG does not employ male interviewers for data collection.

In the data set constructed for analyzing the amount of error in these observations for this study, there were a total of 35,258 observations on the presence of young children reported by 96 interviewers. For each of these observations, completed household roster information was available to determine whether children under the age of 15 were actually present in the household (observations with missing household roster information were deleted). There was certainly a possibility of error in the household enumeration process, but for the purposes of this study, completed household rosters were assumed to be correct. Immediately after the successful completion of the full screening questionnaire and the selection of a respondent from a household for the main interview, interviewers were asked to estimate whether the selected respondent was in a sexually active relationship with an opposite-sex partner (yes / no). There were a total of 13,490 judgments of sexual activity reported by 94 interviewers for which actual survey information on sexual activity was also available from the CAPI portion of a completed main interview. These numbers were reduced relative to the observations on young children because the true value for the variable indicating the presence of young children under the age of 15 could be measured after the household roster was completed, and did not require information from the main interview. Measurement error on the self-report of sexual activity collected in the main NSFG interview was also a real possibility, but was not considered further in this study.

These two interviewer judgments (the presence of children under age 15 and whether the selected respondent was sexually active) were therefore collected for both respondents and nonrespondents to the main NSFG interview request. For the purposes of making nonresponse adjustments for the main interview in practice, one would certainly prefer to use the correct household roster indicator for young children; in this study, error properties of the interviewer observation on this indicator and their implications for nonresponse adjustments were considered. In total, there were 15,044 completed screening interviews where these two observations were available for potential respondents and nonrespondents, and a main interview was either completed or not

completed (more than 4,600 cases with completed screening interviews that did not respond in the first 10 weeks of the quarter and were not randomly selected for the NSFG second phase were dropped). A small subset of the more accurate interviewers (based on the household rosters and the responses to the main survey) was later approached by the NSFG field supervisor and asked to describe, in an open-ended manner, the observational techniques that they tended to use for making their judgments.

In Quarters 1-10 of the NSFG, main interviews were completed by a total of 13,495 respondents. Five of these respondents had missing data on the variables necessary to determine a reported value of current sexual activity, resulting in the 13,490 sample persons with sufficient data for studying measurement error. Given the complex multistage sample design of the NSFG, base weights were necessary to offset unequal probabilities of selection, and these were included on the data file for each of the respondents. In addition, the NSFG includes a second phase, or double sample, operation in which a subsample of initial nonrespondents from the first 10 weeks of a quarter receives an alternative and more intensive data collection protocol aimed at boosting response rates for the quarter (Lepkowski et al., 2010). The base sampling weights for second phase respondents therefore required an adjustment for this subsampling. Sampling error codes produced by NSFG staff enabling complex sample variance estimation (Lepkowski et al., 2010) were also used for the design-based analyses presented in this study.

Survey variables collected in the main NSFG interview that were analyzed in this study included: 1) a binary indicator of whether the respondent had never been married; 2) a binary indicator of whether the respondent had ever had sex; 3) a binary indicator of whether the respondent had ever cohabitated with a partner; 4) the number of sexual partners in the past year; 5) for males, a count of biological children; and 6) for females, parity, or the number of live births. Male and female respondents to the main interview were coded as being sexually active if reporting one or more opposite-sex partners in the past 12 months. Female respondents were also asked about having a current opposite-sex partner, and this measure was used to indicate being sexually active if no information was

available on the number of partners in the past 12 months. This information was used to determine the amount of error in the interviewer judgments regarding the sexual activity of selected respondents.

Data Analysis

Simple two-way cross-tabulations and unweighted Kappa statistics were used to examine overall agreement of the binary interviewer judgments with actual binary measures collected from the household roster information and the survey data. Two gross difference rates (GDRs), measuring the proportion of observations by an interviewer on each variable that were discordant with actual values (e.g., Biemer, 2004, p. 229), were computed for each interviewer. One example of a discordant observation would be a sampled male whom the interviewer estimates to be sexually active but who does not report having a female sex partner in the past 12 months in the main NSFG interview. To examine associations of the two interviewer observations with propensity to respond to the main NSFG interview, two logistic regression models were fitted. These models used response to the main interview conditional on a completed screening interview as a binary dependent variable, the 15,044 successful screening interviews with interviewer observations available as the case base, the second phase sampling weight as a weight for each case (equal to 1 for respondents from the first phase), and a series of predictors identified as important in previous response propensity models for the main NSFG interview (Lepkowski et al., 2006) [2]. The first model considered all of the predictors of response propensity from previous work, including a variety of paradata. The second model added the interviewer judgments on sexual activity and young children as predictors, to analyze their independent ability to predict response to the main interview.

[2] In the final response propensity models used for computing nonresponse adjustments for the main NSFG interview in Cycle 7 (conditional on a completed screening interview), separate weighted logistic regression models were fitted in five strata defined by age group and the interviewer estimate of sexual activity (two age-group strata for cases judged not sexually active and three for cases judged sexually active) (Lepkowski et al., 2010). This stratification was performed after initial exploratory analyses found variance in response propensity by both age and the estimate of sexual activity, in addition to variance in the relationships of other predictors with response propensity across the strata. The main effect models considered in this study are for illustrative purposes only.
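A minimal R sketch of these agreement measures is shown below. It assumes a case-level data frame obs with hypothetical columns iwer_id, judgment (the 0/1 interviewer estimate), and truth (the 0/1 value from the roster or main interview); the kappa statistic is computed directly from the agreement table rather than with a dedicated package.

```r
# Unweighted Cohen's kappa from a 2 x 2 table of judgments vs. true values.
kappa_stat <- function(judgment, truth) {
  tab <- table(judgment, truth) / length(judgment)
  p_obs    <- sum(diag(tab))                     # observed agreement
  p_chance <- sum(rowSums(tab) * colSums(tab))   # agreement expected by chance
  (p_obs - p_chance) / (1 - p_chance)
}

# Overall agreement for one judgment (e.g., presence of children under 15).
kappa_stat(obs$judgment, obs$truth)

# Gross difference rate (GDR) for each interviewer: the proportion of that
# interviewer's judgments that are discordant with the true values.
gdr <- tapply(obs$judgment != obs$truth, obs$iwer_id, mean)
summary(gdr)   # spread of accuracy across interviewers
```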

Given that the relationships of both binary and count survey variables from the main NSFG interview with the two interviewer observations were of interest in this study, design-based logistic and Poisson regression models were fitted to the six different survey variables under consideration (e.g., Heeringa et al., 2010). Predictor variables in these models included the two interviewer observations, along with the same control variables used in the response propensity models (again to determine whether these two variables have independent predictive power, only now for the survey variables). These analyses represent an important first step in analyzing nonresponse bias (Maitland et al., 2009; Peytcheva and Groves, 2009), and assume that the associations of the survey variables with the interviewer observations are the same for both respondents and nonrespondents. The available NSFG data did not permit testing this assumption.

Finally, to examine the impact of including the interviewer judgments in the nonresponse adjustments on the estimated means and estimated variances for the key survey variables, design-based estimates of means and percentages on the six survey variables (and their standard errors) were then computed using the base weights, nonresponse-adjusted base weights with adjustments excluding the interviewer judgments, and nonresponse-adjusted base weights with adjustments including the interviewer judgments.
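A sketch of this kind of design-based comparison using the R survey package (which the paper notes was used for other analyses) appears below. The respondent file resp, the sampling error codes sest and secu, and the three weight variables are all hypothetical names standing in for the NSFG file structure.

```r
library(survey)

# One design object per weight variable; strata and cluster (PSU) identifiers
# come from the NSFG sampling error codes.
des_base <- svydesign(ids = ~secu, strata = ~sest, weights = ~base_wt,
                      data = resp, nest = TRUE)
des_nr   <- svydesign(ids = ~secu, strata = ~sest, weights = ~nr_wt_no_iw,
                      data = resp, nest = TRUE)
des_nriw <- svydesign(ids = ~secu, strata = ~sest, weights = ~nr_wt_iw,
                      data = resp, nest = TRUE)

# Design-based estimates (and standard errors) of one key variable under
# the three alternative sets of weights described above.
svymean(~never_married, des_base)
svymean(~never_married, des_nr)
svymean(~never_married, des_nriw)
```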

RESULTS

Overall Quality of the Interviewer Judgments

Considering first a comparison of the overall quality of the two interviewer judgments, interviewers appeared to be less accurate when judging whether children under the age of 15 were present in selected households (Table 1). Roughly 73% (i.e., 59.63% + 12.90%) of these judgments were correct based on the household roster information collected in the screening interviews (Kappa = 0.30), and among the errors, there was a slight tendency for more false negatives (15.00%) than false positives (12.47%).

Table 1: Case counts and overall percentages indicating the measurement error properties of interviewer judgments regarding the presence of children under the age of 15 in selected households (Quarters 1-10, Continuous NSFG)*

                                         Household Roster Indicator: Kids Age < 15
Interviewer Judgment: Kids Age < 15      No                 Yes                Totals
No                                       21,025 (59.63%)    5,289 (15.00%)     26,314 (74.63%)
Yes                                      4,395 (12.47%)     4,549 (12.90%)     8,944 (25.37%)
Totals                                   25,420 (72.10%)    9,838 (27.90%)     35,258 (100.00%)

* Note: Kappa Statistic = 0.30, 95% CI = (0.29, 0.31).

Interviewers had an easier time estimating sexual activity (Table 2), with overall accuracy approaching 78% (Kappa = 0.34). Results are presented both overall and by gender, to see if the accuracy of this judgment varied depending on the gender of the selected respondent. Roughly 79% of these judgments were accurate when considering selected female respondents (Kappa = 0.35), consistent with findings reported by Groves et al. (2007) based on the first four quarters of data collection from the Continuous NSFG. Accuracy of these judgments was slightly lower when considering male respondents (roughly 76%; Kappa = 0.32), and the Kappa statistics for males and females were not found to be significantly different. Errors on the sexual activity observations had a slightly higher tendency to be false positives, where interviewers guessed that selected respondents were sexually active when in fact they were not. These analyses assume that self-reports of sexual activity in the main NSFG interview (collected using CAPI) are accurate.

Table 2: Case counts and overall percentages indicating measurement error in interviewer judgments of whether selected respondents were sexually active, both overall and by gender of selected respondent (Quarters 1-10, Continuous NSFG)*

All Respondents
                                              Main NSFG Interview: Selected R Sexually Active
Interviewer Judgment: R Sexually Active       No                 Yes                 Totals
No                                            1,358 (10.07%)     1,290 (9.56%)       2,648 (19.63%)
Yes                                           1,703 (12.62%)     9,139 (67.75%)      10,842 (80.37%)
Totals                                        3,061 (22.69%)     10,429 (77.31%)     13,490 (100.00%)

Female Respondents
No                                            689 (9.37%)        668 (9.08%)         1,357 (18.45%)
Yes                                           858 (11.67%)       5,140 (69.88%)      5,998 (81.55%)
Totals                                        1,547 (21.03%)     5,808 (78.97%)      7,355 (100.00%)

Male Respondents
No                                            669 (10.90%)       622 (10.14%)        1,291 (21.04%)
Yes                                           845 (13.77%)       3,999 (65.18%)      4,844 (78.96%)
Totals                                        1,514 (24.68%)     4,621 (75.32%)      6,135 (100.00%)

* Notes: Overall Kappa Statistic = 0.34, 95% CI = (0.32, 0.35). Male Kappa Statistic = 0.32, 95% CI = (0.30, 0.35). Female Kappa Statistic = 0.35, 95% CI = (0.32, 0.37). Test for Equal Kappa Statistics: Chi-square(1) = 1.38, p = 0.24. Overall n = 13,490 (5 cases, or 1 female and 4 males, had missing data on the survey variable in the main NSFG interview).

Interviewer-Specific Measures of Quality

Figure 1 presents a scatter plot showing the association of the two GDRs computed for each interviewer. Each point corresponds to a single interviewer, and the gross difference rates (GDRs) for both interviewer observations define the horizontal and vertical axes. This plot allows for an examination of whether the same interviewer tends to do well on both estimation tasks. A weighted scatter plot smoother [3] was fitted to the GDRs, where the weights (and the sizes of the points in Figure 1) were proportional to the number of judgments on the presence of young children (i.e., a proxy for the number of screening interviews attempted by each interviewer). Figure 1 shows that there is not consistent evidence of interviewers doing poorly or doing well on both observations, as would be indicated by a linear association between the GDRs, and the Pearson correlation (r) of the two GDRs was not significant at the 5% level. Accuracy tended to vary depending on the judgment and the interviewer, suggesting that the same interviewer may be using different strategies to make the two observations.

[3] The survey package in the R software was used for this analysis.
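A weighted smoother of this kind can be sketched in R as follows. The paper used the survey package for the smoother; the sketch substitutes base loess with case weights as a simpler stand-in, and the data frame iwer with columns gdr_kids, gdr_sex, and n_kids_obs (per-interviewer GDRs and judgment counts) is hypothetical.

```r
# Scatter plot of per-interviewer GDRs, point sizes proportional to the
# number of judgments on children under 15 (a proxy for attempted screeners).
plot(iwer$gdr_kids, iwer$gdr_sex,
     cex = 3 * sqrt(iwer$n_kids_obs / max(iwer$n_kids_obs)),
     xlab = "GDR: children under 15 judgment",
     ylab = "GDR: sexual activity judgment")

# Weighted scatter plot smoother (loess with the judgment counts as weights).
fit <- loess(gdr_sex ~ gdr_kids, data = iwer, weights = n_kids_obs)
ord <- order(iwer$gdr_kids)
lines(iwer$gdr_kids[ord], fitted(fit)[ord])

# Pearson correlation of the two GDRs across interviewers.
cor.test(iwer$gdr_kids, iwer$gdr_sex)
```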

Figure 1: Scatter plot examining the association of interviewer gross difference rates (GDRs) on the judgment of sexual activity and the presence of children under age 15 in the household. Two interviewers with relatively low GDRs on both observations are highlighted with arrows, and a weighted smoother is fitted to the points. Sizes of points (weights) are based on the number of initial observations on children under 15 (representing attempted screening interviews).

Two interviewers with relatively low GDRs on both measures (indicating relatively high accuracy) are highlighted with arrows in Figure 1. One interviewer had 441 housing unit observations on the presence of children under 15, and was incorrect on only 65 of the observations (85% accuracy); this same interviewer also had 155 observations on sexual activity (after a completed screener) and was incorrect on only 27 of them (83% accuracy). The second interviewer had 1,323 housing unit observations on the presence of children under 15, and was incorrect on 307 of them (77% accuracy); this same interviewer also had 487 observations on sexual activity and was incorrect on only 71 of them (85% accuracy). In contrast, one of the more poorly performing interviewers on

both measures was incorrect on 113 out of 261 young children observations (57% accuracy), and 32 out of 96 sexual activity observations (67% accuracy). Figure 1 shows evidence of interviewer variance in the accuracy of these two observations, which suggests that interviewers may vary in terms of their observational strategies. Variance in accuracy between interviewers may also arise as a function of the difficulty of the PSU being worked by an interviewer (e.g., urban areas without yards might make it harder to see children's toys). If the errors in these observations are in fact having a negative impact on subsequent nonresponse adjustments based on the judgments, then methods for reducing the discrepancies in accuracy between interviewers certainly require additional research. A later section will consider some of the observational techniques used by the more accurate interviewers.

Associations of Interviewer Judgments with Response Propensity

For an auxiliary variable to be effective in constructing nonresponse adjustments, it must first have a significant association with response propensity. Table 3 presents estimates of the odds ratios (along with 95% confidence intervals for the odds ratios) in the two logistic regression models predicting propensity to respond to the main NSFG interview, conditional on a completed screening interview. The first set of estimates (Model 1) is from the fitted model excluding the two interviewer judgments, while the second set of estimates (Model 2) is from the second model including the two interviewer judgments.

Table 3: Main interview response propensity modeling results, showing significant predictors of response propensity in models excluding (Model 1) and including (Model 2) the interviewer judgments (Continuous NSFG, Quarters 1-10)

Predictor                                              Model 1 95% CI      Model 2 95% CI
Physical Impediments to HH                             (1.076, 1.488)      (1.086, 1.502)
Call Number                                            (0.875, 0.894)      (0.874, 0.893)
Number of Contacts                                     (1.499, 1.684)      (1.480, 1.663)
Black Respondent                                       (0.921, 1.282)      (0.911, 1.268)
Quarter 1                                              (0.416, 0.665)      (0.429, 0.689)
Quarter 2                                              (0.667, 1.108)      (0.675, 1.123)
Quarter 3                                              (0.969, 1.613)      (0.985, 1.645)
Quarter 4                                              (0.980, 1.638)      (1.014, 1.699)
Quarter 5                                              (1.112, 1.855)      (1.167, 1.953)
Quarter 6                                              (0.858, 1.410)      (0.886, 1.453)
Quarter 7                                              (0.732, 1.171)      (0.796, 1.276)
Quarter 8                                              (0.863, 1.414)      (0.886, 1.453)
Quarter 9                                              (0.574, 0.904)      (0.574, 0.905)
High Estimated Main Interview Probability              (0.693, 1.052)      (0.690, 1.048)
Medium Estimated Main Interview Probability            (0.350, 0.481)      (0.353, 0.485)
Low Estimated Main Interview Probability               (0.154, 0.214)      (0.155, 0.216)
Age (youngest category)                                (1.350, 1.730)      (1.516, 1.965)
Age (middle category)                                  (1.242, 1.691)      (1.230, 1.676)
Urban Neighborhood                                     (0.924, 1.246)      (0.933, 1.260)
Single Household                                       (1.090, 1.529)      (1.210, 1.708)
Interviewer Non-White                                  (0.862, 1.201)      (0.810, 1.131)
Bilingual Interviewer                                  (1.016, 1.312)      (0.984, 1.274)
East Region                                            (0.499, 0.701)      (0.487, 0.686)
Midwest Region                                         (0.911, 1.280)      (0.882, 1.242)
West Region                                            (1.028, 1.420)      (1.084, 1.503)
<10% Black, <10% Hispanic Population in Segment        (0.673, 0.992)      (0.681, 1.004)
>10% Black, <10% Hispanic Population in Segment        (0.697, 1.054)      (0.731, 1.107)
<10% Black, >10% Hispanic Population in Segment        (0.522, 0.761)      (0.516, 0.752)
All Housing Units in Segment Residential               (1.041, 1.299)      (1.035, 1.292)
Interviewer Has Safety Concerns                        (1.038, 1.358)      (1.004, 1.315)
Case Part of Second Phase Sample                       (0.241, 0.306)      (0.239, 0.304)
Interviewer Estimates R Sexually Active                --                  (1.405, 1.837)
Interviewer Estimates Children Under 15 in HH          --                  (1.113, 1.415)

Sample Size                                            15,044              15,044

Reference Categories: Quarter 10; Estimated Main Interview Probability Missing; Age 30-44; Region South; Domain >10% Black, >10% Hispanic Population in Segment.

The estimated odds ratios in these two models show that there are several strong predictors of propensity to respond to the main NSFG interview, with most reflecting theoretical expectations. For example, respondents having received more calls have significantly lower odds of responding to the main interview. One type of paradata that the interviewers are asked to collect is an estimate of the probability that a selected respondent will complete the main interview, and relative to missing observations on this variable (treated as a discrete category), a sample line with a low estimated probability assigned by an interviewer has more than 80% lower odds of completing the main interview.

Of particular interest are the estimates in the second model, where the two interviewer judgment variables are added to the initial model. Both of these variables are significant independent predictors of response propensity: controlling for the other predictors, respondents estimated to be sexually active have 61% higher odds of completing the main interview, while respondents in households estimated to have children under 15 have 26% higher odds of completing the main interview. The overall fit of the model improves when these two predictors are added, with an increase in the area under the curve (AUC). However, it is important to note that predicted response propensities for the 13,495 respondents based on the two models have a very high correlation (0.989), suggesting that impacts on estimates from using one set of response propensities or another for nonresponse adjustments may be relatively minor. We note that the estimated odds ratios that change the most from Model 1 to Model 2 are those for the youngest age category and single households. This suggests that the impacts of these predictors on response propensity may become stronger when adjusting for the interviewer judgments.

Given the possibility of differential measurement error in these two judgments across interviewers, sensitivity of the relationships of these two judgments with response

propensity to the presence of particular interviewers was also examined. Specifically, the second response propensity model in Table 3 was re-fitted multiple times, each time excluding one of the interviewers. This jackknife-type approach was used to examine the range of estimated odds ratios for the two interviewer judgments. The resulting range of estimated odds ratios for sexual activity was (1.532, 1.669), and the resulting range of estimated odds ratios for the presence of young children was (1.222, 1.306). Further, in all cases, the design-based 95% confidence intervals did not include a value of 1. The results indicate that the relationships of these two judgments with response propensity were not heavily influenced by particular interviewers. Collectively, these results suggest that these two variables may be independently useful for the construction of nonresponse adjustments. It remains important to determine whether the two interviewer judgments are also predictive of key survey variables when controlling for the same predictors.
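A leave-one-interviewer-out check of this kind can be sketched in R as follows. The screener-level data frame scr, its column names (resp_main, phase2_wt, judge_sex, judge_kids, iwer_id, and a few illustrative control variables), and the reduced predictor set are all hypothetical; the paper's actual models were design-based and included the full set of Table 3 predictors.

```r
# Illustrative right-hand side; the production model used many more predictors.
f <- resp_main ~ judge_sex + judge_kids + call_number + n_contacts

# Refit the weighted propensity model once per interviewer, dropping that
# interviewer's cases, and keep the odds ratio for the sexual activity judgment.
or_sex <- sapply(unique(scr$iwer_id), function(id) {
  sub <- scr[scr$iwer_id != id, ]
  fit <- glm(f, family = quasibinomial(), data = sub, weights = phase2_wt)
  exp(coef(fit)["judge_sex"])          # judge_sex assumed to be a 0/1 indicator
})

range(or_sex)   # compare with the reported range of (1.532, 1.669)
```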

Associations of Interviewer Judgments with Key NSFG Variables

Table 4 indicates the results of design-based tests of significance for the two interviewer judgments as predictors of the six NSFG variables measured in the main interview. The tests indicate whether or not the regression parameters associated with the two predictors in the six regression models are significantly different from zero when controlling for the other predictors in the response propensity models (the same control variables from Table 3). The estimates of the parameters indicate the directions of the relationships of the two interviewer judgments with either the log-odds of a binary outcome being equal to 1 or the log-mean of the count outcomes (from a Poisson regression model).

Table 4: Design-based estimates of regression parameters (with standard errors) for the two interviewer judgments as predictors of six NSFG variables of interest, contrasted with estimates when the true values are used as predictors*

NSFG Survey Variable (Outcome)                              Judgment: Children Under 15    Judgment: Sexual Activity
                                                            Estimate (SE) [true value]     Estimate (SE) [true value]
Never Been Married (binary, n = 13,495)                     (0.10) [-0.94 (0.13)]          (0.20) [-2.56 (0.21)]
Ever Had Sex (binary, n = 13,495)                           0.18 (0.11) [0.39 (0.17)]      1.70 (0.14) [23.07 (0.24)]
Ever Cohabitated (binary, n = 13,495)                       0.01 (0.10) [0.12 (0.09)]      0.81 (0.13) [2.18 (0.12)]
Number of Biological Children (count, males, n = 6,139)     0.20 (0.05) [0.59 (0.06)]      0.44 (0.11) [1.37 (0.18)]
Number of Sexual Partners in Past Year (count, n = 12,468)  (0.03) [-0.09 (0.04)]          0.21 (0.06) [5.09 (0.82)]
Parity (count, females, n = 7,356)                          0.31 (0.04) [1.16 (0.11)]      0.49 (0.12) [1.09 (0.10)]

* Note: Parameter estimates for the other control variables listed in Table 3 are not shown for each dependent variable. Parameter estimates for the variables containing the true values of sexual activity and presence of young children are displayed in [brackets].

The results in Table 4 clearly show that the interviewer judgment of sexual activity is a strong correlate of these six NSFG survey variables when controlling for the other predictors included in the response propensity models (Table 3). Respondents estimated to be sexually active have significantly lower odds of never having been married, significantly higher odds of ever having had sex, significantly higher odds of ever having cohabitated, significantly more biological children on average (males), significantly more sexual partners in the past year, and significantly more live births (females), all when controlling for the other predictors from Table 3. The interviewer judgment of whether there are children under 15 in the household is not as strong a correlate of these six outcomes, suggesting that its utility may be more limited when constructing nonresponse adjustments. Particularly striking in the Table 4 results are the estimates of these relationships had the variables measuring true values for sexual activity and presence of children under 15 in the household been used in the models instead of the interviewer judgments. The severe attenuation of these relationships due to the measurement error in

the interviewer judgments is clearly evident, which will more than likely impact the effectiveness of nonresponse adjustments based in part on the judgments.

Impacts of Nonresponse Adjustments on Survey Estimates

Collectively, the results in Tables 3 and 4 suggest that the interviewer judgment of sexual activity might be a candidate for an auxiliary variable to be used in constructing nonresponse adjustments for the main NSFG interview, whether using response propensity models or developing weighting classes. At present, this variable is used in computing the final nonresponse adjustments for the initial Continuous NSFG data release (Quarters 1-10), and other work has demonstrated the changes in estimates that are possible given strong correlations of these kinds of auxiliary variables with both response propensity and several key survey variables (Kreuter et al., 2010). Table 5 presents estimates of percentages or means (including design-based estimates of standard errors) on the six key NSFG variables, using three alternative weights: the base weights without nonresponse adjustments, the base weights with nonresponse adjustments based on predicted response propensities excluding the interviewer judgments ("No IW"), and the base weights with nonresponse adjustments based on predicted response propensities including the interviewer judgments ("IW").

Table 5: Impacts of alternative nonresponse adjustments on NSFG estimates (design-based standard errors reported in parentheses)*

Variable (Estimate)                 Base Weights Only    Nonresponse-Adjusted       Nonresponse-Adjusted
                                                         Base Weights, No IW        Base Weights, IW
Never Married (%)                   49.74% (1.31)        49.60% (2.51)              49.34% (2.70)
Had Sex (%)                         84.89% (0.99)        87.01% (1.08)              86.87% (1.12)
Ever Cohabitated (%)                48.92% (1.57)        49.95% (2.65)              49.88% (2.83)
Males: # Biological Kids (Mean)     1.30 (0.07)          1.31 (0.16)                1.30 (0.17)
# Partners in Past Year (Mean)      1.17 (0.02)          1.16 (0.02)                1.16 (0.02)
Females: Parity (Mean)              1.27 (0.05)          1.27 (0.05)                1.26 (0.05)

* Notes: these estimates do not incorporate post-stratification factors and do not represent final estimates based on Quarters 1-10 of NSFG Cycle 7. n = 13,495 for all analyses.
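The response-propensity form of the adjustment behind the "IW" column can be sketched as follows. The sketch reuses the hypothetical screener file scr and variable names introduced earlier and fits a single unstratified model; the production NSFG adjustments were computed from stratified, design-based models (see footnote [2]), so this only illustrates the mechanics.

```r
# Fit a response propensity model that includes the two interviewer judgments.
prop_fit <- glm(resp_main ~ judge_sex + judge_kids + call_number + n_contacts,
                family = quasibinomial(), data = scr, weights = phase2_wt)

# Predicted propensity of completing the main interview for every screened case.
scr$p_hat <- predict(prop_fit, type = "response")

# Nonresponse-adjusted weight: respondents' base weights are inflated by the
# inverse of the predicted propensity; nonrespondents leave the analysis file.
scr$nr_wt_iw <- ifelse(scr$resp_main == 1, scr$base_wt / scr$p_hat, NA)
```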

The three estimates presented in Table 5 for each of the six NSFG variables suggest that nonresponse adjustments including the two interviewer observations are not having a substantial impact on the estimates. Adjustments based on the additional interviewer judgments move the estimated percentage that have never been married slightly farther down, and the nonresponse adjustments appear to increase the estimate of the percentage that have ever had sex, with the estimate based on the interviewer judgments slightly lower than the estimate without. Given the results in Table 4, these findings are not terribly surprising, as the measurement error in the interviewer judgments appears to be attenuating the true relationships of these auxiliary variables with the key survey variables (and thus reducing the potential reductions in bias and variance from using the auxiliary variables to make nonresponse adjustments). It is worth noting the increases in the variance of the estimates when applying the nonresponse adjustments, relative to the changes in the estimates: there does not appear to be a favorable bias-variance tradeoff, with the changes in variance being much larger than the changes in the estimates (with the exception of the estimate of the percentage that have ever had sex). In theory, stronger correlations of the interviewer judgments with the survey variables should help to reduce variance (Little and Vartivarian, 2005), but the measurement error in the judgments appears to be preventing this. The overall lack of difference in the estimates with and without nonresponse adjustments could be due to the relatively high main interview response rate in the NSFG (81%, conditional on a completed screener).

Implications of Measurement Error for Nonresponse Adjustments

The results presented thus far indicate that nonresponse adjustments incorporating the two interviewer judgments are having only a minimal impact on NSFG estimates, despite apparent associations of the judgments with both response propensities and the survey variables of interest. An important open question remains, especially given the results in Table 4: what impact are the errors in the interviewer judgments having on the effectiveness of these nonresponse adjustments?

To initially examine possible theoretical implications of measurement error in the interviewer judgments of sexual activity on the bias and variance properties of subsequent nonresponse adjustments, a small simulation study was performed using real NSFG data. A hypothetical population was defined by the N = 7,355 female respondents to the main NSFG interview in Quarters 1-10, and the data set for this population included both the interviewer judgments of sexual activity and the actual reports of sexual activity from the main NSFG interview. The base sampling weights and the NSFG survey variables measuring parity and number of partners in the past year were also included in the population data file for the simulations, allowing for an examination of how measurement error in the interviewer judgments can attenuate relationships between sexual activity and these two survey variables.

In each of six simulations (three for each survey variable), one hundred (100) probability proportionate to size (PPS) samples of size n = 500, with size measures for females representing inverses of the NSFG base weights, were selected from this artificial population. This size variable has no relationship with either survey variable in the population, meaning that the sampling can be considered non-informative (for these variables). A small number of the largest NSFG base weights were trimmed to the 95th percentile of the weights to enable the PPS selection. New base weights were then computed for each simulated sample based on the probabilities of selection. Unit nonresponse was simulated for each of the 100 samples based on the following logistic regression model, motivated by actual NSFG outcomes [4]:

Pr(response_i) = exp(report.sexually.active_i) / (1 + exp(report.sexually.active_i))

[4] See Table 3, where the estimated coefficient for the sexual activity judgment in the response propensity model was ln(1.607) = 0.47, and was likely attenuated toward zero by the measurement errors in these judgments.

A sampled case denoted by i had values on the two survey variables deleted if a random draw from a UNIFORM(0,1) distribution was not less than or equal to the probability

computed above. The simulated probability of response was thus a function of the reported sexual activity for case i (1 = yes, 0 = no), and not the interviewer judgment. For each simulated sample, a logistic regression model was fitted to a response indicator, with a given sexual activity measure (reported or judged) and the sampling weight as predictors, and the inverse of the predicted probability based on this model was used to adjust the base sampling weight for nonresponse. Given known means on the two variables for the artificial population, the empirical bias, root mean squared error (RMSE), and 95% confidence interval coverage of simulated design-based estimates using a) the base weights only, b) nonresponse-adjusted base weights using the reported sexual activity values, and c) nonresponse-adjusted base weights using the interviewer judgments were computed.

Based on the artificial population of N = 7,355 females with available values on both the interviewer judgments of sexual activity and self-reported sexual activity, Table 6 presents simple differences in population means between sexually active and sexually inactive females (according to each potential auxiliary variable) on the NSFG variables measuring parity and number of partners in the past year. For example, the population mean on number of partners in the past year for sexually active women (according to survey reports) is 1.29, compared to a population mean on number of partners for sexually inactive women of 0.00 (by definition). The corresponding mean for women judged by the interviewers to be sexually active is lower, at 1.09.

Table 6: Differences in population means on parity and number of partners in the past year as a function of sexual activity for the artificial NSFG population, by measure of sexual activity

                               Self-Reported Sexual Activity        Interviewer Judgment of Sexual Activity
NSFG Variable                  Active        Inactive               Active        Inactive
Parity
Partners in the Past Year      1.29          0.00                   1.09
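One replicate of the simulation just described can be sketched in R as below. The population frame pop and its column names (base_wt, report_active, judge_active, parity) are hypothetical stand-ins for the artificial population file, PPS selection is done with replacement for simplicity, and the propensity model is an ordinary logistic regression, so the sketch illustrates the logic rather than reproducing the paper's exact procedure.

```r
one_rep <- function(pop, n = 500) {
  # PPS size measure: inverse of the base weight, after trimming the largest
  # weights to the 95th percentile.
  wt   <- pmin(pop$base_wt, quantile(pop$base_wt, 0.95))
  size <- 1 / wt
  idx  <- sample(nrow(pop), n, replace = TRUE, prob = size / sum(size))
  samp <- pop[idx, ]
  samp$samp_wt <- sum(size) / (n * size[idx])   # new base weight from the selection probability

  # Simulated nonresponse: Pr(response) = exp(report)/(1 + exp(report)),
  # so reported sexual activity (not the judgment) drives response.
  samp$resp <- as.integer(runif(n) <= plogis(samp$report_active))

  # Propensity adjustment using either the reported or the judged measure.
  adj_mean <- function(x_var) {
    f   <- reformulate(c(x_var, "samp_wt"), response = "resp")
    fit <- glm(f, family = binomial, data = samp)
    w   <- samp$samp_wt / fitted(fit)
    weighted.mean(samp$parity[samp$resp == 1], w[samp$resp == 1])
  }

  c(base_only  = weighted.mean(samp$parity[samp$resp == 1],
                               samp$samp_wt[samp$resp == 1]),
    adj_report = adj_mean("report_active"),
    adj_judge  = adj_mean("judge_active"))
}

set.seed(2010)
# results <- t(replicate(100, one_rep(pop)))   # 100 simulated samples
```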

The results in Table 6 clearly show the attenuation in the relationship between sexual activity and these two survey variables that is introduced by the measurement error in the interviewer judgments. Much stronger associations are evident when using the self-reported values of sexual activity, which suggests that nonresponse adjustments based on the interviewer judgments eliminate some of the within-group homogeneity that would be possible if the judgments were closer to the reported values. This would have important implications for nonresponse adjustments based on weighting classes.

Table 7 summarizes the results of this small simulation study using real NSFG data, showing the empirical performance (across 100 simulated samples) of the three potential estimators of the mean on each NSFG variable: the estimator using base weights only, the estimator with a nonresponse adjustment to the base weights based on self-reported sexual activity, and the estimator with a nonresponse adjustment to the base weights based on the interviewer judgment of sexual activity.

Table 7: Results of the simulation study, showing the empirical relative bias of estimators with and without nonresponse adjustments based on the interviewer judgments

NSFG Variable                 Nonresponse Adjustment Method    Auxiliary Variable for Nonresponse Adjustment    Empirical Bias (Rel. %)
Parity                        None                             --                                               3.36%
Parity                        Response Propensity              Self-reported Sexual Activity                    0.09%
Parity                        Response Propensity              Interviewer Judgment of Sexual Activity          2.20%
Partners in Past 12 Months    None                             --                                               5.28%
Partners in Past 12 Months    Response Propensity              Self-reported Sexual Activity                    0.17%
Partners in Past 12 Months    Response Propensity              Interviewer Judgment of Sexual Activity          3.92%
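Given a matrix of replicate estimates like the one produced by one_rep() above, bias and RMSE summaries of the kind reported in Table 7 can be computed along the following lines (coverage would additionally require a standard error for each replicate, which the sketch above does not produce); true_mean here is simply the known mean of the artificial population.

```r
# `results` is assumed to be the 100 x 3 matrix from t(replicate(100, one_rep(pop))).
true_mean <- mean(pop$parity)                      # known population mean

bias     <- colMeans(results) - true_mean          # empirical bias of each estimator
rel_bias <- 100 * abs(bias) / true_mean            # relative bias, in percent
rmse     <- sqrt(colMeans((results - true_mean)^2))

round(cbind(rel_bias, rmse), 3)
```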

The results in Table 7 suggest that use of the interviewer judgment on sexual activity as an auxiliary variable when constructing the nonresponse adjustments (rows 3 and 6) attenuates potential reductions in both bias and variance under the response propensity weighting method, relative to adjustments using the true self-reported values of sexual activity (rows 2 and 5). The bias of the resulting estimates (when using the interviewer judgments) is similar to (and slightly lower than) that found when analyzing the cases without any adjustments to the base weights for nonresponse under the defined nonresponse mechanism (rows 1 and 4). For example, the relative bias of the complete-case estimate of the mean on parity is 3.36%. When using the true sexual activity to make a nonresponse adjustment, the relative bias is 0.09%, but when using the interviewer judgment of sexual activity, the relative bias is 2.20%. There is also evidence of higher empirical RMSEs in the estimates (compared to use of the base weights only) when using the interviewer judgments, in contrast to the lower empirical RMSEs found when using the true measures. This suggests that potential reductions in the variance of estimates are being attenuated as well by using the error-prone judgments; in fact, the estimates based on nonresponse adjustments using the judgments have the highest RMSE for both variables. Coverage and confidence interval width do not appear to be affected by the use of the interviewer judgments (measured with error) for making the nonresponse adjustments.

Observational Techniques Used by Accurate Interviewers

Given the negative implications of error in the sexual activity judgments for nonresponse adjustments found in the small simulation study above, methods for minimizing error in the observations certainly warrant investigation. Five of the active NSFG interviewers who appeared to be performing well on both observations after the first 10 quarters of data collection (see Figure 1) were contacted by the NSFG field supervisor and asked about the techniques that they used in making their observations (given their relatively high accuracy rates). In some cases, follow-up conversations were had over the telephone to clarify messages. The following techniques and visual cues were

identified by these interviewers regarding their observations on the presence of young children:

- Examining the furniture outside the house
- Presence of baby strollers, outdoor toys / shoes in the yard or on the porch, evidence of stickers / crayons / miscellaneous kids' decorations
- Looking inside any open curtains / blinds for baby blankets and baby furniture
- Firefighter stickers indicating the presence of children
- Bikes on the porch or in the yard / swing sets or trampolines in the yard
- Boxes for baby wipes or diapers
- Looking inside cars in the driveway for booster seats or toys in the backseats
- Looking inside open garages for toys or other child equipment
- Basketball hoop in the driveway
- Candy wrappers around the doorway
- Listening for sounds of children

The following techniques and visual cues were identified by these same five interviewers regarding the estimation of sexual activity:

- Considering the physical appearance of the selected respondent and others in the household (conservatively dressed?)
- Teenage respondents that are said by parents to not come home after school, hang out with friends, or are frequently involved in sports activities
- Teenage respondents that are more reserved and shy (not sexually active) vs. teenage respondents that are more interested in the content of the survey
- Gut feelings about the selected respondent
- Presence of children, even if only a single adult lives in the household
- Two cars parked at the household
- A nicely manicured lawn and yard, indicating the presence of a male in the household

- Upscale neighborhood, which indicates two incomes and an increased likelihood of being sexually active
- Well-maintained residences in lower income neighborhoods usually indicate older couples who are not likely to be sexually active
- Beer cans and garbage on the porch in a lower income area tend to indicate single males with no females living in the household (not sexually active)

Collectively, these essentially anecdotal techniques and visual cues provide interesting insights into the observation methods used in the field by the more accurate interviewers. Although all of the NSFG interviewers are trained to record these judgments based on their initial impressions and best guesses, these interviewers tended to be very aware of the area surrounding each household and were able to pick up on a variety of evidence and visual cues that lead to more accurate observations on these two variables. Unfortunately, similar conversations were not attempted with poorly performing interviewers at the same time. Future study of these observational techniques among both accurate and inaccurate interviewers is certainly warranted in an effort to reduce errors in these observations.

DISCUSSION

This study represents one of the first detailed examinations of the measurement error properties of interviewer observations in a national survey using in-person interviewing, and of the implications of those errors for post-survey nonresponse adjustments. Analyses of data collected in the first ten quarters of the Continuous NSFG indicated that female interviewers tended to be 70-80% accurate overall when estimating household characteristics, namely whether children under the age of 15 were present in a household and whether selected respondents were sexually active. Female interviewers did not appear to be consistently accurate or inaccurate on both observations, and a large amount of interviewer variance in accuracy was found, suggesting that different interviewers may be using different observational techniques that vary in their effectiveness (e.g., consistent guessing of "Yes" for sexual activity may overestimate the

proportion of sample units that are sexually active). However, we cannot rule out the possibility that accuracy may have depended on the difficulty of the primary sampling unit being worked by a given interviewer, which would likely impact the housing unit observations on the presence of children (e.g., urban areas may not have yards where toys could be noticed). In the NSFG design, the majority of the interviewers are assigned to one PSU, introducing confounding between interviewer and PSU. More detailed examinations of factors that influence the accuracy of these types of judgments are certainly needed.

The two interviewer judgments were found to have strong relationships with both main interview response propensity and NSFG survey variables of interest when controlling for the relationships of other paradata and information collected in a screening interview, even though these relationships appeared to be attenuated due to the measurement error in the judgments. This makes the two judgments attractive candidates for auxiliary variables to be used in post-survey nonresponse adjustments. However, nonresponse adjustments to base sampling weights incorporating the two interviewer judgments were not found to significantly impact key estimates relative to nonresponse adjustments excluding the two observations. There could be a number of reasons for this finding, including the measurement error in the judgments and the relatively high response rate for the main NSFG interview conditional on a completed screening interview (81%). The NSFG is also unique in that it collects a large amount of paradata on sample units during the screening process while operating under a responsive survey design framework (Groves and Heeringa, 2006). Some of these variables may have been correlated with the two interviewer judgments (e.g., safety concerns, physical impediments, primarily residential neighborhood, single-person household, age, etc.), and the use of too much paradata when constructing nonresponse adjustments may introduce multicollinearity concerns in the models used to compute the adjustments. Alternative methods of using the interviewer judgments to repair nonresponse errors were also certainly possible (e.g., multiple imputation analysis for individual survey items with the interviewer judgments as predictors in the imputation models, selection of only predictors having a significant

Most importantly, the possibility exists that errors in the judgments may have limited the effectiveness of the nonresponse adjustments based on the judgments. Results from a small simulation study indicated that the amount of error in the NSFG interviewer judgments of sexual activity (where 20-30% of observations are inaccurate) may be attenuating the effectiveness of nonresponse adjustments based on the judgments, relative to nonresponse adjustments based on the sexual activity reported by respondents to the main NSFG interview ("true values"). This initial simulation study may provide one explanation for the lack of impact that nonresponse adjustments based on the two observations were having on NSFG estimates: measurement error at these levels or higher in similar auxiliary variables could eliminate a large portion of the reductions in nonresponse bias that would be possible if interviewers had access to the true values of the variables they are trying to observe. The possible impact of measurement error on alternative nonresponse adjustment techniques needs more research focus as well.

Given the potentially detrimental effects of measurement error in these observations on the effectiveness of nonresponse adjustments, future research in this area needs to examine predictors of accuracy among the interviewers and study observational techniques associated with reduced error. In this study, five of the interviewers found to have the highest accuracy on the two judgments were queried about the techniques that they employ in the field when making the observations. A wide variety of observational techniques and important visual cues were provided by these interviewers, and regular assessment of the techniques used by the more accurate interviewers could lead to improved training programs aimed at reducing error in these judgments. The NSFG has started to request that interviewers provide open-ended justifications for all of their judgments, and these data will be analyzed qualitatively and linked with interviewer accuracy in future work. In addition, NSFG interviewers in the most recent quarter were provided with information regarding observable predictors of sexual activity that would be available to them at the time of making a judgment for a selected respondent, and the effectiveness of this modification to the NSFG protocol is currently being evaluated.
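The attenuation mechanism suggested by the simulation study described above can be illustrated with a minimal sketch: a binary auxiliary variable related to both a survey outcome and response propensity is used to form weighting classes, first without error ("true values") and then with 20-30% of its values misclassified. The population parameters, error rates, and variable roles below are illustrative assumptions, not the actual NSFG simulation design.

```python
# Minimal sketch of measurement error attenuating a weighting-class
# nonresponse adjustment; all parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

def adjusted_mean(y, r, x_aux):
    """Weighting-class adjustment: respondents weighted by the inverse of the
    response rate within classes defined by the (possibly error-prone) auxiliary."""
    w = np.zeros(len(y))
    for cls in (0, 1):
        in_cls = x_aux == cls
        w[in_cls] = 1.0 / r[in_cls].mean()
    resp = r == 1
    return np.average(y[resp], weights=w[resp])

x_true = rng.random(n) < 0.55                                     # true auxiliary status
y = (rng.random(n) < np.where(x_true, 0.70, 0.30)).astype(float)  # outcome related to x
r = (rng.random(n) < np.where(x_true, 0.85, 0.70)).astype(int)    # response related to x

full_mean = y.mean()
unadjusted_bias = y[r == 1].mean() - full_mean

for error_rate in (0.0, 0.2, 0.3):
    flip = rng.random(n) < error_rate                 # misclassify this share of cases
    x_obs = np.where(flip, ~x_true, x_true).astype(int)
    remaining_bias = adjusted_mean(y, r, x_obs) - full_mean
    print(f"error rate {error_rate:.0%}: remaining bias {remaining_bias:+.4f} "
          f"(unadjusted bias {unadjusted_bias:+.4f})")
```

Under illustrative assumptions like these, the adjustment based on the error-free auxiliary removes nearly all of the nonresponse bias, while 20-30% misclassification leaves a noticeably larger share of it in place, consistent with the attenuation pattern reported above.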

The ongoing work in this area being conducted by NSFG staff aims to identify effective observational techniques that are associated with reduced error in these judgments.

The collection of interviewer judgments on auxiliary variables having theoretical relationships with key survey variables provides a potentially useful tool for making post-survey nonresponse adjustments. After factoring in the costs of incorporating training on effective techniques for making accurate judgments of selected characteristics into general interviewer training sessions, the collection of interviewer judgments in the field is a relatively inexpensive method of data collection with many potential benefits. These ideas extend beyond the NSFG; for example, the American National Election Study (ANES) has interviewers in the field note whether political signs were present for selected households. Other surveys may also benefit from having interviewers collect information on features of households that are relevant to a particular survey's content. However, survey researchers need to consider the potential measurement error involved in the collection of these observations and judgments, and the impact of that measurement error on the effectiveness of post-survey nonresponse adjustments. The additional time and cost required to train interviewers in these techniques and to have the interviewers record observations may not be warranted if the level of error in the collected data has detrimental effects on nonresponse adjustments.

REFERENCES

Biemer, P. (2004). Chapter 12: Modeling Measurement Error to Identify Flawed Questions. In Methods for Testing and Evaluating Survey Questionnaires, edited by Presser et al. Wiley.

Casas-Cordero, C. and Kreuter, F. (2008). Assessing Interviewer Observation of Neighborhood Characteristics for Nonresponse Adjustments. Paper presented at the International Conference on Survey Methods in Multinational, Multiregional, and Multicultural Contexts (3MC), Berlin, Germany, June 28.

Couper, M.P. (1998). Measuring Survey Quality in a CASIC Environment. Paper presented at the Joint Statistical Meetings of the American Statistical Association, Dallas, TX.

de Leeuw, E., and de Heer, W. (2002). Trends in Household Survey Nonresponse: A Longitudinal and International Comparison. Chapter 3 in Groves, R.M. et al. (Eds.), Survey Nonresponse. Wiley.

Fuller, W. (1987). Chapter 1: A Single Explanatory Variable. In Measurement Error Models. Wiley.

Groves, R.M., and Heeringa, S.G. (2006). Responsive Design for Household Surveys: Tools for Actively Controlling Survey Errors and Costs. Journal of the Royal Statistical Society, Series A, 169, Part 3.

Groves, R.M., Mosher, W.D., Lepkowski, J., and Kirgis, N.G. (2009). Planning and Development of the Continuous National Survey of Family Growth. National Center for Health Statistics. Vital and Health Statistics, 1(48).

Groves, R., Wagner, J., and Peytcheva, E. (2007). Use of Interviewer Judgments About Attributes of Selected Respondents in Post-Survey Adjustments for Unit Nonresponse: An Illustration with the National Survey of Family Growth. Proceedings of the Section on Survey Research Methods, Joint Statistical Meetings, Salt Lake City, UT.

Heeringa, S.G., West, B.T., and Berglund, P.A. (2010). Applied Survey Data Analysis. Chapman and Hall / CRC Press.

Kreuter, F., Lemay, M., and Casas-Cordero, C. (2007). Using Proxy Measures of Survey Outcomes in Post-Survey Adjustments: Examples from the European Social Survey (ESS). Proceedings of the Section on Survey Research Methods, Joint Statistical Meetings, Salt Lake City, UT.

Kreuter, F., Olson, K., Wagner, J., Yan, T., Ezzati-Rice, T.M., Casas-Cordero, C., Lemay, M., Peytchev, A., Groves, R.M., and Raghunathan, T.E. (2010). Using Proxy Measures and Other Correlates of Survey Outcomes to Adjust for Nonresponse: Examples from Multiple Surveys. Journal of the Royal Statistical Society, Series A, 173, Part 2.

Lepkowski, J.M. et al. (2010). The National Survey of Family Growth: Sample Design and Analysis of a Continuous Survey. Vital and Health Statistics, Series 2, No. 150, forthcoming (see also Lepkowski et al., NSFG Series 2 report from Cycle 6).

Little, R.J., and Vartivarian, S. (2005). Does Weighting for Nonresponse Increase the Variance of Survey Means? Survey Methodology, 31(2).

Maitland, A., Casas-Cordero, C., and Kreuter, F. (2009). An Evaluation of Nonresponse Bias Using Paradata from a Health Survey. Proceedings of the Section on Government Statistics, Joint Statistical Meetings, Washington, D.C.

Peytchev, A. and Olson, K. (2007). Using Interviewer Observations to Improve Nonresponse Adjustments: NES. Proceedings of the Section on Survey Research Methods, Joint Statistical Meetings, Salt Lake City, UT.

Peytcheva, E. and Groves, R.M. (2009). Using Variation in Response Rates of Demographic Subgroups as Evidence of Nonresponse Bias in Survey Estimates. Journal of Official Statistics, 25.

Stefanski, L.A., and Carroll, R.J. (1985). Covariate Measurement Error in Logistic Regression. The Annals of Statistics, 13(4).

Yan, T. and Raghunathan, T. (2007). Using Proxy Measures of the Survey Variables in Post-Survey Adjustments in a Transportation Survey. Proceedings of the Section on Survey Research Methods, Joint Statistical Meetings, Salt Lake City, UT.
