Measuring noncompliance in insurance benefit regulations with randomized response methods for multiple items

Similar documents
Centre for Education Research and Policy

Running head: NESTED FACTOR ANALYTIC MODEL COMPARISON 1. John M. Clark III. Pearson. Author Note

Conceptualising computerized adaptive testing for measurement of latent variables associated with physical objects

Latent Trait Standardization of the Benzodiazepine Dependence Self-Report Questionnaire using the Rasch Scaling Model

Model fit and robustness? - A critical look at the foundation of the PISA project

Technical Specifications

Using the Testlet Model to Mitigate Test Speededness Effects. James A. Wollack Youngsuk Suh Daniel M. Bolt. University of Wisconsin Madison

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Using the Rasch Modeling for psychometrics examination of food security and acculturation surveys

Adjusting for mode of administration effect in surveys using mailed questionnaire and telephone interview data

linking in educational measurement: Taking differential motivation into account 1

A Comparison of Several Goodness-of-Fit Statistics

Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study

THE NATURE OF OBJECTIVITY WITH THE RASCH MODEL

Item Analysis: Classical and Beyond

Description of components in tailored testing

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories

Influences of IRT Item Attributes on Angoff Rater Judgments

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

Overview. Survey Methods & Design in Psychology. Readings. Significance Testing. Significance Testing. The Logic of Significance Testing

Parameter Estimation with Mixture Item Response Theory Models: A Monte Carlo Comparison of Maximum Likelihood and Bayesian Methods

Catherine A. Welch 1*, Séverine Sabia 1,2, Eric Brunner 1, Mika Kivimäki 1 and Martin J. Shipley 1

Scaling TOWES and Linking to IALS

Statistics for Social and Behavioral Sciences

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data

Motivation: Fraud Detection

Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

UvA-DARE (Digital Academic Repository)

Controlled Trials. Spyros Kitsiou, PhD

Mantel-Haenszel Procedures for Detecting Differential Item Functioning

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests

Meta-Analysis of Randomized Response Research

Extending Rungie et al.'s model of brand image stability to account for heterogeneity

INTRODUCTION TO ITEM RESPONSE THEORY APPLIED TO FOOD SECURITY MEASUREMENT. Basic Concepts, Parameters and Statistics

Models in Educational Measurement

CONSTRUCTION OF THE MEASUREMENT SCALE FOR CONSUMER'S ATTITUDES IN THE FRAME OF ONE-PARAMETRIC RASCH MODEL

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

Lecture Notes Module 2

Dichotomizing partial compliance and increased participant burden in factorial designs: the performance of four noncompliance methods

Bayesian hierarchical modelling

André Cyr and Alexander Davies

educational assessment and educational measurement

INVESTIGATING FIT WITH THE RASCH MODEL. Benjamin Wright and Ronald Mead (1979?) Most disturbances in the measurement process can be considered a form

GMAC. Scaling Item Difficulty Estimates from Nonequivalent Groups

Basic concepts and principles of classical test theory

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

SECTION 504 NOTICE OF PARENT/STUDENT RIGHTS IN IDENTIFICATION/EVALUATION, AND PLACEMENT

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

Randomized Experiments with Noncompliance. David Madigan

[3] Coombs, C.H., 1964, A theory of data, New York: Wiley.

Title: Determinants of intention to get tested for STI/HIV among the Surinamese and Antilleans in the Netherlands: results of an online survey

Instrumental Variables I (cont.)

Multilevel Latent Class Analysis: an application to repeated transitive reasoning tasks

AP Psychology -- Chapter 02 Review Research Methods in Psychology

Social Research (Complete) Agha Zohaib Khan

False Discovery Rates and Copy Number Variation. Bradley Efron and Nancy Zhang Stanford University

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Assessing Measurement Invariance in the Attitude to Marriage Scale across East Asian Societies. Xiaowen Zhu. Xi'an Jiaotong University.

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests

Validating Measures of Self Control via Rasch Measurement. Jonathan Hasford Department of Marketing, University of Kentucky

THE APPLICATION OF ORDINAL LOGISTIC HIERARCHICAL LINEAR MODELING IN ITEM RESPONSE THEORY FOR THE PURPOSES OF DIFFERENTIAL ITEM FUNCTIONING DETECTION

CRITICALLY APPRAISED PAPER (CAP)

Missing data. Patrick Breheny. April 23. Introduction Missing response data Missing covariate data

CHAMP: CHecklist for the Appraisal of Moderators and Predictors

Psychometric properties of the PsychoSomatic Problems scale an examination using the Rasch model

GENERALIZABILITY AND RELIABILITY: APPROACHES FOR THROUGH-COURSE ASSESSMENTS

Evaluating the quality of analytic ratings with Mokken scaling

The Psychometric Development Process of Recovery Measures and Markers: Classical Test Theory and Item Response Theory

PSYCHOLOGY 300B (A01) One-sample t test

Cover Page. The handle holds various files of this Leiden University dissertation

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

TECHNICAL REPORT. The Added Value of Multidimensional IRT Models. Robert D. Gibbons, Jason C. Immekus, and R. Darrell Bock

Type of intervention Secondary prevention and treatment. Economic study type Cost-effectiveness analysis.

Computerized Adaptive Testing With the Bifactor Model

Likelihood Ratio Based Computerized Classification Testing. Nathan A. Thompson. Assessment Systems Corporation & University of Cincinnati.

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis

Table of Contents. Preface to the third edition xiii. Preface to the second edition xv. Preface to the fi rst edition xvii. List of abbreviations xix

Formulating and Evaluating Interaction Effects

Issues That Should Not Be Overlooked in the Dominance Versus Ideal Point Controversy

Item Response Theory (IRT): A Modern Statistical Theory for Solving Measurement Problem in 21st Century

Selection of Linking Items

References. Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,

Using the Distractor Categories of Multiple-Choice Items to Improve IRT Linking

Confidence Intervals On Subsets May Be Misleading

RAG Rating Indicator Values

Multilevel IRT for group-level diagnosis. Chanho Park Daniel M. Bolt. University of Wisconsin-Madison

Quality of Life. The assessment, analysis and reporting of patient-reported outcomes. Third Edition

COMMITTEE FOR PROPRIETARY MEDICINAL PRODUCTS (CPMP) POINTS TO CONSIDER ON MISSING DATA

The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size

Challenges of Observational and Retrospective Studies

Type I Error Rates and Power Estimates for Several Item Response Theory Fit Indices

A Bayesian Nonparametric Model Fit statistic of Item Response Models

Measuring mathematics anxiety: Paper 2 - Constructing and validating the measure. Rob Cavanagh Len Sparrow Curtin University

Validation of an Analytic Rating Scale for Writing: A Rasch Modeling Approach

Classical Psychophysical Methods (cont.)

Clinical trials with incomplete daily diary data

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Transcription:

Measuring noncompliance in insurance benefit regulations with randomized response methods for multiple items

Ulf Böckenholt (1) and Peter G.M. van der Heijden (2)

(1) Faculty of Management, McGill University, 1001 Sherbrooke Street West, Montreal, QC H3A 1G5, Canada
(2) Department of Methodology and Statistics, Utrecht University, P.O. Box 80.140, 3508 TC Utrecht, The Netherlands

Abstract: Randomized response (RR) is a well-known method for measuring sensitive behavior, yet it is not often applied. Two possible reasons are (i) its lower efficiency and the resulting need for larger sample sizes, which makes applications of RR expensive, and (ii) the notion that in many applications the RR design may not be followed by every respondent ("cheating"). This paper addresses the efficiency problem by proposing item response theory (IRT) models for the analysis of multivariate RR data. In these models a person parameter is estimated on the basis of multiple measures of the sensitive behavior under study, which yields a more efficient and more powerful analysis of individual differences than is available from univariate RR data. Cheating in an RR study is approached by introducing additional mixture components into the IRT models, with one component consisting of respondents who answer truthfully and the other components consisting of respondents who do not provide truthful responses to all items or to a subset of them. The resulting IRT model is applied to data from a Dutch survey conducted among recipients of disablement insurance benefit (DIB), who were interviewed about their compliance with rules that are a prerequisite for receiving DIB.

Keywords: randomized response; item response theory; cheating; sensitive behavior; efficiency.

1 Introduction

In many RR studies, respondents are asked multiple questions about one or more domains. For example, in the 2002 surveys conducted in the Netherlands on social security regulation infringements of the Occupational Disability Insurance Act, the Unemployment Insurance Act and the National Social Security Assistance Act, each social security recipient was asked nine randomized response questions about their compliance with these regulations. The following four questions focused on health-related issues: (1) Have you been told by your physician about a reduction in your disability symptoms without reporting this improvement to your social welfare agency? (2) On your last spot check by the social welfare agency, did you pretend to be in poorer health than you actually were? (3) Have you personally noticed any recovery from your disability complaints without reporting it to the social welfare agency? (4) Have you felt for some time now substantially stronger, healthier and able to work more hours, without reporting any improvement to the social welfare agency?

Clearly, these questions are ordered according to the degree of intentional violation of the regulations. A person who does not report the outcome of a medical investigation may also avoid reporting any personally noticed improvement of their health status. In contrast, persons who notice personal improvements may or may not misreport their health status. Item response models (van der Linden and Hambleton, 1997) are well suited for studying how individuals differ in their compliance behavior by ordering respondents on a latent continuum that represents their level of compliance.

Although there is much empirical support indicating that RR methods increase the number of honest responses, there is no guarantee that all respondents provide truthful answers (see van der Heijden et al., 2000). Some respondents might violate the rules set out by the RR procedure. Here the forced-choice (FC) response format is used: respondents are asked to throw two dice, to answer "yes" when the outcome is 2, 3 or 4, to answer "no" when the outcome is 11 or 12, and to answer truthfully when the outcome is between 5 and 10. A typical rule violation is to answer "no" whatever the outcome of the dice (compare van den Hout and van der Heijden, 2004; Clark and Desharnais, 1998). To accommodate such response behavior, an extension of the item response approach is presented that explicitly allows for a response bias, in the sense that it can capture a possible tendency of respondents to give a "no" response regardless of the outcome of the randomizing device. These respondents are captured by a latent class that can be identified by an extreme use of "no" responses.
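
The two-dice forced-choice design fixes the misclassification probabilities by construction. As a quick arithmetic check, the following Python sketch (our illustration, not the authors' code; the function names are ours) derives these probabilities and the mapping between the true endorsement probability and the probability of an observed "yes":

```python
from fractions import Fraction

# Distribution of the sum of two fair dice: 36 equally likely ordered pairs.
counts = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        counts[d1 + d2] = counts.get(d1 + d2, 0) + 1

p_forced_yes = Fraction(sum(counts[s] for s in (2, 3, 4)), 36)  # 6/36 = 1/6
p_forced_no = Fraction(sum(counts[s] for s in (11, 12)), 36)    # 3/36 = 1/12
p_truthful = 1 - p_forced_yes - p_forced_no                     # 27/36 = 3/4

def p_yes_observed(p_true):
    """Probability of an observed "yes" given the true endorsement probability."""
    return p_forced_yes + p_truthful * p_true

def p_true_from_observed(p_obs):
    """Moment-based inversion, as used in classical single-item RR analyses."""
    return (p_obs - p_forced_yes) / p_truthful

print(p_forced_yes, p_forced_no, p_truthful)   # 1/6 1/12 3/4
print(float(p_yes_observed(Fraction(1, 10))))  # ~0.242 for a 10% true rate
```

The moment-based inversion at the end is the classical single-item estimator; the models developed below replace it with a likelihood defined over all items jointly.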

2 RR Models for Multiple Items

We distinguish three classes of RR models for multiple items. The first class assumes that respondents are homogeneous in their compliance behavior and have a fixed probability of answering each item; this is the classical RR model, and it serves as a benchmark for the models proposed next. The second class relaxes the homogeneity assumption and allows for individual variability in compliance across the behaviors under study. The third class considers the possibility that a subset of respondents do not follow the randomization instructions and answer "no" regardless of the outcome of the randomization device.

When all respondents have the same probability of endorsing an item, it is convenient to express the probability of answering affirmatively by the logistic function

\Pr(x_{ij} = 1) = \Pr(\gamma_j) = \frac{1}{1 + \exp(\gamma_j)}.

Under random sampling of the respondents, the likelihood function of the homogeneous response model can then be written as

\prod_{i=1}^{N} \prod_{j=1}^{J} \left[ \tfrac{1}{6} + 0.75 \Pr(\gamma_j) \right]^{x_{ij}} \left[ 1 - \left( \tfrac{1}{6} + 0.75 \Pr(\gamma_j) \right) \right]^{1 - x_{ij}},  (1)

where 1/6 is the probability of a forced "yes" and 0.75 is the probability of a truthful answer. Clearly, the assumption that all respondents have an equal probability of answering an item is too strong in most applications, although it is the standard assumption in single-item RR studies.
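
To make likelihood (1) concrete, here is a minimal Python sketch (ours, not the paper's; the simulated sample size and generating values are arbitrary) that evaluates the homogeneous-model log-likelihood and recovers the item locations by maximum likelihood:

```python
import numpy as np
from scipy.optimize import minimize

P_FORCED_YES, P_TRUTHFUL = 1 / 6, 0.75

def neg_loglik_homogeneous(gamma, X):
    """Negative log-likelihood of model (1); X is an (N, J) 0/1 response matrix."""
    p_true = 1.0 / (1.0 + np.exp(gamma))        # Pr(gamma_j): true endorsement prob.
    p_yes = P_FORCED_YES + P_TRUTHFUL * p_true  # observed "yes" probability per item
    return -np.sum(X * np.log(p_yes) + (1 - X) * np.log(1 - p_yes))

# Illustration on simulated data; the generating values below are arbitrary.
rng = np.random.default_rng(1)
gamma_true = np.array([3.8, 3.1, 2.6, 1.9])
p_yes_true = P_FORCED_YES + P_TRUTHFUL / (1.0 + np.exp(gamma_true))
X = rng.binomial(1, p_yes_true, size=(2000, 4))

fit = minimize(neg_loglik_homogeneous, x0=np.zeros(4), args=(X,), method="BFGS")
print(np.round(fit.x, 2))  # close to gamma_true, up to sampling error
```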

The second class of models assumes that the associations among the responses to multiple items are caused by a person-specific compliance parameter. Because the number of items in an RR study is typically small, we adopt the Rasch (1980) model to measure individual differences in compliance behavior. Under this model, the probability that item j is answered affirmatively by person i can be written as

\Pr(x_{ij} = 1) = \Pr(\gamma_j, \theta_i) = \frac{1}{1 + \exp(-(\theta_i - \gamma_j))},

where γ_j is the item location parameter. Typically, the person parameter θ_i is specified to vary according to a normal distribution. Under the forced-choice response format, the item response model needs to be modified to account for the randomization effect. In this case the likelihood function can be written as

\prod_{i=1}^{N} \int \prod_{j=1}^{J} \left[ \tfrac{1}{6} + 0.75 \Pr(\gamma_j, \theta) \right]^{x_{ij}} \left[ 1 - \left( \tfrac{1}{6} + 0.75 \Pr(\gamma_j, \theta) \right) \right]^{1 - x_{ij}} f(\theta; \mu, \sigma)\, d\theta,  (2)

where f(θ; µ, σ) is the normal density with parameters µ and σ. Note that the mean µ of the population distribution cannot be estimated independently of the item locations; in the reported application we therefore set µ = 0. It is worth stressing that the normality assumption may not always be appropriate in RR studies and that other distributional forms should be considered to capture more closely the noncompliance variability in the population of interest.
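
Likelihood (2) requires integrating over the normal person distribution; a standard way to approximate this integral is Gauss-Hermite quadrature. The sketch below is again ours, not the authors' implementation; parameterizing σ on the log scale and the 41-node rule are our choices:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss  # probabilists' Hermite rule
from scipy.special import logsumexp

P_FORCED_YES, P_TRUTHFUL = 1 / 6, 0.75

def neg_loglik_rasch_rr(params, X, n_nodes=41):
    """Negative marginal log-likelihood of model (2) with theta ~ N(0, sigma^2)."""
    J = X.shape[1]
    gamma, sigma = params[:J], np.exp(params[J])   # sigma kept positive via log scale
    nodes, weights = hermegauss(n_nodes)           # weight function exp(-t^2 / 2)
    log_w = np.log(weights) - 0.5 * np.log(2.0 * np.pi)  # renormalize to N(0, 1)
    theta = sigma * nodes                          # rescale nodes to N(0, sigma^2)

    # p_yes[q, j]: observed "yes" probability at quadrature node q for item j
    p_true = 1.0 / (1.0 + np.exp(gamma[None, :] - theta[:, None]))
    p_yes = P_FORCED_YES + P_TRUTHFUL * p_true

    # ll_iq[i, q] = log f(x_i | theta_q); then marginalize over the nodes
    ll_iq = X @ np.log(p_yes).T + (1.0 - X) @ np.log(1.0 - p_yes).T
    return -np.sum(logsumexp(ll_iq + log_w[None, :], axis=1))
```

Minimizing this over (γ, log σ) with a general-purpose optimizer corresponds to fitting Model (2) with µ = 0.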

The third class of models allows for the possibility that not all respondents comply with the randomized response format, instead providing a "no" response regardless of the question asked. Combined with the item response model given by (2), the likelihood function is specified as

\prod_{i=1}^{N} \left( \pi \int \prod_{j=1}^{J} \left[ \tfrac{1}{6} + 0.75 \Pr(\gamma_j, \theta) \right]^{x_{ij}} \left[ 1 - \left( \tfrac{1}{6} + 0.75 \Pr(\gamma_j, \theta) \right) \right]^{1 - x_{ij}} f(\theta; \mu, \sigma)\, d\theta + (1 - \pi) \prod_{j=1}^{J} \Pr(\text{No})^{1 - x_{ij}} \left[ 1 - \Pr(\text{No}) \right]^{x_{ij}} \right),  (3)

where π denotes the probability that a randomly sampled person answers the questions according to the FC mechanism and Pr(No) is the probability of a "no" response in the noncompliant class. In the reported application, we specify that participants who answer "no" regardless of the question asked give this response with probability 1; it is straightforward to relax this assumption and to estimate the probability of a "no" response from the data. The crucial assumption of (3) is that members of the "no" group do not provide any information about the item location and discrimination parameters.
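
Likelihood (3) adds the mixing weight π to the quadrature computation above. Here is a sketch under the paper's assumption Pr(No) = 1 (again our illustrative code, not the authors' implementation; π is parameterized on the logit scale, matching the quantity reported in Table 1 below):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.special import expit, logsumexp

P_FORCED_YES, P_TRUTHFUL = 1 / 6, 0.75

def neg_loglik_mixture(params, X, n_nodes=41):
    """Negative log-likelihood of model (3): Rasch-RR class plus "no"-sayers.

    params = (gamma_1, ..., gamma_J, log sigma, logit pi). With Pr(No) fixed
    at 1, the "no"-sayer class has positive likelihood only for all-zero rows.
    """
    J = X.shape[1]
    gamma, sigma, pi = params[:J], np.exp(params[J]), expit(params[J + 1])

    nodes, weights = hermegauss(n_nodes)
    log_w = np.log(weights) - 0.5 * np.log(2.0 * np.pi)
    theta = sigma * nodes

    p_true = 1.0 / (1.0 + np.exp(gamma[None, :] - theta[:, None]))
    p_yes = P_FORCED_YES + P_TRUTHFUL * p_true
    ll_iq = X @ np.log(p_yes).T + (1.0 - X) @ np.log(1.0 - p_yes).T
    f_compliant = np.exp(logsumexp(ll_iq + log_w[None, :], axis=1))

    all_no = (X.sum(axis=1) == 0)                  # consistent "no" patterns
    return -np.sum(np.log(pi * f_compliant + (1.0 - pi) * all_no))
```

Minimizing this function with a general-purpose optimizer yields estimates of the item locations, σ, and ln(π/(1 − π)), the quantities reported for Model (3) in Table 1 below.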

3 Data Analysis

The aim of the study and the RR design have been described above. We note that 44% of all respondents provide "no" responses to all four items. The homogeneous model requires the estimation of four item location parameters and yields a goodness-of-fit statistic of G² = 123.8 with 11 d.f. Clearly, the assumption of no individual differences does not agree with the data. This result is supported by the fit improvement obtained from Model (2): with one additional parameter, the variance σ² of the normal distribution, Model (2) provides a major fit improvement (G² = 23.4 with 10 d.f.). However, despite the better fit, this model does not describe the data satisfactorily. The main reason for the misfit is that (2) greatly underestimates the frequency of consistent "no" responses to all four items. Model (3) addresses this problem by allowing for the possibility that some respondents select the "no" response for reasons that are unrelated to the compliance parameter θ. The resulting fit improvement supports this specification (G² = 14.3 with 9 d.f.).

Table 1 contains the corresponding parameter estimates of the three models. We note that the standard errors of Model (1) are too small, since this model does not reflect the dependencies among the four responses. In contrast, Model (2) strongly overestimates the degree of heterogeneity in the data, since it tries to fit the large percentage of "no" responses to all four items. Model (3) yields a much reduced but still substantial estimate of the population standard deviation (σ̂ = 2.15). Its estimated mixing logit, ln(π̂/(1 − π̂)) = 2.07, corresponds to π̂ ≈ 0.88, so about 196 respondents, or 12%, are classified as consistent "no"-sayers. For the remaining 88% of the respondents the item locations are ordered but lie far above the mean of the population distribution; consequently, the probability of a positive response to any of the four items is low at the mean of the population distribution.

TABLE 1. Parameter estimates (and standard errors) of RR models for multiple items

Parameter          Model (1)     Model (2)     Model (3)
γ̂_1                3.77 (.56)    9.10 (2.74)   4.56 (.88)
γ̂_2                3.07 (.30)    8.44 (2.73)   3.99 (.80)
γ̂_3                2.58 (.20)    7.61 (2.72)   3.42 (.67)
γ̂_4                1.94 (.13)    5.83 (2.01)   2.63 (.53)
σ̂                  --            4.72 (1.61)   2.15 (.47)
ln(π̂ / (1 − π̂))    --            --            2.07 (.34)

By taking into account that about 12% of the respondents give a "no" response without providing information about their actual compliance behavior, Model (3) yields more accurate estimates of the rate of noncompliance in the population. Under Model (1) the estimated percentages of noncompliant respondents for the four items are 2.2%, 4.5%, 7.0% and 12.5%, respectively; under Model (3) the corresponding estimates are 5.2%, 7.7%, 11.0% and 17.0%. These differences are substantial and demonstrate the value of the proposed models for the analysis of RR data.

References

Clark, S.J. and Desharnais, R.A. (1998). Honest answers to embarrassing questions: detecting cheating in the randomized response model. Psychological Methods, 3, 160-168.

Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press. (Original work published 1960, Copenhagen: The Danish Institute of Educational Research.)

van der Linden, W.J. and Hambleton, R.K. (Eds.) (1997). Handbook of modern item response theory. New York: Springer.

van den Hout, A. and van der Heijden, P.G.M. (2004). The analysis of multivariate misclassified data with special attention to randomized response data. Sociological Methods and Research, 32, 310-336.

van der Heijden, P.G.M., van Gils, G., Bouts, J. and Hox, J. (2000). A comparison of randomized response, CASAQ, and direct questioning; eliciting sensitive information in the context of welfare and unemployment benefit. Sociological Methods and Research, 28, 505-537.