A Differential Item Functioning (DIF) Analysis of the Self-Report Psychopathy Scale. Craig Nathanson and Delroy L. Paulhus

University of British Columbia

Poster presented at the 1st biannual meeting of the Society for the Scientific Study of Psychopathy, Vancouver, BC, Canada. Please do not cite without prior permission.

ABSTRACT

Differential item functioning (DIF) occurs when an item is influenced by a variable irrelevant to the construct of interest. Recent investigations of DIF in the Psychopathy Checklist-Revised (PCL-R) indicated that several PCL-R items demonstrated significant DIF. These findings spurred the current investigation of DIF in a measure of subclinical psychopathy closely tied to the PCL-R, namely a 40-item version of the Self-Report Psychopathy Scale (SRP). Participants (N = 383) completed the 40-item measure, and we investigated these items for DIF across genders and ethnicities. Results indicated that the majority of SRP items were free of DIF. Only two items demonstrated significant DIF, one between genders and the other between ethnicities. Given the paucity of SRP items with DIF (only 5%), such a finding may have been obtained by chance. Accordingly, we do not recommend dropping or modifying these items. Rather, our results suggest that users of the SRP should feel confident that its items function equivalently across genders and ethnicities. Future studies of differential test functioning (DTF) are also discussed.

Introduction

Differential item functioning (DIF), a notion rooted in item response theory, occurs when a given item is influenced by an irrelevant variable (Zumbo, 1999). One common form of DIF, akin to a main effect in ANOVA, is overestimation (often called uniform DIF): after being matched on trait level, individuals from one group are on average more likely to endorse a particular item than are those in another group. More complex and less common is so-called non-uniform DIF: akin to an interaction in ANOVA, non-uniform DIF is said to occur when the between-group difference in the likelihood of endorsing the item is inconsistent across levels of the trait.

To date, several studies have observed DIF in the items of the Psychopathy Checklist-Revised (PCL-R; Hare, 2003). For example, items on the Social Deviance factor (Factor 2) tended to display DIF more than items on the Affective/Interpersonal factor (Factor 1). Specifically, Factor 2 items tended to overestimate psychopathy in male offenders compared to female offenders (Bolt, Hare, Vitale, & Newman, 2004) and in African American offenders compared to Caucasian offenders (Cooke, Kosson, & Michie, 2001).

These findings spurred us to investigate DIF in a measure of subclinical psychopathy. We felt that the most appropriate measure to investigate was the recently developed 40-item version of the Self-Report Psychopathy Scale (SRP; Williams, Nathanson, & Paulhus, 2003), for two reasons. First, the SRP is historically linked to the PCL-R. Second, the SRP has been explicitly modeled after the factor structure of psychopathy (Williams et al., 2003).

The current study explored the extent to which the items on the 40-item Self-Report Psychopathy Scale demonstrated differential item functioning. In particular, we compared the genders and the two major ethnic groups at our university, namely students of European vs. East Asian heritage.
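The contrast between the two forms of DIF can be illustrated with a small sketch. It assumes a simplified binary (endorse / not endorse) logistic model rather than ordinal Likert items, and the function names `endorse_prob` and `group_gap`, as well as every coefficient value, are hypothetical choices made only to make the two patterns visible.

```python
import math

def endorse_prob(trait, group, b_group=0.0, b_inter=0.0, b0=-3.0, b_trait=1.0):
    """Probability of endorsing an item under a binary logistic model.

    trait: matching variable (e.g., total score); group: 0 or 1.
    All coefficients are hypothetical illustration values.
    """
    logit = b0 + b_trait * trait + b_group * group + b_inter * trait * group
    return 1.0 / (1.0 + math.exp(-logit))

def group_gap(trait, **coefs):
    """Difference in endorsement probability (group 1 minus group 0) at one trait level."""
    return endorse_prob(trait, 1, **coefs) - endorse_prob(trait, 0, **coefs)

trait_levels = [1.0, 2.0, 3.0, 4.0, 5.0]

# Uniform DIF: group main effect only, so group 1 is favoured at every trait level.
uniform_gaps = [group_gap(t, b_group=0.8) for t in trait_levels]

# Non-uniform DIF: group-by-trait interaction, so the gap changes sign across levels.
nonuniform_gaps = [group_gap(t, b_group=1.5, b_inter=-0.5) for t in trait_levels]

for t, u, n in zip(trait_levels, uniform_gaps, nonuniform_gaps):
    print(f"trait={t:.1f}  uniform gap={u:+.3f}  non-uniform gap={n:+.3f}")
```

In the uniform case the gap keeps the same sign at every trait level; in the non-uniform case it is positive at low trait levels and negative at high ones, which is the crossing pattern an interaction term produces.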

Method

Participants

Participants were 383 undergraduates from a large Western Canadian university. Sixty-five percent of participants were women. Fifty-two percent of participants were of East Asian heritage, 32% were of European heritage, and the remainder came from other ethnic heritages. All participants received course credit for participation.

Materials

As part of a larger laboratory study, participants completed a 40-item version of the Self-Report Psychopathy Scale (Williams et al., 2003). This measure requires participants to rate their agreement with each item on a 5-point Likert scale (1 = Strongly disagree to 5 = Strongly agree). Sample items include "I don't think of myself as tricky or sly" (reverse coded) and "I have stolen a motor vehicle." Total score was computed by averaging across all 40 items. Alpha reliability of SRP scores in this sample was .86.

Measuring DIF

To measure DIF in the SRP's items, we used the methods and criteria advocated by Bruno Zumbo and colleagues (Gelin & Zumbo, 2003; Slocum, Gelin, & Zumbo, 2003; Zumbo, 1999; see also Hidalgo & Lopez-Pina, 2004). For an item to demonstrate DIF, (1) it must be significantly affected by external, irrelevant variables, and (2) this effect must have at least a moderate effect size.

We conducted a series of ordinal logistic regressions on each item, in which item scores are regressed first on total score (Step 1), then on total score plus the demographic or group variable of interest (e.g., gender) (Step 2), and finally on those terms plus their interaction (Step 3). To determine whether an item demonstrates significant DIF, the difference in chi-squares (χ²) of Step 3 - Step 1 is compared against the 2 df, p < .01 cutoff of 9.21. This omnibus χ² functions like an omnibus F-test in ANOVA because it simultaneously tests for an interaction (non-uniform DIF) and a main effect (overestimation).

If the omnibus χ² is significant, Zumbo and colleagues recommend conducting analyses similar to those described above by computing χ² and R² differences for (A) Step 3 - Step 2 and (B) Step 2 - Step 1 (Slocum et al., 2003). If test (A) meets Zumbo and colleagues' criteria, the item is said to demonstrate non-uniform DIF (i.e., a significant interaction). If test (B) meets the criteria, the item is said to overestimate the trait for a particular group (i.e., a significant main effect).

Although statistical significance is necessary for an item to demonstrate DIF, it is not sufficient. Recall that Zumbo and colleagues (Zumbo, 1999; Gelin & Zumbo, 2003) indicate that an item demonstrates DIF only if the significant χ² has at least a moderate effect size. The omnibus measure of effect size is obtained by computing the difference in R² of Step 3 - Step 1, with R² differences of .035 to .070 considered moderate (Jodoin & Gierl, 2001). In cases where both criteria are met, it is useful to graph the item characteristic curves, which show the relationship between level of the trait (i.e., total score) and responses on a given item (i.e., item score) for each group (e.g., men and women).
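The two-part decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function names `chi2_sf` and `classify_dif` are hypothetical, the example inputs are made-up values, and the sketch assumes that the follow-up tests reuse the same significance level and the same .035 effect-size threshold as the omnibus test. It takes the χ² and R² differences as inputs and does not fit the ordinal logistic regressions themselves.

```python
import math

def chi2_sf(x, df):
    """Upper-tail p-value of a chi-square statistic, using closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def classify_dif(chi2_uniform, r2_uniform, chi2_nonuniform, r2_nonuniform,
                 alpha=0.01, r2_moderate=0.035):
    """Apply the two-part rule: significant omnibus test AND at least moderate effect size.

    chi2_uniform / r2_uniform: Step 2 - Step 1 differences (group main effect).
    chi2_nonuniform / r2_nonuniform: Step 3 - Step 2 differences (interaction).
    """
    # Nested-model differences add up, so the omnibus (Step 3 - Step 1)
    # statistics are the sums of the two follow-up differences.
    chi2_omnibus = chi2_uniform + chi2_nonuniform
    r2_omnibus = r2_uniform + r2_nonuniform

    if chi2_sf(chi2_omnibus, 2) >= alpha or r2_omnibus < r2_moderate:
        return "no DIF"

    # ASSUMPTION: follow-up tests reuse the same alpha and effect-size threshold.
    nonuniform = chi2_sf(chi2_nonuniform, 1) < alpha and r2_nonuniform >= r2_moderate
    uniform = chi2_sf(chi2_uniform, 1) < alpha and r2_uniform >= r2_moderate
    if nonuniform:
        return "non-uniform DIF"
    if uniform:
        return "uniform DIF (overestimation)"
    return "DIF (omnibus only)"

# The p < .01 cutoff for 2 df follows from solving exp(-x / 2) = .01.
cutoff_2df = -2.0 * math.log(0.01)
print(round(cutoff_2df, 2))  # 9.21

# Hypothetical inputs: a clear main effect, then a negligible effect.
print(classify_dif(12.0, 0.040, 1.0, 0.002))
print(classify_dif(3.0, 0.010, 1.0, 0.002))
```

The closed-form p-values avoid any dependency beyond the standard library; a statistics package would normally supply both the model fits and the χ² tail probabilities.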

Results

Gender

The vast majority of the 40 items on the SRP did not meet Zumbo and colleagues' criteria and therefore did not demonstrate DIF; an example is shown in Figure 1. Note that the two item characteristic curves in Figure 1 for the item "Not hurting others' feelings is important to me" (reverse coded) overlap very closely, suggesting no DIF (Slocum et al., 2003).

Only one item demonstrated DIF, namely "I am usually very careful about what I say to people" (reverse coded), with χ²(2) = 18.05, p < .01, and R² = .046. Subsequent analyses revealed that this item overestimated psychopathy and that there was no interaction: difference scores for Step 2 - Step 1 were χ²(1) = 15.77, p < .01, R² = .041, compared with χ²(1) = 2.28, R² = .005 for Step 3 - Step 2. As seen in Figure 2, this item consistently overestimated psychopathy in women.

Ethnicity

As with gender, the majority of the SRP items did not demonstrate DIF; the exceptions were four items. We suspected that the DIF observed here might be attributable to the large proportion of English-as-a-second-language students in our sample. We therefore excluded those students who indicated that English was not their first language, resulting in a subsample of n = 145 (70% European heritage). After re-testing the four DIF items with this subsample, three of the four items no longer demonstrated DIF.

However, one item, "I find it easy to manipulate people," continued to demonstrate DIF, with χ²(2) = 9.97, p < .01, and R² = .053. Unlike the gender DIF item, this item demonstrated non-uniform DIF, χ²(1) = 7.62, p < .01, R² = .04, but did not overestimate psychopathy, χ²(1) = 2.35, R² = .013. As indicated in Figure 3, although this item discriminated among European heritage participants across levels of psychopathy, it did not discriminate among East Asian heritage participants.
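As a consistency check on the reported statistics, the omnibus difference (Step 3 - Step 1) should equal the sum of the two follow-up differences, because the three models are nested. The Δ notation with step subscripts is introduced here only for illustration and does not appear in the poster; the numbers are the ones reported above.

```latex
\begin{align*}
\Delta\chi^2_{3-1} &= \Delta\chi^2_{2-1} + \Delta\chi^2_{3-2}, &
\Delta R^2_{3-1} &= \Delta R^2_{2-1} + \Delta R^2_{3-2} \\[4pt]
\text{Gender item:}\quad 18.05 &= 15.77 + 2.28, & .046 &= .041 + .005 \\
\text{Ethnicity item:}\quad 9.97 &= 2.35 + 7.62, & .053 &= .013 + .040
\end{align*}
```

Both items satisfy the identity, so the omnibus and follow-up values are mutually consistent.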

Figure 1. Item characteristic curves for men and women for the SRP item "Not hurting others' feelings is important to me." (Plot of item score, 1 to 5, against total score, 1.00 to 4.00.)

Figure 2. Item characteristic curves for men and women for the SRP item "I am usually very careful about what I say to people." (Plot of item score, 1 to 5, against total score, 1.00 to 4.00.)

Figure 3. Item characteristic curves for participants of East Asian heritage and participants of European heritage for the SRP item "I find it easy to manipulate people." (Plot of item score, 1 to 5, against total score, 1.00 to 4.00.)

Discussion

Taken together, our results suggest that these 40 Self-Report Psychopathy Scale items are not significantly influenced by external, irrelevant variables. The vast majority of these items (38 out of 40, or 95%) were found to be free of DIF. We found only one item that demonstrated gender DIF and another that demonstrated ethnicity DIF.

Given how few SRP items demonstrated DIF, we do not feel that researchers or other users of the SRP should be particularly concerned. That is, the paucity of items with DIF on the SRP (2 out of 40, or 5%) suggests that such a finding may be attributable to chance. Accordingly, we do not believe that these two items should be modified or discarded from the SRP. Moreover, users of the SRP may rightly feel confident that the items are valid indicators of subclinical psychopathy across genders and ethnic groups.

The next step in this investigation is to test for differential test functioning (DTF), which extends the principle of DIF from the individual item to the whole test. It is especially noteworthy that, despite the presence of several items with DIF, subsequent DTF analyses conducted on the PCL-R suggested that these items were not significantly harming the usefulness of the PCL-R's total score. That is, the PCL-R's ability to classify individuals as psychopaths or non-psychopaths using extant scoring procedures remained intact (e.g., Bolt et al., 2004; Cooke et al., 2001; for an exception, see Cooke, Michie, Hart, & Clark, 2005). Although our concerns with the SRP differ from those for the PCL-R (we do not use the SRP to categorize individuals), the more general principle of using a measure that functions equally for all groups is, of course, still highly relevant. Given that we observed far less DIF among the SRP items than has been observed among the PCL-R items, we expect that DTF analyses should unequivocally highlight the SRP's unbiased measurement of subclinical psychopathy.
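The chance argument can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that the 40 omnibus tests for one grouping variable are independent and that only the p < .01 significance criterion applies; the additional effect-size criterion would lower these figures. The variable names are hypothetical.

```python
n_items = 40      # SRP items tested per grouping variable
alpha = 0.01      # per-item significance level used in the study

# Expected number of items flagged by chance alone under the null hypothesis.
expected_false_flags = n_items * alpha

# Probability that at least one of the 40 items is flagged by chance,
# ASSUMING independent tests (an idealisation; items share a total score).
p_at_least_one = 1.0 - (1.0 - alpha) ** n_items

print(f"expected chance flags: {expected_false_flags:.1f}")
print(f"P(at least one chance flag): {p_at_least_one:.3f}")
```

Under these assumptions, roughly one analysis in three would flag at least one item on significance alone, so a single flagged item per grouping variable is not strong evidence of systematic bias.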

References

Bolt, D. M., Hare, R. D., Vitale, J. E., & Newman, J. P. (2004). A multigroup item response theory analysis of the Psychopathy Checklist-Revised. Psychological Assessment, 16, 155-168.
Cooke, D. J., Kosson, D. S., & Michie, C. (2001). Psychopathy and ethnicity: Structural, item, and test generalizability of the Psychopathy Checklist-Revised (PCL-R) in Caucasian and African-American participants. Psychological Assessment, 13, 531-542.
Cooke, D. J., Michie, C., Hart, S. D., & Clark, D. (2005). Assessing psychopathy in the UK: Concerns about cross-cultural generalisability. British Journal of Psychiatry, 186(4), 335-341.
Gelin, M. N., & Zumbo, B. D. (2003). Differential item functioning results may change depending on how an item is scored: An illustration with the Center for Epidemiologic Studies Depression Scale. Educational and Psychological Measurement, 63, 65-74.
Hare, R. D. (2003). Manual for the Psychopathy Checklist-Revised (2nd ed.). Toronto/Buffalo: Multi-Health Systems.
Hidalgo, M. D., & Lopez-Pina, J. A. (2004). Differential item functioning detection and effect size: A comparison between logistic regression and Mantel-Haenszel procedures. Educational and Psychological Measurement, 64, 903-915.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329-349.
Slocum, S. L., Gelin, M. N., & Zumbo, B. D. (in press). Statistical and graphical modeling to investigate differential item functioning for rating scale and Likert item formats. In B. D. Zumbo (Ed.), Developments in the Theories and Applications of Measurement, Evaluation, and Research Methodology Across the Disciplines, Vol. 1. Vancouver: Edgeworth Laboratory, University of British Columbia.
Williams, K. M., Nathanson, C., & Paulhus, D. L. (2003, August). Structure and validity of the Self-Report Psychopathy scale in normal populations. Poster presented at the 111th annual meeting of the American Psychological Association, Toronto.
Zumbo, B. D. (1999). A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-Type (Ordinal) Item Scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.