Communication Skills in Standardized-Patient Assessment of Final-Year Medical Students: A Psychometric Study

Advances in Health Sciences Education 9: 179–187, 2004.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.

GRETCHEN GUITON1*, CAROL S. HODGSON2, GINETTE DELANDSHERE3 and LUANN WILKERSON1
1 David Geffen School of Medicine at University of California, Los Angeles; 2 Department of Education, Indiana University; 3 University of California, San Francisco
(*Author for correspondence, Center for Educational Development and Research, UCLA, Box 951722, Los Angeles, CA 90095-1722, USA; E-mail: gguiton@mednet.ucla.edu)

Abstract. This study investigates the content-specificity of communication skills by examining the reliability and dimensionality of standardized patient (SP) ratings of communication skills in an Objective Structured Clinical Examination (OSCE) for final-year medical students. An OSCE consisting of seven SP encounters was administered to final-year medical students (N = 567) at four medical schools that are members of the California Consortium for the Assessment of Clinical Competence. For each case, SPs rated students' communication skills on the same seven items. Internal consistency coefficients were calculated and a two-facet generalizability study was performed to investigate the reliability of the scores. An exploratory factor analysis was conducted to examine the dimensionality of the exam. Findings indicate that communication skills across the seven-case examination demonstrate a reliable generic component that supports relative decision making, but that a significant case-by-student interaction exists. The underlying factor structure further supports the case-specific nature of students' ability to communicate with patients. From these findings, it is evident that an individual's communication skills vary systematically with specific cases.
Implications include the need to consider the range of communication skill demands made across the OSCE to support generalization of findings, the need for instruction to provide feedback on communication skills in multiple contexts, and the need for research to further examine the student, the patient, and the presenting problem as sources of variation in communication skills.

Key words: communication skills, generalizability theory, medical students, performance assessment, psychometrics, standardized patients

Introduction

In a qualitative analysis of narrative comments written by standardized patients (SPs) during an objective structured clinical examination (OSCE), Wilkerson and Rose (2001) noted unexpected variability in individual students' communication skills from patient to patient. At one station the SP would comment on the student's ability to put him at ease, while at the next station the SP would describe the same student as inattentive and distracted. This pattern of inconsistency was demonstrated by one-third of the 40 students included in the study. A review of other communication skills studies (Hodges et al., 1996; Cohen et al., 1997; Colliver et al., 1998; Colliver et al., 1999; Donnelly et al., 2000; Blue et al., 2000) revealed a recurrent theme in the discussion of the results: the authors raised the question of content-specificity in communication skills, similar to that already known to exist in other problem-solving skills (Shulman et al., 1975). The recognition of content specificity in areas such as history taking and physical examination skills has altered our instruction and assessment; yet we treat communication as a unidimensional skill set that can be taught separately from content and assessed as such. If the mounting evidence refutes this unidimensionality, both our teaching and our assessment methods will be called into question.

The current study extends this line of investigation into the content-specificity of communication skills. We examine the psychometric characteristics of SP ratings of communication skills, including reliability and dimensionality, in an OSCE for final-year medical students.

Method

A multi-station clinical performance examination administered to 547 final-year medical students attending four of the seven medical schools in the California Consortium for the Assessment of Clinical Competence provided data for the study.

Table I. Characteristics of cases in the OSCE, 2001

Presenting complaint            Age  Sex  Exam components
Abdominal pain                  22   F    History, physical exam, communication
Chronic sore throat             36   F    History, physical exam, communication
Chest pain                      49   F    History, physical exam, information sharing, communication
Routine check-up                50   M    History, information sharing, communication
Follow-up visit for diabetes    60   M    History, physical exam, information sharing, communication
Cough and fatigue               60   M    History, physical exam, information sharing, communication
Recent weight loss              76   F    History, physical exam, information sharing, communication
The multi-station examination consisted of seven standardized patient (SP) encounters (six of 15 minutes and one of 37 minutes) in which students were instructed to perform focused patient workups with attention to skills in physician-patient communication and information sharing (e.g., explaining the treatment plan). The seven cases, developed by Consortium faculty, represented a mix of acute, chronic, well-care, behavioral, and ill-defined problems. A description of each case is provided in Table I. Students at each campus completed the half-day exam over a two- to three-week period.

Standardized patients across the Consortium received 16 hours of training in presentation of the case and use of the checklist for scoring. SPs entered their ratings immediately following the clinical encounter. Schools varied in the mode of entry, using either on-line or optical-scan forms. Schools also varied in the number of SPs trained to play the role of the patient in each case. Across the four schools, a total of 73 SPs were involved in the seven cases. Three of the schools used two SPs for each case, while the fourth school used four to six SPs for each case. On average, each SP rated 43 students (range 7 to 86). No SP was involved in more than a single case or school, so rater effects are confounded with student, case, and school effects. Students were assigned to SPs according to the day and time at which they took the examination, so the assignment was not systematic.

Specific physician-patient communication items, based on the work of Forrest Lang (Bayer-Fetzer, 2001), were used to evaluate students. For each case, SPs rated students' communication skills on the same seven items ("appeared professionally competent," "effectively gathered information," "listened actively," "established professional rapport," "appropriately explored my perspective," "addressed my feelings," "met my needs") using a 6-point Likert-type rating scale ranging from "outstanding" to "unacceptable." Using the same scale, SPs also rated their overall satisfaction with the student encounter. In addition, SPs completed case-specific items assessing history taking, physical examination, information sharing, and clinical courtesy, and provided narrative comments at the end of the checklist in response to open-ended requests to write "positive comments for this student" and "constructive comments for this student."

To assess reliability, we calculated Cronbach's alpha for the physician-patient communication ratings in several ways, reflecting how the schools report the data.
For instance, scores are reported as the sum of all communication items (i.e., across cases), and students are identified normatively for remediation. For this reason, we investigated whether a general dimension of communication skills was evident that would support the use of a total score. To investigate the content specificity of communication skills, we also compared within-case reliability with cross-case reliability on individual items. Finally, we estimated the intra-class correlation coefficient (ICC) as a measure of the reliability of a single item from an individual case.

We performed a generalizability analysis with students as the object of measurement to investigate the sources of variation in communication scores, both within cases and in total scores. We estimated six variance components using the SPSS GLM procedure: students, cases, items, students by cases, students by items, and cases by items. The three-way interaction was confounded with the error term, along with other factors not estimated in the model (e.g., SP and school effects). Using these variance components, we computed a generalizability coefficient to estimate the reliability of the total communication score for relative decisions.

We conducted an exploratory factor analysis using SPSS to further examine the dimensionality of the communication skill items. As a technique, factor analysis permits investigation of the statistical relationships among variables (e.g., the items) as empirical evidence for or against the relationship between these variables and an abstract concept (e.g., the factor) (Maradi, 1981). We used principal axis factoring to provide information on the number of underlying factors. The nature of the items loading on each factor suggests the constructs underlying variation in student performance.

The results are based on data from students who took the exam at one of the four participating universities in the summer of 2001. Students in the Consortium schools received the same cases, although one of the three schools included in the study administered only a subset of the cases to its students. Consequently, the number of subjects per case varies from 420 to 532. Because listwise deletion would reduce the number of subjects to 276, each analysis was conducted with all individuals who completed the relevant case(s).

Results

RELIABILITY

We first assessed the scale characteristics and reliability across cases, since the Consortium currently reports these items by summing the seven communication items across the seven cases (7 × 7 = 49 items). To assess this practice, we computed Cronbach's alpha and obtained a high internal consistency reliability coefficient (alpha = 0.91) for the 49-item scale (n = 421, M = 33.18, SD = 3.26).

Next, we examined the scale characteristics and reliability of the set of seven communication items within each case.

Table II. Means, standard deviations, and internal consistency estimates of reliability of communication skills items for each case in the OSCE, 2001

Case                                           No. of    No. of  Mean  SD    Cronbach's
                                               students  items               alpha
36-year-old female with chronic sore throat    421       7       4.81  0.81  0.89
22-year-old female with abdominal pain         504       7       4.68  0.75  0.89
35-year-old male with cough and fatigue        421       7       4.63  0.80  0.90
49-year-old female with chest pain             505       7       4.89  0.86  0.90
50-year-old male with preventive visit         421       7       4.70  1.02  0.92
60-year-old male with diabetes                 494       7       4.91  0.75  0.92
76-year-old female with weight loss            532       7       4.61  0.94  0.94
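The reliability indices named in the Method (Cronbach's alpha and the one-way intra-class correlation) can be illustrated with a short sketch. This is not the authors' SPSS procedure: it assumes complete data in a students × ratings matrix (rows are students, columns are items or ratings) and applies the textbook formulas. The helper names `cronbach_alpha`, `icc_oneway`, and `spearman_brown` are hypothetical, and the Spearman-Brown formula is added only to show how the reliability of a single rating relates to that of a k-rating composite.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students x items matrix (list of equal-length rows).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score).
    Assumes no missing data; population variances are used for both terms,
    which leaves the ratio unchanged.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_oneway(scores):
    """ICC(1) from a one-way ANOVA with students as groups.

    Reliability of a single rating:
    (MS_between - MS_within) / (MS_between + (k - 1) * MS_within).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def spearman_brown(rho_single, k):
    """Projected reliability of a composite of k parallel ratings."""
    return k * rho_single / (1 + (k - 1) * rho_single)


if __name__ == "__main__":
    # Hypothetical toy data: 3 students x 2 ratings.
    toy = [[2, 3], [4, 5], [6, 4]]
    print(round(cronbach_alpha(toy), 3), round(icc_oneway(toy), 3))
    # Step the single-item ICC reported in the Results (0.165) up to 7 and 49 ratings.
    print(round(spearman_brown(0.165, 7), 2), round(spearman_brown(0.165, 49), 2))
```

As a rough arithmetic check only (not an analysis reported by the authors), stepping the single-item ICC of 0.165 up to 49 ratings gives about 0.91, in line with the 49-item alpha above, while seven ratings give about 0.58, close to the upper end of the cross-case item alphas in Table III.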
As shown in Table II, very reliable ratings of student performance (alpha = 0.89 to 0.94) are obtained within cases, suggesting that within a case these seven items measure a single construct.

To further examine the content specificity of individual communication items (e.g., "appropriately explored my perspective," "listened actively"), we computed Cronbach's alpha for each item individually across cases. Table III presents the results of the cross-case analyses.

Table III. Means, standard deviations, and internal consistency estimates of reliability of individual communication skills items for the OSCE, 2001

Item (across seven stations)             No. of    No. of  Mean  SD    Cronbach's
                                         students  items               alpha
Appeared professionally competent        421       7       4.91  0.57  0.57
Effectively gathered information         421       7       4.72  0.54  0.51
Listened actively                        421       7       4.77  0.53  0.46
Established personal rapport             421       7       4.82  0.59  0.60
Appropriately explored my perspective    421       7       4.59  0.49  0.43
Addressed my feelings                    421       7       4.64  0.54  0.48
Met my needs                             420       7       4.72  0.48  0.35

The dramatically lower reliabilities obtained (0.35 to 0.60) indicate considerable variation in student performance on a given communication skill across cases. Although the lower reliabilities may result from the small number of ratings (seven) of each item, we note that the within-case reliabilities, also based on seven items, are considerably higher. In combination, these results support other research suggesting that differences in student performance across cases reflect case-specific variance.

Finally, we calculated the intra-class correlation coefficient using a one-way ANOVA. The obtained ICC of 0.165 indicates the reliability of a single item, independent of case, rated by a randomly selected rater. This low value further underscores the lack of homogeneity across items in student performance, lending support to the content specificity of items.

GENERALIZABILITY STUDY

Variance in communication scores was attributable primarily to students and to the student-by-case interaction in the two-facet (cases × items) design. As shown in Table IV, students account for 33.77% of the variance, indicating that the examination moderately discriminates among students in their competence in interacting with patients. The student-by-case interaction accounts for the largest share of variance in scores (50.16%), indicating that students' relative standing differs from case to case.
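The percentages in this section and the generalizability coefficient can be reproduced from the sums of squares in Table IV. The sketch below assumes a fully crossed, balanced students × cases × items random-effects design with 334 students (implied by df = 333), 7 cases, and 7 items, and solves the standard expected-mean-square equations for the variance components; the SPSS GLM run used in the study may differ in details such as the handling of unbalanced data. The function names are hypothetical.

```python
# Sums of squares and degrees of freedom from Table IV.
TABLE_IV = {
    "s": (661.59, 333),     # students
    "c": (124.88, 6),       # cases
    "i": (1.63, 6),         # items
    "sc": (705.33, 1998),   # students x cases
    "si": (15.55, 1998),    # students x items
    "ci": (3.16, 36),       # cases x items
    "e": (79.72, 11988),    # students x cases x items, confounded with error
}
N_S, N_C, N_I = 334, 7, 7


def variance_components(table, n_s, n_c, n_i):
    """Solve the expected-mean-square equations for a crossed s x c x i design."""
    ms = {k: ss / df for k, (ss, df) in table.items()}
    e = ms["e"]
    comps = {
        "s": (ms["s"] - ms["sc"] - ms["si"] + e) / (n_c * n_i),
        "c": (ms["c"] - ms["sc"] - ms["ci"] + e) / (n_s * n_i),
        "i": (ms["i"] - ms["si"] - ms["ci"] + e) / (n_s * n_c),
        "sc": (ms["sc"] - e) / n_i,
        "si": (ms["si"] - e) / n_c,
        "ci": (ms["ci"] - e) / n_s,
        "e": e,
    }
    # Negative estimates are conventionally truncated to zero.
    return {k: max(v, 0.0) for k, v in comps.items()}


def g_coefficient(vc, n_c, n_i):
    """Generalizability coefficient for relative decisions on the c x i total score."""
    rel_error = vc["sc"] / n_c + vc["si"] / n_i + vc["e"] / (n_c * n_i)
    return vc["s"] / (vc["s"] + rel_error)


vc = variance_components(TABLE_IV, N_S, N_C, N_I)
total = sum(vc.values())
percent = {k: 100 * v / total for k, v in vc.items()}

if __name__ == "__main__":
    for source, p in percent.items():
        print(f"{source:>3}: {vc[source]:.5f} ({p:.2f}%)")
    print(f"G (7 cases, 7 items) = {g_coefficient(vc, N_C, N_I):.3f}")
```

Because the student-by-case component is divided only by the number of cases, it dominates the relative error variance here; the item-related terms contribute almost nothing, which is why case sampling, not item sampling, limits the reliability of the total score.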
Clearly, students may demonstrate strong communication skills in one situation and do less well in others. The variance contributed by cases (8.84%) is small relative to the student and student-by-case components, suggesting that differences between cases, considered across all students, account for only a small portion of the variance. The error term, accounting for 6.74% of the variance, suggests a very modest student-by-case-by-item interaction, a source of error variability not captured by the model, or both. Finally, the generalizability coefficient computed for all seven cases and seven items is 0.822, indicating reasonably high reliability for making relative decisions about individuals' communication skills on the basis of the full exam. One limitation of this analysis is its inability to disentangle rater and case effects. Overall, the results of the g-study support the idea that students' ability to communicate with patients depends substantially on the presenting situation, although a two-hour, seven-station exam permits reasonably reliable judgments about students' relative abilities.

Table IV. ANOVA estimates of variance components for the communication skill items on the OSCE, 2001

Source of variation   Sum of    df      Mean     Estimated variance   Percent of
                      squares           square   component            total variance
Students              661.59    333     1.99     0.033                33.77%
Case                  124.88    6       20.81    0.009                8.84%
Item                  1.63      6       0.27     7.83E-05             0.08%
Students by case      705.33    1998    0.35     0.050                50.16%
Students by item      15.55     1998    0.008    0.0002               0.16%
Case by item          3.16      36      0.09     0.0002               0.25%
Error                 79.72     11988   0.007    0.007                6.74%

INTERNAL STRUCTURE

Exploratory factor analysis demonstrated a case structure underlying communication skills. The data met all of the assumptions for using the factor model.¹ A principal axis factoring of all 49 communication items, and the resulting scree plot,² suggested seven factors underlying the data, with these factors accounting for 60.8% of the variance. The first factor, with an eigenvalue of 9.41, accounts for 19.27% of the variance. This factor is almost twice the magnitude of the second factor (eigenvalue = 5.35, accounting for 10.77% of the variance), indicating a dominant factor but failing to meet the criteria for unidimensionality (Reckase, 1979). Consequently, we estimated a seven-factor solution using orthogonal (varimax) rotation to obtain a simple structure.³ This analysis resulted in a highly interpretable factor structure. Items loaded strongly on factors defined by the individual cases, with no item loading 0.2 or higher on a second factor.
Loadings ranged from 0.662 to 0.776 on the chronic sore throat case, from 0.583 to 0.755 on the abdominal pain case, from 0.691 to 0.782 on the cough and fatigue case, from 0.665 to 0.793 on the chest pain case, from 0.708 to 0.809 on the preventive case, from 0.703 to 0.855 on the diabetes case, and from 0.743 to 0.867 on the weight loss case. These loadings are large: 39 of the 49 loadings exceed 0.71, indicating at least a 50% overlap with the factor, and all but one of the remaining loadings reach 0.63, indicating a 40% overlap with the factor.⁴ Each factor accounts for 8% to 10% of the variance in the set of variables and 13% to 16% of the covariance.
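Principal axis factoring, the extraction method used above, can be sketched in a few lines of NumPy. This is an illustrative implementation, not the SPSS routine used in the study: it starts from squared multiple correlations as communality estimates, repeatedly places the current communalities on the diagonal of the correlation matrix, and keeps the leading eigenvectors until the communalities stabilize. It returns unrotated loadings and omits the varimax step. The function name and the toy two-case correlation matrix are hypothetical.

```python
import numpy as np


def principal_axis_factoring(R, n_factors, max_iter=500, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix.

    Returns (loadings, communalities). Loadings are unrotated and the sign
    of each column is arbitrary.
    """
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlation of each item
    # with all the other items.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)               # reduced correlation matrix
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


if __name__ == "__main__":
    # Hypothetical "two-case" structure: items 1-3 load 0.8 on case A,
    # items 4-6 load 0.7 on case B, and the two factors are uncorrelated.
    lam = np.zeros((6, 2))
    lam[:3, 0] = 0.8
    lam[3:, 1] = 0.7
    R = lam @ lam.T
    np.fill_diagonal(R, 1.0)
    loadings, h2 = principal_axis_factoring(R, n_factors=2)
    print(np.round(np.abs(loadings), 3))
    print(np.round(h2, 3))
```

With this toy matrix, items 1 to 3 load about 0.8 on the first factor and items 4 to 6 about 0.7 on the second, with near-zero cross-loadings, the same kind of case-defined simple structure described above. Squaring a loading gives the overlap with the factor used in note 4: 0.71² ≈ 0.50 and 0.63² ≈ 0.40.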

The squared multiple correlations of factor scores obtained from the factor score covariance matrix range from 0.882 to 0.994, indicating a stable solution with factors well defined by the variables.

Conclusion

The finding that communication skill items vary systematically with specific cases supports prior research suggesting that communication skills, at least among novices, are content dependent. Hodges et al. (1996) identified the content-specific nature of communication skills in a study of the reliability of OSCE stations specifically designed to measure communication skills, and concluded that communication skills are highly bound to content and that increased difficulty and increased score variance alone are not enough to improve generalizability (p. 42). The concept of a case-specific dimension of communication is further supported by the results of a multi-institutional patient satisfaction study (Meredith and Wood, 1996), in which the more serious the patient's condition, the more dissatisfaction patients reported with physicians' communication. Likewise, Wilkerson and Rose (2001), examining narrative comments, found that narrative themes clustered within cases according to the specific challenges that each case presented. Recently, in a generalizability study of communication cases, Keen et al. (2003) found that communication skill levels were the major source of variation, with raters having a negligible effect on subject scores (p. 11). They recommend increasing the number of cases rather than the number of raters to increase reliability. Together, these studies and our current findings suggest that communication skills possess a generic component but also appear to comprise specific, context-related elements. From a measurement perspective, then, we conclude that reliable measurement of the generic component can be achieved with a series of items administered across cases.
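The recommendation to add cases rather than raters, and the conclusion that the generic component can be measured reliably across cases, can be explored with a decision (D) study. The sketch below is not from the paper: it plugs the rounded variance components from Table IV into the relative-error formula for a students × cases × items design and projects the generalizability coefficient for different numbers of cases, holding seven items per case. Because the published components are rounded, the seven-case projection comes out near 0.82 rather than exactly 0.822. The names `projected_g` and `cases_needed` are hypothetical.

```python
# Rounded variance components from Table IV (students x cases x items design).
COMPONENTS = {"s": 0.033, "sc": 0.050, "si": 0.0002, "e": 0.007}


def projected_g(n_cases, n_items=7, vc=COMPONENTS):
    """D-study projection of the G coefficient for relative decisions."""
    rel_error = (vc["sc"] / n_cases
                 + vc["si"] / n_items
                 + vc["e"] / (n_cases * n_items))
    return vc["s"] / (vc["s"] + rel_error)


def cases_needed(target, n_items=7, max_cases=100):
    """Smallest number of cases whose projected G reaches the target, else None."""
    for n in range(1, max_cases + 1):
        if projected_g(n, n_items) >= target:
            return n
    return None


if __name__ == "__main__":
    for n in (1, 4, 7, 10, 14):
        print(f"{n:>2} cases: G = {projected_g(n):.3f}")
    print("cases needed for G >= 0.90:", cases_needed(0.90))
```

Under these assumptions, roughly doubling the number of cases (to 15) would be needed to push the projected coefficient past 0.90. The rater facet cannot be projected this way because, as noted above, rater and case effects are confounded in these data.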
Moreover, our findings do not support the representation of communication skills as a set of separable dimensions, at least as rated by SPs. On the other hand, we agree with Hodges et al. (1996) that the communication skills checklist could be built around the specific aspects of communication needed for each particular case, rather than expecting all cases to address the same skills. Our results concur with those from qualitative studies of SPs' narrative comments (Rose and Wilkerson, 2001), which suggest a fair amount of within-student variability in performance across cases. The variability in communication skills demonstrated by an individual student may be a product of qualities of the student, such as personal interests, possession of relevant knowledge, self-confidence, or prior experience. Alternatively, it may reflect characteristics of the patient (e.g., age, ethnicity, communication style, culture, socio-economic status) or characteristics of the problem (e.g., complexity, ill-defined nature). Results from other studies (e.g., Keen et al., 2003), along with the strength and consistency of our g-study and factor-analysis results with data from diverse schools involving multiple standardized patients, give us confidence in our findings.

The situational variation in communication skills, however, would benefit from future study able to estimate the rater effect. Additional study will be needed to identify the sources of case variability. Of particular importance is investigation of the relationship between the domain of knowledge and the quality of communication. For instructional purposes, the content specificity found in these data implies that students need multiple, varied communication experiences with opportunities to practice and reflect on their performance. Furthermore, in reporting communication scores for a multi-station examination, it will be important to reflect the variation in a student's performance through the use of case-based scores, narrative comments, and opportunities to review the videotaped encounters.

Acknowledgements

The authors wish to thank those faculty members and staff at the University of California, Irvine School of Medicine and the University of Southern California who agreed to allow the data from their students in the OSCE to be used in the present study.

Notes

1 Specifically, normality and multicollinearity assumptions were evaluated. Item descriptive statistics indicated that all items demonstrated normality with the exception of 5 items, which had kurtosis values between 1 and 1.5. These deviations are not serious given the size of the sample. Bivariate correlations among items rarely exceed 0.7 (only 26 of the 2756 correlations) and none reach 0.9, indicating that multicollinearity is not a problem.

2

3 Following Muthén, an oblique (promax) rotation was also estimated. Results indicate that each factor represents a single case, with the items loading strongly (0.69 or higher) on a single case and no double loadings reaching significance (i.e., 0.3). Moreover, the factor correlation matrix indicates only a slight relationship among factors, with none of the 21 correlations reaching 0.32, the level (0.32² ≈ 0.10, i.e., 10% shared variance) at which an oblique rotation might be warranted. Consequently, only the results of the orthogonal rotation are reported in detail.

4 Comrey and Lee (1992) note that any loading of 0.71 or higher indicates an overlap of at least 50% with the factor.

References

Bayer-Fetzer Conference on Patient-Physician Interaction in Medical Education (2001). Essential elements of communication in medical encounters: the Kalamazoo Consensus Statement. Academic Medicine 76: 390–393.
Blue, A.V., Chessman, A.W., Gilbert, G.E. & Mainous III, A.G. (2000). Responding to patients' emotions: important for standardized patient satisfaction. Family Medicine 32: 326–330.
Cohen, D.S., Colliver, J.A., Robbs, R.S. & Swartz, M.H. (1997). A large-scale study of the reliabilities of checklist scores and ratings of communication skills evaluated on a standardized-patient examination. Advances in Health Sciences Education 1: 209–213.
Colliver, J.A. & Swartz, M.H. (1997). Assessing clinical performance with standardized patients. JAMA 278: 790–791.
Colliver, J.A., Swartz, M.H., Robbs, R.S. & Cohen, D.S. (1999). Relationship between clinical competence and communication skills in standardized-patient assessment. Academic Medicine 74: 271–274.
Colliver, J.A., Willis, M.S., Robbs, R.S., Cohen, D.S. & Swartz, M.H. (1998). Assessment of empathy in a standardized-patient examination. Teaching and Learning in Medicine 10: 8–11.
Donnelly, M.B., Sloan, D., Pymale, M. & Schwartz, R. (2000). Assessment of residents' interpersonal skills by faculty proctors and standardized patients: a psychometric analysis. Academic Medicine 75(October): S93–S95.
Hodges, B., Turnbull, J., Cohen, R., Bienenstock, A. & Norman, A. (1996). Evaluating communication skills in the objective structured clinical examination format: reliability and generalizability. Medical Education 30: 38–44.
Holmboe, E.S. & Hawkins, R.E. (1998). Methods for evaluating the clinical competence of residents in internal medicine: a review. Annals of Internal Medicine 129: 42–48.
Keen, A.J.A., Klein, S. & Alexander, D.A. (2003). Assessing the communication skills of doctors in training: reliability and sources of error. Advances in Health Sciences Education 8: 5–16.
Maradi, A. (1981). Factor analysis as an aid in the formation and refinement of empirically useful concepts. In D. Jackson & E.F. Borgatta (eds.), Factor Analysis and Measurement in Sociological Research. Thousand Oaks, CA: Sage.
Meredith, P. & Wood, C. (1996). Aspects of patient satisfaction with communication in surgical care: confirming qualitative feedback through quantitative methods. International Journal for Quality in Health Care 8: 253–264.
Reckase, M.D. (1979). Unifactor latent trait models applied to multi-factor tests: results and implications. Journal of Educational Statistics 4: 207–230.
Rose, M. & Wilkerson, L. (2001). Widening the lens on standardized patient assessment: what the encounter can reveal about the development of clinical expertise. Academic Medicine 76: 88–91.
Sloan, D.A., Donnelly, M.B., Schwartz, R. & Strodel, W.E. (1995). The objective structured clinical examination: the new gold standard for evaluating postgraduate clinical performance. American Journal of Surgery 222: 735.
Vu, N.V. & Barrows, H.S. (1994). Use of standardized patients in clinical assessments: recent developments and measurement findings. Educational Researcher 23: 23–30.
Wilkerson, L. & Rose, M. (2001). Learning from narrative comments of standardized patients during an objective structured clinical examination of final-year medical students. ERIC Clearinghouse on Assessment and Evaluation.