Revising the In-Training Evaluation Report (ITER) to Improve its Utility

Revising the In-Training Evaluation Report (ITER) to Improve its Utility Clarissa Agusto, MD, FRCPC, PGY-5 Vijay Daniels MD, MHPE, FRCPC, Associate Professor Division of General Internal Medicine University of Alberta

I do not have an affiliation (financial or otherwise) with a pharmaceutical, medical device or communications organization. Je n ai aucune affiliation (financière ou autre) avec une entreprise pharmaceutique, un fabricant d appareils médicaux ou un cabinet de communication. 2

Background Formative utility Shift to CBME Assessment for Learning ITER is one of the most commonly used assessment tools yet is widely criticized for lack of validity Summative utility ITER Evidence of Validity» Scores reliable across multiple rotations and predictive validity for PGY3 Internal Medicine performance - Ginsburg et al» Within scale alpha too high (Kassam et al, locally)? Halo effect with rater fatigue reducing comment quality 3

Purpose The purpose of this study is to evaluate the effect of reducing length of the ITER used for an Internal Medicine residency program on one aspect of validity, reliability, and on the quality of feedback 4

Methods Setting Old ITER contained 35-items, based on the CanMEDS framework Consensus approach with Division of General Internal Medicine (GIM), reduced to 14-items» 13 items capturing broader competencies» 1 overall global rating scale 5

Methods ITER implementation 2015-16 academic year, PGY1 residents rotate through 4 GIM hospital sites Shorter (14-item) ITER at 2 GIM sites Longer (35-item) ITER at 2 other GIM sites Faculty Development 6

Analysis ITER reliability (G-studies) Within scale (Phi) Across two rotations with same ITER (Phi) Decision study to estimate reliability across 6, and 12 rotations ITER formative quality Completed Clinical Evaluation Report Rating (CCERR), Dudek et al» Each ITER rated by 3 GIM subspecialty residents using CCERR» CCERR reliability - overall G, internal consistency, inter-rater reliability» T-tests for differences in mean CCERR scores of long and short ITER to compare quality 7

Results ITER reliability Phi (reliability) Long (35-item) Short (14-item) Within scale 0.96-0.98 0.95 Across 2 Rotations 0.54 0.47 Across 6 Rotations 0.78 0.73 Across 12 Rotations 0.88 0.84 8

Results ITER Formative Quality 33 PGY1 residents, 131 ITERs, 393 CCERRs scores CCERR reliability» Overall reliability G coefficient = 0.75» High internal consistency, Cronbach alpha 0.88» Very strong interrater reliability, ICC 0.86» ICC for each item at least moderate agreement, except item #1 (ICC -0.07) 9

Results ITER Formative Quality Average CCERR score (with and without Item #1)» Long ITER 2.76 vs Short ITER 2.92 (p-value 0.076)» Long ITER 2.68 vs Short ITER 2.85 (p-value 0.051) 10

Discussion Shortened ITER scores slightly less reliable But still acceptable for use» to determine who can progress to more advanced rotations earlier Shortened ITER likely provides better quality of feedback Although p=0.076 (0.051 without Item 1)» Would meet standard p<0.05 with one-tailed t-test Anecdotally (need to study)» Improved feasibility of use and timeliness of completion 11

Discussion Re-evaluation of the CCERR Very good reliability of CCERR scores Feasible: on average 3 to 4 mins/iter 12

Limitations Broader applicability PGY1 residents, GIM ward rotation Internal Medicine Residency Program at one institution Unusually high within scale reliability coefficients Rater errors Scale issues Items can be further reduced 13

Future Directions Implementation of the shortened ITER to all rotations in our program Further reduced ITER to 10-items Rotation-specific item content Further analysis of data across subspecialty rotations 14

References Chou S, Lockyer J, Cole G, McLaughlin K. Assessing postgraduate trainees in Canada: are we achieving diversity in methods? Med Teach 2009 February 1,;31(2):58. Patel R, Drover A, Chafe R. Pediatric faculty and residents' perspectives on In-Training Evaluation Reports (ITERs). Can Med Educ J 2015 December 11,;6(2):41. Watling CJ, Kenyon CF, Schulz V, Goldszmidt MA, Zibrowski E, Lingard L. An exploration of faculty perspectives on the in-training evaluation of residents. Acad Med 2010 July 1,;85(7):1157-1162. Bismil R, Dudek NL, Wood TJ. In-training evaluations: developing an automated screening tool to measure report quality. Med Educ 2014 July 1,;48(7):724-732. Ginsburg S, McIlroy J, Oulanova O, Eva K, Regehr G. Toward authentic clinical evaluation: pitfalls in the pursuit of competency. Acad Med 2010 May 1,;85(5):780-786. Ginsburg S, Eva K, Regehr G. Do in-training evaluation reports deserve their bad reputations? A study of the reliability and predictive ability of ITER scores and narrative comments. Acad Med 2013 October 1,;88(10):1539-1544. Dudek NL, Marks MB, Wood TJ, Lee AC. Assessing the quality of supervisors' completed clinical evaluation reports. Med Educ 2008 August 1,;42(8):816-822. Dudek NL, Marks MB, Bandiera G, White J, Wood TJ. Quality in-training evaluation reports--does feedback drive faculty performance? Acad Med 2013 August 1,;88(8):1129-1134. Gray JD. Global rating scales in residency education. Acad Med 1996 January 1,;71(1 Suppl):55. 15

References Norman GR, van der Vleuten C, Newble D. International handbook of research in medical education. Dordrecht :London: Kluwer Academic; 2002. Epstein RM. Assessment in medical education. N Engl J Med 2007 January 25,;356(4):387-396. Holmboe ES, Hawkins RE. Practical guide to the evaluation of clinical competence. Philadelphia, PA: Mosby Elsevier; 2008. Downing SM, Yudkowsky R. Assessment in health professions education. New York, NY: Routledge; 2009. Van Der Vleuten, C P. The assessment of professional competence: Developments, research and practical implications. Advances in health sciences education : theory and practice 1996 Jan;1(1):41-67. Watling CJ, Kenyon CF, Zibrowski EM, Schulz V, Goldszmidt MA, Singh I, et al. Rules of engagement: residents' perceptions of the in-training evaluation process. Academic medicine : journal of the Association of American Medical Colleges 2008 Oct;83(10 Suppl):S97. Driessen E, Scheele F. What is wrong with assessment in postgraduate training? Lessons from clinical practice and educational research. Medical Teacher 2013 Jul;35(7):569-574. Kassam A, Donnon T, Rigby I. Validity and reliability of an in-training evaluation report to measure the CanMEDS roles in emergency medicine residents. CJEM 2014 Mar;16(2):144-150. Frank JR, Snell LS, Cate OT, Holmboe ES, Carraccio C, Swing SR, et al. Competency-based medical education: theory to practice. Medical Teacher 2010 Aug;32(8):638-645. Ginsburg S, Gold W, Cavalcanti RB, Kurabi B, McDonald-Blumer H. Competencies "plus": the nature of written comments on internal medicine residents' evaluation forms. Academic medicine : journal of the Association of American Medical Colleges 2011 Oct;86(10 Suppl):S34. Chaudhry S, Holmboe E, Beasley B. The State of Evaluation in Internal Medicine Residency. J GEN INTERN MED 2008 Jul;23(7):1010-1015. 16

Thanks for your attention! Questions?? 17

Help us improve. Your input matters. Download the ICRE App, Visit the evaluation area in the Main Lobby, near Registration, or Go to: http://www.royalcollege.ca/icreevaluations to complete the session evaluation. Aidez-nous à nous améliorer. Votre opinion compte! Téléchargez l application de la CIFR Visitez la zone d évaluation dans le hall principal, près du comptoir d inscription, ou Visitez le http://www.collegeroyal.ca/evaluationscifr afin de remplir une évaluation de la séance. You could be entered to win 1 of 3 $100 gift cards. Vous courrez la chance de gagner l un des trois chèques-cadeaux d une valeur de 100. 18