
Practical Radiation Oncology (2013) 3, 74-78    www.practicalradonc.org    http://dx.doi.org/10.1016/j.prro.2011.10.006

Special Article

Reliability of oral examinations: Radiation oncology certifying examination

June C. Yang PhD, Paul E. Wallner DO, Gary J. Becker MD, Jennifer L. Bosma PhD, Anthony M. Gerdeman PhD
American Board of Radiology, Tucson, Arizona

Received 28 July 2011; revised 24 October 2011; accepted 25 October 2011

Conflicts of interest: None. Corresponding author: American Board of Radiology, 5441 E. Williams Blvd, Tucson, AZ 85711. E-mail address: jyang@theabr.org (J.C. Yang).

Abstract

Purpose: Oral examinations are used as certifying examinations by many medical specialty boards. They represent daily clinical practice situations more realistically than written or computer-based tests do. However, concerns have been raised repeatedly in the literature regarding objectivity, fairness, extraneous factors arising from interpersonal interactions, item bias, reliability, and validity. In this study, the reliability of the radiation oncology certifying oral examination administered in May 2010 was analyzed.

Methods and Materials: One hundred fifty-two candidates rotated through 8 examination stations. Each station consisted of a hotel room equipped with a computer and software that displayed images appropriate to the content area. Each candidate had a 25-30 minute face-to-face encounter with an oral examiner who was a content expert in one of the following areas: gastrointestinal, gynecology, genitourinary, lymphoma/leukemia/transplant/myeloma, head/neck/skin, breast, central nervous system/pediatrics, or lung/sarcoma. This type of design is typically referred to as a repeated-measures design or a subject-by-treatment design, although the oral examination was a routine event without any experimental manipulation.

Results: The reliability coefficient was obtained by applying Feldt and Charter's simple computational alternative to the analysis of variance formulas, which yielded a KR-20 (Cronbach's coefficient alpha) of 0.81.

Conclusions: A design for developing an examination blueprint to improve the consistency of evaluation is suggested.

© 2013 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.

Introduction

Oral examinations continue to be administered as certifying examinations by 14 of 24 medical specialty boards. Among the advantages of oral examinations is that they represent daily clinical practice situations more realistically than written examinations do. An oral examination can test the limits of a candidate's knowledge of a given topic. Oral examinations are considered particularly effective in assessing a candidate's clinical decision-making ability and interpersonal skills, as well as intrapersonal qualities such as confidence and self-awareness.1,2 However, concerns about the reliability, validity, and fairness of oral examinations have been expressed repeatedly in the literature.1-5

The major concerns that have been raised pertain to the inherent nature of oral examinations, that is, the effects of personal interactions. These include the following: examinees' communication or language skills; examinees' familiarity with the oral examination format; inhibition of performance due to stress; examiners' judgments related to demographic characteristics such as age, gender, race, or socioeconomic status; and other factors that are unrelated to actual capabilities (knowledge, skills, and judgment). Would a candidate receive the same score if a different examiner tested him or her? Reliability is a major concern because subjective judgments may affect the scores: different examiners may differ in harshness or leniency, and content or follow-up questions may vary from one examiner to another.

Contrary to these concerns, Lunz and Bashook2 compared scores on communication ability with scores on oral examinations in a medical specialty board certification process. In a random sample of 90 candidates, nonmedical researchers measured communication ability using a 21-item communication survey instrument; the correlation coefficient between the communication scores and the oral examination scores was 0.10. The authors concluded that the candidates' oral examination scores were not influenced by their communication abilities.

Recently, the American Board of Psychiatry and Neurology decided to eliminate oral examinations from its certification examination process, citing as concerns the stress and anxiety that candidates associate with the oral examinations, as well as financial stress.6 The American Board of Radiology (ABR) has also decided to discontinue, beginning in 2014, the oral examination as the final certifying examination for the diagnostic radiology certification process. This decision was based on concerns about subjectivity and technically inadequate presentations. In years past, case presentations during the oral examination were similar to what could be observed in daily clinical situations; with the advancement of electronic and technical modalities such as computed tomography, magnetic resonance (MR) imaging, MR angiography, and MR spectroscopy, however, it has become more difficult for the oral examination to simulate actual clinical practice. Medical physics and radiation oncology (RO) will continue the oral examinations in their certifying processes for those who have passed the initial written qualifying examinations.

Methods and materials

The radiation oncology oral examination, a criterion-referenced test, is administered as the final certifying examination to those who have completed 1 internship/transitional year plus 4 years of radiation oncology residency. This generally occurs 10 months after candidates obtain a passing score on the qualifying examination, which is taken at the end of the fourth year of residency. In this study, the oral examination scores were obtained from the regularly scheduled RO oral examination administered in May 2010. The examination was not an experimental design; no modification or manipulation was made in obtaining the candidates' performance scores. No Institutional Review Board application was submitted because no human subject or identity was used for the study.

Following a long-established procedure, all RO oral examiners received about 1 hour of general orientation, followed by several hours of discussion within their specific category groups, before the beginning of the oral examination. In addition, all new examiners were required to observe 2 oral examinations before giving their first examinations. No announcements or comments were made to the examiners or examinees regarding a study of the reliability of the oral examination; that is, the reliability coefficient was calculated simply from the existing data.

In this routinely administered RO oral examination, 152 candidates rotated through 8 examination stations. Stations consisted of a hotel room equipped with a computer and software that displayed images appropriate to the content areas. Each candidate had a face-to-face encounter with an oral examiner who was a content expert in one of the following areas: gastrointestinal, gynecology, genitourinary, lymphoma/leukemia/transplant/myeloma, head/neck/skin, breast, central nervous system/pediatrics, or lung/sarcoma. Each oral examination lasted 25-30 minutes.

The scoring rubrics have been developed over the years. The continuous score scale ranged from 68 to 72 in 1-point intervals, with 70 as the passing score in each category. During each face-to-face oral examination, the examiners, independently of one another, recorded specifics about the candidate's performance on a blank form provided with case numbers. At the end of the oral examinations, the examiners in each category met to discuss in depth the accuracy and fairness of some candidates' failing grades. After the meeting, candidates with failing or borderline performances were presented to the oral examiner panel of all 8 clinical categories so that their performances could be reconsidered and their final grades determined.

The textbook Educational Measurement presents several methods for estimating reliability coefficients, depending on the type of test and the theoretical framework.7 The design of this RO oral examination is referred to as a one-way repeated-measures design, a subject-by-treatment design, or a complete randomized block design, and analyses derived from analysis of variance (ANOVA) produce exactly the same reliability estimate. Although the result would be identical, one may also apply a generalizability model, as presented by Brennan in his GENOVA program.8 The author chose to use a simple computational alternative to ANOVA, or GENOVA, published by Feldt and Charter in 2004,9 as follows:

r_k = 1 - \frac{SD_x^2 - SD_j^2 - SD_s^2}{(k - 1)\,SD_s^2}

where r_k is the k-judge reliability; SD_x is the standard deviation of all scores, using kn as the divisor, with n the number of subjects and k the number of judges; SD_j is the standard deviation of the judge means, using k as the divisor; and SD_s is the standard deviation of the subjects' test performances, using n as the divisor, the score for each subject being the mean of the k judges' ratings. Note that SD^2 denotes a variance. This formula is identical to Cronbach's coefficient alpha, which is the same as the Kuder-Richardson formula 20 (KR-20), and it can also be obtained from ANOVA results with the interaction term subtracted, that is,

\rho = \frac{MS_s - MS_{A \times S}}{MS_s}

where MS represents a mean square, the subscript s represents subjects, and the subscript A \times S represents the subject-by-treatment interaction.9

Results

Applying Feldt and Charter's formula,9 the simple calculation method yielded an inter-rater reliability coefficient (KR-20) of 0.81 for the oral examination administered by 8 content expert examiners, indicating that the rank ordering of examinees' scores would be largely reproduced if a comparable examination were administered. Table 1 exhibits the data entry format, with partial data on the oral examination scores of the 152 candidates, the mean of each candidate across the 8 competency areas, and the mean of each rater, the content expert examiner (the term rater is used interchangeably with examiner in this paper).

Correlation coefficients were obtained among the 8 oral examination scores and the 3 RO written examination scores (biology, physics, and clinical), acknowledging that the examinees may have taken different written examination forms at different times and that the reliability coefficients varied somewhat from test to test. During the past 5 years, the reliability coefficients of the RO biology tests varied from 0.92 to 0.96 (0.95 on average), those of the RO physics examinations ranged from 0.88 to 0.96 (0.94 on average), and those of the RO clinical component examinations ranged from 0.93 to 0.95 (0.95 on average). Restriction in range was not corrected because the 152 examinees had taken different versions of the written examinations over the years. All correlation coefficients were statistically significant at 2-tailed P = .05, and most were statistically significant at P = .0001, suggesting highly consistent relationships between the written test scores and the oral examination scores.
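The computation above is straightforward to reproduce. As a minimal Python sketch, assuming the ratings are arranged as an n-subject by k-judge matrix in the data format of Table 1 (the function name and the small demonstration matrix below are illustrative, not the ABR's actual code or data):

    import numpy as np

    def k_judge_reliability(scores: np.ndarray) -> float:
        """Feldt and Charter's simple computational alternative to the
        ANOVA formulas for the k-judge reliability (algebraically
        equivalent to Cronbach's coefficient alpha / KR-20)."""
        k = scores.shape[1]
        sd_x2 = scores.var()               # variance of all n*k scores (divisor nk)
        sd_j2 = scores.mean(axis=0).var()  # variance of the k judge means (divisor k)
        sd_s2 = scores.mean(axis=1).var()  # variance of the n subject means (divisor n)
        return 1.0 - (sd_x2 - sd_j2 - sd_s2) / ((k - 1) * sd_s2)

    # Format demonstration with the first 3 candidates of Table 1;
    # a real run would use the full 152 x 8 score matrix.
    demo = np.array([
        [70, 70, 70, 70, 68, 70, 70, 69],
        [70, 70, 69, 69, 69, 68, 71, 69],
        [70, 71, 70, 70, 70, 71, 71, 71],
    ])
    print(k_judge_reliability(demo))

Because the formula is identical to coefficient alpha, the result can be cross-checked against any standard alpha implementation.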
Table 1  Data format of May 2010 radiation oncology oral examination scores (partial data)

Subject      Rater 1  Rater 2  Rater 3  Rater 4  Rater 5  Rater 6   Rater 7     Rater 8  Mean
(Candidate)  (GI)     (Gyn)    (GU)     (Lymph)  (HNS)    (Breast)  (CNS/PEDS)  (Lung)   (Subject)
1            70       70       70       70       68       70        70          69       69.63
2            70       70       69       69       69       68        71          69       69.38
3            70       71       70       70       70       71        71          71       70.50
4            70       70       70       70       70       70        69          70       69.88
5            70       71       72       70       69       70        71          70       70.38
6            71       71       71       71       71       70        71          71       70.88
7            70       71       71       70       70       71        70          70       70.38
8            69       70       68       70       68       70        70          69       69.25
9            69       69       69       70       70       71        68          70       69.50
10           71       69       70       69       71       71        71          70       70.25
11           70       71       71       71       70       69        69          71       71.00
...
152          71       71       71       72       71       71        70          70       70.88
Mean         70.43    70.66    70.52    70.52    70.44    70.54     70.50       70.50    70.51
SD           0.64     0.89     0.77     0.76     0.88     0.84      0.87        0.85     0.53

CNS/PEDS, central nervous system/pediatrics; GI, gastrointestinal; GU, genitourinary; Gyn, gynecology; HNS, head/neck/skin; Lymph, lymphoma/leukemia/transplant/myeloma.

The clinical examination, a written test, was highly correlated with all 8 categories of the oral examination, yielding correlation coefficients ranging from 0.412 to 0.625, all statistically significant at probabilities ranging from 2E-6 to 8E-17. Such high correlations could not have occurred merely by chance; one can expect that candidates who obtain high scores on the clinical written examination will perform well on the oral examinations. The correlation coefficients between the physics written examination and the 8 oral examination categories were also statistically significant, with probabilities ranging from 0.005 to 5E-24. The same inference can be drawn about the predictive value of the physics written test: these relationships could not have been the result of chance, and candidates who perform well on the physics written test will perform well on the oral examinations. Similarly, the biology written examination was a strong predictor of performance on the oral examinations, with probabilities ranging from 0.004 to 2E-6. The intercorrelations among the 8 oral examination categories were also statistically significant, with probabilities ranging from 0.034 to 3E-11, which suggests that candidates who perform well on one oral examination category will perform well on the others. Table 2 exhibits the detailed correlation coefficients for reference.

Table 2  Intercorrelation among radiation oncology oral examination categories and computerized ("written") test scores (n = 152). For each category, the first row gives the Pearson r and the second row the 2-tailed probability.

Category      Clinical Physics  Biology  GI      Gyn     GU      Lymph   HNS     Breast  CNS/Peds Lung
Clinical   r  1        0.575    0.625    0.510   0.523   0.412   0.390   0.497   0.375   0.472    0.453
           P           0.000    0.000    0.000   0.000   0.000   0.000   0.000   0.000   0.000    0.000
Physics    r           1        0.715    0.300   0.338   0.225   0.327   0.342   0.178   0.220    0.335
           P                    0.000    0.000   0.000   0.005   0.000   0.000   0.028   0.007    0.000
Biology    r                    1        0.355   0.326   0.307   0.301   0.407   0.257   0.233    0.353
           P                             0.000   0.000   0.000   0.000   0.000   0.001   0.004    0.000
GI         r                             1       0.400   0.333   0.387   0.362   0.347   0.437    0.297
           P                                     0.000   0.000   0.000   0.000   0.000   0.000    0.000
Gyn        r                                     1       0.432   0.527   0.484   0.296   0.378    0.457
           P                                             0.000   0.000   0.000   0.000   0.000    0.000
GU         r                                             1       0.290   0.428   0.306   0.266    0.392
           P                                                     0.000   0.000   0.000   0.001    0.000
Lymph      r                                                     1       0.372   0.172   0.338    0.232
           P                                                             0.000   0.034   0.000    0.004
HNS        r                                                             1       0.306   0.336    0.332
           P                                                                     0.000   0.000    0.000
Breast     r                                                                     1       0.318    0.242
           P                                                                             0.000    0.003
CNS/PEDS   r                                                                             1        0.232
           P                                                                                      0.004
Lung       r                                                                                      1

Probabilities are for 2-tailed tests. CNS/PEDS, central nervous system/pediatrics; GI, gastrointestinal; GU, genitourinary; Gyn, gynecology; HNS, head/neck/skin; Lymph, lymphoma/leukemia/transplant/myeloma.

Discussion

The fact that many correlation coefficients ranged from 0.300 to 0.600 suggests that unique factors existed in each form of examination; that is, the oral examinations and the written examinations each measured some aspects of knowledge, skill, or judgment that were not measured by the other form. If the correlation coefficient between a written test and an oral examination approached 1.0, the two would probably be measuring the same attributes of the criteria. Although statistically significant at P = .05, the correlation between the breast category oral examination and the RO physics written test was lower than the others (P = .028), as was the correlation between the breast category and the lymphoma/leukemia/transplant/myeloma category oral examinations (P = .034). No plausible explanation is certain; one possibility is that the reliability of the breast category was lower than that of the other categories, because all of the regular breast category examiners were first-time examiners. There were 2 relief examiners who were experienced, but each examined fewer candidates.
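The significance figures discussed above and reported in Table 2 are ordinary Pearson correlations with 2-tailed tests, which can be verified with standard tools. A minimal sketch, using synthetic stand-in scores (the variable names and generated data are illustrative only, not the examination data):

    import numpy as np
    from scipy import stats

    # Synthetic stand-ins for two score columns (eg, the physics written
    # test and the breast oral category); a real check would use the
    # actual 152 candidates' scores.
    rng = np.random.default_rng(0)
    physics = rng.normal(500.0, 50.0, size=152)
    breast = 70.0 + 0.003 * (physics - 500.0) + rng.normal(0.0, 0.8, size=152)

    r, p = stats.pearsonr(physics, breast)  # Pearson r and 2-tailed P value
    print(f"r = {r:.3f}, 2-tailed P = {p:.3g}")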

As mentioned above, there are concerns in the literature about the reliability and validity of oral examinations because of potential bias, the effects of interpersonal communication skills, and possible subjectivity. However, some research studies dispute some of these concerns. In this study, each candidate was examined by 8 content experts for 25-30 minutes each, and the raters scored the candidates' performances independently. The inter-rater reliability of the May 2010 ABR RO certifying oral examination was 0.81. Considering the lack of extensive systematic training, this is remarkably high for an oral examination administered by 8 content expert examiners, and it is higher than the concerns expressed in the literature and summarized in the introduction of this paper might suggest.1-5 Nevertheless, more systematic examiner training before the beginning of the oral examination may further increase the consistency of the raters' evaluations of the candidates' performances, and training aimed at standardizing cases and questions may also help increase the consistency of grading. In addition, score variability greater than the current range of 68 to 72 would probably increase the reliability coefficient.

The reliability of each category can affect the inter-rater reliability: if the reliability of a content category is low, it will not correlate highly with the other content categories. The reliability of the breast category may be suspect because its correlation coefficients with the lymphoma/leukemia/transplant/myeloma category (P = .034) and with the physics written test (P = .028) were not as high as the correlation coefficients among the other content categories, including the written test scores. It was not possible in this study to determine the reliability coefficient of a single category by itself.

Though somewhat arbitrary, a scale with 5-point intervals could be designed, with scoring guidelines that are familiar to the raters and easy to follow. The trustees could be involved in the decision-making process regarding the development of the scoring rubric, especially in deciding whether to allow the raters to assign scores on a 5-point interval scale with a wider range of points, for example, from 65 to 85 or from 70 to 100, with specific guidelines for each interval.

A systematic method to increase consistency among the raters could be implemented by providing training to standardize cases and questions. Each 25-30 minute oral examination could be audiotaped by content category; the audiotapes could then be transcribed, analyzed, and categorized by the questions the examiners asked. This method could produce a blueprint for the oral examination in each category. During the development of the blueprint, the 5-point interval scale could be implemented in place of the 1-point interval scale (although this would have no effect on developing the oral examination blueprints).

Although the reliability coefficient of the ABR May 2010 oral examination was remarkably high, as with any organization administering high-stakes examinations, the ABR must continue to strive for high reliability, and thus validity, of its examinations to protect candidates and the public. Reliability is a necessary condition for validity, but not a sufficient one.
Currently, the associate executive director of radiation oncology is working on a revision of the case formats to better rationalize oral examination content and thereby improve the reliability, and thus the validity, of the radiation oncology oral examinations.

References

1. Memon AM, Joughin GR, Memon B. Oral assessment and postgraduate medical examinations: Establishing conditions for validity, reliability and fairness. Adv Health Sci Educ Theory Pract. 2010;15:277-279.
2. Lunz ME, Bashook PG. Relationship between candidate communication ability and oral certification examination scores. Med Educ. 2008;42:1227-1233.
3. US Congress. The Court Interpreters Act of 1978, PL 95-539, 28 USC 1827. Washington, DC: US Congress; 1978.
4. Steinberg PI. Oral examination anxiety in physicians, narcissism, and object relations. J Appl Psychoanal Stud. 2002;4:379-388.
5. Stansfield CW, Hewitt WE. Examining the predictive validity of a screening test for court interpreters. Lang Test. 2005;22:438-462.
6. Pascuzzi RM. Opinion/education: The ABPN is the neurology resident's best friend. Neurology. 2008;70:e16-e19.
7. Brennan RL, ed. Educational Measurement. 4th ed. Westport, CT: American Council on Education/Praeger Publishers; 2006:65-110, 221-256.
8. Brennan RL. Generalizability Theory. New York, NY: Springer-Verlag; 2001:453-469.
9. Feldt LS, Charter RA. A simple computational alternative to analysis of variance formulas for estimating the k-judge reliability. Psychol Rep. 2004;94:514-516.