Interobserver Agreement of Confocal Laser Endomicroscopy for Bladder Cancer


Authors: Timothy C. Chang, M.S.,(1) Jen-Jane Liu, M.D.,(1) Shelly T. Hsiao, B.A.,(2) Ying Pan, Ph.D.,(1) Kathleen E. Mach, Ph.D.,(1) John T. Leppert, M.D.,(1,2) Jesse K. McKenney, M.D.,(3) Robert V. Rouse, M.D.,(2,3) and Joseph C. Liao, M.D.(1,2)

Institution(s)/affiliation(s) for each author:
(1) Department of Urology, Stanford University School of Medicine, Stanford, CA 94305-5118
(2) Veterans Affairs Palo Alto Health Care System, Palo Alto, CA 94304
(3) Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305

Corresponding author: Joseph C. Liao, M.D., Department of Urology, Stanford University School of Medicine, 300 Pasteur Dr., Room S-287, Stanford, CA 94305-5118. jliao@stanford.edu, p: (650) 852-3284, f: (650) 849-1925

Author e-mail addresses:
Timothy C. Chang: tcchang@stanford.edu
Jen-Jane Liu: jenliu@stanford.edu
Shelly T. Hsiao: shsiao27@gmail.com
Ying Pan: yingpan@stanford.edu
Kathleen E. Mach: kmach@stanford.edu
John T. Leppert: jleppert@stanford.edu
Jesse K. McKenney: mckennj@ccf.org
Robert Vance Rouse: rouse@stanford.edu
Joseph C. Liao: jliao@stanford.edu

Previous abstract: Chang TC, Liu JJ, Hsiao ST, Pan Y, Mach KE, McKenney J, Rouse R and JC Liao. Interobserver agreement and accuracy of confocal laser endomicroscopy for in vivo diagnosis of bladder cancer. Poster presented at the meeting of the American Urological Association, Atlanta, GA, May 2012.

Abstract:

Purpose: Emerging optical imaging technologies such as confocal laser endomicroscopy (CLE) hold promise in improving bladder cancer diagnosis. The purpose of this study was to determine the interobserver agreement of image interpretation using CLE for bladder cancer.

Methods: Experienced CLE urologists (n=2), novice CLE urologists (n=6), pathologists (n=4) and nonclinical researchers (n=5) were recruited to participate in a two-hour computer-based training consisting of a teaching and validation set of intraoperative white light cystoscopy (WLC) and CLE video sequences from patients undergoing transurethral resection of bladder tumor. Interobserver agreement was determined using the κ statistic.

Results: Of the 31 bladder regions analyzed, 19 were cancer and 12 were benign. For cancer diagnosis, experienced CLE urologists had substantial agreement for both CLE and WLC+CLE (90%, κ 0.80) compared with moderate agreement for WLC alone (74%, κ 0.46), while novice CLE urologists had moderate agreement for CLE (77%, κ 0.55), WLC (78%, κ 0.54) and WLC+CLE (80%, κ 0.59). Pathologists had substantial agreement for CLE (81%, κ 0.61), and nonclinical researchers had moderate agreement (77%, κ 0.49) in cancer diagnosis. For cancer grading, experienced CLE urologists had fair to moderate agreement for CLE (68%, κ 0.64), WLC (74%, κ 0.67) and WLC+CLE (53%, κ 0.33), as did novice CLE urologists for CLE (53%, κ 0.39), WLC (66%, κ 0.50) and WLC+CLE (61%, κ 0.49). Pathologists (65%, κ 0.55) and nonclinical researchers (61%, κ 0.56) both had moderate agreement for CLE in cancer grading.

Conclusions: CLE is an adoptable technology for cancer diagnosis in novice CLE observers following a short training, with moderate interobserver agreement and diagnostic accuracy similar to WLC alone. Experienced CLE observers may be capable of achieving substantial levels of agreement for cancer diagnosis that are higher than with WLC alone.

Introduction:

Probe-based confocal laser endomicroscopy (CLE) is an emerging optical imaging technology that enables endoscopic microscopy of mucosal lesions, with dynamic, subsurface imaging of tissue microarchitecture and cellular features. The technology has been applied as an adjunct to standard white light endoscopy in the respiratory [1] and gastrointestinal tracts [2-4], and recently in the urinary tract [5-8]. Particularly for bladder cancer, a new generation of imaging technologies such as CLE may augment the diagnostic accuracy of white light cystoscopy (WLC) through improved visualization of flat lesions, differentiation of benign from neoplastic lesions, and delineation of tumor boundaries [9].

Based on the well-established principle of confocal microscopy [10,11], CLE is performed using a fiberoptic, sterilizable probe that fits within the working channel of a standard cystoscope. Optical sectioning of the tissue of interest with micron-scale resolution is achieved using a 488 nm laser as the light source and fluorescein, an FDA-approved drug that may be administered intravesically or intravenously, as the contrast agent. Previously, we demonstrated the feasibility of using CLE to obtain real-time in vivo images of bladder tumors [6] and developed diagnostic criteria for grading bladder tumors using CLE [5].

Interobserver agreement studies are useful in determining the subjective variation in interpretation among observers analyzing images [12]. Methods that demonstrate higher levels of agreement between observers are deemed more reliable [12]. In disciplines that require subjective interpretation of diagnostic imaging, such as pathology [13,14] and radiology [15], interobserver studies are commonly applied to determine reproducibility of the results from one observer to another. Interobserver agreement for CLE has been studied in the gastrointestinal tract, with results ranging from moderate to good agreement in the diagnosis of colorectal cancer [16] to substantial and almost perfect agreement in Barrett's esophagus [17]. Interobserver agreement of CLE in the urinary tract has not been previously examined; such a study can assess the reliability of the technology between observers and evaluate the adoptability of CLE by novice users. The aim of this study was to determine the interobserver agreement and diagnostic accuracy of CLE for bladder cancer.

Methods:

Observer Recruitment

The study was approved by the Stanford University Institutional Review Board and the VA Palo Alto Health Care System (VAPAHCS) Research and Development Committee. The study consisted of four observer groups. The first was an experienced CLE urologist group consisting of a board certified urologist and a urology chief resident involved in the protocol development of CLE for the urinary tract. The remaining three groups were novice CLE observers with no prior experience with CLE, consisting of six board certified urologists, four pathologists, and five nonclinical researchers. The novice CLE urologists were expert WLC users, but the pathologists and nonclinical researchers had no experience with WLC. The nonclinical researchers ranged from an undergraduate student to Ph.D. scientists and engineers. All groups participated in identical training sessions.

Teaching and Validation Set

The observers participated in a two-hour, computer-based training that consisted of separate teaching and validation sets featuring intraoperative WLC and CLE videos from selected patients undergoing transurethral resection of bladder tumor at the VAPAHCS from 2008 to 2011.

The computer-based, interactive training was created in a website format compatible with common internet browsers for easy access and future scalability. The observers were first introduced to background information on bladder cancer and CLE technology (Figure 1A). Using still images and video sequences, the observers were instructed to identify three microarchitectural features (flat vs. papillary, tissue organization, and vascularity) and three cellular features (morphology, cohesiveness, and borders) of benign and pathological urothelium using CLE. Diagnostic criteria (Table 1) were developed that associated the six features with benign and cancerous urothelium, and the observers were instructed to categorize each sequence as benign, low grade (LG) or high grade (HG) cancer. Fifteen CLE video sequences were provided in a teaching set for the observers to iteratively practice diagnosing the sequences (Figure 1B).

Following the teaching set, the observers proceeded to the validation set to evaluate 32 CLE video sequences consisting of 12 benign, 9 LG, and 11 HG sequences (Figure 1C). The benign sequences were from biopsy-confirmed normal mucosa, inflammation, and papilloma; LG sequences were from LG papillary tumors; and HG sequences were from HG papillary tumors and carcinoma in situ. The observers were able to pause and review each video clip frame-by-frame. All observers were blinded to patient history and final pathology and were asked to diagnose and grade each clip as benign, LG or HG.

Upon completing the 32 CLE sequences, the four pathologists and five nonclinical researchers concluded the study. The experienced CLE urologists and novice CLE urologists were asked to further diagnose an additional set of 32 corresponding WLC images in a different order, followed by a third and final set of 32 images in which the CLE and WLC were shown together. The data were gathered to determine the interobserver agreement and diagnostic accuracy.

From the 32 sequences reviewed by the observers, one of the original HG sequences was excluded due to a discrepancy in the final pathologic diagnosis. The resulting 31 responses of benign, LG or HG from each of the observers, for the 31 sequences with correlating histopathological information, were used for the interobserver agreement and diagnostic accuracy analyses. For cancer diagnosis, all 31 responses were used. For cancer grading, however, only the subset of 19 cancer sequences (9 LG and 10 HG) confirmed on histopathology was used. In addition to diagnosing and grading each of the sequences as the other groups did, the experienced CLE urologist group was asked to identify the six key CLE features (Table 1) for each sequence.

Statistical Analysis

Interobserver agreement was assessed using the Fleiss κ statistic. Values of κ were interpreted using the scale developed by Landis and Koch: 0.00-0.20 as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial and 0.81-1.00 as almost perfect agreement [18]. To determine diagnostic accuracy, a dedicated pathologist (R.V.R.) who was blinded to the clinical history and not included in the pathologist group reviewed all the histopathological slides corresponding to each bladder lesion. Sensitivity and specificity were calculated using the histopathological results as the standard.

Results:

Table 2 shows the percent agreement and κ statistic for the two experienced CLE urologists for each of the six features. Tissue organization, vascular features, cellular morphology and cellular borders had substantial levels of agreement. The experienced CLE urologists had moderate agreement in the ability to determine flat vs. papillary using CLE, while they had a fair level of agreement in characterizing cellular cohesiveness.

Table 3 shows the interobserver agreement and diagnostic accuracy for cancer diagnosis. Experienced CLE urologists had substantial to almost perfect levels of agreement for CLE and WLC+CLE, which were both greater than for WLC alone. The novice CLE urologists had moderate levels of agreement for CLE, WLC and WLC+CLE. The sensitivity and specificity for all groups were between 73 and 89%.

Table 4 shows the interobserver agreement and diagnostic accuracy for cancer grading. All groups had decreased levels of agreement compared with cancer diagnosis except for nonclinical researchers, who had a slightly increased κ compared with cancer diagnosis (0.56 vs. 0.49). There was also an unexpected decrease in κ for experienced CLE urologists from WLC to WLC+CLE (0.67 to 0.33).

For low grade cancer, experienced CLE urologists had increased sensitivity and specificity with the addition of CLE to WLC (sensitivity 64%, specificity 81%) compared with standard WLC alone (sensitivity 55%, specificity 50%). Novice CLE urologists also showed an increase in specificity with the addition of CLE to WLC (specificity 65%) compared with WLC alone (specificity 54%). For low grade cancer, the novice groups all had similar diagnostic accuracy using CLE alone (sensitivity 49-52%, specificity 66-75%).

For high grade cancer, experienced CLE urologists showed an increase in sensitivity with the addition of CLE to standard WLC (sensitivity 69% for WLC+CLE vs. 50% for WLC) while maintaining specificity (73% for both). For novice CLE urologists, there was an increase in sensitivity but a decrease in specificity with the addition of CLE to standard WLC.

Discussion:

Our single-center study indicates that CLE image interpretation of bladder cancer is adoptable by novice observers through a two-hour computer-based interactive training. We created a training session centered on guiding observers to identify six key CLE features in analyzing bladder cancer microarchitecture and cellular morphology. There were substantial levels of agreement for four of the six features, suggesting that these features can be identified reliably with proper training. The moderate level of agreement for the flat vs. papillary feature was not unexpected, as it is a feature more easily identified on a macroscopic level by WLC than on a microarchitectural level. The cohesiveness feature had a fair level of agreement, suggesting that further refinement of the criteria may be necessary to improve the reliability of this feature.

Once the observers were trained to identify these features, they were introduced to a table correlating the six features with bladder cancer diagnoses. This table was developed based on the 2004 WHO Classification of Bladder Tumours [19], and further refinement of the table is expected as the technology matures and is adopted by additional end users. Since most of the observer groups, with the exception of the pathologist group, do not routinely diagnose bladder cancer using microscopy, the table served as a useful guide for novice observers, particularly nonclinical researchers, to diagnose bladder cancer.

The interobserver agreement and diagnostic accuracy results were reported separately for CLE, WLC, and WLC+CLE. Data on CLE alone were useful in assessing the technology itself and observing its standalone performance without the bias of WLC. The WLC information provided baseline data, as WLC is the standard imaging modality for bladder cancer. However, the practical use of CLE in the clinical setting would involve WLC for the initial survey of the bladder and for guiding the CLE probe to the area of interest. Thus, the WLC+CLE data provide the most clinically relevant information. As pathologists and nonclinical researchers had no prior training in WLC, they were only asked to review CLE alone.
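As a minimal illustration of the agreement measures reported in this study, the sketch below computes the Fleiss κ statistic and maps it to the Landis and Koch descriptors given in the Statistical Analysis section. It is not the software used in the study: the function names and the toy ratings are hypothetical, and the "poor" label for negative κ comes from Landis and Koch's original scale rather than from the text above.

```python
from collections import Counter

def fleiss_kappa(ratings, categories):
    """ratings: one list per image sequence, holding one label per observer.
    Assumes every sequence was rated by the same number of observers and
    that more than one category was used overall (otherwise 1 - p_exp is 0)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # Count how many observers chose each category for each sequence.
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Observed agreement: mean proportion of agreeing observer pairs.
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_items) / n_items
    # Chance agreement from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_exp = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_obs - p_exp) / (1 - p_exp)

def landis_koch(kappa):
    """Descriptor for a kappa value on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical example: two observers, four sequences, binary diagnosis.
toy_ratings = [
    ["benign", "benign"],
    ["cancer", "cancer"],
    ["benign", "cancer"],
    ["cancer", "cancer"],
]
toy_kappa = fleiss_kappa(toy_ratings, ["benign", "cancer"])
print(round(toy_kappa, 2), landis_koch(toy_kappa))
```

In this toy case the two observers agree on three of four sequences (75% agreement), yet κ falls in the moderate band because part of that agreement is expected by chance. This is why the study reports percent agreement and κ side by side.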

The ability of novice observers to learn CLE image interpretation was demonstrated by the performance of the various groups after a two-hour training session. For cancer diagnosis (Table 3), novice CLE urologists had similar moderate levels of agreement and diagnostic accuracy for CLE, WLC and WLC+CLE, indicating that the two-hour training was sufficient for them to diagnose cancer with CLE at a reliability and diagnostic accuracy comparable to standard WLC. Interestingly, nonclinical researchers who had no prior CLE or clinical experience also had a moderate level of agreement for cancer diagnosis, similar to the other novice CLE observer groups, while maintaining similar levels of diagnostic accuracy. Thus, the novice CLE urologists demonstrated comparable agreement for WLC and CLE, and nonclinical researchers with no prior clinical experience demonstrated levels of agreement for CLE similar to those of the clinically trained novice groups. These results provide evidence of the relative ease of training novice observers to interpret CLE images. It is notable that these results are in line with the moderate levels of interobserver agreement reported for CLE imaging of colorectal neoplasia [16].

Experienced CLE urologists obtained substantial to almost perfect levels of agreement for CLE and WLC+CLE, with κ of 0.80, while maintaining a level of sensitivity and specificity comparable to WLC. CLE outperformed WLC for the experienced CLE urologist group with respect to interobserver agreement. The result suggests that for cancer diagnosis, CLE may provide added value compared with WLC, since it has greater interobserver reliability without sacrificing diagnostic accuracy. In short, the results indicate that a two-hour training session may enable novice users to achieve levels of interobserver reliability and diagnostic accuracy with CLE similar to those with WLC for cancer diagnosis, while greater reliability is seen for CLE compared with WLC in experienced CLE urologists.
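The diagnostic accuracy figures discussed above are sensitivity and specificity measured against histopathology. A minimal sketch for a single observer follows. The function name, the label strings, and the five toy responses are hypothetical, and because the text does not describe how responses were pooled within each observer group, no pooling is shown.

```python
def sensitivity_specificity(responses, histology, positive=("LG", "HG")):
    """Compare one observer's calls with the histopathology standard.

    With the default `positive` labels, any cancer call (LG or HG) on a
    histologically confirmed cancer counts as a true positive, which
    corresponds to cancer diagnosis rather than cancer grading.
    """
    tp = fp = tn = fn = 0
    for called, truth in zip(responses, histology):
        called_pos = called in positive
        truth_pos = truth in positive
        if called_pos and truth_pos:
            tp += 1
        elif called_pos:
            fp += 1
        elif truth_pos:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)   # fraction of cancers called cancer
    specificity = tn / (tn + fp)   # fraction of benign called benign
    return sensitivity, specificity

# Hypothetical responses for five sequences (not study data).
toy_responses = ["HG", "LG", "benign", "benign", "LG"]
toy_histology = ["HG", "benign", "benign", "LG", "LG"]
toy_sn, toy_sp = sensitivity_specificity(toy_responses, toy_histology)
print(f"sensitivity={toy_sn:.2f} specificity={toy_sp:.2f}")
```

Passing a single grade, for example `positive=("HG",)`, would give a grade-specific figure. Whether that matches the grade-level calculation used for Table 4 is an assumption.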

For cancer grading, interobserver agreement was in the fair to moderate range for CLE, WLC and WLC+CLE, which was generally lower than for cancer diagnosis. No clear patterns were noted in diagnostic accuracy. The results suggest that CLE may not be as reliable for cancer grading as for cancer diagnosis. The interobserver agreement for cancer grading using CLE is similar to that reported in the pathology literature using the 2004 WHO classification system. May et al. [14] reported κ of 0.30-0.52 for interobserver agreement, and van Rhijn et al. [13] reported κ of 0.14-0.58 and 0.55-0.81 for interobserver and intraobserver agreement, respectively. Moreover, a comparison of the cancer grading results in our study derived from the electronic medical records (by multiple pathologists) with the results of a single pathologist (R.V.R.) with expertise in bladder cancer showed a κ of 0.58. These findings illustrate the inherent challenges of bladder cancer grading with CLE or standard pathology.

Our study has several limitations. First, our interobserver agreement analysis for cancer grading was a subset analysis of the confirmed cancers from the original data set. When reviewing the sequences, the observers were given the option to grade the sequences as benign, LG, or HG, rather than simply LG or HG. The additional choice may have contributed to the overall lower interobserver agreement compared with cancer diagnosis. Nevertheless, such occurrences were few, and the study was designed this way because it reflects the more clinically relevant and practical scenario of a clinician grading an unknown lesion. Second, there may be selection bias in the CLE video sequences, which were edited offline and chosen non-randomly by a member of our team using the subjective criteria of image quality (fair to good) and a roughly equal distribution of benign, LG, and HG lesions. This individual did not participate as an observer in the study. Third, there may be recall bias from the experienced CLE urologists who acquired the original CLE sequences. The use of an independent member of our team to select the images and video sequences mitigates, but does not eliminate, this potential bias. Fourth, an inherent limitation of using CLE for bladder cancer diagnosis is the reliance on microarchitectural and cellular features, whereas pathology additionally utilizes nuclear morphology (e.g. size, mitotic figures). Nuclear features are not routinely seen under CLE, as fluorescein, the contrast agent used, stains the extracellular matrix nonspecifically [5,6].

Overall, novice CLE observers demonstrated the ability to use CLE as an adjunct to WLC to diagnose bladder cancer following a brief training. Nonclinical researchers with no clinical training were able to diagnose bladder cancer at a level comparable to clinically trained novice CLE observers, highlighting the adoptability and translatability of the CLE technology to a wide range of novice users. Our results indicate that further studies are warranted to refine the CLE features used in diagnosing and grading bladder cancer, as are multi-center studies to validate the translatability of this study. Future directions include prospective multi-center studies to investigate the overall clinical utility of CLE, as well as cost-benefit analyses, which will be necessary for widespread adoption of CLE for bladder cancer diagnosis and grading. In addition, CLE, as a microscopic imaging modality, may be combined with other new macroscopic imaging technologies (i.e. photodynamic diagnosis and narrow band imaging) already in clinical use to improve the overall optical diagnosis of bladder cancer [9].

Conclusion:

CLE is an adoptable technology for novice CLE observers following a training session, with moderate interobserver agreement and diagnostic accuracy similar to WLC alone. Experienced CLE observers may be capable of achieving substantial levels of agreement for cancer diagnosis that are markedly higher than with WLC alone. Fair to moderate levels of agreement are achieved for cancer grading, although the literature suggests the variability may in part be attributable to the grading classification system.

Tables and Figures:

FIG 1. Computer-based, interactive training module for confocal laser endomicroscopy of the bladder. (A) CLE Training: Observers were trained to identify six key CLE features. (B) Teaching set: Teaching set consisting of fifteen video sequences to iteratively practice diagnosing and grading CLE sequences. (C) Validation set: All observer groups reviewed and diagnosed CLE sequences. The experienced CLE and novice CLE urologist groups continued on to diagnose two additional sets of WLC and WLC+CLE images.

TABLE 1. DIAGNOSIS TABLE.
TABLE 2. INTEROBSERVER AGREEMENT OF FEATURES OBSERVED ON CLE.
TABLE 3. INTEROBSERVER AGREEMENT AND DIAGNOSTIC ACCURACY FOR CANCER DIAGNOSIS.
TABLE 4. INTEROBSERVER AGREEMENT AND DIAGNOSTIC ACCURACY FOR CANCER GRADING.

Acknowledgement:

The authors would like to acknowledge our colleagues who participated in the study (H.G., J.D.B., C.V.C., M.E., E.S., M.S., D.B., R.M., C.Z., H.R., J.H., S.O., and A.S.). We also thank Mauna Kea Technologies for technical support and helpful discussions. This work was supported in part by the US National Institutes of Health (NIH) R01 CA160986 (J.C.L.).

Disclosure Statement:

No competing financial interests exist.

References:

1. Thiberville L, Salaün M, Lachkar S, et al. Human in vivo fluorescence microimaging of the alveolar ducts and sacs during bronchoscopy. Eur. Respir. J. 2009; 33: 974–985.
2. Dunbar KB, Okolo P 3rd, Montgomery E, et al. Confocal laser endomicroscopy in Barrett's esophagus and endoscopically inapparent Barrett's neoplasia: a prospective, randomized, double-blind, controlled, crossover trial. Gastrointest. Endosc. 2009; 70: 645–654.
3. Pech O, Rabenstein T, Manner H, et al. Confocal laser endomicroscopy for in vivo diagnosis of early squamous cell carcinoma in the esophagus. Clin. Gastroenterol. Hepatol. 2008; 6: 89–94.
4. Goetz M, Kiesslich R, Dienes H-P, et al. In vivo confocal laser endomicroscopy of the human liver: a novel method for assessing liver microarchitecture in real time. Endoscopy 2008; 40: 554–562.
5. Wu K, Liu J-J, Adams W, et al. Dynamic real-time microscopy of the urinary tract using confocal laser endomicroscopy. Urology 2011; 78: 225–231.
6. Sonn GA, Jones S-NE, Tarin TV, et al. Optical biopsy of human bladder neoplasia with in vivo confocal laser endomicroscopy. J. Urol. 2009; 182: 1299–1305.
7. Sonn GA, Mach KE, Jensen K, et al. Fibered confocal microscopy of bladder tumors: an ex vivo study. J. Endourol. 2009; 23: 197–201.
8. Adams W, Wu K, Liu J-J, et al. Comparison of 2.6- and 1.4-mm imaging probes for confocal laser endomicroscopy of the urinary tract. J. Endourol. 2011; 25: 917–921.
9. Liu J-J, Droller MJ and Liao JC. New optical imaging technologies for bladder cancer: considerations and perspectives. J. Urol. 2012; 188: 361–368.
10. Robinson JP. Chapter 4 Principles of confocal microscopy. In: Methods in Cell Biology. Edited by HAC Zbigniew Darzynkiewicz. Vol 63, Part A. Academic Press 2001; pp 89–106.
11. Helmchen F. Miniaturization of fluorescence microscopes using fibre optics. Exp. Physiol. 2002; 87: 737–745.
12. Viera AJ and Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam. Med. 2005; 37: 360–363.
13. van Rhijn BWG, van Leenders GJLH, Ooms BCM, et al. The Pathologist's Mean Grade Is Constant and Individualizes the Prognostic Value of Bladder Cancer Grading. European Urology 2010; 57: 1052–1057.
14. May M, Brookman-Amissah S, Roigas J, et al. Prognostic Accuracy of Individual Uropathologists in Noninvasive Urinary Bladder Carcinoma: A Multicentre Study Comparing the 1973 and 2004 World Health Organisation Classifications. European Urology 2010; 57: 850–858.
15. Tekes A, Kamel I, Imam K, et al. Dynamic MRI of Bladder Cancer: Evaluation of Staging Accuracy. AJR 2005; 184: 121–127.

16. Gómez V, Buchner A, Dekker E, et al. Interobserver agreement and accuracy among international experts with probe-based confocal laser endomicroscopy in predicting colorectal neoplasia. Endoscopy 2010; 42: 286–291.
17. Wallace MB, Sharma P, Lightdale C, et al. Preliminary accuracy and interobserver agreement for the detection of intraepithelial neoplasia in Barrett's esophagus with probe-based confocal laser endomicroscopy. Gastrointestinal Endoscopy 2010; 72: 19–24.
18. Landis JR and Koch GG. The Measurement of Observer Agreement for Categorical Data. Biometrics 1977; 33: 159–174.
19. Montironi R and Lopez-Beltran A. The 2004 WHO classification of bladder tumors: a summary and commentary. Int. J. Surg. Pathol. 2005; 13: 143–153.

Abbreviations Used

CLE = confocal laser endomicroscopy
HG = high grade
LG = low grade
p_a = percent agreement
Sn = sensitivity
Sp = specificity
VAPAHCS = Veterans Affairs Palo Alto Health Care Systems
WLC = white light cystoscopy

TABLE 1. DIAGNOSIS TABLE

| Feature group | Feature | Benign: Normal | Benign: Papilloma | Benign: Inflammatory | Cancer: Low grade | Cancer: High grade, papillary | Cancer: High grade, CIS |
|---|---|---|---|---|---|---|---|
| Architectural | Flat vs. papillary | Flat | Papillary | Flat | Papillary | Papillary | Flat |
| Architectural | Organization | Organized | Organized, normal thickness | Loose cells in LP | Organized, increased thickness | Disorganized | Disorganized |
| Architectural | Vascular features | Capillary network in LP | Fibrovascular stalk | n/a | Fibrovascular stalk | Tortuous vessels | n/a |
| Cellular | Morphology | Monomorphic | Monomorphic | Small, monomorphic | Monomorphic | Pleomorphic | Pleomorphic |
| Cellular | Cohesiveness | Cohesive | Cohesive | Small, clustered cells | Cohesive | Not cohesive | Not cohesive |
| Cellular | Borders | Distinct | Distinct | Distinct | Distinct | Indistinct | Indistinct |

TABLE 2. INTEROBSERVER AGREEMENT OF FEATURES OBSERVED ON CLE

| Feature group | Feature | p_a | κ (95% CI) |
|---|---|---|---|
| Microarchitectural | Flat vs. papillary | 77% | 0.54 (0.19–0.89) |
| Microarchitectural | Tissue organization | 87% | 0.70 (0.35–1.00) |
| Microarchitectural | Vascular features | 81% | 0.74 (0.49–0.99) |
| Cellular | Morphology | 84% | 0.66 (0.31–1.00) |
| Cellular | Cohesiveness | 74% | 0.33 (-0.03–0.68) |
| Cellular | Borders | 87% | 0.74 (0.38–1.00) |

p_a = percent agreement.
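As a rough illustration of the two agreement measures reported in Table 2, the following minimal sketch computes percent agreement and Cohen's κ for two raters, and maps a κ value to the descriptive bands of Landis and Koch (reference 18). It is an assumption-laden sketch, not the study's analysis: this section does not state which κ variant was used, and groups with more than two observers would need a multi-rater statistic. The function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Two-rater Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by
    chance from each rater's marginal label frequencies.
    """
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Descriptive label for kappa using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

Under these bands, the κ of 0.33 reported for cohesiveness would read as fair agreement, while the 0.74 reported for vascular features and for borders would read as substantial.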

TABLE 3. INTEROBSERVER AGREEMENT AND DIAGNOSTIC ACCURACY FOR CANCER DIAGNOSIS

Interobserver Agreement

| Observer group | CLE p_a | CLE κ (95% CI) | WLC p_a | WLC κ (95% CI) | WLC + CLE p_a | WLC + CLE κ (95% CI) |
|---|---|---|---|---|---|---|
| Experienced CLE urologists | 90% | 0.80 (0.45–1.00) | 74% | 0.46 (0.10–0.81) | 90% | 0.80 (0.45–1.00) |
| Novice CLE urologists | 77% | 0.55 (0.46–0.64) | 78% | 0.54 (0.45–0.63) | 80% | 0.59 (0.50–0.68) |
| Pathologists | 81% | 0.61 (0.47–0.76) | - | - | - | - |
| Nonclinical researchers | 77% | 0.49 (0.38–0.60) | - | - | - | - |

Diagnostic Accuracy

| Observer group | CLE Sn | CLE Sp | WLC Sn | WLC Sp | WLC + CLE Sn | WLC + CLE Sp |
|---|---|---|---|---|---|---|
| Experienced CLE urologists | 84% | 88% | 89% | 83% | 89% | 88% |
| Novice CLE urologists | 75% | 81% | 86% | 79% | 89% | 83% |
| Pathologists | 84% | 81% | - | - | - | - |
| Nonclinical researchers | 89% | 73% | - | - | - | - |

p_a = percent agreement; Sn = sensitivity; Sp = specificity
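For readers less familiar with the accuracy columns in Table 3, the sketch below shows the standard definitions of sensitivity and specificity for a binary cancer/benign call scored against a reference diagnosis. The function name, the label strings, and the idea of a single reference label per sequence are assumptions made for illustration; this section does not describe how per-observer results were pooled.

```python
def sensitivity_specificity(reference, predicted, positive="cancer"):
    """Return (sensitivity, specificity) for binary calls.

    reference: reference labels, one per item (assumed here to be
               the confirmed diagnosis)
    predicted: an observer's labels for the same items
    positive:  the label counted as a positive finding
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1  # cancer correctly called cancer
            else:
                fn += 1  # cancer missed
        else:
            if pred == positive:
                fp += 1  # benign called cancer
            else:
                tn += 1  # benign correctly called benign
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference item")
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the share of reference-positive items that the observer called positive, and specificity is the share of reference-negative items that the observer called negative.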

TABLE 4. INTEROBSERVER AGREEMENT AND DIAGNOSTIC ACCURACY FOR CANCER GRADING

Interobserver Agreement

| Observer group | CLE p_a | CLE κ (95% CI) | WLC p_a | WLC κ (95% CI) | WLC + CLE p_a | WLC + CLE κ (95% CI) |
|---|---|---|---|---|---|---|
| Experienced CLE urologists | 68% | 0.64 (0.41–0.87) | 74% | 0.67 (0.35–0.98) | 53% | 0.33 (-0.03–0.70) |
| Novice CLE urologists | 53% | 0.39 (0.31–0.47) | 66% | 0.50 (0.41–0.60) | 61% | 0.49 (0.41–0.57) |
| Pathologists | 65% | 0.55 (0.43–0.68) | - | - | - | - |
| Nonclinical researchers | 63% | 0.56 (0.47–0.65) | - | - | - | - |

Diagnostic Accuracy

| Grade | Observer group | CLE Sn | CLE Sp | WLC Sn | WLC Sp | WLC + CLE Sn | WLC + CLE Sp |
|---|---|---|---|---|---|---|---|
| Low grade | Experienced CLE urologists | 50% | 94% | 55% | 50% | 64% | 81% |
| Low grade | Novice CLE urologists | 52% | 75% | 59% | 54% | 55% | 65% |
| Low grade | Pathologists | 50% | 66% | - | - | - | - |
| Low grade | Nonclinical researchers | 49% | 73% | - | - | - | - |
| High grade | Experienced CLE urologists | 75% | 64% | 50% | 73% | 69% | 73% |
| High grade | Novice CLE urologists | 46% | 74% | 44% | 76% | 54% | 67% |
| High grade | Pathologists | 50% | 66% | - | - | - | - |
| High grade | Nonclinical researchers | 63% | 60% | - | - | - | - |

p_a = percent agreement; Sn = sensitivity; Sp = specificity

FIG 1. Computer-based, interactive training modules for confocal laser endomicroscopy of the bladder. (A) CLE Training: Observers were trained to identify six key CLE features. (B) Teaching set: Teaching set consisting of fifteen video sequences to iteratively practice diagnosing and grading CLE sequences. (C) Validation set: All observer groups reviewed and diagnosed CLE sequences. The experienced CLE and novice CLE urologist groups continued on to diagnose two additional sets of WLC and WLC+CLE images.
