Automated Coding of Key Case Identifiers from Text-Based Electronic Pathology Reports


George Cernile, Artificial Intelligence in Medicine, Inc.
NAACCR 2017 Conference, Albuquerque, New Mexico, USA
June 22, 2017

Introduction
  Coding problem: background
  Machine learning vs. coded rules
  System performance
  Confidence measures
  Conclusion

Background
  The volume of electronic pathology (E-Path) reports at central cancer registries is increasing rapidly.
  Key elements in E-Path reports reside in narrative text.
  E-Path utility is limited by the staff resources available to manually review large volumes of reports.
  Reliable automated coding (auto-coding) of even minimal diagnostic elements would improve efficiency and utility.
  E-Path currently processes ~15 million path reports per year for cancer report selection, with sensitivity and specificity of 98% and 99%.

Reliable coding of key data elements in E-Path reports would enhance:
  Rapid case ascertainment
  Case-level linkages
  Case finding
  Abstracting efficiency
  Audits
  Early assessments of incidence rates

The Problem
  Coding is currently done manually: it requires review of path reports, which takes time.
  Automating it would save labor and improve data acquisition.
  Manual coding requires a trained registrar.
  Trained registrars may not always agree on the final answer.
  An automated system needs to reach 95% accuracy to be effective.

The Coding Problem
  Can an automated system provide the SEER Site coding designations?
  Use the defined SEER rules.
  Compare against a large reference set and identify discrepancies.
  Create new rules and re-test.
  Measure confidence: identify high-confidence codes.
  Provide an explanation for each code.

Automated coding wizard

GROSS DESCRIPTION: Specimen is received in two parts.
Part 1: Specimen is labeled "left lobe and left isthmus". Specimen is received fresh for intraoperative consultation and consists of a 6 gm, 3.9 x 2.5 x 2 cm lobe of thyroid gland. Specimen is inked and serially sectioned to reveal a 1.7 cm cirrhotic mass abutting the surrounding surface focally. A fine needle aspiration is performed. Representative section of the mass is submitted for frozen. The remnant of the frozen is submitted in cassette 1 FS. The remainder of the thyroid parenchyma is dark red and homogeneous. There is no additional nodule grossly identified. The remainder of the specimen is submitted: Cassette A-D - mass, Cassette E-G - the remainder of the thyroid gland, in a total of eight cassettes.
Part 2: Specimen is labeled "paratracheal lymph node". Specimen is received fresh and consists of a 0.4 x 0.2 x 0.2 cm ovoid tan nodule. Entire specimen is submitted in one cassette.
ZZ/FA:bds #
DIAGNOSIS:
1. THYROID, LEFT LOBE AND ISTHMUS (RESECTION): CARCINOMA OF THE THYROID, 1.7 CM, WITH INVASION OF THE THYROID CAPSULE. TUMOR APPROACHES TO WITHIN FRACTIONS OF A MILLIMETER OF THE EXTERNAL SURFACE. NO INVASION OF THE SURROUNDING SKELETAL MUSCLE SEEN.
2. PARATRACHEAL LYMPH NODE (BIOPSY): METASTATIC CARCINOMA, MICROSCOPIC FOCI, SUBCAPSULAR SINUS OF LYMPH NODE, SEE COMMENT.

Codes overlaid on the report: M-80103; M-82603 (95%); M-80503; C73.9 (95%); C77.9

Explanation: If the report has C73.9 (Thyroid NOS) + M-80503 (Papillary carcinoma), then add M-82603 (Papillary carcinoma of thyroid).
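The explanation on this slide reads like a condition/action rule, so it can be sketched directly in code. The following is a minimal illustration only, not AIM's implementation: the `Rule` class, the `apply_rules` function and the code spellings with a hyphen (`M-80503`) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape: if all `required` codes are present, add `add`."""
    required: frozenset
    add: str
    explanation: str


# The thyroid rule quoted in the slide's explanation.
THYROID_PAPILLARY = Rule(
    required=frozenset({"C73.9", "M-80503"}),
    add="M-82603",
    explanation=(
        "C73.9 (Thyroid NOS) + M-80503 (Papillary carcinoma) "
        "=> add M-82603 (Papillary carcinoma of thyroid)"
    ),
)


def apply_rules(codes, rules):
    """Return the extended code set and the explanations of the rules that fired."""
    result = set(codes)
    explanations = []
    for rule in rules:
        if rule.required <= result and rule.add not in result:
            result.add(rule.add)
            explanations.append(rule.explanation)
    return result, explanations


if __name__ == "__main__":
    codes, why = apply_rules({"C73.9", "C77.9", "M-80503"}, [THYROID_PAPILLARY])
    print(sorted(codes))
    for line in why:
        print("Explanation:", line)
```

Keeping the explanation string next to the rule is what lets a wizard of this kind show the coder why a code was added.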

Coding: What if we can use this data?
Unit names:
  Additional Pathologic Findings; AFP; ALK; Arterial Invasion; b-hcg; Bloom Richardson Grade; Bloom Richardson Score; BRAF; Breslow's depth;
  CA-125; Calcification; Calcitonin; CEA; Clark's Staging;
  Depth of Invasion; Distance of Tumor from Anterior Margin; Distance of Tumor from Closest margin; Distance of Tumor from Deep Margin; Distance of Tumor from Inferior Margin; Distance of Tumor from Lateral Margin; Distance of Tumor from Medial Margin; Distance of Tumor from Posterior Margin; Distance of Tumor from Superior Margin; Distant Metastasis (pM);
  EGFR; ER - Allred Score; ER Status; Examination type; Extranodal Extension;
  FISH Results; Focality;
  Gleason Grade - Primary Pattern; Gleason Grade - Secondary Pattern; Gleason Grade - Tertiary Pattern; Gleason Score; Grade of dysplasia; Grading System;
  HER2 % cells stained; HER2 gene copy number; HER2 Result; HER2:CEP 17 ratio; Histologic Grade; Histologic Type; HX;
  ICD-O-3 Morphology AIM Inc.; ICD-O-3 Topography AIM Inc.; Implants;
  Ki-67; KRAS;
  Laterality; LDH; Lymph Nodes Examined; Lymph Nodes Negative; Lymph Nodes Positive; Lymphatic Invasion; Lymphovascular (LV) Invasion;
  Miscellaneous Terms; Mitotic Count;
  Nottingham Grade; Nottingham Score; Nuclear Pleomorphism;
  Organ(s) Included;
  Pathologic Staging (FIGO); Perineural Invasion; Periprostatic Fat Invasion; PK; Pleural Invasion; Positive Cancer Terms; PR - Allred Score; PR Status; Primary Tumor (pT); Procedure; PSA;
  Regional Lymph Nodes (pN);
  S-100; Seminal Vesicle Invasion; Site ID; Specimen Greatest dimension; Specimen Size; Specimen Type; Specimen Weight; Stage;
  Treatment; Tubule Formation; Tumor Configuration; Tumor Site and Extent; Tumor Size - Greatest dimension (cm); Tumor Weight;
  Venous (Large Vessel) Invasion

Machine learning approach: preliminary results
  Run experiments using previously coded reports as the reference set.
  Extract UNITS data from each report and combine it with the reference data to create a training set.
  Run experiments to auto-generate coding rules and test them for accuracy.
  Use those rules in a production system as a Coding Wizard.

Experiments: machine-generated rules
  Input vector generated by the UNITS Engine for each report (~3000 reports):
    Histologic type1, Histologic type2, C-code1, C-code2, M-code1, M-code2, M-code3, Reference M-code
  Rules produced by machine learning, with estimated confidence values:
    1. HistologicType_1=Adenocarcinoma NOS, TopographyCode_1=C61.9 49 ==> M_CODE=M-81403 49 acc:(0.98136)
    2. TopographyCode_1=C61.9, MorphologyCode_1=M-81403 44 ==> M_CODE=M-81403 44 acc:(0.97935)
    3. HistologicType_1=Invasive Adenocarcinoma, MorphologyCode_1=M-81403 26 ==> M_CODE=M-81403 26 acc:(0.96584)
    4. MorphologyCode_1=M-80001, MorphologyCode_2=M-81403 21 ==> M_CODE=M-81403 21 acc:(0.95821)
    5. HistologicType_1=Adenocarcinoma NOS, HistologicType_2=Adenocarcinoma NOS 19 ==> M_CODE=M-81403 19 acc:(0.95412)
    6. HistologicType_1=Adenocarcinoma NOS, TopographyCode_2=C77.5 17 ==> M_CODE=M-81403 17 acc:(0.94916)
    7. HistologicType_1=Adenocarcinoma NOS, MorphologyCode_2=M-80003 17 ==> M_CODE=M-81403 17 acc:(0.94916)
    8. HistologicType_1=Invasive Adenocarcinoma, TopographyCode_1=C61.9 16 ==> M_CODE=M-81403 16 acc:(0.94626)
    9. TopographyCode_1=C61.9, MorphologyCode_1=M-80003 16 ==> M_CODE=M-81403 16 acc:(0.94626)
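To make the rule format concrete, here is a minimal sketch of how rules of this shape could be applied to one report's input vector. It encodes rules 1, 2 and 9 from the slide; the tuple layout, the `predict` function and the "highest estimated accuracy wins" tie-break are assumptions for illustration, not a description of the system used in the experiments.

```python
# Each rule: (conditions, predicted M code, supporting reports, estimated accuracy).
# Values are copied from rules 1, 2 and 9 on the slide; the structure is hypothetical.
RULES = [
    ({"HistologicType_1": "Adenocarcinoma NOS", "TopographyCode_1": "C61.9"},
     "M-81403", 49, 0.98136),
    ({"TopographyCode_1": "C61.9", "MorphologyCode_1": "M-81403"},
     "M-81403", 44, 0.97935),
    ({"TopographyCode_1": "C61.9", "MorphologyCode_1": "M-80003"},
     "M-81403", 16, 0.94626),
]


def predict(features, rules=RULES):
    """Return (M code, estimated accuracy) of the best matching rule, or None."""
    matches = [
        rule for rule in rules
        if all(features.get(name) == value for name, value in rule[0].items())
    ]
    if not matches:
        return None
    best = max(matches, key=lambda rule: rule[3])  # assumed tie-break: highest accuracy
    return best[1], best[3]


if __name__ == "__main__":
    report = {
        "HistologicType_1": "Adenocarcinoma NOS",
        "TopographyCode_1": "C61.9",
        "MorphologyCode_1": "M-80003",
    }
    print(predict(report))                         # rules 1 and 9 match; rule 1 wins
    print(predict({"TopographyCode_1": "C50.9"}))  # no rule matches
```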

Testing the rules on new data

Site        Training set  Classifier  Accuracy
Registry 1  90%           NaiveBayes  68%
Registry 1  90%           BayesNet    77%
Registry 1  90%           J48         69%
Registry 1  100%          NaiveBayes  74%
Registry 1  100%          BayesNet    81%
Registry 2  90%           NaiveBayes  65%
Registry 2  90%           BayesNet    65%
Registry 2  90%           J48         67%

Some issues
  These are preliminary experiments with a small data set; more reference data is needed.
  We are not 100% confident of the accuracy of the coding because of the age of the data set.

Machine learning of coding rules?
  We already know the logic: we have a model, and the rules are known.
  It needs far too many reference results, which are not always accurate.
  What happens when the rules change? We need another ton of training data.
  New data still suffers in accuracy.
  Not really practical for the coding problem.

Can we code the rules directly?
  Direct coding of known rules allows explanations: a useful coding assistant that lets the coder understand the automated coding decision.
  New coding rules can be updated immediately in the system, with no need to wait for training data.
  Debugging is easier since we know which rules are being applied.
  Confidence can be provided.
  High confidence scores allow reports to by-pass manual review.
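The last point, letting high-confidence results by-pass manual review, amounts to a simple routing step. The sketch below assumes a heuristic confidence score in [0, 1] and a cut-off of 0.9, the value mentioned later in the talk; the `AutoCodedReport` fields and the `route` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class AutoCodedReport:
    """Hypothetical record for one automatically coded report."""
    report_id: str
    code: str
    confidence: float   # heuristic score in [0, 1]
    explanation: str    # which rule produced the code


def route(reports, threshold=0.9):
    """Split reports into (auto-accepted, needs manual review)."""
    auto_accept, manual_review = [], []
    for report in reports:
        if report.confidence > threshold:
            auto_accept.append(report)
        else:
            manual_review.append(report)
    return auto_accept, manual_review


if __name__ == "__main__":
    batch = [
        AutoCodedReport("r1", "M-81403", 0.97, "prostate adenocarcinoma rule"),
        AutoCodedReport("r2", "M-82603", 0.62, "thyroid papillary rule"),
    ]
    accepted, review = route(batch)
    print([r.report_id for r in accepted], [r.report_id for r in review])
```

Because each record carries its explanation, a reviewer who opens a report from the manual queue can still see which rule fired.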

The Coding Model
  Multiple Primary and Histology Coding Rules, January 1, 2007
  National Cancer Institute, Surveillance, Epidemiology, and End Results Program, Bethesda, MD

Number of coding rules per site
(single tumor and multiple tumors abstracted as a single primary)

Specific flow-chart                          Number of rules
Upper Aerodigestive Tract                    12 rules
Colon                                        24 rules
Lung                                         13 rules
Melanoma of the Skin                         10 rules
Breast                                       29 rules
Kidney                                       13 rules
Urinary Bladder & Ureter & Renal Pelvis      15 rules
Brain (Benign Brain)                         10 rules
Brain (Malignant Brain)                      11 rules
Bone Marrow                                  31 rules + 9 rules for primary site
Hodgkin Lymphoma                             31 rules + 9 rules for primary site
Gastrointestinal Lymphoma                    31 rules + 9 rules for primary site
Non-Hodgkin Lymphoma                         31 rules + 9 rules for primary site

Common sites: 31 rules
  Adrenal Gland; Ampulla of Vater; Anus; Appendix; Bone; Carcinoma of the Skin; Endometrium; Esophagus; Extrahepatic Bile Ducts; Fallopian Tube; Gallbladder; Heart; Liver; Neuroblastoma; Ovary; Pancreas (Endocrine); Pancreas (Exocrine); Penis; Peritoneum; PNET - Ewing Sarcoma; Prostate Gland; Rectum; Retinoblastoma; Rhabdomyosarcoma; Small Intestine; Soft Tissue; Stomach; Testis; Thoracic Mesothelioma; Thymoma and Thymic Carcinoma; Thyroid Gland; Trophoblast; Unknown; Uterine Cervix; Uveal Melanoma; Vagina; Vulva; Wilms Tumor

Breast Histology Coding Rules
  H6: Is there a combination of intraductal carcinoma and two or more specific intraductal types, OR are there two or more specific intraductal carcinomas?
    Yes: Code 8523/2 (intraductal carcinoma mixed with other types of in situ carcinoma) (Table 3).
    No: go to H7.
  H7: Is there in situ lobular (8520) and any in situ carcinoma other than intraductal carcinoma (Table 1)?
    Yes: Code 8524/2 (in situ lobular mixed with other types of in situ carcinoma) (Table 3).
    No: go to H8.
  H8: Is there in situ lobular (8520) and any in situ carcinoma other than intraductal carcinoma (Table 1)?
    Yes: Code 8255/2 (adenocarcinoma in situ with mixed subtypes) (Table 3).
  Notes:
    1. Use Table 1 to identify the histologies.
    2. Change the behavior to 2 (in situ) in accordance with the ICD-O-3 matrix principle (ICD-O-3 Rule F).

The first Codex project (2010)
  3 professional CTRs coded a random sample of pathology reports into a spreadsheet (N=1000).
  Results were provided to AIM to develop and train the software.
  The initial version only looked at M and C codes extracted from pathology reports.
  Accuracy was not sufficient for automated use.

Inter-Coder Reliability (All Coders Agree) (N=600)

Item                      N     %
Reportable                494   82.3%
SEER Site                 469   78.2%
Histology                 441   73.5%
Behavior                  503   83.8%
Topography                384   64.0%
Topography (Major Site)   442   73.7%
Laterality                451   75.2%
Grade                     439   73.2%
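The "all coders agree" figure is a plain percent agreement: the share of reports on which every coder entered the same value. A minimal sketch, assuming each report's codings are available as a tuple with one value per coder (the function name and the toy data are made up for the example):

```python
def all_agree_rate(codings):
    """codings: one tuple per report, holding each coder's value for a data item.

    Returns (number of reports where all coders agree, percent agreement).
    """
    if not codings:
        raise ValueError("at least one report is required")
    agree = sum(1 for values in codings if len(set(values)) == 1)
    return agree, 100.0 * agree / len(codings)


if __name__ == "__main__":
    # Toy topography codings from three hypothetical coders.
    topography = [
        ("C50.9", "C50.9", "C50.9"),
        ("C61.9", "C61.9", "C61.9"),
        ("C18.0", "C18.9", "C18.0"),
        ("C34.1", "C34.1", "C34.9"),
    ]
    print(all_agree_rate(topography))   # (2, 50.0)

    # The slide's first row: 494 of 600 reports.
    print(round(100.0 * 494 / 600, 1))  # 82.3
```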

New Approach
  AIM's AI engine has enhanced text analytics capability.
  Expanded data items.
  Heuristics for confidence scoring of answers.
  Uses all information in the path report to determine site and procedure, not only ICD-O-3 codes.
  More accurate determination of SEER site buckets.

Measuring system performance
  The New York State Cancer Registry provided a one-year sample of reports (2011) that were already coded.
  Comparing automated results with the reference set showed various discrepancies.
  17 questions were compiled for an NYSCR coder to help explain the discrepancies.

Results: Morphology/behavior

Site                  Old stat  Coder corrected  Multiple diagnosis reports  New total number of reports
Breast                85%       98%              7                           2592
Melanoma              48%       95%              /                           1953
Prostate              97%       99%              /                           499
Colon                 75%       98%              17                          294
Bladder               68%       74%              1                           238
Endometrium           66%       78%              /                           211
Lung                  79%       86%              /                           162
Thyroid               80%       99%              /                           161
Rectum                72%       98%              5                           87
Stomach               71%       83%              /                           87
Upper aerodigestive   81%       95%              /                           60
Uterine cervix        68%       85%              /                           41
Liver                 89%       93%              /                           28
Esophagus             78%       92%              /                           24
Vagina                75%       88%              /                           17
Ovary                 83%       100%             /                           13
Vulva                 82%       92%              /                           12
Pancreas (exocrine)   90%       100%             /                           11
Kidney                56%       90%              /                           10
Bone Marrow           N/A       76%              /                           1433
Average               75%       91%

Results: Primary Site

Site                  Old stat  Coder corrected  Multiple diagnosis reports  Total number of reports
Breast                79%       87%              7                           2592
Melanoma              71%       74%              8                           1953
Prostate              99%       100%             /                           499
Colon                 62%       80%              17                          294
Bladder               70%       97%              1                           238
Endometrium           89%       96%              /                           211
Lung                  81%       88%              /                           162
Thyroid               99%       100%             /                           161
Rectum                60%       62%              5                           87
Stomach               64%       74%              /                           87
Upper aerodigestive   80%       86%              2                           60
Uterine cervix        55%       70%              1                           41
Liver                 78%       89%              /                           28
Esophagus             74%       79%              /                           24
Vagina                81%       82%              /                           17
Ovary                 75%       77%              /                           13
Vulva                 100%      100%             /                           12
Pancreas (exocrine)   70%       73%              /                           11
Kidney                90%       90%              /                           10
Bone Marrow           91%       91%              /                           1433
Average               78%       85%

Laterality: 97% when site is correct
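The slide does not say how the "Average" row was formed. The sketch below recomputes it from the primary-site rows: the simple (unweighted) mean of the per-site percentages reproduces the 78% and 85% shown, and a mean weighted by report count is printed alongside for comparison. Variable names are illustrative only.

```python
from statistics import mean

# (site, old %, coder-corrected %, total reports), copied from the primary-site table.
ROWS = [
    ("Breast", 79, 87, 2592), ("Melanoma", 71, 74, 1953), ("Prostate", 99, 100, 499),
    ("Colon", 62, 80, 294), ("Bladder", 70, 97, 238), ("Endometrium", 89, 96, 211),
    ("Lung", 81, 88, 162), ("Thyroid", 99, 100, 161), ("Rectum", 60, 62, 87),
    ("Stomach", 64, 74, 87), ("Upper aerodigestive", 80, 86, 60),
    ("Uterine cervix", 55, 70, 41), ("Liver", 78, 89, 28), ("Esophagus", 74, 79, 24),
    ("Vagina", 81, 82, 17), ("Ovary", 75, 77, 13), ("Vulva", 100, 100, 12),
    ("Pancreas (exocrine)", 70, 73, 11), ("Kidney", 90, 90, 10),
    ("Bone Marrow", 91, 91, 1433),
]

# Unweighted: every site counts equally, whatever its volume.
old_mean = mean(row[1] for row in ROWS)
corrected_mean = mean(row[2] for row in ROWS)

# Weighted: every report counts equally, so large sites dominate.
total_reports = sum(row[3] for row in ROWS)
weighted_corrected = sum(row[2] * row[3] for row in ROWS) / total_reports

if __name__ == "__main__":
    print(f"unweighted old:       {old_mean:.1f}%")
    print(f"unweighted corrected: {corrected_mean:.1f}%")
    print(f"weighted corrected:   {weighted_corrected:.1f}%")
```

The two averages answer different questions: the unweighted one describes how well a typical site is handled, the weighted one how well a typical report is handled.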

Confidence Scoring
When the auto-generated heuristic score is above 0.9, there is 95% confidence that the accuracy lies between these values:

Site              Lower   Upper
Lung              0.84    0.97
Breast            0.94    0.99
Prostate          0.97    0.995
Colon             0.92    0.997
Rectum            1.00    1.00
Endometrium       0.81    0.94
Thyroid           1.00    1.00
Melanoma - skin   0.94    0.99
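The talk does not state how these bounds were computed. One common way to put a 95% interval around an observed accuracy is the Wilson score interval for a binomial proportion; the sketch below uses it purely as an illustration, and the counts in the usage example are made up.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical: 95 of 100 high-confidence reports were coded correctly.
    low, high = wilson_interval(95, 100)
    print(f"accuracy 0.95, 95% interval: {low:.3f} to {high:.3f}")
```

With few reports above the score cut-off the interval widens, so small sites would need more data before their high-confidence codes could safely skip review.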

AI engine development and use
[Diagram: AIM's AI Technology. Labels, in source order: AI Engine; E-Path; Concept identification and inference rules for: Case selection; Diagnostic imaging classification (CNS/Chest); Concept search; Knowledge bases; Inference rules; API; Abrevio; RCA; Discrete data extraction; Medical report processing; Patient data consolidation; Discrete data extraction and standardization; Automated study matching; Automated results distribution; Rigorous QC methods; Machine learning; Conversion of text to discrete data for ML training; Incorporate decision rules into inference engine; Codex (Automated coding for cancer abstract)]
Source: AIM Product Description, AIM, 2017

Where can we use machine learning?
  One good example is the identification of extraneous information: citations, reference material, etc.
  These sometimes contain what may be interpreted as patient data.
  Use a Bayesian spam filter approach to identify this text and block it.

Example of Spam
  ccc et al. Lymphoma 2008: 435-441
  ddd al. Cancer. 2013;369(25):2391-405
  Tumor et al. Leukemia 23: 2007
  The V617F mutation analysis can provide information helpful in distinguishing myeloproliferative disorders (MPDs) from reactive conditions, particularly secondary thrombocytosis and erythrocytosis. The incidence of the V617F mutation in MPDs in different studies ranges from 65-97% in polycythemia vera, from 41-57% in patients with essential thrombocythemia, and from 23-95% in patients with primary myelofibrosis. The same mutation was also found in roughly 20% of Ph-negative atypical CML, in more than 10% of CMML, in about 15% of patients with megakaryocytic AML (AML M7), 20% of patients with juvenile myelomonocytic leukemia (JMML), and acute lymphoblastic leukemia with t(9;12) (p24;p13).
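A Bayesian spam filter of the kind mentioned here can be sketched as a small naive Bayes classifier over word counts with add-one smoothing. This is a toy illustration under stated assumptions, not the filter used in the product: the class name, the "spam"/"ham" labels and the six training snippets (adapted from text on the slides) are chosen for the example only.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesFilter:
    """Two-class naive Bayes: 'spam' = reference text, 'ham' = report text."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def spam_log_odds(self, text):
        """log P(spam | text) - log P(ham | text); needs both classes trained."""
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        score = 0.0
        for label, sign in (("spam", 1), ("ham", -1)):
            n_words = sum(self.word_counts[label].values())
            log_p = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens(text):
                # Add-one (Laplace) smoothing so unseen words do not zero the product.
                log_p += math.log((self.word_counts[label][tok] + 1) / (n_words + vocab_size))
            score += sign * log_p
        return score

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0


def build_toy_filter():
    f = NaiveBayesFilter()
    f.train("ccc et al. Lymphoma 2008: 435-441", "spam")
    f.train("Tumor et al. Leukemia 23: 2007", "spam")
    f.train("The incidence of the mutation in different studies ranges from 65-97%", "spam")
    f.train("Specimen is received fresh and consists of a thyroid lobe", "ham")
    f.train("Thyroid, left lobe: carcinoma of the thyroid, 1.7 cm", "ham")
    f.train("Paratracheal lymph node biopsy: metastatic carcinoma", "ham")
    return f


if __name__ == "__main__":
    toy = build_toy_filter()
    print(toy.is_spam("ddd et al. Cancer 2013"))                # True
    print(toy.is_spam("Specimen consists of a thyroid nodule"))  # False
```

A real filter would need far more training text; the point is only that citation-like tokens such as "et al" shift the odds toward the reference-material class.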

Conclusion
  The system shows high performance in several major sites, which reduces manual coding effort.
  Coder variation exists; the system is consistent.
  With more feedback, the system will improve (we only had 1 round of coder feedback).
  High-confidence reports can skip manual review.
  The rule explanation feature will act as a coding assistant and may even be used for new coder training.

Acknowledgements
  New York State Cancer Registry: Jovanka Harrison, Maria Schymura, Colleen Sherman, Jamie Musco
  AIM Development Team

END

Inter-coder Reliability (Percent Agreement, All Coders)

Item                   Trial 1 (N=749)   Trial 2 [With Codex] (N=856)
Topography (ICD-O-3)   84.2%             81.9%
Histology (ICD-O-3)    85.6%             84.1%
Behavior (ICD-O-3)     88.8%             91.8%
SEER Site Group        89.1%             89.8%
Tumor Grade            88.7%             73.2%
Laterality             89.3%             87.7%