Automated Coding of Key Case Identifiers from Text-Based Electronic Pathology Reports

George Cernile, Artificial Intelligence in Medicine, Inc.
NAACCR 2017 Conference, Albuquerque, New Mexico, USA, June 22, 2017

Outline
- Introduction
- Coding problem: background
- Machine learning vs. coded rules
- System performance
- Confidence measures
- Conclusion

Background
- The volume of electronic pathology (E-Path) reports at central cancer registries is increasing rapidly.
- Key elements in E-Path reports reside in narrative text.
- E-Path utility is limited by the staff resources available to manually review large volumes of reports.
- Reliable automated coding (auto-coding) of even a minimal set of diagnostic elements would improve efficiency and utility.
- E-Path currently processes ~15 million pathology reports per year for cancer report selection, with 98% sensitivity and 99% specificity.
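For reference, sensitivity and specificity come from the confusion counts of a report-selection run. A minimal Python sketch; the function name is hypothetical and the counts in the usage line are invented to illustrate the arithmetic, not E-Path data.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) for a binary cancer / non-cancer selection.

    sensitivity = TP / (TP + FN): share of true cancer reports that were selected
    specificity = TN / (TN + FP): share of non-cancer reports that were rejected
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec = sensitivity_specificity(tp=980, fn=20, tn=990, fp=10)
print(f"sensitivity={sens:.0%} specificity={spec:.0%}")  # sensitivity=98% specificity=99%
```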

Reliable coding of key data elements in E-Path reports would enhance:
- Rapid case ascertainment
- Case-level linkages
- Case finding
- Abstracting efficiency
- Audits
- Early assessment of incidence rates

The Problem
- Coding is currently done manually; reviewing pathology reports takes time.
- Automating it would save labor and improve data acquisition.
- The task requires a trained registrar.
- Trained registrars may not always agree on the final answer.
- An automated system needs to reach 95% accuracy to be effective.

The Coding Problem
Can an automated system provide the SEER site coding designations?
- Use the defined SEER rules.
- Compare against a large reference set and identify discrepancies.
- Create new rules and re-test.
- Measure confidence: identify high-confidence codes.
- Provide an explanation for each code.

Automated coding wizard
Example report, with the codes suggested by the wizard:
GROSS DESCRIPTION: Specimen is received in two parts.
Part 1: Specimen is labeled "left lobe and left isthmus". Specimen is received fresh for intraoperative consultation and consists of a 6 gm, 3.9 x 2.5 x 2 cm lobe of thyroid gland. Specimen is inked and serially sectioned to reveal a 1.7 cm cirrhotic mass abutting the surrounding surface focally. A fine needle aspiration is performed. Representative section of the mass is submitted for frozen. The remnant of the frozen is submitted in cassette 1 FS. The remainder of the thyroid parenchyma is dark red and homogeneous. There is no additional nodule grossly identified. The remainder of the specimen is submitted: Cassette A-D - mass, Cassette E-G - the remainder of the thyroid gland, in a total of eight cassettes.
Part 2: Specimen is labeled "paratracheal lymph node". Specimen is received fresh and consists of a 0.4 x 0.2 x 0.2 cm ovoid tan nodule. Entire specimen is submitted in one cassette.
ZZ/FA:bds
DIAGNOSIS:
1. THYROID, LEFT LOBE AND ISTHMUS (RESECTION): CARCINOMA OF THE THYROID, 1.7 CM, WITH INVASION OF THE THYROID CAPSULE. TUMOR APPROACHES TO WITHIN FRACTIONS OF A MILLIMETER OF THE EXTERNAL SURFACE. NO INVASION OF THE SURROUNDING SKELETAL MUSCLE SEEN.
2. PARATRACHEAL LYMPH NODE (BIOPSY): METASTATIC CARCINOMA, MICROSCOPIC FOCI, SUBCAPSULAR SINUS OF LYMPH NODE, SEE COMMENT.
Wizard codes: M-80103, M-82603 (95%), M-80503, C73.9 (95%), C77.9
Explanation: If the report has C73.9 (Thyroid, NOS) + M-80503 (Papillary carcinoma), then add M-82603 (Papillary carcinoma of thyroid).
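The wizard's explanation is a plain if-then rule, so the rule that fires can return both the added code and the text that justifies it. A minimal Python sketch of that idea; `CodingResult` and `apply_thyroid_papillary_rule` are hypothetical names, and this is not AIM's implementation.

```python
from dataclasses import dataclass


@dataclass
class CodingResult:
    codes: set[str]          # codes after the rule has been applied
    explanations: list[str]  # one human-readable line per rule that fired


def apply_thyroid_papillary_rule(codes: set[str]) -> CodingResult:
    """Sketch of the slide's rule: C73.9 + M-80503 -> add M-82603, with an explanation."""
    result = CodingResult(codes=set(codes), explanations=[])
    if "C73.9" in codes and "M-80503" in codes:
        result.codes.add("M-82603")
        result.explanations.append(
            "If have C73.9 (Thyroid NOS) + M-80503 (Papillary carcinoma) "
            "then add M-82603 (Papillary carcinoma of thyroid)"
        )
    return result


res = apply_thyroid_papillary_rule({"C73.9", "M-80503", "C77.9"})
print(sorted(res.codes))  # ['C73.9', 'C77.9', 'M-80503', 'M-82603']
print(res.explanations[0])
```

Because the explanation travels with the code, a coder reviewing the suggestion sees exactly which condition produced it.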

Coding: what if we could use this data?
UNITS data items (unit names): Additional Pathologic Findings, AFP, ALK, Arterial Invasion, b-hCG, Bloom Richardson Grade, Bloom Richardson Score, BRAF, Breslow's Depth, CA-125, Calcification, Calcitonin, CEA, Clark's Staging, Depth of Invasion, Distance of Tumor from Anterior Margin, Distance of Tumor from Closest Margin, Distance of Tumor from Deep Margin, Distance of Tumor from Inferior Margin, Distance of Tumor from Lateral Margin, Distance of Tumor from Medial Margin, Distance of Tumor from Posterior Margin, Distance of Tumor from Superior Margin, Distant Metastasis (pM), EGFR, ER - Allred Score, ER Status, Examination Type, Extranodal Extension, FISH Results, Focality, Gleason Grade - Primary Pattern, Gleason Grade - Secondary Pattern, Gleason Grade - Tertiary Pattern, Gleason Score, Grade of Dysplasia, Grading System, HER2 % Cells Stained, HER2 Gene Copy Number, HER2 Result, HER2:CEP17 Ratio, Histologic Grade, Histologic Type, HX, ICD-O-3 Morphology (AIM Inc.), ICD-O-3 Topography (AIM Inc.), Implants, Ki-67, KRAS, Laterality, LDH, Lymph Nodes Examined, Lymph Nodes Negative, Lymph Nodes Positive, Lymphatic Invasion, Lymphovascular (LV) Invasion, Miscellaneous Terms, Mitotic Count, Nottingham Grade, Nottingham Score, Nuclear Pleomorphism, Organ(s) Included, Pathologic Staging (FIGO), Perineural Invasion, Periprostatic Fat Invasion, PK, Pleural Invasion, Positive Cancer Terms, PR - Allred Score, PR Status, Primary Tumor (pT), Procedure, PSA, Regional Lymph Nodes (pN), S-100, Seminal Vesicle Invasion, Site ID, Specimen Greatest Dimension, Specimen Size, Specimen Type, Specimen Weight, Stage, Treatment, Tubule Formation, Tumor Configuration, Tumor Site and Extent, Tumor Size - Greatest Dimension (cm), Tumor Weight, Venous (Large Vessel) Invasion

Machine learning approach: preliminary results
- Run experiments using previously coded reports as the reference set.
- Extract UNITS data from each report and combine it with the reference data to create a training set.
- Run experiments to auto-generate coding rules and test them for accuracy.
- Use those rules in a production system as a Coding Wizard.
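A minimal Python sketch of the training-set step, assuming the extracted values arrive as a dict of lists keyed by unit name and that each report is flattened into fixed slots (two histologic types, two topography codes, three morphology codes) plus the reference M-code as the class. The slot names, `to_training_row`, and the "?" padding (Weka's ARFF marker for a missing value) are illustrative choices, not the UNITS engine's actual output format.

```python
# Hypothetical slot counts, matching the attributes listed for the experiments.
SLOTS = {"HistologicType": 2, "TopographyCode": 2, "MorphologyCode": 3}
MISSING = "?"  # Weka's ARFF marker for a missing value


def to_training_row(extracted: dict[str, list[str]], reference_mcode: str) -> dict[str, str]:
    """Flatten extracted values into fixed slots and append the reference label."""
    row: dict[str, str] = {}
    for unit, n_slots in SLOTS.items():
        values = extracted.get(unit, [])
        for i in range(n_slots):
            # Keep the first n_slots values in report order; pad the rest as missing.
            row[f"{unit}_{i + 1}"] = values[i] if i < len(values) else MISSING
    row["M_CODE"] = reference_mcode  # class attribute taken from the coded reference set
    return row


row = to_training_row(
    {"HistologicType": ["Adenocarcinoma NOS"], "TopographyCode": ["C61.9"],
     "MorphologyCode": ["M-81403"]},
    reference_mcode="M-81403",
)
print(row)
```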

Experiments: machine-generated rules
Input vector generated by the UNITS engine for each report (~3000 reports): Histologic type 1, Histologic type 2, C-code 1, C-code 2, M-code 1, M-code 2, M-code 3, Reference M-code.
Rules produced by machine learning, with estimated confidence values:
1. HistologicType_1=Adenocarcinoma NOS, TopographyCode_1=C61.9 49 ==> M_CODE=M-81403 49 acc:(0.98136)
2. TopographyCode_1=C61.9, MorphologyCode_1=M-81403 44 ==> M_CODE=M-81403 44 acc:(0.97935)
3. HistologicType_1=Invasive Adenocarcinoma, MorphologyCode_1=M-81403 26 ==> M_CODE=M-81403 26 acc:(0.96584)
4. MorphologyCode_1=M-80001, MorphologyCode_2=M-81403 21 ==> M_CODE=M-81403 21 acc:(0.95821)
5. HistologicType_1=Adenocarcinoma NOS, HistologicType_2=Adenocarcinoma NOS 19 ==> M_CODE=M-81403 19 acc:(0.95412)
6. HistologicType_1=Adenocarcinoma NOS, TopographyCode_2=C77.5 17 ==> M_CODE=M-81403 17 acc:(0.94916)
7. HistologicType_1=Adenocarcinoma NOS, MorphologyCode_2=M-80003 17 ==> M_CODE=M-81403 17 acc:(0.94916)
8. HistologicType_1=Invasive Adenocarcinoma, TopographyCode_1=C61.9 16 ==> M_CODE=M-81403 16 acc:(0.94626)
9. TopographyCode_1=C61.9, MorphologyCode_1=M-80003 16 ==> M_CODE=M-81403 16 acc:(0.94626)
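Each rule line appears to follow the layout of Weka's association-rule listings: the number after the left-hand side counts the reports matching it, the number after the right-hand side counts those that also carry the reference code, and acc is the learner's estimate of predictive accuracy. On that reading every rule shown is correct on all the reports it covers (49 of 49, 44 of 44, ...), and acc sits below 1.0 because it is a smoothed estimate rather than the raw ratio. The Python sketch below computes only the raw counts and confidence for one rule; the rows are toy data and `rule_stats` is a hypothetical helper, not part of any tool used in the experiments.

```python
Row = dict[str, str]


def rule_stats(rows: list[Row], antecedent: Row, consequent: Row) -> tuple[int, int, float]:
    """Return (antecedent count, rule count, raw confidence) for 'antecedent ==> consequent'."""
    def matches(row: Row, conditions: Row) -> bool:
        return all(row.get(k) == v for k, v in conditions.items())

    covered = [r for r in rows if matches(r, antecedent)]   # left-hand side holds
    correct = [r for r in covered if matches(r, consequent)]  # both sides hold
    confidence = len(correct) / len(covered) if covered else 0.0
    return len(covered), len(correct), confidence


# Toy rows, not the ~3000-report experiment set.
rows = [
    {"HistologicType_1": "Adenocarcinoma NOS", "TopographyCode_1": "C61.9", "M_CODE": "M-81403"},
    {"HistologicType_1": "Adenocarcinoma NOS", "TopographyCode_1": "C61.9", "M_CODE": "M-81403"},
    {"HistologicType_1": "Adenocarcinoma NOS", "TopographyCode_1": "C61.9", "M_CODE": "M-80103"},
    {"HistologicType_1": "Invasive Adenocarcinoma", "TopographyCode_1": "C61.9", "M_CODE": "M-81403"},
]
n_cov, n_ok, conf = rule_stats(
    rows,
    {"HistologicType_1": "Adenocarcinoma NOS", "TopographyCode_1": "C61.9"},
    {"M_CODE": "M-81403"},
)
print(n_cov, n_ok, round(conf, 3))  # 3 2 0.667
```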

Testing the rules on new data
Site | Training set | Classifier | Accuracy
Registry 1 | 90% | NaiveBayes | 68%
Registry 1 | 90% | BayesNet | 77%
Registry 1 | 90% | J48 | 69%
Registry 1 | 100% | NaiveBayes | 74%
Registry 1 | 100% | BayesNet | 81%
Registry 2 | 90% | NaiveBayes | 65%
Registry 2 | 90% | BayesNet | 65%
Registry 2 | 90% | J48 | 67%
Some issues:
- These are preliminary experiments on a small data set; more reference data is needed.
- We are not 100% confident of the accuracy of the reference coding because of the age of the data set.

Machine learning of coding rules?
- We already know the logic: there is a model, and the rules are known.
- It needs far too many reference results, and those are not always accurate.
- What happens when the rules change? Another large batch of training data is needed.
- Accuracy on new data still suffers.
- Not really practical for the coding problem.

Can we code the rules directly?
- Direct coding of known rules allows explanations: a useful coding assistant that lets the coder understand each automated coding decision.
- New coding rules can be updated in the system immediately; there is no need to wait for training data.
- Debugging is easier, since we know which rules are being applied.
- Confidence can be provided, allowing high-confidence codes to bypass manual review.
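A minimal Python sketch of the review-bypass idea, assuming each auto-coded field carries a heuristic confidence score and the text of the rule that produced it. `AutoCode`, `route`, and the sample explanations are hypothetical; the 0.9 default mirrors the score cut-off used in the confidence-scoring results, and a registry might prefer a different or per-site threshold.

```python
from typing import NamedTuple


class AutoCode(NamedTuple):
    report_id: str
    field: str
    code: str
    confidence: float  # heuristic score in [0, 1]
    explanation: str   # which rule produced the code


def route(results: list[AutoCode], threshold: float = 0.9) -> tuple[list[AutoCode], list[AutoCode]]:
    """Split auto-coded results into (accept without review, send to manual review)."""
    accepted = [r for r in results if r.confidence > threshold]
    review = [r for r in results if r.confidence <= threshold]
    return accepted, review


results = [
    AutoCode("r1", "primary_site", "C61.9", 0.97, "hypothetical rule: prostate biopsy"),
    AutoCode("r2", "primary_site", "C18.9", 0.62, "hypothetical rule: colon, subsite not stated"),
]
accepted, review = route(results)
print([r.report_id for r in accepted], [r.report_id for r in review])  # ['r1'] ['r2']
```

The reviewer still sees the `explanation` for everything routed to manual review, which is where a rule-based system has the edge over an opaque classifier.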

The Coding Model
Multiple Primary and Histology Coding Rules, January 01, 2007. National Cancer Institute, Surveillance, Epidemiology and End Results Program, Bethesda, MD.

Number of coding rules per site
Site-specific flowcharts; number of rules (single tumor and multiple tumors abstracted as a single primary):
- Upper Aerodigestive Tract: 12 rules
- Colon: 24 rules
- Lung: 13 rules
- Melanoma of the Skin: 10 rules
- Breast: 29 rules
- Kidney: 13 rules
- Urinary Bladder & Ureter & Renal Pelvis: 15 rules
- Brain (Benign Brain): 10 rules
- Brain (Malignant Brain): 11 rules
- Bone Marrow: 31 rules + 9 rules for primary site
- Hodgkin Lymphoma: 31 rules + 9 rules for primary site
- Gastrointestinal Lymphoma: 31 rules + 9 rules for primary site
- Non-Hodgkin Lymphoma: 31 rules + 9 rules for primary site

Common sites: 31 rules
Prostate Gland, Fallopian Tube, Rhabdomyosarcoma, Wilms Tumor, Soft Tissue, Thyroid Gland, Gallbladder, Carcinoma of the Skin, Bone, Uterine Cervix, Liver, Small Intestine, Appendix, Adrenal Gland, Neuroblastoma, Stomach, Vulva, Heart, Ovary, Testis, Penis, Thymoma and Thymic Carcinoma, Anus, Pancreas (Endocrine), Rectum, Ampulla of Vater, Pancreas (Exocrine), Thoracic Mesothelioma, Unknown, Endometrium, Peritoneum, Trophoblast, Esophagus, PNET - Ewing Sarcoma, Uveal Melanoma, Extrahepatic Bile Ducts, Retinoblastoma, Vagina

Breast Histology Coding Rules
H6: Is there a combination of intraductal carcinoma and two or more specific intraductal types, OR are there two or more specific intraductal carcinomas? Yes: code 8523/2 (intraductal carcinoma mixed with other types of in situ carcinoma) (Table 3). No: go to H7.
H7: Is there in situ lobular (8520) and any in situ carcinoma other than intraductal carcinoma (Table 1)? Yes: code 8524/2 (in situ lobular mixed with other types of in situ carcinoma) (Table 3). No: go to H8.
H8: Is there in situ lobular (8520) and any in situ carcinoma other than intraductal carcinoma (Table 1)? Yes: code 8255/2 (adenocarcinoma in situ with mixed subtypes) (Table 3).
Notes:
1. Use Table 1 to identify the histologies.
2. Change the behavior to 2 (in situ) in accordance with the ICD-O-3 matrix principle (ICD-O-3 Rule F).
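The flowchart is an ordered series of yes/no questions in which the first "yes" fixes the code, and that ordering is what makes a rule-based explanation possible. A minimal Python sketch under stated assumptions: only H6 and H7 are encoded, as worded above (H8 is left out because its question, as reproduced here, repeats H7's); `SPECIFIC_INTRADUCTAL` is a hypothetical three-code stand-in for Table 1; and reading "intraductal carcinoma" in H7 as 8500 plus those subtypes is our own assumption.

```python
from typing import Callable, Optional

# Hypothetical stand-in for Table 1: a few specific in situ intraductal
# histologies, for illustration only. Use the manual's Table 1 in practice.
SPECIFIC_INTRADUCTAL = {"8501", "8503", "8507"}
INTRADUCTAL_NOS = "8500"
LOBULAR_IN_SITU = "8520"

# (rule id, question as a predicate over the set of histology codes, code if "yes")
Rule = tuple[str, Callable[[set[str]], bool], str]


def h6(hist: set[str]) -> bool:
    """Intraductal carcinoma + two or more specific types, OR two or more specific types."""
    specific = len(hist & SPECIFIC_INTRADUCTAL)
    # As worded, the second clause already covers the first; both kept to mirror the rule text.
    return (INTRADUCTAL_NOS in hist and specific >= 2) or specific >= 2


def h7(hist: set[str]) -> bool:
    """In situ lobular (8520) + any in situ carcinoma other than intraductal carcinoma."""
    # Assumption: "intraductal carcinoma" covers 8500 and the intraductal subtypes above.
    others = hist - {LOBULAR_IN_SITU, INTRADUCTAL_NOS} - SPECIFIC_INTRADUCTAL
    return LOBULAR_IN_SITU in hist and len(others) > 0


RULES: list[Rule] = [
    ("H6", h6, "8523/2"),
    ("H7", h7, "8524/2"),
]


def code_histology(hist: set[str]) -> Optional[tuple[str, str]]:
    """Walk the rules in order; the first 'yes' returns (code, rule id used as the explanation)."""
    for rule_id, applies, code in RULES:
        if applies(hist):
            return code, rule_id
    return None  # no rule fired: continue with later rules or manual review


print(code_histology({"8500", "8501", "8503"}))  # ('8523/2', 'H6')
print(code_histology({"8520", "8140"}))          # ('8524/2', 'H7')
```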

The first Codex project (2010)
- Three professional CTRs coded a random sample of pathology reports (N=1000) into a spreadsheet.
- The results were provided to AIM to develop and train the software.
- The initial version looked only at the M and C codes extracted from the pathology reports.
- Accuracy was not sufficient for automated use.

Inter-Coder Reliability (All Coders Agree) (N=600)
- Reportable: 494 (82.3%)
- SEER Site: 469 (78.2%)
- Histology: 441 (73.5%)
- Behavior: 503 (83.8%)
- Topography: 384 (64.0%)
- Topography (Major Site): 442 (73.7%)
- Laterality: 451 (75.2%)
- Grade: 439 (73.2%)
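"All coders agree" is the strictest form of percent agreement: a report counts only if every coder chose the same value, and, unlike a kappa statistic, nothing is corrected for chance agreement. A minimal Python sketch; `all_agree_rate` is a hypothetical helper and the toy codings are invented.

```python
def all_agree_rate(codings: list[tuple[str, ...]]) -> float:
    """Share of reports on which every coder assigned the same value."""
    agree = sum(1 for codes in codings if len(set(codes)) == 1)
    return agree / len(codings)


# Toy data: three coders' topography codes for three reports.
toy = [("C50.9", "C50.9", "C50.9"), ("C50.4", "C50.9", "C50.4"), ("C61.9", "C61.9", "C61.9")]
print(f"{all_agree_rate(toy):.1%}")  # 66.7%

# The Reportable row above: 494 of 600 reports with unanimous agreement.
print(f"{494 / 600:.1%}")  # 82.3%
```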

New Approach
- AIM's AI engine has enhanced text analytics capabilities.
- Expanded data items.
- Heuristics for confidence scoring of answers.
- Uses all the information in the pathology report to determine site and procedure, not only the ICD-O-3 codes.
- More accurate determination of SEER site buckets.

Measuring System Performance
- The New York State Cancer Registry provided a one-year sample of reports (2011) that had already been coded.
- Comparing the automated results with this reference set revealed various discrepancies.
- 17 questions were compiled for the NYSCR coder to help explain the discrepancies.

Results: Morphology/Behavior
Site | Old stat | Coder corrected | Multiple-diagnosis reports | Total reports
Breast | 85% | 98% | 7 | 2592
Melanoma | 48% | 95% | / | 1953
Prostate | 97% | 99% | / | 499
Colon | 75% | 98% | 17 | 294
Bladder | 68% | 74% | 1 | 238
Endometrium | 66% | 78% | / | 211
Lung | 79% | 86% | / | 162
Thyroid | 80% | 99% | / | 161
Rectum | 72% | 98% | 5 | 87
Stomach | 71% | 83% | / | 87
Upper aerodigestive | 81% | 95% | / | 60
Uterine cervix | 68% | 85% | / | 41
Liver | 89% | 93% | / | 28
Esophagus | 78% | 92% | / | 24
Vagina | 75% | 88% | / | 17
Ovary | 83% | 100% | / | 13
Vulva | 82% | 92% | / | 12
Pancreas (exocrine) | 90% | 100% | / | 11
Kidney | 56% | 90% | / | 10
Bone marrow | N/A | 76% | / | 1433
Average | 75% | 91%

Results: Primary Site
Site | Old stat | Coder corrected | Multiple-diagnosis reports | Total reports
Breast | 79% | 87% | 7 | 2592
Melanoma | 71% | 74% | 8 | 1953
Prostate | 99% | 100% | / | 499
Colon | 62% | 80% | 17 | 294
Bladder | 70% | 97% | 1 | 238
Endometrium | 89% | 96% | / | 211
Lung | 81% | 88% | / | 162
Thyroid | 99% | 100% | / | 161
Rectum | 60% | 62% | 5 | 87
Stomach | 64% | 74% | / | 87
Upper aerodigestive | 80% | 86% | 2 | 60
Uterine cervix | 55% | 70% | 1 | 41
Liver | 78% | 89% | / | 28
Esophagus | 74% | 79% | / | 24
Vagina | 81% | 82% | / | 17
Ovary | 75% | 77% | / | 13
Vulva | 100% | 100% | / | 12
Pancreas (exocrine) | 70% | 73% | / | 11
Kidney | 90% | 90% | / | 10
Bone marrow | 91% | 91% | / | 1433
Average | 78% | 85%
Laterality: 97% accurate when the site is correct.
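The Average row here is a simple per-site mean, each site counted once: averaging the 20 primary-site rows reproduces the 78% and 85% shown. A report-weighted mean, where Breast, Melanoma and Bone marrow dominate, is a different summary. A small Python sketch of both, with the values copied from the primary-site table; the function names are hypothetical.

```python
# (site, old %, coder-corrected %, total reports) from the primary-site table.
PRIMARY_SITE = [
    ("Breast", 79, 87, 2592), ("Melanoma", 71, 74, 1953), ("Prostate", 99, 100, 499),
    ("Colon", 62, 80, 294), ("Bladder", 70, 97, 238), ("Endometrium", 89, 96, 211),
    ("Lung", 81, 88, 162), ("Thyroid", 99, 100, 161), ("Rectum", 60, 62, 87),
    ("Stomach", 64, 74, 87), ("Upper aerodigestive", 80, 86, 60),
    ("Uterine cervix", 55, 70, 41), ("Liver", 78, 89, 28), ("Esophagus", 74, 79, 24),
    ("Vagina", 81, 82, 17), ("Ovary", 75, 77, 13), ("Vulva", 100, 100, 12),
    ("Pancreas (exocrine)", 70, 73, 11), ("Kidney", 90, 90, 10), ("Bone marrow", 91, 91, 1433),
]


def site_mean(col: int) -> float:
    """Unweighted mean across sites: each site counts once."""
    return sum(row[col] for row in PRIMARY_SITE) / len(PRIMARY_SITE)


def report_weighted_mean(col: int) -> float:
    """Mean weighted by the number of reports per site."""
    total = sum(row[3] for row in PRIMARY_SITE)
    return sum(row[col] * row[3] for row in PRIMARY_SITE) / total


print(round(site_mean(1)), round(site_mean(2)))  # 78 85
print(round(report_weighted_mean(2), 1))
```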

Confidence Scoring
When the automatically generated heuristic score is above 0.9, the accuracy lies between the following bounds with 95% confidence:
Site | Lower | Upper
Lung | 0.84 | 0.97
Breast | 0.94 | 0.99
Prostate | 0.97 | 0.995
Colon | 0.92 | 0.997
Rectum | 1.00 | 1.00
Endometrium | 0.81 | 0.94
Thyroid | 1.00 | 1.00
Melanoma (skin) | 0.94 | 0.99
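The slide does not say how the bounds were computed. Rows such as Rectum and Thyroid (1.00 to 1.00) are what the normal-approximation (Wald) interval returns whenever every high-confidence case in the sample is correct, because its width collapses to zero at p = 1; the Wilson score interval is one common alternative that keeps a non-zero width. A Python sketch of both, with hypothetical counts.

```python
import math


def wald_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval p +/- z*sqrt(p(1-p)/n), clipped to [0, 1]."""
    p = correct / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval; stays informative even when p is 0 or 1."""
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical: 40 high-confidence reports, all coded correctly.
print(wald_interval(40, 40))  # (1.0, 1.0)
print(wilson_interval(40, 40))
```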

AI engine development and use
AIM's AI technology (diagram components): AI Engine; E-Path (concept identification and inference rules for case selection); diagnostic imaging classification (CNS/chest); concept search; knowledge bases; inference rules; API; Abrevio RCA; discrete data extraction; medical report processing; patient data consolidation; discrete data extraction and standardization; automated study matching; automated results distribution; rigorous QC methods; machine learning (conversion of text to discrete data for ML training); incorporation of decision rules into the inference engine; Codex (automated coding for cancer abstracts).
Source: AIM Product Description, AIM, 2017.

Where can we use machine learning?
One good example is the identification of extraneous information such as citations and reference material. This text sometimes contains what may be interpreted as patient data. A Bayesian spam-filter approach can identify this text and block it.

Example of spam
ccc et al. Lymphoma 2008: 435 441
ddd al. Cancer.2013;369(25):2391-405
Tumor et al. Leukemia 23: 2007
The V617F mutation analysis can provide information helpful in distinguishing myeloproliferative disorders (MPDs) from reactive conditions, particularly secondary thrombocytosis and erythrocytosis. The incidence of the V617F mutation in MPDs in different studies ranges from 65-97% in polycythemia vera, from 41-57% in patients with essential thrombocythemia, and from 23-95% in patients with primary myelofibrosis. The same mutation was also found in roughly 20% of Ph-negative atypical CML, in more than 10% of CMML, in about 15% of patients with megakaryocytic AML (AML M7), 20% of patients with juvenile myelomonocytic leukemia (JMML), and acute lymphoblastic leukemia with t(9;12) (p24;p13).
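A minimal sketch of the Bayesian spam-filter idea in Python, assuming two labels (hypothetical names "spam" and "report") and a multinomial naive Bayes model with add-one smoothing; `BayesTextFilter` is illustrative, not AIM's filter. The training snippets are taken from the example texts in this deck and are far too few for real use.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-case word/number tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BayesTextFilter:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "report": Counter()}
        self.doc_counts: Counter = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["report"])
        counts = self.word_counts[label]
        total = sum(counts.values())
        # log prior + sum of smoothed log likelihoods of the known tokens
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence here
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        return score

    def is_spam(self, text: str) -> bool:
        return self.log_score(text, "spam") > self.log_score(text, "report")


f = BayesTextFilter()
f.train("ccc et al. Lymphoma 2008: 435 441", "spam")
f.train("ddd al. Cancer.2013;369(25):2391-405", "spam")
f.train("The incidence of the V617F mutation in MPDs in different studies ranges from 65-97%", "spam")
f.train("Specimen is received fresh and consists of a 0.4 x 0.2 x 0.2 cm ovoid tan nodule", "report")
f.train("PARATRACHEAL LYMPH NODE (BIOPSY): METASTATIC CARCINOMA, MICROSCOPIC FOCI", "report")
f.train("Entire specimen is submitted in one cassette", "report")

print(f.is_spam("Tumor et al. Leukemia 23: 2007"))      # True
print(f.is_spam("Specimen is submitted in cassette 1"))  # False
```

Text flagged this way would be blocked before concept extraction, so citation numbers and population statistics are not mistaken for patient findings.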

Conclusion
- The system shows high performance in several major sites, reducing manual coding effort.
- Coder variation exists; the system is consistent.
- With more feedback, the system will improve (we had only one round of coder feedback).
- High-confidence reports can skip manual review.
- The rule explanation feature will act as a coding assistant and may even be used for training new coders.

Acknowledgements
New York State Cancer Registry: Jovanka Harrison, Maria Schymura, Colleen Sherman, Jamie Musco
AIM Development Team

END

Inter-coder Reliability (Percent Agreement, All Coders)
Data item | Trial 1 (N=749) | Trial 2 [With Codex] (N=856)
Topography (ICD-O-3) | 84.2% | 81.9%
Histology (ICD-O-3) | 85.6% | 84.1%
Behavior (ICD-O-3) | 88.8% | 91.8%
SEER Site Group | 89.1% | 89.8%
Tumor Grade | 88.7% | 73.2%
Laterality | 89.3% | 87.7%