Automatic Identification & Classification of Surgical Margin Status from Pathology Reports Following Prostate Cancer Surgery


Leonard W. D'Avolio MS a,b, Mark S. Litwin MD c, Selwyn O. Rogers Jr. MD, MPH d, Alex A. T. Bui PhD a
a Medical Imaging Informatics Group, University of California, Los Angeles CA
b Department of Information Studies, University of California, Los Angeles CA
c Departments of Urology and Health Services, University of California, Los Angeles CA
d Center for Surgery and Public Health, Brigham and Women's Hospital, Boston MA

Abstract

Prostate cancer surgeries that leave tumor at the surgical margin, known as a positive surgical margin, carry a significantly higher chance of biochemical recurrence and clinical progression. To support clinical outcomes assessment, a system was designed to automatically identify, extract, and classify key phrases from pathology reports describing this outcome. Heuristics and boundary detection were used to extract phrases. Phrases were then classified using support vector machines into one of three classes: positive (involved) margins, negative (uninvolved) margins, and not applicable or not definitive. A total of 851 key phrases were extracted from a sample of 782 reports produced between 1996 and 2006 at two major hospitals. Despite differences in reporting style, at least 1 sentence containing a diagnosis was extracted from 780 of the 782 reports (99.74%). Of the 851 sentences extracted, 97.3% contained diagnoses. Overall accuracy of automated classification of extracted sentences into the three categories was 97.18%.

Introduction

According to the American Cancer Society, there were an estimated 234,460 new cases of prostate cancer in the United States in 2006 [1]. A prevalent treatment for prostate cancer is removal of the prostate and the surrounding lymph nodes in a procedure known as a radical retropubic prostatectomy (RRP).
One unintended consequence of RRP is a positive surgical margin, which results from incising inadvertently into the prostate or incising into the extraprostatic tumor [2]. Positive margins are correlated with decreased cancer-specific and overall survival and a 2 to 4 times greater chance of biochemical cancer recurrence [3]. Identifying the number of patients with a positive surgical margin is a requisite step toward improving RRP outcomes. Unfortunately, surgical margin status is largely reported in non-standardized, free-text form within the associated pathology reports. As a result, margin status must be manually abstracted from these documents, a time-consuming process.

The Clinical Outcomes Assessment Tool (COAT) is being designed as both an interface and a Java API to facilitate medical records-based clinical outcomes assessment. As part of an ongoing study in the urological domain, COAT is being used to automatically identify phrases in pathology reports that reference the margin status of a prostate specimen. Once identified, these phrases (herein referred to as margin sentences) are automatically classified into one of three classes: 1) positive (involved) margin; 2) negative (uninvolved) margin; and 3) not applicable or not definitive enough to diagnose margin status. We report an early evaluation of COAT's ability to automatically identify and classify margin sentences from a large sample of 782 free-text pathology reports taken from two hospitals.

Background

Reporting of margin status appears in the pathology report, along with other key prognostic parameters as set forth by the College of American Pathologists (CAP) prostate cancer protocol, including pathologic TNM stage (tumor staging) and Gleason score (histological analysis of tumor). Unlike the other CAP measures, which typically appear in a standard format (e.g., T2bN0Mx for TNM stage and 3+3=6 for Gleason score), margin status has no standard format for representation and often appears in sentence or phrase form.
Examples of typical margin sentences extracted from pathology reports in the sample are provided in Table 1.

Positive surgical margin:
  surgical margins involved at right apex
  base margin positive focal left
  tumor is present focally at the margin of resection
Negative surgical margin:
  no tumor present at the soft tissue resection margin
  no carcinoma is present at the inked margin

Table 1. Examples of margin status references

The challenge of extracting information of interest from the clinical record has been approached with a variety of techniques. Robust natural language processing (NLP) technologies such as MedLEE [4] and MetaMap/MMTx [5] employ comprehensive medical dictionaries and part-of-speech taggers to structure clinical data. A few examples of successful clinical applications of such systems include formatting the contents of radiology reports [6], identifying candidates for clinical trials [7], and detecting public health outbreaks [8].

AMIA 2007 Symposium Proceedings Page - 160

Full NLP systems are not always necessary for identifying a small number of predetermined lexical targets. In many cases, fast and lightweight information extraction systems have been used to capitalize on the relative consistency with which specific clinical values appear in medical reports. These approaches typically employ pattern-recognition techniques such as regular expression matching [9]. Several systems have combined extracted text features with machine learning algorithms to classify clinical documents or clinical values of interest within the documents [10]. One machine learning technique that has proven successful in classifying text features is the support vector machine (SVM) [11]. SVMs are linear classifiers that attempt to find a hyperplane that maximizes the margin between two different classes of instances. SVMs have been used in the clinical domain for several NLP-related tasks, including document classification [12] and complex concept identification in radiology reports [13].

The availability of downloadable application programming interfaces (APIs) for various components required for medical language processing has made it possible for researchers to assemble pipelines of functionalities to accomplish specific information extraction tasks. For example, the Cancer Biomedical Informatics Grid's (caBIG) text extraction tool caTIES [14] and Zeng et al.'s HITEx [15] feature language tools from the General Architecture for Text Engineering (GATE) [16]. While caTIES uses the NLM's MetaMap/MMTx for medical concept mapping, HITEx implements only the portions of the Unified Medical Language System [17] relevant to its tasks. The Clinical Outcomes Assessment Tool (COAT) is similarly designed, assembling a suite of functionalities specific to the task of facilitating records-based clinical outcomes assessment.
It features a Java API of generalizable text processing facilities (record import, tokenization, regular expression matching, etc.), integration with MetaMap/MMTx for concept mapping, and Weka's API for data and text mining [18]. These features are combined in an interface created for outcomes researchers to manipulate, manage, visualize, and analyze clinical data for outcomes assessment. COAT is being developed at UCLA in cooperation with the Center for Surgery and Public Health at Brigham and Women's Hospital (BWH). In the current study, COAT was extended to identify and classify margin sentences in cooperation with urologists at both UCLA and BWH.

Methods

The goal of this study was to facilitate RRP outcomes assessment by identifying key phrases in pathology reports that describe the margin status of patients. Once the sentences were extracted, a classification algorithm was applied to classify them automatically. A consideration in this study was how best to balance the customization needed to identify and classify margin sentences at an acceptable level of performance against the extensibility of the technique to different institutions. To explore this issue, a sample of pathology reports from two major teaching hospitals, UCLA Medical Center and Brigham and Women's Hospital, was collected for analysis and testing.

Data: A corpus of 787 pathology reports for patients having undergone RRPs was randomly selected from BWH (n = 456) and the UCLA Medical Center (n = 331). The pathology reports selected from BWH were drawn from RRP surgeries conducted at BWH between Jan. 1, 1996 and June 1, [...]. The UCLA pathology reports were drawn from surgeries performed between Jan. 1, 1998 and June 1, [...]. The population of patients having RRP procedures at both institutions during these time periods was approximately [...]. A total of 5 reports (0.6% of the sample) were excluded for being inaccurately coded as prostate cancer or for not featuring some reference to the surgical margin status as required by the American College of Surgeons Commission on Cancer [19]. As a result, the final sample comprised 453 reports from BWH and 329 from UCLA, for a total of 782 reports. IRB approvals for this study were obtained from both institutions.

Heuristics Design: The first step in designing a method to automatically identify and extract margin sentences was an exploratory pilot analysis of the pathology reports. A random sample of 30 pathology reports was taken from each institution (60 total) to manually identify potential consistencies and differences in the description of margin status. Reports from both BWH and UCLA feature a summary section in which margin status is described in narrative form, along with other CAP prognostic parameters (tumor stage, Gleason score, etc.). Most UCLA reports also included a Microscopic Examination section that featured semi-structured text describing margin status and other analytic measures. For example:

  Surgical Margin: Less than 1 mm from margin

These features were incorporated in the design of an identification and extraction algorithm. A graphical overview of the pipeline used is provided in Figure 1.

Identification & Extraction of Margin Sentences: Sentence boundary detection was used to break the reports into sentences, and all text was converted to lowercase. Based on the pilot study results, two two-word combinations that appeared consistently in the corpus were used to flag potential sentences of interest. This set of trigger rules was supplemented with three additional combinations after iterating through the sample and logging misses, for a total of five trigger rules. Table 2 shows the rules used for extracting sentences describing margin status.
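The five trigger rules listed in Table 2 can be sketched in a few lines of Java. This is a minimal illustration and not COAT's actual code: the class name `MarginTriggerRules`, the use of `java.text.BreakIterator` for sentence boundary detection, and the whole-word matching are all assumptions, since the paper does not specify these details.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class MarginTriggerRules {
    // Hypothetical encoding of the Table 2 rules: one of five keywords
    // co-occurring with "margin" or "margins" in the same sentence.
    private static final Pattern[] KEYWORDS = {
        Pattern.compile("\\bresection\\b"),
        Pattern.compile("\\bsurgical\\b"),
        Pattern.compile("\\bapical\\b"),
        Pattern.compile("\\btumor\\b"),
        Pattern.compile("\\bcarcinoma\\b")
    };
    private static final Pattern MARGIN = Pattern.compile("\\bmargins?\\b");

    /** Returns true if the sentence satisfies at least one trigger rule. */
    public static boolean isFlagged(String sentence) {
        String s = sentence.toLowerCase(Locale.ROOT);
        if (!MARGIN.matcher(s).find()) {
            return false;
        }
        for (Pattern keyword : KEYWORDS) {
            if (keyword.matcher(s).find()) {
                return true;
            }
        }
        return false;
    }

    /** Splits a report into sentences and returns the flagged ones, lowercased. */
    public static List<String> extract(String report) {
        List<String> hits = new ArrayList<>();
        // Assumption: a generic sentence splitter stands in for the paper's
        // (unspecified) boundary detection.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(report);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = report.substring(start, end).trim();
            if (!sentence.isEmpty() && isFlagged(sentence)) {
                hits.add(sentence.toLowerCase(Locale.ROOT));
            }
        }
        return hits;
    }
}
```

Under these assumptions, a sentence such as "Margins, negative." is not flagged because none of the five keywords accompanies the word "margins", which is the kind of miss a keyword co-occurrence rule is expected to produce.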

First Run Margin Sentence Rules:
  resection and (margin or margins)
  surgical and (margin or margins)
Margin Sentence Rules Added:
  apical and (margin or margins)
  tumor and (margin or margins)
  carcinoma and (margin or margins)

Table 2. Rules used for extracting margin sentences

To capitalize on the consistency of the semi-structured Microscopic Exam section in UCLA reports, the extraction algorithm was designed to make an initial pass on UCLA reports to identify margin status in this section using regular expression matching. The expression matched case-insensitive appearances of the word margin or margins followed by a colon and the word or words that followed.

Regular expression: (?i)\\w+\\s*\\w*((margin)s*(:)\\s*\\w*\\w+)

In the case of a match, a modified boundary detection algorithm was used to accommodate the bulleted style of text featured in the Microscopic Exam section. It captured all text between double line breaks when the semi-structured margin regular expression was matched. This initial pass to capture semi-structured margin status in UCLA reports was the only variation in handling data from one hospital versus the other. The IDs of reports for which no match was found were logged, and these reports were manually reviewed for causes of failure.

Classification of Margin Sentences: The sentences extracted from the sample of 782 reports were manually classified by the author (LWD), based on a review of the literature and consultation with urologists, to create a training set and gold standard for classification. All sentences were classified into one of three categories: 1) positive (involved) surgical margin, 2) negative (uninvolved) surgical margin, and 3) not applicable or no explicit diagnosis. This third category was used for extracted sentences with no relevance to margin status, as well as sentences in which the diagnosis could not be definitively determined (i.e., false positives). Examples of category three sentences from the sample are provided in Table 3.

  the apical and basal margins are amputated and fixed separately
  note benign prostate glands are focally present at an inked resection margin

Table 3. Examples of category 3 (false positive) sentences

Once classified, the sentences extracted from the reports were tokenized into vectors of lowercase words. A feature vector was created from all unique token appearances. Vectors of the tokens appearing in each classified sentence, together with the assigned category, were also created. COAT was integrated with Weka's Java API and the vectors were passed to an implementation of an SVM classifier that used a polynomial kernel function and sequential minimal optimization (SMO) for training [20]. The performance of the classifier was evaluated using 10-fold cross-validation.

Figure 1. Workflow and classes for identifying and extracting margin sentences. (Figure elements: BWH text docs, UCLA XML docs, file import, XML parsing, text cleaner, document object, data structure for medical records, regex matcher, sentence breaking, trigger criteria check, boundary narrowing, miss logging, final margin sentence.)

Results

Identification of Margin Sentences: At least one potential margin sentence was extracted for 780 of the 782 documents. Extrapolated to the collection of approximately 3480 pathology reports, at least one sentence describing margin status should be extracted in 99.74% of reports (C.I. [...]%, 99.92% with 95% confidence). The sentences describing margin status that were missed are listed in Table 4.

  Tumor is within 0.1 cm of the ink on both sides.
  Margins, negative.

Table 4. Missed sentences describing margin status

For the 780 documents from which sentences were extracted, 851 potential sentences were identified. Of the 851 sentences extracted, there were 23 false positives (category 3), for an extraction precision of 97.3%. Of the 23 false positive sentences, 22 were from BWH and 1 was from UCLA. Examples of sentences falsely considered to be describing surgical margin status are shown in Table 3.

Each document yielding a false positive also produced at least 1 true positive. In other words, all of the 780 documents from which sentences were identified produced at least one true positive (category 1 or 2) from which a diagnosis of margin status could be made. The results of margin sentence extraction are provided in Table 5.

Reports
  RRP path reports with diagnosis of carcinoma of the prostate: 782
  Reports from which true positive (category 1 or 2) sentences were extracted: 780 (99.74%)
Sentences Extracted
  Number of sentences extracted: 851
  Positive margin sentences (category 1): 131
  Negative margin sentences (category 2): 697
  Not applicable or definitive (category 3): 23
  Number of true positives (precision): 828 (97.3%)
Distribution of False Positives
  BWH false positive sentences: 22
  UCLA false positive sentences: 1

Table 5. Margin sentence extraction results

Classification of Margin Sentences: The SVM classifier correctly classified 827 of 851 sentences for an overall accuracy of 97.18%. Sensitivity and specificity for the three categories were as follows: category 1 (positive margin) = 96.95%, 98.77%; category 2 (negative margin) = 98.71%, 91.67%; and category 3 (not applicable to or definitive of margin status) = 52.17%, 99.88%. Summaries of the overall classification results, sensitivity and specificity, area under the receiver operating characteristic (ROC) curve, and a confusion matrix are provided in the tables below.

  Total Number of Sentences: 851
  Correctly Classified Sentences: 827 (97.18%)
  Incorrectly Classified Sentences: 24 (2.82%)

Table 6. Overall classification accuracy

                                               Sensitivity  Specificity
  Category 1 (positive margin)                 96.95%       98.77%
  Category 2 (negative margin)                 98.71%       91.67%
  Category 3 (not applicable to margin status) 52.17%       99.88%

Table 7. Sensitivity and specificity of classification

                                 Area under ROC
  Category 1 (positive margin)   .9789
  Category 2 (negative margin)   .9482
  Category 3 (not applicable)    .8457

Table 8. Area under ROC for classification results

               Classified as
               Category 1 (positive)  Category 2 (negative)  Category 3 (NA)
  Category 1   127 (97%)              4 (3%)                 0 (0%)
  Category 2   8 (1.1%)               688 (98.7%)            1 (0.1%)
  Category 3   1 (4.3%)               10 (43.5%)             12 (52.2%)

Table 9. Confusion matrix of classification results

Discussion

The overall performance of the system in identifying, extracting, and classifying sentences was promising. The only customization in handling reports from the two hospitals was a preliminary iteration in the algorithm to capitalize on the existence of any semi-structured reporting of margin status.

The preliminary pilot analysis of a small subset of reports was a useful starting point for designing heuristics to capture potential margin sentences. It led to five simple keyword-based rules that captured sentences from 780 of 782 reports (99.74%), with only 23 (2.7%) of the extracted sentences not containing a margin status diagnosis. It also informed the decision to modify the extraction algorithm to capitalize on the appearance of semi-structured reporting.

The larger collection of BWH documents explains some of the differential in false positives, but not the ratio of 22 to 1. In addition, only 3 of the 22 BWH false positive reports were created in 1996 or 1997, which argues against a significant temporal factor in producing false positives. Instead, the majority of BWH false positives might be attributed to the lack of consistency in the way margin status and related concepts are referenced in BWH RRP pathology reports. If this is the case, then the results offer support for the standardization of key analytic measures to facilitate automated outcomes assessment research.
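The overall accuracy and per-category sensitivity reported in the Results can be recomputed from the confusion matrix in Table 9. The sketch below is illustrative only: the class and method names are hypothetical, the first cell of the matrix is taken as 127 (131 category 1 sentences minus the 4 classified as category 2), and specificity is computed one-vs-rest, which is an assumption because the paper does not say how its specificity values were derived. Computed this way, specificity agrees with Table 7 for category 3 but comes out slightly different for categories 1 and 2 (about 98.75% and 90.91%).

```java
public class MarginClassificationMetrics {
    // Rows: manually assigned category; columns: category assigned by the
    // classifier (order: 1 = positive, 2 = negative, 3 = not applicable).
    // Counts follow Table 9; the value 127 is inferred as 131 - 4.
    public static final int[][] TABLE_9 = {
        {127, 4, 0},
        {8, 688, 1},
        {1, 10, 12}
    };

    /** Total number of classified sentences. */
    public static int total(int[][] m) {
        int sum = 0;
        for (int[] row : m) {
            for (int v : row) {
                sum += v;
            }
        }
        return sum;
    }

    /** Fraction of sentences on the diagonal (correctly classified). */
    public static double accuracy(int[][] m) {
        int correct = 0;
        for (int i = 0; i < m.length; i++) {
            correct += m[i][i];
        }
        return (double) correct / total(m);
    }

    /** Sensitivity for class c: true positives over all actual members of c. */
    public static double sensitivity(int[][] m, int c) {
        int rowSum = 0;
        for (int v : m[c]) {
            rowSum += v;
        }
        return (double) m[c][c] / rowSum;
    }

    /** One-vs-rest specificity for class c: true negatives over all non-members of c. */
    public static double specificity(int[][] m, int c) {
        int rowSum = 0;
        int colSum = 0;
        for (int j = 0; j < m.length; j++) {
            rowSum += m[c][j];
            colSum += m[j][c];
        }
        int falsePositives = colSum - m[c][c];
        int negatives = total(m) - rowSum;
        return (double) (negatives - falsePositives) / negatives;
    }

    public static void main(String[] args) {
        System.out.printf("accuracy = %.4f%n", accuracy(TABLE_9));
        for (int c = 0; c < TABLE_9.length; c++) {
            System.out.printf("category %d: sensitivity = %.4f, specificity = %.4f%n",
                c + 1, sensitivity(TABLE_9, c), specificity(TABLE_9, c));
        }
    }
}
```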
The strong performance of the SVM classifier (97.18% accuracy) was partially an indication of the success of the boundary detection techniques in extracting representative feature vectors (margin sentences). It was also additional evidence of the power of SVMs in classifying text using distance-based techniques. We were also encouraged by the classifier's performance given that no modifications were made to accommodate classification of margin sentences from one hospital versus the other.

Performance was predictably poor (12/23, or 52.2%) for classifying category three (not applicable) sentences, owing to a training set of only 23 such sentences in the collection of 851 extracted sentences. However, only one of the 11 misses for category 3 was falsely classified as a positive surgical margin (category 1), while the rest were classified as negative margin sentences. This observation is important because it implies that, using this technique, incorrectly classified category 3 sentences are unlikely to inflate the number of cases resulting in a positive surgical margin. While only 1.1% (8 sentences) of all negative margin sentences were incorrectly classified as positive margin, the number of positive margin cases was falsely inflated by 6.3%. The effect of the large ratio of negative to positive margin cases must therefore be considered in efforts to extend this technique to discovering the total number of positive margin cases in a collection.

The conclusions drawn from this study are limited by the sample used. First, both UCLA and BWH are American College of Surgeons Commission on Cancer-approved hospitals, which indicates that both have achieved a baseline of quality with regard to their oncology services [19]. The format and inclusion of margin status in non-approved facilities may differ systematically. Second, this study did not account for the effects of poor data quality introduced by incorrectly assigned administrative codes or missing values.

Conclusion

We have demonstrated the ability to automatically identify and classify sentences describing a key outcome of prostate cancer surgery from pathology reports with high accuracy. This is a fundamental first step in supporting automated prostate cancer surgery outcomes assessment. Currently, COAT is being extended to capture other key quality measures, including Gleason score and tumor stage. An evaluation of the effects of incorrect administrative code assignments and missing values on produced results is also in progress.

Acknowledgements

This work was supported in part by the NLM Medical Informatics Training Grant # LM [...]. This paper benefited from suggestions from Dr. Jim Sayre, Dr. David Miller, Dr. Jim Hu, and Vijay Bashyam.

References

[1] American Cancer Society. Overview: Prostate Cancer. [Web site] 2006 [cited 2006; Available: t=36
[2] Richie J. Management of patients with positive surgical margins following radical prostatectomy. The Urological Clinics of North America. 1994;21:717.
[3] Hull G, Rabbani F, et al. Cancer control with radical prostatectomy alone in 1,000 consecutive patients. The Journal of Urology. 2002;167:528.
[4] Friedman C, Alderson P, et al. A general natural language text processor for clinical radiology. Journal of American Medical Informatics Association. 1994;1(2):
[5] Aronson A. Effective mapping of biomedical text to the UMLS metathesaurus: the MetaMap program. In: Belfus H, editor. AMIA Symposium; 2001; p
[6] Hripcsak G, Austin J, et al. Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology. 2002;224(1):
[7] Xu H, Anderson K, et al. Facilitating Cancer Research using Natural Language Processing of Pathology Reports. MedInfo; 2004;
[8] Chapman W, Fiszman M, et al. Identifying respiratory findings in emergency department reports for biosurveillance using MetaMap. Medinfo ///;11(Pt 1):
[9] Turchin A, Kolatkar N, et al. Using regular expressions to abstract blood pressure and treatment intensification information from the text of physician notes. Journal of the American Medical Informatics Association. 2006;13(6):
[10] Chapman WW, Fiszman M, et al. A comparison of classification algorithms to automatically identify chest X-ray reports that support pneumonia. J Biomed Inform. 2001;34(1):14.
[11] Joachims T. Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms: Kluwer Academic Publishers
[12] Yetisgen-Yildiz M, Pratt W. The effect of feature representation on MEDLINE document classification. Proceedings of the American Medical Informatics Association; 2005; Washington, DC;
[13] Bashyam V, Taira R. Identifying Anatomical Phrases in Clinical Reports by Shallow Semantic Parsing Methods. Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining; 2007; Honolulu, Hawaii;
[14] Cancer Biomedical Informatics Grid. About caTIES [cited 2007 March 10, 2007]; Available:
[15] Zeng Q, Goryachev S, et al. Extracting principal diagnosis, co-morbidity, and smoking status for asthma research: evaluation of a natural language processing system. BMC Medical Informatics and Decision Making. 2006;6(30).
[16] Cunningham H. GATE, A General Architecture for Text Engineering. Computers and the Humanities. 2004;36(2):
[17] National Library of Medicine. UMLS Fact Sheet [cited 2006 August 10]; Available from:
[18] Witten I, Frank E. Data Mining: Practical machine learning tools and techniques. 2nd Edition ed. San Francisco: Morgan Kaufmann
[19] American College of Surgeons Commission on Cancer. Categories of Approval. [cited Feb. 24, 2007];Available:
[20] Platt J. Machines using sequential minimal optimization. In: B S, C B, A S, eds. Advances in Kernel Methods - Support Vector Learning
