COMPARISON OF BREAST CANCER STAGING IN NATURAL LANGUAGE TEXT AND SNOMED ANNOTATED TEXT


Volume 116 No. 21 2017, 243-249
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu

1 Johanna Johnsi Rani G, 2 Dennis Gladis, 3 Joy John Mammen
1 Department of Computer Science, Madras Christian College, Chennai 600 059, South India
2 Department of Computer Science, Presidency College, Chennai 600 005, South India
3 Department of Transfusion Medicine, Christian Medical College, Vellore - 632 004, South India
1 johanna.g@mcc.edu.in, 2 Christophergladis67@gmail.com, 3 joymammen@cmcvellore.ac.in

Abstract: Medical reports are now generated electronically and stored in databases, where automated systems can collate, process, analyze and interpret patient data. A collection of such reports can support population studies in a disease domain. Automated systems can also verify the diagnoses that experts record manually in the reports. The corpus for the automated system discussed here is a set of breast cancer pathology reports, retrieved and processed using Natural Language Processing (NLP) techniques. Following the protocol of the American Joint Committee on Cancer (AJCC), the pTNM classification is used to determine the pathological stage of breast cancer. The classifications of the Tumour (T), Lymph nodes (N) and Distant Metastases (M) together determine the stage of cancer. Because M is not evident from pathology reports, it is given a default value of M0. The T and N classifications in the reports are validated and corrected by domain experts to produce the Gold Standard, and a discrepancy report is generated for reports with differing values. The cancer staging parameters extracted by the automated system are compared against the Gold Standard for analysis.
The focus of the work is to extract the parameters required to determine the cancer stage of patients from two kinds of reports: reports with natural language text and reports with SNOMED annotated text. The cancer staging process on both types of reports is compared, and the results indicate that the cancer stage derived from SNOMED annotated pathology reports is more accurate than that derived from natural language text.
Keywords: Breast cancer; Pathology reports; Natural Language Processing; Annotated text; Cancer stage

1. Introduction
Most medical reports, both written and electronic, contain descriptive narration in natural language, mostly in English. Textual data from these documents can be processed with natural language processing methods. Most hospitals in India generate medical reports and store them in databases. Processing these reports with automated systems can provide valuable information for analysing and interpreting the patient population. Statistics indicate that India ranks at the top in breast cancer deaths. Using the available set of breast cancer pathology reports, an automated system is developed to determine the cancer stage of patients. The required parameters are extracted both from natural language reports and from reports annotated using the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). The breast cancer pathology reports were obtained from a hospital in South India. Each report has the following sections: Specimen, Clinical, Gross, Micro, and Impression. The Impression section of the de-identified pathology reports is processed to derive the pathological classification pTNM, in which T represents the Tumour, N the Lymph nodes and M Distant Metastasis. The grouping of the T, N and M classifications is used to determine the stage of cancer.
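Because the pTNM classification is read from the Impression section, a pattern-matching rule for it can be sketched with a regular expression. This is a minimal sketch, assuming the classification is written in a compact form such as "pT2 N1a Mx" or "pT3N2a"; the exact layout in the corpus is not described, and the `PTNM_PATTERN` and `parse_ptnm` names are hypothetical, not the authors' rules. Following the text, an absent or unassessed M is given the default value M0.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for compact staging strings such as "pT2 N1a Mx".
PTNM_PATTERN = re.compile(
    r"\bp?T(?P<t>is|[0-4xX])(?P<t_sub>mi|[a-d])?"  # tumour: pT2, pT1c, pTis, pT1mi
    r"\s*p?N(?P<n>[0-3xX])(?P<n_sub>mi|[a-c])?"   # nodes: N0, pN1a, N1mi
    r"(?:\s*p?M(?P<m>[01xX]))?"                    # optional metastasis category
)


def parse_ptnm(impression: str) -> Optional[Dict[str, str]]:
    """Return the first T/N/M classification found in the text, or None."""
    match = PTNM_PATTERN.search(impression)
    if match is None:
        return None
    m_value = match.group("m")
    return {
        "T": "T" + match.group("t") + (match.group("t_sub") or ""),
        "N": "N" + match.group("n") + (match.group("n_sub") or ""),
        # M is not evident from pathology reports: absent or MX defaults to M0.
        "M": "M" + m_value if m_value in ("0", "1") else "M0",
    }
```

For example, `parse_ptnm("Infiltrating ductal carcinoma, grade II. pT2 N1a Mx")` returns `{"T": "T2", "N": "N1a", "M": "M0"}`. Matching only uppercase T, N and M keeps ordinary words such as "tissue" from triggering the rule.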
The American Joint Committee on Cancer (AJCC) has created resource materials that provide in-depth and easy-to-access information for doctors and other medical professionals who stage cancer patients, and for cancer registrars who abstract cancer cases [11]. The presence of a primary tumour and its size are the principal values required to classify T. Breast cancer may spread to the axillary lymph nodes in the armpit, and the conditions for classifying the lymph nodes (N) are more complex than those for T. Distant Metastasis (M) cannot be classified from the details in a pathology report, so the system sets a default value of M0 to derive the cancer stage. The stage of breast cancer describes the extent of its spread in the body, and the grouping of T, N and M specifies the extent of the disease in a patient. The cancer stage is determined by grouping T, N and M as recommended by AJCC.
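The size-based T classification and the T, N, M grouping described above can be sketched as a lookup, using the anatomic stage groups of the AJCC 7th edition manual cited as [11]. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, T4 (which depends on chest wall or skin extension rather than size) is not derived from size, and combinations the table does not list directly (for example T2 with N1mi) return None rather than a guessed stage. M defaults to M0, as the system does.

```python
from typing import Optional


def t_category_from_size(size_mm: float) -> str:
    """Size-only T category for an invasive breast tumour (AJCC 7th ed.).

    T1 subcategories (T1mi, T1a-c) are collapsed into T1, and T4 is not
    size-based, so it is never returned here.
    """
    if size_mm <= 0:
        raise ValueError("tumour size must be positive")
    if size_mm <= 20:
        return "T1"
    if size_mm <= 50:
        return "T2"
    return "T3"


def base_category(code: str) -> str:
    """Collapse subcategories ('T1c' -> 'T1', 'N2a' -> 'N2'), keeping
    'Tis' and 'N1mi', which appear in the grouping table as-is."""
    if code.startswith("Tis"):
        return "Tis"
    if code == "N1mi":
        return code
    return code[:2]


# Anatomic stage groups for M0 disease (AJCC 7th ed., breast).
STAGE_GROUPS = {
    ("Tis", "N0"): "0",
    ("T1", "N0"): "IA",
    ("T0", "N1mi"): "IB", ("T1", "N1mi"): "IB",
    ("T0", "N1"): "IIA", ("T1", "N1"): "IIA", ("T2", "N0"): "IIA",
    ("T2", "N1"): "IIB", ("T3", "N0"): "IIB",
    ("T0", "N2"): "IIIA", ("T1", "N2"): "IIIA", ("T2", "N2"): "IIIA",
    ("T3", "N1"): "IIIA", ("T3", "N2"): "IIIA",
    ("T4", "N0"): "IIIB", ("T4", "N1"): "IIIB", ("T4", "N2"): "IIIB",
}


def stage_group(t: str, n: str, m: str = "M0") -> Optional[str]:
    """Group base T, N and M categories into a stage (None if not tabulated)."""
    if m == "M1":
        return "IV"  # any T, any N, M1
    if m != "M0":
        return None
    if n == "N3":
        return "IIIC"  # any T, N3, M0
    return STAGE_GROUPS.get((t, n))
```

For instance, a 35 mm tumour with pN1a gives `stage_group(t_category_from_size(35), base_category("N1a"))`, which is stage IIB. Keeping the grouping in a table makes it straightforward to check against the published protocol.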

Prior to determining the cancer stage on natural language text and on SNOMED annotated text, the textual content is pre-processed. Pre-processing natural language text involves NLP tasks and the standardization of numerical and non-numerical values in the text. SNOMED annotated text requires a major pre-processing step: extracting a disease-specific subset from the SNOMED database. In the subsequent step, the SNOMED subset extracted for the disease domain is used to annotate the text with SNOMED terms and their codes. Of these steps, the extraction of the SNOMED subset for the breast cancer domain and the annotation of natural language text using the subset are outside the scope of this paper. The work uses regional data collected from a hospital in India, and hence has practical applicability in the diagnosis, treatment and population-based studies of breast cancer in women in India. The paper is organized as follows: Section 2 describes related work in Natural Language Processing (NLP), SNOMED annotation of text, and cancer staging. Section 3 explains the materials and methods used. Section 4 describes the results obtained. Section 5 presents the conclusions.

2. Related Works
Electronic Health Records (EHR), especially those in narrative text form, are processed by applying NLP and Information Extraction (IE) techniques. Erik Cambria and Bebo White review approaches that use production rules, semantic categories, first-order logic (FOL), and Bayesian and semantic networks [1]. Dunham et al. [12], Schadow and McDonald [4], Xu et al. [3], Anni Coden et al. [6], and Nguyen et al. [5] used domain-specific lexicons and rules to process pathology reports. Nelson et al. developed a web-based search application with sequential queries [9]. Buckley JM et al. converted free-text EHRs to a machine-readable form using NLP techniques [2]. Anni Coden et al.
automatically extracted cancer disease characteristics from pathology reports [6]. David Martinez and Yue Li used text mining tools to extract information with minimal human intervention [8]. In this work, cancer staging is performed by extracting the required parameters through pattern matching on free text and on annotated text. Clinical reports in developed countries use medical terminologies such as SNOMED or ICD. Buckley et al. used ICD and Current Procedural Terminology (CPT) codes to identify reports pertaining to the breast [2]. Schadow and McDonald developed a method to extract details about specimens and their related findings from coded text [4]. Nguyen et al. applied a symbolic rule-based classification methodology to identify SNOMED CT concepts in free text [14]. Napolitano G, Fox C, Middleton R and Connolly D used pattern-based extraction from pathology reports [7]. Many breast cancer research works in India use the Wisconsin Breast Cancer dataset; this work uses regional data, and hence the results have practical relevance and applicability. The system uses pattern-matching rules for extraction and cancer staging on both natural language text [17] and annotated text, with the annotation done using SNOMED. Ching-Heng Lin, Nai-Yuan Wu, Wei-Shao Lai and Der-Ming Liou developed an auto-annotation tool that uses a suggesting and ranking algorithm to select terms from a SNOMED subset for annotating reports [16]. The two essential processing steps in this work are the use of pattern-matching algorithms to extract the parameters needed for cancer staging, and the annotation of text using SNOMED. Cancer staging on natural language text and on annotated text is performed on the same dataset, and the results are compared and analyzed to determine which performs better.

3. Materials and Methods
The dataset and the methods applied to determine the cancer stage of patients are explained in this section.
The process applies natural language processing steps and pattern-matching rules to determine the cancer stage.

A. Dataset
One hundred and fifty de-identified breast cancer pathology reports constitute the corpus used in this work. Each report, written by a pathologist, describes the patient's condition as determined by examining cells and tissues under a microscope. The report has the following sections: Demographic information; Specimen, indicating the body part from which the tissue samples were taken; Clinical history, describing the breast abnormality and the kind of surgery done; Gross description, giving the size, weight, and color of each piece of tissue removed; Microscopic description, describing how the cancer cells look under the microscope, their relationship to the normal surrounding tissue, the size of the cancer, the results of special tests and the growth rate of the cells; and Impression, summarizing all the important findings from the tissues examined.
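Section segmentation, one of the preprocessing steps listed in Section C, splits a report at these section headers so that the Impression section can be processed on its own. The sketch below is illustrative only: it assumes each section starts on its own line with a header followed by a colon, and the header labels and `segment_sections` name are hypothetical, modelled on the section names above rather than taken from the corpus.

```python
import re
from typing import Dict

# Hypothetical header labels; real reports may spell or order them differently.
SECTION_HEADERS = [
    "Specimen",
    "Clinical history",
    "Gross description",
    "Microscopic description",
    "Impression",
]
_CANONICAL = {h.lower(): h for h in SECTION_HEADERS}
_HEADER_RE = re.compile(
    r"^[ \t]*(" + "|".join(re.escape(h) for h in SECTION_HEADERS) + r")[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


def segment_sections(report: str) -> Dict[str, str]:
    """Split a plain-text report into {section header: section body}."""
    matches = list(_HEADER_RE.finditer(report))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section body runs until the next header, or to the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        header = _CANONICAL[match.group(1).lower()]  # normalise header casing
        sections[header] = report[match.end():end].strip()
    return sections
```

Applied to a report whose last line is "IMPRESSION: Infiltrating ductal carcinoma, pT2 N1a", the sketch returns that body under the key "Impression". Missing headers, which the preprocessing module inserts, would need to be added before this step.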

B. Cancer staging
The stage of cancer indicates how far the cancer has spread. There are two types of cancer staging, clinical staging and pathological staging, of which pathological staging is the more accurate. The T, N and M classifications are derived from the Impression section by applying the AJCC protocol, and their grouping determines the stage of cancer. The stage is determined on reports with natural language text and on reports with SNOMED annotated text.

C. Preprocessing for cancer staging on plain text
The major tasks performed by the automated system are retrieval of reports, pre-processing of the report content, extraction of the details required for TNM classification, and staging. The pathology report is retrieved as either a .pdf or a .txt file, and the preprocessing steps below are performed on the plain-text reports. The precision of any process on natural language text depends on the preprocessing applied to homogenize and standardize the data. The preprocessing steps applied to the breast cancer pathology reports are:
Report segregation: Separating multiple reports into individual reports.
Section segmentation: Extracting the contents of each section of a report as a separate section.
Standardization of measures: Tumour sizes are given in either centimeters or millimeters; this step converts all measures into millimeters.
Date formats: All dates are converted to a uniform DD/MM/YYYY format.
Sentence segmentation: The contents of each section are split into individual sentences. The period (.) is used to identify sentence boundaries, with exceptions handled for fractional values.
Standardization of numerical values: Numeric values in the reports are written either as numerals (3) or as English words (three); such values are standardized to Arabic numerals.
Alphanumeric representations: Lymph node counts are written as 1/3, 1 out of three, or one out of three. This value is converted into the complete textual form, one out of three.
Abbreviations: Abbreviations are expanded by the system.
Spelling variations: Differences in spelling between British and American English are standardized to British English.
Whitespace removal: Extraneous whitespace is removed from the document, which improves the data extraction process.
Handling parenthesized terms: Parentheses ( ) and brackets [ ] in the document are homogenized to [ ].
Case sensitivity: All text comparisons are made after converting terms to lower case. Medical terms such as Ductal Carcinoma in situ are converted to the form found in SNOMED.
Missing headers: The pre-processing module inserts missing section headers into the document where necessary.
Applying these pre-processing steps homogenizes the reports and improves the extraction of parameters for cancer staging. The steps also improve the efficiency and precision of annotating the medical terms in the report with SNOMED. Fig. 1 shows the workflow for the cancer staging process on natural language text in pathology reports and on SNOMED annotated reports. Both archived reports and newly generated reports are processed to determine the cancer stage of patients.
Figure 1. Workflow of Comparison on Breast Cancer Staging

D. Preprocessing for cancer staging on SNOMED annotated text
The pre-processing steps required to extract the cancer stage from SNOMED annotated text are manually building a Lexicon of breast cancer terms, and extracting a SNOMED subset for the breast cancer domain using the Lexicon and queries. These are part of our earlier work on the automated system. The Lexicon is built in two ways: i. through a manual process of examining the reports to accumulate terms and store them in a database, and ii.
through the application of NLP-based tasks such as sectioning, sentence splitting, tokenization and stop-word removal, after tagging the medical terms found in the manual lexicon [18]. These two preprocessing steps are outside the scope of this paper. In the annotation process using the subset, each medical term in the report is replaced with its corresponding SNOMED term and code. Pattern-matching algorithms are then applied to the SNOMED annotated text to find the pTNM classification. The patterns used for cancer staging on annotated text are composed of several components: SNOMED Concept IDs identified using the CliniClue SNOMED browser, numerical values, negation values (No / Not), and logical connectives (and, or). The conditions are the same classification conditions specified in the AJCC protocol. When the free text is annotated with SNOMED codes for medical terms, the pTNM classifications are also annotated with their respective codes.

4. Results
The automated system determined the cancer stage for each patient from both the natural language text and the annotated text in all 150 reports. The pattern-matching rules extracted the details required for the classification of T and N and for cancer staging. Figures 2 to 4 present the analysis of T, N and cancer stage extracted from natural language text.

E. Gold Standard for the Cancer Staging
The system has three pTNM classifications: i. the pTNM given at the end of the report, derived manually by the pathologist from the parameters in the report; ii. the Gold Standard pTNM, verified and validated by the pathologist through a graphical interface; and iii. the pTNM classification derived automatically by the application. The pTNM specified in the Impression section of each pathology report is verified by the pathologists, who correct erroneous and missing classifications. This is the Gold Standard used to validate the automatically derived pTNM classification. The pTNM is of prime importance because it determines the stage of cancer in patients.
Figure 2. Analysis of T-Classification on Natural Language text

F. Analysis of Cancer staging process
The derived cancer stage values are analyzed by counting True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN). The evaluation parameters used in the analysis are:
Precision (P) = TP / (TP + FP)
Recall (R) = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F-measure = (2 * Precision * Recall) / (Precision + Recall)
Error Rate = (FP + FN) / (TP + TN + FP + FN)
Figure 3. Analysis of N-Classification on Natural Language text
Figure 4. Analysis of Cancer Staging on Natural Language text
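The evaluation parameters above translate directly into code. Staging is a multi-class task and the paper reports averages without stating how they are formed, so this sketch assumes one common choice: each stage value is scored one-vs-rest against the Gold Standard and the per-stage parameters are macro-averaged. The `Counts`, `evaluation_parameters`, `one_vs_rest` and `macro_average` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Counts:
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0  # avoid division by zero for empty classes


def evaluation_parameters(c: Counts) -> Dict[str, float]:
    """The six evaluation parameters, computed from raw counts."""
    total = c.tp + c.tn + c.fp + c.fn
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "error_rate": _ratio(c.fp + c.fn, total),
    }


def one_vs_rest(gold: Sequence[str], predicted: Sequence[str], label: str) -> Counts:
    """Treat one stage value as positive and every other value as negative."""
    pairs = list(zip(gold, predicted))
    return Counts(
        tp=sum(g == label and p == label for g, p in pairs),
        tn=sum(g != label and p != label for g, p in pairs),
        fp=sum(g != label and p == label for g, p in pairs),
        fn=sum(g == label and p != label for g, p in pairs),
    )


def macro_average(gold: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Unweighted mean of each parameter over all stage values seen."""
    labels = sorted(set(gold) | set(predicted))
    per_label = [evaluation_parameters(one_vs_rest(gold, predicted, lab)) for lab in labels]
    return {key: sum(d[key] for d in per_label) / len(per_label) for key in per_label[0]}
```

With Gold Standard stages `["IIA", "IIA", "IIB"]` and derived stages `["IIA", "IIB", "IIB"]`, `macro_average` gives a precision and recall of 0.75 each. Note that under one-vs-rest counting, Accuracy and Error Rate always sum to 1 for each stage value.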

Cancer staging on natural language text gives an average Precision of 94.72%, an average Recall of 95.94%, an average Accuracy of 92.12% and an average Specificity of 80.96%. The average F-measure is 95.89% and the average Error Rate is 3.94%. These results show that the system performs well in extracting the cancer stage of patients, which can be attributed to the numerous pre-processing steps applied to the text before extraction. The results of cancer staging on SNOMED annotated text in pathology reports are presented in Figures 5 to 7.
Figure 5. Analysis of T-Classification on SNOMED Annotated text
Cancer staging on SNOMED annotated reports gives an average Precision of 95.48%, an average Recall of 100%, an average Accuracy of 97.97% and an average Specificity of 96.27%. The average F-measure is 97.66% and the average Error Rate is 0.04%. As these parameters indicate, cancer staging on SNOMED annotated text yields better results, which can be attributed to the following reasons:
i. The preprocessing steps applied extensively to the medical text homogenize and standardize the text in the reports, cleaning the dataset for efficient processing.
ii. The correctness of the process is ensured by manually collating a Lexicon of breast cancer terms from the pathology reports and using it in the annotation process. The Lexicon was obtained and verified by manual and automated means, which standardized the subset extraction process.
iii. The Lexicon generated by the system is used in SNOMED subset extraction, and the comprehensiveness and completeness of the lexicon terms contribute to effective subset extraction.
iv. The SNOMED subset for cancer consists of about 1% of all SNOMED CT concepts in the database. Extracting a SNOMED subset for the breast cancer domain, instead of using the complete SNOMED database, results in faster and more precise annotation of reports, giving better cancer staging results than on natural language text.
v. The annotation process standardizes every medical term in the report by replacing it with its equivalent term in the SNOMED medical vocabulary.
Figure 6. Analysis of N-Classification on SNOMED Annotated text
Figure 7. Analysis of Cancer staging on SNOMED Annotated text

5. Conclusions
The objective of deriving the stage of cancer from natural language reports and from SNOMED annotated reports was achieved. The use of the standard AJCC protocol for cancer staging, together with a globally accepted medical vocabulary such as SNOMED, yielded better results in the staging process. Natural language text is heterogeneous, but the pre-processing steps bring homogeneity to it. Even so, the lower performance of cancer staging on natural language reports can be attributed to the use of only the Impression section of the report for staging; processing the other sections would improve the results. The accuracy of automated systems in the medical domain, especially in a task as critical as cancer staging, is of vital importance because it informs diagnostic and treatment decisions for patients. This makes it necessary for reports to be annotated and processed for better results, analysis and decision-making. Annotating reports with SNOMED also makes it possible to run numerous queries on any annotated disease dataset to better understand the patient population. The work indicates that, between cancer staging on natural language text and on SNOMED annotated text, the process on annotated text yields better results. As an extension of this work, the annotation process can be applied to reports from other disease domains for further processing and decision-making.

6. Acknowledgement
The authors would like to thank the Department of Pathology, Christian Medical College and Hospital, Vellore for providing the sample data for the study. The authors would also like to acknowledge S. Pradeep Vignesh, MCA student in the Department of Computer Science, Madras Christian College, for his contributions to developing the automated system.

References
[1] Erik Cambria, Bebo White, Jumping NLP Curves: A Review of Natural Language Processing Research, IEEE Computational Intelligence Magazine, pp. 48-57, May 2014.
[2] Buckley JM, Coopey SB, Sharko J, et al. The feasibility of using natural language processing to extract clinical information from breast pathology reports. Journal of Pathology Informatics. 2012;3:23. doi:10.4103/2153-3539.97788.
[3] Xu H, Friedman C. Facilitating research in pathology using natural language processing. AMIA Annual Symp. Proc. 2003:1057.
[4] Schadow G, McDonald CJ. Extracting Structured Information from Free Text Pathology Reports. AMIA Annual Symposium Proceedings, pp. 584-588, 2003.
[5] Nguyen, Moore, Lawley, Hansen, Colquist, Automatic extraction of cancer characteristics from free-text pathology reports for cancer notifications, Stud Health Technol Inform. 2011;168:117-24.
[6] Anni Coden et al., Automatically extracting cancer disease characteristics from pathology reports into a Disease Knowledge Representation Model, Journal of Biomedical Informatics 42, pp. 937-949, Elsevier, 2009.
[7] Napolitano G, Fox C, Middleton R, Connolly D. Pattern-based information extraction from pathology reports for cancer registration. Cancer Causes Control. 2010 Nov;21(11):1887-94. doi:10.1007/s10552-010-9616-4. Epub 2010 Jul 23.
[8] David Martinez, Yue Li, Information Extraction from Pathology reports in a hospital setting, Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 1877-1882, 2010.
[9] Nelson HD, Weerasinghe R, Martel M, Bifulco C, Assur T, Elmore JG, et al. Development of an electronic breast pathology database in a community health system. J Pathol Inform 2014;5:26.
[10] McCowan I, Moore D, Nguyen AN, Bowman RV, Clarke BE, Duhig EE, et al. Application of Information Technology: Collection of Cancer Stage Data by Classifying Free-text Medical Reports. JAMIA. 2007;14(6):736-745.
[11] AJCC Cancer Staging Manual. 7th ed. New York, NY: Springer, pp. 347-76, 2010.
[12] G.S. Dunham, M.G. Pacak, A.W. Pratt, Automatic indexing of Pathology data, Journal of the American Society for Information Science, 29(2):81-90, Mar. 1978.
[13] David A. Hanauer et al., The registry case finding engine: an automated tool to identify cancer cases from unstructured, free-text pathology reports and clinical notes, Journal of the American College of Surgeons, 205(5), pp. 690-697, Nov. 2007.
[14] Anthony N Nguyen et al., Symbolic rule-based classification of lung cancer stages from free-text pathology reports, Journal of the American Medical Informatics Association (JAMIA), 17:440-445, 2010.
[15] Carlos Rodrigues-Solano, Leonardo Lezcano, Miguel-Angel Sicilia, Information Systems and Technologies for Enhancing Health and Social Care, 2013, p. 15.
[16] Lin C-H, Wu N-Y, Lai W-S, Liou D-M. Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries. Journal of the American Medical Informatics Association: JAMIA. 2015;22(1):132-142. doi:10.1136/amiajnl-2014-002991.
[17] Johanna Johnsi Rani G., Dennis Gladis, Marie Therese Manipadam, Gunadala Ishitha, Breast Cancer Staging using Natural Language Processing, IEEE Conference Publications, pp. 1552-1558, 2015. doi:10.1109/ICACCI.2015.7275834.
[18] Johanna Johnsi Rani G., Dennis Gladis, Joy John Mammen, Lexicon-based and Query-based Autoannotation of Medical Reports using SNOMED, Proceedings of the International Conference on Computing Paradigms (ICCP), 2017.
[19] Johanna Johnsi Rani G., Dennis Gladis, Joy John Mammen, SNOMED Subset Extraction for Annotation of Breast Cancer Pathology Reports, Proceedings of National Conference on ICT Solutions for Challenges and Issues in e-health (NCICTEH'17), 2017.
[20] K. Srikar, M. Akhil, V. Krishna Reddy, Execution of Cloud Scheduling Algorithms, International Innovative Research Journal of Engineering and Technology, vol. 02, no. 04, pp. 108-111, 2017.
