Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Abstracts

jsci2016
Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Abstracts
Wutthipong Kongburan, Praisan Padungweang, Worarat Krathu, Jonathan H. Chan
School of Information Technology, King Mongkut's University of Technology Thonburi

Agenda: INTRODUCTION; PROPOSED METHOD; RESULT AND DISCUSSION; CONCLUSION AND FUTURE WORK

INTRODUCTION: Overview of text mining and NER; Thyroid Cancer and Medical Intervention; Rationale and Motivation; Research questions

Thyroid cancer
The cells of the thyroid gland become abnormal, grow uncontrollably, and form a mass of cells called a tumor.
Thyroid neoplasms represent almost 95% of all endocrine tumors.
Women are three times more likely than men to develop thyroid cancer.
In Bangkok, Thailand, the number of new cases of thyroid cancer is 6 per 100,000 men and women per year.
It is now ranked as the 8th most common cancer in women in the United States, with a 4% annual increase.

Medical Intervention
Any object or act of treatment or preventive care that intervenes, interferes, or intercedes with the intent of modifying the outcome.
Typical treatments come with side effects.
Medical interventions, including alternative treatments, that increase patient satisfaction lead to better health outcomes.

Thyroid cancer publications in PubMed [Figure: #Publications (y-axis, 1000 to 3500) by Years (x-axis, 1995 to 2015)]

Overview of Biomedical Text mining

Named Entity Recognition (NER): classify tokens into predefined categories such as person names, organizations, diseases, and proteins.

Rationale and Motivation [Diagram: Training data -> ML-based text mining -> Classifier/Model]

Research questions
How can we construct a thyroid cancer intervention corpus in a reasonable time?
What are the criteria for measuring the performance of a corpus?

PROPOSED METHOD: A construction workflow of corpus; Basic definitions of entities; Evaluation of corpora (10-fold cross validation, leave-one(intervention)-out cross validation)

A construction workflow of thyroid cancer intervention corpus
Online database -> Information Retrieval -> Relevant documents (Title, Text, PubMed ID)
PubMed query: (((((((((((thyroid neoplasms) OR thyroid cancer) OR thyroid carcinoma) OR thyroid malignancies) OR papillary) OR follicular) OR medullary) OR anaplastic) AND treatment) AND hasabstract[text] AND Humans[Mesh] AND English[lang]))
The query limits the results to abstracts of human studies in English. Results are sorted by relevance.
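The retrieval step can be sketched in Python. This is only an illustration: the slides do not say how the query was submitted, so the use of the NCBI E-utilities ESearch endpoint, the helper names `build_query` and `build_esearch_url`, and the paging values are assumptions. The sketch flattens the nested parentheses of the query into one OR group, which is logically equivalent, and only builds the request URL without sending it.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed access path, not stated in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Terms and filters copied from the PubMed query on the slide.
DISEASE_TERMS = [
    "thyroid neoplasms", "thyroid cancer", "thyroid carcinoma",
    "thyroid malignancies", "papillary", "follicular", "medullary", "anaplastic",
]
FILTERS = ["hasabstract[text]", "Humans[Mesh]", "English[lang]"]


def build_query(disease_terms, filters):
    """OR the disease terms together, then AND with 'treatment' and the filters."""
    disease_part = " OR ".join(f"({term})" for term in disease_terms)
    return " AND ".join([f"({disease_part})", "treatment", *filters])


def build_esearch_url(query, retmax=30, retstart=0):
    """Build an ESearch URL that asks for one page of relevance-sorted PubMed IDs."""
    params = {
        "db": "pubmed",
        "term": query,
        "sort": "relevance",
        "retmax": retmax,      # page size; 30 mirrors the n used per round
        "retstart": retstart,  # offset of the first result in this page
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Calling `build_esearch_url(build_query(DISEASE_TERMS, FILTERS), retstart=30)` would request the second block of 30 results under these assumptions.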

Relevant documents -> Carefully select 50 -> Preprocessing
The initial number of 50 abstracts is arbitrary; it is limited by what a human annotator can complete in reasonable time.
Preprocessing splits the text into sentences and puts the data in token form so that entity boundaries can be determined.
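A minimal sketch of this preprocessing step, assuming simple regular-expression rules. The slides do not name the sentence splitter or tokenizer that was used, so `split_sentences` and `tokenize` are hypothetical helpers and will mishandle abbreviations and other edge cases.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace and a capital or digit."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Return word tokens (hyphenated words kept whole) and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", sentence)


def preprocess(abstract):
    """Turn one abstract into a list of token lists, one per sentence."""
    return [tokenize(sentence) for sentence in split_sentences(abstract)]
```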

Preprocessing -> Manual annotation (with the annotation guideline) -> 50 manually annotated abstracts
Each token or word is tagged with one of three classes: INTERVENTION, DISEASE, or O. The annotation process follows the annotation guideline.

Basic definitions of entities
INTERVENTION: a treatment or preventive-care object or action that is performed by a doctor or other clinician and targeted at a thyroid cancer patient.
Non-drug therapy: e.g., surgery, radioactive iodine treatment, and external beam radiation therapy.
Drug therapy: e.g., anti-angiogenesis drugs, target inhibitors, chemotherapy, and thyroid hormone replacement therapy.
DISEASE: a disorder in which the cells of the thyroid gland become abnormal, grow uncontrollably, and form a mass of cells called a tumor. It includes:
Common names and tumor names of thyroid cancer: e.g., thyroid cancer, thyroid malignancies, thyroid carcinoma, thyroid tumors.
Group names of thyroid cancer: e.g., differentiated thyroid carcinoma, undifferentiated thyroid cancer.
Subtypes or other forms of thyroid cancer: e.g., follicular cancer, papillary thyroid cancer, Hurthle cell carcinoma, medullary carcinoma, anaplastic carcinoma, insular thyroid carcinoma, thyroid lymphoma, and thyroid sarcoma.
O: any other term that refers to neither INTERVENTION nor DISEASE.
Annotation guideline: http://dlab.sit.kmutt.ac.th/wp-content/uploads/2015/10/tcit-annotation-guideline.pdf

50 manually annotated abstracts -> Train NER classifier -> NER classifier
Stanford NER provides a general implementation of linear-chain Conditional Random Field (CRF) sequence models.
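The annotated tokens have to be serialized before a CRF tagger can be trained on them. The sketch below writes one token and its label per line, separated by a tab, with a blank line after each sentence. This two-column layout is an assumption about the training file, since the slides do not show the file format, and `to_training_lines` is a hypothetical helper.

```python
# The three classes used in the annotation guideline.
LABELS = {"INTERVENTION", "DISEASE", "O"}


def to_training_lines(sentences):
    """Convert sentences of (token, label) pairs into tab-separated training lines."""
    lines = []
    for sentence in sentences:
        for token, label in sentence:
            if label not in LABELS:
                raise ValueError(f"unknown label: {label}")
            lines.append(f"{token}\t{label}")
        lines.append("")  # blank line marks the sentence boundary (assumed convention)
    return lines


def write_training_file(path, sentences):
    """Write the training lines to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(to_training_lines(sentences)) + "\n")
```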

Relevant documents -> Select n first abstracts -> Preprocessing
In each experiment round n equals 30, a value selected arbitrarily for the current experiment.

Preprocessing -> Automatic annotation (by the NER classifier) -> 30 automatically annotated abstracts

30 automatically annotated abstracts -> Contain new intervention? (No: Neglect. Yes: keep, leaving 30 minus the neglected abstracts)
This step keeps only the abstracts that contain an INTERVENTION not already present in the manually annotated datasets.
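The filter can be sketched as a set difference over intervention tokens. This is an assumed reading of the step: the slides do not say whether novelty is judged per token or per phrase, and lowercasing the tokens is an added choice. `intervention_tokens` and `select_new` are hypothetical names.

```python
def intervention_tokens(tagged_abstract):
    """Collect the lowercased tokens that the classifier tagged as INTERVENTION."""
    return {token.lower() for token, label in tagged_abstract if label == "INTERVENTION"}


def select_new(candidates, known_interventions):
    """Keep abstracts with at least one intervention token not seen in the corpus.

    candidates: list of (abstract_id, [(token, label), ...]) pairs.
    known_interventions: set of lowercased intervention tokens already in the corpus.
    """
    selected = []
    for abstract_id, tagged in candidates:
        new_tokens = intervention_tokens(tagged) - known_interventions
        if new_tokens:  # otherwise the abstract is neglected
            selected.append((abstract_id, new_tokens))
    return selected
```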

Manual correction (with the annotation guideline) -> Correct automatically annotated abstracts
Annotators judge the automatically annotated abstracts against the annotation guideline. Correcting automatic annotations takes less effort than annotating manually from scratch.

Construction of a corpus -> Corpus
The result of each round of the incremental model is a corpus. It combines the manually annotated datasets with the automatically tagged, manually corrected datasets.

Corpus -> Train NER classifier (next round)
The process is repeated with the next 30 abstracts until the performance of the new corpus and the current corpus shows no significant difference.
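The loop as a whole can be sketched as follows. The callables `annotate_and_correct` and `evaluate` are hypothetical stand-ins for the annotation, correction, and cross-validation steps, and the `min_gain` threshold is an assumed way of deciding "no significant difference", which the slides do not define. Keeping the last batch when the loop stops is also an assumption.

```python
def build_corpus(seed, batches, annotate_and_correct, evaluate, min_gain=0.005):
    """Grow a corpus round by round until the score stops changing noticeably.

    seed: the manually annotated abstracts.
    batches: iterable of abstract batches (30 per round in the slides).
    annotate_and_correct(corpus, batch): returns the corrected abstracts to add.
    evaluate(corpus): returns a score such as the average F1-score.
    """
    corpus = list(seed)
    score = evaluate(corpus)
    history = [(len(corpus), score)]
    for batch in batches:
        candidate = corpus + list(annotate_and_correct(corpus, batch))
        new_score = evaluate(candidate)
        history.append((len(candidate), new_score))
        changed = abs(new_score - score) >= min_gain
        corpus, score = candidate, new_score
        if not changed:  # stop once a round no longer moves the score
            break
    return corpus, history
```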

Incremental construction

Round | Current corpus size | # unique intervention tokens | New n-first abstracts | # containing new intervention | # after manual correction
1 | 50 | 312 | 1-30 | 11 | 9
2 | 59 | 339 | 31-60 | 9 | 8
3 | 67 | 356 | 61-90 | 6 | 6
4 | 73 | 379 | 91-120 | 12 | 8
5 | 81 | 400 | 121-150 | 10 | 8
6 | 89 | 413 | 151-180 | 10 | 10
7 | 99 | 460 | 181-210 | 14 | 13
8 | 112 | 501 | 211-240 | 13 | 13
9 | 125 | 525 | 241-270 | 11 | 10
10 | 135 | 546 | 271-300 | 10 | 8
11 | 143 | 564 | 301-330 | 12 | 10
12 | 153 | 594 | 331-360 | 11 | 9
13 | 162 | 619 | 361-390 | 10 | 5
14 | 167 | 632 | 391-420 | 11 | 7
15 | 174 | 648 | 421-450 | 9 | 7
16 | 181 | 655 | 451-480 | 8 | 7
17 | 188 | 675 | - | - | -
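The corpus sizes in this table follow a simple recurrence: each round's size is the previous size plus the number of abstracts kept after manual correction. A short check, with the values copied from the table (the function name `sizes_are_consistent` is illustrative):

```python
# Corpus size at the start of rounds 1..17, copied from the table.
SIZES = [50, 59, 67, 73, 81, 89, 99, 112, 125, 135, 143, 153, 162, 167, 174, 181, 188]
# Abstracts kept after manual correction in rounds 1..16.
ADDED = [9, 8, 6, 8, 8, 10, 13, 13, 10, 8, 10, 9, 5, 7, 7, 7]


def sizes_are_consistent(sizes, added):
    """Check that size[r + 1] == size[r] + added[r] for every round r."""
    return all(sizes[r] + added[r] == sizes[r + 1] for r in range(len(added)))


print(sizes_are_consistent(SIZES, ADDED))
print(sum(ADDED), SIZES[-1] - SIZES[0])
```

Over 16 rounds, 480 retrieved abstracts were screened and 138 of them were added to the 50 seed abstracts.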

Performance Evaluation: 10-fold cross validation
Each class is evaluated one against the rest: INTERVENTION vs. DISEASE or O, DISEASE vs. INTERVENTION or O, and O vs. INTERVENTION or DISEASE.
True Positive (TP): a token of the class is predicted as that class.
False Positive (FP): a token of another class is predicted as the class.
False Negative (FN): a token of the class is predicted as another class.
True Negative (TN): a token of another class is predicted as another class.
Precision $P = \frac{TP}{TP + FP}$, Recall $R = \frac{TP}{TP + FN}$, F1-score $F = \frac{2PR}{P + R}$
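A minimal sketch of the one-vs-rest scoring, assuming token-level gold and predicted label sequences of equal length. The function names are illustrative, and returning 0.0 for an undefined ratio is an added convention.

```python
def one_vs_rest_counts(gold, predicted, target):
    """Count TP, FP, FN, TN for one target class over aligned label sequences."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p == target and g == target:
            tp += 1
        elif p == target:
            fp += 1
        elif g == target:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1; undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```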

10-fold cross validation [Figure: folds selected sequentially vs. folds selected randomly]
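The two fold-selection schemes named on this slide can be sketched as index partitions. What unit is being split (abstract or sentence) and how the folds were drawn are not stated, so the functions below are assumptions that only illustrate the difference between contiguous and shuffled folds.

```python
import random


def sequential_folds(n_items, k=10):
    """Partition indices 0..n_items-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def random_folds(n_items, k=10, seed=0):
    """Shuffle the indices with a fixed seed, then deal them into k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]
```

In both cases each fold serves once as the test set while the other nine are used for training.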

RESULT AND DISCUSSION: How many abstracts are suitable for constructing the corpus?

The average performance of three entity classes by using 10-fold cross validation

The change rate of average F1-score of three entity classes

The average performance of the 143-abstract model

Class | Precision | Recall | F1-score
INTERVENTION | 0.923 | 0.666 | 0.768
DISEASE | 0.982 | 0.915 | 0.946
O | 0.974 | 0.996 | 0.984
Average | 0.959 | 0.859 | 0.906

The corpus of 143 abstracts contains 41,657 tokens: 2,473 INTERVENTION (564 unique intervention tokens), 1,996 DISEASE (94 unique disease tokens), and 37,188 O.
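A short arithmetic check of these figures. It suggests, as an inference from the numbers rather than something the slides state, that the reported average F1-score matches the harmonic mean of the average precision and recall, not the mean of the three per-class F1-scores.

```python
# Reported averages and per-class F1-scores for the 143-abstract model.
AVG_PRECISION, AVG_RECALL = 0.959, 0.859
CLASS_F1 = {"INTERVENTION": 0.768, "DISEASE": 0.946, "O": 0.984}
# Token counts per class.
TOKENS = {"INTERVENTION": 2473, "DISEASE": 1996, "O": 37188}

# F1 computed from the averaged precision and recall.
f1_of_averages = 2 * AVG_PRECISION * AVG_RECALL / (AVG_PRECISION + AVG_RECALL)
# Plain mean of the three per-class F1-scores.
mean_of_f1 = sum(CLASS_F1.values()) / len(CLASS_F1)

print(round(f1_of_averages, 3))  # agrees with the reported 0.906
print(round(mean_of_f1, 3))      # 0.899, lower than the reported average
print(sum(TOKENS.values()))      # 41657, the stated token total
```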

Leave-one(intervention)-out cross validation
Each left-out keyword is a word that describes a common intervention applied for thyroid cancer, e.g., chemotherapy, radioiodine, radioactive, radiotherapy, drug, vandetanib, cabozantinib, sorafenib, inhibitors, surgery, ablation, resection, dissection, and thyroidectomy.
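One possible reading of this evaluation is sketched below: sentences that mention the left-out keyword are withheld from training, and the model is then scored on whether it still tags that keyword as INTERVENTION. The slides do not describe the exact split or the accuracy definition, so both functions and the `predict` callable are assumptions.

```python
def split_by_keyword(sentences, keyword):
    """Put sentences containing the keyword in the test set, the rest in training."""
    kw = keyword.lower()
    train, test = [], []
    for sentence in sentences:
        contains = any(token.lower() == kw for token, _ in sentence)
        (test if contains else train).append(sentence)
    return train, test


def discovery_accuracy(test_sentences, predict, keyword):
    """Fraction of gold INTERVENTION occurrences of the keyword that are recovered.

    predict(tokens) is a hypothetical tagger returning one label per token.
    """
    kw = keyword.lower()
    hits = total = 0
    for sentence in test_sentences:
        tokens = [token for token, _ in sentence]
        for (token, gold), guess in zip(sentence, predict(tokens)):
            if token.lower() == kw and gold == "INTERVENTION":
                total += 1
                hits += guess == "INTERVENTION"
    return hits / total if total else 0.0
```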

The accuracy of discovering new interventions by using leave-one-out cross validation

Result and discussion
Based on the experimental results, the corpus with 143 abstracts is a suitable corpus size because it gives:
An acceptable model in terms of performance metrics.
The ability to discover new interventions (overfitting avoidance).

Confusion matrix of the radioactive keyword in leave-one-out cross validation using the 143-abstract model

Actual class \ Predicted class | O | INTERVENTION | DISEASE
O | 8668 | 56 | 4
INTERVENTION | 191 | 465 | 4
DISEASE | 19 | 1 | 396

False negatives and false positives mainly fall into three groups: mislabeling (e.g., total lobectomy is annotated as O), incomplete entity boundaries (e.g., in central neck node dissection only dissection is annotated as INTERVENTION), and case sensitivity (e.g., Surgery is annotated as O instead of INTERVENTION).
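Per-class precision and recall can be read off this confusion matrix. The sketch assumes that rows are actual classes and columns are predicted classes, which the flattened slide text does not make fully certain. `per_class_metrics` is an illustrative helper.

```python
LABELS = ["O", "INTERVENTION", "DISEASE"]
# Assumed orientation: MATRIX[actual][predicted], values copied from the slide.
MATRIX = [
    [8668, 56, 4],
    [191, 465, 4],
    [19, 1, 396],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} from a square confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual_total = sum(matrix[i])                    # TP + FN
        predicted_total = sum(row[i] for row in matrix)  # TP + FP
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        metrics[label] = (precision, recall)
    return metrics
```

Under this orientation, INTERVENTION has 465 true positives, 57 false positives, and 195 false negatives, so most INTERVENTION errors are tokens missed as O.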

Conclusion and future work
The production of a gold standard corpus requires a significant amount of manual curation work.
Contribution: a semi-automatic semantic annotation approach for constructing a thyroid cancer intervention corpus, together with criteria for identifying reasonable performance.
Future work: improve the performance of the corpus, for example by diversifying the search keywords, removing sentences without annotations, and using a different number of abstracts in the experiment; apply our approach to extract relationships between general biomedical entities.
