A Study of Abbreviations in Clinical Notes
Hua Xu, MS, MA 1, Peter D. Stetson, MD, MA 1,2, Carol Friedman, PhD 1

1 Department of Biomedical Informatics, Columbia University, New York, NY, USA
2 Department of Medicine, Columbia University, New York, NY, USA

Various natural language processing (NLP) systems have been developed to unlock patient information from narrative clinical notes in order to support knowledge-based applications such as error detection, surveillance and decision support. In many clinical notes, abbreviations are widely used without mention of their definitions, which is very different from the use of abbreviations in the biomedical literature. Thus, it is critical, but more challenging, for NLP systems to correctly interpret abbreviations in these notes. In this paper we describe a study of a two-step model for building a clinical abbreviation database: first, abbreviations in a text corpus were detected, and then a sense inventory was built for those that were found. Four detection methods were developed and evaluated. Results showed that the best detection method had a precision of 91.4% and recall of 80.3%. A simple method was used to build sense inventories from two different knowledge sources: the Unified Medical Language System (UMLS) and a MEDLINE abbreviation database (ADAM). Evaluation showed the inventory from the UMLS appeared to be the more appropriate of the two for defining the sense of abbreviations, but was not ideal. It covered 35% of the senses and had an ambiguity rate of 40% for those that were covered. Annotation by domain experts therefore appears necessary for uncovered abbreviations and to determine the correct senses.

INTRODUCTION

Natural language processing (NLP) systems [1-3] have been developed in the clinical domain to unlock clinical information from free text.
Information retrieved by these systems was used for various knowledge-based applications, such as decision support systems, and was shown to improve the quality of health care. Studies have shown that different types of clinical notes pose different challenges for NLP systems. Long [4] discussed several issues in parsing free text nursing notes, including tokenization, recognition of special forms, determining the meaning of abbreviations, and spelling correction. Stetson et al. [5] performed a study of clinical Signout notes and found that they contain more abbreviations and have denser content than ambulatory and discharge notes. In general, the wide usage of abbreviations seems to be a common problem for most types of clinical notes.

In the biomedical literature, abbreviations usually occur together with their expanded forms at least once in the document, typically in the format "short form (long form)", e.g. CABG (coronary artery bypass graft). Various approaches have been developed to map abbreviation-definition patterns, which were then applied to MEDLINE abstracts to build databases [6-8] that contain the detected abbreviations along with their possible senses (also called a sense inventory). In most clinical reports, such as admission notes, abbreviations usually do not occur along with their expanded forms, which makes the task of identification more difficult. Therefore, approaches based on abbreviation-definition patterns that work for the literature are not applicable to the patient record. Abbreviations are also highly ambiguous, e.g. RA could be right atrium or rheumatoid arthritis. Liu and colleagues [9] reported that 33.1% of abbreviations found in the UMLS were ambiguous.

In this paper, we describe an initial study of abbreviations in inpatient admission notes. We developed and evaluated four different methods that automatically detect abbreviations in the notes.
To the best of our knowledge, this is the first time machine learning techniques have been developed for detecting abbreviations in clinical notes. We also studied the coverage of available biomedical knowledge sources for the abbreviations that were detected in order to explore the adequacy of those knowledge sources for building sense inventories. Although we applied our methods to admission notes, they are generalizable and could be applied to other types of notes as well.

BACKGROUND

Detection of abbreviations in unrestricted text is a challenging task. In the domain of general English, different methods have been developed for detecting them. Park and Byrd [10] defined a set of manually created rules for abbreviation detection. Toole [11] described a decision tree based method to identify abbreviations from words that were not recognized by his NLP system. He reported a precision of 91.1% on free text from an Air Safety Reporting System database. For abbreviations containing a period, the process of identifying them involves solving the sentence boundary problem as well (e.g. "Dr." should not be considered the end of a sentence).

AMIA 2007 Symposium Proceedings Page - 821

In the biomedical domain, there are several knowledge sources available that contain abbreviations and their possible senses. The UMLS [12], which combines many biomedical vocabularies, is a comprehensive source for medical terms, including clinical abbreviations. The Metathesaurus file MRCONSO.RRF contains information associated with medical terms, such as the concept unique identifier (CUI) and source vocabulary. Therefore, it can be used as a source of medical terms, including abbreviations. Additionally, Liu et al. [9] reported on a method to extract abbreviations embedded in UMLS terms. For example, the abbreviation CAD is extracted from the UMLS term "CAD - Coronary artery disease". ADAM [8] is another source of knowledge concerning abbreviations, and contains acronym and non-acronym abbreviations. Different definitions of the abbreviations were extracted automatically from MEDLINE titles and abstracts based on short form / long form patterns.

In the clinical domain, Berman [13] reviewed thousands of clinical abbreviations from pathology reports and classified them based on the relationship between the abbreviations and their corresponding long forms, which is useful for the implementation of abbreviation detection and expansion algorithms.

METHODS

In this study, we developed four methods to automatically detect abbreviations in clinical notes, and evaluated their performance using a manually annotated test set. We also used the test set to evaluate the coverage of sense inventories generated from the UMLS and ADAM.
Data Set

The New York Presbyterian Hospital (NYPH) Clinical Data Repository (CDR) is a central pool of clinical data, which includes narrative data consisting of different types of clinical reports. In 2004, NYPH implemented a new Physician Data Entry system called enote [14], which allows physicians to directly key in various types of notes, such as hospital admission notes and discharge summaries. For this study, we collected all the admission notes from the internal medicine service, which were entered via the enote system during , amounting to 16,949 notes. All the admission notes were then tokenized in order to separate the entire text into a sequence of individual strings or tokens. A simple heuristic rule-based tokenizer was developed based on observation of a few admission notes in the data set.

Ten admission notes were randomly selected from the collection, and a hospitalist attending physician from the internal medicine service manually reviewed the selected notes, listed all the abbreviations they contained, and specified their full forms. The set was then randomly partitioned. Six annotated notes were used as a training set for the methods. The remaining four annotated notes, which served as a reference standard in the study, were used as a test set. In order to evaluate the coverage of available knowledge sources for clinical abbreviations, 100 different abbreviations, with their expanded forms based on the annotations, were randomly selected from the test set and used to evaluate the coverage of the two available knowledge sources.

Abbreviations Detection

We developed the following four methods for detecting abbreviations in clinical notes.

The first is a simple method, which is used to measure the baseline performance of the abbreviation detection task. This method selects all unknown tokens in the text as abbreviations. To determine whether a token is unknown, we used two word lists, and labeled any word not in the two lists as unknown (i.e. an abbreviation). The first list is an English word list (Knuth's list of 110,573 American English words [15]), which also contains morphological variants of normal English words. The second list is a medical term list consisting of 9,721 words, which was obtained from two of the lexical files of an NLP system called MedLEE [3] (Medical Language Extraction and Encoding System). These files contained single medical words and single-word drug names commonly found in clinical reports. To improve the performance of this baseline detection method, known medical abbreviations in the MedLEE lexicon were eliminated from this list by manual review so that they would be considered unknown.

The second method is a heuristic rule-based program developed by observing several admission notes. It utilizes information concerning word formation, such as capital letters, numeric and alphabetic characters and their combinations, together with the above two word lists of English and medical terms. If a word meets one of the following criteria, it is considered an abbreviation:
1) the word contains special characters such as "-" and ".";
2) the word contains fewer than 6 characters and one of the following:
   a) a mixture of numeric and alphabetic characters;
   b) capital letter(s), but not when only the first letter is uppercase following a period;
   c) lower case letters, where the word is not in the English or medical list.

For the third and fourth methods, we trained a decision tree classifier using the training set. We used the J48 decision tree in Weka, which is an implementation of the C4.5 decision tree generating algorithm. Method 3, the decision tree 1 (DT1) method, used features concerning word formation and a feature from the corpus associated with frequency, as described below. Method 4, the decision tree 2 (DT2) method, used the same features as method 3 but also used features derived from outside knowledge sources. Features concerning word formation include 1) special characters such as "-" and "."; 2) alphabetic/numeric characters and their combination; 3) information about upper case and positions in the word; 4) length of the word. The feature derived from the corpus is the average document frequency of a word, which is defined as the total number of occurrences of the word divided by the number of notes in the corpus. The features derived from outside knowledge sources, consisting of the English and medical term lists, indicate whether a word is an English word and whether it is a known medical term.

To evaluate performance, the test set was processed by each of the abbreviation detection methods, and a list of predicted abbreviations was generated for each: Baseline, Rule-Based, DT1 (decision tree method based only on word information and frequency) and DT2 (decision tree method based on word information, frequency and external knowledge). The automatically generated abbreviations were compared to the reference standard, and precision and recall were reported for each method.
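The baseline and rule-based detectors, and their comparison against a reference standard, can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that the text does not state: all function names, the toy word lists and the toy token sequence are hypothetical; word-list lookup is taken to be case-insensitive; pure punctuation tokens are never counted as abbreviations; and a capitalized sentence-initial word falls through to the word-list check. Precision and recall are computed over token positions, as the share of predicted abbreviations that are correct and the share of reference abbreviations that are found.

```python
def is_word(token):
    """True if the token has a letter or digit (assumption: pure
    punctuation tokens are never abbreviations)."""
    return any(ch.isalnum() for ch in token)


def is_known(token, english_words, medical_terms):
    """Case-insensitive lookup in the two word lists (an assumption)."""
    lowered = token.lower()
    return lowered in english_words or lowered in medical_terms


def baseline_detect(token, english_words, medical_terms):
    """Baseline: any word found in neither list is called an abbreviation."""
    return is_word(token) and not is_known(token, english_words, medical_terms)


def rule_based_detect(token, prev_token, english_words, medical_terms):
    """Heuristic rules on word formation combined with the two word lists."""
    if not is_word(token):
        return False
    # Rule 1: the word contains a special character such as "-" or ".".
    if "-" in token or "." in token:
        return True
    # Rule 2 only applies to words with fewer than 6 characters.
    if len(token) >= 6:
        return False
    has_digit = any(ch.isdigit() for ch in token)
    has_alpha = any(ch.isalpha() for ch in token)
    if has_digit and has_alpha:  # 2a: mixture of digits and letters
        return True
    if any(ch.isupper() for ch in token):  # 2b: capital letter(s)
        sentence_initial = (
            prev_token in (None, ".")
            and not any(ch.isupper() for ch in token[1:])
        )
        if not sentence_initial:
            return True
    # 2c: alphabetic word in neither list (also reached by capitalized
    # sentence-initial words, which is an assumption of this sketch).
    return token.isalpha() and not is_known(token, english_words, medical_terms)


def detect(tokens, method, english_words, medical_terms):
    """Return the set of token positions predicted to be abbreviations."""
    predicted = set()
    for i, token in enumerate(tokens):
        prev_token = tokens[i - 1] if i > 0 else None
        if method == "baseline":
            hit = baseline_detect(token, english_words, medical_terms)
        else:
            hit = rule_based_detect(token, prev_token, english_words, medical_terms)
        if hit:
            predicted.add(i)
    return predicted


def precision_recall(predicted, reference):
    """Both arguments are sets of token positions marked as abbreviations."""
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy data (hypothetical, not taken from the study corpus).
ENGLISH = {"with", "chest", "pain", "showed", "ct"}
MEDICAL = {"pneumonia"}
TOKENS = ["Pt", "with", "chest", "pain", ".", "CT", "showed", "pneumonia",
          ",", "prob", "uri", ",", "givn", "etoh", "."]
REFERENCE = {0, 5, 9, 10, 13}  # positions of Pt, CT, prob, uri, etoh

for method in ("baseline", "rule-based"):
    predicted = detect(TOKENS, method, ENGLISH, MEDICAL)
    precision, recall = precision_recall(predicted, REFERENCE)
    print(f"{method}: precision={precision:.3f} recall={recall:.3f}")
```

In this toy sequence the baseline misses "CT" because "ct" appears in the (hypothetical) English list, and both detectors flag the misspelling "givn", which is the kind of error that word lists and surface rules alone cannot avoid.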
Precision is defined as the ratio between the number of correctly predicted abbreviations and the number of all abbreviations predicted by the automated method. Recall is defined as the ratio between the number of abbreviations correctly predicted by the automated method and the number of all abbreviations in the reference standard.

Sense Inventory Study

After detecting the abbreviations, a sense inventory was created for them. We created three sense inventories from two available knowledge sources, the UMLS and the ADAM abbreviation database, and evaluated the coverage of the generated sense inventories. We used the UMLS 2006AB to generate the UMLS sense inventory. It was obtained by 1) using all the terms in the Metathesaurus file MRCONSO.RRF and 2) deriving abbreviations from UMLS terms as described in Liu et al. [9]. All terms in the UMLS were normalized to lower case. CUIs which had a corresponding term that matched the abbreviation were considered possible senses for that abbreviation. The sense inventory from the UMLS contained the CUIs and their corresponding preferred strings for the abbreviations found in the UMLS. For example, the UMLS sense inventory for the abbreviation ESR would be C : Erythrocyte Sedimentation Rate and C : Electron Spin Resonance Spectroscopy. Similarly, we also obtained a sense inventory of the abbreviations from the ADAM abbreviation database. For example, we obtained forms such as electron spin resonance, erythrocyte sedimentation rate and estrogen receptor for the abbreviation ESR.

For an abbreviation, we studied two types of coverage: 1) abbreviation coverage, to determine whether the sense inventory contains an entry for the abbreviated term, but not necessarily for the correct sense; 2) sense coverage, to determine whether the sense inventory contains the correct sense of the abbreviation as determined by the expert.
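The inventory construction and the two coverage measures defined above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the `(cui, term, preferred_name)` row layout, the placeholder CUIs and the toy annotations are hypothetical, the embedded-abbreviation step only handles the single "ABBR - long form" pattern shown in the CAD example rather than the full method of Liu et al., and the correct sense is matched by exact lower-cased string comparison, whereas the study relied on a human evaluator.

```python
import re
from collections import defaultdict

# Matches terms shaped like "CAD - Coronary artery disease".
EMBEDDED = re.compile(r"^(\S+) - (.+)$")


def build_inventory(rows, derive_embedded=False):
    """Map each lower-cased term to {CUI: preferred name}.

    rows: iterable of (cui, term, preferred_name) triples, assumed to have
    been read from a terminology file beforehand.
    derive_embedded: also index the short form of "ABBR - long form" terms.
    """
    inventory = defaultdict(dict)
    for cui, term, preferred in rows:
        inventory[term.lower()][cui] = preferred
        match = EMBEDDED.match(term)
        if derive_embedded and match:
            inventory[match.group(1).lower()][cui] = preferred
    return inventory


def coverage(annotations, inventory):
    """Return (abbreviation coverage, sense coverage) as fractions.

    annotations: list of (abbreviation, correct sense) pairs from the expert.
    """
    abbr_hits = sense_hits = 0
    for abbr, correct_sense in annotations:
        senses = inventory.get(abbr.lower())
        if not senses:
            continue  # the inventory has no entry for this abbreviation
        abbr_hits += 1
        if correct_sense.lower() in {name.lower() for name in senses.values()}:
            sense_hits += 1
    total = len(annotations)
    return abbr_hits / total, sense_hits / total


# Toy rows with placeholder CUIs (hypothetical, not real UMLS identifiers).
ROWS = [
    ("CUI_1", "ESR", "Erythrocyte Sedimentation Rate"),
    ("CUI_2", "ESR", "Electron Spin Resonance Spectroscopy"),
    ("CUI_3", "CAD - Coronary artery disease", "Coronary artery disease"),
    ("CUI_4", "RA", "Rheumatoid arthritis"),
]
# Toy expert annotations (hypothetical).
ANNOTATIONS = [
    ("ESR", "erythrocyte sedimentation rate"),
    ("CAD", "coronary artery disease"),
    ("RA", "right atrium"),
    ("2/2", "secondary to"),
]

direct = coverage(ANNOTATIONS, build_inventory(ROWS))
with_derived = coverage(ANNOTATIONS, build_inventory(ROWS, derive_embedded=True))
print("terms only:          abbr=%.2f sense=%.2f" % direct)
print("terms + derived abbr: abbr=%.2f sense=%.2f" % with_derived)
```

The toy data shows why the two measures can diverge: "RA" has an entry but not the sense the annotator intended, and "2/2" has no entry at all, while deriving "CAD" from its embedded term raises both measures.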
We computed both abbreviation and sense coverage for three sense inventories: 1) UMLS, built from the UMLS directly; 2) UMLS+Abbr, which includes the UMLS and abbreviated terms derived from the UMLS; and 3) ADAM, built from the ADAM database. One evaluator was used for this study, and was shown 1) the abbreviation, 2) the corresponding sense in the clinical note as determined by the expert when annotating the note, and 3) possible senses from each of the three inventories; if an inventory did not have an entry for the abbreviation, that field was left blank. The evaluator then determined which inventories contained an entry for the abbreviated term and which inventories contained the correct sense for each abbreviation. If a sense inventory covered the clinical abbreviation, we also noted whether the abbreviation was ambiguous in that inventory, signifying that the abbreviation mapped to more than one expanded sense.

RESULTS

Based on the expert annotations, the training set for the decision tree contained 3007 tokens, of which 415 were punctuation tokens and 418 were abbreviations; the test set contained 2611 tokens, of which 363 were punctuation tokens and 411 were abbreviations. Table 1 shows the precision and recall for each abbreviation detection method. Both the rule-based and decision tree-based methods achieved better precision and recall than the baseline method. Among them, the DT2 method reached the highest precision of 91.4% while retaining a recall of 80.3%.

Table 1. Results of abbreviation detection.

Method        Precision          Recall
Baseline      286/387 = 73.9%    286/411 = 69.6%
Rule-Based    345/404 = 85.4%    345/411 = 83.9%
DT1           294/336 = 87.5%    294/411 = 71.5%
DT2           330/361 = 91.4%    330/411 = 80.3%

Table 2 shows the abbreviation coverage, sense coverage, and ambiguity rate of the three different sense inventories. Note that although the abbreviation coverage was greater than 50% for all inventories, the sense coverage was lower. ADAM had the highest sense coverage of 38.0%, but 71.1% (27/38) of the abbreviations whose sense was covered were ambiguous. UMLS+Abbr had a slightly lower sense coverage (35.0%) than ADAM, but with a much lower ambiguity rate of 40.0%.

Table 2. Results of sense coverage and ambiguity study.

Sense Resource    % of Abbr. Coverage    % of Sense Coverage    % of Ambiguity if Covered
UMLS              56.0% (56/100)         24.0% (24/100)         33.3% (8/24)
UMLS+Abbr         67.0% (67/100)         35.0% (35/100)         40.0% (14/35)
ADAM              66.0% (66/100)         38.0% (38/100)         71.1% (27/38)

DISCUSSION

When analyzing the abbreviations, we observed that most were either acronyms or shortened words, but also observed a few other ways in which they were formed. Table 3 shows a summary of the different types of abbreviation formation along with examples and an estimate of the frequencies based on the coverage test set of 100 abbreviations. As shown in Table 3, acronyms usually are associated with multi-word phrases, and are formed by taking the first letter of each word in a phrase. Another type is a shortened form, which usually is a substring of a long word, but not always.
Contraction is another type of abbreviation, which consists of an abbreviated contraction of multiple words with a separator (usually "/") between each word. We also noted the semantic classes of the abbreviations and found that disease/symptom occurs most frequently (33%), followed by procedure (11%) and lab test (11%).

Table 3. Different types of abbreviations.

Abbr. Type         Examples                                  Frequency
Acronym            BP - Blood Pressure                       50.0%
Shortened Words    Pt - Patient; Sx - Symptoms               32.0%
Contraction        t/d/a - tobacco, drugs or alcohol;        9.0%
                   2/2 - secondary to
Others             etoh - alcohol                            9.0%

We performed an error analysis of the abbreviation detection methods and noted several issues. First, not surprisingly, the tokenization step substantially affected the precision and recall of all detection methods. For example, "S. Aureus" (Staphylococcus aureus) was broken into three tokens, "S", "." and "Aureus", by our simple tokenizer. In the predicted abbreviation list, both "S" and "Aureus" were detected individually but not "S. Aureus", which should have been considered a single token. Similar problems were also observed when a single token contains a space (e.g. "ex tol" represents the abbreviation for exercise tolerance). A number of studies concerning tokenization have discussed methods that handle tokenization problems such as ambiguous periods. Developing a sophisticated tokenizer for clinical notes is outside the scope of this study, but a tokenizer specifically developed for clinical notes would improve the performance of abbreviation detection systems.

The baseline method is simple, but its performance was not good. Many terms that were not in the two word lists were classified as abbreviations although they were not, which accounted for most of the loss in precision. A significant cause of error, which lowered recall, was that many abbreviations were actually in the English word list and therefore were not considered abbreviations by the method. For example, cc (e.g. chief complaint) and CT (e.g. cat scan) were included in the English word list.
If we manually reviewed the English word list and removed abbreviations, recall of the baseline would be better, but this would be a time-consuming task.

Compared to the baseline, the rule-based method achieved substantially better performance (precision 85.4% vs. 73.9%; recall 83.9% vs. 69.6%) because it included some simple rules about word formation. As expected, both decision tree methods had better precision than the rule-based method, since the rules generated by the decision tree algorithm were optimized based on the training set. DT2, which used external knowledge about words, performed better than DT1, although its recall (80.3%) was a little lower than that of the rule-based method (83.9%). We noticed that some abbreviations that were all lower case, such as prob and uri, were not captured by either of the decision tree methods. This may be related to the small size of the training set and to the uneven distribution of positive and negative samples in the training set. Since we used only six notes for training, we anticipate that there will be a larger performance gain when more training data is used. The test set is relatively small too, and we plan to use a larger one in the future. Another interesting observation is that the decision tree method successfully excluded most misspelled words (e.g. "givn" for "given"), while the rule-based method did not. After looking at the decision tree that was generated, we believe that this performance gain was mostly from the frequency feature. Another advantage of the decision tree method is that we can modify the weight of the tree to maximize either precision or recall based on our application.

The sense inventory from the UMLS together with the derived abbreviations had slightly lower sense coverage but a much lower ambiguity rate than the sense inventory from the MEDLINE-derived database (ADAM), which indicates it would be a more appropriate source for defining the sense of abbreviations. However, none of the generated sense inventories had adequate coverage, due to the large number of unusual abbreviations (e.g. 2/2 meaning "secondary to"). Therefore, a clinical expert is likely to be required to determine the appropriate interpretation, which could vary depending on the note type and clinical domain.
Future work might include 1) performing a study to estimate the appropriate size of the training set for the decision tree based detection method so that it will achieve higher performance, 2) developing a method to automatically identify senses of an abbreviation from existing knowledge sources, 3) studying how to facilitate the building of a sense inventory for abbreviations that are not covered by available knowledge sources and 4) developing better tokenizers.

CONCLUSIONS

In this paper, we developed and evaluated several methods for detecting abbreviations from hospital admission notes, and compared their relative performance. Among them, the decision tree method with external knowledge reached the highest precision of 91.4% with a reasonable recall of 80.3%. Sense inventories were generated from different knowledge sources via a simple method. Evaluation of the coverage of the generated sense inventories showed that coverage of abbreviations could be up to 67%, but at best only 38% of their senses were covered. The sense inventory from the UMLS with derived abbreviations, which covers 35% of abbreviation senses and has an ambiguity rate of 40% for covered abbreviations, may help build a sense inventory automatically, but annotation by domain experts seems to be necessary for uncovered abbreviations.

Acknowledgement

This study was supported by grants LM007659, LM and K22LM from the NLM and NSF-IIS from NSF.

Reference:
1. Haug PJ et al. A natural language parsing system for encoding admitting diagnoses. AMIA 1997.
2. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. AMIA 2001.
3. Friedman C et al. A general natural language text processor for clinical radiology. JAMIA. 1994;1.
4. Long WJ. Parsing free text nursing notes. AMIA 2003.
5. Stetson PD, Johnson SB, Scotch M, Hripcsak G. The sublanguage of cross-coverage. AMIA 2002.
6. Chang JT, Schutze H, Altman RB. Creating an online dictionary of abbreviations from MEDLINE. JAMIA. 2001;9.
7. Adar E. SaRAD: a Simple and Robust Abbreviation Dictionary. Bioinformatics. 2004;20.
8. Zhou W, Torvik VI, Smalheiser NR. ADAM: another database of abbreviations in MEDLINE. Bioinformatics. 2006 Nov 15;22(22).
9. Liu H, Lussier YA, Friedman C. A study of abbreviations in the UMLS. AMIA 2001.
10. Park Y, Byrd R. Hybrid text mining for finding abbreviations and their definitions. In Proc. EMNLP.
11. Toole J. A hybrid approach to the identification and expansion of abbreviations. In RIAO.
12. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research. 2004;32:D267-D
13. Berman JJ. Pathology abbreviated: a long review of short terms. Arch Pathol Lab Med. Mar;128(3).
14. Stetson PD et al. Electronic discharge summaries. AMIA 2005.
16. Witten I, Frank E. Data Mining: Practical machine learning tools and techniques, 2nd Edition. Morgan Kaufmann, San Fran.
17. Mikheev A. Document centered approach to text normalization. In Proc. SIGIR.
18. Kiss T, Strunk J. Scaled log likelihood ratios for the detection of abbreviations in text corpora. ACL 2002:1-5.


More information

TeamHCMUS: Analysis of Clinical Text

TeamHCMUS: Analysis of Clinical Text TeamHCMUS: Analysis of Clinical Text Nghia Huynh Faculty of Information Technology University of Science, Ho Chi Minh City, Vietnam huynhnghiavn@gmail.com Quoc Ho Faculty of Information Technology University

More information

Retrieving disorders and findings: Results using SNOMED CT and NegEx adapted for Swedish

Retrieving disorders and findings: Results using SNOMED CT and NegEx adapted for Swedish Retrieving disorders and findings: Results using SNOMED CT and NegEx adapted for Swedish Maria Skeppstedt 1,HerculesDalianis 1,andGunnarHNilsson 2 1 Department of Computer and Systems Sciences (DSV)/Stockholm

More information

COMPARISON OF BREAST CANCER STAGING IN NATURAL LANGUAGE TEXT AND SNOMED ANNOTATED TEXT

COMPARISON OF BREAST CANCER STAGING IN NATURAL LANGUAGE TEXT AND SNOMED ANNOTATED TEXT Volume 116 No. 21 2017, 243-249 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu COMPARISON OF BREAST CANCER STAGING IN NATURAL LANGUAGE TEXT AND SNOMED

More information

Query Refinement: Negation Detection and Proximity Learning Georgetown at TREC 2014 Clinical Decision Support Track

Query Refinement: Negation Detection and Proximity Learning Georgetown at TREC 2014 Clinical Decision Support Track Query Refinement: Negation Detection and Proximity Learning Georgetown at TREC 2014 Clinical Decision Support Track Christopher Wing and Hui Yang Department of Computer Science, Georgetown University,

More information

An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation

An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation 1,2,3 EMR and Intelligent Expert System Engineering Research Center of

More information

READ-BIOMED-SS: ADVERSE DRUG REACTION CLASSIFICATION OF MICROBLOGS USING EMOTIONAL AND CONCEPTUAL ENRICHMENT

READ-BIOMED-SS: ADVERSE DRUG REACTION CLASSIFICATION OF MICROBLOGS USING EMOTIONAL AND CONCEPTUAL ENRICHMENT READ-BIOMED-SS: ADVERSE DRUG REACTION CLASSIFICATION OF MICROBLOGS USING EMOTIONAL AND CONCEPTUAL ENRICHMENT BAHADORREZA OFOGHI 1, SAMIN SIDDIQUI 1, and KARIN VERSPOOR 1,2 1 Department of Computing and

More information
