KNOWLEDGE-BASED METHOD FOR DETERMINING THE MEANING OF AMBIGUOUS BIOMEDICAL TERMS USING INFORMATION CONTENT MEASURES OF SIMILARITY

Similar documents
FINAL REPORT Measuring Semantic Relatedness using a Medical Taxonomy. Siddharth Patwardhan. August 2003

A Survey on Semantic Similarity Measures between Concepts in Health Domain

(2017) Comparison of Ontology-Based Semantic-Similarity. in the Biomedical Text. Abstract. Keywords. 1. Introduction. Ahmad Fayez S.

A Study of Abbreviations in Clinical Notes Hua Xu MS, MA 1, Peter D. Stetson, MD, MA 1, 2, Carol Friedman Ph.D. 1

Journal of Biomedical Informatics

A Simple Pipeline Application for Identifying and Negating SNOMED CT in Free Text

Challenges and Practical Approaches with Word Sense Disambiguation of Acronyms and Abbreviations in the Clinical Domain

How preferred are preferred terms?

Extracting geographic locations from the literature for virus phylogeography using supervised and distant supervision methods

Biomedical resources for text mining

Building a framework for handling clinical abbreviations a long journey of understanding shortened words "

ONTOLOGY-BASED METHODS FOR DISEASE SIMILARITY ESTIMATION AND DRUG REPOSITIONING. A DISSERTATION IN Computer Science and Mathematics

Automatic Identification & Classification of Surgical Margin Status from Pathology Reports Following Prostate Cancer Surgery

Query Refinement: Negation Detection and Proximity Learning Georgetown at TREC 2014 Clinical Decision Support Track

Ambiguity of Human Gene Symbols in LocusLink and MEDLINE: Creating an Inventory and a Disambiguation Test Collection

Semantic similarity-driven decision support in the skeletal dysplasia domain

A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD)

Back to the Future: Knowledge Light Case Base Cookery

Word Sense Disambiguation Using Sense Signatures Automatically Acquired from a Second Language

. Semi-automatic WordNet Linking using Word Embeddings. Kevin Patel, Diptesh Kanojia and Pushpak Bhattacharyya Presented by: Ritesh Panjwani

CLAMP-Cancer an NLP tool to facilitate cancer research using EHRs Hua Xu, PhD

Evaluating Measures of Redundancy in Clinical Texts

Erasmus MC at CLEF ehealth 2016: Concept Recognition and Coding in French Texts

Pilot Study: Clinical Trial Task Ontology Development. A prototype ontology of common participant-oriented clinical research tasks and

Schema-Driven Relationship Extraction from Unstructured Text

CLINICAL ENCOUNTER INFORMATION FLOW: APPLICATIONS IN EVALUATING MEDICAL DOCUMENTATION TOOLS. Naqi Ali Khan. Thesis. Submitted to the Faculty of the

Analyzing the Semantics of Patient Data to Rank Records of Literature Retrieval

Exploitation of semantic similarity for adaptation of existing terminologies within biomedical area

Automatic Extraction of ICD-O-3 Primary Sites from Cancer Pathology Reports

Elsevier ClinicalKey TM FAQs

Semantic interestingness measures for discovering association rules in the skeletal dysplasia domain

Text mining for lung cancer cases over large patient admission data. David Martinez, Lawrence Cavedon, Zaf Alam, Christopher Bain, Karin Verspoor

Detecting Patient Complexity from Free Text Notes Using a Hybrid AI Approach

A Method for Analyzing Commonalities in Clinical Trial Target Populations

Innovative Risk and Quality Solutions for Value-Based Care. Company Overview

Semantic Similarity between Ontologies at Different Scales

Combining unsupervised and supervised methods for PP attachment disambiguation

Computer based delineation and follow-up multisite abdominal tumors in longitudinal CT studies

Approximating hierarchy-based similarity for WordNet nominal synsets using Topic Signatures

Modeling Sentiment with Ridge Regression

Automatic generation of MedDRA terms groupings using an ontology

Toward a Unified Representation of Findings in Clinical Radiology

Extracting Diagnoses from Discharge Summaries

Automatically extracting, ranking and visually summarizing the treatments for a disease

Automatic Generation of a Qualified Medical Knowledge Graph and its Usage for Retrieving Patient Cohorts from Electronic Medical Records

Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Abstracts

How to code rare diseases with international terminologies?

These Terms Synonym Term Manual Used Relationship

FDA Workshop NLP to Extract Information from Clinical Text

A framework for the study of diseases and adverse drug reactions

Problem-Oriented Patient Record Summary: An Early Report on a Watson Application

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions?

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data

Evaluation of SNOMED Coverage of Veterans Health Administration Terms

General Symptom Extraction from VA Electronic Medical Notes

Natural Language Processing for EHR-Based Computational Phenotyping

Text Mining of Patient Demographics and Diagnoses from Psychiatric Assessments

Automatic Indexing and Retrieving Context-Specific Evidence to Support Physician Decision Making at the Point of Care

Feedback-Driven Radiology Exam Report Retrieval with Semantics

ADAPTING COPYCAT TO CONTEXT-DEPENDENT VISUAL OBJECT RECOGNITION

Headings: Information Extraction. Natural Language Processing. Blogs. Discussion Forums. Named Entity Recognition

A Framework for Public Health Surveillance

A Study on a Comparison of Diagnostic Domain between SNOMED CT and Korea Standard Terminology of Medicine. Mijung Kim 1* Abstract

A Quick-Start Guide for rseqdiff

Results. NeuRA Hypnosis June 2016

Medical information: Where to find it, what to trust. Lewis H. Rowett Executive Editor Annals of Oncology

Curriculum Vitae. Degree and date to be conferred: Masters in Computer Science, 2013.

PhenDisco: a new phenotype discovery system for the database of genotypes and phenotypes

Insights into Analogy Completion from the Biomedical Domain

Mapping the UMLS Semantic Network into General Ontologies

Atigeo at TREC 2012 Medical Records Track: ICD-9 Code Description Injection to Enhance Electronic Medical Record Search Accuracy

NLM at TREC 2012 Medical Records track

Identifying Adverse Drug Events from Patient Social Media: A Case Study for Diabetes

PubMed Tutorial Author: Gökhan Alpaslan DMD,Ph.D. e-vident

Word2Vec and Doc2Vec in Unsupervised Sentiment Analysis of Clinical Discharge Summaries

Identifying anatomical concepts associated with ICD10 diseases

Journal of Biomedical Informatics

Extraction of Adverse Drug Effects from Clinical Records

The Impact of Belief Values on the Identification of Patient Cohorts

Cochrane Breast Cancer Group

A computational framework for discovery of glycoproteomic biomarkers

Seeking Informativeness in Literature Based Discovery

Research proposal. The impact of medical knowledge accumulation and diffusion on health: evidence from Medline and other N.I.H.

Automatic extraction of adverse drug reaction terms from medical free text

Senior Design Project

KOS Design for Healthcare Decision-making Based on Consumer Criteria and User Stories

Characterizing Concepts in Taxonomy for Entity Recommendations

Formulation of a Focused Clinical Question Using PICO University of Mary Department of Physical Therapy Michael G. Parker, PhD, PT, FACSM, Professor

Clay Tablet Connector for hybris. User Guide. Version 1.5.0

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011

EDUCATION/TRAINING (Begin with baccalaureate or other initial professional education, such as nursing, and include postdoctoral training.

Automatic Creation and Refinement of the Clusters of Pharmacovigilance Terms

Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion

Predicting About-to-Eat Moments for Just-in-Time Eating Intervention

Fusing Generic Objectness and Visual Saliency for Salient Object Detection

READ-BIOMED-SS: ADVERSE DRUG REACTION CLASSIFICATION OF MICROBLOGS USING EMOTIONAL AND CONCEPTUAL ENRICHMENT

The Danish Sign Language Dictionary Jette H. Kristoffersen and Thomas Troelsgård University College Capital, Copenhagen, Denmark

UKParl: A Data Set for Topic Detection with Semantically Annotated Text

Vision: Over Ov view Alan Yuille

Asthma Surveillance Using Social Media Data

Transcription:

KNOWLEDGE-BASED METHOD FOR DETERMINING THE MEANING OF AMBIGUOUS BIOMEDICAL TERMS USING INFORMATION CONTENT MEASURES OF SIMILARITY 1 Bridget McInnes Ted Pedersen Ying Liu Genevieve B. Melton Serguei Pakhomov

OBJECTIVE OFTHIS WORK Develop and evaluate a method than can disambiguate terms in biomedical text by exploiting similarity information extrapolated from the Unified Medical Language System Evaluate the efficacy of Information Content-based similarity measures over path-based similarity measures for Word Sense Disambiguation, WSD 2

WORD SENSE DISAMBIGUATION Word sense disambiguation is the task of determining the appropriate sense of a term given context in which it is used. TERM: tolerance Drug Tolerance Immune Tolerance 3

WORD SENSE DISAMBIGUATION Word sense disambiguation is the task of determining the appropriate sense of a term given context in which it is used. Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance Immune Tolerance 4

SENSE INVENTORY: UNIFIED MEDICAL LANGUAGE SYSTEM Unified Medical Language Sources (UMLS) Semantic Network Metathesaurus ~1.7 million biomedical and clinical concepts; integrated semi-automatically CUIs (Concept Unique Identifiers), linked: Hierarchical: PAR/CHD and RB/RN Non-hierarchical: SIB, RO Sources viewed together or independently Medical Subject Heading (MSH) SPECIALIST Lexicon Biomedical and clinical terms, including variants 5

WORD SENSE DISAMBIGUATION Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance: C0013220 Immune Tolerance: C0020963 Concept Unique Identifiers: CUIs 6

SENSERELATE ALGORITHM Each possible sense of a target word is assigned a score [sum similarity between it and its surrounding terms] Assign target word the sense with highest score Proposed by Patwardhan and Pedersen 2003 using WordNet UMLS::SenseRelate is a modification of this algorithm using information from the UMLS NEXT UP: an example 7

SENSERELATE EXAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer 8

SENSERELATE EXAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance: C0013220 Immune Tolerance: C0020963 9

SENSERELATE EXAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance: C0013220 Immune Tolerance: C0020963 Busprione: C0006462 Morphine: C0026549 Mice: C0026809 Skin cancer: C0007114 10

SENSERELATE EXAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance: C0013220 Immune Tolerance: C0020963 0.09 0.09 0.16 0.11 Busprione: C0006462 Morphine: C0026549 Mice: C0026809 Skin cancer: C0007114 11

SENSERELATE EXAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance Score = 0.09 + 0.09 + 0.16 + 0.11 = 0.45 Drug Tolerance: C0013220 Immune Tolerance: C0020963 0.09 0.09 0.16 0.11 Busprione: C0006462 Morphine: C0026549 Mice: C0026809 Skin cancer: C0007114 12

SENSERELATE EXAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance Score = 0.09 + 0.09 + 0.16 + 0.11 = 0.45 Drug Tolerance: C0013220 Immune Tolerance: C0020963 0.09 0.09 0.16 0.11 0.09 0.09 0.05 0.04 Busprione: C0006462 Morphine: C0026549 Mice: C0026809 Skin cancer: C0007114 13

SENSERELATE EXAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance Score = 0.09 + 0.09 + 0.16 + 0.11 = 0.45 Immune Tolerance Score = 0.09 + 0.09 + 0.05 + 0.05 = 0.27 Drug Tolerance: C0013220 Immune Tolerance: C0020963 0.09 0.09 0.16 0.11 0.09 0.09 0.05 0.04 Busprione: C0006462 Morphine: C0026549 Mice: C0026809 Skin cancer: C0007114 14

SENSERELATE EXAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance Score = 0.09 + 0.09 + 0.16 + 0.11 = 0.45 Immune Tolerance Score = 0.09 + 0.09 + 0.05 + 0.05 = 0.27 Drug Tolerance: C0013220 Immune Tolerance: C0020963 0.09 0.09 0.16 0.11 0.09 0.09 0.05 0.04 Busprione: C0006462 Morphine: C0026549 Mice: C0026809 Skin cancer: C0007114 15

SENSE RELATE ASSUMPTION An ambiguous word is often used in the sense that is most similar to the sense of the terms that surround it 16

SENSERELATE COMPONENTS Identifying the concepts of surrounding terms Calculating semantic similarity 17

IDENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS Use the SPECIALIST LEXICON to identify the terms and map the terms doing a string match to the MRCONSO table in the UMLS 18

IDENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS Use the SPECIALIST LEXICON to identify the terms and map the terms doing a string match to the MRCONSO table in the UMLS Busprione attenuates tolerance to morphine in mice with skin cancer 19

IDENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS Use the SPECIALIST LEXICON to identify the terms and map the terms doing a string match to the MRCONSO table in the UMLS Busprione attenuates tolerance to morphine in mice with skin cancer SPECIALIST LEXICON... skin cancer skin grafting skin disease... 20

IDENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS Use the SPECIALIST LEXICON to identify the terms and map the terms doing a string match to the MRCONSO table in the UMLS Busprione attenuates tolerance to morphine in mice with skin cancer MRCONSO SPECIALIST LEXICON skin cancer skin grafting skin disease...... C0007114 C0037297 C0037274... skin cancer skin grafting skin disease... 21

SEMANTIC SIMILARITY MEASURES Path-based measures Path Wu and Palmer Leacock and Chodorow Ngyuen and Al-Mubaid Information content (IC)-based measures Resnik Lin Jiang and Conrath 22

PATH-BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy 23

PATH-BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy Path measure sim(c1,c2) = 1 / minpath(c2,c2) where minpath is the shortest path between the two concepts 24

PATH-BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy Path measure sim(c1,c2) = 1/minpath(c2,c2) where minpath is the shortest path between the two concepts Wu and Palmer, 1994 sim(c1,c2) = (2*depth(LCS(c2,c2))) / (depth(c1)+depth(c2)) where LCS is the least common subsumer of the two concepts 25

PATH-BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy Path measure sim(c1,c2) = 1/ minpath(c2,c2) where minpath is the shortest path between the two concepts Wu and Palmer, 1994 sim(c1,c2) = (2*depth(LCS(c2,c2))) / (depth(c1)+depth(c2)) where LCS is the least common subsumer of the two concepts Leacock and Chodorow, 1998 sim(c1,c2) = -log( minpath(c1,c2) / (2D) ) where D is the total depth of the taxonomy 26

PATH-BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy Path measure sim(c1,c2) = 1/ minpath(c2,c2) where minpath is the shortest path between the two concepts Leacock and Chodorow, 1998 sim(c1,c2) = -log( minpath(c1,c2) / (2D) ) where D is the total depth of the taxonomy Wu and Palmer, 1994 sim(c1,c2) = (2*depth(LCS(c2,c2))) / (depth(c1)+depth(c2)) where LCS is the least common subsumer of the two concepts Nyguen and Al-Mubaid, 2006 sim(c1,c2) = log ( (2 + minpath(c1,c2) - 1) * (D - depth(lcs(c1,c2))) ) 27

PATH-BASED SIMILARITY MEASURES Disease: C0012634 Drug Tolerance: C0013220 Drug Related Disorder: C0277579 USE ONLY THE PATH INFORMATION OBTAINED FROM A TAXONOMY Neoplasm: C1302761 Neoplastic Disease: C1882062 Malignant Neoplasm: C0006826 Skin cancer: C0007114 28

INFORMATION CONTENT-BASED MEASURES Incorporate the probability of the concepts IC = -log(p(concept)) 29

INFORMATION CONTENT-BASED MEASURES Incorporate the probability of the concepts IC = -log(p(concept)) P(concept) Calculated by summing the probability of the concept and the probability of its descendants Probabilities are obtained from an external corpus 30

INFORMATION CONTENT-BASED MEASURES Incorporate the probability of the concepts IC = -log(p(concept) Resnik, 1995 sim(c1,c2) = IC(LCS(c1,c2)) 31

INFORMATION CONTENT-BASED MEASURES Incorporate the probability of the concepts IC = -log(p(concept) Resnik, 1995 sim(c1,c2) = IC(LCS(c2,c2)) Jiang and Conrath, 1997 sim(c1,c2) = 1 / (IC(c1)+IC(c2) 2* IC(LCS(c1,c2)) 32

INFORMATION CONTENT-BASED MEASURES Incorporate the probability of the concepts IC = -log(p(concept) Resnik, 1995 sim(c1,c2) = IC(LCS(c2,c2)) Jiang and Conrath, 1997 sim(c1,c2) = 1 (IC(c1)+IC(c2) 2* IC(LCS(c1,c2)) Lin, 1998 sim(c1,c2) = (2*IC(LCS(c2,c2))) / (IC(c1)+IC(c2)) 33

IC-BASED SIMILARITY MEASURES PATH INFORMATION Disease: C0012634 PROBABILITY OF CONCEPTS Drug Related Disorder: C0277579 Drug Tolerance: C0013220 Neoplasm: C1302761 Neoplastic Disease: C1882062 Malignant Neoplasm: C0006826 EXTERNAL CORPUS Skin cancer: C0007114 34

EXPERIMENTAL FRAMEWORK Use open-source UMLS::Similarity package to obtain the similarity between the terms and possible senses in the SenseRelate algorithm Path information: parent/child relations in MSH source Information content: calculated using the UMLSonMedline dataset created by NLM Consists of concepts from 2009AB UMLS and the frequency they occurred in Medline using the Essie Search Engine (Ide et al 2007) Medline: database of citations of biomedical/clinical articles 35

EVALUATION DATA: MSH WSD MSH-WSD dataset (Jimeno-Yepes, et al 2011) 203 target words (ambiguous word) from Medline 106 terms e.g. tolerance 88 acronyms e.g. CA (calcium, california) 9 mixtures e.g. bat (brown adipose tissue) Each target word contains ~187 instances (Medline abstracts) abstract = ~ 500 words Each target word in the instances assigned a concept from MSH by exploiting the manually assigned MSH concepts assigned to the abstract Average of 2.08 possible senses per target word Majority sense over all the target words is 54.5% 36

RESULTS 0.8 0.7 0.72 0.69 0.7 0.72 0.73 0.74 0.74 a c c u r a c y 0.6 0.5 0.4 0.3 0.2 0.1 0.55 0 baseline path lch wup nam res jcn lin 37 Path-based IC-based

COMPARISON ACROSS SUBSETS OF MSH-WSD a c c u r a c y 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.93 0.87 0.85 0.88 0.8 0.8 0.73 0.74 0.71 0.67 0.67 0.55 0.54 0.53 0.55 0.78 Baseline SenseRelate MRD 2-MRD 0.1 0 Terms Acronyms Mixture Overall 38

COMPARISON ACROSS SUBSETS OF MSH-WSD a c c u r a c y 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.93 0.87 0.85 0.88 0.8 0.8 0.73 0.74 0.71 0.67 0.67 0.55 0.54 0.53 0.55 0.78 Baseline SenseRelate MRD 2-MRD 0.1 0 Terms Acronyms Mixture Overall 39

COMPARISON ACROSS SUBSETS OF MSH-WSD a c c u r a c y 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.93 0.87 0.85 0.88 0.8 0.8 0.73 0.74 0.71 0.67 0.67 0.55 0.54 0.53 0.55 0.78 Baseline SenseRelate MRD 2-MRD 0.1 0 Terms Acronyms Mixture Overall 40

COMPARISON ACROSS SUBSETS OF MSH-WSD a c c u r a c y 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.93 0.87 0.85 0.88 0.8 0.8 0.73 0.74 0.71 0.67 0.67 0.55 0.54 0.53 0.55 0.78 Baseline SenseRelate MRD 2-MRD 0.1 0 Terms Acronyms Mixture Overall 41

COMPARISON ACROSS SUBSETS OF MSH-WSD a c c u r a c y 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.93 0.87 0.85 0.88 0.8 0.8 0.73 0.74 0.71 0.67 0.67 0.55 0.54 0.53 0.55 0.78 Baseline SenseRelate MRD 2-MRD 0.1 0 Terms Acronyms Mixture Overall 42

WINDOW SIZES Use the terms surrounding the target word within a specified window: 1, 2, 5, 10, 25, 50, 60, 70 WINDOW SIZE = 2 Busprione attenuates tolerance to morphine in mice with skin_cancer 43

COMPARISON OF WINDOW SIZES FOR LIN 0.8 0.7 0.65 0.69 0.71 0.74 0.74 0.74 0.74 a c c u r a c y 0.6 0.5 0.4 0.3 0.2 0.5 0.53 lin 0.1 0 0 1 2 5 10 25 50 60 70 window size 44

SURROUNDING TERMS Not all terms have a concept in the UMLS therefore Not all surrounding terms in the window mapped to CUIs 45

WINDOW SIZES VERSUS MAPPED TERMS n u m b e r 18 16 14 12 12.96 14.28 15.64 o f m a p p i n g s 10 8 6 4 2 0 7.6 3.49 1.85 0 0.27 0.79 0 1 2 5 10 25 50 60 70 46 lin window size

FUTURE WORK: MAPPING TERMS Currently looking at mapping the terms to CUIs using information from the concept mapping system MetaMap Obtain the terms from MetaMap and do a dictionary look up in MRCONSO Hypothesis the terms obtained by MetaMap are more accurate than using the SPECIALIST Lexicon Obtain the CUIs from MetaMap Hypothesis the CUIs obtained by MetaMap will be more accurate than the dictionary look-up 47

OBJECTIVE #1 Develop and evaluate a method than can disambiguate terms in biomedical text by exploiting similarity information extrapolated from the UMLS UMLS::SenseRelate statistically significantly higher disambiguation accuracy than the baseline On par with previous unsupervised methods for terms 48

OBJECTIVE #2 Evaluate the efficacy of IC-based similarity measures over pathbased measures on a secondary task There is no statistically significant difference between the accuracies obtained by the IC-based measures There is a statistically significant difference between the ICbased measures and the path-based measures 49

TAKE HOME MESSAGE: An ambiguous word is often used in the sense that is most similar to the sense of the concepts of the terms that surround it 50

RESOURCES Software: UMLS::SenseRelate http://search.cpan.org/dist/umls-senserelate/ UMLS::Similarity Data http://search.cpan.org/dist/umls-similarity/ MSH-WSD http://wsd.nlm.nih.gov/collaboration.shtml 51

RESOURCES Software: UMLS::SenseRelate http://search.cpan.org/dist/umls-senserelate/ UMLS::Similarity Data http://search.cpan.org/dist/umls-similarity/ MSH-WSD http://wsd.nlm.nih.gov/collaboration.shtml THANK YOU 52

RESOURCES Software: UMLS::SenseRelate http://search.cpan.org/dist/umls-senserelate/ UMLS::Similarity Data http://search.cpan.org/dist/umls-similarity/ MSH-WSD http://wsd.nlm.nih.gov/collaboration.shtml QUESTIONS? 53