Automatic extraction of adverse drug reaction terms from medical free text

Ola Caster
Uppsala Monitoring Centre, WHO Collaborating Centre for International Drug Monitoring
ISCB 2009, Prague, Czech Republic, 26 August
Joint work with Johanna Strandell, Andrew Bate, and I. Ralph Edwards

Safety monitoring of drugs
- Marketed drugs are continuously monitored for adverse drug reactions (ADRs) not detected during clinical testing.
- So-called individual case safety reports (ICSRs) are a key component of this monitoring; these are primarily spontaneously reported incidents from clinical practice.
- The Uppsala Monitoring Centre (UMC) collects ICSRs from more than 90 countries.
- VigiBase: pooled database of almost 5 million reports.

ADR terminologies and free text sources
- Reactions are classified according to standard, hierarchical ADR terminologies:
  - MedDRA
  - WHO-ART, which is maintained by UMC
- Statistical analyses depend on this classification, for example data mining for pairs of drugs and ADR terms that occur more often than expected.
- However, much relevant information is only available in free text, for example the Summary of Product Characteristics.
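The idea of a drug-ADR pair "occurring more often than expected" can be illustrated with a plain observed-to-expected ratio. This is a minimal sketch, not the measure used at UMC: the function name, the independence assumption behind the expected count, and all counts are hypothetical.

```python
def observed_to_expected(n_pair: int, n_drug: int, n_adr: int, n_total: int) -> float:
    """Ratio of observed drug-ADR co-reports to the count expected under independence."""
    # Expected co-reports if the drug and the ADR term were reported independently.
    expected = n_drug * n_adr / n_total
    return n_pair / expected

# Hypothetical counts, for illustration only.
ratio = observed_to_expected(n_pair=20, n_drug=100, n_adr=200, n_total=10000)
print(ratio)  # 10.0
```

A ratio well above 1 would flag the pair for closer review; it says nothing by itself about causality.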

ADR terminologies and free text sources
- Terms from ADR terminologies do occur in medical free text sources; however, exact matching is likely to exhibit poor sensitivity.
- This precludes efficient comparison between results from statistical analyses and published information.
- Of particular interest: is a drug-ADR pair highlighted by data mining of ICSRs already listed in (free text) standard literature sources?
- The purpose is to detect as yet unknown adverse reactions.

Objective
To develop an algorithm that can extract ADR terms from medical free text at higher sensitivity than exact matching, yet with high precision.

Data sources
- XML version of Stockley's Drug Interactions [1]
  - Well-renowned source of information on drug-drug interactions
  - Each interaction, including its potential adverse effects, is described in free text
  - 40259 database entries in the version used
- The WHO-ART terminology for ADRs
  - 5770 terms (lowest level of hierarchy)

The algorithm exemplified
- ADR term: QT prolonged
- Free text: "Some quinolones (e.g. gatifloxacin) can prolong the QT interval and [...] when used with quinidine."
- A real example from Stockley's database
- Exact matching does not work
- Important aspects of the algorithm will be explained
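Why exact matching fails on this example can be checked directly. The snippet below is only an illustration; `exact_match` is a hypothetical helper, not part of the presented algorithm, and the elided part of the sentence is kept as "[...]".

```python
def exact_match(term: str, text: str) -> bool:
    """Return True if the term occurs verbatim (ignoring case) in the text."""
    return term.lower() in text.lower()

text = ("Some quinolones (e.g. gatifloxacin) can prolong the QT interval "
        "and [...] when used with quinidine.")

# The ADR term never appears verbatim, so exact matching misses it.
found = exact_match("QT prolonged", text)
print(found)  # False
```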

Algorithmic step: None
  ADR term: QT prolonged
  Free text: Some quinolones (e.g. gatifloxacin) can prolong the QT interval and [...] when used with quinidine.
Algorithmic step: Remove non-alphanumerics (+ lower case)
  ADR term: qt prolonged
  Free text: some quinolones e g gatifloxacin can prolong the qt interval and [...] when used with quinidine

Problem 1: There is a "the" in the free text but not in the ADR term.
Solution 1: Remove stopwords.
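The normalisation step (remove non-alphanumerics, lower case) could look roughly as follows. This is a sketch under the assumption that only ASCII letters and digits count as alphanumeric; the function name `normalize` is hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Lower-case the text and replace each run of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

term = normalize("QT prolonged")
snippet = normalize("Some quinolones (e.g. gatifloxacin) can prolong the QT interval")
print(term)     # qt prolonged
print(snippet)  # some quinolones e g gatifloxacin can prolong the qt interval
```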

Stopwords
- Words that carry little or no semantic meaning: ... if, of, any, for, at, are, also ...
- It is quite common to remove stopwords in algorithms like this.
- The list used here comes from the Snowball project [2].
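Stopword removal on already normalised text can be sketched as below. The small `STOPWORDS` set is a hand-picked stand-in for illustration; it is not the Snowball list, which is considerably longer.

```python
# Illustrative subset only; the study used the full Snowball stopword list.
STOPWORDS = {"if", "of", "any", "for", "at", "are", "also",
             "the", "and", "when", "with", "some"}

def remove_stopwords(text: str) -> str:
    """Drop every whitespace-separated token that is in STOPWORDS."""
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)

cleaned = remove_stopwords(
    "some quinolones e g gatifloxacin can prolong the qt interval and when used with quinidine"
)
print(cleaned)  # quinolones e g gatifloxacin can prolong qt interval used quinidine
```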

Algorithmic step: None
  ADR term: QT prolonged
  Free text: Some quinolones (e.g. gatifloxacin) can prolong the QT interval and [...] when used with quinidine.
Algorithmic step: Remove non-alphanumerics (+ lower case)
  ADR term: qt prolonged
  Free text: some quinolones e g gatifloxacin can prolong the qt interval and [...] when used with quinidine
Algorithmic step: Remove stopwords
  ADR term: qt prolonged
  Free text: quinolones e g gatifloxacin can prolong qt interval [...] used quinidine

Problem 2: The words are in different grammatical forms (prolonged vs prolong).
Solution 2: Use stemming.

Stemming
- Each word is replaced by its stem.
- This increases the chance of matching words with the same semantic meaning but different grammatical functions.
- interact, interacts, interacted, interaction, interacting would all be mapped to interact.
- The stemming algorithm used here was the so-called Porter algorithm [3].
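The effect of stemming can be shown with a deliberately crude suffix stripper. This is not the Porter algorithm and will disagree with it on many words; `toy_stem` and its suffix list are assumptions made for illustration. A real run would use a Porter implementation, for instance the one shipped with NLTK.

```python
# Toy stand-in for a stemmer: strips one of a few suffixes, longest first.
SUFFIXES = ("ing", "ion", "ed", "s")

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

forms = ["interact", "interacts", "interacted", "interaction", "interacting"]
stems = {toy_stem(w) for w in forms}
print(stems)                  # {'interact'}
print(toy_stem("prolonged"))  # prolong
```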

Algorithmic step: None
  ADR term: QT prolonged
  Free text: Some quinolones (e.g. gatifloxacin) can prolong the QT interval and [...] when used with quinidine.
Algorithmic step: Remove non-alphanumerics (+ lower case)
  ADR term: qt prolonged
  Free text: some quinolones e g gatifloxacin can prolong the qt interval and [...] when used with quinidine
Algorithmic step: Remove stopwords
  ADR term: qt prolonged
  Free text: quinolones e g gatifloxacin can prolong qt interval [...] used quinidine
Algorithmic step: Use stemming
  ADR term: qt prolong
  Free text: quinolon e g gatifloxacin can prolong qt interv [...] use quinidin

Problem 3: The word order differs (qt prolong vs prolong qt).
Solution 3: Permute the ADR term.

Permutation of ADR terms
- Many ADR terms that include an adjective are constructed in the opposite order to what would be normal in free text.
  - Terminology: scoliosis congenital
  - Free text: congenital scoliosis
- Use every possible order of the individual words of the ADR term.
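Generating every word order of an ADR term is a one-liner with the standard library. The helper name `term_permutations` is hypothetical; the sketch assumes words are separated by whitespace.

```python
from itertools import permutations

def term_permutations(term: str) -> set[str]:
    """Return every ordering of the words in the term as space-joined strings."""
    words = term.split()
    return {" ".join(order) for order in permutations(words)}

orders = term_permutations("scoliosis congenital")
print(sorted(orders))  # ['congenital scoliosis', 'scoliosis congenital']
```

A term with n distinct words yields n! orderings, which stays manageable because ADR terms are short.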

Algorithmic step: None
  ADR term: QT prolonged
  Free text: Some quinolones (e.g. gatifloxacin) can prolong the QT interval and [...] when used with quinidine.
Algorithmic step: Remove non-alphanumerics (+ lower case)
  ADR term: qt prolonged
  Free text: some quinolones e g gatifloxacin can prolong the qt interval and [...] when used with quinidine
Algorithmic step: Remove stopwords
  ADR term: qt prolonged
  Free text: quinolones e g gatifloxacin can prolong qt interval [...] used quinidine
Algorithmic step: Use stemming
  ADR term: qt prolong
  Free text: quinolon e g gatifloxacin can prolong qt interv [...] use quinidin
Algorithmic step: Use permutation
  ADR term: prolong qt
  Free text: quinolon e g gatifloxacin can prolong qt interv [...] use quinidin

Outline of algorithm
1. For each ADR term and free text entry:
   - Remove all non-alphanumerics and set to lower case
   - Remove all stopwords
   - Replace each individual word with its stem
2. For each free text entry resulting from step 1:
   - Look for occurrences of each ADR term resulting from step 1, as well as all permutations of its individual words
3. Study frequently non-matched words that are relevant and, if possible, add synonyms to ADR terms in order to match these words (not mandatory)
4. Reiterate steps 1 and 2 (not mandatory)
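Steps 1 and 2 of the outline can be put together in a short end-to-end sketch. Everything here is an assumption for illustration: the stopword set is a small stand-in for the Snowball list, `stem` is a toy suffix stripper rather than the Porter algorithm, and matching is done on whole, contiguous words.

```python
import re
from itertools import permutations

STOPWORDS = {"the", "and", "when", "with", "some", "of", "for", "at", "are"}
SUFFIXES = ("ing", "ion", "ed", "s")  # toy stemmer, not Porter

def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list[str]:
    """Step 1: normalise, drop stopwords, stem each remaining word."""
    tokens = re.sub(r"[^a-z0-9]+", " ", text.lower()).split()
    return [stem(tok) for tok in tokens if tok not in STOPWORDS]

def matches(term: str, text: str) -> bool:
    """Step 2: look for any word order of the processed term in the processed text."""
    term_tokens = preprocess(term)
    if not term_tokens:
        return False
    # Pad with spaces so that only whole words can match.
    padded = " " + " ".join(preprocess(text)) + " "
    return any(" " + " ".join(order) + " " in padded
               for order in permutations(term_tokens))

text = ("Some quinolones (e.g. gatifloxacin) can prolong the QT interval "
        "and [...] when used with quinidine.")
hit = matches("QT prolonged", text)
print(hit)  # True
```

Steps 3 and 4 (reviewing unmatched words and adding synonyms) involve manual judgement and are not covered by this sketch.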

Synonyms
A few examples from this study:

Original word    Synonym
haemorrhage      bleeding
oestrogen        estrogen
cardiac          heart
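One way to apply such synonyms is to expand each ADR term into all of its synonym variants before matching. The sketch below assumes word-level substitution; the function name and the example term "cardiac failure" are chosen here for illustration and are not taken from the study.

```python
from itertools import product

# The three synonym pairs listed above.
SYNONYMS = {"haemorrhage": ["bleeding"],
            "oestrogen": ["estrogen"],
            "cardiac": ["heart"]}

def expand_with_synonyms(term: str) -> set[str]:
    """Return the term plus every variant obtained by swapping in synonyms."""
    options = [[word] + SYNONYMS.get(word, []) for word in term.lower().split()]
    return {" ".join(choice) for choice in product(*options)}

variants = expand_with_synonyms("cardiac failure")
print(sorted(variants))  # ['cardiac failure', 'heart failure']
```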

Overall result
The algorithm extracted 42330 ADR terms from 40259 database entries.

Results disseminated
- Removal of stopwords was less important than using permutation, which in turn was less important than stemming.
- Combined effect (including synonyms): 49% increase in number of matches.
- This suggests increased sensitivity.

What about specificity?
- It is relevant to examine whether the increased sensitivity comes at a loss of specificity.
- A subset of findings was manually reviewed.
- The rate of false positives is acceptable.

Conclusion
The algorithm, which combines generic principles with the addition of source-specific synonyms, is able to increase the hit rate, and thus the sensitivity, compared to exact matching, while maintaining a high specificity.

References
[1] Baxter K (ed.). Stockley's Drug Interactions (XML). London: Pharmaceutical Press, 2008.
[2] http://snowball.tartarus.org/
[3] Porter MF. An algorithm for suffix stripping. Program 1980; 14(3):130-137.

Thank you! Questions and comments are most welcome at ola.caster@who-umc.org