Interpretability of Sudden Concept Drift in Medical Informatics Domain

Gregor Stiglic, Peter Kokol
Faculty of Health Sciences, University of Maribor, Slovenia

Presentation Outline
Visualization of Concept Drift
National Hospital Discharge Data
Data Pre-processing
Experimental Settings
Results
Conclusions

Visualization of Concept Drift
Concept drift: changes in the data distribution or in the concept of the predicted class (Tsymbal, 2004).
Very few research papers deal with concept drift visualization or visual interpretation.
Existing work is mostly visual data exploration for multivariate stream visualization or univariate time-series anomaly detection.
K.B. Pratt and G. Tschapek, "Visualizing concept drift", KDD 2003: uses brushed parallel histograms.

National Hospital Discharge Data
Hospital discharge records for approximately 1% of US hospitals
10 consecutive years, from 2000 to 2009 (approx. 300,000 hospitalizations/year)
Altogether 3,106,176 hospitalization events
We used 2,509,113 events (after removing non-adults)

National Hospital Discharge Data
Each NHDS record contains:
Personal characteristics of the patient (age, gender, race, and marital status)
Administrative information (length of stay, discharge status, etc.)
Medical information: up to 7 diagnoses (plus an optional admitting diagnosis) and up to 4 surgical and nonsurgical procedures

Data Pre-processing
ICD-9-CM coding of diagnoses: 5-digit codes were collapsed to 3-digit codes.
For example, flu is assigned to category 487 (Influenza), which covers 487.0 (Influenza with pneumonia), 487.1 (Influenza with other respiratory manifestations), etc.
Altogether 1188 3-digit diagnosis codes.
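The collapsing step can be illustrated with a minimal Python sketch. The function name `collapse_icd9` is hypothetical, and the handling of E codes (a four-character category such as E800) is an assumption; the slides only describe numeric codes such as 487.

```python
def collapse_icd9(code: str) -> str:
    """Collapse an ICD-9-CM diagnosis code to its category.

    Accepts codes with or without the decimal point ("487.1" or "4871").
    Assumption: E codes keep four characters (E800), all others keep three.
    """
    cleaned = code.strip().upper().replace(".", "")
    width = 4 if cleaned.startswith("E") else 3
    return cleaned[:width]


# Both influenza subcodes fall into category 487.
categories = {collapse_icd9(c) for c in ["487.0", "487.1", "4878"]}
```

Codes are kept as strings so that leading zeros (for example 038) are preserved.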

Data Pre-processing
Sparse matrix: 1187 codes
ID Month Sex Age D038 D585 D678
1 1 M 25 A A A A
2 1 F 37 A P A A
21,343 2 F 65 P A A P
21,344 2 M 77 A A A A
21,345 2 M 23 A P A A
2,509,113 120 F 81 A A A A
Class: D585
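One way to hold such a sparse matrix is to store, per record, only the set of codes that are present. The sketch below assumes that "P" and "A" in the table mean present and absent, and that the class attribute (D585 here) is removed from the feature set; the helper names `encode` and `dense_row` are hypothetical.

```python
def encode(record_codes, class_code="D585"):
    """Split one record into (set of present feature codes, class label 0/1)."""
    present = set(record_codes)
    label = 1 if class_code in present else 0
    return present - {class_code}, label


def dense_row(features, columns):
    """Expand a sparse feature set into P/A flags for the given column order."""
    return ["P" if col in features else "A" for col in columns]


features, label = encode(["D038", "D585"])
row = dense_row(features, ["D038", "D678"])
```

Storing only the present codes keeps memory proportional to the number of diagnoses per record (at most 7) rather than to the 1187 attribute columns.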

Human Disease Network Metrics
We use two co-morbidity measures for visualization:
Relative risk: $RR_{ij} = \frac{C_{ij} N}{M_i M_j}$
Phi-correlation: $\phi_{ij} = \frac{C_{ij} N - M_i M_j}{\sqrt{M_i M_j (N - M_i)(N - M_j)}}$
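A short Python sketch of the two measures follows. It assumes the usual reading of the symbols, which the slide does not spell out: N is the total number of patients, M_i and M_j are the numbers of patients with diagnosis i and j, and C_ij is the number of patients with both. It also assumes 0 < M_i, M_j < N so that no denominator is zero.

```python
import math


def relative_risk(c_ij, m_i, m_j, n):
    """RR_ij = C_ij * N / (M_i * M_j); equals 1 when i and j are independent."""
    return (c_ij * n) / (m_i * m_j)


def phi_correlation(c_ij, m_i, m_j, n):
    """Phi_ij = (C_ij*N - M_i*M_j) / sqrt(M_i*M_j*(N-M_i)*(N-M_j))."""
    denom = math.sqrt(m_i * m_j * (n - m_i) * (n - m_j))
    return (c_ij * n - m_i * m_j) / denom


# Toy numbers (not from the NHDS data): 1000 patients, 100 with i, 50 with j,
# 20 with both. Under independence we would expect only 5 with both.
rr = relative_risk(20, 100, 50, 1000)
phi = phi_correlation(20, 100, 50, 1000)
```

With these toy counts the relative risk is 4.0, meaning the pair co-occurs four times as often as expected under independence.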

Experimental Settings
Performance comparison of a static vs. a dynamic ensemble of Naïve Bayes (NB) classifiers
Static: trained only on the first 12 months of data, does not change after that
Dynamic: updated with each new batch of incoming patients (i.e. each month)
25 updateable NB classifiers are built using Random Spread Subsample (balanced instance sampling)
Simple majority voting was used
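The setup above can be sketched in plain Python. This is not the authors' Weka implementation: `UpdateableNB`, `balanced_subsample` and `Ensemble` are hypothetical names, the naive Bayes variant scores only the features that are present (a simplification), and the balanced sampling simply draws equally many positive and negative records per batch.

```python
import math
import random


class UpdateableNB:
    """Count-based naive Bayes over sets of present binary features."""

    def __init__(self):
        self.class_counts = {0: 0, 1: 0}
        self.feature_counts = {0: {}, 1: {}}

    def update(self, features, label):
        self.class_counts[label] += 1
        counts = self.feature_counts[label]
        for f in features:
            counts[f] = counts.get(f, 0) + 1

    def predict(self, features):
        total = sum(self.class_counts.values())
        scores = {}
        for c in (0, 1):
            n_c = self.class_counts[c]
            # Laplace-smoothed log prior and log likelihoods.
            score = math.log((n_c + 1) / (total + 2))
            for f in features:
                score += math.log((self.feature_counts[c].get(f, 0) + 1) / (n_c + 2))
            scores[c] = score
        return 1 if scores[1] > scores[0] else 0


def balanced_subsample(batch, rng):
    """Draw equally many positive and negative (features, label) records."""
    pos = [r for r in batch if r[1] == 1]
    neg = [r for r in batch if r[1] == 0]
    k = min(len(pos), len(neg))
    return rng.sample(pos, k) + rng.sample(neg, k)


class Ensemble:
    """Majority vote over independently subsampled updateable NB members."""

    def __init__(self, n_members=25, seed=0):
        self.members = [UpdateableNB() for _ in range(n_members)]
        self.rng = random.Random(seed)

    def update(self, batch):
        for member in self.members:
            for features, label in balanced_subsample(batch, self.rng):
                member.update(features, label)

    def predict(self, features):
        votes = sum(m.predict(features) for m in self.members)
        return 1 if votes > len(self.members) / 2 else 0


# Toy batch: D403 co-occurs with the class, D401 does not.
batch = [({"D403"}, 1)] * 5 + [({"D401"}, 0)] * 20
ensemble = Ensemble()
ensemble.update(batch)
```

In this sketch a static ensemble would call `update` only for the first 12 monthly batches, while a dynamic one would call it after every month.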

Experimental Settings
Performance metrics observed over 119 months:
AUC (effective for imbalanced classes)
F-measure (combines precision and recall)
A custom implementation of prequential evaluation in Weka was used: each batch (month) of data is first used for testing, before it can be used for training.
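The test-then-train loop can be sketched as follows. This is an illustration in Python rather than the Weka implementation mentioned above; the function names and the `score`/`update` callables are hypothetical, and the AUC is computed by direct pairwise comparison, which is simple but quadratic in batch size.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def prequential(batches, score, update):
    """Evaluate each batch with the current model, then train on that batch."""
    results = []
    for batch in batches:
        labels = [y for _, y in batch]
        scores = [score(x) for x, _ in batch]
        results.append(auc(labels, scores))  # test first
        update(batch)                        # train afterwards
    return results


# Toy model: remembers which inputs were positive in earlier batches.
seen_positive = set()
month = [("a", 1), ("b", 0)]
monthly_auc = prequential(
    [month, month],
    score=lambda x: 1.0 if x in seen_positive else 0.0,
    update=lambda batch: seen_positive.update(x for x, y in batch if y == 1),
)
```

In the toy run the first month is scored before anything has been learned, so its AUC is 0.5; the second month benefits from the first month's update.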

Results
[Figure: AUC for D585 (Chronic kidney disease), series "D_585 AUC Static" and "D_585 AUC Dynamic"; x-axis: months from FEB-00 to SEP-09; y-axis: AUC from 0.70 to 0.90]

Visualization
How to explain the concept drift? Using motion charts that show:
Phi-correlation and relative risk measures w.r.t. the class attribute
Chi-square value or significance (p-value)
Morbidity (support) for different diseases
Time
Only disease codes with support over 100 were visualized.
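The chi-square value and its p-value for one disease code against the class can be computed from the same four counts used for relative risk. The sketch below is an assumption about how such values could be derived, not the authors' code: it uses the standard 2x2 chi-square statistic with one degree of freedom (no continuity correction), and the names `chi_square_2x2`, `p_value_1df` and `visualized_codes` are hypothetical.

```python
import math


def chi_square_2x2(c_ij, m_i, m_j, n):
    """Chi-square statistic of the 2x2 table for diagnoses i and j."""
    a = c_ij                    # both present
    b = m_i - c_ij              # only i
    c = m_j - c_ij              # only j
    d = n - m_i - m_j + c_ij    # neither
    return n * (a * d - b * c) ** 2 / (m_i * (n - m_i) * m_j * (n - m_j))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def visualized_codes(support_by_code, min_support=100):
    """Keep only codes whose support exceeds the threshold."""
    return sorted(code for code, s in support_by_code.items() if s > min_support)


chi2 = chi_square_2x2(20, 100, 50, 1000)
kept = visualized_codes({"D403": 250, "D404": 100, "D588": 12})
```

For a 2x2 table this statistic equals N times the squared phi-correlation, so the two measures move together in the chart.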

Visualization
[Figure: motion chart with annotated dimensions: Relative Risk, Phi-correlation, Chi-square p-value, Prevalence, Time]

Visualization August 2005

Visualization (August 2005)
Diagnosis code: Description
D401: Essential hypertension
D403: Hypertensive chronic kidney disease
D404: Hypertensive heart and chronic kidney disease
D428: Heart failure
D583: Nephritis and nephropathy, not specified as acute or chronic
D584: Acute renal failure
D585: Chronic kidney disease (CKD)
D588: Disorders resulting from impaired renal function

Visualization October 2005

Conclusions
D403 and D404 stand out.
Further examination uncovers a change of coding in October 2005:
D403: Hypertensive renal disease (Sep 2005) -> Hypertensive chronic kidney disease (Oct 2005)
Possible explanation: the new code names brought more attention to D403 and D404 in October 2005 and resulted in more accurate coding.

Conclusions
Visualization can help in the interpretation of concept drift events.
Possible improvements:
Additional variables for visualizing significant changes in variable values ("movement p-values")
Educational use (visualization of all classifiers in an ensemble)
Integration into MOA instead of the current Weka + Google Motion Charts implementation

Questions
Gregor Stiglic, gregor.stiglic@uni-mb.si
Demo visualization available at: http://ri.fzv.uni-mb.si/icdm11