Trading off coverage for accuracy in forecasts: Applications to clinical data analysis

Michael J. Pazzani, Patrick Murphy, Kamal Ali, and David Schulenburg
Department of Information and Computer Science
University of California, Irvine, CA 92717
{pazzani, pmurphy, ali, schulenb}@ics.uci.edu

Research supported by Air Force Office of Scientific Research grant F49620-92-J-0430.
Presented at AIM-94, Thursday, June 30, 1994.

Inductive Learning of Classification Procedures

Given: a set of training examples
  a. Attribute-value pairs: { (age, 24), (gender, female), ... }
  b. A class label: pregnant

Create: a classification procedure to infer the class label of an example represented as a set of attribute-value pairs. The procedure may take the form of:
  - A decision tree
  - Weights of a neural network
  - Conditional probabilities of a class given an attribute
  - Rules
  - Rules with confidence factors

Typical evaluation of a learning algorithm (sketched below):
  - Divide the available data into a training set and a test set.
  - Infer the procedure from the data in the training set.
  - Estimate the accuracy of the procedure on the data in the test set.
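
The evaluation loop described above can be sketched in a few lines. This is a generic illustration, not code from the talk; it uses scikit-learn (a library that postdates the talk) for brevity, and the model choice is a placeholder:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y, trials=20):
    """Train on 2/3 of the data, test on the remaining 1/3, averaged over trials."""
    accuracies = []
    for seed in range(trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=1/3, random_state=seed)
        model = DecisionTreeClassifier().fit(X_tr, y_tr)  # any learner would do
        accuracies.append(model.score(X_te, y_te))        # accuracy on the test set
    return float(np.mean(accuracies))
```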

Trading off coverage for accuracy

Learners usually infer the classification of all test examples. Instead, give the learner the ability to say "I don't know" on some examples.
Goal: the learner is more accurate when it does make a classification.

Possible applications:
  - Human-computer interfaces: learning macros
  - Learning rules to translate from Japanese to English
  - Analysis of medical databases:
    let the learner automatically handle the typical cases,
    and refer hard cases to a human specialist.

Evaluation (computed as in the sketch below):
  T: total number of test examples
  P: number of examples for which the learner makes a prediction
  C: number of examples whose class is inferred correctly

  Accuracy = C / P        Coverage = P / T
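
These two metrics translate directly into code. A minimal sketch, with None standing in for "I don't know":

```python
def accuracy_and_coverage(predictions, truth):
    """predictions: predicted labels, with None meaning the learner abstained.

    T: total test examples; P: examples with a prediction; C: correct ones.
    Returns (Accuracy = C/P, Coverage = P/T).
    """
    T = len(truth)
    P = sum(p is not None for p in predictions)
    C = sum(p == t for p, t in zip(predictions, truth) if p is not None)
    return (C / P if P else float("nan")), P / T

acc, cov = accuracy_and_coverage(["a", None, "b", "b"], ["a", "a", "b", "a"])
print(acc, cov)  # 0.666..., 0.75
```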

Trading off coverage for accuracy: Lymphography, backprop

[Plot: accuracy and coverage as a function of the activation threshold.]

Goals of this research

Modify learning algorithms to trade off coverage for accuracy:
  - Learners typically have an internal measure of hypothesis quality.
  - Use that hypothesis-quality measure to determine whether to classify.

Experimentally evaluate trading off coverage for accuracy on databases from the UCI Archive of Machine Learning Databases. Train on 2/3 of the data, test on the remaining 1/3; averages over 20 trials.
  - Breast Cancer (699 examples; distinguish benign from malignant tumors)
  - Lymphography (148 examples; identify malignant tumors)
  - DNA Promoter (106 examples; leave-one-out testing)

Describe how a sparse clinical database (the diabetes data sets) can be analyzed by classification learners.

Neural Networks

One output unit per class. An output unit's activation is between 0 and 1. Assign an example to the class with the highest activation.

[Diagram: a network with input units Fever, Bloodshot eyes, Headache, Nausea, Swollen glands, Gender, and Age, and output units Pregnant and Cancer.]

Trading off coverage for accuracy in neural networks (sketched below)

1. Assign an example to the class with the highest activation, provided that activation is above a threshold.
2. Assign an example to the class with the highest activation, provided the difference between that activation and the next highest is above a threshold. (This did not make a significant difference in our experiments.)
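
A minimal sketch of both rules; function names and the threshold values are illustrative, not from the talk:

```python
import numpy as np

def predict_with_rejection(activations, threshold=0.7):
    """Rule 1: predict the top class only if its activation clears a threshold.

    activations: per-class output activations in [0, 1].
    Returns the winning class index, or None to say "I don't know".
    """
    top = int(np.argmax(activations))
    return top if activations[top] >= threshold else None

def predict_with_margin(activations, margin=0.2):
    """Rule 2: require a margin between the top activation and the runner-up."""
    order = np.argsort(activations)
    top, second = order[-1], order[-2]
    if activations[top] - activations[second] >= margin:
        return int(top)
    return None
```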

Breast Cancer, backprop

[Plot: accuracy and coverage vs. activation threshold.]

Promoter, backprop

[Plot: accuracy and coverage vs. activation threshold.]

Bayesian Classifier

An example is assigned to the class that maximizes the probability of that class given the example. If we assume the features are independent:

  P(C_i | A_1=V_1j & ... & A_n=V_nj) = P(C_i) ∏_k P(A_k=V_kj | C_i) / Σ_i P(C_i) ∏_k P(A_k=V_kj | C_i)

Estimate P(C_i) and P(A_k=V_kj | C_i) from the training data.

Trading off coverage for accuracy (as with backprop): only make a prediction if P(C_i | A_1=V_1j & ... & A_n=V_nj) is above some threshold (sketched below).
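
A minimal sketch of the thresholded Bayesian classifier; the dictionary-based model representation and all names are illustrative assumptions, not the talk's implementation:

```python
def naive_bayes_predict(priors, likelihoods, example, threshold=0.9):
    """Classify with a naive Bayes model, abstaining below a posterior threshold.

    priors: dict class -> P(C_i), estimated from the training data.
    likelihoods: dict (class, attribute, value) -> P(A_k = v | C_i).
    example: dict attribute -> value.
    Returns the winning class, or None if its posterior is below the threshold.
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in example.items():
            # tiny floor for attribute values never seen with this class
            score *= likelihoods.get((c, attr, value), 1e-6)
        scores[c] = score
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    posterior = scores[best] / total if total > 0 else 0.0
    return best if posterior >= threshold else None
```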

Breast Cancer, Bayesian classifier

[Plot: accuracy and coverage vs. ln(probability) threshold.]

A Decision Tree (for determining suitability of contact lenses)

[Diagram: a decision tree testing Tears (reduced/normal), Age (<15, 15-55, >55), Prescription (hypermetrope/myope), and Astigmatic (yes/no); each leaf assigns No, Soft, or Hard and is annotated with its training-class counts, e.g. 15n 0h 0s.]

Leaf nodes assign classes (n = no, h = hard, s = soft).
Different leaves can be more reliable.

Trading off coverage for accuracy in decision trees

Estimate the probability that an example belongs to some class, given that it is classified by a particular leaf. Two possibilities:

1. Divide the training data into a learning set and a probability-estimation set. This gives an unbiased estimate of the probability, but not the most accurate tree.
2. Estimate the probability from the training data, using the Laplace estimate of the probability of a class given a leaf:

     p(class = i) = (N_i + 1) / (Σ_j N_j + k)

   where N_i is the number of training examples of class i at the leaf and k is the number of classes. For a leaf with 3 soft, 1 hard, and 0 none: P(soft) = (3 + 1) / (4 + 3) = 4/7 (sketched below).
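
The Laplace estimate is a one-liner; the function name is ours:

```python
def laplace_leaf_probability(counts, cls):
    """Laplace estimate of P(class | leaf) from per-class counts at the leaf.

    counts: dict class -> number of training examples of that class at the leaf.
    """
    k = len(counts)              # number of classes
    total = sum(counts.values()) # examples reaching this leaf
    return (counts[cls] + 1) / (total + k)

# The slide's example: 3 soft, 1 hard, 0 none -> P(soft) = 4/7
print(laplace_leaf_probability({"soft": 3, "hard": 1, "none": 0}, "soft"))  # 0.5714...
```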

Breast Cancer, ID3

[Plot: accuracy and coverage vs. maximum probability threshold.]

First Order Combined Learner (FOCL)

Learns a set of first-order Horn clauses (like Quinlan's FOIL):

  no_payment_due(P) :- enlisted(P, Org) & armed_forces(Org).
  no_payment_due(P) :- longest_absence_from_school(P, A) & 6 > A & enrolled(P, S, U) & U > 5.
  no_payment_due(P) :- unemployed(P).

Negation as failure. Selects the literal that maximizes information gain:

  Gain = p_1 ( log2( p_1 / (p_1 + n_1) ) − log2( p_0 / (p_0 + n_0) ) )

Averaging multiple models:
  - Learn several different rule sets (stochastically select literals).
  - Assign an example to the class predicted by the majority of rule sets.

Trading off coverage for accuracy: only make a prediction if at least k of the rule sets agree (sketched below).
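
The voting scheme is simple to sketch; this is an illustration of the idea, not the FOCL implementation, and the function name is invented:

```python
from collections import Counter

def vote_with_abstention(predictions, k):
    """Combine class predictions from several rule sets.

    predictions: list of class labels, one per rule set.
    Predict the majority class only if at least k rule sets agree;
    otherwise abstain (return None).
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= k else None

# e.g. 12 rule sets, requiring 9 to agree before making a prediction
print(vote_with_abstention(["above"] * 9 + ["below"] * 3, k=9))  # "above"
print(vote_with_abstention(["above"] * 7 + ["below"] * 5, k=9))  # None
```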

Breast Cancer, FOCL

[Plot: accuracy and coverage vs. number of voters required to agree.]

Promoter, FOCL

[Plot: accuracy and coverage vs. number of voters required to agree.]

HYDRA

Learns a contrasting set of rules:

  no_payment_due(P) :- enlisted(P, Org) & armed_forces(Org). [LS = 4.0]
  no_payment_due(P) :- longest_absence_from_school(P, A) & 6 > A & enrolled(P, S, U) & U > 5. [LS = 3.2]
  no_payment_due(P) :- unemployed(P). [LS = 2.1]
  payment_due(P) :- longest_absence_from_school(P, A) & A > 36. [LS = 2.7]
  payment_due(P) :- not(enrolled(P, _, _)) & not(unemployed(P)). [LS = 4.1]

Attaches a measure of reliability to clauses (logical sufficiency):

  ls_ij = P(clause_ij(t) = true | t ∈ class_i) / P(clause_ij(t) = true | t ∉ class_i)

Assigns an example to the class of the satisfied clause with the highest logical sufficiency.

Trading off coverage for accuracy: only make a prediction if the ratio of logical sufficiencies is greater than a threshold (sketched below).
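
A sketch of the abstention rule, under one plausible reading of the slide: the ratio compares the best class's logical sufficiency to the runner-up's. The names and the threshold are ours:

```python
def hydra_predict(satisfied, ratio_threshold=2.0):
    """Pick the class whose satisfied clause has the highest logical sufficiency.

    satisfied: dict class -> highest LS among that class's satisfied clauses.
    Abstain unless the best LS exceeds the runner-up's by ratio_threshold.
    """
    ranked = sorted(satisfied.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    (best_cls, best_ls), (_, second_ls) = ranked[0], ranked[1]
    if second_ls > 0 and best_ls / second_ls > ratio_threshold:
        return best_cls
    return None  # the competing clauses are too close to call

print(hydra_predict({"no_payment_due": 4.0, "payment_due": 1.5}))  # no_payment_due
print(hydra_predict({"no_payment_due": 4.0, "payment_due": 2.7}))  # None
```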

Breast Cancer, HYDRA

[Plot: accuracy and coverage vs. log(LS ratio) threshold.]

Analysis of the diabetes data sets with classification learners

  Date         Time    Code  Value  Meaning
  02-01-1989   8:00    58    154    Pre-breakfast blood glucose
  02-01-1989   8:00    33    006    Regular insulin dose
  02-01-1989   8:00    34    016    NPH insulin dose
  02-01-1989   11:30   60    083    Pre-lunch blood glucose
  02-01-1989   11:30   33    004    Regular insulin dose
  02-01-1989   16:30   62    102    Pre-supper blood glucose
  02-01-1989   16:30   33    004    Regular insulin dose
  02-01-1989   23:00   48    076    Unspecified blood glucose

Problems with applying machine learning classifiers:
1. There is not a fixed, small number of classes.
2. The data isn't divided into a fixed number of attributes.
3. We know very little about medicine, diabetes, or blood glucose.

If you have a hammer, everything looks like a nail:
1. Predict whether a blood glucose reading is above the mean for the patient.
2. Create attributes and values from the coded data.
3. Come to AIM-94 and be willing to learn.

Converting the diabetes data set into attribute-value format (sketched below)

  Current glucose measurement (the class to predict): above or below the patient's mean.
  CGT: current glucose time (in hours), numeric.
  CGM: current glucose meal (unspecified, breakfast, lunch, supper, or snack).
  CGP: current glucose period (unspecified, pre, or post).
  LGV: last glucose measurement, numeric.
  ELGV: elapsed time since last glucose measurement (in hours), numeric.
  LGM: last glucose meal (unspecified, breakfast, lunch, supper, or snack).
  LGP: last glucose period (unspecified, pre, or post).
  ENPH: elapsed time since last NPH insulin (in hours), numeric.
  NPH: last NPH dose, numeric.
  EREG: elapsed time since last regular insulin (in hours), numeric.
  LREG: last regular dose, numeric.

Ran experiments with patients 20 and 27. Trained on 450 examples, tested on 150.

Sample converted records (attribute values followed by the class label):
  155 PRE BREAKFAST 7.17 PRE LUNCH 16 7.17 6 7.17 15.2 Below
  80 PRE LUNCH 2.83 PRE SUPPER 16 10.0 4 2.83 18.0 Below
  101 UNSPEC UNSPEC 59.0 PRE BREAKFAST 16 72.0 6 62.0 8.0 Above
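
A simplified sketch of this conversion, using the event codes shown in the raw records above (58/60/62/48 for glucose readings, 33 for regular insulin, 34 for NPH). It computes only the numeric attributes and omits the meal/period fields (CGM, CGP, LGM, LGP) for brevity; all names are ours, not the paper's:

```python
from datetime import datetime

GLUCOSE, REGULAR, NPH = {48, 58, 60, 62}, 33, 34  # codes from the raw records

def to_examples(events, patient_mean):
    """events: list of (datetime, code, value) tuples, sorted by time."""
    examples, last_glucose, last_reg, last_nph = [], None, None, None
    for when, code, value in events:
        if code in GLUCOSE:
            if last_glucose is not None:
                hours = lambda t: (when - t[0]).total_seconds() / 3600
                examples.append({
                    "CGT": when.hour + when.minute / 60,        # time of day in hours
                    "LGV": last_glucose[1],
                    "ELGV": hours(last_glucose),
                    "ENPH": hours(last_nph) if last_nph else None,
                    "NPH": last_nph[1] if last_nph else None,
                    "EREG": hours(last_reg) if last_reg else None,
                    "LREG": last_reg[1] if last_reg else None,
                    "class": "Above" if value > patient_mean else "Below",
                })
            last_glucose = (when, value)
        elif code == REGULAR:
            last_reg = (when, value)
        elif code == NPH:
            last_nph = (when, value)
    return examples

# e.g. events = [(datetime(1989, 2, 1, 8, 0), 58, 154),
#                (datetime(1989, 2, 1, 8, 0), 33, 6), ...]
```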

Backpropagation results (Patient 27)

[Plot: accuracy and coverage vs. activation threshold.]

FOCL results (Patient 27)

[Plot: accuracy and coverage vs. number of FOCL voters required to agree.]

Decision tree results (Patient 27)

[Plot: accuracy and coverage vs. maximum probability threshold.]

Examples of rules learned by FOCL

  ABOVE if ENPH ≥ 12.0 & LGV ≥ 131 & ENPH < 24.0 & CGM ≠ SUPPER
  ABOVE if LGV ≥ 132 & LREG < 6.5 & CGT ≥ 23.0
  ABOVE if ELGV ≥ 6.5 & LGV < 130 & LGV ≥ 121
  ABOVE if ELGV < 56.0 & LGV < 83 & ENPH ≥ 24.0
  ABOVE if LGV ≥ 163 & CGP = UNSPECIFIED & LGV < 181
  ABOVE if LGV ≥ 131 & LGV < 147 & CGM = LUNCH
  ABOVE if ENPH ≥ 12.0 & LGV ≥ 131 & CGT ≥ 8.0 & LGV < 142
  ABOVE if ELGV ≥ 6.5 & LGV ≥ 191 & ELGV < 10.5
  ABOVE if ELGV ≥ 6.5 & CGT ≥ 8.0 & LGV < 90
  ABOVE if LGV ≥ 96 & LGV < 118 & ELGV ≥ 4.5 & ENPH ≥ 14.5
  ABOVE if LGV ≥ 128 & ENPH ≥ 5.0 & ELGV < 5.5 & LGV < 147
  ABOVE if ENPH ≥ 5.0 & LGV ≥ 189 & ELGV < 4.0
  ABOVE if LREG ≥ 7.5 & ENPH < 11.5 & LGV < 147
  ABOVE if LGV ≥ 128 & ELGV ≥ 33.5 & CGT < 7.5

For example, the rule "ABOVE if ENPH ≥ 5.0 & LGV ≥ 189 & ELGV < 4.0" reads: predict above if at least 5 hours have elapsed since the last NPH insulin, the last glucose measurement was above 189, and it has been less than 4 hours since that last measurement.

Conclusions

Experimentally evaluated trading off coverage for accuracy in machine learning classifiers.

Rather than forcing problems to be classification problems, an important issue is to identify new classes of learning problems, with:
  - Different goals
  - Different example representations

We also do research in:
  - Reducing the cost of misclassification errors
  - Knowledge-guided induction