COMP90049 Knowledge Technologies


COMP90049 Knowledge Technologies. Introduction to Classification (Lecture Set 4), 2017. Rao Kotagiri, School of Computing and Information Systems, The Melbourne School of Engineering. Some slides are derived from Prof. Vipin Kumar's and modified: http://www-users.cs.umn.edu/~kumar/

What is classification? We are given a collection of training data. Each item of the data contains a set of attributes (features), at least one of which is the class. The goal is to build a model for the class attribute (the dependent variable) as a function of the values of the other attributes (the independent or decision variables). The discovered model (function) is then used to predict the label of previously unseen items. To determine how good the discovered model is, a test set is used: the given data set is usually partitioned into two disjoint sets, where the training set is used to build the model and the test set is used to validate it, for example by the percentage of the time the class label is correctly predicted.
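The partition-then-evaluate protocol described above can be sketched in a few lines of Python (a minimal illustration; the function names `train_test_split` and `accuracy` are chosen here for clarity, not taken from any library):

```python
import random

def train_test_split(data, test_fraction=0.3, seed=0):
    """Partition a labelled dataset into disjoint training and test sets."""
    items = data[:]
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_fraction))
    return items[:cut], items[cut:]

def accuracy(model, test_set):
    """Fraction of test items whose class label is correctly predicted."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)
```

The model here is simply any callable from features to a predicted label; the two sets are disjoint and together cover the whole data set.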

Classification Example

Training Set:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set (Cheat unknown):

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

Training Set -> Learn Classifier -> Model

Given: a set of training tuples and their associated class labels. Each tuple X is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn), and there are K classes C1, C2, ..., CK. Given a tuple X, predict which class X belongs to.

Naïve Bayesian classifier: for each class Ci, estimate the probability P(Ci | X) using Bayes' theorem:

    P(Ci | X) = P(X | Ci) P(Ci) / P(X)
    (posterior = likelihood × prior / evidence)

Predict that X belongs to Ci iff P(Ci | X) is the highest among P(Ck | X) over all K classes. Since P(X) is constant for all classes, only P(X | Ci) P(Ci) needs to be maximized.

Naïve assumption: the attributes are conditionally independent given the class (i.e., no dependence relation between attributes given the condition), so

    P(X | Ci) = P(x1 | Ci) P(x2 | Ci) ... P(xn | Ci)

Given a training data set, what are the probabilities we need to estimate?

Naïve Bayesian Classifier: an example

Headache  Sore    Temperature  Cough  Diagnosis
severe    mild    high         yes    Flu
no        severe  normal       yes    Cold
mild      mild    normal       yes    Flu
mild      no      normal       no     Cold
severe    severe  normal       yes    Flu

Ann comes to the clinic with a severe headache, no soreness, normal temperature and a cough. What does she have? Choose the case with the highest probability:

P(Flu | Headache = severe, Sore = no, Temperature = normal, Cough = yes)
  ∝ P(Flu) × P(Headache = severe | Flu) × P(Sore = no | Flu) × P(Temperature = normal | Flu) × P(Cough = yes | Flu)

P(Cold | Headache = severe, Sore = no, Temperature = normal, Cough = yes)
  ∝ P(Cold) × P(Headache = severe | Cold) × P(Sore = no | Cold) × P(Temperature = normal | Cold) × P(Cough = yes | Cold)

Naïve Bayesian Classifier: an example. We need labelled data to build a classifier:

Headache  Sore    Temperature  Cough  Diagnosis
severe    mild    high         yes    Flu
no        severe  normal       yes    Cold
mild      mild    normal       yes    Flu
mild      no      normal       no     Cold
severe    severe  normal       yes    Flu

We need to estimate the probabilities from the data we have.

Naïve Bayesian Classifier

Headache  Sore    Temperature  Cough  Diagnosis
severe    mild    high         yes    Flu
no        severe  normal       yes    Cold
mild      mild    normal       yes    Flu
mild      no      normal       no     Cold
severe    severe  normal       yes    Flu

P(Flu) = 3/5                            P(Cold) = 2/5
P(Headache = severe | Flu) = 2/3        P(Headache = severe | Cold) = 0/2
P(Headache = mild | Flu) = 1/3          P(Headache = mild | Cold) = 1/2
P(Headache = no | Flu) = 0/3            P(Headache = no | Cold) = 1/2
P(Sore = severe | Flu) = 1/3            P(Sore = severe | Cold) = 1/2
P(Sore = mild | Flu) = 2/3              P(Sore = mild | Cold) = 0/2
P(Sore = no | Flu) = 0/3                P(Sore = no | Cold) = 1/2

Naïve Bayesian Classifier

Headache  Sore    Temperature  Cough  Diagnosis
severe    mild    high         yes    Flu
no        severe  normal       yes    Cold
mild      mild    normal       yes    Flu
mild      no      normal       no     Cold
severe    severe  normal       yes    Flu

P(Flu) = 3/5                            P(Cold) = 2/5
P(Temperature = high | Flu) = 1/3       P(Temperature = high | Cold) = 0/2
P(Temperature = normal | Flu) = 2/3     P(Temperature = normal | Cold) = 2/2
P(Cough = yes | Flu) = 3/3              P(Cough = yes | Cold) = 1/2
P(Cough = no | Flu) = 0/3               P(Cough = no | Cold) = 1/2

P(Flu) = 3/5                            P(Cold) = 2/5
P(Headache = severe | Flu) = 2/3        P(Headache = severe | Cold) = 0/2 ≈ e
P(Headache = mild | Flu) = 1/3          P(Headache = mild | Cold) = 1/2
P(Headache = no | Flu) = 0/3 ≈ e        P(Headache = no | Cold) = 1/2
P(Sore = severe | Flu) = 1/3            P(Sore = severe | Cold) = 1/2
P(Sore = mild | Flu) = 2/3              P(Sore = mild | Cold) = 0/2 ≈ e
P(Sore = no | Flu) = 0/3 ≈ e            P(Sore = no | Cold) = 1/2
P(Temperature = high | Flu) = 1/3       P(Temperature = high | Cold) = 0/2 ≈ e
P(Temperature = normal | Flu) = 2/3     P(Temperature = normal | Cold) = 2/2
P(Cough = yes | Flu) = 3/3              P(Cough = yes | Cold) = 1/2
P(Cough = no | Flu) = 0/3 ≈ e           P(Cough = no | Cold) = 1/2

Here e is a small value, e.g. 10^-7 (one can choose e to be less than 1/n, where n is the number of training instances).

P(Flu | Headache = severe, Sore = no, Temperature = normal, Cough = yes)
  ∝ P(Flu) × P(Headache = severe | Flu) × P(Sore = no | Flu) × P(Temperature = normal | Flu) × P(Cough = yes | Flu)
  = 3/5 × 2/3 × e × 2/3 × 3/3 ≈ 0.27e

P(Cold | Headache = severe, Sore = no, Temperature = normal, Cough = yes)
  ∝ P(Cold) × P(Headache = severe | Cold) × P(Sore = no | Cold) × P(Temperature = normal | Cold) × P(Cough = yes | Cold)
  = 2/5 × e × 1/2 × 2/2 × 1/2 = 0.1e

Since 0.27e > 0.1e, the diagnosis is Flu.
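The hand computation on this slide can be reproduced with a short Python sketch (a toy implementation for the five-row clinic data set, not a production classifier; the helper names `cond_prob` and `score` are illustrative):

```python
# Toy dataset from the slides: (Headache, Sore, Temperature, Cough, Diagnosis)
data = [
    ("severe", "mild",   "high",   "yes", "Flu"),
    ("no",     "severe", "normal", "yes", "Cold"),
    ("mild",   "mild",   "normal", "yes", "Flu"),
    ("mild",   "no",     "normal", "no",  "Cold"),
    ("severe", "severe", "normal", "yes", "Flu"),
]

EPS = 1e-7  # the small value e standing in for zero counts

def cond_prob(attr_index, value, cls):
    """P(attribute = value | class), with zero counts replaced by EPS."""
    in_class = [row for row in data if row[-1] == cls]
    matches = sum(1 for row in in_class if row[attr_index] == value)
    return matches / len(in_class) if matches else EPS

def score(cls, instance):
    """Unnormalised posterior: P(class) times the product of conditionals."""
    s = sum(1 for row in data if row[-1] == cls) / len(data)  # prior
    for i, value in enumerate(instance):
        s *= cond_prob(i, value, cls)
    return s

ann = ("severe", "no", "normal", "yes")
print(score("Flu", ann))   # 3/5 * 2/3 * e * 2/3 * 1  ≈ 0.27e
print(score("Cold", ann))  # 2/5 * e * 1/2 * 1 * 1/2  = 0.1e
```

Running this confirms the slide's conclusion: the Flu score exceeds the Cold score, so Ann is classified as having Flu.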

Naïve Bayesian Classifier

Headache  Sore    Temperature  Cough  Diagnosis
severe    mild    high         yes    Flu
no        severe  normal       yes    Cold
mild      mild    normal       yes    Flu
mild      no      normal       no     Cold
severe    severe  normal       yes    Flu

P(Flu | Headache = severe, Sore = no, Temperature = normal, Cough = yes) = ?
P(Cold | Headache = severe, Sore = no, Temperature = normal, Cough = yes) = ?

Abbreviations: F = Flu, C = Cold, H = Headache, S = Sore, T = Temperature, Cou = Cough; se = severe, mi = mild, nor = normal, hi = high, ye = yes, no = no.

P(F | H = se, S = no, T = nor, Cou = ye)
  = P(F, H = se, S = no, T = nor, Cou = ye) / P(H = se, S = no, T = nor, Cou = ye)
  = P(F) P(H = se | F) P(S = no | F) P(T = nor | F) P(Cou = ye | F) / P(H = se, S = no, T = nor, Cou = ye)

We assume H, S, T and Cou are conditionally independent (the naïve assumption) given that the individual is suffering from Flu or Cold.

Bayesian Network (figure: a network over the variables Diagnosis, Headache, Sore, Temperature and Cough)

Bayesian Network with the naïve assumption (figure: Diagnosis as the sole parent of Headache, Sore, Temperature and Cough)

Naïve Bayesian Classifier: Conclusions. The Naïve Bayesian (NB) classifier is very simple to build, extremely fast to make decisions, and easy to update as new data becomes available. It works well in many application areas, scales easily to large numbers of dimensions (100s) and large data sizes, and it is easy to explain the reason for a decision. One should apply NB first before launching into more sophisticated classification techniques.

The storage required is O(C(1 + Σ_{i=1..D} D_i)): C prior probabilities plus, for each class, D_i conditional probabilities per attribute, where D is the number of dimensions (attributes), D_i is the size of the i-th attribute's domain, and C is the number of classes.
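Taking the storage bound as C priors plus one conditional per class, attribute, and attribute value, the count for the clinic example works out as a quick back-of-the-envelope calculation (`nb_storage` is a hypothetical helper name):

```python
def nb_storage(num_classes, domain_sizes):
    """Number of probabilities a Naive Bayes classifier stores: one prior
    per class, plus one conditional per class per attribute value."""
    return num_classes + num_classes * sum(domain_sizes)

# Clinic example: 2 classes; Headache and Sore take 3 values each,
# Temperature and Cough take 2 each.
print(nb_storage(2, [3, 3, 2, 2]))  # 2 + 2*(3+3+2+2) = 22
```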

How to evaluate classifiers? For a two-class problem there are Positive (P) cases and Negative (N) cases. A classifier may classify a Positive instance as Positive (such cases are called True Positives, TP) or as Negative (False Negatives, FN). Similarly, a Negative instance can be classified as Positive (False Positives, FP) or as Negative (True Negatives, TN).

                  Actual
                  P     N
Predicted   P     TP    FP
            N     FN    TN

How to evaluate classifiers?

Accuracy    = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)   (accuracy with respect to positive cases, also called the true positive rate)
Specificity = TN / (TN + FP)   (accuracy with respect to negative cases)
Recall      = TP / (TP + FN)
Precision   = TP / (TP + FP)
F1 score    = 2 × Recall × Precision / (Recall + Precision)
False negative rate = FN / (TP + FN)
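These measures translate directly into code (a minimal sketch; the `metrics` function and its dictionary of results are illustrative, not from any library):

```python
def metrics(tp, fp, fn, tn):
    """Standard evaluation measures from 2x2 confusion matrix counts."""
    recall = tp / (tp + fn)        # sensitivity / true positive rate
    precision = tp / (tp + fp)
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "precision":   precision,
        "f1":          2 * recall * precision / (recall + precision),
    }

m = metrics(tp=40, fp=10, fn=20, tn=30)
print(m["accuracy"])  # (40+30)/100 = 0.7
```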


Evaluation schemes: Leave-One-Out. Assume we have N data points for which we know the labels. We choose each data point in turn as the test case and the rest as training data. This means we have to train the system N times, and the average performance is computed.

Good points: there is no sampling bias in evaluating the system, and the results are unique and repeatable for a given method. The method also generally gives higher accuracy values, as all N − 1 remaining points are used in training. (We are assuming that more data points means a more accurate classifier can be built; this may not always be true for certain data.)

Bad point: it is infeasible if we have a large data set and the training itself is very expensive.
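Leave-One-Out can be written generically over any learner (a sketch; `train` and `predict` are caller-supplied functions, and the majority-class learner below is only a stand-in to make the example runnable):

```python
from collections import Counter

def leave_one_out(data, train, predict):
    """Train N times, each time holding out one labelled point;
    return the average accuracy over the N held-out predictions."""
    correct = 0
    for i, (features, label) in enumerate(data):
        model = train(data[:i] + data[i + 1:])
        correct += (predict(model, features) == label)
    return correct / len(data)

# Stand-in learner: always predict the majority class of the training rows.
def train_majority(rows):
    return Counter(label for _, label in rows).most_common(1)[0][0]

data = [((0,), "a"), ((1,), "a"), ((2,), "b")]
print(leave_one_out(data, train_majority, lambda model, _: model))
```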

Evaluation schemes: 10-fold cross validation. Assume we have N data points for which we know the labels. We partition the data into 10 (approximately) equal-size partitions. We choose each partition in turn for testing and the remaining 9 partitions for training. This means we have to train the system 10 times, and the average performance is computed.

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

Train with {P2, P3, ..., P10} and test with {P1}
Train with {P1, P3, ..., P10} and test with {P2}
Train with {P1, P2, P4, ..., P10} and test with {P3}
Train with {P1, P2, P3, P5, ..., P10} and test with {P4}
Train with {P1, ..., P4, P6, ..., P10} and test with {P5}
...
Train with {P1, ..., P8, P10} and test with {P9}
Train with {P1, ..., P9} and test with {P10}

Evaluation schemes: 10-fold cross validation, continued.

Good points: we need to train the system only 10 times, unlike Leave-One-Out, which requires training N times.

Bad points: there can be a bias in evaluating the system due to sampling (the way we do the partitioning), that is, how the data is distributed among the 10 partitions. The results will not be unique unless we always partition the data identically; one solution is to repeat the 10-fold cross validation while randomly shuffling the data, say 5 or more times. The results give slightly lower accuracy values, as only 90% of the data is used for training. For small data sets it is not always possible to partition the data such that each partition represents the data IID (independently and identically distributed).
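A generic k-fold loop, following the scheme above, might look as follows (a sketch; shuffling with a fixed seed makes the partitioning repeatable, which addresses the non-uniqueness point, and `train`/`predict` are caller-supplied):

```python
import random

def k_fold_accuracy(data, k, train, predict, seed=0):
    """Average accuracy over k rounds; each partition is the test set once."""
    rows = data[:]
    random.Random(seed).shuffle(rows)        # fixed seed => repeatable partitioning
    folds = [rows[i::k] for i in range(k)]   # k roughly equal partitions
    accs = []
    for i, test_fold in enumerate(folds):
        train_rows = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = train(train_rows)
        hits = sum(predict(model, x) == y for x, y in test_fold)
        accs.append(hits / len(test_fold))
    return sum(accs) / k
```

Each of the k rounds trains on roughly (k−1)/k of the data, matching the "only 90% used for training" point when k = 10.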

Exercise 1

Income  Student?  Credit rating  Buy computer
High    No        Fair           No
High    No        Excellent      No
High    No        Fair           Yes
Medium  No        Fair           Yes
Low     Yes       Excellent      No
Low     Yes       Excellent      Yes
Medium  No        Fair           No

What are the probabilities we need to estimate in a Naïve Bayesian classifier? Will a student with high income and an excellent credit rating buy a computer?

Exercise 2. Given a directed graph over the nodes x1, x2, x3, x4, x5 (see figure), what is p(x1, x2, x3, x4, x5)?

Exercise 3. Given a factorization of p(x1, x2, x3, x4, x5), build a graph for p(x1, x2, x3, x4, x5).