A DATA MINING APPROACH FOR PRECISE DIAGNOSIS OF DENGUE FEVER

Similar documents
Predicting Breast Cancer Survivability Rates

Early Detection of Dengue Using Machine Learning Algorithms

Evaluating Classifiers for Disease Gene Discovery

Analysis of Classification Algorithms towards Breast Tissue Data Set

FORECASTING MYOCARDIAL INFARCTION USING MACHINE LEARNING ALGORITHMS

Performance Evaluation of Machine Learning Algorithms in the Classification of Parkinson Disease Using Voice Attributes

FUZZY DATA MINING FOR HEART DISEASE DIAGNOSIS

Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool

Classification of Smoking Status: The Case of Turkey

Prediction of Dengue, Diabetes and Swine Flu Using Random Forest Classification Algorithm

International Journal of Pharma and Bio Sciences A NOVEL SUBSET SELECTION FOR CLASSIFICATION OF DIABETES DATASET BY ITERATIVE METHODS ABSTRACT

Comparative Analysis of Machine Learning Algorithms for Chronic Kidney Disease Detection using Weka

An Improved Algorithm To Predict Recurrence Of Breast Cancer

International Journal of Pharma and Bio Sciences A STUDY OF CLINCAL PROFILE IN DENGUE CASES ABSTRACT

Data Mining with Weka

An Experimental Study of Diabetes Disease Prediction System Using Classification Techniques

PREDICTION OF BREAST CANCER USING STACKING ENSEMBLE APPROACH

Mining Big Data: Breast Cancer Prediction using DT - SVM Hybrid Model

Data Mining and Knowledge Discovery: Practice Notes

Data Mining Techniques to Predict Survival of Metastatic Breast Cancer Patients

Classification of Dengue Outbreak Using Data Mining

Predictive Modeling of Terrorist Attacks Using Machine Learning

Colon cancer survival prediction using ensemble data mining on SEER data

HEART DISEASE PREDICTION USING DATA MINING TECHNIQUES

An Approach for Diabetes Detection using Data Mining Classification Techniques

Diagnosis of Breast Cancer Using Ensemble of Data Mining Classification Methods

Effective Values of Physical Features for Type-2 Diabetic and Non-diabetic Patients Classifying Case Study: Shiraz University of Medical Sciences

Automatic Detection of Epileptic Seizures in EEG Using Machine Learning Methods

Australian Journal of Basic and Applied Sciences

International Journal of Advance Research in Computer Science and Management Studies

Analysis of Diabetic Dataset and Developing Prediction Model by using Hive and R

Prediction Models of Diabetes Diseases Based on Heterogeneous Multiple Classifiers

Effect of Feedforward Back Propagation Neural Network for Breast Tumor Classification

IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 3 Issue 2, February

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction

Dengue Disease Prediction Using SMO Classification

Comparative study of Naïve Bayes Classifier and KNN for Tuberculosis

Sudden Cardiac Arrest Prediction Using Predictive Analytics

CANCER PREDICTION SYSTEM USING DATAMINING TECHNIQUES

Multi Parametric Approach Using Fuzzification On Heart Disease Analysis Upasana Juneja #1, Deepti #2 *

A Critical Study of Classification Algorithms for LungCancer Disease Detection and Diagnosis

Improved Intelligent Classification Technique Based On Support Vector Machines

Diagnosing Diabetes using Data Mining Techniques

Classification and Predication of Breast Cancer Risk Factors Using Id3

COMP90049 Knowledge Technologies

Performance Analysis of Decision Tree Algorithms for Breast Cancer Classification

Ciguatera Fish Poisoning & Dengue Fever

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017

Rohit Miri Asst. Professor Department of Computer Science & Engineering Dr. C.V. Raman Institute of Science & Technology Bilaspur, India

Analysis of Cow Culling Data with a Machine Learning Workbench. by Rhys E. DeWar 1 and Robert J. McQueen 2. Working Paper 95/1 January, 1995

A Deep Learning Approach to Identify Diabetes

Detection of Abnormalities of Retina Due to Diabetic Retinopathy and Age Related Macular Degeneration Using SVM

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES

INTRODUCTION TO MACHINE LEARNING. Decision tree learning

Credal decision trees in noisy domains

Impute vs. Ignore: Missing Values for Prediction

Performance Based Evaluation of Various Machine Learning Classification Techniques for Chronic Kidney Disease Diagnosis

Classıfıcatıon of Dıabetes Dısease Usıng Backpropagatıon and Radıal Basıs Functıon Network

CE Unit. Viruses and Vaccines

Clinical Profile of the Dengue Infection in Children

Predicting the Likelihood of Heart Failure with a Multi Level Risk Assessment Using Decision Tree

Decision-Tree Based Classifier for Telehealth Service. DECISION SCIENCES INSTITUTE A Decision-Tree Based Classifier for Providing Telehealth Services

Where is Yellow Fever found?

Vectors and Virulence

ParkDiag: A Tool to Predict Parkinson Disease using Data Mining Techniques from Voice Data

Analysis of Data for Diabetics Patient

Headache Disease Type Classification and Predicting System using Data Mining Techniques

Traffic Accident Analysis Using Decision Trees and Neural Networks

DEVELOPING A PREDICTED MODEL FOR DIABETES TYPE 2 TREATMENT PLANS BY USING DATA MINING

A scored AUC Metric for Classifier Evaluation and Selection

Predicting Breast Cancer Recurrence Using Machine Learning Techniques

Stage-Specific Predictive Models for Cancer Survivability

Genetic Algorithm based Feature Extraction for ECG Signal Classification using Neural Network

Diagnosis of Coronary Arteries Stenosis Using Data Mining

Efficient AUC Optimization for Information Ranking Applications

Remarks on Bayesian Control Charts

A Feed-Forward Neural Network Model For The Accurate Prediction Of Diabetes Mellitus

CANCER DIAGNOSIS USING DATA MINING TECHNOLOGY

Repellent Soap. The Jojoo Mosquito. Africa s innovative solution to Malaria prevention. Sapphire Trading Company Ltd

Arboviruses in Florida. Carina Blackmore, DVM, PhD Florida Department of Health Bureau of Environmental Public Health Medicine

Predicting Juvenile Diabetes from Clinical Test Results

EBOLA & OTHER VIRUSES IN THE NEWS EBOLA VIRUS, CHIKUNGUNYA VIRUS, & ENTEROVIRUS D68

July Hepatitis Monthly Awareness Toolkit

Machine learning model for HIV1 and HIV2 enzyme secondary structure classification

INVESTIGATING ILPD FOR MOST SIGNIFICANT FEATURES

Knowledge Discovery and Data Mining. Testing. Performance Measures. Notes. Lecture 15 - ROC, AUC & Lift. Tom Kelsey. Notes

Binary Classification of Cognitive Disorders: Investigation on the Effects of Protein Sequence Properties in Alzheimer s and Parkinson s Disease

Modelling the risk of dengue for tourists in Rio de Janeiro, during. Janeiro, during the FIFA confederation cup in Brazil 2013

Efficacy of the Extended Principal Orthogonal Decomposition Method on DNA Microarray Data in Cancer Detection

Predicting Breast Cancer Survival Using Treatment and Patient Factors

Ineffectiveness of Use of Software Science Metrics as Predictors of Defects in Object Oriented Software

Predicting the Effect of Diabetes on Kidney using Classification in Tanagra

A Rough Set Theory Approach to Diabetes

Mercer MRC A Newsletter for and about our volunteers

Applying Data Mining for Epileptic Seizure Detection

Liver Functions as a marker of severity of disease in children with Dengue fever

Decision Tree approach in Machine Learning for Prediction of Cervical Cancer Stages using WEKA

Prediction of Diabetes Using Probability Approach

Dengue Virus-Danger from Deadly Little Dragon

Transcription:

A DATA MINING APPROACH FOR PRECISE DIAGNOSIS OF DENGUE FEVER M.Bhavani 1 and S.Vinod kumar 2 International Journal of Latest Trends in Engineering and Technology Vol.(7)Issue(4), pp.352-359 DOI: http://dx.doi.org/10.21172/1.74.048 e-issn:2278-621x Abstract- Dengue is a eviscerate disease common in tropical countries. It is also known as break-bone fever. Dataset for dengue gives information about the patient suffering with the dengue disease. The Dataset consist of attribute like fever, bleeding, metallic taste, Fatigue. The main objective of this study is to calculate the performance of various classification Techniques and compare their performance. The classification techniques used in this study are REP Tree, J48, SMO, ZeroR and Random Tree. The performance of classification techniques were compared by plotting graphs and table. Weka the data mining tool is used for the classification. Keywords Data mining, Dataset, Classification, break bone fever, Weka. I. INTRODUCTION Data mining is the process of analyzing data from different aspect and converting it into useful information. Data mining is a well- known technique used by health organizations for classification of diseases such as dengue, diabetes and cancer in bioinformatics research. Dengue fever is a painful, debilitating mosquito-borne disease caused by dengue viruses. Nearly 390 million dengue infections occur worldwide each year, with about 96 million resulting in illness. Dengue is classified into two types namely type1 and type2.first one is called classical dengue and the other one is called dengue hemorrhagic fever(dhf).dengue hemorrhage Fever is further classified into DHF I,DHF II,DHF III,DHF IV. Dengue Fever transmitted by the bite of an Aedes mosquito infected with a dengue virus. The mosquito becomes infected when it bites a person with dengue virus in their blood and it can be transmitted to other healthy person. It can t be spread directly from one person to another person. Symptoms always starts after four to six days from the day of infection and it will last for 10 days. The symptoms such as sudden fever, severe headaches, pain behind the eyes, Severe joint and muscle pain, Fatigue, nausea, vomiting, skin rash, mild bleeding are common among the patients. Sometimes, symptoms will be mild and it can be predicted as other viral infection such as flu, hence the accurate prediction of the fever is needed. At the beginning stages it is difficult to differentiate the dengue fever and dengue hemorraghic fever. Several Data Mining Techniques are used for the prediction of Dengue Fever. 1 Department of Computer Science and Engineering Rajalakshmi Engineering College, Chennai 2 Department of Computer Science and Engineering Rajalakshmi Engineering College, Chennai.

A Data mining Approach for Precise Diagnosis of Dengue Fever 353 II. METHODOLOGY Weka (Waikato Environment for Knowledge Analysis) is a data mining tool written in java developed at Waikato. WEKA is a very good data mining tool for the users to classify the accuracy on the basis of datasets by applying different algorithmic approaches and compared in the field of bioinformatics [6].It is also well-suited for developing new machine learning schemes. Our main objective is to identify that whether the patient is affected by Dengue or not. Some of the parameter are used for predicting the fever and compare the performance of the various classification techniques. The Various result obtained from the dataset using Weka tool are given below. Figure-1 Weka. Figure-2.SMO classifier. Figure-3: REP Tree classifier. Figure-4.Random Tree classifier. Figure-5 J48 classifier. Figure-6 ZeroR classifier.

M.Bhavani and S.Vinod kumar 354 The main objective of this study is dengue disease prediction using data mining tool. The main task carried out in this study are: 1. Various Data mining classification techniques are used for the Prediction the dengue fever. 2. Comparing different classification techniques. 3. Finding best algorithm for the disease prediction. A. Classification It is the organization of data in given classes. Classification uses given class labels to order the objects in the data collection. Classification approaches normally use a training set where all objects are already associated with known class labels. The classification algorithm learns from the training set and builds a model. The model is used to classify new objects. Five techniques have been used in this study. They are REP Tree, RT, J48, ZeroR and SMO. Their performance was analyzed using some s such as Accuracy, TP Rate, FP Rate, and ROC Area. B. Dataset Dataset is a collection of data. The dataset is contained in database table. The dataset is created in the CSV format. The dataset contains the attribute such as fever, bleeding, myalgia, flu, fatigue, pain, metallic taste. The symptoms are used as Attribute. The dataset loaded in the Weka tool in ARRF format and the performance of the classification techniques are identified and tabulated. Table -1 ATTRIBUTE DESCRIPTION Attribute EPID Fever Bleeding Myalgia Flu Pain Joint/Muscle pain Metallic Taste Result Fatigue possible values any alpha-numeric values Positive or Negative

A Data mining Approach for Precise Diagnosis of Dengue Fever 355 Figure-7 DATASET III. EXPERIMENTS AND RESULT There are numerous techniques available for the prediction of the dengue fever. The classification technique used in this study are following: REP Tree. J48. SMO. ZeroR. Random Tree. SMO: Sequential Minimal Optimization (SMO) is one way to solve the SVM training problem that is more efficient than standard QP solvers. The Sequential Minimal Optimization is commonly called as Platt s SMO algorithm and it is well organized with good computational efficiency. SMO uses heuristics to partition the training problem into smaller problems that can be solved analytically. Whether or not it works well depends largely on the assumptions behind the heuristics (working set selection). We got the outcomes mentioned in the (Table 4).various s obtained as a result was plotted as a graph (Figure 8). J48: C4.5 is the technique used to create a decision ID3 developed by Ross Quinlan. C4.5 is an addition of Quinlan s earlier ID3 Technique. The decision trees created by C4.5 can also be used for classification, and for this purpose, C4.5 is often stated to as an arithmetical classifier [6]. The commonly used method is information gain or entropy in which the that corresponds to level uncertainty in the information. Thus it is like tree structure with root node, intermediate and leaf nodes. Node holds the decision and helps to achieve our result. The dataset is evaluated in Weka Data Mining tool with J48 technique; we got the outcomes

M.Bhavani and S.Vinod kumar 356 mentioned in the Table (Table 3).various s obtained as a result was plotted as a graph (Figure 12). REP Tree: REP Tree is a fast decision tree learner. Builds a decision/regression tree using entropy as impurity and prunes it using reduced-error pruning. It only sorts values for numeric attributes once. The dataset is evaluated in Weka Data Mining tool with REP Tree technique; we got the outcomes mentioned in the Table (Table 2).various s obtained as a result was plotted as a graph (Figure 8). Random Tree: Random Tree is an ensemble learning algorithm that generates many individual learners. It is an algorithm for constructing a tree that considers K random features at each node. It involves a bagging idea to produce a random set of data for constructing a decision tree. In standard tree each node is split using the best split among all variables. The dataset is evaluated in Weka Data Mining tool with Random Tree technique; we got the outcomes mentioned in the Table (Table 2).various s obtained as a result was plotted as a graph (Figure 11). ZeroR: ZeroR is the simplest classification method which relies on the target and ignores all predictors. ZeroR classifier simply predicts the majority category (class). The dataset is evaluated in Weka Data Mining tool with ZeroR technique; we got the outcomes mentioned in the Table (Table 5).various s obtained as a result was plotted as a graph (Figure 10).

A Data mining Approach for Precise Diagnosis of Dengue Fever 357 Table-2 REP TREE Table-3 J48 Attributes name Attributes name Correctly classified 72% Incorrectly classified 28% Relative absolute error 93.05% TP rate 0.72 FP rate 0.72 Precision 0.518 Recall 0.72 F- 0.603 ROC Area 0.46 Correctly classified 76% Incorrectly classified 24% Relative absolute error 66.59% TP rate 0.76 FP rate 0.53 Precision 0.763 Recall 0.76 F- 0.724 ROC Area 0.623

M.Bhavani and S.Vinod kumar 358 Table-4 SMO Table-6 RANDOM TREE Attribute name Attribute name Correctly classified 84% Incorrectly classified 16% Relative absolute error 38.54% TP rate 0.84 FP rate 0.411 Precision 0.869 Recall 0.84 F- 0.816 ROC Area 0.714 Table-5 ZeroR Correctly classified 68% Incorrectly classified 32% Relative absolute error 77.08% TP rate 0.68 FP rate 0.474 Precision 0.68 Recall 0.68 F- 0.68 ROC Area 0.6 Table-7 PERFORMANCE MEASURES Attribute name Correctly classified 72% Incorrectly classified 28% Relative absolute error 100% Techniques TP rate 0.72 FP rate 0.72 Precision 0.72 Recall 0.518 F- 0.603 ROC Area 0.294 Table 8 PERFORMANCE COMPARISON TP Rate ROC Area FP Rate Accuracy SMO 0.84 0.714 0.411 0.84 J48 0.72 0.623 0.53 0.76 REP Tree 0.72 0.46 0.72 0.72 ZeroR 0.72 0.294 0.72 0.72 Random Tree 0.68 0.6 0.47 0.68 Attribute Name Correctly classified Incorrectly classified TP Rate FP Rate ROC area Precision Recall Accuracy F- Description percentage of test instance correctly classified percentage of test instance incorrectly classified true positives: number of examples predicted positive that are actually positive false positives: number of examples predicted positive that are actually negative It is the useful technique for visualizing and selection of classifier based on their performance It is the fraction of retrieved instances that are relavant. It is the fraction of relavant instances that are retrieved A of predictive model that reflects the proportionate number of time that model is correct when applied to dataset It is harmonic mean of precision and recall. IV. COMPARISON In this study, five techniques were used to classify the dengue dataset. After the analysis of each classification techniques, comparison was made between them. Some s like TP Rate, FP Rate, Correctly classified, incorrectly classified and precision were used for comparison; their values are given in the table.(table 8) and the graph is plotted comparing the values obtained as a result (figure 13). From the comparison made, the SMO and J48 give more accuracy than other classifier.

A Data mining Approach for Precise Diagnosis of Dengue Fever 359 V. CONCLUSION The main objective of this study is to predict the dengue fever with high accuracy; five classifiers namely REP Tree, J48, SMO, ZeroR and Random Tree were used for this purpose. These techniques were applied to the dengue dataset using Weka Data mining tool. These techniques were evaluated and their performance was compared. From the analysis, SMO and J48 outperforms well than other algorithms, they achieved the accuracy of 84 % and 76 % respectively. The performance used for comparison are listed in the table (Table 7). REFERENCES: [1] Kamran Shaukat, et.al, Dengue Fever Prediction: A Data Mining Problem, Data Mining Genomics Proteomics, Vol 6, 2015. [2] Farooqi W, Ali S, A Critical Study of Selected Classification Algorithms for Dengue Fever and Dengue Hemorrhagic Fever. Frontiers of Information Technology (FIT), 11th International Conference on IEEE, 2013. [3] Shameem Fathima, Nisar Hundewale, Comparison of Classification Techniques-SVM and Naives Bayes to predict the Arboviral Disease-Dengue, International Conference on Bioinformatics and Biomedicine Workshops on IEEE, 2011. [4] Hlaudi Daniel Masethe, Mosima Anna Masethe, Prediction of Heart Disease using Classification Algorithms, Proceedings of the World Congress on Engineering and Computer Science,Vol 2, 2014. [5] Tina R. Patil, Mrs. S.S.Sherekar, Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification, international Journal of Computer Science and Applications, Vol 6, 2013. [6] Shakil KA et.al. (2015), Dengue disease prediction using weka data mining tool, arxiv preprint arxiv: 1502.05167. [7] Stany Leena Princy and A. Muruganandam, An Implementation of Dengue Fever Disease Spread Using Informatica Tool with Special Reference to Dharmapuri District, International Journal of Innovative Research in Computer and Communication Engineering, 2016. [8] Hlaudi Daniel Masethe, Mosima Anna Masethe, Prediction of Heart Disease using Classification Algorithms, World Congress on Engineering and Computer Science, Vol 2, 2014. [9] Jyoti Rohilla, Preeti Gulia, Analysis of Data Mining Techniques for Diagnosing Heart Disease, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, 2015. [10] Dr.T.Christopher, et.al, Study of Classification Algorithm for Lung Cancer Prediction, International Journal of Innovative Science, Engineering & Technology, volume 3, 2016.