PREDICTION OF HEART DISEASE USING HYBRID MODEL: A Computational Approach


1 G V N Vara Prasad, 2 Dr. Kunjam Nageswara Rao, 3 G Sita Ratnam
1 M.Tech Student, 2 Associate Professor
1 Department of Computer Science and Systems Engineering, Andhra University College of Engineering (A), Visakhapatnam, AP, India

Abstract: One in every four deaths in the country is caused by heart disease. The death rate from heart disease is high largely because the disease is not predicted early. Medical databases hold large amounts of patient information that contain hidden patterns. The main aim of this work is to predict heart disease from a dataset with 14 attributes (age, gender, resting blood pressure, cholesterol, chest pain type, fasting blood sugar, resting electrocardiographic results, thalach, etc.) by applying machine learning techniques. The heart disease dataset is taken from the UCI (University of California, Irvine) repository. The proposed methodology combines clustering and classification algorithms to form a hybrid model for heart disease prediction. K-means clustering is used to partition the dataset into two clusters, and multilayer perceptron, random forest, and gradient boosting tree are used as classification algorithms. After the dataset has been clustered, the clustered data is used to train models built with each of the three classifiers. The results obtained by the three models are compared and analysed. The combination of k-means clustering and multilayer perceptron gives the highest accuracy.

Index Terms: Multilayer Perceptron, Classification, Clustering, K-Means, Hybrid Model, Random Forest, Gradient Boosting Tree.

I. Introduction

The heart is one of the strongest muscles in the human body. It beats about 70 times per minute, or roughly one lakh (100,000) times per day, pumping blood with every beat. The heart plays the central role in the circulatory system. During exercise or excitement the heart rate can double. Since 1970, many patients have suffered from cardiovascular disease. The diagnosis of heart disease is based on the signs and symptoms reported and on a physical examination of the patient. Several factors increase the risk of heart disease, such as smoking, cholesterol level, family history of heart disease, obesity, high blood pressure, and lack of physical exercise.

The term "heart disease" is often used interchangeably with "cardiovascular disease". Cardiovascular disease generally refers to conditions that involve narrowed or blocked blood vessels that can lead to a heart attack, chest pain (angina), or stroke. Other heart conditions, such as those that affect the heart's muscle, valves, or rhythm, are also considered forms of heart disease. Heart disease is the number one killer of women: more than 8.6 million women die of cardiovascular disease (CVD), including heart attack and stroke, around the world each year. Heart attacks claim 3.3 million lives. Heart disease doesn't discriminate. It's the leading cause of death for several populations, including Caucasians, Hispanics, and African-Americans. Almost half of Americans are at risk of heart disease, and the numbers are rising. While heart disease can be deadly, it's also preventable in most people. By adopting healthy lifestyle habits early, people can potentially live longer with a healthier heart.

Heart disease has been the leading cause of death worldwide over the past 10 years. The European Public Health Alliance reports that heart attacks, strokes, and other circulatory diseases account for 41% of all deaths. The Australian Bureau of Statistics reports that heart and circulatory system diseases are the leading cause of death in Australia, accounting for 33.7% of deaths.
Motivated by the worldwide increase in mortality among heart disease patients each year, and by the availability of large amounts of patient data from which useful knowledge can be extracted, researchers have been using data mining techniques to help health care professionals diagnose heart disease. The American Heart Association states that around 65% of diabetics die from heart disease or stroke, which emphasizes the need for new research on the links between these conditions. There is also evidence that obesity, a sedentary lifestyle, and poor blood glucose control contribute to an increased chance of high blood pressure. Before menopause, women usually have a lower risk of heart disease than men of the same age. However, women of all ages with diabetes have an increased risk of heart disease, because diabetes cancels out the protective effects of being a woman in her childbearing years. In fact, cardiovascular disease leading to heart attack or stroke is by far the leading cause of death among both men and women with diabetes. Another major component of cardiovascular disease is poor circulation in the legs, which contributes to a greatly increased risk of foot ulcers and amputations. Control of the ABCs of diabetes (A1C, blood pressure, and cholesterol) can reduce the risk of heart disease.

IJRTI1812020 International Journal for Research Trends and Innovation (www.ijrti.org) 114

II. Literature Review

The backpropagation algorithm has been used to train a neural network for the diagnosis and prediction of neonatal diseases. The study compared multilayer perceptron (MLP), Quick Propagation, and Conjugate Gradient Descent in terms of prediction accuracy. The ANN architecture trained with backpropagation was tested on various categories of neonatal disease, and the study reports that ANN-based prediction reached a diagnostic accuracy of 75% with higher stability [1].

A cardiovascular disease dataset has been analysed with several data mining algorithms: Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree, and the RIPPER classifier. Their performance was compared using sensitivity, specificity, accuracy, error rate, true positive rate, and false positive rate. The accuracies of ANN, RIPPER, Decision Tree, and SVM are 80.06%, 81.08%, 79.05%, and 84.12% respectively [2]. Of these four classification models, SVM predicts cardiovascular disease with the lowest error rate and the highest accuracy.

A decision support system for the diagnosis of congenital heart disease has been proposed. Its core is a backpropagation neural network (a multilayer feed-forward neural network), and the system achieved an accuracy of 90% [3].

A neural network has been used for the prediction of heart disease, blood pressure, and sugar. The authors used a supervised network for the diagnosis of heart disease and trained it with the backpropagation algorithm. When a doctor enters unknown data, the system matches it against the training data and generates a list of possible diseases from which the patient may suffer [4].

A prototype Intelligent Heart Disease Prediction System (IHDPS) based on data mining techniques has been proposed. The techniques used are Decision Trees, Naive Bayes, and Neural Network.
The proposed models are developed on the .NET platform. The benchmark dataset has several attributes, such as age, sex, blood pressure, and blood sugar, which are used to predict the likelihood of a patient getting heart disease [5].

An efficient methodology has been presented for extracting significant patterns from heart disease data warehouses for heart attack prediction. In the first step, the data warehouse is pre-processed to make it suitable for mining. The k-means clustering algorithm is then applied to cluster the heart disease warehouse. Next, the frequent patterns relevant to heart disease are mined from the extracted data with MAFIA (Maximal Frequent Itemset Algorithm). The patterns vital to heart attack prediction are then selected on the basis of their computed significance weightage. Finally, a neural network is trained with the selected significant patterns for effective prediction of heart attack [6].

Lung cancer survival has been modelled using an adaptive neuro-fuzzy inference system. The approach combines the learning capabilities of a neural network with the reasoning capabilities of fuzzy logic to give better prediction than either methodology alone [17].

III. Proposed Methodology

Data Collection

The dataset is the publicly available heart disease dataset from the UCI (University of California, Irvine) repository. Using medical profile attributes such as age, sex, blood pressure, and blood sugar, it can be used to predict the likelihood of a patient getting heart disease. The dataset contains 303 records with 14 columns, each column describing one attribute of a patient. It enables the discovery of significant knowledge, such as patterns and relationships between medical factors related to heart disease.

Attribute Description

1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
   -- Value 1: typical angina
   -- Value 2: atypical angina
   -- Value 3: non-anginal pain
   -- Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
   -- Value 0: normal
   -- Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
   -- Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
   -- Value 1: upsloping
   -- Value 2: flat
   -- Value 3: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. num: diagnosis of heart disease (angiographic disease status)
   -- Value 0: < 50% diameter narrowing
   -- Value 1: > 50% diameter narrowing

Data preprocessing

The data may need to be cleaned and filtered before mining. This involves removing duplicate records, normalizing the values used to represent information in the dataset, accounting for missing data points, and removing unneeded data fields. The data is then transformed into a form suitable for the mining process: the raw data is converted into datasets with a few appropriate characteristics. In our approach, the heart disease dataset is refined by removing duplicate records and filling in missing values. It is also transformed into a form appropriate for clustering.

Model Building

In model building, different algorithms are trained on the heart disease dataset for heart disease prediction. The proposed system uses the multilayer perceptron, random forest, and gradient boosted tree algorithms to train the models. 85% of the records are used for training and the remaining 15% for testing. To improve the accuracy of the proposed model, a sampling technique is used to balance the dataset, either by increasing the minority-class samples or by decreasing the majority-class samples. The models are then trained again and compared.
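As an illustration of the 85%/15% split described above, the following minimal sketch shuffles a list of records and holds out 15% for testing. The paper does not say how its split was drawn, so the function name `train_test_split`, the fixed seed, and the rounding-up of the test size are assumptions made here; integers stand in for the 303 patient records.

```python
import math
import random


def train_test_split(records, test_fraction=0.15, seed=0):
    """Shuffle the records and split them into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    # Assumption: the test size is rounded up to a whole record.
    n_test = math.ceil(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 303-record dataset: integers instead of patient rows.
records = list(range(303))
train, test = train_test_split(records)
print(len(train), len(test))  # 257 46
```

With 303 records this gives 46 test records, which agrees with the total of each confusion matrix the paper reports before clustering.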
After the dataset is balanced, k-means clustering is applied to separate the data into two clusters. The clustered data is concatenated with the dataset, and the multilayer perceptron, random forest, and gradient boosted tree algorithms are trained again on the result to form a hybrid model with better accuracy. The system architecture is shown in Figure 1.

Model testing

Model testing is necessary to check the performance of the model on data that was not used for training. The proposed system tests the trained models on the test data to compare their performance.
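The hybrid step described above can be sketched as follows. This is one possible reading of the text, not the authors' implementation: "concatenating the clustered data with the dataset" is interpreted here as appending each record's cluster index as an extra feature column. The function names, the farthest-first initialisation, and the toy feature values are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k=2, max_iterations=100):
    """Lloyd's algorithm; returns one cluster index (0..k-1) per point."""
    # Assumption: farthest-first initialisation, chosen here for determinism.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy feature rows (hypothetical values, not taken from the UCI dataset).
features = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
cluster_ids = kmeans_labels(features, k=2)

# Append the cluster index as an extra column for the downstream classifier.
augmented = [row + [cluster_id] for row, cluster_id in zip(features, cluster_ids)]
print(augmented)
```

The augmented rows would then be passed to the classifier (multilayer perceptron, random forest, or gradient boosting tree) in place of the original feature rows.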

[Figure 1. System Architecture. Blocks: UCI Repository, Data Collection, Data Preprocessing, Clustering, Sampling, Model Building, Model Testing, Result Analysis]

Clustering

Clustering is a standard procedure in multivariate data analysis. It is designed to explore the inherent natural structure of the data objects, so that objects in the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible. The equivalence classes induced by the clusters provide a means of generalizing over the data objects and their features. Clustering methods are applied in many domains, such as medical research, psychology, economics, and pattern recognition.

Sampling

Sampling is a technique for balancing an imbalanced dataset. It balances the dataset either by increasing the minority-class samples or by decreasing the majority-class samples. This ensures that the training data contains an equal number of records from the two classes.
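The paper does not name the sampling method it uses. As one simple way to "increase the minority samples", the sketch below applies random oversampling: minority-class rows are duplicated at random until every class is as large as the biggest one. The function name `random_oversample`, the seed, and the toy rows are assumptions for illustration only.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw (with replacement) as many extra rows as the class is short of the target.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (hypothetical): three records of class 0 and two of class 1.
rows = [[63], [45], [58], [51], [39]]
labels = [0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 3 3
```

Undersampling would be the mirror image: rows of the majority class are dropped at random until both classes reach the size of the smaller one.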

Algorithm

Step 1: Start.
Step 2: Collect the heart disease dataset from the UCI repository.
Step 3: Preprocess the collected data.
Step 4: Separate the data into training data and testing data.
Step 5: Apply the k-means clustering algorithm to the dataset to cluster the data into two classes.
Step 6: Concatenate the clustered data with the dataset.
Step 7: Train the multilayer perceptron, random forest, and gradient boosting tree with the clustered data.
Step 8: Test the models with the testing data.
Step 9: Stop.

[Figure 2. Flowchart]

IV. Results

The input to the model is the heart disease dataset taken from the UCI repository, which has 14 attributes (age, gender, resting blood pressure, cholesterol, chest pain type, fasting blood sugar, resting electrocardiographic results, thalach, etc.). When the classification algorithms (multilayer perceptron, random forest, and gradient boosting tree) are trained on the input dataset before clustering, the results shown in Tables 2 to 5 are obtained. The following measures are used.

Accuracy: Accuracy is the most intuitive performance measure. It is the ratio of correctly predicted observations to the total number of observations.
Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision: Precision is the ratio of correctly predicted positive observations to the total number of predicted positive observations.
Precision = TP / (TP + FP)

Recall (Sensitivity): Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.
Recall = TP / (TP + FN)

TP: true positives
TN: true negatives
FP: false positives
FN: false negatives
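The three measures defined above can be computed directly from the four confusion-matrix counts. The sketch below is illustrative; the function name `classification_metrics` and the zero-division guards are choices made here. The counts in the usage example are those the paper reports for the multilayer perceptron before clustering (Table 2).

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Counts reported for the multilayer perceptron before clustering.
accuracy, precision, recall = classification_metrics(tp=15, tn=20, fp=7, fn=4)
print(f"{accuracy:.4f} {precision:.4f} {recall:.4f}")  # 0.7609 0.6818 0.7895
```

The paper lists 0.7608 and 0.7894 for accuracy and recall because it truncates to four decimals, whereas the format string here rounds.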

Table 1. Confusion matrix

                  Predicted Class 1   Predicted Class 2
Class 1 Actual    TN                  FP
Class 2 Actual    FN                  TP

Table 1 is a template for the confusion matrix. It shows the positions of the true negatives, false positives, false negatives, and true positives.

Table 2. Confusion matrix obtained for MLP before clustering

                  Predicted Class 1   Predicted Class 2
Class 1 Actual    20                  7
Class 2 Actual    4                   15

Table 2 shows the confusion matrix of multilayer perceptron before clustering the dataset. The values 20, 7, 4, and 15 are the true negatives, false positives, false negatives, and true positives. From these values we can compute accuracy, precision, and recall.

Accuracy = (15 + 20) / (15 + 7 + 4 + 20) = 35/46 = 0.7608
Precision = 15 / (15 + 7) = 15/22 = 0.6818
Recall = 15 / (15 + 4) = 15/19 = 0.7894

These values are listed in Table 5.

Table 3. Confusion matrix obtained for RF before clustering

                  Predicted Class 1   Predicted Class 2
Class 1 Actual    17                  10
Class 2 Actual    7                   12

Table 3 shows the confusion matrix of random forest before clustering the dataset. The values 17, 10, 7, and 12 are the true negatives, false positives, false negatives, and true positives. From these values we can compute accuracy, precision, and recall.

Accuracy = (12 + 17) / (12 + 10 + 7 + 17) = 29/46 = 0.6304
Precision = 12 / (12 + 10) = 12/22 = 0.5454
Recall = 12 / (12 + 7) = 12/19 = 0.6315

These values are listed in Table 5.

Table 4. Confusion matrix obtained for GBT before clustering

                  Predicted Class 1   Predicted Class 2
Class 1 Actual    19                  8
Class 2 Actual    2                   17

Table 4 shows the confusion matrix of gradient boosting tree before clustering the dataset. The values 19, 8, 2, and 17 are the true negatives, false positives, false negatives, and true positives. From these values we can compute accuracy, precision, and recall.

Accuracy = (17 + 19) / (17 + 8 + 2 + 19) = 36/46 = 0.7826

Precision = 17 / (17 + 8) = 17/25 = 0.68
Recall = 17 / (17 + 2) = 17/19 = 0.8947

These values are listed in Table 5.

Table 5. Metric values before clustering (%)

Model                     Accuracy   Precision   Recall
Multilayer Perceptron     76.08      68.18       78.94
Random Forest             63.04      54.54       63.15
Gradient Boosting Tree    78.26      68.0        89.47

Table 5 lists the accuracy, precision, and recall of heart disease prediction using the multilayer perceptron, random forest, and gradient boosting tree algorithms. Among these three algorithms, gradient boosting tree (GBT) gives the best accuracy and recall; its precision (68.0) is marginally below that of multilayer perceptron (68.18). GBT has an accuracy of 78.26%, meaning that 78.26% of the test records are classified correctly. Its precision is 68.0%. Precision is the ratio of correctly predicted positive values to the total predicted positive values, so about two thirds of the records GBT predicts as positive are actually positive. Precision is a separate measure from accuracy. The recall of GBT is 89.47%. Recall is the ratio of correctly predicted positive values to the actual positive values.

When the classification algorithms (multilayer perceptron, random forest, and gradient boosting tree) are trained on the input dataset after clustering, the results shown in Tables 6 to 9 are obtained.

Table 6. Confusion matrix obtained for MLP after clustering

                  Predicted Class 1   Predicted Class 2
Class 1 Actual    17                  1
Class 2 Actual    0                   13

Table 6 shows the confusion matrix of multilayer perceptron after clustering the dataset. The values 17, 1, 0, and 13 are the true negatives, false positives, false negatives, and true positives. From these values we can compute accuracy, precision, and recall.

Accuracy = (13 + 17) / (13 + 1 + 0 + 17) = 30/31 = 0.9677
Precision = 13 / (13 + 1) = 13/14 = 0.9285
Recall = 13 / (13 + 0) = 13/13 = 1

These values are listed in Table 9.

Table 7. Confusion matrix obtained for RF after clustering

                  Predicted Class 1   Predicted Class 2
Class 1 Actual    17                  1
Class 2 Actual    3                   10

Table 7 shows the confusion matrix of random forest after clustering the dataset. The values 17, 1, 3, and 10 are the true negatives, false positives, false negatives, and true positives. From these values we can compute accuracy, precision, and recall.

Accuracy = (10 + 17) / (10 + 1 + 3 + 17) = 27/31 = 0.8709
Precision = 10 / (10 + 1) = 10/11 = 0.9090
Recall = 10 / (10 + 3) = 10/13 = 0.7692

These values are listed in Table 9.

Table 8. Confusion matrix obtained for GBT after clustering

                  Predicted Class 1   Predicted Class 2
Class 1 Actual    16                  2
Class 2 Actual    2                   11

Table 8 shows the confusion matrix of gradient boosting tree after clustering the dataset. The values 16, 2, 2, and 11 are the true negatives, false positives, false negatives, and true positives. From these values we can compute accuracy, precision, and recall.

Accuracy = (11 + 16) / (11 + 2 + 2 + 16) = 27/31 = 0.8709
Precision = 11 / (11 + 2) = 11/13 = 0.8461
Recall = 11 / (11 + 2) = 11/13 = 0.8461

These values are listed in Table 9.

Table 9. Metric values after clustering (%)

Model                                            Accuracy   Precision   Recall
Multilayer Perceptron with k-means clustering    96.77      92.85       100
Random Forest with k-means clustering            87.09      90.90       76.92
Gradient Boosting Tree with k-means clustering   87.09      84.61       84.61

Table 9 lists the accuracy, precision, and recall of heart disease prediction using the multilayer perceptron, random forest, and gradient boosting tree algorithms, each combined with k-means clustering as a hybrid model to improve accuracy. Among these three hybrid models, multilayer perceptron with k-means clustering gives the best accuracy, precision, and recall. Its accuracy is 96.77%, meaning that 96.77% of the test records are classified correctly. Its precision is 92.85%, so nearly all of the records it predicts as positive are actually positive. Its recall is 100%, meaning that every actual positive record in the test data is predicted as positive.

(a) Multilayer Perceptron (b) Random Forest

(c) Gradient Boosting Tree
Figure 3. ROC curves after clustering

Figure 3 shows the ROC (receiver operating characteristic) curves of multilayer perceptron, random forest, and gradient boosting tree after clustering. A ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. It is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. Figure 3(a) shows the ROC curve of multilayer perceptron, with an area under the curve of 0.97. Figure 3(b) shows the ROC curve of random forest, with an area of 0.86. Figure 3(c) shows the ROC curve of gradient boosting tree, with an area of 0.87. Of the three, multilayer perceptron has the largest area under the curve. An AUC (area under the curve) of 0.5 corresponds to random guessing, so a model is considered acceptable only if its AUC is above 0.5.

V. Conclusion

Many people die of heart disease, and the death rate is high largely because of a lack of early prediction. Early prediction of heart disease can be accomplished with an efficient model. In this study, the multilayer perceptron, gradient boosted tree, and random forest algorithms are used to predict heart disease. The results indicate that multilayer perceptron outperforms the other models. The proposed model combines classification and clustering methods to improve the accuracy of the algorithm, so that heart disease can be predicted at an early stage. A multilayer perceptron trained with backpropagation, combined with k-means clustering, forms a hybrid approach that is more effective and more accurate than the other methods compared. Early and accurate prediction of heart disease plays a crucial role in successful treatment and can save the lives of thousands of patients. It can be attained by applying more advanced machine learning algorithms to the heart disease dataset.
References

[1] Dilip Roy Chowdhury, Mridula Chatterjee, R. K. Samanta, An Artificial Neural Network Model for Neonatal Disease Diagnosis, International Journal of Artificial Intelligence and Expert Systems (IJAE), Volume (2): Issue (3), 2011.
[2] Milan Kumari, Sunila Godara, Comparative Study of Data Mining Classification Methods in Cardiovascular Disease Prediction, IJCST Vol. (2), Issue (2), June 2011.
[3] Vanisree K, Jyothi Singaraju, Decision Support System for Congenital Heart Disease Diagnosis based on Signs and Symptoms using Neural Networks, International Journal of Computer Applications (0975-8887), Volume 19, No. 6, April 2011.
[4] Niti Guru, Anil Dahiya, Navin Rajpal, Decision Support System for Heart Disease Diagnosis Using Neural Network, Delhi Business Review, Vol. 8, No. 1, January-June 2007.
[5] Sellappan Palaniappan, Rafiah Awang, Intelligent Heart Disease Prediction System Using Data Mining Technique, 978-1-4244-1968-5/08/, 2008 IEEE.
[6] Shanta Kumar B. Patil, Y. S. Kumaraswamy, Intelligent and Effective Heart Attack Prediction System Using Data Mining and Artificial Neural Network, European Journal of Scientific Research, ISSN 1450-216X, Vol. 31, No. 4 (2009), pp. 642-656.
[7] Boris Milovic, Milan Milovic, Prediction and Decision Making in Health Care using Data Mining, International Journal of Public Health Science, vol. 1, no. 2, pp. 69-78, 2012.
[8] Vikas Chaurasia, Saurabh Pal, Data Mining Approach to Detect Heart Diseases, International Journal of Advanced Computer Science and Information Technology, vol. 2, no. 4, pp. 56-66, 2013.
[9] Lashari, SA, Ibrahim, R, Comparative Analysis of Data Mining Techniques for Medical Data Classification, In Proceedings of 4th International Conference on Computing and Informatics, Sarawak, Malaysia, pp. 365-370, 2013.
[10] Sunita Beniwal, Jitender Arora, Classification and Feature Selection Techniques in Data Mining, International Journal of Engineering Research & Technology, vol. 1, no. 6, pp. 1-6, 2012.
[11] Chaitrali S. Dangare, Sulabha S. Apte, A Data Mining Approach For Prediction of Heart Disease Using Neural Networks, International Journal of Computer Engineering and Technology, vol. 3, no. 3, pp. 30-40, 2012.
[12] Chitra, R, Seenivasagam, V, Review of Heart Disease Prediction System Using Data Mining and Hybrid Intelligent Techniques, ICTACT Journal on Soft Computing, vol. 3, no. 4, pp. 605-609, 2013.
[13] Hema Vijay, Heart of the matter, The Hindu Metro Plus Chennai, 2013, viewed 21 October, p. 2.
[14] Ishtake, SH, Sanap, SA, Intelligent Heart disease Prediction System using Data Mining Techniques, International Journal of Healthcare & BioMedical Research, vol. 1, no. 3, pp. 94-101, 2013.
[15] John Peter, T, Somasundaram, KS, Study and Development of Novel Feature Selection Framework for Heart Disease Prediction, International Journal of Scientific and Research Publications, vol. 2, no. 10, pp. 1-7, 2012.
[16] Karthikeyani, V, Parvin Begum, I, Comparative of Data mining classification algorithm in Diabetes disease Prediction, International Journal of Computer Applications, vol. 60, no. 12, pp. 26-31, 2012.
[17] Plachikkad. A. Rehana Bhadhrika et al., Lung Cancer Survival Modelling using Adaptive Neuro-fuzzy Inference System, International Journal of Computer Science and Information Technologies, Vol. 6 (5), pp. 4371-4374, 2015.