Modelling and Application of Logistic Regression and Artificial Neural Networks Models


Norhazlina Suhaimi a, Adriana Ismail b, Nurul Adyani Ghazali c
a,c School of Ocean Engineering, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia.
b Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia.
norhazlinasuhaimi@gmail.com a, adriana@tmsk.uitm.edu.my b

Abstract
Data mining involves the exploration of large datasets to find valuable information that can aid decision making. This study uses a data mining approach to build predictive models of the blood glucose level of T2DM patients after treatment in a ward. Two data mining predictive models are considered: logistic regression and artificial neural networks (ANNs). The purpose of the study was to compare the performance of logistic regression and ANNs models in identifying risk factors that contribute to the blood glucose level of T2DM patients. The scope of the study is limited to one public hospital in Kelantan, Malaysia. A total of 229 patients with T2DM who had received treatment in a ward between 2008 and 2012 were selected, with ten input variables. Classification accuracy, sensitivity and specificity were measured to evaluate the performance of both models. For the overall dataset, the logistic regression model achieved a classification accuracy of 69.9% with a sensitivity of 50% and a specificity of 70.4%. The ANNs model reached a classification accuracy of 77.3% with a sensitivity of 78.4% and a specificity of 71.8%. After partitioning the dataset, logistic regression achieved a classification accuracy of 71.3% with a sensitivity of 58.3% and a specificity of 73.5%, and the ANNs model reached a classification accuracy of 72.5% with a sensitivity of 72.3% and a specificity of 75.3%. Hence, the ANNs model for the overall dataset had the highest classification accuracy. In the ANNs analysis, five important independent variables were identified for blood glucose level: diastolic blood pressure, platelet, white blood cell, low density lipoprotein and total cholesterol. This study could contribute to the improvement of strategies and planning in hospitals in Malaysia.

Keywords: T2DM, Logistic Regression, Artificial Neural Networks, Accuracy, Risk Factors

1. Introduction
Type 2 diabetes mellitus (T2DM), also known simply as diabetes, is a major disease, especially in developing countries. People are starting to ask what diabetes is and how it can be avoided or prevented. Diabetes occurs when a person has high blood sugar because the pancreas does not produce enough insulin to process the sugar. Insulin is a peptide hormone produced by the beta cells of the pancreas; it regulates the metabolism of carbohydrate and fat and allows blood sugar to be converted into the energy needed for daily life. The World Health Organization guidelines (WHO, 2012) diagnose T2DM when the fasting plasma glucose concentration is ≥7.0 mmol/l (126 mg/dl) or the plasma glucose concentration is ≥11.1 mmol/l (200 mg/dl) 2 hours after drinking 75 g of glucose. According to the National Health and Morbidity Survey (Hasan, 2012), the prevalence of diabetes was 11.6% in 2006 and 15.2% in 2011, so the prevalence of diabetes in Malaysia increased by 3.6 percentage points within five years. This result indicates that T2DM has become a major public health problem in Malaysia. T2DM is a major risk factor for morbidity and mortality due to coronary heart disease, cerebrovascular disease and peripheral vascular disease (Eid et al., 2003). Several complications may arise from T2DM, such as renal dysfunction with microalbuminuria, proteinuria, cataract and neuropathy (Mafauzy, 2005).

Organized by http://worldconferences.net

Data mining is the process of exploring and modelling large datasets to discover previously unknown relationships that yield clear and useful results (Bellazzi and Zupan, 2008). Data mining is useful in the medical field for exploring unknown factors and building predictive models (Chen et al., 2011). According to Bellazzi and Zupan (2006), the most frequently used predictive data mining models in the medical field are decision trees (e.g. C4.5, C5, CART), decision rules, logistic regression, ANNs, support vector machines, the naïve Bayesian classifier, Bayesian networks and k-nearest neighbors. This study compares the predictive ability of logistic regression and ANNs models. Logistic regression is a traditional model that has been especially popular in medical research (Bosevski et al., 2007; Chew et al., 2012). ANNs, by contrast, are a newer model in medical research. ANNs are the preferred tool for many predictive data mining applications because of their power, flexibility and ease of use. They have been applied successfully in various areas of medicine, such as diagnosis systems, model development, forecasting and image analysis (Ganesan et al., 2010; Al-Shayea, 2011).

The objective of this study is to develop a predictive model that identifies the risk factors contributing to the blood glucose level of T2DM patients. To achieve this objective, data mining techniques were applied to verify the important risk factors and to obtain information that might help clinicians in diagnosing the disease.

This paper is organized as follows. Section 2 presents the variables and the methodology for constructing the blood glucose level models. The results are discussed in Section 3. Finally, some concluding remarks are given in Section 4.

2. Methodology
This section explains the process of constructing the models used to identify risk factors associated with T2DM.
2.1 Sources of Data
This study uses real secondary data obtained from the medical data department of a public hospital in Malaysia. The sample consists of patients diagnosed with diabetes who were followed up between 2008 and 2012.

2.2 Description of Variables
The variables were chosen after discussion with a doctor, based on previous studies of diabetes and on the availability of data in the green medical record book. The target (dependent) variable of interest is blood glucose level, a binary variable with two categories: uncontrolled or well controlled. A total of ten variables were used as risk factors and selected for building the predictive models, as shown in Table 1.

Table 1: Description of variables

Variable name             Type         Role    Description
Blood glucose level       Binary       Target  0: Well controlled (fasting plasma glucose <7 mmol/l, or <11 mmol/l 2 hours after drinking 75 g glucose); 1: Uncontrolled (fasting plasma glucose ≥7 mmol/l, or ≥11 mmol/l 2 hours after drinking 75 g glucose)
Age                       Continuous   Input   Age of patient in years
Duration of diabetes      Categorical  Input   Duration in years. 0: <5; 1: 5 to 10; 2: ≥11
Total cholesterol         Continuous   Input   In mmol/l
Hypertension              Categorical  Input   0: No; 1: Yes
Systolic blood pressure   Continuous   Input   In mmHg
Diastolic blood pressure  Continuous   Input   In mmHg
Low density lipoprotein   Continuous   Input   In mmol/l
Platelet                  Continuous   Input   In 10^9 cells per liter
Hemoglobin                Continuous   Input   In g/l
White blood cell          Continuous   Input   In 10^9 cells per liter

2.3 Data Modelling
This section discusses the construction of the logistic regression and artificial neural networks models. The sample consists of 229 patients, of whom 69 (30.1%) had a well-controlled and 160 (69.9%) an uncontrolled blood glucose level. Four models were built in this study. The first two models were built on the overall sample. For the other two, the sample was partitioned into a training sample (80%) and a validation sample (20%); the training sample is used to build the models, while the validation sample is used to validate them. Figure 1 depicts the modelling process for the overall data in IBM SPSS Modeler version 15.0, and Figure 2 the process for the partitioned data.
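The 80:20 partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study performed the partition inside IBM SPSS Modeler, whose exact sampling procedure is not described, so the simple seeded random shuffle, the function name `partition` and the stand-in list of class labels are all hypothetical.

```python
import random

def partition(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into training and validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the 229 patient records: only the class label is
# kept (1 = uncontrolled, 0 = well controlled), using the reported class counts.
labels = [1] * 160 + [0] * 69
train, validation = partition(labels)
print(len(train), len(validation))
```

With 229 records this gives 183 training and 46 validation cases. A plain random split does not guarantee that the 30:70 class ratio is preserved in both parts; a stratified split would be one way to enforce that.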

Figure 1: Data mining process flow diagram for overall sample

Figure 2: Data mining process flow diagram for partitioned sample

The pentagon-shaped nodes show the construction of the models using logistic regression and ANNs. The diamond-shaped nodes show the output of the respective models.

2.3.1 Logistic regression
Logistic regression is a standard statistical approach for analysing a binary outcome. It is sometimes called the logistic model or logit model, and it predicts the probability of occurrence of an event. For a binary dependent variable, the event of interest is coded as 1 and the non-event as 0. The predictor variables (X) may be a mixture of qualitative and quantitative variables. The logistic regression model is derived from the logistic function, where z is written as a linear combination of the X's:

$$z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \tag{1}$$

The linear expression for z is then substituted into f(z):

$$f(z) = \frac{1}{1 + e^{-z}} \tag{2}$$

Therefore, the logistic regression model is written as:

$$\log\left[\frac{P(Y=1)}{1 - P(Y=1)}\right] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \tag{3}$$

Multiple logistic regression was performed to determine the risk factors associated with T2DM. A variable with a two-tailed p-value less than α = 0.05 was considered a significant risk factor, indicating that the independent variable has an effect on the outcome.

2.3.2 Artificial Neural Networks
Artificial neural networks (ANNs) are a newer, modern model in medical research. ANNs are commonly described as biologically inspired, highly sophisticated analytical techniques capable of modelling extremely complex non-linear functions. They are based on the structure of the brain and are one of the methods of artificial intelligence. ANNs used in predictive applications include the multilayer perceptron (MLP) and radial basis function (RBF) networks. MLP was used in this study because it can minimize the prediction error of the target variable. ANNs are often represented by a diagram. The circles in the diagram are called nodes (or units), and the lines connecting the nodes are known as connection weights. The weight represents the level of information. The structure of an ANN is called its architecture and is usually organized in layers consisting of units. The model has three layers of neurons, each a series of nodes: input, hidden and output. Each neuron receives inputs and has an internal state called its activation. Figure 3 shows a typical network as described by Su et al. (2006), with n input nodes, a single hidden layer and s output nodes in a single output layer.

Figure 3: A typical neural network (Source: Su et al., 2006)

2.4 Evaluating Model Performance
The performance of the logistic regression and ANNs models was evaluated according to their accuracy, sensitivity and specificity (Su et al., 2006; Meng et al., 2012). Sensitivity (true positive ratio) and specificity (true negative ratio) were calculated from the confusion matrix. The confusion matrix has four subsets, shown in Table 2:

True positive (TP): Correct classification of positive cases.
True negative (TN): Correct classification of negative cases.
False positive (FP): Incorrect classification of negative cases into the positive class.
False negative (FN): Incorrect classification of positive cases into the negative class.

Let TOTAL = TP + FP + TN + FN. Then

$$\text{Accuracy} = \frac{TP + TN}{TOTAL}$$

Table 2: Confusion matrix

                   Predicted Good   Predicted Bad   Total actual
Actual Good        TP               FN              TP + FN
Actual Bad         FP               TN              FP + TN
Total predicted    TP + FP          FN + TN         TOTAL

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

The model with the highest accuracy, sensitivity and specificity was chosen as the best model for predicting the risk factors for T2DM.

3. Results
In this section the results of the predictive models are presented.

3.1 Logistic Regression Results
- For Overall Sample
The sample for this study consists of 229 cases with no missing values. The logistic regression model achieved a classification accuracy of 69.9% with a sensitivity of 50% and a specificity of 70.4%. Table 3 shows that only duration of diabetes was selected as a potential risk factor for T2DM, since its p-value is less than 0.05. The Exp(B) for duration of diabetes is greater than 1, indicating that patients with a shorter duration of diabetes (<11 years) are more likely to have an uncontrolled blood glucose level than those with a longer duration. Patients with a duration of diabetes of 6 to 10 years are about 2.8 times (Exp(B) = 2.825) more likely to have an uncontrolled blood glucose level than those with a duration of more than 11 years.

Table 3: Variables in the equation

Variable          B         Exp(B)
Constant          2.054     7.795
HPT              -0.505     0.603
Systolic         -0.012     0.988
Diastolic         0.011     1.011
LDL              -0.095     0.910
TC                0.170     1.185
Platelet         -0.001     0.999
HGB               0.002     1.002
WBC              -0.017     0.983
Duration (1)      0.293*    1.341
Duration (2)      1.039*    2.825
Age              -0.015     0.986

Chi-square        5.582*
-2LL              266.927
Nagelkerke R-sq   0.080

- For Partitioned Sample
The dataset was partitioned into an 80% training sample and a 20% testing sample. This model achieved a classification accuracy of 71.3% with a sensitivity of 58.3% and a specificity of 73.5%. Table 4 shows the coefficients of the variables. Two variables are potential risk factors for T2DM, since their p-values are less than 0.05: total cholesterol and platelet. For total cholesterol, the odds ratio means that for every one-unit increase in TC, the odds of an uncontrolled blood glucose level increase by 61.7%.
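The percentage interpretation of an odds ratio follows directly from Exp(B) = e^B: a one-unit increase in a predictor multiplies the odds by Exp(B), which is a change of (Exp(B) - 1) × 100%. The short Python sketch below recomputes this for the two significant coefficients reported in Table 4; the function names are illustrative and not part of the study.

```python
import math

def odds_ratio(b):
    """Odds ratio Exp(B) for a one-unit increase in the predictor."""
    return math.exp(b)

def percent_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(b) - 1.0) * 100.0

# Coefficients B of the partitioned-sample model, as reported in Table 4.
coefficients = {"TC": 0.481, "Platelet": -0.003}

for name, b in coefficients.items():
    print(f"{name}: Exp(B) = {odds_ratio(b):.3f}, "
          f"change in odds = {percent_change_in_odds(b):+.1f}%")
```

Because the published B values are rounded to three decimals, the recomputed Exp(B) can differ from the tabulated value in the last decimal place.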
For platelet, the odds ratio indicates that for every one-unit increase in platelet count, the odds of an uncontrolled blood glucose level decrease by 0.3%.

Table 4: Variables in the equation

Variable          B         Exp(B)
Constant          1.627     5.087
HPT              -0.604     0.547
Systolic         -0.018     0.982
Diastolic         0.028     1.028
LDL              -0.427     0.652
TC                0.481*    1.617
Platelet         -0.003*    0.997
HGB               0.000     1.000
WBC              -0.023     0.978
Duration (1)     -0.041     0.959
Duration (2)      1.012     2.752
Age              -0.007     0.993

Chi-square        20.714*
-2LL              191.003
Nagelkerke R-sq   0.161

3.2 Artificial Neural Networks
- For overall sample
The network created in this study is a multilayer perceptron neural network consisting of 10 input neurons in the input layer (the risk factors of T2DM), 9 hidden neurons in the hidden layer and 2 output neurons in the output layer. The ANNs model reached a classification accuracy of 77.3% with a sensitivity of 78.4% and a specificity of 71.8%. Figure 4 shows the importance of the input variables in descending order. The five most important independent variables (Wah et al., 2011) related to blood glucose level are diastolic blood pressure, platelet, white blood cell, low density lipoprotein and total cholesterol, which have the greatest effect on how the network classifies the patients. The least important independent variable is hypertension.
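The accuracy, sensitivity and specificity reported for each model follow from the confusion-matrix counts defined in Section 2.4. A minimal Python sketch of that calculation is given here; the study does not report its confusion matrices, so the counts used are hypothetical and serve only to show the arithmetic.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # correct classifications / all cases
        "sensitivity": tp / (tp + fn),     # true positive ratio
        "specificity": tn / (tn + fp),     # true negative ratio
    }

# Hypothetical counts for illustration only; not taken from the study.
metrics = classification_metrics(tp=80, fp=10, tn=40, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Reporting all three values matters for an unbalanced sample such as this one (69 well-controlled versus 160 uncontrolled cases), because a model can reach a high accuracy while classifying the smaller class poorly.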

Figure 4: Importance of independent variables

- For partitioned sample
For the ANNs model, the neural network has 10 input neurons in the input layer, 6 hidden neurons in the hidden layer and 2 output neurons in the output layer. The ANNs model reached a classification accuracy of 72.5% with a sensitivity of 72.3% and a specificity of 75.3%. Figure 5 shows the importance of the independent variables. The five most important input variables, in descending order of importance, are white blood cell, diastolic blood pressure, platelet, low density lipoprotein and systolic blood pressure, which have the greatest effect on how the network classifies the patients.

Figure 5: Importance of independent variables

3.3 Model Comparisons
The LR and ANNs models were compared to determine the best model. The accuracy, sensitivity and specificity rates for the overall and partitioned samples are given in Table 5. This study found that the ANNs model for the overall sample is the best model because it has the highest accuracy and sensitivity, although the models built on the partitioned sample show a higher specificity.

Table 5: Comparison of models

                     Accuracy   Sensitivity   Specificity
Overall sample
  LR                 69.9%      50.0%         70.4%
  ANNs               77.3%      78.4%         71.8%
Partition (80:20)
  LR                 71.35%     58.3%         73.5%
  ANNs               72.5%      72.3%         75.0%

4. Conclusion
This study was conducted to find the best predictive model for blood glucose level using logistic regression and artificial neural networks models. The results revealed that the ANNs model is the best model. Employing the ANNs technique in medical research applications can reduce cost, time, medical error and the need for human expertise. However, the performance of predictive models depends on the structure of the data, data quality and variable selection.

Acknowledgments
This study was funded by the Ministry of Higher Education Malaysia under the Research Acculturation Collaborative Effort (RACE) Grant Scheme Phase 2/2013. Thanks to Universiti Malaysia Terengganu for its financial support in carrying out this study, and to the record unit of a public hospital in Kelantan, Malaysia, for providing the diabetes data. Special thanks to Dr Adriana Ismail and Prof Dr Yap Bee Wah from UiTM Shah Alam for helping to bring the ideas down to earth.

References
1. World Health Organization (WHO), Diabetes Programme: About Types of Diabetes, 2012.
2. A.R. Hasan, The Future of Diabetes in Malaysia, 2012. Retrieved from www.moh.gov.my/attachments/7168
3. M. Eid, M. Mafauzy, A.R. Faridah, Glycemic control on type 2 diabetic patients on follow up at Hospital Universiti Sains Malaysia. Malaysian Journal of Medical Sciences, 10, 40-49, 2003.
4. M. Mafauzy, Diabetes control and complications in private primary healthcare in Malaysia. Med J Malaysia, 60, 212-217, 2005.
5. R. Bellazzi, B. Zupan, Predictive data mining in clinical medicine: current issues and guidelines. Int J Med Inform, 77, 81-97, 2008.
6. H.Y. Chen, C.H. Chuang, Y.J. Yang, T.P. Wu, Exploring the risk factors in preterm birth using data mining. Expert Syst Appl, 38, 5384-5387, 2011.
7. R. Bellazzi, B. Zupan, Predictive data mining in clinical medicine: current issues and guidelines. Int J Med Inform, 17, 1-17, 2006.
8. M. Bosevski, V. Borozanov, L.G. Ismail, Influence of metabolic risk factors on the presence of carotid artery disease in patients with type 2 diabetes and coronary artery disease. Diabetes and Vascular Disease Research, 4, 49-52, 2007.

9. B.C. Chew, M. Ismail, S.S. Ghazali, et al., Determinants of uncontrolled hypertension in adult type 2 diabetes mellitus: An analysis of the Malaysian diabetes registry 2009. Cardiovascular Diabetology, 11, 1-8, 2012.
10. N. Ganesan, K. Venkatesh, A.M. Palani, et al., Application of neural networks in diagnosing cancer disease using demographic data. International Journal of Computer Applications, 26(1), 76-85, 2010.
11. Q.K. Al-Shayea, Artificial neural network in medical diagnosis. Int Journal of Computer Science Issues, 8(2), 150-154, 2011.
12. C-T. Su, C-H. Yang, K-H. Hsu, et al., Data mining for the diagnosis of type II diabetes from three-dimensional body surface anthropometrical scanning data. Computers and Mathematics with Applications, 51, 1075-1092, 2006.
13. X-H. Meng, Y-X. Huang, D-P. Rao, et al., Comparison of three data mining models for predicting diabetes or prediabetes by risk factors. Kaohsiung Journal of Medical Sciences, 1-7, 2012.
14. Y.B. Wah, N. Huwaina, S. Fong, Predicting car purchase intent using data mining approach. The Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) Proceedings, 2052-2057, 2011.