DIABETIC RISK PREDICTION FOR WOMEN USING BOOTSTRAP AGGREGATION ON BACK-PROPAGATION NEURAL NETWORKS


International Journal of Computer Engineering & Technology (IJCET)
Volume 9, Issue 4, July-Aug 2018, pp. 196-201, Article IJCET_09_04_021
Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=9&itype=4
Journal Impact Factor (2016): 9.3590 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6367 and ISSN Online: 0976-6375
IAEME Publication
http://www.iaeme.com/ijcet/index.asp editor@iaeme.com

Alan Jacob, Ananthakrishnan D.S., Jishnu Prakash K, Karishma Elsa Johns
Department of Computer Science and Engineering, T.K.M. College of Engineering, Kerala

ABSTRACT
The rapid growth of diabetes is a major challenge for current health care. This paper predicts diabetes risk using bootstrap aggregation with back-propagation neural networks. Back-propagation is a method used in artificial neural networks to calculate the error contribution of each neuron after a batch of data is processed. Bootstrap aggregation is an ensemble method that combines the predictions of multiple neural networks to make more accurate predictions than any individual model. The dataset was collected from the UCI machine learning repository and contains information on persons with and without diabetes. The Python scikit-learn library was used to design the neural network and to implement bootstrap aggregation. The bagged ensemble reached an accuracy of 87%, compared with 84% for a single network.

Key words: Diabetes, Bootstrap aggregation, neural networks, Backpropagation.

Cite this Article: Alan Jacob, Ananthakrishnan D.S., Jishnu Prakash K, Karishma Elsa Johns, Diabetic Risk Prediction For Women Using Bootstrap Aggregation On Back-Propagation Neural Networks. International Journal of Computer Engineering & Technology, 9(4), 2018, pp. 196-201. http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=9&itype=4

1. INTRODUCTION
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. It works most effectively when large amounts of data are available. Medical science yields large amounts of data daily from research and development (R&D), physicians and clinics, patients, caregivers and other sources. These data can be consolidated and used to improve healthcare infrastructure and treatments, with the potential to help many people and to save lives and money.

With 50.8 million people suffering from diabetes, India continues to be the diabetes capital. By 2030, nearly 9% of India's population is likely to be affected by diabetes, according to a study by the International Diabetes Federation [1]. Diabetes is a chronic disease caused when either the pancreas does not produce enough insulin or the cells in the body do not respond properly to insulin. There are three types of diabetes: Type 1 corresponds to the first condition, Type 2 corresponds to the second condition, and gestational diabetes develops during pregnancy [2].

Type 1 diabetes occurs when the immune system mistakenly attacks and kills the beta cells of the pancreas. About 5 to 10 per cent of people with diabetes have type 1 diabetes, which generally develops in childhood or adolescence. Type 2 diabetes occurs when the body cannot properly use the insulin that is released (called insulin insensitivity) or does not make enough insulin. About 90 per cent of people with diabetes have type 2 diabetes. It more often develops in adults, but children can also be affected. A third type, gestational diabetes, is a temporary condition that occurs during pregnancy. It affects approximately two to four per cent of all pregnancies (in the non-aboriginal population) and involves an increased risk of developing diabetes for both mother and child.

In this paper, the performance of back-propagation neural networks with bootstrap aggregation in predicting diabetes risk was tested and investigated. Bootstrapping is the process of drawing samples from an original sample and using them to estimate various statistics or model accuracy. The dataset was collected from the UCI machine learning repository and scaled. Five random sets were then drawn from it with replacement, and each was fed to a neural network with 4 layers. The results achieved by previous studies using artificial neural networks (ANN) are compared with the results of bootstrap aggregation with ANN.

2. LITERATURE REVIEW
Many methods currently exist to predict and classify diabetes. [3] focuses on diabetes prediction using two machine learning techniques: a Support Vector Machine (SVM) for detection and decision trees for prediction. SVM is a supervised machine learning technique for classifying data sets.
During the training phase, the authors of [3] construct a hyperplane that divides the data into categories and lies at a maximal distance from the classes. The SVM technique can be extended to large data sets, in which case the hyperplane is formed through a kernel. It is easy to implement, requires little processing time for small data sets, and reduces overfitting to the samples. [3] also uses a decision tree algorithm, a supervised learning technique, for prediction. It builds a tree- or graph-like structure by splitting the values on attributes and conditions, and a prediction is made by traversing the tree from root to leaf. A decision tree can become unstable even under slight variation of the input dataset, which is a drawback of that system.

[4] focuses on predicting diabetes and related diseases using artificial neural networks and decision tree classifiers. Both are used as classifiers to determine the type of treatment a patient requires, and the artificial neural network is also trained to forecast patients' blood sugar levels. This method additionally uses Self Organizing Maps (SOM) to predict chronic diseases that a patient with diabetes may develop. Both [3] and [4] use decision tree classifiers because they enable deep analysis of the problem, but they carry the disadvantage of high variance in the output for small changes in the dataset.

[5] implements an artificial neural network combined with fuzzy logic to detect diabetes. This method gives better results because fuzzy logic also accounts for uncertainties. Extracting rules from existing methods is time-consuming, however, which is a disadvantage of that system.

The method proposed in this paper has an advantage over existing models of diabetic risk prediction because it combines back-propagation neural networks with the bootstrap aggregation method. Bagging helps prevent the model from overfitting the dataset. Using an ensemble of neural networks reduces variance in the output and thereby increases the accuracy of prediction. Because it uses neural networks, the model is also applicable to large datasets, which widens its range of application.
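The variance-reduction claim can be made concrete with a standard result, sketched here under assumptions that the paper does not state. Suppose the outputs $G_1(x), \dots, G_M(x)$ of the $M$ networks are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$ (symbols introduced here for illustration). The variance of their average is then:

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M} G_m(x)\right)
  = \frac{1}{M^2}\left[M\sigma^2 + M(M-1)\rho\sigma^2\right]
  = \rho\,\sigma^2 + \frac{1-\rho}{M}\,\sigma^2
```

The second term shrinks as $M$ grows, so averaging helps most when the networks are weakly correlated. For fully independent networks ($\rho = 0$) the variance falls to $\sigma^2 / M$, while for perfectly correlated ones ($\rho = 1$) averaging gives no reduction.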

3. BACK-PROPAGATION ALGORITHM
Back-propagation, short for "backward propagation of errors", is an algorithm for supervised learning of artificial neural networks. It involves specifying a cost function and then modifying the weights iteratively according to the gradient of the cost function [7]. It has the advantages of accuracy and versatility.

In the equations below, $x_i$ are the $n$ inputs, $z_j$ the $p$ hidden units, $y_k$ the $m$ output units, $t_k$ the target outputs, $f$ the activation function and $\alpha$ the learning rate.

For each hidden unit $z_j$, the net input and output are

$$z_{in,j} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}, \qquad z_j = f(z_{in,j})$$

For each output unit $y_k$, the net input and output are calculated as in [7]:

$$y_{in,k} = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}, \qquad y_k = f(y_{in,k})$$

In the back-propagation phase, the error term $\delta_k$ of each output unit is propagated backwards:

$$\delta_k = (t_k - y_k)\, f'(y_{in,k})$$

$$\Delta w_{jk} = \alpha\, \delta_k z_j, \qquad \Delta w_{0k} = \alpha\, \delta_k$$

$$\delta_j = \left(\sum_{k=1}^{m} \delta_k w_{jk}\right) f'(z_{in,j})$$

Using $\delta_j$, the weights and biases between the input and hidden layers are updated as in [7]:

$$\Delta v_{ij} = \alpha\, \delta_j x_i, \qquad \Delta v_{0j} = \alpha\, \delta_j$$

4. BOOTSTRAP AGGREGATION METHOD
Bootstrap aggregation, also known as bagging [6], is a simple yet powerful ensemble method that combines the predictions of multiple neural networks to make more accurate predictions than any individual model. It starts from the original training set, which includes all the available data points. Bootstrap training sets, each of size up to that of the original training set, are generated by sampling the original training dataset with replacement. A given data point may therefore appear more than once in a bootstrap set, or not at all. By averaging across the samples, bagging reduces the instability of the decision rule, so the bagged prediction model has lower variance than a model in which only one classifier is fitted to the original training set.

Bootstrap Aggregation Algorithm
1. Build the model: for m = 1 to M:
   Draw a bootstrap sample $D_m$ of size N with replacement from the original training set D, with equal weight on each point.
   Train a neural network $G_m(x)$ on the bootstrap sample $D_m$.
2. Predict: for m = 1 to M:
   Apply $G_m$ to the testing set $D_T$.
   Classify each test point $x$ using

$$\hat{y}(x) = I\left\{ \frac{1}{M} \sum_{m=1}^{M} G_m(x) > \text{threshold value} \right\}$$
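The update rules of Section 3 and the bagging procedure of Section 4 can be sketched together in plain Python. This is an illustrative sketch only, not the authors' scikit-learn implementation. It assumes a single hidden layer, a single output unit and a logistic activation throughout, so that f'(a) = f(a)(1 - f(a)). The names `TinyBPN`, `bootstrap_sample`, `train_bagged` and `bagged_predict`, and the default values of `alpha`, `epochs` and `threshold`, are hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class TinyBPN:
    """One-hidden-layer network with one output (hypothetical helper)."""

    def __init__(self, n_in, n_hidden, rng):
        u = lambda: rng.uniform(-0.5, 0.5)  # small random initial weights
        self.v0 = [u() for _ in range(n_hidden)]                         # v_0j
        self.v = [[u() for _ in range(n_hidden)] for _ in range(n_in)]  # v_ij
        self.w0 = u()                                                    # w_0k
        self.w = [u() for _ in range(n_hidden)]                          # w_jk

    def forward(self, x):
        # z_j = f(v_0j + sum_i x_i v_ij), y = f(w_0 + sum_j z_j w_j)
        z = [sigmoid(self.v0[j] + sum(x[i] * self.v[i][j] for i in range(len(x))))
             for j in range(len(self.v0))]
        y = sigmoid(self.w0 + sum(z[j] * self.w[j] for j in range(len(z))))
        return z, y

    def train_step(self, x, t, alpha):
        z, y = self.forward(x)
        delta_k = (t - y) * y * (1.0 - y)  # (t_k - y_k) f'(y_in_k)
        # Hidden error terms use the output weights from before this update.
        delta_j = [delta_k * self.w[j] * z[j] * (1.0 - z[j]) for j in range(len(z))]
        for j in range(len(z)):
            self.w[j] += alpha * delta_k * z[j]
            for i in range(len(x)):
                self.v[i][j] += alpha * delta_j[j] * x[i]
            self.v0[j] += alpha * delta_j[j]
        self.w0 += alpha * delta_k

    def fit(self, X, T, alpha=0.5, epochs=200):
        for _ in range(epochs):
            for x, t in zip(X, T):
                self.train_step(x, t, alpha)
        return self


def bootstrap_sample(X, T, rng):
    # Draw N indices with replacement; points may repeat or be left out.
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [T[i] for i in idx]


def train_bagged(X, T, M, n_hidden, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(M):
        Xb, Tb = bootstrap_sample(X, T, rng)
        models.append(TinyBPN(len(X[0]), n_hidden, rng).fit(Xb, Tb))
    return models


def bagged_predict(models, x, threshold=0.5):
    # Average the M network outputs, then apply the indicator rule.
    mean = sum(m.forward(x)[1] for m in models) / len(models)
    return 1 if mean > threshold else 0
```

Calling `train_bagged(X, T, M=5, n_hidden=3)` on a list of feature rows `X` and 0/1 labels `T` returns five networks, and `bagged_predict(models, x)` then applies the thresholded average to a new row. Each network sees a different resample, which is what makes their errors partly independent.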

5. CONSTRUCTING MODEL
The dataset was collected from the UCI machine learning repository. The features considered were number of pregnancies, glucose level, blood pressure, insulin, BMI and age. Standard feature scaling was applied to each of the training sets. The dataset, consisting of 492 samples, was split into training and testing datasets.

Table 1 Input and Output Variables

| No | Variable | Description | Value |
|----|----------|-------------|-------|
| 1 | Pregnancies | Number of times pregnant | Numeric |
| 2 | Glucose | Plasma glucose concentration at 2 hours in an oral glucose tolerance test | Numeric |
| 3 | Blood Pressure | Diastolic blood pressure (mm Hg) | Numeric |
| 4 | Insulin | 2-hour serum insulin (mu U/ml) | Numeric |
| 5 | BMI | Body mass index (weight in kg / (height in m)^2) | Numeric |
| 6 | Age | Age (years) | Numeric |
| 7 | Output | Class | 0 - Normal group, 1 - Diabetic risk group |

A 4-layer neural network was modelled with 6 input nodes in the input layer. The two hidden layers consist of 10 and 3 nodes respectively. The activation function used between the input layer and the hidden layers is tanh, and the logistic function was used between the hidden and output layers. The neural network was trained on the training data using the back-propagation algorithm for weight updates, with the learning rate fixed at 0.001. A stochastic gradient-based optimizer was employed and the maximum epoch limit was fixed at 1000.

The training dataset is divided into 5 sets of random samples picked with replacement from the dataset. Each set has a maximum limit of 290 samples. Five back-propagation neural networks were modelled, one on each of the sets, and the final prediction was made by averaging the predictions of the models.

Figure 1 Neural network model. The input layer consists of 6 nodes and the hidden layers contain 10 and 3 nodes respectively. The output layer has two nodes.
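The standard feature scaling step can be illustrated with a short standard-library sketch. It assumes that "standard scaling" means subtracting each feature's mean and dividing by its population standard deviation, with both statistics computed on the training rows only. The function names `fit_scaler` and `transform` are hypothetical, and the sample rows are made-up values in the feature order of Table 1, not records from the dataset.

```python
import statistics


def fit_scaler(rows):
    """Return per-feature means and standard deviations from training rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Population standard deviation; a constant column falls back to 1.0
    # so that the division in transform() is always defined.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(rows, means, stds):
    """Scale every row with the statistics learned by fit_scaler()."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up rows: pregnancies, glucose, blood pressure, insulin, BMI, age.
train_rows = [
    [2, 100, 70, 80, 25.0, 30],
    [4, 140, 80, 120, 31.0, 45],
    [0, 120, 60, 100, 28.0, 24],
]
means, stds = fit_scaler(train_rows)
scaled_rows = transform(train_rows, means, stds)
```

Test rows would be passed through `transform` with the same `means` and `stds`, so that no information from the test set leaks into training. In scikit-learn, which the paper reports using, the corresponding building blocks would presumably be `StandardScaler`, `MLPClassifier` and `BaggingClassifier`, although the paper does not list the exact calls.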

6. RESULTS AND FUTURE SCOPE
The neural network was trained on the training data and tested on the testing data. Gradient descent converged at 438 epochs, and the resulting weights were obtained. The single back-propagation neural network produced an accuracy of 84% on the testing dataset.

Figure 2 Confusion matrix of the single back-propagation neural network.

With the bootstrap aggregation method, the model produced an accuracy of 84% with 4 estimators and 87% with 10 estimators, with a considerable reduction in the variance of the model's predictions.

Table 2 Comparison of accuracy with different numbers of estimators during bootstrap aggregation. The optimal number of estimators was fixed at 5.

| Bagging | n=1 | n=5 | n=8 |
|---------|-----|-----|-----|
| Base Estimator | BPN Neural Network | BPN Neural Network | BPN Neural Network |
| Accuracy | 84% | 87% | 86.6% |

Better results are obtained by using bootstrap aggregation with neural networks than with the commonly used single artificial neural networks for diabetic risk prediction.

Figure 3 Confusion matrix for bootstrap aggregation on the neural network.

The proposed model can be extended to a more general diabetes dataset so that prediction can be done for both men and women. Bootstrap aggregation on neural networks improved the results here and can be applied in other domains.

7. CONCLUSION
In this work, we proposed a diabetic-risk prediction system that uses a back-propagation neural network combined with the bootstrap aggregation method to reduce the variance of prediction. A back-propagation neural network was modelled using the dataset collected from the UCI machine learning repository. Features were scaled and the training sets were fed to the model, which produced an accuracy of 87%. The bootstrap aggregation method was applied to the base classifier and the results were analyzed. The variance of the model's predictions was reduced by bootstrap aggregation, which makes the model a better prediction system for diabetes.

REFERENCES
[1] David R. Whiting, Leonor Guariguata, Clara Weil and Jonathan Shaw, IDF Diabetes Atlas: Global estimates of the prevalence of diabetes for 2011 and 2030, 2011.
[2] Definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: diagnosis and classification of diabetes mellitus. Provisional report of a WHO Consultation, pp. 17-19, (n.d.).
[3] Aakansha Rathore, Simran Chauhan and Sakshi Gujral, Detecting and Predicting Diabetes Using Supervised Learning: An Approach towards Better Healthcare for Women.
[4] Tariq A. Rashid, Saman Abdulla and Rezhna Abdulla, Decision Support System for Diabetes Mellitus through machine learning techniques.
[5] Humar Kahramanli and Novruz Allahverdi, Design of a hybrid system for the diabetes and heart diseases.
[6] Zhuo Zheng, Boosting and Bagging of Neural Networks with applications to Financial Time Series.
[7] Dwiarso Utomo, Pujiono and Moch Arief Soeleman, Stock Price Prediction Using Back Propagation Neural Network Based on Gradient Descent with Momentum and Adaptive Learning Rate.