International Journal of Computer Engineering & Technology (IJCET)
Volume 9, Issue 4, July-Aug 2018, pp. 196-201, Article IJCET_09_04_021
Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=9&itype=4
Journal Impact Factor (2016): 9.3590 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6367 and ISSN Online: 0976-6375
IAEME Publication
http://www.iaeme.com/ijcet/index.asp editor@iaeme.com

DIABETIC RISK PREDICTION FOR WOMEN USING BOOTSTRAP AGGREGATION ON BACK-PROPAGATION NEURAL NETWORKS

Alan Jacob, Ananthakrishnan D.S., Jishnu Prakash K, Karishma Elsa Johns
Department of Computer Science and Engineering, T.K.M. College of Engineering, Kerala

ABSTRACT

The greatest challenge to current health care is the rapid growth of diabetes. This paper predicts diabetes risk using bootstrap aggregation with back-propagation neural networks. Backpropagation is a method used in artificial neural networks to calculate the error contribution of each neuron after a batch of data is processed. Bootstrap aggregation is an ensemble method that combines the predictions of multiple neural networks to make more accurate predictions than any individual model. The dataset used was collected from the UCI machine learning repository and contains information on persons with and without diabetes. The Python scikit-learn library was used for designing the neural network and for implementing bootstrap aggregation. The bagged ensemble reached an accuracy of 87% on the testing dataset, compared with 84% for a single back-propagation neural network.

Key words: Diabetes, Bootstrap aggregation, neural networks, Backpropagation.

Cite this Article: Alan Jacob, Ananthakrishnan D.S., Jishnu Prakash K, Karishma Elsa Johns, Diabetic Risk Prediction For Women Using Bootstrap Aggregation On Back-Propagation Neural Networks. International Journal of Computer Engineering & Technology, 9(4), 2018, pp. 196-201.
http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=9&itype=4

1. INTRODUCTION

Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Machine learning works most effectively when large amounts of data are available. Medical science yields a large amount of data daily from research and development (R&D), physicians and clinics, patients, caregivers, etc. These data can be consolidated and used to improve healthcare infrastructure and treatments, with the potential to help many people and to save lives and money.

With 50.8 million people suffering from diabetes, India continues to be the diabetes capital. By 2030, nearly 9% of India's population is likely to be affected by diabetes, according to a study by the International Diabetes Federation [1].

Diabetes is a chronic disease caused when either the pancreas does not produce enough insulin or the cells in the body do not respond properly to insulin. There are three types of diabetes: Type 1 corresponds to the first condition, Type 2 corresponds to the second condition, and gestational diabetes develops during pregnancy [2]. Type 1 diabetes occurs when the immune system mistakenly attacks and kills the beta cells of the pancreas. About 5 to 10 per cent of people with diabetes have Type 1 diabetes, which generally develops in childhood or adolescence. Type 2 diabetes occurs when the body cannot properly use the insulin that is released (called insulin insensitivity) or does not make enough insulin. About 90 per cent of people with diabetes have Type 2 diabetes. It more often develops in adults, but children can also be affected. The third type, gestational diabetes, is a temporary condition that occurs during pregnancy. It affects approximately 2 to 4 per cent of all pregnancies (in the non-aboriginal population) and involves an increased risk of developing diabetes for both mother and child.

In this paper, the performance of back-propagation neural networks with bootstrap aggregation in predicting diabetes risk is tested and investigated. Bootstrapping is the process of drawing samples from an original sample and using these samples to estimate various statistics or model accuracy. The dataset was collected from the UCI machine learning repository and scaled, then divided into five random sets drawn with replacement, each of which was fed to a neural network with 4 layers. The results achieved by previous studies using artificial neural networks (ANN) are compared with the results of bootstrap aggregation with ANN.

2. LITERATURE REVIEW

Many methods currently exist to predict and classify diabetes. The work in [3] focuses on diabetes prediction using machine learning techniques: Support Vector Machines (SVM) for detection and decision trees for prediction. SVM is a supervised machine learning technique for classifying datasets.
During the training phase, the authors of [3] construct a hyperplane that divides the dataset into categories and lies at a maximal distance from the classes. The SVM technique can be extended to large datasets, in which case the hyperplane is constructed using kernels. It is easy to implement, requires little processing time for small datasets, and reduces overfitting to the samples. The work in [3] also uses the decision tree algorithm, a supervised learning technique for prediction, which produces a tree- or graph-like structure by splitting the values based on attributes and conditions. A prediction is made by traversing the tree from root to leaf. Using a decision tree makes the system unstable even under slight variations of the input dataset, which is one of its drawbacks.

The work in [4] focuses on predicting diabetes and related diseases using artificial neural networks and decision tree classifiers. Both are used as classifiers to determine the type of treatment required for a patient, and the artificial neural network is also trained to forecast the patient's blood sugar level. This method additionally uses Self Organizing Maps (SOM) to predict the chronic diseases that a patient with diabetes may develop. Both [3] and [4] make use of decision tree classifiers, which enable deep analysis of the problem but have the disadvantage of high variance in the output for small changes in the dataset.

The work in [5] implements an artificial neural network combined with fuzzy logic to detect diabetes. This method gives better results because fuzzy logic also accounts for uncertainties. However, extracting rules from existing methods is time-consuming, which adds to the disadvantages of the system.

The method proposed in this paper has an advantage over existing models of diabetic risk prediction because it combines back-propagation neural networks with the bootstrap aggregation method.
Bagging helps prevent the model from overfitting the dataset. Using an ensemble of neural networks reduces the variance of the output and thereby increases the accuracy of prediction. Because it uses neural networks, the model is applicable to large datasets, which widens its range of applications.
3. BACK-PROPAGATION ALGORITHM

Back propagation, short for "backward propagation of errors", is an algorithm for supervised learning of artificial neural networks. The back-propagation algorithm involves specifying a cost function and then modifying the weights iteratively according to the gradient of the cost function [7]. It has the advantages of accuracy and versatility.

For each hidden unit $z_j$, the net input and output are calculated as

$$z_{\text{in},j} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}, \qquad z_j = f(z_{\text{in},j})$$

For each output unit $y_k$, the net input and output are calculated as

$$y_{\text{in},k} = w_{0k} + \sum_{j=1}^{n} z_j w_{jk}, \qquad y_k = f(y_{\text{in},k})$$

as in [7].

For the back-propagation phase, the error term $\delta_k$ of each output unit has to be propagated backwards, where $t_k$ is the target output and $\alpha$ is the learning rate:

$$\delta_k = (t_k - y_k)\, f'(y_{\text{in},k})$$

$$\Delta w_{jk} = \alpha\, \delta_k z_j, \qquad \Delta w_{0k} = \alpha\, \delta_k$$

$$\delta_j = \Big( \sum_{k} \delta_k w_{jk} \Big) f'(z_{\text{in},j})$$

Using $\delta_j$, the weights and biases between the input and hidden layers are updated as

$$\Delta v_{ij} = \alpha\, \delta_j x_i, \qquad \Delta v_{0j} = \alpha\, \delta_j$$

as in [7].

4. BOOTSTRAP AGGREGATION METHOD

Bootstrap aggregation, also known as bagging [6], is a simple yet powerful ensemble method that combines the predictions of multiple neural networks to make more accurate predictions than any individual model. It involves fitting the model, with all the potential data points included, on the original training set. Bootstrap training sets, each of size up to that of the original training set, are then generated by sampling with replacement from the original training dataset. A given data point may appear more than once in a bootstrap set, or not at all. By averaging across the samples, bagging effectively removes the instability of the decision rule, so the bagged prediction model has lower variance than a model in which only one classifier is fitted to the original training set.

Bootstrap Aggregation Algorithm

1. Build the model: for m = 1 to M:
   Draw a bootstrap sample $D_m$ of size N with replacement from the original training set D, with equal weight.
   Train a neural network $G_m(x)$ on the bootstrap sample $D_m$.
2. Predicting: for m = 1 to M:
   Apply $G_m$ to the testing set $D_T$.

Classify using

$$I\left\{ \frac{1}{M} \sum_{m=1}^{M} G_m(x) > \text{threshold value} \right\}$$
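The forward pass, the weight updates of Section 3 and the bagging steps of Section 4 can be sketched together in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single hidden layer, the logistic function for f in both layers, one output unit and a toy two-feature dataset. The helper names (`init_network`, `forward`, `train`, `bootstrap_sample`, `bagged_predict`), the learning rate, the epoch count and the 0.5 threshold are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function, used here as the activation f."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def init_network(n_in, n_hidden, rng):
    """Small random weights; index 0 of each weight list is the bias (v_0j, w_0k)."""
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return V, W


def forward(net, x):
    """Forward pass: z_j = f(v_0j + sum_i x_i v_ij), y = f(w_0 + sum_j z_j w_j)."""
    V, W = net
    z = [sigmoid(v[0] + sum(vi * xi for vi, xi in zip(v[1:], x))) for v in V]
    y = sigmoid(W[0] + sum(wj * zj for wj, zj in zip(W[1:], z)))
    return z, y


def train(net, data, alpha=0.5, epochs=500):
    """Per-sample back-propagation using the delta terms of Section 3."""
    V, W = net
    for _ in range(epochs):
        for x, t in data:
            z, y = forward(net, x)
            # delta_k = (t - y) f'(y_in); for the logistic f, f' = y (1 - y)
            delta_k = (t - y) * y * (1.0 - y)
            # delta_j = (delta_k w_j) f'(z_in_j), computed before w is updated
            delta_j = [delta_k * W[j + 1] * z[j] * (1.0 - z[j]) for j in range(len(z))]
            W[0] += alpha * delta_k
            for j in range(len(z)):
                W[j + 1] += alpha * delta_k * z[j]
            for j, v in enumerate(V):
                v[0] += alpha * delta_j[j]
                for i, xi in enumerate(x):
                    v[i + 1] += alpha * delta_j[j] * xi
    return net


def bootstrap_sample(data, n, rng):
    """Draw n points with replacement, each with equal probability."""
    return [rng.choice(data) for _ in range(n)]


def bagged_predict(nets, x, threshold=0.5):
    """Average the M network outputs and apply the indicator I{mean > threshold}."""
    mean = sum(forward(net, x)[1] for net in nets) / len(nets)
    return 1 if mean > threshold else 0


rng = random.Random(0)
# Toy data (illustrative only): class 1 when the two features sum to more than 1.
data = [((a / 4, b / 4), 1 if a + b > 4 else 0) for a in range(5) for b in range(5)]

M = 5
nets = []
for m in range(M):
    D_m = bootstrap_sample(data, len(data), rng)       # step 1: bootstrap sample
    nets.append(train(init_network(2, 3, rng), D_m))   # step 1: train G_m on D_m

preds = [bagged_predict(nets, x) for x, _ in data]     # step 2: apply and aggregate
accuracy = sum(p == t for p, (_, t) in zip(preds, data)) / len(data)
print(f"bagged accuracy on the toy data: {accuracy:.2f}")
```

Averaging the continuous network outputs before thresholding is one reading of the classifier rule above. Averaging the thresholded 0/1 votes of each network would be an equally plausible alternative.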
5. CONSTRUCTING MODEL

The dataset was collected from the UCI machine learning repository. The features considered were number of pregnancies, glucose level, blood pressure, insulin, BMI and age. Standard feature scaling was applied to each of the training sets. The dataset, consisting of 492 samples, was split into training and testing datasets.

Table 1 Input and Output Variables

| No | Variables | Description | Value |
|----|-----------|-------------|-------|
| 1 | Pregnancies | Number of times pregnant | Numeric |
| 2 | Glucose | Plasma glucose concentration at 2 hours in an oral glucose tolerance test | Numeric |
| 3 | Blood Pressure | Diastolic blood pressure (mm Hg) | Numeric |
| 4 | Insulin | 2-hour serum insulin (mu U/ml) | Numeric |
| 5 | BMI | Body mass index (weight in kg / (height in m)^2) | Numeric |
| 6 | Age | Age (years) | Numeric |
| 7 | Output Class | | 0 - Normal group, 1 - Diabetic Risk group |

A 4-layer neural network was modelled with 6 input nodes in the input layer. The two hidden layers consist of 10 and 3 nodes respectively. The activation function used between the input layer and the hidden layers is tanh, and the logistic function was used between the hidden and output layers. The neural network was trained on the training data using the back-propagation algorithm for weight updates, with the learning rate fixed at 0.001. A stochastic gradient-based optimizer was employed and the maximum number of epochs was fixed at 1000.

The training dataset is divided into 5 sets of random samples picked with replacement from the dataset. Each set has a maximum limit of 290 samples. Five back-propagation neural networks were modelled, one on each of the sets, and the final prediction was made by averaging the predictions of the models.

Figure 1 Neural Network model. The input layer consists of 6 nodes, the hidden layers contain 10 and 3 nodes respectively, and the output layer has two nodes.
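The configuration described in this section could be expressed with scikit-learn roughly as follows. The paper states that scikit-learn was used but does not name the classes, so the choice of `MLPClassifier`, `BaggingClassifier` and `StandardScaler` is an assumption, as are the 75/25 train/test split and the random seeds. The data here are a synthetic stand-in generated with `make_classification`, not the UCI dataset, so the printed accuracy says nothing about the results reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder with the same shape as the paper's data
# (492 samples, 6 features, binary class). NOT the real dataset.
X, y = make_classification(
    n_samples=492, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

# Assumed 75/25 split; the paper does not state the ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base estimator: scaling followed by a network with hidden layers of 10 and
# 3 nodes, tanh hidden activation, SGD optimizer, learning rate 0.001 and at
# most 1000 epochs. Putting the scaler inside the pipeline scales each
# bootstrap training set separately.
base = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(10, 3),
        activation="tanh",
        solver="sgd",
        learning_rate_init=0.001,
        max_iter=1000,
        random_state=0,
    ),
)

# Five networks, each trained on 290 samples drawn with replacement.
bagged = BaggingClassifier(
    base, n_estimators=5, max_samples=290, bootstrap=True, random_state=0
)
bagged.fit(X_train, y_train)

print("test accuracy on synthetic data:", bagged.score(X_test, y_test))
```

For a two-class problem, `MLPClassifier` uses a single logistic output unit rather than the two output nodes shown in Figure 1, so this sketch matches the described architecture only approximately. With a learning rate this small the optimizer may stop at the epoch limit and emit a convergence warning.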
6. RESULTS AND FUTURE SCOPE

The neural network was trained on the training data and tested on the testing data. Gradient descent converged at 438 epochs and the resultant weights were obtained. The back-propagation neural network produced an accuracy of 84% on the testing dataset.

Figure 2 Confusion Matrix of single Backpropagation Neural Network.

The model with the bootstrap aggregation method produced an accuracy of 84% with 4 estimators and 87% with 10 estimators, with a considerable reduction in the variance of the model's predictions.

Table 2 Comparison of accuracy with different numbers of estimators during Bootstrap Aggregation. The optimal number of estimators was fixed at 5.

| Bagging | n=1 | n=5 | n=8 |
|---------|-----|-----|-----|
| Base Estimator | BPN Neural Network | BPN Neural Network | BPN Neural Network |
| Accuracy | 84% | 87% | 86.6% |

Better results are obtained by using bootstrap aggregation with neural networks than with the commonly used single artificial neural network for diabetic risk prediction.

Figure 3 Confusion matrix for Bootstrap Aggregation on Neural Network
The proposed model can be extended to a more general diabetes dataset so that prediction can be done for both men and women. Bootstrap aggregation on neural networks improved accuracy from 84% to 87% in this work and can be applied in other domains.

7. CONCLUSION

In this work, we proposed a diabetic-risk prediction system using a back-propagation neural network combined with the bootstrap aggregation method to reduce the variance of prediction. A back-propagation neural network was modelled using the dataset collected from the UCI machine learning repository. Features were scaled and the training sets were fed to the model, which produced an accuracy of 87%. The bootstrap aggregation method was applied to the base classifier and the results obtained were analyzed. The variance of the model's predictions was reduced by the bootstrap aggregation method, which makes the model a better prediction system for diabetes.

REFERENCES

[1] David R. Whiting, Leonor Guariguata, Clara Weil and Jonathan Shaw, IDF Diabetes Atlas: Global estimates of the prevalence of diabetes for 2011 and 2030, 2011.
[2] Definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: diagnosis and classification of diabetes mellitus. Provisional report of a WHO Consultation, pp. 17-19, (n.d.)
[3] Aakansha Rathore, Simran Chauhan and Sakshi Gujral, Detecting and Predicting Diabetes Using Supervised Learning: An Approach towards Better Healthcare for Women.
[4] Tariq A Rashid, Saman Abdulla and Rezhna Abdulla, Decision Support System for Diabetes Mellitus through machine learning techniques.
[5] Humar Kahramanli and Novruz Allahverdi, Design of a hybrid system for the diabetes and heart diseases.
[6] Zhuo Zheng, Boosting and Bagging of Neural Networks with applications to Financial Time Series.
[7] Dwiarso Utomo, Pujiono and Moch Arief Soeleman, Stock Price Prediction Using Back Propagation Neural Network Based on Gradient Descent with Momentum and Adaptive Learning Rate.