pp.143-147
http://dx.doi.org/10.14257/astl.2017.143.30

Application of Tree Structures of Fuzzy Classifier to Diabetes Disease Diagnosis

Chang-Wook Han

Department of Electrical Engineering, Dong-Eui University,
176 Eomgwangno, Busanjin-gu, Busan 47340, Korea
cwhan@deu.ac.kr

Abstract. Recently, computational intelligence has been successfully applied to the diagnosis of many diseases. In this paper, tree structures of fuzzy classifier are applied to the diagnosis of diabetes, a very common and important disease. Tree structures of fuzzy classifier can reduce the size of the rule base by optimally placing fuzzy neurons as nodes and selecting relevant input subspaces as leaves. In the optimization stage, we use a two-step optimization in which a genetic algorithm (GA) develops the binary structure by optimally selecting the nodes and leaves, and random signal-based learning (RSL) then further refines the binary connections. We consider the diabetes disease dataset obtained from the UCI Machine Learning Repository.

Keywords: Fuzzy classifier, Genetic algorithms, Diabetes disease diagnosis

1 Introduction

As the field of computational intelligence has developed dramatically, automatic diagnosis of disease using computational intelligence has become increasingly important and helps support doctors' decisions. Computational intelligence has been successfully applied to the diagnosis of many diseases [1]-[3]. This paper applies tree structures of fuzzy classifier [4] to diabetes disease diagnosis. Tree structures of fuzzy classifier can reduce the size of the rule base by optimally placing fuzzy neurons as nodes and selecting relevant input subspaces as leaves. In the optimization stage, we use a two-step optimization in which a genetic algorithm (GA) [5] develops the binary structure by optimally selecting the nodes and leaves, and random signal-based learning (RSL) [6] then further refines the binary connections.
To show the effectiveness of the proposed method, we consider the diabetes disease dataset obtained from the UCI Machine Learning Repository at the University of California at Irvine.

2 Tree Structures of Fuzzy Classifier [4]

This paper presents a new application of the tree structures of fuzzy controller, proposed by the author in [4], to diabetes disease diagnosis (modified as a fuzzy classifier). Therefore, the same tree structures of fuzzy classifier and the same optimization method as in [4] are used in this paper. For this reason, all of this section refers directly to [4]; for more details about the tree structures of fuzzy classifier, please refer to [4].

ISSN: 2287-1233 ASTL Copyright 2017 SERSC

The AND neuron is a nonlinear logic processing element with n inputs $\mathbf{x} \in [0,1]^n$ producing an output y governed by the expression

$$ y = \mathrm{AND}(\mathbf{x}; \mathbf{w}) = \mathop{\mathrm{T}}_{i=1}^{n} (w_i \; s \; x_i) \tag{1} $$

where $\mathbf{w}$ denotes an n-dimensional vector of adjustable connections (weights), s denotes an s-norm, and t (written T when aggregating over all inputs) denotes a t-norm. Individual inputs (coordinates of $\mathbf{x}$) are combined or-wise with the corresponding weights, and the results of these individual combinations are then aggregated and-wise with the aid of the t-norm. By reversing the order of the t- and s-norms in the aggregation of the inputs, we obtain the category of OR neurons,

$$ y = \mathrm{OR}(\mathbf{x}; \mathbf{w}) = \mathop{\mathrm{S}}_{i=1}^{n} (w_i \; t \; x_i) \tag{2} $$

The AND and OR neurons realize pure logic operations on the membership values. Some obvious observations hold.
(i) For binary inputs and connections, the neurons reduce to standard AND and OR gates.
(ii) Connections close to zero (one) identify the relevant inputs in the AND (OR) neuron.
(iii) The parametric flexibility is an important feature to be exploited in the design of the networks.

Fig. 1. Tree structures of fuzzy classifier (nodes marked * or + arranged in Level 1 and Level 2 above the leaves)

In all experiments, we take the triangular norm and co-norm to be the product operation (a t b = ab) and the probabilistic sum (a s b = a + b - ab), respectively. Fig. 1 shows the tree structures of fuzzy classifier built from logic-based fuzzy neurons, which are viewed here as a generic means of forming the skeleton of the logic model. In this figure, the node symbols * and + represent AND and OR neurons, respectively. In this structure, each node selects one of the fuzzy neurons (AND/OR) and each leaf selects one of the input subspaces. Tree structures of fuzzy classifier are structurally flexible: allowing the constants {0, 1} in every leaf eliminates useless connections from the tree and thereby enhances performance, and any logic expression can be represented by selecting a proper number of Levels.

To combat the exponential growth in the number of rules, the GA constructs a Boolean structure of the tree by selecting inputs, including {0, 1}, as leaves and fuzzy neurons as nodes, which together shape the tree structure. The detailed optimization of the connections (weights) attached to each node is then carried out by RSL. RSL is a kind of reinforcement learning algorithm that is very effective at finding a local optimum because the candidate solution moves in a downhill direction very quickly [6]. For more details about RSL, please refer to [6]. During GA optimization, the connections of the AND and OR neurons are set to zero and one, respectively, because, by observation (ii), these values mark every connected input as fully relevant. RSL refinement transforms these binary connections into weights in the unit interval. It considers only the connections that remain in the tree; connections eliminated by leaves with the value zero or one are not considered. This refinement aims at a further reduction in the value of the performance index.

3 Experimental Results

In this paper, we consider the Pima Indian Diabetes (PID) dataset available from the Machine Learning Repository at the University of California at Irvine. PID has 768 instances taken from female patients of Pima Indian heritage. This database has 8 attributes (integer) and 2 classes as described below.
Class
Class 1: normal (500)
Class 2: Pima Indian Diabetes (268)

Attribute
Attribute 1: Number of times pregnant
Attribute 2: Plasma glucose concentration at 2 h in an oral glucose tolerance test
Attribute 3: Diastolic blood pressure (mm Hg)
Attribute 4: Triceps skin fold thickness (mm)
Attribute 5: 2-h serum insulin (μU/ml)
Attribute 6: Body mass index (weight in kg/(height in m)^2)
Attribute 7: Diabetes pedigree function
Attribute 8: Age (years)

For the experiment with tree structures of fuzzy classifier, we used three uniformly distributed Gaussian membership functions (overlap: 0.5). The number of inputs to each node and the number of Levels (NL) were set to 2 and 4, respectively. 70% of the data was used for training and the remaining 30% for testing. The genetic algorithm (GA) develops the binary structure by optimally selecting the nodes and leaves, and random signal-based learning (RSL) then further refines the binary connections. The parameters used in this experiment are listed in Table 1.

Table 1. Parameter setup of the optimization

Algorithm   Parameter         Value
GA          Population size   200
            Generation no.    500
            Crossover rate    0.9
            Mutation rate     0.01
RSL         Learning rate     0.01
            Iteration no.     1000

To obtain reliable results, 30 independent simulations with different training and testing datasets were performed. The average classification rates over the 30 independent simulations are given in Table 2. After the GA stage, the average classification rate is 78.5% on the training data and 77.9% on the testing data; RSL refinement raises these to 80.1% and 79.8%, respectively. As shown in Table 2, tree structures of fuzzy classifier can be successfully applied to diabetes disease diagnosis.

Table 2. Average classification rate over 30 independent simulations

Algorithm    Average classification rate (%)
             Training data set   Testing data set
After GA     78.5                77.9
After RSL    80.1                79.8

4 Conclusions

This paper applied tree structures of fuzzy classifier to diabetes disease diagnosis. Tree structures of fuzzy classifier can reduce the size of the rule base by optimally placing fuzzy neurons as nodes and selecting relevant input subspaces as leaves. In the optimization stage, we used a two-step optimization in which a genetic algorithm develops the binary structure by optimally selecting the nodes and leaves, and random signal-based learning then further refines the binary connections. To show the effectiveness of the proposed method, we considered the diabetes disease dataset obtained from the UCI Machine Learning Repository. As can be seen in the classification results, tree structures of fuzzy classifier can be successfully applied to diabetes disease diagnosis.
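
As a supplementary illustration of the AND and OR neurons of Eqs. (1) and (2) in Section 2, the following minimal Python sketch may be helpful. It assumes the product t-norm and the probabilistic sum s-norm stated in Section 2. The function names (`t_norm`, `s_norm`, `and_neuron`, `or_neuron`), the small two-level tree, and the input values are hypothetical choices made for illustration and are not taken from [4].

```python
from functools import reduce


def t_norm(a, b):
    """Product t-norm: a t b = ab."""
    return a * b


def s_norm(a, b):
    """Probabilistic sum s-norm: a s b = a + b - ab."""
    return a + b - a * b


def and_neuron(x, w):
    """Eq. (1): combine each input with its weight or-wise, then aggregate and-wise."""
    return reduce(t_norm, (s_norm(wi, xi) for wi, xi in zip(w, x)), 1.0)


def or_neuron(x, w):
    """Eq. (2): combine each input with its weight and-wise, then aggregate or-wise."""
    return reduce(s_norm, (t_norm(wi, xi) for wi, xi in zip(w, x)), 0.0)


# Binary connections as in the GA stage (0 for AND neurons, 1 for OR neurons):
# with binary inputs the neurons behave like ordinary logic gates.
gate_and = and_neuron([1.0, 0.0], [0.0, 0.0])  # 1 AND 0
gate_or = or_neuron([1.0, 0.0], [1.0, 1.0])    # 1 OR 0

# Hypothetical two-level tree: one OR node fed by two AND nodes,
# evaluated on made-up membership values.
x = [0.8, 0.6, 0.3, 0.9]
left = and_neuron(x[0:2], [0.0, 0.0])
right = and_neuron(x[2:4], [0.0, 0.0])
tree_output = or_neuron([left, right], [1.0, 1.0])

# Moving an AND weight from 0 toward 1 reduces that input's influence;
# at w = 1 the second input is masked out entirely.
masked = and_neuron([0.8, 0.6], [0.0, 1.0])
```

In this sketch, `masked` equals the first input alone, which mirrors observation (ii): in an AND neuron a connection of one removes an input, whereas a connection of zero passes it through. A refinement step such as RSL would adjust such weights within the unit interval rather than keeping them binary.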
References

1. Ganji, M.F., Abadeh, M.S.: A Fuzzy Classification System based on Ant Colony Optimization for Diabetes Disease Diagnosis. Expert Systems with Applications, Vol. 38, No. 12 (2011) 14650-14659
2. Polat, K., Gunes, S.: An Expert System Approach based on Principal Component Analysis and Adaptive Neuro-Fuzzy Inference System to Diagnosis of Diabetes Disease. Digital Signal Processing, Vol. 17, No. 4 (2007) 702-710
3. Lee, C.S., Wang, M.H.: A Fuzzy Expert System for Diabetes Decision Support Application. IEEE Trans. Systems, Man, and Cybernetics, Part B, Vol. 41, No. 1 (2011) 139-153
4. Han, C.W.: Developing a Tree Structures of Fuzzy Controller. Advanced Science and Technology Letters, Vol. 139 (2016) 439-442
5. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA (1989)
6. Han, C.W., Park, J.I.: A Study on Hybrid Random Signal-based Learning and Its Applications. International Journal of Systems Science, Vol. 35, No. 4 (2004) 243-253
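
As a supplementary sketch of the fuzzification step mentioned in Section 3 (three uniformly distributed Gaussian membership functions with overlap 0.5), one possible reading is shown below in Python. It rests on assumptions not stated in the paper: that each attribute is first scaled to [0, 1], that "uniformly distributed" means evenly spaced centers, and that "overlap" is the membership degree at which neighbouring functions intersect. The names `gaussian_mfs` and `fuzzify` are hypothetical.

```python
import math


def gaussian_mfs(n_mfs=3, lo=0.0, hi=1.0, overlap=0.5):
    """Return (centers, sigma) for n_mfs evenly spaced Gaussian membership functions.

    Assumption: `overlap` is the membership degree at which two
    neighbouring functions intersect, midway between their centers.
    """
    step = (hi - lo) / (n_mfs - 1)
    centers = [lo + k * step for k in range(n_mfs)]
    # Solve exp(-(step/2)^2 / (2 * sigma^2)) = overlap for sigma.
    sigma = (step / 2.0) / math.sqrt(-2.0 * math.log(overlap))
    return centers, sigma


def fuzzify(value, centers, sigma):
    """Membership degree of `value` in each Gaussian fuzzy set."""
    return [math.exp(-((value - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


centers, sigma = gaussian_mfs()
# A normalized attribute value midway between the first two centers.
degrees = fuzzify(0.25, centers, sigma)
```

Under these assumptions, a value midway between two centers belongs to both neighbouring fuzzy sets with degree 0.5, and the resulting membership degrees are the kind of values that the leaves of the tree would pass to the fuzzy neurons.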