A Rough Set Theory Approach to Diabetes

pp. 50-54, http://dx.doi.org/10.14257/astl.2017.145.10

Shantan Sawa, Ronnie D. Caytiles* and N. Ch. S. N Iyengar**

School of Computer Science and Engineering, VIT University, Vellore-632014
* Multimedia Engineering Department, Hannam University, Daejeon, Korea
** Dept. of Information Technology, Sreenidhi Institute of Science and Technology, Hyderabad, India
shantansawa@gmail.com, rdcaytiles@gmail.com, srimannarayanach@sreenidhi.edu.in

Abstract. The primary objective of this paper is to develop and propose a model that uses concepts from Rough Set Theory to cluster the patients in a diabetic data set. The model incorporates rough clustering of the data set and, from the clusters formed, computes the accuracy on the testing data. Rough clustering helps split the data into clusters of patients who suffer from Diabetes Mellitus and those who do not. As a result, the patients suffering from Diabetes Mellitus are clustered together, which provides the average values of the features used in the model for data clustering. The results obtained add depth to the field of rough clustering for diabetes, as few, if any, studies have applied rough set theory to diabetes.

Keywords: Rough Set Theory, Diabetes Mellitus, Rough Clustering

1 Introduction

Type 1 diabetes is a disorder in which the pancreas can no longer produce insulin. It is also called juvenile diabetes, although it is present in both children and adults; statistics show that only 5% of all type 1 cases are diagnosed in adulthood. It is sometimes referred to as insulin-dependent diabetes mellitus and has no cure. Anyone who has it must take insulin to survive. According to the WHO (World Health Organization), the number of children having type 1 diabetes is very high, as mentioned in the motivation. Hence it can be said that diabetes is a serious chronic disease.
Doctors usually take patients' blood samples and check the sugar concentration in the blood in order to diagnose diabetes, which takes a long time. Presently, there is no tool to detect whether a person has type 1 diabetes and how severely it will affect him or her. Hence the need arises for an efficient application that predicts the onset of type 1 diabetes in patients, for a quicker diagnosis and better safety. In the real world, data is not crisp: there are gray areas, some roughness among the objects in the data set. As a result, real-world applications hardly ever have crisp data available, and it becomes impossible to create crisp clusters from real-world data. To tackle this lack of crispness, or presence of roughness, in the data set, the concepts of Rough Set Theory play a significant part in developing models that take the rough nature of the data set into account and form clusters of the data accordingly.

ISSN: 2287-1233 ASTL Copyright 2017 SERSC

2 Literature Survey/Related Work

Asma Shaheen Khan and Waqas Ahmed [1] proposed an intelligent decision support system in diabetic e-health care from the perspective of elders. Tawfik Saeed Zeki et al. [2] proposed a rule-based expert system for diabetes diagnosis, designed after data acquisition. Narasinga Rao, M. et al. [3] proposed a classification approach that used a decision tree algorithm to predict whether a patient is diabetic or not. Clara Madonna L. et al. [4] proposed a design for a Diabetic Diagnosis System using Rough Sets, in which the authors created a knowledge base from the existing data set using upper and lower approximations. When new incoming data is collected, it is evaluated to compute the equivalence classes. These equivalence classes are then compared against the knowledge base, and the system helps the user discern the type of diabetes. Many more works have been carried out in this direction by several authors.

The motivation behind pursuing a working model in the field of T1DM is to guide the doctors and the guardians of the affected children to catch the disease in its nascent stage and, based on the severity of the diabetes, provide apt treatment to the affected children. RStudio is used here. The data set considered for the development and analysis of the model is the Pima Indians Diabetes data set, which is available online from the University of California, Irvine, Machine Learning Repository. The data set has been provided by the National Institute of Diabetes and Digestive and Kidney Diseases. It consists of relevant information about patients of Pima Indian heritage. There are 768 instances, with 500 testing negative for diabetes and the remaining 268 testing positive.

3 Methodology

The proposed system follows the steps shown in Figure 1, which is the system design for the model.
Once the data set is loaded on the platform, we must discern the features that need to be selected for the entire clustering process. First, we compute the correlation matrix to eliminate attributes that have similar trends. Pearson's correlation coefficient has been adopted for the computation of the correlation matrix. To interpret the values better, the following cut-off values for the absolute coefficient have been established:

  Exactly 0 - No correlation
  (0, 0.35] - Weakly correlated
  (0.35, 0.7] - Moderately correlated
  (0.7, 1) - Strongly correlated
  Exactly 1 - Perfectly correlated
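The correlation screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper reports using RStudio, the function names `correlation_strength` and `strongly_correlated_pairs` are hypothetical, and the small array `toy` is made-up data, not the Pima Indians data set.

```python
import numpy as np

def correlation_strength(r):
    """Map a Pearson coefficient to the cut-off labels listed above."""
    a = abs(r)
    if np.isclose(a, 0.0):
        return "No correlation"
    if np.isclose(a, 1.0):
        return "Perfectly correlated"
    if a <= 0.35:
        return "Weakly correlated"
    if a <= 0.7:
        return "Moderately correlated"
    return "Strongly correlated"

def strongly_correlated_pairs(data, names):
    """Return the Pearson correlation matrix and the attribute pairs
    whose absolute coefficient exceeds 0.7 (candidates for dropping)."""
    corr = np.corrcoef(data, rowvar=False)  # columns are attributes
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            label = correlation_strength(corr[i, j])
            if label in ("Strongly correlated", "Perfectly correlated"):
                pairs.append((names[i], names[j], float(corr[i, j])))
    return corr, pairs

# Made-up toy data: three attributes (columns), five records (rows).
toy = np.array([[1, 2, 3],
                [2, 1, 1],
                [3, 4, 2],
                [4, 3, 5],
                [5, 5, 4]], dtype=float)
corr, pairs = strongly_correlated_pairs(toy, ["a", "b", "c"])
for first, second, r in pairs:
    print(first, second, round(r, 2), correlation_strength(r))
```

If the returned list of pairs is empty, no attribute needs to be dropped at this stage.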

Fig. 1. System design depicting the processes involved in the rough set theory model

Table 1. Correlation Matrix on the PIMA Indians Data-set

From Table 1, it is evident that no two attributes are strongly correlated. Hence, no attribute is dropped at this stage. We continue by computing the reducts of the data set for feature extraction, which reduces the complexity of computing the clusters while maintaining the integrity of the data set. Unfortunately, computing a minimal reduct is an NP-hard problem, but there are good heuristic algorithms based on a modified genetic algorithm that provide multiple reducts for the information system. To find the reducts, we first compute the discernibility matrix. The discernibility matrix is an n x n matrix, where n is the number of objects in the data set; each entry lists the attributes on which the corresponding pair of objects differs. With the clusters formed, we split the data set into 70% for training the model and 30% to validate and verify the results from the trained model. The testing set is validated on the basis of distance from the centroids of the clusters.
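As an illustration of the reduct step, the sketch below builds a discernibility matrix for a tiny, made-up decision table. In the standard formulation, entry (i, j) collects the attributes on which objects i and j differ, and it is recorded only when their decision classes differ. The attribute values, the function names, and the brute-force search are all assumptions for illustration. The paper relies on genetic-algorithm heuristics, which the exhaustive search here replaces because the toy table is small.

```python
from itertools import combinations

def discernibility_matrix(objects, decisions):
    """Entry (i, j) is the set of attributes that distinguish objects
    i and j, recorded only when their decision classes differ."""
    n = len(objects)
    matrix = [[frozenset() for _ in range(n)] for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in objects[i]
                             if objects[i][a] != objects[j][a])
            matrix[i][j] = matrix[j][i] = diff
    return matrix

def smallest_reducts(attributes, matrix):
    """Exhaustive search (exponential, toy tables only) for the smallest
    attribute subsets that intersect every non-empty matrix entry."""
    entries = {e for row in matrix for e in row if e}
    if not entries:
        return [set()]
    for size in range(1, len(attributes) + 1):
        found = [set(c) for c in combinations(attributes, size)
                 if all(e & set(c) for e in entries)]
        if found:
            return found
    return []

# Hypothetical discretised records; 1 = diabetic, 0 = non-diabetic.
objects = [
    {"glucose": "high", "bmi": "high", "age": "old"},
    {"glucose": "high", "bmi": "low", "age": "young"},
    {"glucose": "low", "bmi": "high", "age": "old"},
    {"glucose": "low", "bmi": "low", "age": "young"},
]
decisions = [1, 1, 0, 0]

matrix = discernibility_matrix(objects, decisions)
reducts = smallest_reducts(["glucose", "bmi", "age"], matrix)
print(reducts)
```

In this toy table only the glucose attribute is needed to tell the two classes apart, so the search returns a single one-attribute reduct.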

4 Simulation and Results

On the basis of the correlation matrix and the feature selection process, the following attributes take precedence over the other features: Plasma Glucose Concentration, Body Mass Index and Age.

Fig. 2. Decision reduct 17

After selecting the reducts from the data set and dropping the non-significant features, we proceed with rough clustering to cluster the data set. The data set is split into 70% for training and 30% for testing and validation. For comparison purposes, we have also computed the centres using the classic Hard K-means clustering algorithm. The centroids for the Hard K-means algorithm were computed after 11 iterations over the data set. The instances are split into Diabetic and Non-Diabetic clusters. The values of the centroids for the individual attributes in the reducts are listed in the figure below.

Fig. 3. Cluster Means for the attributes using Hard K-Means algorithm

Meanwhile, the rough clustering algorithm proposed by P. Lingras is implemented on 70% of the data to compute the centres for the attributes.

Fig. 4. Cluster Means for the attributes using Rough K-Means algorithm on the Training set
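A minimal sketch of rough k-means in the spirit of Lingras' algorithm is given below, followed by nearest-centroid assignment of held-out points. Several details are assumptions, since the paper does not state them: the weight `w_lower = 0.7`, the ratio test with `threshold = 1.2` for deciding whether an object falls in a boundary region (one common variant; other formulations use a distance difference), the function names, and the toy points, which are not the Pima data.

```python
import numpy as np

def rough_kmeans(X, centroids, w_lower=0.7, threshold=1.2, max_iter=100):
    """Return centroids plus boolean lower-approximation and
    boundary-region membership matrices of shape (n_points, k)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    rows = np.arange(len(X))
    for _ in range(max_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        nearest = dist.argmin(axis=1)
        d_min = dist.min(axis=1)
        # A cluster is "close" if it is nearly as near as the nearest one.
        close = dist <= threshold * d_min[:, None]
        ambiguous = close.sum(axis=1) > 1
        lower = np.zeros_like(close)
        lower[rows, nearest] = ~ambiguous          # certain members
        boundary = close & ambiguous[:, None]      # possible members
        new = centroids.copy()
        for j in range(len(centroids)):
            lo, bd = X[lower[:, j]], X[boundary[:, j]]
            if len(lo) and len(bd):
                new[j] = w_lower * lo.mean(axis=0) + (1 - w_lower) * bd.mean(axis=0)
            elif len(lo):
                new[j] = lo.mean(axis=0)
            elif len(bd):
                new[j] = bd.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, lower, boundary

def nearest_cluster(points, centroids):
    """Assign each held-out point to the cluster with the closest centroid."""
    P = np.asarray(points, dtype=float)
    dist = np.linalg.norm(P[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

# Toy 2-D points: two tight groups and one point halfway between them.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5]]
centroids, lower, boundary = rough_kmeans(X, [[0, 0], [10, 10]])
print(centroids)
print(nearest_cluster([[1, 1], [9, 9]], centroids))
```

The halfway point ends up in the boundary region of both clusters rather than being forced into one, which is the behaviour that distinguishes rough clustering from Hard K-means.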

Table 2. Centre values for each individual attribute for the various methods; each pair is (diabetic, non-diabetic)

  Method                    V2          V6             V8
  Hard K-means              (156, 102)  (34.5, 30.7)   (38.1, 30.7)
  Rough K-means (Training)  (147, 105)  (33.5, 30.7)   (36.3, 31.9)
  Rough K-means (Testing)   (149, 108)  (34.2, 31.1)   (38.6, 31.7)

Table 3. Error percentage of the centre values for each individual attribute

  Error %       V2    V6    V8
  Non-diabetic  2.86  1.3   0.63
  Diabetic      1.36  2.09  6.34

Each error is the relative difference between the testing and training centres of the Rough K-means clusters; for example, for V2 in the diabetic cluster, |149 - 147| / 147 = 1.36%.

5 Conclusion

The rough clustering model was successfully trained, tested and implemented on the data set. The results obtained were satisfactory, with centre errors between 0.63% and 6.34% (Table 3), and may be improved further by increasing the size of the data set.

References

1. Shaheen, Asma. Intelligent decision support system in diabetic ehealth care. Diss. Blekinge Institute of Technology, 2009.
2. Zeki, Tawfik Saeed, et al. "An expert system for diabetes diagnosis." American Academic & Scholarly Research Journal 4.5 (2012): 1.
3. Narasingarao, M., Manda, R., Sridhar, G., Madhu, K., Rao, A. "A clinical decision support system using multilayer perceptron neural network to assess well-being in diabetes." J. Assoc. Phys. India 57 (2009): 127-133.
4. LJ, Clara Madonna. "Design of a diabetic diagnosis system using rough sets." Cybernetics and Information Technologies 13.3 (2013): 124-139.