SIGNIFICANCE OF INTEGRATED TAXONOMY APPROACH IN DIVERSE LIVER CHAOSES. Presidency College, Chennai, India,

Similar documents
INVESTIGATING ILPD FOR MOST SIGNIFICANT FEATURES

Evaluating the Behavioral and Developmental Interventions for Autism Spectrum Disorder

Predicting Heart Attack using Fuzzy C Means Clustering Algorithm

Association Rule Mining on Type 2 Diabetes using FP-growth association rule Nandita Rane 1, Madhuri Rao 2

Multi Parametric Approach Using Fuzzification On Heart Disease Analysis Upasana Juneja #1, Deepti #2 *

Effective Values of Physical Features for Type-2 Diabetic and Non-diabetic Patients Classifying Case Study: Shiraz University of Medical Sciences

Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool

Weighted Naive Bayes Classifier: A Predictive Model for Breast Cancer Detection

Effective Diagnosis of Alzheimer s Disease by means of Association Rules

Survey on Data Mining Techniques for Diagnosis and Prognosis of Breast Cancer

ABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India

A Comparative Study on Brain Tumor Analysis Using Image Mining Techniques

ABNORMAL LIVER FUNCTION TESTS. Dr Uthayanan Chelvaratnam Hepatology Consultant North Bristol NHS Trust

Data Mining in Bioinformatics Day 4: Text Mining

An SVM-Fuzzy Expert System Design For Diabetes Risk Classification

Primary Level Classification of Brain Tumor using PCA and PNN

An Improved Algorithm To Predict Recurrence Of Breast Cancer

Classıfıcatıon of Dıabetes Dısease Usıng Backpropagatıon and Radıal Basıs Functıon Network

Diagnosis of Breast Cancer Using Ensemble of Data Mining Classification Methods

A Fuzzy Improved Neural based Soft Computing Approach for Pest Disease Prediction

Exploratory Quantitative Contrast Set Mining: A Discretization Approach

Investigating and Referring Incidental Findings of Abnormal Liver Tests

FUZZY DATA MINING FOR HEART DISEASE DIAGNOSIS

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017

Contrasting the Contrast Sets: An Alternative Approach

TABLE OF CONTENTS CHAPTER NO. TITLE PAGE NO. ABSTRACT

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

Clustering of MRI Images of Brain for the Detection of Brain Tumor Using Pixel Density Self Organizing Map (SOM)

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852

Diagnosis of Coronary Arteries Stenosis Using Data Mining

A NOVEL VARIABLE SELECTION METHOD BASED ON FREQUENT PATTERN TREE FOR REAL-TIME TRAFFIC ACCIDENT RISK PREDICTION

Predicting the Effect of Diabetes on Kidney using Classification in Tanagra

Evaluation on liver fibrosis stages using the k-means clustering algorithm

A DATA MINING APPROACH FOR PRECISE DIAGNOSIS OF DENGUE FEVER

Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data. Technical Report

DIABETIC RISK PREDICTION FOR WOMEN USING BOOTSTRAP AGGREGATION ON BACK-PROPAGATION NEURAL NETWORKS

Review of Chronic Kidney Disease based on Data Mining Techniques

CANCER DIAGNOSIS USING DATA MINING TECHNOLOGY

Why to biopsy? Indications for liver biopsy in common medical liver diseases- how are they changing?

Jyotismita Talukdar, Dr. Sanjib Kr. Kalita

Data Mining and Knowledge Discovery: Practice Notes

Automated Prediction of Thyroid Disease using ANN

Improved Intelligent Classification Technique Based On Support Vector Machines

Evaluating Classifiers for Disease Gene Discovery

EXTRACT THE BREAST CANCER IN MAMMOGRAM IMAGES

BACKPROPOGATION NEURAL NETWORK FOR PREDICTION OF HEART DISEASE

ADVANCES in NATURAL and APPLIED SCIENCES

Learning with Rare Cases and Small Disjuncts

Lung Region Segmentation using Artificial Neural Network Hopfield Model for Cancer Diagnosis in Thorax CT Images

Segmentation of Tumor Region from Brain Mri Images Using Fuzzy C-Means Clustering And Seeded Region Growing

MRI Image Processing Operations for Brain Tumor Detection

Unsupervised MRI Brain Tumor Detection Techniques with Morphological Operations

Detection of Lung Cancer Using Backpropagation Neural Networks and Genetic Algorithm

Predicting Breast Cancer Survivability Rates

CLASSIFICATION OF BRAIN TUMOUR IN MRI USING PROBABILISTIC NEURAL NETWORK

KNOWLEDGE DISCOVERY FROM MINING ASSOCIATION RULES FOR HEART DISEASE PREDICTION

CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES

A Survey on Prediction of Diabetes Using Data Mining Technique

CHAPTER - 7 FUZZY LOGIC IN DATA MINING

Transient elastography in chronic liver diseases of other etiologies

Classification of Smoking Status: The Case of Turkey

Discovery of Strong Association Rules for Attributes from Data for Program of All-Inclusive Care for the Elderly (PACE)

Implementation of Inference Engine in Adaptive Neuro Fuzzy Inference System to Predict and Control the Sugar Level in Diabetic Patient

PO Box 19015, Arlington, TX {ramirez, 5323 Harry Hines Boulevard, Dallas, TX

Detection and Classification of Lung Cancer Using Artificial Neural Network

A COMPARATIVE STUDY OF DIAGNOSING LIVER DISORDER DISEASE USING CLASSIFICATION ALGORITHM

A FUZZY LOGIC BASED CLASSIFICATION TECHNIQUE FOR CLINICAL DATASETS

Heterogeneous Data Mining for Brain Disorder Identification. Bokai Cao 04/07/2015

CANCER DIAGNOSIS USING NAIVE BAYES ALGORITHM

A hybrid Model to Estimate Cirrhosis Using Laboratory Testsand Multilayer Perceptron (MLP) Neural Networks

Extraction and Identification of Tumor Regions from MRI using Zernike Moments and SVM

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset.

Challenges of Automated Machine Learning on Causal Impact Analytics for Policy Evaluation

A study on Feature Selection Methods in Medical Decision Support Systems

Brain Tumor Segmentation Based On a Various Classification Algorithm

AN EXPERIMENTAL STUDY ON HYPOTHYROID USING ROTATION FOREST

Predictive performance and discrimination in unbalanced classification

Lung Tumour Detection by Applying Watershed Method

Prediction of Diabetes Using Bayesian Network

Automatic Classification of Breast Masses for Diagnosis of Breast Cancer in Digital Mammograms using Neural Network

Disclosure. Objectives. Smash the Nash: A practical approach to fatty liver disease

IJISET - International Journal of Innovative Science, Engineering & Technology, Vol. 3 Issue 2, February

Research Article. Automated grading of diabetic retinopathy stages in fundus images using SVM classifer

Computer-Aided Quantitative Analysis of Liver using Ultrasound Images

A Survey on Brain Tumor Detection Technique

Brain Tumor segmentation and classification using Fcm and support vector machine

American Journal of Oral Medicine and Radiology

Fuzzy Decision Tree FID

LUNG NODULE SEGMENTATION FOR COMPUTER AIDED DIAGNOSIS

Disease predictive, best drug: big data implementation of drug query with disease prediction, side effects & feedback analysis

Statistical Analysis Using Machine Learning Approach for Multiple Imputation of Missing Data

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction

DPPred: An Effective Prediction Framework with Concise Discriminative Patterns

I. INTRODUCTION III. OVERALL DESIGN

EVALUATION OF ABNORMAL LIVER TESTS

PCA Enhanced Kalman Filter for ECG Denoising

R Jagdeesh Kanan* et al. International Journal of Pharmacy & Technology

AUTOMATIC BRAIN TUMOR DETECTION AND CLASSIFICATION USING SVM CLASSIFIER

PREPROCESSING AND GENERATION OF ASSOCIATION RULES FOR PREDICTION OF ACUTE MYELOID LEUKEMIA FROM BONE MARROW DATA

International Journal of Pharma and Bio Sciences A NOVEL SUBSET SELECTION FOR CLASSIFICATION OF DIABETES DATASET BY ITERATIVE METHODS ABSTRACT

Transcription:

INTERNATIONAL International Journal of Computer JOURNAL Engineering OF COMPUTER and Technology (IJCET), ENGINEERING ISSN 0976 6367(Print), ISSN 0976 6375(Online) & TECHNOLOGY Volume 3, Issue 3, (IJCET) October-December (2012), IAEME ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume 3, Issue 3, October - December (2012), pp. 265-272 IAEME: www.iaeme.com/ijcet.asp Journal Impact Factor (2012): 3.9580 (Calculated by GISI) www.jifactor.com IJCET I A E M E SIGNIFICANCE OF INTEGRATED TAXONOMY APPROACH IN DIVERSE LIVER CHAOSES ABSTRACT A.S.Aneeshkumar 1, Dr. C. Jothi Venkateswaran 2 1 Research Scholar, PG & Research Dept. of Computer Science, Presidency College, Chennai, India, aneesh_kumar777@yahoo.com 2 Research Supervisor &Dean, Dept. of Computer Science & Applications, Presidency College, Chennai, India, jothivenkateswaran@yahoo.co.in Liver chaos is an associated factor with other serious diseases like cancer, kidney failure and cardio vascular problems. The preliminary stage of liver dysfunction can be classified as three, which are Hepatitis, Alcoholic Fatty Liver Disease (AFLD) and Nonalcoholic Fatty Liver Disease (NAFLD). In this paper we are applying various integrated classification models of Data Mining for the evaluation of performance. Keywords: Associative Classification, Classification Based on Association, Classification based on Multiple Association Rules, Classification based on Predictive Association Rules, First Order Inductive Learner, Preprocessing. I. INTRODUCTION Many health care organizations having large number of health related data without proper analysis or storage. In a successful organization, it is import to empower the staff and management with data warehousing based on critical thinking and knowledge management tools for strategic decision making [1]. Medical diagnosis is regarded as an important yet complicated task that needs to be executed accurately and efficiently. Regrettably all doctors do not possess expertise in every subspecialty fields and moreover there is a shortage of resource persons in certain places. Therefore, an automatic medical diagnosis system would probably be exceedingly beneficial by bringing all of them together [2]. Determining the early stages of liver disease is better to avoid other consequences. The common liver related problems seen in Indian society are Hepatitis (Viral attack), Alcoholic Fatty Liver Disease (AFLD) and Non-alcoholic Fatty Liver Disease (NAFLD). Hepatitis can be broadly classified as HAV, HBV, HCV, HDV, and HEV. The first three are 265

very common in anywhere, but remaining two is seen very rarely and therefore such data are not available for this study. Normally, it is difficult to identify the exact reason of Liver dysfunction in some extend without further examinations and so in this work we are developing a model to predict the accurate type of disease without false assumptions. Human civilization since the time of immemorial to today, the alcohol is ubiquitous, with constantly changing patterns of alcohol intake around the world [3]. High level of continuous alcoholic consumption will lead to AFLD. Modern life style and lack of exercise are associated with an increased prevalence of diabetes mellitus (DM), Obesity, Hypertension, hyper triglycerides and which are calculated as major reasons of NAFLD. Hepatitis A may be spread primarily through food or water from an infected person and others are through sexual contact with an infected person, injection of drugs, infected women to infants and blood transfer. In some cases toxic drug effects will also be the factor of deviated results of the liver function tests and theses are seen in rare cases. Some hepatitis is considered to have autoimmune hepatitis, on the basis of combination of psychological and histological features, clinical and biochemical features as described by the International Autoimmune Hepatitis Group Report [4]. Radiologic imaging with Ultra-Sonography (US), Computed Tomography (CT) or Magnetic Resonance Imaging, used either singly or in combination, for detection of adequate threshold of fatty infiltrations in the liver. Then in the absence of obvious contraindications, a liver biopsy should be conducted in patients with unexplained sustained abnormalities of the liver biochemistry [5]. The greatest benefit of liver biopsy in unexplained abnormal LFT is to cover progressive liver disease either directly or through prompting a further diagnostic analysis. II. DATA DESCRIPTION The collected data for this study from a reputed hospital consists of all symptoms, doctor s observation and liver biochemistry of the patients, where in addition to Liver Function Test (LFT), Triglycerides report also included for this study. Symptoms of Liver problems are uncertain in most cases. So we neglect such items and the following factors are the attributes for this work. TABLE 1. Biochemical Markers of LFT Test Items Lower Upper Limit Limit Bilirubin(D) 0.1 0.4 Bilirubin(T) 0.3 1.3 S.G.O.T 12 38 S.G.P.T. 7 41 Alkaline Phosphatase 80 290 Gamma GT 11/ 7 50/32 Total Proteins 6.7 8.6 Albumin 4.1 5.3 266

Globulins 2.0 3.5 TABLE 2. Other dependent factors HBAIC Yes No Obesity Yes No BP Yes No Triglycerides Yes No Alcoholic consumption Yes No III. METHODOLOGY The combination of knowledge discovery, information retrieval, deductive learning and exploratory data analysis can be called as Data mining [6]. Different algorithms are involved in it to extract models or patterns for particular prediction. The word Association in Mining is finding frequent item sets and discovery of association or correlation among such frequent items in a huge transaction. Another data analysis method called Classification is used to identify some important entities for prediction, where it extracts different models for better grouping of similar data types. In recent years the data mining researchers proposed a new integrated predictive approach for better accuracy is known as associative classification (AC), which combines associative mining and classification. According to most of the studies, we can see that, it has better accuracy and rule evaluation when compare to other classifications and rule based classification techniques. Because some rules used in AC will not appear in basic classification techniques. AC follows the method of classification in associative rule evaluation and the major application of this is in medical and engineering field. Medical practitioners are always following an inherent approach of combinational symptom and report based diagnosis for disease identification. Normally AC performs its analysis in relational database and each entity consists of attributes like,,, and one of them will consider as Class label or Predictive attribute [7]. Rule generation, rule pruning and classification are the three steps of AC. The first AC algorithm introduced as Classification Based on Association (CBA) and it is based on the Apriori algorithm, which gives Class Associated Rule (CAR). These rules will pruned and then select most appropriate rules from it, for further classification. The CBA will mines all CARs and produce a classification from generated CARs, then it calculate the accuracy of the Association Classifier [8]. Multiple scan features of Apriori produce large number of rules and which maintain large computational time. Classification based on Multiple Association Rules (CMAR) and Classifications based on Predictive Association Rules (CPAR) are introduced later as the extension of CBA. The CMAR implements FP-growth algorithm instead of Apriori for the generation of frequent item sets [9]. It first generate all the association rules with certain support and confidence thresholds and then selects a small set of rules from them to form a classifier. Then after, it identifies the class label for the best rule with best confidence. When compare 267

with previous classification it uses multiple rules in prediction, with the help of weighted and so CMAR provides more accuracy in classification. Classification based on Predictive Association Rules combines the advantages of both associative classification and traditional rule-based classification to attain maximum accuracy. It extends FP growth with interestingness and overlapping relationships in formation of rules. CPAR generates a smaller set of rules, with higher quality and lower redundancy in comparison with other associative classification techniques. So CPAR is much efficient in case of rule generation, prediction and accuracy. More over than that CPAR generates a small set of predictive rules directly from the dataset based on the rule prediction and coverage analysis with lesser time, which is opposite to generating candidate rules like in other Associative Classifications, where datasets contain a large number of rows and columns. So they needs more time to build the model. It follows dynamic programming pattern to avoid repeated calculation in rule generation and select the entire close to the best literals instead of selecting only the best literal only, so the important rules will not lost. IV. DATA PREPROCESSING To avoid certain inappropriate, duplicate, unrelated and missing fields it is necessary to implement preprocessing techniques. In our approach, we used dimensionality reduction to avoid patient s psychodynamic factors and symptoms, because it may show a discrepancy according to the type and variation of Liver disorder. In our previous work, we used Principal Component Analysis (PCA) for this reduction. It searches for -dimensional orthogonal vectors, which can bitterly represent the data, where. the original data are thus projected onto a much smaller space, while it results dimensionality reduction. PCA always combines the essence of attributes by creating an alternative, smaller set of variables [10]. V. MODEL BUILDING In this paper, we build a model to evaluate the performances of four algorithms, where three are incorporated. Three separate and consolidate datasets are used here for this model construction. 5.1. FOIL First Order Inductive Learner is a sequential covering algorithm which learns firstorder logic rules. The tuples of the class for learning rules are called positive tuples, where the remaining is called as negative tuples. The number of ( ) tuples covered by the rule 1 is represented as ( ) and ( ) be the number of ( ) tuples covered by. Basic idea of CPAR observed the concept of FOIL [11]. The information gain of rules which provide high accuracy and maximum positive tuples are _ = + 2 + 268

which build rules to distinguish positive tuples from negative tuples. In cases of multiple classes, FOIL is applied to each class, where if a rule generates, the positive sample it covers, will be removed until all positive tuples in the data set are covered. 5.2. CBA CBA uses a heuristic method to construct the classifier, unlike Apriori with the satisfaction of minimum support and minimum confidence level, where the rules are organized according to decreasing precedence based on their confidence and support [12]. If a set of rule with same antecedent, then the rule with the highest confidence is selected to represent the set. When classifying a new tuple, the first rule satisfying the tuple is used to classify it. The classifier also contains a default rule, having lowest precedence, which specifies a default class for any new tuple that is not satisfied by any other rule in the classifier [13]. The set of these rules make a decision set for this classifier. 5.3. CMAR CMAR uses an enhanced FP-tree that maintains the distribution of class labels among tuples, which satisfying each frequent itemset, and it combine rule generation together with frequent itemset mining in a single step. For an example 1 and 2 are two rules, where antecedent of 1 is more general than that of 2 and ( 1) ( 2), then 2 is pruned. That defines the rules with low confidence can be pruned is a more generalized version with higher confidence exists [14]. 5.4. CPAR CPAR relaxes the FOIL application of removing positive samples by allowing the covered tuples to remain there, but it reduces the weight. The resulting rule from each class in this manner will merge to form the classifier rule set. But it produces fewer rules than other CBA and CMAR. For an example, (1) (2) (3) (2) (4) (1) (2) (1) (3) is generated for the following four rules. 1=( =1, =2, =3) 2=( =1, =2, =4) 3=( =1, =1, =2, =2) 4=( =1, =1, =2, =1, =3 ) 269

VI. ANALYSIS AND RESULTS As whispered earlier, because of ambiguity, we applied data reduction technique in preprocessing stage to reduce the dimensions. Then the attributes collected from the patients are shown in Table 1 and 2 for this work in addition to a class diagnosis. In Table 1, we can see the accuracy / Time of the classifier, where the full data set is used as a training set to build the model and again test it with the developed replica. First we applied these algorithms to identify viral infections of liver from the general category. In the second step we applied another set of data for classification, which related to the details of AFLD and the third dataset of NAFLD performs with lesser time. Finally we applied the combination of these three data sets to evaluate the performance. According to these results the average time taken by each method is same in Hepatitis data and the highest accuracy given by CPAR only. In case of AFLD data, four algorithms share a common time of 0.02 seconds with almost same and fine accuracy. The performance of CMAR is somewhat better here. In the next step also CPAR having better performance with 99.0 / 0.0 rate. I Finally in case of combined data, when compare to other three FOIL shows poor performance in both cases of accuracy and time, where CPAR completed within 0.02 seconds of time with good result. Figure 1 represents the performance of all these algorithms. TABLE 3. Evaluation of algorithms DATA TYPES FOIL CBA CMAR CPAR Hepatitis 88.26 / 0.04 91.70 / 0.05 91.98 / 0.04 92.0 / 0.04 AFLD 98.08 / 0.02 98.09 / 0.02 98.40 / 0.02 98.10 / 0.02 NAFLD 98.38 / 0.01 98.58 / 0.00 98.82 / 0.01 99.0 / 0.0 Consolidate 94.0 / 0.04 97.27 / 0.04 97.28 / 0.02 97.62 / 0.02 CPAR FOIL 100 98 96 94 92 90 88 86 84 82 CBA CMAR Hepatitis infection ALD NALD Consolidate Fig 1. Rule quality performance 270

VII. CONCLUSION The observed factors used for this model are very important for liver disease prevalence. Most of the liver problems are presented because of the life style and behaviour of the peoples. In this analysis accuracy shows the quality of the rule. The integration of multiple rules gives more accurate results. As the result the overall performance of CPAR is better than other classifiers. So this approach can be adapted to other application field like medical, insurance, banking, recruitment and psychology for better prediction in the field of their expected results. REFERENCES Journal Papers: [1] Shantakumar B. Patil and Y.S.Kumaraswamy, Intelligent and Effective heart Attack Prediction System Using Data Mining and Artificial Neural Network, European Journal of Scientific Research, ISSN 1450-216X Vol.31 No.4(2009), pp.642-656. [2] P.Rajeswari and G.Sophia Reena, Analysis of Liver Disorder using Data mining Algorithm, Global Journal of Computer Science and Technology, Vol.10 Issue14 (Ver.10) November 2010, Page no.48. [3] Das SK, Balakrishnan V and Vasudevan DM, Alcohol: Its health and social impact in India, Natl Med J India 2006; 19:94-9. Report: [4] International Autoimmune Hepatitis Group Report, J Hepatol 1999: 31:929-938. Journal Papers: [5] Maeve M.Skelly, Peter D.James and Stephen D. Ryder, Findings on Liver biopsy to investgate abnormal liver function tests in the absence of diagnostic serology, Elsevier Journal of Hepatology 35(2001) 195-199. [6] Huda Yasin, Tahseen A. Jilani and Madiha Danish, Hepatitis-C Classification using Data mining Techniques, International Joural of Computer Applications (0957-8887), Volume 24-No.3, June 2011. [7] Sunita Soni and O.P.Vyas, Using Associative Classifiers for Predictive Analysis n Health Care Data Mining, International Journal of Computer Applications(0975-8887), Volume 4- No.5, July 2010. [8] Asha.T, S.Natarajan and K.N.B. Murthy, A Study of Associative Classifiers with Different Rule Evaluation Measures for Tuberculosis Prediction, IJCA Special Issue on Artificial Intelligence Techniques Novel Approaches & Practical Applications, AIT-2011. Proceedings Papers: [9] Xiaoxin Yin and Jiawei Han, CPAR: Classification based on Predictive Association Rules, Proceedings of the SIAM International Conference on Data Mining. San Francisco, CA: SIAM Press, 2003, pp:369-376. 271

Books: [10] Jiawei Han and Micheline Kamber, Data Mining Concepts and Techniques, Published by Elsevier, second edition 2006. Proceedings Papers: [11] J. R. Quinlan and R. M. Cameron-Jones, FOIL: A Midterm Report, Proceedings of European Conference in Machine Learning, Vienna, Austria, 1993. [12] Liu B., Hsu W., Ma Y., Integrating Classification and Association Rule mining, Proceedings of KDD, New York, pp.80-86, 1998. Journal Papers: [13] Z. Tang and Q. Liao, A new Class-based Associative Classification Algorithm, International Journal of Applied Mathematics, 2007 Proceedings Papers: [14] Li W., Han J., Pie J., Accurate and Efficient Classification Based on Multiple Class- Association Rules, proceedings of IEEE International Conference on Data Mining, Washington DC, USA, pp. 369-380, 2001. 272