INVESTIGATING ILPD FOR MOST SIGNIFICANT FEATURES


International Journal of Mechanical Engineering and Technology (IJMET)
Volume 8, Issue 10, October 2017, pp. 741-749, Article ID: IJMET_08_10_080
Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=8&itype=10
ISSN Print: 0976-6340 and ISSN Online: 0976-6359
IAEME Publication, Scopus Indexed

INVESTIGATING ILPD FOR MOST SIGNIFICANT FEATURES

Jothi Lakshmi U, K.Jayanthi and M.Sathya
Assistant Professor, Department of Information Technology, Veltech University, Chennai, India

ABSTRACT

Nowadays people spend a large share of their earnings on health problems. According to the World Health Organization, liver disease is now the tenth most common cause of death in India. The factors that contribute to the global spread of hepatitis B virus (HBV) and hepatitis C virus (HCV) infection are immigration, cheap air travel, and globalization [1]. In data mining and machine learning, the curse of dimensionality is a common issue that degrades query accuracy and efficiency. This paper aims to identify the attributes that are vital for diagnosing liver disease in a patient, which could eventually improve disease prediction through machine learning. This is done by exploring the Indian Liver Patients data with the machine learning algorithm C4.5. For this study we have used a data mining tool called Tanagra.

Keywords: Machine Learning, Feature Selection, C4.5, Liver Disease, ILPD.

Cite this Article: Jothi Lakshmi U, K.Jayanthi and M.Sathya, Investigating ILPD for Most Significant Features, International Journal of Mechanical Engineering and Technology 8(10), 2017, pp. 741-749. http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=8&itype=10

1. INTRODUCTION

As the largest internal organ and gland, the liver plays a vital role in keeping a person healthy and active. Liver disease is one of several fatal diseases, and its symptoms are often not apparent in the early stages. At a global level, the factors contributing to liver disease are immigration, cheap air travel, and globalization [1]. More specifically, alcohol addiction, smoking, consumption of contaminated food, obesity, diabetes and heredity are also major causes of liver disease [5]. Diagnosing such disease is crucial, and there are several clinical methods and procedures for diagnosing liver disorders.

The Indian Liver Patients data is a publicly available dataset that is commonly used to find new ways of predicting liver disease through machine learning. Data mining in the health care sector helps predict disease by looking for specific patterns in existing patient records. In machine learning, the curse of dimensionality is an issue that has a negative impact on query accuracy and efficiency. Reducing the number of dimensions is therefore relevant for improving the performance of the classifier.

http://www.iaeme.com/ijmet/index.asp editor@iaeme.com

C4.5 is a classifier that builds a decision tree from entropy and the information gain calculated from it; it is an extension of ID3. In this work C4.5 has been chosen for exploring the ILPD dataset. The data set was collected from north east of Andhra Pradesh, India.

2. LITERATURE SURVEY

Alkaline Phosphatase Level Test

An alkaline phosphatase level test (ALP test) measures the amount of alkaline phosphatase enzyme in your bloodstream. The test requires a simple blood draw and is often a routine part of other blood tests. Abnormal levels of ALP in your blood most often indicate a problem with your liver, gallbladder, or bones.

Figure 1 Liver Function [2]

Bilirubin Test

1. A bilirubin test is used to help determine the cause of jaundice, a yellowing of your skin and the whites of your eyes.
2. It helps diagnose conditions like liver disease, hemolytic anemia, and blocked bile ducts.
3. Normal adult values:
   Total bilirubin ranges from 0.3 to 1.0 mg/dl or 5.1 to 17.0 µmol/l
   Direct bilirubin ranges from 0.1 to 0.3 mg/dl or 1.0 to 5.1 µmol/l
   Indirect bilirubin ranges from 0.2 to 0.7 mg/dl or 3.4 to 11.9 µmol/l

Albumin and liver disease

Albumin is a protein made by the liver. A serum albumin test measures the amount of this protein in the clear liquid portion of the blood. Albumin can also be measured in the urine. A normal albumin range is 3.4 to 5.4 g/dl. If you have a lower albumin level, you may have malnutrition. It can also mean that you have liver disease or an inflammatory disease. Higher albumin levels may be caused by acute infections, burns, and stress from surgery or a heart attack. [9]
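As an illustration of how the reference ranges quoted above might be applied to a patient record, the following minimal Python sketch flags a value as low, normal or high. The ranges are the ones stated in this section; the function names and the sample values are hypothetical, and reference ranges differ between laboratories, so this is not a diagnostic rule.

```python
# Reference ranges quoted in the text (bilirubin in mg/dl, albumin in g/dl).
# Laboratories use slightly different ranges; these are illustrative only.
REFERENCE_RANGES = {
    "total_bilirubin": (0.3, 1.0),
    "direct_bilirubin": (0.1, 0.3),
    "albumin": (3.4, 5.4),
}


def flag_value(test_name, value):
    """Return 'low', 'normal' or 'high' relative to the quoted range."""
    low, high = REFERENCE_RANGES[test_name]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def indirect_bilirubin(total, direct):
    """Indirect bilirubin is the difference between total and direct bilirubin."""
    return total - direct


# Hypothetical record, not taken from the ILPD dataset.
sample = {"total_bilirubin": 2.4, "direct_bilirubin": 0.9, "albumin": 3.0}
flags = {name: flag_value(name, value) for name, value in sample.items()}
print(flags)
```

The indirect bilirubin range quoted above is consistent with this subtraction: 0.3 - 0.1 = 0.2 and 1.0 - 0.3 = 0.7 mg/dl.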

SGPT - Serum Glutamic Pyruvic Transaminase

The SGPT test measures the level of alanine aminotransferase (ALT) in your blood. ALT is an enzyme made by cells in your liver. As mentioned earlier, the important functions of the liver include making proteins, storing vitamins and iron, removing toxins from your blood, and producing bile, which aids in digestion. Proteins called enzymes help the liver break down other proteins so that the human body can absorb them more easily. SGPT is one of these enzymes. It plays a crucial role in metabolism, the process that turns food into energy. This enzyme is normally found inside liver cells. When the liver is damaged or inflamed, the enzyme can be released into the bloodstream, which causes ALT levels to rise. Measuring the level of ALT in a person's blood can help doctors evaluate liver function or determine the underlying cause of a liver problem. The ALT test is often part of an initial screening for liver disease.

SGOT - Serum Glutamic-Oxaloacetic Transaminase

This test measures the level of the enzyme aspartate aminotransferase (AST) in the blood. AST is found in red blood cells, liver cells, and muscle cells, including the heart. It is released into the blood when these cells are damaged. The AST level is measured to check the liver, kidneys, heart, pancreas, muscles, and red blood cells. This test is also done to check medical treatments that may affect the liver.

TP - Total Protein

If there are symptoms of kidney or liver disease, the Total Protein test is done. This test checks the levels of protein, specifically albumin and globulin, in the blood. If total protein is abnormal, further testing must be performed to identify which specific protein is abnormally low or high so that a specific diagnosis can be made.

A/G ratio

This is a blood test to measure the levels of protein in your body. Your liver makes most of the proteins that are found in your blood. Albumin is one major type of protein. Albumin carries many other substances around your system, including medicines and products your body makes. Another kind of protein, called globulin, has other functions in your body. This test provides information about the amount of albumin you have compared with globulin. This comparison is called the A/G ratio. The test is useful when your healthcare provider suspects you have liver disease. Certain diseases tend to lower your level of albumin and raise your level of one or more types of globulins. A normal range of albumin is 39 to 51 grams per liter (g/l) of blood. The normal range for globulins varies by specific type. A normal range for total globulins is 23 to 35 g/l.

C4.5

Ross Quinlan developed the C4.5 algorithm in 1993. The algorithm helps avoid overfitting, handles continuous attributes, tolerates missing data, and can convert the tree into rules.

Works done with ILPD

A good deal of work has already been done with ILPD. In [3], [4], [5] and [6], different machine learning algorithms are applied to this dataset and their performance is evaluated.
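To make the C4.5 description above more concrete, the sketch below shows how a threshold split on one continuous attribute can be scored with entropy, information gain and gain ratio. It is a simplified illustration in Python, not the Tanagra implementation used in this paper: it ignores missing values, discrete attributes and pruning, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on a continuous attribute."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Score midpoints between consecutive distinct values and return the best one.

    Assumes the attribute takes at least two distinct values.
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical toy data: total bilirubin values with class 1 (patient) / 2 (non-patient).
tb = [0.7, 0.9, 1.0, 3.9, 7.3, 10.9]
cls = [2, 2, 2, 1, 1, 1]
print(best_threshold(tb, cls))
```

On this toy data the chosen threshold falls between 1.0 and 3.9, where the two classes separate cleanly. A full tree builder would repeat this scoring for every attribute at every node.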

Importance of Feature Selection

Finding the relevant feature set in a dataset is important, as it may improve the performance of the classifier.

3. EXPERIMENT

The intention of this experiment is to find a small set of attributes with which the accuracy and efficiency of the classifier would be improved. For this purpose the supervised learning algorithm C4.5 is executed in the data mining tool Tanagra.

ILPD

The Indian Liver Patients data is a dataset publicly available through the UCI archive. It consists of 11 attributes, viz. Age, Gender, TB, DB, AlkPhos, Sgpt, Sgot, TP, ALB, A/G ratio, and class. The number of instances in the dataset is 579. All listed attributes are continuous except Gender, which takes one of two values, M or F. The class attribute labels each record as either 1 or 2. This data set contains 416 liver patient records and 167 non liver patient records. The data set was collected from north east of Andhra Pradesh, India. Selector is a class label used to divide the records into groups (liver patient or not). This data set contains 441 male patient records and 142 female patient records. Any patient whose age exceeded 89 is listed as being of age "90".

Table 1 Attribute description

Attribute    Description
Age          Age of the patient
Gender       Gender of the patient
TB           Total Bilirubin
DB           Direct Bilirubin
AlkPhos      Alkaline Phosphatase
Sgpt         Alanine Aminotransferase
Sgot         Aspartate Aminotransferase
TP           Total Protein
ALB          Albumin
A/G Ratio    Albumin Globulin ratio
Class        Selector

Table 2 Attribute category

Attribute    Target    Input
Age          -         yes
Gender       -         yes
TB           -         yes
DB           -         yes
AlkPhos      -         yes
Sgpt         -         yes
Sgot         -         yes
TP           -         yes
ALB          -         yes
A/G          -         yes
class        yes       -

Sampling

The dataset is sampled in such a way that 75% is taken as the training set and 25% as the test set.

Experiment 1 (with all attributes)

Here the complete set of attributes is taken into account. Attributes Age, Gender, TB, DB, AlkPhos, Sgpt, Sgot, TP, ALB and A/G are set as input attributes and class as the target attribute.

Training result
Error rate: 0.1475

Value    Recall    1-Precision
1        0.9154    0.1125
2        0.6783    0.2571

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        292    27     319
2        37     78     115
Sum      329    105    434

Test result
Error rate: 0.3517

Value    Recall    1-Precision
1        0.8316    0.3070
2        0.3000    0.5161

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        79     16     95
2        35     15     50
Sum      114    31     145

Experiment 2 (with only Age and Gender)

Here only two attributes are taken into account. Age and Gender are set as input attributes and class as the target attribute. With this setup the classifier's error rate is 0.1111, the recall of selector 1 is 0.9189 and that of selector 2 is 0.7879. This result is a real surprise, though it does not make sense factually.
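The error rate, recall and 1-precision figures in the result tables of this section follow directly from the confusion-matrix counts. The short Python sketch below recomputes them for the training counts reported for Experiment 1 (292, 27, 37, 78). The function name is hypothetical, and the row/column convention (rows are actual classes, columns are predicted classes) is inferred from the reported row and column sums.

```python
def summarize(confusion):
    """Compute error rate, per-class recall and per-class 1-precision.

    confusion[i][j] is the number of instances of actual class i
    that were predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    error_rate = 1 - correct / total
    # Recall: correct predictions divided by the row (actual class) total.
    recall = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
    # 1-precision: share of a column (predicted class) that is wrong.
    col_sums = [sum(row[j] for row in confusion) for j in range(n_classes)]
    one_minus_precision = [1 - confusion[j][j] / col_sums[j] for j in range(n_classes)]
    return error_rate, recall, one_minus_precision


# Training counts reported for Experiment 1 (rows: actual class 1, class 2).
train_exp1 = [[292, 27], [37, 78]]
error_rate, recall, omp = summarize(train_exp1)
print(round(error_rate, 4), [round(r, 4) for r in recall], [round(p, 4) for p in omp])
```

Rounded to four decimals, this gives the reported training values for Experiment 1: error rate 0.1475, recall 0.9154 and 0.6783, and 1-precision 0.1125 and 0.2571.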

Training result
Error rate: 0.2327

Value    Recall    1-Precision
1        0.9812    0.2328
2        0.1739    0.2308

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        313    6      319
2        95     20     115
Sum      408    26     434

Test result
Error rate: 0.3379

Value    Recall    1-Precision
1        0.9579    0.3309
2        0.1000    0.4444

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        91     4      95
2        45     5      50
Sum      136    9      145

Experiment 3 (only with AlkPhos)

Training result
Error rate: 0.2442

Value    Recall    1-Precision
1        0.9687    0.2370
2        0.1652    0.3448

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        309    10     319
2        96     19     115
Sum      405    29     434

Test result
Error rate: 0.3586

Value    Recall    1-Precision
1        0.9368    0.3407
2        0.0800    0.6000

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        89     6      95
2        46     4      50
Sum      135    10     145

Experiment 4 (with TB, DB, AlkPhos, Sgpt, Sgot)

Training result
Error rate: 0.1521

Value    Recall    1-Precision
1        0.8840    0.0932
2        0.7478    0.3008

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        282    37     319
2        29     86     115
Sum      311    123    434

Test result
Error rate: 0.3241

Value    Recall    1-Precision
1        0.7789    0.2600
2        0.4800    0.4667

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        74     21     95
2        26     24     50
Sum      100    45     145

Experiment 5 (with TP, ALB, A/G)

Training result
Error rate: 0.2465

Value    Recall    1-Precision
1        0.9969    0.2500
2        0.0783    0.1000

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        318    1      319
2        106    9      115
Sum      424    10     434

Test result
Error rate: 0.4138

Value    Recall    1-Precision
1        0.8632    0.3643
2        0.0600    0.8125

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        82     13     95
2        47     3      50
Sum      129    16     145

Experiment 6 (without Age and Gender)

Training result
Error rate: 0.1406

Value    Recall    1-Precision
1        0.9154    0.1043
2        0.7043    0.2500

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        292    27     319
2        34     81     115
Sum      326    108    434

Test result
Error rate: 0.3241

Value    Recall    1-Precision
1        0.8211    0.2778
2        0.4000    0.4595

Confusion matrix (rows: actual class, columns: predicted class)
         1      2      Sum
1        78     17     95
2        30     20     50
Sum      108    37     145

4. CONCLUSION

In this paper the Indian Liver Patient Data is investigated to find the significance of each feature/attribute for classifier performance. The overall result shows that each attribute makes its own contribution to classifier performance. More specifically, the contribution of TB, DB, AlkPhos, Sgpt and Sgot is higher than that of the other attributes, as the error rate is lower with the former than with the latter. Hence it can be concluded from the results that TB, DB, AlkPhos, Sgpt and Sgot are the most significant attributes in the ILPD dataset. The work in this experiment is only a starting point for future work with the ILPD dataset.
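The comparison in the conclusion rests on the error rates of the individual attribute subsets. As a convenience for readers who want to recheck it, the Python sketch below recomputes the test-set error rate of each subset from the confusion-matrix counts reported in Section 3 and lists the subsets from lowest to highest error. The dictionary keys and helper name are hypothetical; only the counts come from the paper.

```python
# Test-set confusion counts reported in Section 3, keyed by attribute subset.
# Each value is [[a11, a12], [a21, a22]] with rows = actual class 1, 2.
test_counts = {
    "All attributes": [[79, 16], [35, 15]],
    "Age, Gender": [[91, 4], [45, 5]],
    "AlkPhos": [[89, 6], [46, 4]],
    "TB, DB, AlkPhos, Sgpt, Sgot": [[74, 21], [26, 24]],
    "TP, ALB, A/G": [[82, 13], [47, 3]],
    "All except Age, Gender": [[78, 17], [30, 20]],
}


def error_rate(confusion):
    """Share of instances that fall off the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Sort subsets by test error, lowest first (ties keep their listed order).
ranked = sorted(test_counts, key=lambda name: error_rate(test_counts[name]))
for name in ranked:
    print(f"{name:30s} {error_rate(test_counts[name]):.4f}")
```

The same helper can be applied to the training counts to compare the subsets on the training set.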

REFERENCES

[1] Global challenges in liver disease. Williams R. PubMed.gov, NCBI
[2] http://www.medicinenet.com/liver_disease/article.htm
[3] Analysis of Liver Disorder Using Data mining Algorithm, P. Rajeswari, G. Sophia Reena, Global Journal of Computer Science and Technology, Vol. 10, Issue 14 (Ver. 1.0), November 2010
[4] An Approach of Data Mining for Predicting the Chances of Liver Disease in Ectopic Pregnant Groups, A.S. Aneeshkumar, C. Jothi Venkateswaran, Special Issue of International Journal of Computer Applications (0975-8887), The International Conference on Communication, Computing and Information Technology (ICCCMIT) 2012
[5] Liver Disease Prediction using SVM and Naïve Bayes Algorithms, Dr. S. Vijayarani, Mr. S. Dhayanand, International Journal of Science, Engineering and Technology Research (IJSETR), Volume 4, Issue 4, April 2015, ISSN: 2278-7798
[6] Liver Disease Analysis And Accuracy Prediction Using Machine Learning Techniques, D. Sindhuja and R. Jemina Priyadarsini, I J C T A, 9(26), 2016, pp. 379-384, International Science Press
[7] Feature Selection in Data Mining, YongSeog Kim, W. Nick Street, and Filippo Menczer, University of Iowa, USA
[8] Implementation of decision tree algorithm c4.5, Harvinder Chauhan, Anu Chauhan, International Journal of Scientific and Research Publications, Volume 3, Issue 10, October 2013, ISSN 2250-3153
[9] https://www.urmc.rochester.edu/encyclopedia/content.aspx?...167...albumin_blood
[10] Er. Harpal, Dr. Gaurav Tejpal and Dr. Sonal Sharma, Machine Learning Techniques for Wormhole Attack Detection Techniques in Wireless Sensor Networks, International Journal of Mechanical Engineering and Technology 8(9), 2017, pp. 337-348
[11] Padmakumari P and Umamakeswari A, Hybrid Statistical and Machine Learning Methods for Failure Prediction in Cloud, International Journal of Mechanical Engineering and Technology 8(8), 2017, pp. 714-719
[12] Taran Singh Bharati and R. Kumar, Intrusion Detection System for Manet Using Machine Learning and State Transition Analysis, International Journal of Computer Engineering and Technology, 6(12), 2015, pp. 01-08
[13] C.R. Cyril Anthoni, Dr. A. Christy, Integration of Feature Sets with Machine Learning Techniques for Spam Filtering, International Journal of Computer Engineering and Technology (IJCET), Volume 2, Number 1, Jan - April (2011), pp. 47-52
[14] Goverdhan Reddy Jidiga and Dr. P Sammulal, Machine Learning Approach to Anomaly Detection In Cyber Security with A Case Study of Spamming Attack, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, May-June (2013), pp. 113-122