TABLE OF CONTENTS

CHAPTER NO.  TITLE  PAGE NO.

ABSTRACT  iii
LIST OF TABLES  xi
LIST OF FIGURES  xii
LIST OF SYMBOLS AND ABBREVIATIONS  xiii
1  INTRODUCTION  1
    1.1  CLINICAL DATA MINING  1
    1.2  OBJECTIVES OF THE RESEARCH  5
    1.3  LITERATURE REVIEW  6
        1.3.1  Research Works Based on Data Mining Techniques  7
        1.3.2  Research Works on Data Mining Using Clinical Datasets  11
        1.3.3  Research Works on Heart Disease Prognosis, Diagnosis and Risk Prediction  17
        1.3.4  Research Works on Diagnosis and Prediction of Hepatitis  22
        1.3.5  Research Works on Correlation among Hepatitis, Heart Disease, Diabetes and Anaemia  26
    1.4  ORGANISATION OF THE THESIS  31
2  INTELLIGENT PREDICTIVE MODEL FOR KNOWLEDGE DISCOVERY FROM CLINICAL DATASETS  33
    2.1  PRE-PROCESSING  33
    2.2  PRE-MINING SUBSYSTEM  37
    2.3  MINING SUBSYSTEM  38
    2.4  EVALUATING SUBSYSTEM  40
    2.5  KNOWLEDGE BASE  41
    2.6  INFERENCE AND FORECASTING SUBSYSTEM  41
    2.7  EVALUATION OF THE MODEL  42
3  NEURO FUZZY APPROACH FOR PREDICTING THE SURVIVAL OF HEPATITIS  43
    3.1  HEPATITIS DATA  44
    3.2  PRE-MINING SUBSYSTEM  45
        3.2.1  Principal Component Analysis Technique  46
        3.2.2  Fuzzy C-Means Clustering Technique  47
    3.3  NEURO FUZZY CLASSIFIER  49
    3.4  INFERENCE AND FORECASTING SUBSYSTEM  51
    3.5  EXPERIMENTAL RESULTS  52
4  COMPARATIVE WORK FOR DISCOVERING RULES FROM HEPATITIS DATASET  55
    4.1  HEPATITIS DATASET  55
    4.2  PRE-MINING SUBSYSTEM  57
    4.3  MINING SUBSYSTEM  58
        4.3.1  Association Rule Mining  58
        4.3.2  Neural Network  59
        4.3.3  Decision Tree  62
    4.4  RULE VALIDATION SUBSYSTEM  63
    4.5  INFERENCE AND FORECASTING  63
    4.6  EXPERIMENTAL RESULTS  64
5  STATISTICAL APPROACH FOR PREDICTING THE PRESENCE OF HEART DISEASE  68
    5.1  HEART DISEASE DATASET  69
    5.2  PRE-MINING SUBSYSTEM  70
    5.3  MINING SUBSYSTEM  71
        5.3.1  Contingency Table Generation  72
        5.3.2  Rule Generation  72
    5.4  VALIDATION SUBSYSTEM  74
    5.5  INFERENCE AND FORECASTING SUBSYSTEM  74
        5.5.1  Classification  74
        5.5.2  Weight of Evidence  76
        5.5.3  Confidence Estimation  78
    5.6  EXPERIMENTAL RESULTS  79
6  FUZZY NEURO-GENETIC APPROACH FOR PREDICTING THE SEVERITY OF HEART DISEASE  81
    6.1  HEART DISEASE DATA  82
    6.2  PRE-MINING SUBSYSTEM  84
    6.3  MINING SUBSYSTEM  86
        6.3.1  Training  86
        6.3.2  Rule Selection  89
    6.4  VALIDATION SUBSYSTEM  90
    6.5  KNOWLEDGE BASE  90
    6.6  INFERENCE AND FORECASTING SUBSYSTEM  91
    6.7  EXPERIMENTAL RESULTS  91
7  CONCLUSIONS AND FUTURE WORKS  96
    7.1  CONCLUSION  96
    7.2  FUTURE WORK  99
REFERENCES  101
LIST OF PUBLICATIONS  112
VITAE  113

LIST OF TABLES

TABLE NO.  TITLE  PAGE NO.

3.1  Dataset Description (Hepatitis)  45
3.2  Contingency Table for best run  52
3.3  Contingency Table for average run  53
3.4  Contingency Table for worst run  53
3.5  Performance Measures  53
4.1  Illustrative Time-Series Hepatitis Data  57
4.2  Attributes and their variations over time  58
4.3  Number of rules generated  64
4.4  Confusion Matrix (Neural Network)  66
4.5  Confusion Matrix (Decision Tree)  66
4.6  Performance Measure of Intelligent Rule Miner  67
5.1  Hungarian Dataset Description  70
5.2  Contingency table for sex and chest pain type  72
5.3  Heart Disease Data  79
5.4  Contingency Table (Bayesian Classifier)  80
5.5  Performance Measures  80
6.1  Description of Heart Disease Database  83
6.2  Explicatory Rules  93
6.3  Contingency Table (Heart Disease)  94
6.4  Comparison of Classification Accuracy for Cleveland heart data  94
7.1  Comparison of Classification Accuracy for Hepatitis Data  97
7.2  Comparison of Classification Accuracy for Heart Disease Data  98

LIST OF FIGURES

FIGURE NO.  TITLE  PAGE NO.

2.1  Model for Knowledge Discovery  34
3.1  Model Tailored Using Neuro-Fuzzy Inferencing Technique for Predicting Survival of Hepatitis  44
3.2  Neuro-Fuzzy Classifier  50
4.1  Model Tailored Using Association Rule Mining, Neural Network and Decision Tree to Predict Hepatitis  56
4.2  Network Architecture  59
4.3  Decision Tree  62
4.4  Histogram  65
4.5  Error Rate for Training  65
5.1  Model Tailored Using Statistical Classifier to Predict Heart Disease  69
5.2  Explicatory Rules  79
6.1  Model Tailored Using Fuzzy Neuro-Genetic Technique for Predicting the Severity of Heart Disease  82
6.2  Neural Network  89
6.3  Run Time Analysis  92

LIST OF SYMBOLS AND ABBREVIATIONS

ANFIS - Adaptive Neuro Fuzzy Inference System
ATP - Adult Treatment Panel
ALB - Albumin
ALK - Alkaline
AMP - Anemia Management Protocol
ANN - Artificial Neural Networks
BMI - Body Mass Index
CVD - Cardiovascular Diseases
CTM - Central Tendency Measure
CHE - Cholinesterase
CANFIS - Co-Active Neuro-Fuzzy Inference System
CHF - Congestive Heart Failure
CSFNN - Conic Section Function Neural Network
CABG - Coronary Artery Bypass Graft Surgery
CAD - Coronary Artery Disease
CHD - Coronary Heart Disease
DAC - Direct Adaptive Controller
ECG - Electrocardiogram
ESRD - End Stage Renal Disease
EPO - Erythropoietin
FSS - Feature Subset Selection
FACO - Fuzzy based Ant Colony Algorithm
FCM - Fuzzy C-Means Clustering
FL - Fuzzy Logic
FNN - Fuzzy Neural Network
FRBCS - Fuzzy Rule Based Classifier System
GRNN - Generalized Regression Neural Network
GA - Genetic Algorithms
GOT - Glutamic-Oxaloacetic Transaminase
GPT - Glutamic-Pyruvic Transaminase
HGB - Hemoglobin
HBV - Hepatitis B Virus
HCV - Hepatitis C Virus
HDV - Hepatitis D Virus
HEMR - Hepatitis Electronic Medical Record System
HDL-C - High Density Lipoprotein Cholesterol
HOMA-IR - Homeostasis Model Assessment of Insulin Resistance
HIV - Human Immunodeficiency Virus
IHDPS - Intelligent Heart Disease Prediction System
LEM - Learning From Examples
LVQ - Learning Vector Quantization
LDL - Low Density Lipoprotein
UCS - Michigan-style Learning Classifier System
MICD - Minimum Inter Class Distance Classifier
MPC - Model Predictive Controller
MLP - Multi Layer Perceptron
MI - Myocardial Infarction
NLCS - Neural Based Learning Classifier System
NeC4.5 - Neural Ensemble based C4.5
NN - Neural Networks
PD - Pattern Discovery
PCI - Percutaneous Coronary Intervention
PTDM - Post Transplant Diabetes Mellitus
PCA - Principal Component Analysis
PNN - Probabilistic Neural Network
TP - Protein Total
RBF - Radial Basis Function
RFNN - Recurrent Fuzzy Neural Network
RNA - Ribonucleic Acid
SGOT - Serum Glutamic-Oxaloacetic Transaminase
SGPT - Serum Glutamic-Pyruvic Transaminase
SNP - Single Nucleotide Polymorphism
SARSA - State-Action-Reward-State-Action
SRNN - State-space Recurrent Neural Networks
SQL - Structured Query Language
SVM - Support Vector Machine
TTT - Thymol Turbidity Test
T-BIL - Total Bilirubin
T-CHO - Total Cholesterol
TRF - Total Risk Factor
TSAT - Transferrin Saturation
UCI - University of California, Irvine
WHO - World Health Organization
ZTT - Zinc Sulphate Turbidity Test