Analysis on Retrospective Cardiac Disorder Using Statistical Analysis and Data Mining Techniques

Similar documents
Influencing Factors on Fertility Intention of Women University Students: Based on the Theory of Planned Behavior

Stochastic Extension of the Attention-Selection System for the icub

Journal of Applied Science and Agriculture

The Probability of Disease. William J. Long. Cambridge, MA hospital admitting door (or doctors oce, or appropriate

APPLICATION OF THE WALSH TRANSFORM IN AN INTEGRATED ALGORITHM FO R THE DETECTION OF INTERICTAL SPIKES

INDIVIDUALIZATION FEATURE OF HEAD-RELATED TRANSFER FUNCTIONS BASED ON SUBJECTIVE EVALUATION. Satoshi Yairi, Yukio Iwaya and Yôiti Suzuki

Multiscale Model of Oxygen transport in Diabetes

Interpreting Effect Sizes in Contrast Analysis

Technical and Economic Analyses of Poultry Production in the UAE: Utilizing an Evaluation of Poultry Industry Feeds and a Cross-Section Survey

Systolic and Pipelined. Processors

Prevalence and Correlates of Diabetes Mellitus Among Adult Obese Saudis in Al-Jouf Region

Alternative Methods of Insulin Sensitivity Assessment in Obese Children and Adolescents

The Egyptian Journal of Hospital Medicine (January 2018) Vol. 70, Page

SMARTPHONE-BASED USER ACTIVITY RECOGNITION METHOD FOR HEALTH REMOTE MONITORING APPLICATIONS

Analytical and Numerical Investigation of FGM Pressure Vessel Reinforced by Laminated Composite Materials

Dietary Assessment in Epidemiology: Comparison of a Food Frequency and a Diet History Questionnaire with a 7-Day Food Record

Stacy R. Tomas, David Scott and John L. Crompton. Department of Recreation, Park and Tourism Sciences, Texas A&M University, USA

Reliability and Validity of the Korean Version of the Cancer Stigma Scale

Advanced Placement Psychology Grades 11 or 12

Objective Find the Coefficient of Determination and be able to interpret it. Be able to read and use computer printouts to do regression.

Two universal runoff yield models: SCS vs. LCM

Why do we remember some things and not others? Consider

Shunting Inhibition Controls the Gain Modulation Mediated by Asynchronous Neurotransmitter Release in Early Development

COHESION OF COMPACTED UNSATURATED SANDY SOILS AND AN EQUATION FOR PREDICTING COHESION WITH RESPECT TO INITIAL DEGREE OF SATURATION

Predictors of Maternal Identity of Korean Primiparas

Measurement uncertainty of ester number, acid number and patchouli alcohol of patchouli oil produced in Yogyakarta

BSc in Public Health, Health Education and Health Promotion health center of Chenaran, Mashhad University of Medical Sciences, Mashhad, IR Iran

Test Retest and Between-Site Reliability in a Multicenter fmri Study

OPTIMUM AUTOFRETTAGE PRESSURE IN THICK CYLINDERS

Coach on Call. Thank you for your interest in When the Scale Does Not Budge. I hope you find this tip sheet helpful.

METHODS FOR DETERMINING THE TAKE-OFF SPEED OF LAUNCHERS FOR UNMANNED AERIAL VEHICLES

Adaptive and Context-Aware Privacy Preservation Schemes Exploiting User Interactions in Pervasive Environments

FUNCTIONAL MAGNETIC RESONANCE IMAGING OF LANGUAGE PROCESSING AND ITS PHARMACOLOGICAL MODULATION DISSERTATION

PEKKA KANNUS, MD* Downloaded from at on April 8, For personal use only. No other uses without permission.

CH 11: Mendel / The Gene. Concept 11.1: Mendel used the scientific approach to identify two laws of inheritance

A Mathematical Model of The Effect of Immuno-Stimulants On The Immune Response To HIV Infection

Structural Safety. Copula-based approaches for evaluating slope reliability under incomplete probability information

Joint Iterative Equalization and Decoding for 2-D ISI Channels

Published: 10/06/2014. Arrhythmia Pathway

Causal Beliefs Influence the Perception of Temporal Order

lsokinetic Measurements of Trunk Extension and Flexion Performance Collected with the Biodex Clinical Data Station

Nadine Gaab, 1,2 * John D.E. Gabrieli, 1 and Gary H. Glover 2 INTRODUCTION. Human Brain Mapping 28: (2007) r

The association between obesity and cardiac output (CO)

RIGHT VENTRICULAR INFARCTION - CLINICAL, HAEMODYNAMIC, ECHOCARDIOGRAPHIC AND THERAPEUTIC CONSIDERATIONS

The number and width of CT detector rows determine the

CHAPTER 1 BACKGROUND ON THE EPIDEMIOLOGY AND MODELING OF BIV/AIDS

The Impact of College Experience on Future Job Seekers Diversity Readiness

A Brain-Machine Interface Enables Bimanual Arm Movements in Monkeys

Evaluating the psychometric properties of the MindMi TM Psychological Assessment System

F1 generation: The first set of offspring from the original parents being crossed. F2 generation: The second generation of offspring.

Effect of Hydroxymethylated Experiment on Complexing Capacity of Sodium Lignosulfonate Ting Zhou

Kuwait Chapter of Arabian Journal of Business and Management Review Vol. 3, No.1; Sep. 2013

Chasing the AIDS Virus

Simulation of the Human Glucose Metabolism Using Fuzzy Arithmetic

HTGR simulations in PSI using MELCOR 2.2

A copy can be downloaded for personal non-commercial research or study, without prior permission or charge

Abstract. Background and objectives

QUEEN CONCH STOCK RESTORATION

By: Charlene K. Baker, Fran H. Norris, Eric C. Jones and Arthur D. Murphy

pneumonia from the Pediatric Clinic of the University of Padua. Serological Methods

Could changes in national tuberculosis vaccination policies be ill-informed?

Radiation Protection Dosimetry, Volume II: Technical and Management System Requirements for Dosimetry Services. REGDOC-2.7.

Protecting Location Privacy: Optimal Strategy against Localization Attacks

Academic Flow and Cyberloafing. Listyo Yuwanto. Universitas Surabaya (UBAYA), East Java, Indonesia

Eating behavior traits and sleep as determinants of weight loss in overweight and obese adults

Osteoarthritis and Cartilage 19 (2011) 58e64

Lifespan Psychology. Chapter 3 Biology of Development. Outline. Infertility. Infertility/Reproductive technologies. Heredity/DNA/Genes

Frequency Domain Connectivity Identification: An Application of Partial Directed Coherence in fmri

VIII. FOOD AND llutrielfl' COMPOSITION OF SBP MEALS

Stable angina: drugs, angioplasty or surgery?

Instructions For Living a Healthy Life

Assessment of postural balance in multiple sclerosis patients

Altered Sleep Brain Functional Connectivity in Acutely Depressed Patients

Relationship of Mammographic Parenchymal Patterns with Breast Cancer Risk Factors and Risk of Breast Cancer in a Prospective Study

IRPPS Working Papers. The regional distribution of in-hospital fatality among Acute Myocardial Infarction events in Italy

National Institutes of Health, Bethesda, Maryland 4 Genomic Imaging Unit, Department of Psychiatry, University of Würzburg, Germany

Evaluation of the accuracy of Lachman and Anterior Drawer Tests with KT1000 ın the follow-up of anterior cruciate ligament surgery

Burden of Chinese Stroke Family Caregivers: The Hong Kong Experience

The Relationship between Resiliency and Anxiety with Body Dysmorphic Concern among Adolescent Girls in Tehran

Annual General Meeting of Shareholders of Roche Holding Ltd 27 February 2006

Commissioning for Value Where to Look pack

Distress is an unpleasant experience of an emotional,

Chapter 16. Simple patterns of inheritance

Collaborative Evaluation of a Fluorometric Method for Measuring Alkaline Phosphatase Activity in Cow s, Sheep s, and Goat s Milk

P states that is often characterized by an acute blood and

Activation of the Caudal Anterior Cingulate Cortex Due to Task-Related Interference in an Auditory Stroop Paradigm

Coupling of Substructures for Dynamic Analyses

ABSTRACT I. INTRODUCTION. Perspective of the Study

Application Of Fuzzy Logic Controller For Intensive Insulin Therapy In Tpye 1 Diabetic Mellitus Patient

The Regional Economics Applications Laboratory (REAL) of the University of Illinois focuses on the development and use of analytical models for urban

Immune mechanisms have been suggested to play a key role in

NUMERICAL INVESTIGATION OF BVI MODELING EFFECTS ON HELICOPTER ROTOR FREE WAKE SIMULATIONS

Multi View Cluster Approach to Explore Multi Objective Attributes based on Similarity Measure for High Dimensional Data

ADAPTATION OF THE MODIFIED BARTHEL INDEX FOR USE IN PHYSICAL MEDICINE AND REHABILITATION IN TURKEY

The Relationship between Personality Factors and Organizational Commitment of Iranian Primary School Principals

Sex Specificity of Ventral Anterior Cingulate Cortex Suppression During a Cognitive Task

GLOBAL PARTNERSHIP FOR YOUTH EMPLOYMENT. Testing What Works. Evaluating Kenya s Ninaweza Program

Word and tree-based similarities for textual entailment

I Can PresCribE A Drug: Mnemonic-based Teaching of Rational Prescribing

Transcription:

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 Analysis on Retospective Cadiac Disode Using Statistical Analysis and Data Mining Techniques Jyotismita Talukda Assistant Pofesso, Depatment of Compute Science, Univesity of Technology and Management, Shillong, Meghalaya, Assam, India. Ocid Id: 0000-000-5167-7500 D. Sanjib K. Kalita Assistant Pofesso, Depatment of Compute Science, Gauhati Univesity, Guwahati, Assam, India Ocid Id: 0000-000-0576-3010 Abstact Abstact: Heat diseases ae one of the most pominent easons fo deaths all aound the globe till date. Accoding to seveal suveys, it is found that India has the lagest numbe of heat patients all aound the wold. CADI in Califonia has confimed that by 015, aound 6 billion population in India ae sole patients of cadiovascula diseases. The most common cause of cadiovascula disease is the inefficiency of the heat to pump blood fom heat to the est of the body and vice vesa. In this pape, we pesent and compae the vaious statistical and data mining techniques and algoithms in ode to pedict the specific isk factos of cadiovascula diseases. The diffeent coelations, patial coelations of the isk attibutes have been studied and pesented in this pape. An attempt has been made to develop a linea model fo ealy pediction of the cadiovascula disease. Futhe, using data mining techniques, these isk factos ae compaed to pedict the list of isk attibutes that ae most susceptible to heat disease. Keywods: cadiovascula disease, k-means, apioi, coelation, CHD, Regession, Rattle. INTRODUCTION Cadiovascula disease (CVD) includes heat disease (i.e., myocadial infaction and angina), stoke, hypetension, congestive heat failue (CHF), hadening of the ateies, and othe ciculatoy system diseases. CVD is the numbe one cause of death in Ameica, esponsible fo moe than 40% of annual deaths. An aveage of 1 death due to CVD occus evey 33 seconds in the United States [1]. Accoding to the eseach done by the Regista Geneal of India (RGI) and the Indian Council of Medical Reseach (ICMR), people belonging to the age goup between 5 yeas to 69 yeas contibute to about 5% of motality ate due to heat diseases[]. In 008, five out of the top ten causes fo motality woldwide, othe than injuies, wee noncommunicable diseases; this will go up to seven out of ten by the yea 030. By then, about 76% of the deaths in the wold will be due to non-communicable diseases (NCDs) [3] which also includes cadiovascula diseases. In 010, it was found that of all the widespead diseases aound the globe, aound 3 million deaths happened only because of cadiovascula diseases (CVDs). In fact, CVDs would be the single lagest cause of death in the wold accounting fo moe than a thid of all deaths [4]. The most common cause of heat disease is the inefficiency of the heat to pump blood fom heat to the est of the body and vice vesa. Thee ae seveal types of heat disease. Some of them ae [5]: Coonay heat disease: It also known as coonay atey disease (CAD), it is the most common type of heat disease acoss the wold. It is a condition in which plaque deposits block the coonay blood vessels leading to a educed supply of blood and oxygen to the heat. Angina pectois: It is a medical tem fo chest pain that occus due to insufficient supply of blood to the heat. Angina pectois, also temed as Angina is basically a waning signal fo an ealy heat attack. Congestive heat failue: It is a condition whee the heat cannot pump enough blood to the est of the body. It is commonly known as heat failue Cadiomyopathy: Cadiomyopathy is geneally caused due to a weak heat which is mainly due to the esult of a defective stuctue of the heat muscle o any changes in the stuctue of the muscle of the heat. The main cause of Cadiomyopathy is the inadequate pumping of the heat. Congenital heat disease: It mainly efes to the fomation of an abnomal heat duing bith due to a defect in the stuctue of the heat o its functioning. It is mainly found in childen. Ahythmias: Ahythmias is mainly caused due to impope hythmic movement of the heatbeat. The heatbeat in this type of disease is eithe vey slow o fast o is geneally 6778

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 iegula. These abnomal heatbeats ae caused by a shot cicuit in the heat's electical system. Myocaditis: Hee, the heat muscle suffes inflammation which is usually due to any infection caused by vius, bacteia o any fungus. The symptoms of Myocaditis include joins pain, leg swelling o feve which is not diectly elated to the heat. Howeve, among all the possible heat diseases, Coonay heat disease is one of the most common heat disease in the wold that contibutes to almost half of wold s motality ate. Also known as coonay atey diseases (CAD), hee the coonay blood vessels ae blocked by the plaque deposits in the ateies, thus educing the supply of blood and oxygen to the heat. The most impotant behavioal isk factos of heat disease and stoke ae impope diet, lack of physical activity, tobacco consumption and hamful use of alcohol [6]. The esults of these isk factos may be pominent in individuals as inceased blood pessue, blood glucose, uncontolled blood lipids, oveweight and obesity. These intemediate isks factos indicate an inceased isk of developing a heat attack, stoke, heat failue and othe complications. Data mining can be temed as the pocess of discoveing coelations, pattens and tends by taking into consideation lage amount of data stoed inn epositoies. Seveal methods can be used to extact useful knowledge fom the lage data epositoies patten ecognition, association analysis as well as statistical and mathematical techniques [7]. It can also be defined as the pocess of analyzing data fom diffeent database and extacting some useful knowledge, patten, association o elationship out of it. As shown in Figue 1, data mining is a tem that descibes diffeent techniques used in a domain of machine leaning, statistical analysis, patten ecognition, pediction, classification, clusteing, visualization, modelling techniques. It has wide applications in all banches of industy such as telecommunications, etail, poduction, banking, education, and health cae management Futhemoe, statistical analysis is also one of the most tivial data analysis fo pedicting and compaing data esults fom a set of data population. Its main goal is to identify the possible tends fom a lage set of data. Statistical analysis is geneally explained using the following five discete steps: 1. Analysis of the natue of data taken.. Exploing the elationship of the data with the given population. 3. Defining a specified model to explain how the data elates to the population. 4. Validation of the model. 5. Pedictive analysis using the model to find the possible tends. PROBLEM STATEMENT Cadiovascula diseases ae consideed to be one of the easons fo highest motality ate acoss the globe. Ealy pediction of the isk factos fo a peson suffeing fom cadiovascula disease is of utmost impotance. Howeve, statistical analysis and data mining techniques can educe the numbe of tests that bae geneally equied to be caied out to pedict the occuence of any cadiovascula diseases. This educed test set plays an impotant ole in time and pefomance. Cadiovascula data analysis is impotant because it allows doctos to see which featues o attibutes ae moe impotant fo diagnosis such as age, weight, etc. This will help the doctos diagnose heat diseases moe efficiently. Seveal techniques ae available in the healthcae industy fo the ealy pediction of heat diseases, but eseach that has to be done to tack the pefomance of vaious classification techniques, to enable the choice of the best among them can be chosen. This pape pesents a eseach model pedict the heat disease fo patients by poviding timely esponse in pedicting the disease. This pape mainly focusses on the following topics: How vaious data mining techniques can be used in health cae industy and to identify thei pefomance in pediction? Use of the egession analysis in developing the pediction model to accuately pedict the isk factos fo patients suffeing fom heat disease. Figue 1: Domains in data mining Objective The pimay objective of this pape is to develop a pedictive model fo pope analysis and pediction of the isk factos of cadiovascula diseases using vaious statistical and datamining techniques. It also shows that data mining can be 6779

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 applied to the medical databases to pedict o classify the data with easonable accuacy. The following ae the objectives leading to achievement of the pimay objective mentioned supa: To identify the best classification model which can help the physicians in pedicting the isk of heat disease using seveal attibutes. To ecognize and classify pattens in multivaiate patient attibutes. To pedict the futue outcomes based on pevious expeiences and pesent conditions. To identify the patients at isk, with the aim of inceasing quality of cae and to educe the cost of cae. To constuct a pediction model using seveal classification techniques such as naïve Bayes, decision tees and suppot vecto machines. BACKGROUND Pak (009) in his text book titled Peventive and Social Medicine states that, chonic non-communicable diseases ae inceasing among the adult population in both developed and developing counties. Cadiovascula diseases and cance ae at pesent the leading causes of death in developed counties such as Euope and Noth Ameica. The isk factos that ae esponsible fo mobidity and pematue motality ae smoking, alcohol abuse, failue o inability to obtain peventive health sevices, life-style changes and stess. Shantakuma et al. (009) point out that the tem heat disease encompasses the divese diseases that affect the heat and the tem cadiovascula disease includes a wide ange of conditions that affect the heat and the blood vessels that pump blood thoughout the body. It esults in disability of seveal body pats, illness and even death. Myocadial infactions which is also known as a heat attacks and angina pectois, o chest pain ae encompassed in the CHD. High blood pessue, coonay atey disease, stoke, o heumatic feve/heumatic heat disease ae the vaious foms of cadiovascula disease. Jayshi Sonawane et al. (013) have illustated the heat is the ogan that pumps blood, with its life giving oxygen and nutients, to all tissues of the body. If the pumping action of the heat becomes inefficient, vital ogans like the bain and kidneys suffe and if the heat stops woking altogethe, death occus within minutes. Latha Pathiban et al. [7] developed an appoach taking into consideation the coactive neuo-fuzzy infeence system (CANFIS) fo ealy detection of heat disease. The CANFIS model diagnosed the pesence of disease by meging the neual netwok adaptive capabilities and the fuzzy logic qualitative appoach and futhe integating with genetic algoithm. Kiyong Noh et al. [8] put foth a classification method fo the extaction of multi-paametic featues by assessing HRV fom ECG, data pepocessing and heat disease patten. The efficient FP-gowth method was the basis of this method which is an associative. They developed a ule to geneate the pattens. The multiple ules and puning, biased confidence (o cohesion measue) and dataset consisting of 670 paticipants, distibuted into two goups, namely nomal people and patients with coonay atey disease, wee employed to cay out the expeiment fo the associative classifie. Akhil Jabba et al. poposes efficient associative classification algoithm using genetic appoach fo heat disease pediction. Hian Chye Koh and Geald Tan mainly discusses data mining and its applications with majo aeas like Teatment effectiveness, Management of healthcae, Detection of faud and abuse, Custome elationship management[9]. Jayanthi Ranjan, in he pape explains how data mining discoves and extacts useful pattens of lage data to find useful pattens. This pape descibes the uses of data mining to impove the quality of the decision making in Phamaceutical industy. Issues in the phama industy ae advese eactions to the dugs [10]. M. Duaiaj, K. Meena explains a hybid pediction system consisting of Rough Set Theoy (RST) and Atificial Neual Netwok (ANN) on medical data. They developed a new data mining technique to assist competent solutions fo medical data analysis. [11]. K. Sinivas, B. Kavitha Rani and D. A. Govedhan explains the use of classification based data mining such as Rule based, decision tee, Naïve bayes fo medical data. Some vital attibutes like age, sex, blood pessue and blood suga wee used in ode to pedict the likelihood of patients getting a heat attack. [1]. Shweta Khaya discussed vaious data mining appoaches that have been utilized fo beast cance diagnosis and pognosis Decision tee is found to be the best pedicto with 93.6% Accuacy on benchmak dataset and also on SEER data set [13]. Elias Lemuye discussed the AIDS is the disease caused by HIV, which weakens the body s immune system until it can no longe fight off the simple infections that most healthy people s immune system can esist. Apioi algoithm is used to discove association ules. WEKA 3.6 is used as the data mining tool to implement the Algoithms. The J48 classifie pefoms classification with 81.8% accuacy in pedicting the HIV status [14]. 6780

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 Avind Shama and P.C. Gupta explained the benefits of data mining in medicinal aea. Thei eseach aea was mainly focused in the blood bank secto. They used the J48 algoithm and WEKA to extact useful knowledge out of the attibutes available. The accuacy ates eached to about 89.9% using these algoithms on blood donos. [15]. METHODOLOGY This pape mainly focusses on the analysis of the cadiovascula diseases using statistical and data mining analysis. This pape explains the statistical analysis of diffeent CHD/CVD isk attibutes. The objectives of this chapte ae: Selecting the isk attibutes of CHD/CVD, such as: Fasting Blood Suga (FBS), Cholesteol (CHOL), Systolic Blood Pessue (BP), Thalach (TH) and age etc. To find the coelation (Peason) o dependency of these isk attibutes among themselves. To compute the patial coelation coefficients of these isk attibutes and identifying the most isky attibutes esponsible fo any possible CHD/CVD disease. To develop pedictive linea and non-linea statistical pedictive models based on the esults of coelation (simple and patial) among the selected isk attibutes. To epesent the CHD/CVD victims (male & female), following a pe-defined hieachy of isk attibutes, in the fom of tee. In the pesent study and analysis of CHD/CVD isk factos the coelations and patial coelation coefficients have been computed on fou basic isk attibutes, such as : - Blood Pessue (BP) Fasting Blood Suga (FBS) Cholesteol (CHOL), and Thalach (THAL). The attibute age has been consideed to study the fequency of CHD/CVD victims with espect to age and sex. The Peason s coelation coefficients have been computed using the elation (3.1) Whee xy xi x yi y i N N 1 (3.1) xi x yi y N i1 i1 x y x i, i y (3.) N N N is the numbe of sampled data (numbe of CHD/CVD victims), xy is the coelation coefficient between x and y. x and y ae any two CHD/CVD isk attibutes selected fom the fou isk attibuted as mentioned above. Similaly, the patial coelation coefficients among the fou isk attibutes (BP, FBS, CHOL and THAL) ae computed using the elation (3.3). i xi x j.( y) j x x x. y. x. y (3.3) i j i j y x y x 1 i 1 j Whee, x, x ae the patient s isk attibutes fo CHD/CVD Fo example, BP and FBS stand fo xi and xj espectively and y is epesenting CHOL ( say ) o THAL ( say ).The expession (3.1) epesents the patial coelation coefficients between x, x if they obtain the same scoe on the vaiable y. i j Fo example, in the expession (3.4):- 1. 1.3.3 1..(3) (3.4) 1 1 1.3 The patial coelation coefficient 1..(3) indicates the elationship between vaiables 1 and when each of them obtained the same scoe on the vaiable 3.In case of fou vaiables, the patial coelation coefficients ae computed as given in equation (3.5)... 1.3 14.3 4.3 1.34 (3.5).3 1 14.3 1 4.3 In equation (3.5), the vaiables 3 and 4 ae paallel out. The geneal fom of equations (3.4) and (3.5) is epesented as given by equation (3.6). 1.34...( N 1) 1 N.34..( N 1). N.34..( N 1) (3.6) 1.34... N 1 1N.34..( N 1) 1 N.34..( N 1) The pesent statistical analysis of CHD/CVD victims of age between 5 yeas and 100 yeas is made. The total numbe of victims analyzed in the pesent study is 336. Data mining compises of extacting useful knowledge fom a set of lage database. The pocess of mining and analyzing data can be caied out in the following steps depending on the type of modelling used fo the pocess, namely pedictive o desciptive. A pedictive data modelling includes pocesses like classification, Regession, Time seies analysis and then finally pediction. Howeve, a desciptive data modelling includes pocesses like Clusteing, Association, Summaization and Sequence discovey. 6781

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 Figue : Data mining model and task This pape specially focusses on the Rattle data mining tool to pedict the isk factos fo a patient suffeing fom heat disease. Rattle is a gaphical tool that povides a use inteface to R pogamming fo analysis and pediction. Rattle data mining tool follows the given steps to extact knowledge fom a lage data set: 1. Loading of the data set: The data is geneally in.csv fomat. Othe fomat allowed ae ARFF,.txt,.xls etc.. Exploing the data: It includes exploing the data in tems of mean, median, vaiance, ange, skewness, kutosis etc. It also vaious numeic and gaphical tools fo the data. It has the option of find the histogams, box plots, dot plots fo the data available. 3. Coelation: It epesents the inte dependence of the input vaiables pesent in the dataset. Mathematically, it is defined as the covaiance of the dataset divided by the poduct of thei individual standad deviations. The sample coelation coefficient of a dataset whee Sx and Sy ae the sample standad deviation and Sxy is the sample covaiance can be defined as follows: Sxy = The values of the coelation facto indicates the stength of the intedependence of the seveal attibutes take. If the value of coelation coefficient is close to 1, it indicates that the vaiables ae closely positively elated to each othe. Thus the scatte plot falls almost along a staight line with a positive slope. Fo a negative value of coelation coefficient, vaiables ae linealy negatively elated to each othe and the scatte plot is linea with a negative slope. Howeve, fo value of coelation to be zeo, it indicates a vey weak elation between the vaiables. Coelation can be categoized as Peason, Kendall. Figue 3 shows the intedependence of the isk factos using coelation: Figue 3: Coelation using Peason coefficient. The diffeent shapes and sizes indicate seveal anges of coelation values. The diagonal lines indicate pefect coelation between the vaiables; howeve it is obvious that thee is pefect coelation between the same vaiables. Zeo coelation indicates a pefect cicle with white colo. Fo example, thee is no coelation between the vaiables Sex and Cholesteol. Thus it can be well said that cholesteol of a peson is completely independent of the gende of a peson. The colo of the cicles has a geat elation with the coelation of the vaiables. The coelation goes on deceasing as the colo fades fom dak to light. Moeove, the ed shades indicate positive coelation wheeas the blue shades indicate negative coelation. Howeve, it is seen that fo linea data analysis, Peason povides coelation is the much bette than speaman and Kendell coelation. Coelation can also be explained with the help of dendogams. Clusteing is the classification technique fo unsupevised data. Clusteing allows us to classify data and helps us to conside the cluste set (goups of cluste) by measuing the distance between the clustes. This pape uses the k- means clusteing, also called patitioning method whee a specific numbe (say k) of clustes ae taken into consideation. Each of the cluste in k- means algoithm analysis uses at least one data object, which is consideed to be the basic equiement fo clusteing pocess in the patitioning based clusteing. It takes into consideation the numeic data. When the numbe of clustes (say k) is assumed by the use, then a pio clusteing method may be fomulated to measue the coectness of the algoithm fo n objects on N dimensional space among the k cluste goups. K-means algoithm is a patitioning based method that ceates k- patitions/clustes fom a given set of data. The fundamental equiement fo the k- means clusteing algoithm is that each cluste must contain at least one data object and each of the data objects must belong to exactly one cluste. The diffeent clustes in k-means patitioning based algoithm is defined by 678

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 the distance measuement fo clusteing the data. The most common method to measue the distance fo the clusteing detemination is the Euclidian distance, which is descibed by the following equation: D (x i,x j) = = xi,xj The algoithm fo k-means position method is as follows: The new cluste is selected using the following fomula: Vi = ( ) Whee, Ci epesents the numbe of data points/values in the i th cluste. The accuacy of the K-Means clusteing algoithm can be illustated as follows: Accuacy (A) = (t p + t n) / total sample. Whee, t p = tue positive t n = tue negative The cluste sizes that ae consideed in the dataset fo heat patients ae as follows: Runs = 10 Cluste size: 189,114,9,37,59,60,178,36,88,19 181, 71, 9, 96, 136, 168, 77, 70, 71, 117. Fo each patten X i, we detemine the membeship m(c j / X j) in each cluste C j. The membeship function detemines the value of X i that belongs to the j th cluste C j. If the patten X i is closest to the j th cluste with simila popeties, then it is assumed that: m(c j / X i) = 1 Figue 3: K means algoithm It is an unsupevised patitioning algoithm based on a simple iteative scheme fo finding a local minimal solution. It consides a specified numbe of clustes k. Let us conside the k pototype/clustes be (w1, w, w3. Wk) fo the n input pattens (i1, i, i3,in) [16]. Thus, W j = i l, jε{1,,..k} lε{1,, n} The quality of a clusteing algoithm is detemined by the following eo function: E = Whee, Cj is the j th cluste whose value is a disjoint subset of the input paamete. [16] The eo function can also be explained as following: Whee, E = is the Euclidian distance between Xi & Vj. Ci is the numbe of data points in the i th cluste. C is the numbe of cluste centes. else m(c j / X i) = 0 Thus, we can say that the values of m(c j / X i) is bound to (0,1). Theefoe, m(c j / X i) (0,1) Once the values of the clustes ae detemined, the centoids of the cluste centes ae to be e-computed in ode to find the new cluste centes, V j and then calculate the sum of squae eo E. The centoids of the e-computed cluste centes ae detemined as follows: V j = fo j = 1,,.k The sum of squae eo is calculated as follows: E = fo i=1..n j=1..k We epeat the computation of the new cluste centes until convegence. 6783

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 The data means fo K- means clusteing analysis is shown as follows: Table I: Data means of K-Means clusteing Sex Chest pain Systolic BP FBS Step1: Apply minimum suppot to find all the fequent sets with k item sets in a database [0]. Step: Mine the fequent k-item sets to find the fequent sets with (k+1) items 0.73 3.64 138.60 0.005 Age Thalach Family histoy Symptoms 7.891 0.0 0.38 1.94 Comobidity Exang Rest ECG Chol Weight.59 0.66 0.39.003-0.01 Association analysis identifies the elationships o affinities between obsevations o between vaiables. These vaiables/ elationships ae expesses as a collection of ules called association ules. Association ule analysis is especially applicable when the size of the tansaction database is vey lage. The association analysis geneally compises of the following two steps: 1. Geneation of fequent item sets.. Mining the association ules fom the fequent item sets [17]. The oiginal concept of association ule mining was bought up by Agawal who defined the concept of association ule mining [18] [19] as: Let I = {i 1, i, i 3..i n} be the set of n binay attibutes called Items D = {t 1, t, t 3.. t n} be the set of n tansactions called Database. Each tansaction in the database D has a unique tansaction id and contains a subset of items in I. Thus, the association ule is defined as: whee, X, Y ε I X Y = ø X Y The set of items X is called antecedent The set of items Y is called consequent. Apioi algoithm is consideed as one of the Boolean association ules fo mining fequent item sets. The apioi algoithm was initially poposed by R.Agawal and R. Shikanth in the yea 1994. The entie Apioi algoithm can be divided into two steps: Figue 4: Apioi algoithm Pincipal component analysis is the pocess to find out the diffeent pattens in data, and also finding out the similaities and diffeences in the data povided. It can found using Singula value decomposition method (SVD) and Eigen value method. The esult of pincipal Component analysis using SVD and Eigen value method is shown below: Figue 5: Pincipal component analysis using Eigen value Figue 6: Pincipal component analysis using SVD 6784

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 Table II: Standad Deviation Values Using Eigen Method Comp.1 Comp. Comp.3 Comp.4 Comp.5 Comp.6 6650.446 3.0363 1.7589 17.867 8.458 1.65 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.1 1.147458 1.0317 0.4871 0.4333 0.3987077 0.665 Table V shows the coelation and patial coelations of BP and THAL, while CHOL and FBS consideed as contol vaiables Table V: Coelation between BP and THAL RESULTS AND DISCUSSION The following table III shows bivaiate coelations among BP, FBS, CHOL and THAL. Table III: Coelation among BP, CHOL, FBS and THALACH Table VI shows the coelation and patial coelation coefficients of BP and CHOL. While THAL and FBS consideed as contol vaiables. This has been shown at diffeent BP anges. Table VI: Coelation between BP and CHOL Table VII shows the coelation and patial coelation coefficients between FBS and CHOL, when BP and THAL consideed as contol vaiables. In the table IV the coelation and patial coelation coefficients between BP and FBS, consideing CHOL and THAL as contol vaiables, have been shown Table VII: Coelation between FBS and CHOL Table IV: Coelation between BP and FBS Table VIII shows the esults of coelation and patial coelation coefficients between FBS and THAL while BP and CHOL consideed as contol vaiables. Table VIII: Coelation between FBS and THAL 6785

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 Similaly, the table IX depicts the coelation and patial coelation coefficients between THAL and CHOL taking FBS and BP as contol vaiables. Table IX: Coelation between THAL and CHOL CONCLUSION In this study, the aim was to design a pedictive model fo heat disease pediction using statistical analysis and data mining techniques fom the 1 attibutes dataset that is capable of finding the isk factos in ealy detection of cadiac disodes. Data collected fom Gauhati Medical College and hospital, Hayat Hospital, GNRC Hospital, Downtown Hospital and Naayanan Supe-specialty Hospital in India 0f yea 015 was consideed and pepocessed fo this study. The models wee built using the statistical and data mining techniques. Coelation and patial coelation techniques wee used to analyze the data set. Data mining used K-Means, Apioi and Decision tee algoithms to detect the most impotant isk factos fo the pepocessing of the data. The pefomances of the models wee evaluated and was the found that Blood Pessue (BP), Fasting Blood Suga (FBS), Cholesteol (CHOL) and Thalach (THAL) ae the most impotant isk factos fo detecting a heat disease. Thus we see that fom a total of 1 attibutes, 4 attibutes poved to be isk factos that wee highly elevant. As a futue wok, I have planned to pefom additional expeiments with moe dataset and algoithms to impove the classification accuacy and to build a model that can pedict specific heat disease types. REFERENCES [1] A Liteatue Review of Cadiovascula Disease Management Pogams in Managed Cae Populations,SHETA ARA, PhamD, http://www.jmcp.og/doi/pdf/10.18553/jmcp.004.10.4.36. [] Vikas Chauasia, et al. Ealy Pediction of Heat Diseases Using Data Mining Techniques, Caib.j.SciTech, 013, Vol.1, 08-17. [3] Peventing Chonic Disease: A Vital Investment. Wold Health Oganization Global Repot, 005. [4] Global Buden of Disease. 004 update (008). Wold Health Oganization. [5] K.Sudhaka, D. M. Manimekalai, Study of Heat Disease Pediction using Data Mining, Intenational Jounal of Advanced Reseach in Compute Science and Softwae Engineeing, Volume 4, Issue 1, Januay 014 ISSN: 77 18X. [6] http://www.who.int/mediacente/factsheets/fs317/en/ [7] Gounescu, F, Data Mining: Concepts, Models, and Techniques, Spinge, 011. [8] Latha Pathiban and R.Subamanian, "Intelligent Heat Disease Pediction System using CANFIS and Genetic Algoithm", Intenational Jounal of Biological, Biomedical and Medical Sciences 3; 3, 008 [9] Kiyong Noh, Heon Gyu Lee, Ho-Sun Shon, Bum Ju Lee, and Keun Ho Ryu, "Associative Classification Appoach fo Diagnosing Cadiovascula Disease", Spinge, Vol:345, pp: 71-77, 006. [10] HianChyeKoh and Geald Tan, Data Mining Applications in Healthcae, jounal of Healthcae Infomation Management Vol 19, No [11] M.Duaiaj, K.Meena, A Hybid Pediction System Using Rough Sets and Atificial Neual Netwoks, Intenational Jounal Of Innovative Technology & Ceative Engineeing (ISSN: 045-8711) VOL.1 NO.7 JULY 011. [1] K. Sinivas, B. Kavitha Rani and D. A. Govdhan, Applications of Data Mining Techniques in Healthcae and Pediction of Heat Attacks Intenational Jounal on Compute Science and Engineeing (010). [13] Ming Chuan Hung, Jungpin Wu, Jin-HuaChang, Don lin Yang, An efficient K-Means clusteing algoithm using simple patitioning,jounal of infomation science and engineeing, yea 005. [14] EliasLemuye, Hiv Status Pedictive Modeling Using Data Mining Technology. [15] Rakesh Agaewal, Ramakishnan Sikanth, Fast Algoithm fo mining association ules inn lage databses,poc 0 th Intenational confeence vey lage databases(vldb), pp-487-499, yea 1994 [16] Avind Shama and P.C. Gupta Pedicting the Numbe of Blood Donos though thei Age and Blood Goup by using Data Mining Tool Intenational Jounal of Communication and Compute Technologies Volume 01 No.6, Issue: 0 Septembe 01. 6786

Intenational Jounal of Applied Engineeing Reseach ISSN 0973-456 Volume 1, Numbe 17 (017) pp. 6778-6787 [17] Jiao Yabing, Reseach of an impoved Apioi algoithm in data mining association ules:,ijcce, Vol., No.1, Januay 013 [18] Rachna Somkunwa, A study on vaious data mining appoaches of association ules,ijarcsse, Vol., Issue 9, Septembe 013. [19] Jong Soo Pak, Mingsyan Cheng, Philip S. Yu, An effective hash based algoithm fo mining association ules,poc ACM, SIGMOD Confeence, PP.175-186, yea 1995 [0] Blog.hackeeath.com/beginnes-tutoial-apioialgoithm-data-mining--implementation. 6787