A MIXTURE OF EXPERTS FOR CATARACT DIAGNOSIS IN HOSPITAL SCREENING DATA

Journal of Theoretical and Applied Information Technology
2005 - ongoing JATIT & LLS
ISSN: 1992-8645   www.jatit.org   E-ISSN: 1817-3195

1 SUNGMIN MYOUNG, 2 HONG-KI LEE

1 Department of Health Administration, Jungwon University, Goesan-gun, Chungbuk, Korea.
2 Corresponding Author, Department of Management, Jungwon University, Goesan-gun, Chungbuk, Korea.
E-mail: 1 smmyoung@jwu.ac.kr, 2 t2020@jwu.ac.kr

ABSTRACT

One of the objectives of many big-data analytic methods is to search through large amounts of information and analyze it to make predictions for individual patients. A cataract is a clouding of the lens inside the eye that leads to a decrease in vision; it is the most common cause of blindness and is conventionally treated with surgery. Early diagnosis is important to limit the development of cataracts. Many machine learning and data mining techniques have been suggested for the automatic diagnosis of cataract. In this study, the mixture of experts method was applied to data from 126,532 people who visited a health screening center. The mixture of experts was implemented in R 3.4.1 to train the model. The performance of the ME model was evaluated by plotting a ROC curve to validate the results. The ME model achieved a higher accuracy than the logistic regression model. This research can serve as a diagnostic decision support mechanism for cataract in health examination subjects.

Keywords: Mixture of Experts, Diagnosis, Cataract, Screening Data, Data Mining

1. INTRODUCTION

In recent years, many big-data analytic methods have been suggested in the field of biomedicine [1]. One of the objectives of these analytic methods is to search through huge amounts of information and analyze it to predict significant outcomes for individual patients [2]. For example, complex data sets combining structured and unstructured data (EMR, financial, clinical, and genomic data) are analyzed to predict a patient's risk of disease or prognosis [3].
To solve these problems, the divide-and-conquer principle has been used to break a given complex problem into sub-problems whose solutions can be recursively combined to produce a final solution [4, 5]. Applying this principle, Jacobs and Jordan proposed the mixture of experts (ME) model, a modular neural network architecture for unsupervised/supervised learning [6]. It is composed of several expert networks, a gating network, and a probabilistic model that combines the gating and expert networks. Yuksel et al. pointed out three important properties of the ME model. First, it allows each expert to focus on a smaller part of a larger problem. Second, it uses soft partitions of the given dataset. Third, it allows the splits to be formed along hyperplanes at arbitrary orientations in the input space [6, 7]. A number of studies have applied the ME model to regression and classification and demonstrated its usefulness. Many studies of ME have also been published in the areas of medical decision support, genomic data analysis, and signal pattern data analysis [7].

A cataract is a clouding of the lens inside the eye that leads to a decrease in vision. It is the most common cause of blindness and is conventionally treated with surgery. Visual loss occurs because opacification of the lens obstructs light from passing and being focused onto the retina at the back of the eye. It is most commonly due to aging, but there are many other causes [8, 9]. Park et al. reported the prevalence of cataract and cataract surgery using the 2008-2012 Korean National Health and Nutrition Examination Survey (KNHANES) data.

The reported prevalence of cataract was 42.28% (95% confidence interval (CI), 40.67-43.89). For men, the prevalence was 40.82% (95% CI, 38.97-42.66). For women, the prevalence was 43.62% (95% CI, 41.91-45.33), and the p-value for the comparison between genders was 0.606 [10].

Figure 1: The analysis process

The purpose of screening is to test apparently well people in order to find those at increased risk of having a disease or disorder. Although an earlier diagnosis usually has natural appeal, earlier might not always be better, or worth the cost. An appropriate screening test, however, is known to improve health [11]. For cataracts, it is important to detect the disease early through screening tests in order to reduce long-term problems. From this point of view, this research demonstrates an easy, quick, and precise method for diagnosing early cataract based on machine learning algorithms, specifically the ME model, using data from a hospital screening center. This study shows that the ME model achieves superior performance compared to the classical logistic regression model on the hospital screening dataset, and the results indicate that the ME model is feasible for hospital screening data.

This paper is organized as follows. Section 2 presents the data used in this study. Section 3 describes the ME model. Section 4 presents the results of applying the ME model. Finally, the last section summarizes our work.

2. DATA COLLECTION AND METHOD

We used 126,532 diagnostic records of people who visited one hospital screening center from 1994 to 2005.
The examination items measured at the hospital screening center included a medical examination by interview, blood pressure, body measurements, blood tests, urinalysis, stool tests, a dental exam, eye/hearing tests, cardiac function, gynecology, and nutrition. Table 1 presents the measured components. Among the records, 5,804 subjects visited the hospital for a diagnostic ophthalmopathy examination, and 1,210 subjects were excluded due to missing data. Thus, a final analysis set of 4,591 cases was constructed, of which 707 were cataract cases and the rest were non-cataract cases. Figure 1 shows the analysis process of this study. Each record has 28 attributes (medical examination items). In general, the known risk factors of cataract are age, sex, chronic disease, genetic factors, nutrition, and environmental factors such as alcohol intake, UV exposure, and smoking [9]. To investigate risk factors (independent variables) for cataract in our dataset, we conducted univariate analysis of the 28 independent variables to identify the significant risk factors.

Table 1: Measured components of hospital screening center

Item                          Measured components
Basic information             gender, age
Body measurement              height, weight, BMI
Blood test                    RBC, Hb, Hct, MCV, MCH, MCHC, WBC, lymphocyte, eosinophil, platelet, ABO, RH
Liver function                total protein, albumin, total bilirubin, AST, ALT, alkaline phosphatase, r-GTP
Eye                           eyesight test, intraocular pressure, fundus examination
Metabolism and electrolyte    sodium, kalium, chlorination, calcium, phosphorus, blood glucose, HbA1c, blood urea nitrogen, creatinine
Serum lipid                   total cholesterol, triglyceride, HDL, LDL
Urinalysis                    urine protein, urine glucose, urobilinogen, uric acid, ketone, occult blood test, nitrite

As a result, we found 13 risk factors influencing cataract: age, BMI, WBC, glucose (Glu), blood urea nitrogen (BUN), albumin (Alb), alkaline phosphatase (Alk. Phos), kalium (K), calcium (Ca), creatinine, cholesterol (Chol), triglyceride (TG), and HDL. These attributes were statistically significant in univariate analysis. The dataset contained 707 cataract cases and 3,884 non-cataract cases. Variable selection for the multivariate analysis (logistic regression) and for the ME model was conducted using the likelihood score method suggested by Furnival and Wilson [12]. As a result, 6 risk factors were selected: glucose, blood urea nitrogen, albumin, alkaline phosphatase, kalium, and calcium.

3. DESCRIPTION OF MIXTURE OF EXPERTS AND EM ALGORITHM

In this section, we briefly describe the ME model and the EM algorithm for estimating the parameters of the ME model [13-14]. The ME model is composed of a gating network and several expert networks, and its architecture is illustrated in Figure 2. The gating network $g_k$ partitions the input space of $x$ into regions corresponding to the various expert networks, and uses a softmax function as shown in (1):

$$ g_k(x, v) = \frac{e^{s_k}}{\sum_{j=1}^{N} e^{s_j}} \qquad (1) $$

where $s_k = v_k^T x$, $v_k$ is a weight vector, and $N$ is the number of expert networks.
The gating network provides the coefficients of the linear combination as probabilities for the expert networks. The gating network is thus a generalized linear function, namely the softmax (multinomial logit) of the intermediate variables $s_k$. All the expert networks are "generalized linear", that is, linear with a single output nonlinearity (McCullagh and Nelder, 1983). Each expert network produces an output from the input vector $x$; the k-th expert network produces its output $E_k(x)$ as a generalized linear function of the input $x$:

$$ E_k(x) = f(W_k x) \qquad (2) $$

where $W_k$ is a weight matrix and $f(\cdot)$ corresponds to a Gaussian or Bernoulli distribution in the case of regression or classification problems, respectively. The overall output $y$ of the ME model is given in (3):

$$ y = \sum_{k=1}^{N} g_k E_k(x) \qquad (3) $$

In equation (3), the values $g_k$ are interpreted as multinomial probabilities associated with the decision that terminates in a regressive process mapping $x$ to $y$.
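As an illustration of Eqs. (1)-(3), the following minimal sketch computes the gating probabilities, the expert outputs, and the mixture output for one input. It assumes the two-class case, so a logistic (sigmoid) function is used as the output nonlinearity $f$; the function names and all weight values are hypothetical and are not taken from the paper.

```python
import math

def dot(w, x):
    """Inner product of a weight vector and an input vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def softmax(scores):
    """Eq. (1): g_k = exp(s_k) / sum_j exp(s_j), computed stably."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Assumed output nonlinearity f for a two-class (Bernoulli) expert."""
    return 1.0 / (1.0 + math.exp(-z))

def mixture_output(x, gate_weights, expert_weights):
    """Return gating probabilities, expert outputs, and the mixture output."""
    g = softmax([dot(v, x) for v in gate_weights])      # Eq. (1)
    e = [sigmoid(dot(w, x)) for w in expert_weights]    # Eq. (2)
    y = sum(gk * ek for gk, ek in zip(g, e))            # Eq. (3)
    return g, e, y

# Hypothetical example: x = (intercept term, one covariate), two experts.
x = [1.0, 0.5]
gate_weights = [[0.0, 0.0], [0.0, 0.0]]    # equal gating scores
expert_weights = [[0.0, 0.0], [2.0, 0.0]]  # illustrative values only
g, e, y = mixture_output(x, gate_weights, expert_weights)
```

With equal gating scores, both experts receive weight 0.5, so the mixture output is simply the average of the two expert outputs.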

Figure 2: The architecture of mixture of experts

The output $y$ is drawn from the probability density $P(y \mid x, W_k)$. As mentioned above, $W_k$ denotes the weight matrix, or set of parameters, of the k-th expert network. The total probability of the ME model generating $y$ from $x$ is shown in (4):

$$ P(y \mid x, \Theta) = \sum_{k=1}^{N} g_k P(y \mid x, W_k) \qquad (4) $$

where $\Theta$ is the set of both gating and expert network parameters. The component densities are generally assumed to be Bernoulli in the case of 2-class classification, multinomial in the case of classification with more than two classes, and Gaussian in the case of regression.

Based on the total probability in (4), estimating the parameters of the ME model is treated as a maximum likelihood problem. Jordan and Jacobs proposed an EM algorithm for adjusting the parameters of the architecture. The EM algorithm consists of two steps, the E-step and the M-step. For the s-th epoch, the posterior probabilities $h_k^{(t)}$ for each pair $(x_t, y_t)$ are computed in the E-step as shown in (5):

$$ h_k^{(t)} = \frac{g_k(x_t, v_k^{(s)}) \, P(y_t \mid x_t, W_k^{(s)})}{\sum_{j=1}^{N} g_j(x_t, v_j^{(s)}) \, P(y_t \mid x_t, W_j^{(s)})} \qquad (5) $$

The M-step solves the following maximization problems:

$$ W_k^{(s+1)} = \arg\max_{W_k} \sum_{t=1}^{T} h_k^{(t)} \log P(y_t \mid x_t, W_k) \qquad (6) $$

and

$$ V^{(s+1)} = \arg\max_{V} \sum_{t=1}^{T} \sum_{k=1}^{N} h_k^{(t)} \log g_k(x_t, v_k) \qquad (7) $$

where $V$ is the set of all the parameters in the gating network and $T$ is the number of observations. The EM algorithm is therefore summarized as follows [13-15]:

1. For each data pair $(x_t, y_t)$, compute the posterior probabilities $h_k^{(t)}$ using the current values of the parameters.
2. For each expert network, solve the maximization problem in Eq. (6) with observations $\{(x_t, y_t)\}_{t=1}^{T}$ and observation weights $\{h_k^{(t)}\}_{t=1}^{T}$.
3. For the gating network, solve the maximization problem in Eq. (7) with observations $\{(x_t, h_k^{(t)})\}_{t=1}^{T}$.
4. Iterate using the updated parameter values.

4. RESULTS
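The fitted ME model reported in this section relies on the EM iteration summarized at the end of Section 3. As a minimal sketch of its core computation, the code below evaluates the E-step posterior of Eq. (5) for a single observation and the weighted log-likelihood that the M-step in Eq. (6) maximizes for one expert. It assumes Bernoulli experts (two-class case); the function names and the numeric inputs are hypothetical, and the maximization itself (for example by weighted logistic regression) is not shown.

```python
import math

def bernoulli_likelihood(y, p):
    """P(y | x, W_k) for a two-class expert that predicts P(y = 1 | x) = p."""
    return p if y == 1 else 1.0 - p

def e_step(gates, expert_probs, y):
    """Eq. (5): posterior h_k proportional to g_k * P(y | x, W_k)."""
    joint = [g * bernoulli_likelihood(y, p) for g, p in zip(gates, expert_probs)]
    total = sum(joint)
    return [j / total for j in joint]

def expert_objective(h_k, ys, probs_k):
    """Objective of Eq. (6) for one expert: sum_t h_k^(t) * log P(y_t | x_t, W_k)."""
    return sum(h * math.log(bernoulli_likelihood(y, p))
               for h, y, p in zip(h_k, ys, probs_k))

# Hypothetical single observation with y = 1 and two experts.
gates = [0.8, 0.2]          # gating probabilities g_1, g_2
expert_probs = [0.1, 0.9]   # each expert's predicted P(y = 1 | x)
h = e_step(gates, expert_probs, y=1)

# Weighted log-likelihood contribution of expert 2 for this observation.
obj_2 = expert_objective([h[1]], [1], [expert_probs[1]])
```

Although the gate favors expert 1 a priori, expert 2 explains the observed label much better, so its posterior responsibility (0.18/0.26) exceeds that of expert 1 (0.08/0.26).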

Table 2: Univariate analysis of statistically significant screening variables for cataract/non-cataract.

Variable      Non-cataract (n=3,884)   Cataract (n=707)   p-value
Age           50.3±13.3                60.4±10.7          <0.001
BMI           23.9±3.8                 24.1±3.0           0.047
WBC           6.4±2.5                  6.6±1.9            <0.001
Glucose       98.0±29.4                114.3±46.3         <0.001
BUN           14.4±4.1                 15.7±4.8           <0.001
Alb           4.8±0.3                  4.5±0.4            <0.001
Alk. Phos     72.7±27.8                79.6±32.2          <0.001
Kalium        4.2±0.4                  4.3±0.4            <0.001
Calcium       9.5±0.6                  9.4±0.5            <0.001
Creatinine    0.9±0.2                  1.0±0.4            0.010
Chol          197.5±37.3               201.0±38.7         0.022
TG            145.1±99.5               157.7±111.7        0.005
HDL           51.6±12.8                50.0±12.0          0.003

As mentioned before, a total of 4,591 cases from a hospital screening center, collected between 1994 and 2005 for the diagnosis of cataract, were analyzed. All statistical analyses and computations were performed in R 3.4.1 (available at http://www.r-project.org) and RStudio 1.1.447 (available at http://www.rstudio.com). To find risk factors of cataract, we conducted univariate analysis comparing the characteristics of the cataract and non-cataract groups using the two-sample t-test. Table 2 shows the 13 statistically significant risk factors among 23 candidate risk factors. The mean age of the cataract group (60.4±10.7 years) was higher than that of the non-cataract group (50.3±13.3 years). BMI, WBC, glucose, BUN, Alk. Phos, K, creatinine, Chol, and TG were also higher in the cataract group than in the non-cataract group, whereas albumin, calcium, and HDL were lower in the cataract group. Six risk factors (glucose, blood urea nitrogen, albumin, alkaline phosphatase, kalium, and calcium) were finally selected by multivariate analysis with the variable selection method. Table 3 shows the result of logistic regression for the 13 risk factors that were significant in univariate analysis. The regression coefficient estimates for the 6 significant risk factors are given in Table 3.
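In logistic regression, an odds ratio (OR) is obtained by exponentiating the corresponding coefficient estimate. The short sketch below applies this to the six significant estimates from Table 3; the dictionary and variable names are illustrative only, and rounding to two decimals is an assumption about how the ORs were reported.

```python
import math

# Coefficient estimates of the six significant risk factors (Table 3).
coefficients = {
    "Glucose": 0.2930,
    "BUN": 0.2047,
    "Kalium": 0.1265,
    "Alk. Phos": 0.1168,
    "Calcium": -0.1218,
    "Albumin": -0.1627,
}

# OR = exp(beta): values above 1 indicate higher odds of cataract as the
# variable increases, values below 1 indicate lower odds.
odds_ratios = {name: round(math.exp(beta), 2) for name, beta in coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.2f}")
```

A positive coefficient therefore maps to an OR above 1 (for example glucose), and a negative coefficient to an OR below 1 (for example albumin).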
Among the 6 risk factors, the factor most strongly associated with cataract was glucose (OR=1.34, p<0.0001), followed by BUN (OR=1.23, p<0.0001), kalium (OR=1.13, p=0.0041), Alk. Phos (OR=1.12, p=0.0077), calcium (OR=0.89, p=0.0184), and albumin (OR=0.85, p=0.0003).

Table 3: The result of logistic regression for 13 risk factors

Variable      Estimate   Std. Error   p-value
Age           0.0075     0.0344       0.7677
BMI           0.0087     0.0408       0.8313
WBC           0.0279     0.0367       0.4481
Kalium        0.1265     0.0441       0.0041
Calcium       -0.1218    0.0517       0.0184
Creatinine    -0.0290    0.0440       0.5101
Glucose       0.2930     0.0356       <0.0001
BUN           0.2047     0.0445       <0.0001
Albumin       -0.1627    0.0454       0.0003
Alk. Phos     0.1168     0.0438       0.0077
Chol          0.0112     0.0466       0.8106
HDL           -0.0150    0.0503       0.7653
TG            0.0442     0.0461       0.3371

The classical logistic regression model and the ME model were fitted, and the ME model was used to classify the hidden subgroups. In this analysis, the ME model used for the diagnosis of cataract had k=2 expert networks; that is, the ME model formed two local experts and a gating network. To focus on cataract diagnosis, all cases were used in this analysis. The ME architecture for the diagnosis of cataract is shown in Figure 3. Each expert network produces its output O as a generalized linear function of the input x. The gating network $g(x, v)$ is also generalized linear.

Figure 3: Architecture of the ME model for the diagnosis of cataract

Table 4: Confusion matrices of the classifiers.

Classifier type   Output result    Desired result
                                   Non-cataract   Cataract
Logistic          Non-cataract     3844           682
                  Cataract         40             25
ME                Non-cataract     3687           118
                  Cataract         197            589

The comparison showed that the ME model (accuracy 93.33%) predicts better than logistic regression (accuracy 84.27%). To estimate the ME parameters, the initial values of the gating network and expert networks were chosen randomly. The ME model and the classical logistic regression model were compared using receiver operating characteristic (ROC) curves to validate the diagnosis prediction model. To compare the performance of the classification models, the confusion matrices are given in Table 4. According to the confusion matrices, logistic regression incorrectly classified 40 non-cataract cases as cataract. The ME model incorrectly classified 197 non-cataract cases as cataract, and correctly classified 589 cataract patients.

To determine the performance of the classifiers, the classification measures (sensitivity, specificity, and accuracy) are given in Table 5. Accuracy is defined as the proportion of correct decisions among the total number of cases. It is calculated as (TP+TN)/(TP+TN+FP+FN), where TP, FN, FP, and TN are the numbers of true positives, false negatives, false positives, and true negatives, respectively. Sensitivity is the proportion of positives that are correctly identified as such, that is, the percentage of people with cataract who are correctly identified as having cataract; it is calculated as TP/(TP+FN). Specificity is the proportion of negatives that are correctly identified as such, that is, the percentage of non-cataract people who are correctly identified as not having cataract; it is calculated as TN/(TN+FP).
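The three definitions above can be checked directly against the logistic regression counts in Table 4. The following sketch is only an illustration of the formulas, with cataract treated as the positive class; the function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) as proportions."""
    sensitivity = tp / (tp + fn)                 # TP / (TP + FN)
    specificity = tn / (tn + fp)                 # TN / (TN + FP)
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct decisions / all cases
    return sensitivity, specificity, accuracy

# Logistic regression counts from Table 4 (positive class = cataract):
# 25 cataract cases detected, 682 missed, 40 false alarms, 3844 correct negatives.
sens, spec, acc = classification_metrics(tp=25, fn=682, fp=40, tn=3844)

print(f"sensitivity = {100 * sens:.2f}%")   # 3.54%
print(f"specificity = {100 * spec:.2f}%")   # 98.97%
print(f"accuracy    = {100 * acc:.2f}%")    # 84.27%
```

The example shows why accuracy alone can mislead on imbalanced screening data: the logistic model reaches 84.27% accuracy while detecting only 25 of the 707 cataract cases.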
Table 5: The values of statistical parameters of classification accuracy

Classifier type   Sensitivity (%)   Specificity (%)   Accuracy (%)
Logistic          3.54              98.97             84.27
ME                84.58             94.92             93.33

Figure 4: Posterior probability of ME model

Although the ME model had a lower specificity than logistic regression, its other measures (sensitivity and accuracy) were higher. Thus, the performance of the ME model was found to be more appropriate than that of the logistic model. The purpose of classification is to allocate the input data to one of several classes according to the probability of class membership [4]. The estimated posterior probability of the ME model is illustrated in Figure 4. The parameter estimates of the ME model for the two components are given in Table 6.

Table 6: Parameter estimates of ME models for two components

Variable      Expert 1    Expert 2
Intercept     -25.4534    -1.2377
Glucose       -0.0260     0.5448
BUN           0.0351      0.2590
Albumin       -0.0019     -0.2101
Alk. Phos     0.0020      0.2105
Kalium        0.0051      0.1437
Calcium       0.0106      -0.1633

The estimated gating network proportions are 82.90% for expert 1 and 17.10% for expert 2. In the ME model, expert 1 (lower-risk group) has lower parameter values than expert 2 (high-risk group). Compared with standalone logistic regression, the parameter estimates for expert 2 were larger in magnitude. This result suggests that expert 2 represents a high-risk group for cataract. In detail, the OR of glucose was higher in expert 2 (OR=1.72) than in logistic regression (OR=1.34). The remaining ORs were 1.29 (BUN), 0.81 (albumin), 1.23 (Alk. Phos), 1.15 (kalium), and 0.84 (calcium), each further from 1 than in the standalone logistic regression.

Conversely, for expert 1, the parameter estimates were much smaller than in the standalone logistic regression. For glucose, the OR was 1.34 (logistic) and 0.97 (expert 1), the opposite direction to expert 2. For BUN, the OR was 1.23 (logistic) and 1.03 (expert 1). The estimated OR of albumin was 0.85 (logistic) and 0.99 (expert 1). The OR of Alk. Phos was 1.12 (logistic) and 1.00 (expert 1), that of kalium was 1.13 (logistic) and 1.01 (expert 1), and that of calcium was 0.89 (logistic) and 1.01 (expert 1). These results show that the odds ratios of all risk factors in expert 1 are close to 1, which suggests that the risk factors in the model of expert 1 have little effect on cataract.

The performance of the ME model can be evaluated by plotting the ROC curve for the test (Figure 5). A ROC curve is a plot of the sensitivity versus (1-specificity) of a screening test, where the different points on the curve correspond to different cut-off points used to designate test-positive [17].

Figure 5: ROC curves of the classifiers

The ROC plot in Figure 5 shows that the performance of the ME model is higher than that of logistic regression. One general way to quantify the diagnostic accuracy of a laboratory test is to express its performance as a single number. The most convenient measure is the area under the ROC plot (AUC).
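One way to compute the AUC without drawing the curve uses its rank interpretation: the AUC of the empirical ROC curve equals the proportion of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The sketch below follows that interpretation; the scores and labels are hypothetical toy values, not data from this study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities and true labels (1 = cataract).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
toy_auc = auc(scores, labels)   # 5 of 6 pairs are ordered correctly
```

This pairwise form is quadratic in the number of cases, which is acceptable for a sketch; sorting-based formulas give the same value more efficiently on large datasets.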
A test with no better accuracy than chance has an AUC of 0.5, and a test with perfect accuracy has an AUC of 1. The estimated AUCs of the experts (expert 1: AUC=0.83; expert 2: AUC=0.81) are higher than that of logistic regression (AUC=0.76). Thus, the performance of the ME model in this research was higher than that of classical logistic regression.

5. CONCLUSIONS

The purpose of medical informatics has been described as using inductive knowledge to find the decision characteristics of diseases, which can then be used to diagnose future patients with uncertain disease states [18]. In addition, the prevention and prediction of disease have recently come to be regarded as important, rather than treatment alone. Thus, applying classification models and building prediction models for clinical data are important issues. Health checkups aim at early detection of, and changes in, risk factors for chronic disease, and at checking current health status. In general, it is known that the development of cataracts can be reduced through early diagnosis from screening tests. In this research, cataract was considered in terms of early diagnosis from the perspective of statistical data analysis. Cataract is a complex disease known to be caused by risk factors associated with physical information, liver function, metabolism, electrolytes, and serum lipids. Patients can therefore be considered to consist of several subgroups according to their characteristics. From this point of view, this paper illustrated the feasibility of the ME model for improving classification accuracy and presented a prediction model using hospital screening data. The performance of the two classification methods (ME and classical logistic regression) was compared, and the results showed that the ME model was better than the other method in terms of diagnostic measures such as sensitivity, accuracy, and the ROC curve. The AUC results also showed that the ME model was better than the standalone logistic model.
However, although the accuracy and specificity of the ME model were adequate (accuracy=93.33%, specificity=94.92%), its sensitivity was relatively low (84.58%). This result was presumably because only clinical data obtained from health checkups were used, not data obtained through ophthalmologic examination.

This method is expected to assist physicians in the diagnosis and prediction of cataract as a diagnostic decision support mechanism. Furthermore, each estimated expert can be regarded as a latent subgroup with specific characteristics, such as a low- or high-risk group. This research did not consider data splitting because it focused on cataract diagnosis. Future work will examine the efficiency of the ME model with split datasets and multi-class problems. We also plan to apply it to latent risk group prediction by comparing ophthalmologic examination data with health checkup data.

DISCLOSURE STATEMENT

The authors declare that there is no potential conflict of interest.

ACKNOWLEDGEMENT

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2017R1D1A1B03029018).

REFERENCES:

[1] A. Belle, R. Thiagarajan, S. Soroushmehr, F. Navidi, D. Beard, and K. Najarian, Big data analysis in healthcare, BioMed Research International, Vol. 2015, 2015, 370194.
[2] A. Gandomi, M. Haider, Beyond the hype: Big data concepts, methods, and analytics, International Journal of Information Management, Vol. 35, No. 2, 2015, pp. 137-144.
[3] W. Raghupathi, V. Raghupathi, Big data analytics in healthcare: promise and potential, Health Information Science and Systems, Vol. 2, No. 3, 2014.
[4] E. Ubeyli, A mixture of experts network structure for breast cancer, Journal of Medical Systems, Vol. 29, No. 5, 2005, pp. 569-579.
[5] S. Myoung, Modified mixture of experts for the diagnosis of perfusion magnetic resonance imaging measures in locally rectal cancer patients, Healthcare Informatics Research, Vol. 19, No. 2, 2013.
[6] R. Jacobs, M. Jordan, S. Nowlan, and G. Hinton, Adaptive mixture of local experts, Neural Computation, Vol. 3, No. 1, 1991, pp. 79-87.
[7] S. Yuksel, J. Wilson, and P. Gader, Twenty years of mixture of experts, IEEE Transactions on Neural Networks and Learning Systems, Vol. 23, No. 8, 2012, pp. 1177-1193.
[8] D. Pascolini, S. Mariotti, Global estimates of visual impairment: 2010, British Journal of Ophthalmology, Vol. 96, No. 1, 2012.
[9] A. Foster, Vision 2020: the cataract challenge, Community Eye Health, Vol. 13, No. 34, 2000.
[10] S. Park, J. Lee, S. Kang, J. Hyon, and K. Park, Cataract and cataract surgery: nationwide prevalence and clinical determinants, Journal of Korean Medical Science, Vol. 31, No. 6, 2016.
[11] D. Grimes, K. Schulz, Uses and abuses of screening tests, The Lancet, Vol. 359, No. 9309, 2002, pp. 881-884.
[12] G. Furnival, R. Wilson, Regression by leaps and bounds, Technometrics, Vol. 16, 1974, pp. 499-511.
[13] G. McLachlan, D. Peel, Editor, Finite Mixture Models, Wiley, New York.
[14] T. Hastie, R. Tibshirani, and J. Friedman, Editor, The Elements of Statistical Learning, Springer, New York.
[15] J. Kay, D. Titterington, Editor, Statistics and Neural Networks, Oxford, New York.
[16] M. Jordan, R. Jacobs, Hierarchical mixture of experts and the EM algorithm, Neural Networks, Vol. 8, No. 9, 1994, pp. 1409-1431.
[17] J. Hanley, B. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Diagnostic Radiology, Vol. 143, 1982, pp. 29-36.
[18] E. Ubeyli, Comparisons of different classification algorithms in clinical decision-making, Expert Systems, Vol. 24, 2007, pp. 117-131.