A combined neural network and decision trees model for prognosis of breast cancer relapse


Artificial Intelligence in Medicine 27 (2003)

José M. Jerez-Aragonés a,*, José A. Gómez-Ruiz a, Gonzalo Ramos-Jiménez a, José Muñoz-Pérez a, Emilio Alba-Conejo b

a Departamento de Lenguajes y Ciencias de la Computación, Complejo Tecnológico de la Información, Campus de Teatinos, University of Malaga, Malaga, Spain
b Servicio de Oncología, Hospital Clínico Universitario, Malaga, Spain

Received 10 January 2002; received in revised form 16 July 2002; accepted 27 September 2002

Abstract

The prediction of clinical outcome of patients after breast cancer surgery plays an important role in medical tasks such as diagnosis and treatment planning. Different prognostic factors for breast cancer outcome appear to be significant predictors for overall survival, but probably form part of a bigger picture comprising many factors. Survival estimations are currently performed by clinicians using the statistical techniques of survival analysis. In this sense, artificial neural networks are shown to be a powerful tool for analysing datasets where there are complicated non-linear interactions between the input data and the information to be predicted. This paper presents a decision support tool for the prognosis of breast cancer relapse that combines a novel TDIDT algorithm (control of induction by sample division method, CIDIM), which selects the most relevant prognostic factors for the accurate prognosis of breast cancer, with a system composed of different neural network topologies that takes the selected variables as input in order to reach a good correct classification probability. In addition, a new method for estimating the Bayes optimal error using the neural network paradigm is proposed. Clinical pathological data were obtained from the Medical Oncology Service of the Hospital Clínico Universitario of Málaga, Spain.
The results show that the proposed system is a useful tool for clinicians to search through large datasets for subtle patterns in prognostic factors, and that it may further assist the selection of appropriate adjuvant treatments for the individual patient.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Back-propagation algorithm; Bayes error; Survival analysis; Breast cancer; Decision trees; Inductive learning

* Corresponding author. E-mail address: jja@lcc.uma.es (J.M. Jerez-Aragonés).

1. Introduction

Prediction tasks are among the most interesting activities in which to implement intelligent systems. Specifically, prediction is an attempt to accurately forecast the outcome of a specific situation, using as input information obtained from a concrete set of variables that potentially describe the situation. A problem often faced in clinical medicine is how to reach a conclusion about the prognosis of cancer patients when presented with complex clinical and prognostic information, since specialists usually make decisions based on a simple dichotomization of variables into a favourable and unfavourable classification [18].

As we enter the new millennium, treatment modalities exist for many solid tumour types and their use is well established. Nevertheless, offset against this is the toxicity of some treatments. As there is a real risk of mortality associated with treatment, it is vital to have the possibility of offering different therapies depending on the patients. In this sense, the likelihood that the patient will suffer a recurrence of her disease is very important, so that the risks and expected benefits of specific therapies can be compared.

This work analyses, on the one hand, the decision-making process existing when patients with primary breast cancer should receive a certain therapy to remove the primary tumour. On the other hand, different prognostic factors appear to be significant predictors for overall survival, but probably form part of a bigger picture comprising many inter-related factors [11]. In order to investigate this hypothesis, studies looking at a large number of potential prognostic factors are needed. To further complicate matters, these relationships may well be non-linear in nature. These form the major difficulties in such studies.
Furthermore, the statistical analysis of large datasets using standard methodologies is cumbersome and limited, especially in the case of non-linear relationships. Among prognostic modelling techniques that induce models from medical data, survival analysis methods are specific both in terms of modelling and the type of data required. Survival models attempt to determine the probability of the event occurring within a specific time, which requires classification models that classify either the occurrence or non-occurrence of the event and optionally model the outcome probabilities. Several tools successfully used in the construction of medical prognosis models have been proposed by the machine learning community [17,34]. Neural networks are a form of artificial intelligence that have found application in a wide range of problems [10,20,24] and have given, in many cases, superior results to standard statistical models [33]. Baxt [4] demonstrated the predictive reliability of an artificial neural networks model in medical diagnosis. In this case, we utilise the ability of neural networks to recognise complex and highly non-linear relationships, such as are likely to characterise medical circumstances. Some authors [14,30] have modelled systems for outcome prediction in post-surgery breast and lung carcinoma patients using neural networks to perform survival analysis. This type of modelling manages the problem of censored data handling that arises when the event related to the censor variable normally included in the survival data (like death or recurrence of a disease) has not occurred during the follow-up period for a patient, although the event may eventually occur. These authors have solved the problem by using different survival estimators to handle censored data for patients. This would imply that

prognostic factors (for example, in breast cancer with adjuvant therapy after surgery) are not time-dependent, but this is not really true. That is, the strength of the prognostic factor is not the same for different time intervals. Different techniques for survival estimation, such as Kaplan-Meier analysis [15] and Cox regression modelling [6], assume that the strength of a prognostic factor does not change over time. In addition, the existence of a peak of recurrence in the distribution of relapse probability [2] demonstrates that the recurrence probability is not the same over time. In this sense, if these statistical techniques are not appropriate to solve this problem, a possible solution would be to incorporate the whole set of prognostic factors pre-selected by medical experts (Section 3.1) as input to the neural networks system. This would involve removing all the patients with censored data; however, the cardinality of the resulting set of patient data vectors would then become too small to constitute a significant representation of this problem.

This work proposes a new system approach based on: (1) specific topologies of neural networks for different time intervals during the follow-up time of the patients, considering the events occurring in different intervals as different problems; and (2) decision trees, useful in understanding the underlying relationships in breast cancer data, for selecting the most important prognostic factors corresponding to every time interval. This is not the first attempt to combine decision trees and neural networks [1,7], but it does present different ways of integrating them. In addition, we introduce a new decision trees algorithm, control of induction by sample division method (CIDIM), for reducing the number of rules and improving the selection of attributes from the database to become significant prognostic factors.
Furthermore, a new upper-bound estimate of the problem-difficulty level, based on the correct classification Bayes probability, is also proposed.

2. Breast cancer overview

Breast cancer is a malignant tumour that has developed from cells of the breast. Although scientists know some of the risk factors (i.e. ageing, genetic risk factors, family history, menstrual periods, not having children, obesity) that increase a woman's chance of developing breast cancer, they do not yet know what causes most breast cancers or exactly how some of these risk factors cause cells to become cancerous. Research is under way to learn more and scientists are making great progress in understanding how certain changes in DNA can cause normal breast cells to become cancerous.

Breast cancer is the most common cancer among women, excluding nonmelanoma skin cancers. The American Cancer Society estimated that in 2001 about 192,200 new cases of invasive breast cancer (Stages I-IV) were diagnosed among women in the US. Ductal carcinoma in situ (DCIS) accounts for about 39,900 new cases each year. Breast cancer also occurs in men. In 2001, there were about 40,600 deaths from breast cancer in the US (40,200 among women, and 400 among men). Breast cancer is the second leading cause of cancer death in women, exceeded only by lung cancer, although death rates declined significantly during [...]. These decreases are probably the result of earlier detection and improved treatment. Breast cancer has a very high cure rate, with 97% of women surviving for 5 years if the cancer is diagnosed early.

Staging is the process of gathering information about the tumour from certain examinations and diagnostic tests to determine how widespread the cancer is. The stage of a cancer is one of the most important factors in selecting treatment options. The TNM system is a standardised way in which the cancer care team describes the extent to which the cancer has spread, where the letter T followed by a number from 0 to 4 describes the tumour's size and spread to the skin or chest wall under the breast, the letter N followed by a number from 0 to 3 indicates whether the cancer has spread to lymph nodes near the breast, and the letter M followed by a 0 or 1 indicates whether or not the cancer has spread to distant organs. Once a patient's T, N, and M categories have been determined, this information is combined in a process called stage grouping to determine a woman's disease stage. This is expressed in Roman numerals from Stage 0 (the least advanced stage) to Stage IV (the most advanced stage).

3. Methods

3.1. Patient data

Data from 1035 patients with breast cancer disease from the Medical Oncology Service of the Hospital Clínico Universitario of Málaga, Spain were collected and recorded during the period [...]. Data corresponding to every patient were structured in 85 fields containing information about post-surgical measurements, personal data, and type of treatment. Part of this information regarding patients is not relevant for predicting outcome, so only 14 independent input variables, pre-selected from all these data fields and targeted by medical experts as probably being risk factors for breast cancer prognosis, were incorporated in the model, becoming inputs to the CIDIM algorithm. Fig. 1 shows the data pre-processing stages from the original database in the system construction phase.
All variables and their units or modes of representation, mean, standard deviation, and median are shown in Table 1, where survival status appears as a supervisory variable to be predicted by the prognosis system. Table 2 shows the underlying medical and statistical meaning of risk factors proposed as important prognostic variables.

3.2. Censoring data handling

One of the most common problems in survival analysis is the lack of information in the form of missing data values. To properly address censoring data in the modelling process, patients for whom the event did not occur require special treatment. Different methods have been proposed to solve this problem (see review in [23]). The simplest solution is to remove those patient cases with missing values, which would involve the rejection of a large number of them. Another approach is to reject those prognostic factors for which there is no data; however, this approach is very difficult to control, since it could lead the system to make weak predictions if a great number of significant prognostic factors are eliminated. Other authors propose a technique that assigns a distribution of outcomes instead of a single outcome. The distribution would be assessed through the outcome probability

Fig. 1. The prognosis system based on neural networks and decision trees.

estimate based on the Kaplan-Meier method using weighted examples to implement the schema [31,34], but, as mentioned before, prognostic factors in breast cancer with adjuvant therapy after surgery are time dependent. That is, the strength of the prognostic factor is not the same for the first 10 months as for, for example, the [...] months interval. Techniques for survival estimation, such as Kaplan-Meier analysis [15] and Cox regression modelling [6], assume that the strength of a prognostic factor does not change over time, although this is not the case in the real world. On the other hand, the recurrence probability is not the same over time, since the existence of a peak of recurrence in the distribution of relapse probability has been demonstrated empirically [2]. Some authors [16,31] mention that trivial solutions to the problem, such as removing the censored data from the dataset or considering them as examples where the event will not occur, would bias the modelling. However, good results were achieved in [22] by using only complete data cases with no missing data values. In this work, we reject patient cases containing missing data values for each interval of follow-up time through a classification rule analysed below.

Table 1
Summary of patient data: range, mean, S.D. and median
Columns: Prognostic variables (mnemonic); Range; Mean; S.D.; Median
Age (Ag)
Menarchy age (Ma)
Menopause age (Mg)
First pregnancy age (Fp)
No. of miscarriages (Mn)
No. of axillary lymph nodes (An)
Grade (Gr): 1, 2, NA
Tumour size (Ts)
No. of pregnancies (Pn)
Estrogen receptors (Er): 1, NA
Progesteron receptors (Pr): 1, NA
P53: 1, NA
Ploidy (Pl): 1, 2, NA
S-Phase (Ps)
Supervisory variable
Survival status: 0 (non-relapse), 1 (relapse), NA
Data subsets corresponding to each time interval were selected from the original 1035 patients from the Oncology Service database and classified into relapse and non-relapse classes for each time interval I_i. This classification process was performed according to the survival status and time interval variables of each patient record. Let C_ij be the class j of the interval i, where j = 1 identifies the class relapse and j = 2 the class non-relapse. Then, for the interval I_i, the patients selected for classes C_i1 and C_i2 are chosen according to the following rules:

(a) C_i1: patients with time interval = i and survival status = relapse.
(b) C_i2: patients with time interval = j (j < i) and survival status = relapse, and all the patients with time interval = k (k > i).
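Rules (a) and (b) can be read as a small filter over the patient records. The following is a minimal sketch of that reading, not the authors' implementation: the record layout (`time_interval`, `status`) and the function names are hypothetical, and patients matching neither rule are assumed to be left out of the subset for interval I_i.

```python
from typing import Iterable, List, Optional, Tuple

RELAPSE = "relapse"
NON_RELAPSE = "non-relapse"


def assign_class(time_interval: int, status: str, i: int) -> Optional[int]:
    """Return 1 (class C_i1), 2 (class C_i2) or None (not selected) for interval I_i.

    Follows rules (a) and (b) exactly as stated in the text.
    """
    # Rule (a): the event was recorded in interval i and it was a relapse.
    if time_interval == i and status == RELAPSE:
        return 1
    # Rule (b), first part: relapse recorded in an earlier interval j < i.
    if time_interval < i and status == RELAPSE:
        return 2
    # Rule (b), second part: any patient recorded in a later interval k > i.
    if time_interval > i:
        return 2
    # Assumption: every other case is excluded from the subset for I_i.
    return None


def build_interval_subset(patients: Iterable[dict], i: int) -> List[Tuple[dict, int]]:
    """Collect (patient, class) pairs for interval I_i, dropping unselected patients."""
    subset = []
    for patient in patients:
        label = assign_class(patient["time_interval"], patient["status"], i)
        if label is not None:
            subset.append((patient, label))
    return subset
```

Under this reading, a patient whose follow-up ends in interval i without relapse falls under neither rule and is dropped for that interval.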

Table 2
Medical meaning of risk prognostic factors proposed as important prognostic variables

Age: A woman's risk of developing breast cancer increases with age. About 77% of women with breast cancer are over age 50 at the time of diagnosis. Women younger than 30 years account for only 0.3% of breast cancer cases. Women in their thirties account for about 3.5% of cases.

Menarchy age and menopause age: Women who started menstruating at an early age (before age 12) or who went through menopause at a late age (after age 50) have a slightly higher risk of breast cancer.

First pregnancy age: Women who delay their first pregnancy into their thirties have almost a doubled risk of breast cancer compared to those who have babies in their late teens or early twenties.

No. of miscarriages: Miscarriages (spontaneous abortions) do not seem to increase the risk of breast cancer, and many of the studies concerning induced or spontaneous pregnancy losses and breast cancer are controversial.

No. of axillary nodes: When breast cancer cells reach the axillary lymph nodes, they can continue to grow, often causing swelling of the lymph nodes in the underarm area. If breast cancer cells have grown in the axillary lymph nodes, they are more likely to have spread to other organs of the body as well. This is why finding out whether breast cancer has spread to axillary lymph nodes is important in selecting the best mode of treatment and predicting the patient outcome.

Grade: Histologic tumour grade is based on the arrangement of the cells in relation to each other, as well as features of individual cells. The grade helps predict the patient's prognosis because cancers that closely resemble normal breast tissue tend to grow and spread more slowly. In general, a lower grade number indicates a slower-growing cancer while a higher number indicates a faster-growing cancer.

Tumour size: Tumour size is one of the most important prognostic variables and is related to the breast cancer stage. Stage I: the tumour is 2.0 cm or less. Stage II: the tumour size is between 2.0 and 5 cm. Stage III: the tumour is larger than 5 cm.

No. of pregnancies: Women who have had no children or who had their first child after age 30 have a slightly higher breast cancer risk.

Estrogen and progesteron receptors: Receptors are molecules that are a part of cells. They recognise certain substances such as hormones that circulate in the blood. Normal breast cells and some breast cancer cells have receptors that recognise estrogen and progesterone. Breast cancers that contain estrogen and progesterone receptors tend to have a better prognosis than cancers without these receptors.

p53: Tests to identify other acquired changes in oncogenes or tumour suppressor genes (such as p53) may help doctors more accurately predict the prognosis of some women with breast cancer.

Ploidy: The ploidy of cancer cells refers to the amount of DNA they contain. If there is a normal amount of DNA, the cells are said to be diploid. If the amount is abnormal, then the cells are described as aneuploid. Some studies have found that aneuploid breast cancers tend to be more aggressive.

S-phase: The S-phase test counts the percentage of tumour cells that are making copies of their DNA, and thus provides an estimate of the speed of tumour growth. A high S-phase level would indicate that the tumour is aggressive. Tumours that have normal DNA ploidy levels and are slow growing (low S-phase) indicate a better patient prognosis than a tumour with abnormal DNA ploidy results and a high S-phase.

3.3. Selection of prognostic factors

In order to select the most important prognostic factors (from those pre-selected by medical experts as being significant risk factors) for predicting overall survival, several methods have been studied. First, consulting clinicians about their importance is the simplest way, but this would introduce a significant bias in the selected attributes set. Another approach trains neural networks with different sets of input data in order to select the most significant attributes, but the implementation of this method has a high computational cost. In this sense, symbolic induction techniques can help us to understand the underlying relationships in breast cancer data with low computational cost. Decision trees appear to be appropriate methods for these types of problems, because if some parameters can be shown not to be significant in the decision process, then their rejection can be recommended, which would simplify the whole system. In our work, which involves the use of neural networks, the rejection of input parameters diminishes the size of the final network architecture.

Different algorithms, such as ID3 [25-27] and C5 (an updated version of C4.5 [28]), were tested in our research, but too many attributes were obtained as significant prognostic factors, which would excessively complicate the architecture of the final neural network system. An appropriate prognostic factors selection method is thus necessary. Therefore, a new method called control of induction by sample division method (CIDIM) [29,32] has been developed to perform adaptive pruning with predictive control, significantly reducing the number of rules and improving the selection of the attributes that would better explain the patient dataset. By using CIDIM, trees smaller than those obtained with other algorithms are generated.
This allows the selection of the most important attributes as the neural networks system input. The main features of the CIDIM algorithm are as follows:

(1) The top down induction decision tree (TDIDT) algorithms [5,19] generally split the experiences set into two sets: the training set and the test set (two-thirds and one-third of the dataset, respectively). The CIDIM algorithm divides the training set into two subsets of identical size: the construction subset (called CNS) and the control subset (called CLS). For every new node of the tree, its expansion is decided by using the CNS and CLS subsets, based on the predictive capacity of the expansion in regard to the CLS set. The final tree is not the best classifier, but it has far fewer rules and generally is as good a predictor as the tree obtained with classical TDIDT algorithms (sometimes better).

(2) An internal bound condition is defined. Usually, the expansion of the tree finishes when all experiences associated with a node belong to the same class, yielding trees that are too large. In order to avoid this overfitting, external conditions are considered by different algorithms (C5 demands that at least two branches have at least two experiences). The CIDIM algorithm uses the following as an internal condition: if the prediction is not improved then the node is not expanded, making the expansion process dependent on the CNS and CLS subsets.

Tree expansion supervision is driven by two indexes: the absolute index I_A and the relative index I_R (see expressions (1) and (2)). For every algorithmic step, a node is

expanded only if these indexes are increased. The absolute and relative indexes are defined as

$$I_A = \frac{\sum_{i=1}^{N} \mathrm{CORRECT}(e_i)}{N} \qquad (1)$$

$$I_R = \frac{\sum_{i=1}^{N} P_{C(e_i)}(e_i)}{N} \qquad (2)$$

where N is the number of experiences, e a single experience, C(e) the class of the experience e, P_m(e) the probability of class m for the experience e, and CORRECT(e) = 1 if P_{C(e)}(e) = max{P_1(e), P_2(e), ..., P_k(e)}, or 0 otherwise.

The CIDIM algorithm is presented next.

1. The CNS and CLS subsets are obtained by a random division of the experiences set used to construct the tree.
2. For each non-leaf node do:
2.1. Splitting (as standard TDIDT) by a disorder measure (for example the entropy measure).
2.2. If splitting does not improve the prediction (according to I_A and I_R), then the node is a leaf node (even when all the experiences do not belong to the same class).
2.3. If splitting improves the prediction then the node is expanded.

Next, to show the goodness of the CIDIM algorithm, we present some experimental results from a previous work [29]. In order to compare the CIDIM method with the ID3 and C5 algorithms, three standard experiences sets have been used. These sets are ionosphere, pima-diabetes and wdbc, which can be obtained from MLRepository [12]. Table 3 presents a brief summary of their characteristics. Each numerical attribute has been divided into several intervals of similar size according to the range of values. Experiences with unknown values have been omitted. We used tenfold cross-validation in order to avoid bias in the results. The pruning CF parameter was set to four different values for the C5 algorithm. The success index (SI) and the average number of rules for every dataset are shown in Table 4. This table shows how CIDIM always generates fewer rules than the other learning algorithms under comparison, with a similar success index [29].
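The two indexes of expressions (1) and (2) and the expansion test of step 2 can be sketched as follows. This is an illustrative reading rather than the published CIDIM code: it assumes that each experience carries a list of class probabilities indexed by class, that both indexes are evaluated on the control subset (CLS), and that "increased" means both indexes grow strictly; the function names are hypothetical.

```python
from typing import List, Sequence


def absolute_index(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """I_A: fraction of experiences whose true class has the maximal probability."""
    correct = sum(1 for p, c in zip(probs, labels) if p[c] == max(p))
    return correct / len(labels)


def relative_index(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """I_R: mean probability assigned to the true class of each experience."""
    return sum(p[c] for p, c in zip(probs, labels)) / len(labels)


def should_expand(probs_before: Sequence[Sequence[float]],
                  probs_after: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> bool:
    """Expand a node only if the split raises both indexes (assumed: on the CLS subset)."""
    ia_up = absolute_index(probs_after, labels) > absolute_index(probs_before, labels)
    ir_up = relative_index(probs_after, labels) > relative_index(probs_before, labels)
    return ia_up and ir_up


# Three control experiences with true classes 0, 0 and 1.
labels: List[int] = [0, 0, 1]
before = [[0.6, 0.4], [0.6, 0.4], [0.6, 0.4]]   # predictions at the unexpanded node
after = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]    # predictions after a candidate split
expand = should_expand(before, after, labels)
```

Here the candidate split raises I_A from 2/3 to 1 and I_R from about 0.53 to about 0.83, so the node would be expanded under the stated assumptions.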
Table 3
Characteristics of standard sets
Name: Ionosphere; Pima-diabetes; WDBC
Cardinal: [...]
Attributes: [...]
Types: Symbolic; Numerical; Numerical
Classes: [...]
Subject: Ionosphere; Diabetes; Cancer

Table 4
Comparative results of CIDIM and other common TDIDT algorithms
Columns (for each of Ionosphere, Pima-diabetes and WDBC): SI; No. of rules averages
Rows: ID3; C5 0%; C5 10%; C5 20%; C5 30%; CIDIM
Values: [...]

On the other hand, the standard multivariate analysis methods, in spite of known disadvantages, are still in use for the variable selection problem. Some of them are standard stepwise regression procedures (forward selection, backward elimination, MINR, MAXR forward selection). In Section 4, the MAXR variable selection procedure is used for selecting the most important prognostic factors for predicting the overall survival and the results are compared with those found by the CIDIM algorithm proposed in this section.

3.4. Approximating Bayes decision rule

The problem presented is: given a patient, will she suffer a post-surgical relapse at any period during her follow-up time? We need a decision rule to solve it. That is, classifying an observation x as belonging to one of two populations is desired, such that if x belongs to the ith population, x occurs according to the density function p(x/C_i). When maximising the correct classification probability is desired, the minimum probability of error decision rule (Bayes rule [8]) is defined by the function

$$f_i(x) = \begin{cases} 1 & \text{if } p(C_i/x) \ge p(C_k/x) \quad \forall k \\ 0 & \text{otherwise} \end{cases}$$

where p(C_i/x) is the a posteriori density function and f_i is the probability of classifying the pattern x in class C_i. Next, we determine the correct classification probability when the Bayes rule is used for this problem. Let p(C_i) be the a priori probability of class C_i, where i = 1 identifies the class relapse and i = 2 the class non-relapse, and let p_ii be the conditional probability of correctly classifying a pattern of C_i in C_i. Therefore, the correct classification probability is given by

$$
\begin{aligned}
p &= p(C_1)\,p_{11} + p(C_2)\,p_{22} = p(C_1)\int_{R^N} f_1(x)\,p(x/C_1)\,dx + p(C_2)\int_{R^N} f_2(x)\,p(x/C_2)\,dx \\
&= p(C_1)\int_{A} p(x/C_1)\,dx + p(C_2)\int_{\bar{A}} p(x/C_2)\,dx, \qquad A = \{x : p(C_1/x) \ge p(C_2/x)\} \\
&= p(C_1)\int_{R^N} p(x/C_1)\,dx + p(C_2)\int_{\bar{A}} p(x/C_2)\,dx - p(C_1)\int_{\bar{A}} p(x/C_1)\,dx \\
&= p(C_1) + \int_{\bar{A}} \left[ p(C_2/x) - p(C_1/x) \right] p(x)\,dx = p(C_1) + \int_{\bar{A}} \left[ 1 - 2p(C_1/x) \right] p(x)\,dx \\
&= p(C_1) + \int_{\bar{A}} \left| 1 - 2p(C_1/x) \right| p(x)\,dx \qquad (3)
\end{aligned}
$$

since $p(C_1/x) \le \tfrac{1}{2}$, $\forall x \in \bar{A}$, where $\bar{A}$ denotes the complement of A. In an analogous way we have

$$p = p(C_2) + \int_{A} \left[ 2p(C_1/x) - 1 \right] p(x)\,dx = p(C_2) + \int_{A} \left| 2p(C_1/x) - 1 \right| p(x)\,dx \qquad (4)$$

From expressions (3) and (4) we obtain $p \ge \max\{p(C_1), p(C_2)\}$ and

$$p = \frac{1}{2} + \int_{R^N} \left| p(C_1/x) - \frac{1}{2} \right| p(x)\,dx \qquad (5)$$

Note that expression (3) provides an explicit expression for the correct classification probability in terms of the probability of assigning a pattern x to category C_1. The a posteriori density function p(C_i/x) is unknown, so we have to estimate it to obtain an approximate correct classification probability. Funahashi [9] proves theoretically that three-layer neural networks with at least 2n hidden units have the capability of approximating the a posteriori probability in the two-category classification problem with arbitrary accuracy, and that the network output tends to the a posteriori probability as back-propagation learning proceeds ideally. Thus, we have

$$F(x, t, w) \cong p(C_1/x) \qquad (6)$$

where F(x, t, w) is the network output for an input pattern x, and t and w are the synaptic weight matrices. Hence, the approximate Bayes decision rule is given by the expression

$$\tilde{f}(x) = \begin{cases} 1 & \text{if } F(x, t, w) \ge \tfrac{1}{2} \\ 0 & \text{if } F(x, t, w) < \tfrac{1}{2} \end{cases} \qquad (7)$$

which gives the probability of classifying the pattern x in class C_1. If $\tilde{f}(x) = 1$, pattern x is classified in class C_1 and if $\tilde{f}(x) = 0$, it is classified in C_2. From expressions (5) and (6), we obtain an estimate $\hat{p}$ of the correct classification probability given by the expression

$$\hat{p} = \frac{1}{2} + \frac{1}{n} \sum_{i=1}^{n} \left| F(x_i, t, w) - \frac{1}{2} \right| \qquad (8)$$

where n is the number of patients. Note that p is an upper bound for the probability of making a correct classification with any given decision rule, since p has been determined by Bayes rule. Thus, we estimate p using the neural network paradigm, that is, by the outputs of a multi-layer neural network F(x, t, w). Note that the variance of the estimate $\hat{p}$ is $\mathrm{var}(\hat{p}) \le 1/n$, and so we have an accurate estimate. Note that $\hat{p}$ can be used to check the degree of difficulty in a classification problem. The probabilities p_11 and p_22 can also be estimated with the multi-layer neural network as

$$\hat{p}_{11} = \frac{1}{m} \sum_{\{x \in C_1 :\, F(x,t,w) \ge 1/2\}} F(x, t, w)$$

$$\hat{p}_{22} = \frac{1}{n - m} \sum_{\{x \in C_2 :\, F(x,t,w) \le 1/2\}} \left( 1 - F(x, t, w) \right)$$

where m is the number of patients that suffer a relapse.

3.5. The prognosis system

Taking into account: (1) the importance of the evolution of the prognostic factors' strength over time, and the existence of a peak of recurrence in the relapse distribution; (2) the CIDIM algorithm analysed in Section 3.3 to select the most significant prognostic factors; and (3) the justification of the proposed decision rule in expression (7), a solution scheme is proposed based on specific topologies of neural networks combined with decision trees for different time intervals during the follow-up time of the patients, and a threshold unit to implement the decision-making process (Fig. 1).

3.5.1. Decision trees

The decision trees unit leads to the selection of the most significant prognostic factors from the patients database for every time interval. These subsets of prognostic factors constitute the kernel of the prognostic factors selector (PFS in Fig. 1).
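The threshold unit of expression (7) and the estimates built from the network outputs, expression (8) together with the estimates of p_11 and p_22, reduce to a few sums once the outputs F(x_i, t, w) are available. The sketch below assumes those outputs are given as plain floats in [0, 1] and that labels use 1 for relapse (C_1) and 0 for non-relapse (C_2). The function names are hypothetical, and the inequality used for the C_2 sum (output at most 1/2) is an assumption.

```python
from typing import Sequence, Tuple


def approx_bayes_rule(output: float) -> int:
    """Expression (7): 1 (class C_1, relapse) if F >= 1/2, otherwise 0 (class C_2)."""
    return 1 if output >= 0.5 else 0


def estimate_correct_probability(outputs: Sequence[float]) -> float:
    """Expression (8): p_hat = 1/2 + (1/n) * sum |F(x_i) - 1/2|."""
    n = len(outputs)
    return 0.5 + sum(abs(f - 0.5) for f in outputs) / n


def estimate_class_probabilities(outputs: Sequence[float],
                                 labels: Sequence[int]) -> Tuple[float, float]:
    """Estimates of p_11 and p_22 from network outputs and true classes."""
    n = len(labels)
    m = sum(1 for y in labels if y == 1)  # number of relapse patients
    # Relapse patterns that the threshold unit assigns to C_1.
    p11 = sum(f for f, y in zip(outputs, labels) if y == 1 and f >= 0.5) / m
    # Non-relapse patterns assigned to C_2 (assumed condition: F <= 1/2).
    p22 = sum(1.0 - f for f, y in zip(outputs, labels) if y == 0 and f <= 0.5) / (n - m)
    return p11, p22


# Hypothetical network outputs for four patients (two relapse, two non-relapse).
outputs = [0.9, 0.6, 0.2, 0.1]
labels = [1, 1, 0, 0]
p_hat = estimate_correct_probability(outputs)
p11_hat, p22_hat = estimate_class_probabilities(outputs, labels)
decisions = [approx_bayes_rule(f) for f in outputs]
```

Outputs close to 0 or 1 push the estimate towards 1, while outputs clustered around 1/2 pull it towards 1/2, which is why the estimate can serve as a measure of how hard the classification problem is.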
Given a new patient for whom predictions have to be made, and the corresponding time interval under study, the PFS extracts the appropriate input subset of prognostic factors for the neural networks system, in order to obtain good prediction accuracy of the correct classification probability of patient relapse after breast cancer surgery.

3.5.2. The neural networks system

The neural networks system processes the attribute set supplied by the prognostic factors selector and gives a value corresponding to the a posteriori probability of relapse for the patient under study. The main common characteristics of the networks employed are shown in Table 5. Input layers, corresponding to every neural network selected for the different time intervals under study, have as many elements as the number of selected attributes that appear in Table 6, column #2. The middle or hidden layers have 14, 19, 14, 15, 17, 13, and 10 elements, respectively, with logistic transfer functions. These numbers of elements were determined using a cascade learning constructive process, adding neurons to the hidden layer one at a time until there is no further improvement in network performance. The output

layers have one logistic element corresponding to the single dependent variable. The output elements predict the relapse probability by means of their numerical output (ranging from 0 to 1). Connection weights are changed using a Levenberg-Marquardt error back-propagation algorithm [21] and the learning constant was set to [...]. Weight initialisation is crucial in the learning process with artificial neural networks. In order to obtain a realistic estimate of the correct classification probability, 30 weight initialisations were carried out and the average and standard deviation of the runs are presented.

Table 5
Common characteristics of the ANN models used
Network topology: Multilayer perceptron, full connectivity
Learning algorithm: Levenberg-Marquardt
Learning rule: Generalised delta rule
Input data: Attributes thought to be risk factors
Output data: Relapse probability

Giving the information to the neural network input layer requires a pre-processing step. First, it is important to normalise the ranges of all the prognostic factors to lie within the central range of the hidden layer transfer function in the neural network (between -1.0 and 1.0 for the hyperbolic tangent transfer function), and second, to study the range and distribution of each prognostic variable to remove all the missing values, and to lessen the impact of outliers at the extremes of the distribution. A crucial aspect of carrying out learning and prediction analysis with a neural network system is to split the database into two independent sets: the training set (80% of the dataset), which is used to train the neural network, and the test set (20% of the dataset), which is used to validate its predictive performance.
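The two pre-processing steps described above, rescaling each prognostic factor and splitting the data 80/20, can be illustrated with a short sketch. The text does not specify the scaling formula or how the split was drawn, so linear min-max scaling and a seeded random shuffle are assumptions here, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def scale_to_range(values: Sequence[float], lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Linearly map one prognostic factor onto [lo, hi] (min-max scaling, assumed)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant column carries no information; place it at the centre of the range.
        return [0.5 * (lo + hi)] * len(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def train_test_split(rows: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle the rows and split them into training (80%) and test (20%) sets."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# Hypothetical ages for three patients and a toy dataset of ten rows.
scaled_ages = scale_to_range([30.0, 50.0, 70.0])
train_rows, test_rows = train_test_split(list(range(10)))
```

In practice the scaling bounds would be computed on the training set only and then applied to the test set, so that no information from the test patients leaks into training.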
During training, the data vectors of the training set are repeatedly presented to the network, which attempts to generate a 1 at the output unit when the survival status of the patient is relapse, and a 0 when the status is non-relapse. As the networks were trained, the mean square error between the survival status variable (the supervisory variable) and the dependent output variable decreased with an increasing number of epochs: at first it decreases rapidly, and then it continues to decrease slowly as the network makes its way to a local minimum. With good generalisation as the goal, a network ends up overfitting the training data if the training session is not stopped at the correct point.

The procedure used to avoid overfitting was the early stopping method of training [3], which identifies the onset of overfitting through the hold-out method: the training set is split into an estimation subset (80% of the training set) and a validation subset (20% of the training set). The estimation subset was used to train each network of the system for up to 2000 epochs (the total number of training iterations depended on the number of patients selected for each time interval). The training sessions were stopped periodically, the weight matrices were saved to files, and the networks were tested on the validation subsets after each training period. The early stopping points were found by plotting together the estimation learning curve, which decreased monotonically, and the validation learning curve, which decreased monotonically to a minimum and then started to increase as training continued. This minimum was reached after a different number of epochs for each time interval under study (285, 224, 192, 179, 168, 130, 163).

Table 6. Results of the prognosis system based on neural networks and decision trees
  Time interval   Selected attributes                No. of patients   PCP   BCP      NNCP
  I1 (0-10)       Ag, Ma, Fp, An, Ts, Pn, P53                                (0.01)   (0.01)
  I2 (10-20)      Ag, Ma, Mg, An, Ts, Gr, Er, P53                            (0.02)   (0.02)
  I3 (20-30)      Ag, An, Ts, Gr, Er, P53                                    (0.09)   (0.10)
  I4 (30-40)      Ag, An, Ts, Er, Pr, Ps, P53                                (0.05)   (0.04)
  I5 (40-50)      Ag, An, Ts, Er, Pr, Ps, P53                                (0.01)   (0.01)
  I6 (50-60)      Ag, An, Ts, Er, Gr                                         (0.06)   (0.07)
  I7 (>60)        Ag, An, Ts, Gr                                             (0.02)   (0.02)
Number of patients, selected attributes (prognostic factors), PCP, and averages (with standard deviations in parentheses) of the BCP and NNCP probabilities obtained for each patient follow-up time interval (in months).

The optimally trained neural networks were tested for their ability to predict breast cancer relapse in the test set. To evaluate the proposed model, a standard technique of stratified tenfold cross-validation was used [13]. This technique divides the patient dataset into 10 sets of approximately equal size and with equal distributions of recurrent and non-recurrent patients. Each of the 10 random subsets of the data serves as a test set for the prognostic model trained with the remaining 9 partitions.
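The stratified tenfold partitioning described above can be sketched in Python as follows. This is an illustrative construction only, not the authors' implementation: the round-robin dealing of shuffled indices, the function names `stratified_folds` and `cross_validation_splits`, and the example labels are all assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to n_folds folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so that every fold receives
        # a near-equal share of this class.
        for position, index in enumerate(indices):
            folds[(position + offset) % n_folds].append(index)
        # Start the next class where this one stopped, to balance fold sizes.
        offset = (offset + len(indices)) % n_folds
    return folds


def cross_validation_splits(labels, n_folds=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, n_folds, seed)
    for k, test_indices in enumerate(folds):
        train_indices = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_indices, test_indices


# Hypothetical labels: 1 = relapse, 0 = non-relapse (20 relapses in 100 patients).
labels = [1] * 20 + [0] * 80
splits = list(cross_validation_splits(labels))
```

With these labels every test fold contains 10 patients, 2 of whom relapsed, so each fold mirrors the 20% relapse rate of the whole set.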
The overall prediction accuracy for the system is then assessed as the average over the 10 experiments.

Threshold unit

The threshold unit outputs a class for survival status according to the decision rule proposed in expression (7). To obtain an appropriate classification accuracy, expressed as the percentage of patients in the test set that were classified correctly, a cut-off prediction between 0 and 1 had to be chosen before any output of the network (ranging from 0 to 1) could be interpreted as a prediction of breast cancer relapse.

ROC analysis and Cox regression

For medical applications, classification accuracy is not necessarily the best quality measure of a classifier. Two other measures are more frequently used: sensitivity and specificity. Sensitivity measures the fraction of positive cases that are classified as positive. Specificity measures the fraction of negative cases that are classified as negative. For many medical problems, high classification accuracy is less important than high sensitivity and/or specificity of the classifier system. A receiver operating characteristic (ROC) curve indicates the trade-off that one can achieve between the false alarm rate (1 - specificity, plotted on the X-axis), which needs to be minimised, and the detection rate (sensitivity, plotted on the Y-axis), which needs to be maximised.

Although we mentioned in Section 1 that the application of traditional statistical techniques for survival analysis is not suitable for this problem, we think that a comparison of the proposed model against the Cox statistical technique, currently used by medical experts, is appropriate in order to justify and demonstrate the usefulness and power of the proposed combined model. Cox regression modelling was performed using SPSS statistical software.

4. Results and discussion

Table 6 shows the number of patients and the selection of prognostic factors corresponding to every time interval (in months) of patient follow-up that were selected for training the neural networks system. After processing the patient database through the decision trees system (CIDIM algorithm), certain attributes appear to be the most significant prognostic factors (second column in Table 6) and become the input to the artificial neural networks system. The decision trees system makes the attribute selection process objective, in contrast to the subjective process carried out by experts on clinical data suspected of being risk factors for breast cancer prognosis. The results of applying the MAXR forward selection procedure to the selection of the relevant prognostic factors, in comparison with those found by the proposed CIDIM algorithm, are presented in Table 7. This table shows that the CIDIM algorithm chooses, for each time interval, a larger and more varied set of attributes thought to be significant prognostic factors. This suggests that CIDIM achieves a finer fit of the most important prognostic factors in the selection process.

Table 6 also shows a comparison among the a priori classification probability (PCP) of the dataset, the estimate of the correct classification probability of Bayes (BCP), and the classification probability obtained by the application of the decision rule proposed in (4) (NNCP).
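Sensitivity and specificity, as defined in the ROC analysis subsection, depend on the cut-off applied to the network output. The following Python sketch computes both measures at a given cut-off and then scans a set of equally spaced cut-offs for the one with the fewest misclassifications. The network outputs, the labels, the cut-off grid and the rule "output >= cut-off means relapse" are all illustrative assumptions; they are not the data or the decision rule of the paper.

```python
def confusion_counts(outputs, labels, cutoff):
    """Count (tp, fp, tn, fn), predicting relapse when output >= cutoff."""
    tp = fp = tn = fn = 0
    for output, label in zip(outputs, labels):
        predicted = 1 if output >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(outputs, labels, cutoff):
    """Fraction of relapses detected and fraction of non-relapses detected."""
    tp, fp, tn, fn = confusion_counts(outputs, labels, cutoff)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def best_cutoff(outputs, labels, cutoffs):
    """Return the first cut-off that minimises false positives + false negatives."""
    def errors(cutoff):
        _, fp, _, fn = confusion_counts(outputs, labels, cutoff)
        return fp + fn
    return min(cutoffs, key=errors)


# Invented network outputs and true statuses (1 = relapse, 0 = non-relapse).
outputs = [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

# Ten equally spaced illustrative cut-offs: 0.05, 0.15, ..., 0.95.
cutoffs = [(2 * k + 1) / 20 for k in range(10)]
chosen = best_cutoff(outputs, labels, cutoffs)
sensitivity, specificity = sensitivity_specificity(outputs, labels, chosen)
```

Plotting sensitivity against 1 - specificity for every cut-off in the grid gives the points of the ROC curve; minimising the total error is only one of several possible ways to choose an operating point on it.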
Because no single network output between 0 and 1 served as a perfect cut-off prediction for breast cancer relapse, the accuracy results for NNCP have been complemented with a ROC analysis. No theoretical work defines how the appropriate cut-off prediction for network processing of a test file should be determined. Thus, 10 equally spaced cut-off predictions were examined in the range [...]. The true and false positives and negatives, the sensitivity, specificity, and positive and negative predictive values were calculated for each cut-off prediction, and the point on the ROC curve that minimises the overall error was identified for each time interval of patient follow-up (Table 8). The fractional results in Table 8 are a consequence of the 10 repetitions used for each cross-validation partition.

Table 7. Comparison of the MAXR procedure against the CIDIM algorithm for selecting the most significant prognostic factors for each time interval
  Time interval   MAXR forward selection procedure   CIDIM algorithm
  I1 (0-10)       Ag, An, Ts                         Ag, Ma, Fp, An, Ts, Pn, P53
  I2 (10-20)      Ag, An, Ts, Gr                     Ag, Ma, Mg, An, Ts, Gr, Er, P53
  I3 (20-30)      Ag, An, Ts, Gr                     Ag, An, Ts, Gr, Er, P53
  I4 (30-40)      Ag, An, Ts                         Ag, An, Ts, Er, Pr, Ps, P53
  I5 (40-50)      Ag, An, Ts                         Ag, An, Ts, Er, Pr, Ps, P53
  I6 (50-60)      Ag, An, Ts, Gr                     Ag, An, Ts, Er, Gr
  I7 (>60)        Ag, An, Ts                         Ag, An, Ts, Gr

Table 8. Test results by ROC analysis for the prognosis model based on neural networks and decision trees
  Time interval   False negative   False positive   Positive predictive value   Negative predictive value   Sensitivity   Specificity
  I1 (0-10)
  I2 (10-20)
  I3 (20-30)
  I4 (30-40)
  I5 (40-50)
  I6 (50-60)
  I7 (>60)

To provide a better reference for the fitness of the proposed system, the PCP, BCP, and NNCP indexes have been plotted together in Fig. 2, the analysis of which yields some important results. First, the proposed system (NNCP) always improves on the a priori probability (PCP). It is important to point out how difficult this is, given the high values of PCP in every time interval. Moreover, this improvement is greater in the most critical interval of patient follow-up (I2 in Table 6) [2]. Second, NNCP is always smaller than BCP and follows the shape of BCP, as expected. In addition, the difference between the two is not significant, which means that the rule proposed in expression (5) is a good estimator of the Bayes decision rule.

Fig. 2. The estimate of the correct classification probability of Bayes (BCP), the correct classification probability obtained with the proposed neural networks system (NNCP) and the a priori correct classification probability (PCP) for each time interval under study.

Finally, the predictive ability of the neural network system (Fig. 1, prognosis phase) was compared to that of the Cox model. Using Cox's model for prediction, the probability of recurrence in patients was estimated within seven different time intervals from the surgical intervention (Table 9).

Table 9. Comparison of classification accuracy for each time interval between the Cox regression model and the prognosis model based on neural networks and decision trees
  Prognosis model   I1 (0-10)   I2 (10-20)   I3 (20-30)   I4 (30-40)   I5 (40-50)   I6 (50-60)   I7 (>60)
  NNCP
  Cox model

5. Conclusions

This paper presents a decision-support tool for the prognosis of breast cancer relapse using clinical pathological data. We propose a model that combines a novel TDIDT algorithm (CIDIM) with a system composed of different neural network topologies to approximate the Bayes optimal error for the prediction of patient relapse after breast cancer surgery. The CIDIM algorithm selects the most relevant prognostic factors for the accurate prognosis of breast cancer, while the neural networks system takes these selected variables as input in order to reach a good correct classification probability. We also present a new method for estimating the Bayes optimal error using the neural network paradigm, and a new methodology to process censored data when the time of patient follow-up is discretised into different time intervals. The proposed method is useful for the medical expert mainly under the following circumstances: (1) when the data contain a large number of attributes with missing values; (2) when not only prediction accuracy, but also additional knowledge about the more significant prognostic factors for each time interval is required; and (3) when the significance of the prognostic factors is not the same over the time of patient follow-up, so that the use of survival estimation techniques is not advisable.
Currently, our research group is working on improving the correct classification probability by different means: (1) introducing a methodology based on genetic algorithms for the automatic induction of appropriate neural network topologies; (2) constructing modular neural network architectures and analysing their generalisation properties; and (3) developing a fuzzy neural network system. Certain attributes (for example, grade, ploidy, estrogen receptors) have been converted into discrete values, although their conceptual vagueness could be quantified by the degree of membership of a numerical value in a fuzzy set. Their values would then form a user-defined finite set of linguistic values, and a fuzzy neural network system would be necessary to work with these special attributes.

Based on the results achieved in this work, we hope that clinicians will be able to use artificial neural networks combined with decision trees to search through large datasets for subtle patterns in prognostic factors, which may further assist the selection of appropriate adjuvant treatments for the individual patient.

Acknowledgements

We would like to thank the referees for their valuable comments and suggestions, and also the Oncology Service staff of the Hospital Clínico Universitario of Málaga for their comments and collaboration in this work. This work has been partially supported by the FRESCO project, number PB C04-01, of CICYT Spain.

References

[1] Abbass HA, Towsey M, Finn G. C-Net: a method for generating non-deterministic and dynamic multivariate decision trees. Know Inform Syst 2001;5(2):
[2] Alba E, et al. Estructura del patron de recurrencia en el cancer de mama operable (CMO) tras el tratamiento primario. Implicaciones acerca del conocimiento de la historia natural de la enfermedad. In: Proceedings of the 7th Congreso de la Sociedad Española de Oncología Médica, Barcelona, Spain,
[3] Amari S, Murata N, Muller KR, Finke M, Yang H. Statistical theory of overtraining: is cross-validation asymptotically effective? Adv Neural Inform Process Syst 1996;8:
[4] Baxt WG. Application of neural networks to clinical medicine. Lancet 1995;346:
[5] Buntine W, Nibblett T. A further comparison of splitting rules for decision-tree induction. Mach Learn 1992;8:
[6] Cox DR. Regression models and life tables. J R Stat Soc 1972;34:
[7] D'Alche-Buc F, Zwierski D, Nadal J. Trio learning: a new strategy for building hybrid neural trees. Neural Syst 1994;5(4):
[8] Duda RO, Hart PE. Pattern classification and scene analysis. New York: Wiley;
[9] Funahashi K. Multilayer neural networks and Bayes decision theory. Neural Networks 1998;11:
[10] Gorman RP, Sejnowski TJ. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks 1988;1:
[11] Grumett S, Snow P. Artificial neural networks: a new model for assessing prognostic factors. Ann Oncol 2000;11:
[12]
[13] Janssen P, et al. Model structure selection for multivariable systems by cross-validation. Int J Control 1988;47:
[14] Jefferson M, Pendleton N, Lucas B, Horan M. Comparison of a genetic algorithm neural network with logistic regression for predicting outcome after surgery for patients with nonsmall cell lung carcinoma. Am. Cancer Soc. (Atlanta)
[15] Kaplan SA, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:
[16] Kattan MW, Hess KR, Beck JR. Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression. Comput Biomed Res 1998;31(5):
[17] Lucas PJF, Abu-Hanna A. Prognostic methods in medicine. Artif Intell Med 1999;15(2): (editorial).
[18] McGuire WL, Tandom AT, Allred DC, Chamnes GC, Clark GM. How to use prognostic factors in axillary node-negative breast cancer patients. J Natl Cancer Inst 1990;82:
[19] Michalski R, Carbonell JG, Mitchell TM. Machine learning, an artificial intelligence approach. Palo Alto: Tioga Press;
[20] O'Neill M. Training back-propagation neural networks to define and detect DNA-binding sites. Nucl Acids Res 1991;19:
[21] Patterson DW. Artificial neural networks, theory and applications. Singapore: Prentice Hall;
[22] Pesonen E, Eskelinen M, Juhola M. Comparison of different neural networks algorithms in the diagnosis of acute appendicitis. Int J Biomed Comput 1996;40:


More information

MISSING DATA ESTIMATION FOR CANCER DIAGNOSIS SUPPORT

MISSING DATA ESTIMATION FOR CANCER DIAGNOSIS SUPPORT MISSING DATA ESTIMATION FOR CANCER DIAGNOSIS SUPPORT Witold Jacak (a), Karin Proell (b) (a) Department of Software Engineering Upper Austria University of Applied Sciences Hagenberg, Softwarepark 11, Austria

More information

Classification of benign and malignant masses in breast mammograms

Classification of benign and malignant masses in breast mammograms Classification of benign and malignant masses in breast mammograms A. Šerifović-Trbalić*, A. Trbalić**, D. Demirović*, N. Prljača* and P.C. Cattin*** * Faculty of Electrical Engineering, University of

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference Lecture Outline Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Statistical Inference Role of Statistical Inference Hierarchy of Experimental

More information

Ethnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer. Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD,

Ethnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer. Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD, Ethnic Disparities in the Treatment of Stage I Non-small Cell Lung Cancer Juan P. Wisnivesky, MD, MPH, Thomas McGinn, MD, MPH, Claudia Henschke, PhD, MD, Paul Hebert, PhD, Michael C. Iannuzzi, MD, and

More information

Fuzzy Decision Tree FID

Fuzzy Decision Tree FID Fuzzy Decision Tree FID Cezary Z. Janikow Krzysztof Kawa Math & Computer Science Department Math & Computer Science Department University of Missouri St. Louis University of Missouri St. Louis St. Louis,

More information
