How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection

Esma Nur Cinicioglu* and Gülseren Büyükuğur

Istanbul University, School of Business, Quantitative Methods Division, Avcilar, 34322, Istanbul, Turkey esmanurc@istanbul.edu.tr, gulsayici@gmail.com

Abstract. Variable selection in Bayesian networks is necessary to assure the quality of the learned network structure. Cinicioglu & Shenoy (2012) suggested an approach for variable selection in Bayesian networks where a score, S_j, is developed to assess whether each variable should be included in the final Bayesian network. However, with this method the variables without parents or children are punished, which affects the performance of the learned network. To eliminate that drawback, in this paper we develop a new score, NS_j. We measure the performance of this new heuristic in terms of the prediction capacity of the learned network and its lift over marginal, and evaluate its success by comparing it with the results obtained by the previously developed S_j score. For the illustration of the developed heuristic and the comparison of the results, credit score data is used.

Keywords: Bayesian networks, Variable selection, Heuristic.

1 Introduction

The upsurge of popularity of Bayesian networks brings a parallel increase in research on structure learning algorithms for Bayesian networks from data sets. The ability of Bayesian networks to represent the probabilistic relationships between variables is one of the main reasons for the rise in reputation of Bayesian networks as an inference tool. This also generates the major appeal of Bayesian networks for data mining. With the advancement and diversification of structure learning algorithms, more variables may be incorporated into the learning process, bigger data sets may be used for learning, and inferences become faster even in the presence of continuous variables.
The progress achieved on structure learning algorithms for Bayesian networks is encouraging for the increasing use of Bayesian networks as a general decision support system, a data mining tool, and for probabilistic inference. On the other hand, though the quality of a learned network may be evaluated in many different respects, the performance of the learned network very much depends on the selection of the variables to be included in the network. Depending on the purpose of the application, the characteristics of an application may differ and hence the expectations from a Bayesian network's performance may vary. Therefore, to assure ending up with a Bayesian

* Corresponding author.

A. Laurent et al. (Eds.): IPMU 2014, Part I, CCIS 442, pp. , Springer International Publishing Switzerland 2014

network of high quality, variable selection in Bayesian networks should constitute an important dimension of the learning process.

There is considerable literature in statistics on measures like AIC, BIC, Mallows' C_p statistic, etc. that are used for variable selection in statistical models. These measures have been adopted by the machine learning community for evaluating score-based methods for learning Bayesian network models (Scutari, 2010). However, these scores are used as a measure of the relative quality of the learned network and do not assist in the variable selection process. Additionally, as discussed in Cui et al. (2010), traditional methods of stepwise variable selection do not consider the interrelations among variables and may not identify the best subset for model building. Despite the interest in structure learning algorithms and the adaptation of different measures for the evaluation of the resulting Bayesian networks, variable selection in Bayesian networks is a topic which needs further attention from researchers.

Previously, Koller and Sahami (1996) elaborated on the importance of feature selection and stated that the goal should be to eliminate a feature if it gives us little or no additional information. Hruschka et al. (2004) described a Bayesian feature selection approach for classification problems. In their work, first a BN is created from a dataset and then the Markov blanket of the class variable is used for the feature subset selection task. Sun & Shenoy (2007) provided a heuristic method to guide the selection of variables in naïve Bayes models. To achieve that goal, the proposed heuristic relies on correlations and partial correlations among variables. Another heuristic developed for variable selection in Bayesian networks was proposed by Cinicioglu & Shenoy (2012). With this heuristic a score called S_j was developed which helps to determine the variables to be used in the final Bayesian network.
With this heuristic, first an initial Bayesian network is developed with the purpose of learning the conditional probability tables (cpts) of all the variables in the network. The cpts indicate the association of a variable with the other variables in the network. Using the cpt of each variable, its corresponding S_j score is calculated. In their paper, Cinicioglu & Shenoy (2012) illustrate that by applying the proposed heuristic the performance of the learned network in terms of prediction capacity may be improved substantially.

In this paper we first discuss the S_j score, and then identify the problem that, though the S_j score demonstrates a sound performance on prediction capacity, its formula leads to the problem that the variables without parents or children in the network are punished, which in turn affects the overall performance of the heuristic. To eliminate that drawback, in this paper we suggest a modified version of the S_j score, called NS_j. We measure the performance of this new score in terms of the prediction capacity of the learned network and its lift compared to the marginal model, and evaluate its success by comparing it with the results obtained by the previously developed S_j score. For the illustration of the developed heuristic and the comparison of the results, credit score data is used.

The outline of the remainder of the paper is as follows: The next section gives details about the credit data set used for the application of the proposed heuristic. In Section 3 the development of the new heuristic is explained, where the S_j and NS_j scores are discussed in detail in Subsections 3.1 and 3.2 respectively. In Section 4, using both of the variable selection scores S_j and NS_j, different Bayesian networks are created. The performance results of these two heuristics are compared in terms of the prediction capacity and the improvement rates obtained compared to the marginal model.

2 Data Set

The data set used in this study is a free data set, called the German credit data, provided by the UCI Machine Learning Repository. The original form of the data set contains the information of 1000 customers on 20 different attributes, 13 categorical and 7 numerical, giving the information necessary to evaluate a customer's eligibility to get a credit. Before the use of the data set for the application of the proposed heuristics, several changes are made to the original data set. In this research, the German credit data set is transformed into a form where the numerical attributes Duration in month, Credit amount, Installment rate in percentage of disposable income, Present residence since, Age in years, Number of existing credits at this bank and Number of people being liable to provide maintenance for are discretized. The variable Personal status and sex is divided into two categorical variables, Personal status and Sex. In the original data set the categorical variable Purpose contains eleven different states. In this paper some of these states are joined together, like car and used car as car; furniture, radio and domestic appliances as appliances; and retraining and business as business, resulting in seven different states at the end. The final data set used in this study consists of 21 columns and 1000 rows, referring to the number of variables and cases respectively.

3 Development of the Proposed Heuristic

3.1 S_j Score

The heuristic developed by Cinicioglu & Shenoy (2012) is based on the principle that a good prediction capacity of a Bayesian network depends on the choice of variables that have high associations with each other. A marginal variable present in a network will not have any dependencies with the remaining variables in the network and thus won't have any impact on the overall performance of the network.
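The data-set transformations described in Section 2 can be sketched as follows. This is only an illustrative sketch: the column names, bin counts and labels below are hypothetical, since the raw German credit data uses coded attribute values.

```python
import pandas as pd

# Hypothetical miniature of the German credit data; the real data set has
# 1000 rows and coded attribute values, so names and bins here are only
# illustrative.
df = pd.DataFrame({
    "duration_in_month": [6, 12, 24, 48],
    "credit_amount": [1169, 5951, 2096, 7882],
    "personal_status_and_sex": ["male:single", "female:divorced",
                                "male:married", "female:single"],
    "purpose": ["car", "used car", "radio", "business"],
})

# Discretize the numerical attributes into categorical bins.
for col in ["duration_in_month", "credit_amount"]:
    df[col] = pd.cut(df[col], bins=3, labels=["low", "medium", "high"])

# Split "Personal status and sex" into two categorical variables.
parts = df["personal_status_and_sex"].str.split(":", expand=True)
df["sex"], df["personal_status"] = parts[0], parts[1]
df = df.drop(columns="personal_status_and_sex")

# Join some of the Purpose states, e.g. car and used car as car.
purpose_map = {"car": "car", "used car": "car",
               "radio": "appliances", "business": "business"}
df["purpose"] = df["purpose"].map(purpose_map)

print(df["purpose"].tolist())  # ['car', 'car', 'appliances', 'business']
```

The same three steps (discretization, splitting one column into two, merging category states) scale directly to the full 1000-row data set.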
In that instance, the arcs learned using an existing structure learning algorithm show the dependency of a child node on its parent node, hence a proof of association. However, not all variables which do not place themselves as marginals can be incorporated into the final Bayesian network. The idea is to develop an efficient heuristic for variable selection where the Bayesian network created using the selected variables will show a superior prediction performance compared to the random inclusion of variables in the network. Besides, though the presence of an arc shows the dependency relationship between two variables in the network, the degree of association is not measured there and may vary quite considerably among variables. A natural way to examine the association of a variable with the other variables considered for inclusion in the final Bayesian network is to learn an initial Bayesian network structure first and then use the conditional probability tables of each variable as a source of measurement for the degree of association.

Applying the distance measure to the conditional probability table of a variable, the degree of change in the conditional probabilities of a child node depending on the states of its parents may be measured. In that instance, a high average distance obtained indicates that the conditional probability of the variable considered changes a great deal depending on the states of its parents. Thus, a high average distance is an indication of the high association of a child node with its parents. The average distance of each variable may be calculated using the formula given below. Here d represents the average distance of the variable of interest with its parents, p and q stand for the conditional probabilities of this variable for different states of its parents, i stands for the different states of the child node, and n stands for the number of states of the set of parent nodes.

d = \frac{\sum_{(p,q)} \sqrt{\sum_i (p_i - q_i)^2}}{n(n-1)/2}    (1)

However, there may be variables in the network which do not have a high level of association with their parent nodes but do possess a high association with their children. Basing the selection process solely on the average distance of each variable will deteriorate the performance of the network created. Besides, while the average distance obtained from the cpt of a variable shows the degree of association of a child node with its parents, the same average distance also shows the degree of association of a parent node with its child, jointly with the child's other parents. Following this logic, Cinicioglu & Shenoy (2012) developed the S_j score given in Equation (2) below. In this formula the S_j score of a variable j is the sum of the average distance of this variable, d_j, and the average of the average distances of its children. Here ij denotes the child variable i of the variable j and c_j denotes the number of j's children.

S_j = d_j + \frac{1}{c_j} \sum_i d_{ij}    (2)

Consider Table 1 given below. This table is the cpt of the variable Credit amount.
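As a sketch of how the two scores can be computed, assume the cpt is given as a matrix with one row per parent-state configuration and one column per state of the child variable. The function names and the use of an average pairwise Euclidean row distance are our assumptions; the exact distance measure is the one defined in Cinicioglu & Shenoy (2012).

```python
from itertools import combinations
import math

def average_distance(cpt):
    """Average pairwise distance between the rows of a cpt.

    cpt: list of rows, one per parent-state configuration; each row holds
    the conditional probabilities of the child's states.  A variable
    without parents has a single (marginal) row, so its distance is 0.
    """
    n = len(cpt)
    if n < 2:
        return 0.0
    total = sum(math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
                for p, q in combinations(cpt, 2))
    return total / (n * (n - 1) / 2)

def s_score(d_j, child_distances):
    """S_j: own average distance plus the mean of the children's
    average distances (a variable without children contributes none)."""
    if child_distances:
        return d_j + sum(child_distances) / len(child_distances)
    return d_j

# A child whose distribution flips completely with its parent's state
# gets a large distance; an unaffected child gets 0.
print(average_distance([[1.0, 0.0], [0.0, 1.0]]))  # sqrt(2) ≈ 1.414
print(average_distance([[0.5, 0.5], [0.5, 0.5]]))  # 0.0
```

The second call illustrates why marginal-like variables score low: if the child's distribution does not change with its parents' states, all rows coincide and the distance is zero.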
Using the formula given in Equation (1) the average distance of this variable is calculated. Considering Figure 1 given below, we see that Credit Amount possesses three children. Hence, in order to calculate the S_j score of Credit Amount we need to find the average distances of the children, average them, and then add the result to the average distance of Credit Amount. A high S_j score is desired as an indication of high association with other variables. Ideally, according to the heuristic, the variable with the lowest S_j score will be excluded from the analysis and a new BN will be created with the remaining variables. This network will include the new cpts, which will be the basis for the selection of the next variable to be excluded from the network. This process is repeated until the desired number of variables is obtained. This repeated process is the ideal way of applying the heuristic; however, if not automated, it will require a great deal of time. In the following Subsection 3.2, the shortcomings of the S_j score are discussed. As a modification of the S_j score to handle the problems involved with the old variable selection method, a new score called NS_j is suggested.
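The repeated elimination process described above can be sketched as the following loop. learn_bn and score_variables are placeholders for the structure-learning step (WinMine in this paper) and for the score computation; they are stubbed here only so that the loop itself is runnable.

```python
def learn_bn(variables):
    # Placeholder: in practice, learn a BN over `variables` from the data
    # and return the learned cpts.
    return {v: None for v in variables}

def score_variables(bn):
    # Placeholder: compute an S_j (or NS_j) score per variable from the
    # cpts; here a dummy score (the length of the name) stands in.
    return {v: len(v) for v in bn}

def select_variables(variables, target_count):
    """Repeatedly relearn the network and drop the lowest-scoring
    variable until only target_count variables remain."""
    variables = list(variables)
    while len(variables) > target_count:
        bn = learn_bn(variables)              # relearn after each removal
        scores = score_variables(bn)
        worst = min(scores, key=scores.get)   # lowest score gets excluded
        variables.remove(worst)
    return variables

print(select_variables(["aa", "bbb", "c", "dddd"], 2))  # ['bbb', 'dddd']
```

The key point of the loop is that the network is relearned after every removal, so each elimination decision is based on fresh cpts rather than on the scores of the original network.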

Table 1. Cpt of the variable Credit Amount, conditional on its parent Telephone (states None and Yes)

Fig. 1. Variable Credit Amount with its three children and the calculation of S_Credit Amount

3.2 A New Variable Selection Score: NS_j

The heuristic developed by Cinicioglu & Shenoy (2012) tries to identify the variables which possess a high level of association with their parents and children. With that purpose, the variable selection score developed, S_j, is comprised of two parts: S_j is the sum of the average distance of the variable of interest and the average of the average distances of its children. This way, with the S_j score the variable is evaluated by considering both the association with its parents and also with its children. However, this approach also has the drawback that the variables without parents or children are penalized for inclusion in the final Bayesian network. Consider the formula of the S_j score given in Equation (2). A variable without parents will only have a marginal probability distribution, not a cpt, and thus its average distance will be considered as zero. Similarly, for a variable which does not have any children, the S_j score will be equal to its average distance. The resulting S_j scores for a variable without parents and for a variable without children are given in Equations (3) and (4) respectively.

For a variable j without parents: S_j = \frac{1}{c_j} \sum_i d_{ij}    (3)

For a variable j without children: S_j = d_j    (4)

As illustrated above, because of the formulation of the S_j score, variables which do not possess parents or children will be punished in the variable selection process. If such a variable which lacks parents or children has a strong association with the present part

(its parents or children, depending on the case), then this selection process may cause the creation of networks with lower performance. To overcome this problem, in this research a modified version of the S_j score, NS_j, is presented. For variables which lack either parents or children the score will remain the same as the old one. For variables which possess both parents and children, on the other hand, NS_j will be equal to half of the old S_j score. These two cases are formulated in Equations (5) and (6) given below.

For a variable j without parents or without children: NS_j = S_j    (5)

For a variable j with both parents and children: NS_j = S_j / 2    (6)

The variables which don't have any parents or children will be eliminated from the network. In the following section both of these heuristics will be used to learn BNs from the credit data set introduced in Section 2, and their performance will be evaluated in terms of the prediction capacity and the improvement obtained compared to the marginal model.

4 Evaluation of the Proposed Heuristic

In this section the performance of the variable selection scores S_j and NS_j is compared. The evaluation is made in terms of the prediction capacity and improvement of the BNs created using the suggested scores. For the application of the heuristic, first, it is necessary to learn an initial BN from the data set. For the illustration and evaluation of the suggested scores, the credit data set given in Section 2 will be used. For learning BNs from the data set, WinMine, software developed by Microsoft Research (Heckerman et al., 2000), is used. The main advantage of WinMine is its ability to automatically calculate log-scores and lifts over marginal of the learned BNs. The log-score is a quantitative criterion to compare the quality and performance of the learned BNs. The formula for the log-score is given below,

\text{log-score} = \frac{\sum_{j=1}^{N} \log_2 p(x_j \mid \text{model})}{n \cdot N}    (7)

where n is the number of variables, and N is the number of cases in the test set.
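A minimal sketch of the NS_j score and the two evaluation quantities follows. The function names are ours, and the log-score here is our reading of the formula described in the text: base-2 log-probabilities of the test cases summed and divided by n times N.

```python
def ns_score(d_j, child_distances, has_parents):
    """NS_j: equal to S_j for a variable lacking parents or children,
    half of S_j for a variable that has both."""
    if child_distances:
        s_j = d_j + sum(child_distances) / len(child_distances)
        if has_parents:
            return s_j / 2      # both parents and children present
        return s_j              # children only
    return d_j                  # no children (d_j is 0 if also no parents)

def log_score(case_log2_probs, n_variables):
    """Average log2-probability of the test cases per variable, where
    case_log2_probs[j] = log2 p(case_j | model)."""
    return sum(case_log2_probs) / (n_variables * len(case_log2_probs))

def lift_over_marginal(model_score, marginal_score):
    """Positive lift: the model outperforms the marginal model."""
    return model_score - marginal_score

print(ns_score(0.4, [0.2], has_parents=True))   # ≈ 0.3: (0.4 + 0.2) / 2
print(log_score([-2.0, -4.0], n_variables=2))   # -1.5
```

Note how the halving only applies when both parents and children are present, so a parentless or childless variable is no longer at a systematic disadvantage relative to fully connected ones.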
For the calculation of the log-score, the dataset is divided into a 70/30 train and test split 1 and the accuracy of the learned model on the test set is then evaluated using the log-score. Using WinMine, the difference between the log-scores of the provided

1 In WinMine only the percentage of the test/training data may be determined. Using different software in further research, 10-fold cross validation would increase the validity of the results.

model and the marginal model can also be compared, which is called the lift over marginal. A positive difference signifies that the model outperforms the marginal model on the test set. The initial BN learned from the credit data set is given in Figure 2 below.

Fig. 2. The initial BN learned from the credit data set containing all of the variables

Using the cpts obtained through the initial BN we can calculate both the S_j and NS_j scores. Figure 3 given below depicts the graph of both the S_j and NS_j scores for the 21 variables used in the initial BN. The observations made are as follows: For seven variables in the network the corresponding S_j and NS_j scores agree. These seven variables are the ones which either lack parents or children.

Fig. 3. Graph of the S_j and NS_j scores calculated using the cpts obtained from the initial BN

In our analysis we want to compare the performance of these two variable selection scores. With that purpose, two sets of variables are created, one by selecting the variables with the highest S_j scores and the second with the highest NS_j scores. Using the selected variables the corresponding BNs are learned. The performance of the BNs

are compared in terms of the prediction capacity of the provided model and in terms of the improvement obtained. As the next step, the same process is repeated by using the cpts of the new BNs to calculate the new S_j and NS_j scores. Accordingly, the variables to be excluded from the network are decided according to their ranking on the variable selection score considered, S_j or NS_j. In our analysis, we repeated the steps five times and created BNs using 17, 15, 13, 11 and 8 variables, all selected according to their ranking in the corresponding variable selection scores. The results of their performance are listed in Table 2 given below. Both of the variable selection scores obtain better results compared to the marginal model and also the average distance measure. Notice that the results of the BNs created using the average distance d_j are also listed in the same table. This is done for comparison purposes, to illustrate that both of the variable selection scores result in superior performance compared to the average distance measure. Additionally, in almost all the networks considered, except the BN with 17 variables, we obtained better performing networks using the NS_j score, both in terms of the prediction capacity and the improvement obtained.

Table 2. Performance results of the variable selection scores S_j and NS_j 2 (columns: LogScore, Prediction rate, Lift Over Marginal, Improvement obtained; rows: the initial BN, and d_j, S_j, NS_j for each of the Top 17, Top 15, Top 13, Top 11 and Top 8 variable sets)

5 Results, Conclusions and Further Research

In order to ensure the prediction capacity of a BN learned from a data set and to be able to discover hidden information inside a big data set, it is necessary to select the right set of variables to be used in the BN to be learned. This problem is especially

2 The results are rounded to two decimal places.

apparent when there is a huge set of variables and the provided data is limited. In the last decade the research on structure learning algorithms for BNs has grown substantially. Though there exists wide research on variable selection in statistical models, the research conducted on variable selection in BNs remains limited. The variable selection measures developed for statistical models have been adapted by the machine learning community for evaluating the overall performance of the BN and do not provide guidance in variable selection for creating a good performing BN. The variable selection score S_j (Cinicioglu & Shenoy, 2012) provides a sound performance for the prediction capacity of the resulting network; however, it has the drawback that the variables without parents or children are punished for inclusion in the network. Motivated by that problem, in this research we suggest a modification to the S_j score, called NS_j, which fixes the problems inherent in its predecessor S_j. A credit score data set is used for applying the proposed heuristic. The performance of the resulting BNs using the proposed heuristic is evaluated using the log-score and the lift over marginal, which provide the prediction capacity of the network and the improvement obtained using the provided model compared to the marginal model. These results are compared with the results obtained using the distance measure and the S_j score. Accordingly, the newly developed NS_j score shows better performance both in terms of prediction capacity and the improvement obtained. For further research, different variable selection scores from statistical models and different data sets may be used to evaluate the results of the proposed heuristic.

Acknowledgements. We are grateful to the two anonymous reviewers of IPMU-2014 for comments and suggestions for improvements. This research was funded by the Istanbul University Research Fund, project number .

References
1. Cinicioglu, E.N., Shenoy, P.P.: A new heuristic for learning Bayesian networks from limited datasets: a real-time recommendation system application with RFID systems in grocery stores. Annals of Operations Research, 1–21 (2012)
2. Cui, G., Wong, M.L., Zhang, G.: Bayesian variable selection for binary response models and direct marketing forecasting. Expert Systems with Applications 37 (2010)
3. Heckerman, D., Chickering, D.M., Meek, C., Rounthwaite, R., Kadie, C.: Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Journal of Machine Learning Research 1 (2000)
4. Hruschka Jr., E.R., Hruschka, E.R., Ebecken, N.F.F.: Feature selection by Bayesian networks. In: Tawfik, A.Y., Goodwin, S.D. (eds.) Canadian AI 2004. LNCS (LNAI), vol. 3060. Springer, Heidelberg (2004)
5. Koller, D., Sahami, M.: Toward optimal feature selection. In: Proceedings of the 13th International Conference on Machine Learning (ICML) (1996)
6. Murphy, P.M., Aha, D.W.: UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine, CA (1994)
7. Sun, L., Shenoy, P.P.: Using Bayesian networks for bankruptcy prediction: some methodological issues. European Journal of Operational Research 180(2) (2007)


Decisions and Dependence in Influence Diagrams JMLR: Workshop and Conference Proceedings vol 52, 462-473, 2016 PGM 2016 Decisions and Dependence in Influence Diagrams Ross D. hachter Department of Management cience and Engineering tanford University

More information

Application of Bayesian Network Model for Enterprise Risk Management of Expressway Management Corporation

Application of Bayesian Network Model for Enterprise Risk Management of Expressway Management Corporation 2011 International Conference on Innovation, Management and Service IPEDR vol.14(2011) (2011) IACSIT Press, Singapore Application of Bayesian Network Model for Enterprise Risk Management of Expressway

More information

Automated Medical Diagnosis using K-Nearest Neighbor Classification

Automated Medical Diagnosis using K-Nearest Neighbor Classification (IMPACT FACTOR 5.96) Automated Medical Diagnosis using K-Nearest Neighbor Classification Zaheerabbas Punjani 1, B.E Student, TCET Mumbai, Maharashtra, India Ankush Deora 2, B.E Student, TCET Mumbai, Maharashtra,

More information

You must answer question 1.

You must answer question 1. Research Methods and Statistics Specialty Area Exam October 28, 2015 Part I: Statistics Committee: Richard Williams (Chair), Elizabeth McClintock, Sarah Mustillo You must answer question 1. 1. Suppose

More information

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Timothy N. Rubin (trubin@uci.edu) Michael D. Lee (mdlee@uci.edu) Charles F. Chubb (cchubb@uci.edu) Department of Cognitive

More information

Artificial Intelligence For Homeopathic Remedy Selection

Artificial Intelligence For Homeopathic Remedy Selection Artificial Intelligence For Homeopathic Remedy Selection A. R. Pawar, amrut.pawar@yahoo.co.in, S. N. Kini, snkini@gmail.com, M. R. More mangeshmore88@gmail.com Department of Computer Science and Engineering,

More information

Expert System Profile

Expert System Profile Expert System Profile GENERAL Domain: Medical Main General Function: Diagnosis System Name: INTERNIST-I/ CADUCEUS (or INTERNIST-II) Dates: 1970 s 1980 s Researchers: Ph.D. Harry Pople, M.D. Jack D. Myers

More information

Case Studies of Signed Networks

Case Studies of Signed Networks Case Studies of Signed Networks Christopher Wang December 10, 2014 Abstract Many studies on signed social networks focus on predicting the different relationships between users. However this prediction

More information

Keywords Artificial Neural Networks (ANN), Echocardiogram, BPNN, RBFNN, Classification, survival Analysis.

Keywords Artificial Neural Networks (ANN), Echocardiogram, BPNN, RBFNN, Classification, survival Analysis. Design of Classifier Using Artificial Neural Network for Patients Survival Analysis J. D. Dhande 1, Dr. S.M. Gulhane 2 Assistant Professor, BDCE, Sevagram 1, Professor, J.D.I.E.T, Yavatmal 2 Abstract The

More information

Application of Tree Structures of Fuzzy Classifier to Diabetes Disease Diagnosis

Application of Tree Structures of Fuzzy Classifier to Diabetes Disease Diagnosis , pp.143-147 http://dx.doi.org/10.14257/astl.2017.143.30 Application of Tree Structures of Fuzzy Classifier to Diabetes Disease Diagnosis Chang-Wook Han Department of Electrical Engineering, Dong-Eui University,

More information

Representing Association Classification Rules Mined from Health Data

Representing Association Classification Rules Mined from Health Data Representing Association Classification Rules Mined from Health Data Jie Chen 1, Hongxing He 1,JiuyongLi 4, Huidong Jin 1, Damien McAullay 1, Graham Williams 1,2, Ross Sparks 1,andChrisKelman 3 1 CSIRO

More information

Event Classification and Relationship Labeling in Affiliation Networks

Event Classification and Relationship Labeling in Affiliation Networks Event Classification and Relationship Labeling in Affiliation Networks Abstract Many domains are best described as an affiliation network in which there are entities such as actors, events and organizations

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018 Introduction to Machine Learning Katherine Heller Deep Learning Summer School 2018 Outline Kinds of machine learning Linear regression Regularization Bayesian methods Logistic Regression Why we do this

More information

CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY

CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY 64 CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY 3.1 PROBLEM DEFINITION Clinical data mining (CDM) is a rising field of research that aims at the utilization of data mining techniques to extract

More information

Mayuri Takore 1, Prof.R.R. Shelke 2 1 ME First Yr. (CSE), 2 Assistant Professor Computer Science & Engg, Department

Mayuri Takore 1, Prof.R.R. Shelke 2 1 ME First Yr. (CSE), 2 Assistant Professor Computer Science & Engg, Department Data Mining Techniques to Find Out Heart Diseases: An Overview Mayuri Takore 1, Prof.R.R. Shelke 2 1 ME First Yr. (CSE), 2 Assistant Professor Computer Science & Engg, Department H.V.P.M s COET, Amravati

More information

Bayesian (Belief) Network Models,

Bayesian (Belief) Network Models, Bayesian (Belief) Network Models, 2/10/03 & 2/12/03 Outline of This Lecture 1. Overview of the model 2. Bayes Probability and Rules of Inference Conditional Probabilities Priors and posteriors Joint distributions

More information

A Cooperative Multiagent Architecture for Turkish Sign Tutors

A Cooperative Multiagent Architecture for Turkish Sign Tutors A Cooperative Multiagent Architecture for Turkish Sign Tutors İlker Yıldırım Department of Computer Engineering Boğaziçi University Bebek, 34342, Istanbul, Turkey ilker.yildirim@boun.edu.tr 1 Introduction

More information

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15)

Russian Journal of Agricultural and Socio-Economic Sciences, 3(15) ON THE COMPARISON OF BAYESIAN INFORMATION CRITERION AND DRAPER S INFORMATION CRITERION IN SELECTION OF AN ASYMMETRIC PRICE RELATIONSHIP: BOOTSTRAP SIMULATION RESULTS Henry de-graft Acquah, Senior Lecturer

More information

A Cue Imputation Bayesian Model of Information Aggregation

A Cue Imputation Bayesian Model of Information Aggregation A Cue Imputation Bayesian Model of Information Aggregation Jennifer S. Trueblood, George Kachergis, and John K. Kruschke {jstruebl, gkacherg, kruschke}@indiana.edu Cognitive Science Program, 819 Eigenmann,

More information

You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful.

You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful. icausalbayes USER MANUAL INTRODUCTION You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful. We expect most of our users

More information

Color Difference Equations and Their Assessment

Color Difference Equations and Their Assessment Color Difference Equations and Their Assessment In 1976, the International Commission on Illumination, CIE, defined a new color space called CIELAB. It was created to be a visually uniform color space.

More information

Towards More Confident Recommendations: Improving Recommender Systems Using Filtering Approach Based on Rating Variance

Towards More Confident Recommendations: Improving Recommender Systems Using Filtering Approach Based on Rating Variance Towards More Confident Recommendations: Improving Recommender Systems Using Filtering Approach Based on Rating Variance Gediminas Adomavicius gedas@umn.edu Sreeharsha Kamireddy 2 skamir@cs.umn.edu YoungOk

More information

2016 Children and young people s inpatient and day case survey

2016 Children and young people s inpatient and day case survey NHS Patient Survey Programme 2016 Children and young people s inpatient and day case survey Technical details for analysing trust-level results Published November 2017 CQC publication Contents 1. Introduction...

More information

A FRAMEWORK FOR CLINICAL DECISION SUPPORT IN INTERNAL MEDICINE A PRELIMINARY VIEW Kopecky D 1, Adlassnig K-P 1

A FRAMEWORK FOR CLINICAL DECISION SUPPORT IN INTERNAL MEDICINE A PRELIMINARY VIEW Kopecky D 1, Adlassnig K-P 1 A FRAMEWORK FOR CLINICAL DECISION SUPPORT IN INTERNAL MEDICINE A PRELIMINARY VIEW Kopecky D 1, Adlassnig K-P 1 Abstract MedFrame provides a medical institution with a set of software tools for developing

More information

An assistive application identifying emotional state and executing a methodical healing process for depressive individuals.

An assistive application identifying emotional state and executing a methodical healing process for depressive individuals. An assistive application identifying emotional state and executing a methodical healing process for depressive individuals. Bandara G.M.M.B.O bhanukab@gmail.com Godawita B.M.D.T tharu9363@gmail.com Gunathilaka

More information

Using Bayesian Networks for Daily Activity Prediction

Using Bayesian Networks for Daily Activity Prediction Plan, Activity, and Intent Recognition: Papers from the AAAI 2013 Workshop Using Bayesian Networks for Daily Activity Prediction Ehsan Nazerfard School of Electrical Eng. and Computer Science Washington

More information

Rethinking Cognitive Architecture!

Rethinking Cognitive Architecture! Rethinking Cognitive Architecture! Reconciling Uniformity and Diversity via Graphical Models! Paul Rosenbloom!!! 1/25/2010! Department of Computer Science &! Institute for Creative Technologies! The projects

More information

and errs as expected. The disadvantage of this approach is that it is time consuming, due to the fact that it is necessary to evaluate all algorithms,

and errs as expected. The disadvantage of this approach is that it is time consuming, due to the fact that it is necessary to evaluate all algorithms, Data transformation and model selection by experimentation and meta-learning Pavel B. Brazdil LIACC, FEP - University of Porto Rua Campo Alegre, 823 4150 Porto, Portugal Email: pbrazdil@ncc.up.pt Research

More information

Statistics are commonly used in most fields of study and are regularly seen in newspapers, on television, and in professional work.

Statistics are commonly used in most fields of study and are regularly seen in newspapers, on television, and in professional work. I. Introduction and Data Collection A. Introduction to Statistics In this section Basic Statistical Terminology Branches of Statistics Types of Studies Types of Data Levels of Measurement 1. Basic Statistical

More information

Bayesian approaches to handling missing data: Practical Exercises

Bayesian approaches to handling missing data: Practical Exercises Bayesian approaches to handling missing data: Practical Exercises 1 Practical A Thanks to James Carpenter and Jonathan Bartlett who developed the exercise on which this practical is based (funded by ESRC).

More information

Lecturer: Rob van der Willigen 11/9/08

Lecturer: Rob van der Willigen 11/9/08 Auditory Perception - Detection versus Discrimination - Localization versus Discrimination - - Electrophysiological Measurements Psychophysical Measurements Three Approaches to Researching Audition physiology

More information

Recent advances in non-experimental comparison group designs

Recent advances in non-experimental comparison group designs Recent advances in non-experimental comparison group designs Elizabeth Stuart Johns Hopkins Bloomberg School of Public Health Department of Mental Health Department of Biostatistics Department of Health

More information

Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool

Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool Sujata Joshi Assistant Professor, Dept. of CSE Nitte Meenakshi Institute of Technology Bangalore,

More information

Lecturer: Rob van der Willigen 11/9/08

Lecturer: Rob van der Willigen 11/9/08 Auditory Perception - Detection versus Discrimination - Localization versus Discrimination - Electrophysiological Measurements - Psychophysical Measurements 1 Three Approaches to Researching Audition physiology

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 Exam policy: This exam allows one one-page, two-sided cheat sheet; No other materials. Time: 80 minutes. Be sure to write your name and

More information

A Comparison of Three Measures of the Association Between a Feature and a Concept

A Comparison of Three Measures of the Association Between a Feature and a Concept A Comparison of Three Measures of the Association Between a Feature and a Concept Matthew D. Zeigenfuse (mzeigenf@msu.edu) Department of Psychology, Michigan State University East Lansing, MI 48823 USA

More information

Prognostic Prediction in Patients with Amyotrophic Lateral Sclerosis using Probabilistic Graphical Models

Prognostic Prediction in Patients with Amyotrophic Lateral Sclerosis using Probabilistic Graphical Models Prognostic Prediction in Patients with Amyotrophic Lateral Sclerosis using Probabilistic Graphical Models José Jorge Dos Reis Instituto Superior Técnico, Lisboa November 2014 Abstract Amyotrophic Lateral

More information

Predicting Breast Cancer Survival Using Treatment and Patient Factors

Predicting Breast Cancer Survival Using Treatment and Patient Factors Predicting Breast Cancer Survival Using Treatment and Patient Factors William Chen wchen808@stanford.edu Henry Wang hwang9@stanford.edu 1. Introduction Breast cancer is the leading type of cancer in women

More information

Article from. Forecasting and Futurism. Month Year July 2015 Issue Number 11

Article from. Forecasting and Futurism. Month Year July 2015 Issue Number 11 Article from Forecasting and Futurism Month Year July 2015 Issue Number 11 Calibrating Risk Score Model with Partial Credibility By Shea Parkes and Brad Armstrong Risk adjustment models are commonly used

More information

Comparative Analysis of Machine Learning Algorithms for Chronic Kidney Disease Detection using Weka

Comparative Analysis of Machine Learning Algorithms for Chronic Kidney Disease Detection using Weka I J C T A, 10(8), 2017, pp. 59-67 International Science Press ISSN: 0974-5572 Comparative Analysis of Machine Learning Algorithms for Chronic Kidney Disease Detection using Weka Milandeep Arora* and Ajay

More information

A Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China

A Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China A Vision-based Affective Computing System Jieyu Zhao Ningbo University, China Outline Affective Computing A Dynamic 3D Morphable Model Facial Expression Recognition Probabilistic Graphical Models Some

More information

Consumer Review Analysis with Linear Regression

Consumer Review Analysis with Linear Regression Consumer Review Analysis with Linear Regression Cliff Engle Antonio Lupher February 27, 2012 1 Introduction Sentiment analysis aims to classify people s sentiments towards a particular subject based on

More information

Graphical Modeling Approaches for Estimating Brain Networks

Graphical Modeling Approaches for Estimating Brain Networks Graphical Modeling Approaches for Estimating Brain Networks BIOS 516 Suprateek Kundu Department of Biostatistics Emory University. September 28, 2017 Introduction My research focuses on understanding how

More information

Modeling Sentiment with Ridge Regression

Modeling Sentiment with Ridge Regression Modeling Sentiment with Ridge Regression Luke Segars 2/20/2012 The goal of this project was to generate a linear sentiment model for classifying Amazon book reviews according to their star rank. More generally,

More information

Numerical Integration of Bivariate Gaussian Distribution

Numerical Integration of Bivariate Gaussian Distribution Numerical Integration of Bivariate Gaussian Distribution S. H. Derakhshan and C. V. Deutsch The bivariate normal distribution arises in many geostatistical applications as most geostatistical techniques

More information

Adaptive Thresholding in Structure Learning of a Bayesian Network

Adaptive Thresholding in Structure Learning of a Bayesian Network Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Adaptive Thresholding in Structure Learning of a Bayesian Network Boaz Lerner, Michal Afek, Rafi Bojmel Ben-Gurion

More information

Applying Machine Learning Techniques to Analysis of Gene Expression Data: Cancer Diagnosis

Applying Machine Learning Techniques to Analysis of Gene Expression Data: Cancer Diagnosis Applying Machine Learning Techniques to Analysis of Gene Expression Data: Cancer Diagnosis Kyu-Baek Hwang, Dong-Yeon Cho, Sang-Wook Park Sung-Dong Kim, and Byoung-Tak Zhang Artificial Intelligence Lab

More information

IE 5203 Decision Analysis Lab I Probabilistic Modeling, Inference and Decision Making with Netica

IE 5203 Decision Analysis Lab I Probabilistic Modeling, Inference and Decision Making with Netica IE 5203 Decision Analysis Lab I Probabilistic Modeling, Inference and Decision Making with Netica Overview of Netica Software Netica Application is a comprehensive tool for working with Bayesian networks

More information

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction Samuel Giftson Durai Research Scholar, Dept. of CS Bishop Heber College Trichy-17, India S. Hari Ganesh, PhD Assistant

More information

We, at Innovatech Group, have designed xtrack, an easy-to-use workout application that tracks the fitness progress of the user, asking the user to

We, at Innovatech Group, have designed xtrack, an easy-to-use workout application that tracks the fitness progress of the user, asking the user to 2 We, at Innovatech Group, have designed xtrack, an easy-to-use workout application that tracks the fitness progress of the user, asking the user to input information before and after each workout session.

More information

MODEL SELECTION STRATEGIES. Tony Panzarella

MODEL SELECTION STRATEGIES. Tony Panzarella MODEL SELECTION STRATEGIES Tony Panzarella Lab Course March 20, 2014 2 Preamble Although focus will be on time-to-event data the same principles apply to other outcome data Lab Course March 20, 2014 3

More information

Representation and Analysis of Medical Decision Problems with Influence. Diagrams

Representation and Analysis of Medical Decision Problems with Influence. Diagrams Representation and Analysis of Medical Decision Problems with Influence Diagrams Douglas K. Owens, M.D., M.Sc., VA Palo Alto Health Care System, Palo Alto, California, Section on Medical Informatics, Department

More information

Empirical function attribute construction in classification learning

Empirical function attribute construction in classification learning Pre-publication draft of a paper which appeared in the Proceedings of the Seventh Australian Joint Conference on Artificial Intelligence (AI'94), pages 29-36. Singapore: World Scientific Empirical function

More information

Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors

Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE 0, SEPTEMBER 01 ISSN 81 Prediction Model For Risk Of Breast Cancer Considering Interaction Between The Risk Factors Nabila Al Balushi

More information

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework Thomas E. Rothenfluh 1, Karl Bögl 2, and Klaus-Peter Adlassnig 2 1 Department of Psychology University of Zurich, Zürichbergstraße

More information

A Rough Set Theory Approach to Diabetes

A Rough Set Theory Approach to Diabetes , pp.50-54 http://dx.doi.org/10.14257/astl.2017.145.10 A Rough Set Theory Approach to Diabetes Shantan Sawa, Ronnie D. Caytiles* and N. Ch. S. N Iyengar** School of Computer Science and Engineering VIT

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2013/01/08 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

More information

John Smith 20 October 2009

John Smith 20 October 2009 John Smith 20 October 2009 2009 MySkillsProfile.com. All rights reserved. Introduction The Emotional Competencies Questionnaire (ECQ) assesses your current emotional competencies and style by asking you

More information

The Mapping and Analysis of Transportation Needs in Haliburton County Analytical Report. Breanna Webber Viyanka Suthaskaran

The Mapping and Analysis of Transportation Needs in Haliburton County Analytical Report. Breanna Webber Viyanka Suthaskaran The Mapping and Analysis of Transportation Needs in Haliburton County Analytical Report Breanna Webber Viyanka Suthaskaran Host Organization: Haliburton Transportation Task Force Table Of Contents Introduction

More information

Prediction of Average and Perceived Polarity in Online Journalism

Prediction of Average and Perceived Polarity in Online Journalism Prediction of Average and Perceived Polarity in Online Journalism Albert Chu, Kensen Shi, Catherine Wong Abstract We predicted the average and perceived journalistic objectivity in online news articles

More information

Scientific Journal of Informatics Vol. 3, No. 2, November p-issn e-issn

Scientific Journal of Informatics Vol. 3, No. 2, November p-issn e-issn Scientific Journal of Informatics Vol. 3, No. 2, November 2016 p-issn 2407-7658 http://journal.unnes.ac.id/nju/index.php/sji e-issn 2460-0040 The Effect of Best First and Spreadsubsample on Selection of

More information

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering Kunal Sharma CS 4641 Machine Learning Abstract Clustering is a technique that is commonly used in unsupervised

More information

Breast screening: understanding case difficulty and the nature of errors

Breast screening: understanding case difficulty and the nature of errors Loughborough University Institutional Repository Breast screening: understanding case difficulty and the nature of errors This item was submitted to Loughborough University's Institutional Repository by

More information

Using Association Rule Mining to Discover Temporal Relations of Daily Activities

Using Association Rule Mining to Discover Temporal Relations of Daily Activities Using Association Rule Mining to Discover Temporal Relations of Daily Activities Ehsan Nazerfard, Parisa Rashidi, and Diane J. Cook School of Electrical Engineering and Computer Science Washington State

More information

Two-sample Categorical data: Measuring association

Two-sample Categorical data: Measuring association Two-sample Categorical data: Measuring association Patrick Breheny October 27 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 1 / 40 Introduction Study designs leading to contingency

More information

A Hierarchical Artificial Neural Network Model for Giemsa-Stained Human Chromosome Classification

A Hierarchical Artificial Neural Network Model for Giemsa-Stained Human Chromosome Classification A Hierarchical Artificial Neural Network Model for Giemsa-Stained Human Chromosome Classification JONGMAN CHO 1 1 Department of Biomedical Engineering, Inje University, Gimhae, 621-749, KOREA minerva@ieeeorg

More information

Comparative analysis of data mining tools for lungs cancer patients

Comparative analysis of data mining tools for lungs cancer patients Journal of Information & Communication Technology Vol. 9, No. 1, (Spring2015) 33-40 Comparative analysis of data mining tools for lungs cancer patients Adnan Alam khan * Institute of Business & Technology

More information

Trading off coverage for accuracy in forecasts: Applications to clinical data analysis

Trading off coverage for accuracy in forecasts: Applications to clinical data analysis Trading off coverage for accuracy in forecasts: Applications to clinical data analysis Michael J Pazzani, Patrick Murphy, Kamal Ali, and David Schulenburg Department of Information and Computer Science

More information