Learning to Identify Irrelevant State Variables

Nicholas K. Jong and Peter Stone
Department of Computer Sciences
University of Texas at Austin
Austin, Texas 78712
{nkj,pstone}@cs.utexas.edu

Abstract

When they are available, safe state abstractions improve the efficiency of reinforcement learning algorithms by allowing an agent to ignore irrelevant distinctions between states while still learning an optimal policy. Prior work investigated how to incorporate state abstractions into existing algorithms, but most approaches required the user to provide the abstraction. How to discover this kind of domain knowledge automatically remains a challenging open problem. In this paper, we introduce a general approach for testing the validity of a potential state abstraction. We reduce the problem to one of determining whether an action is optimal in every state in a given set. To decide optimality we give two statistical methods, which trade off between computational and sample complexity. One of these methods applies statistical hypothesis testing directly to learned state-action values, and the other applies Monte Carlo sampling to a learned Bayesian model. Finally, we demonstrate the ability of these methods to discriminate between safe and unsafe state abstractions in the familiar Taxi domain.

1 Introduction

Reinforcement learning (RL) addresses the problem of how an agent ought to select actions in a Markov decision problem (MDP) so as to maximize its expected reward despite not knowing the transition and reward functions beforehand. Early work in this field led to simple algorithms that guarantee convergence to optimal behavior in the limit, but the rate of convergence has proven unacceptable for large, real-world applications. One key problem is the choice of state representation. The representation must include enough state variables for the problem to be Markov, but too many state variables incur the curse of dimensionality. Since the number of potential state variables is typically quite large for interesting problems, an important step in specifying an RL task is selecting those variables that are most relevant for learning.

In this paper, we consider the task of automatically recognizing that a certain state variable is irrelevant. We define a state variable as irrelevant if an agent can completely ignore the variable and still behave optimally. For each state variable that it learns to ignore, an agent can significantly increase the efficiency of future training. The overall learning efficiency thus becomes more robust to the initial choice of state representation.

In general, an agent must learn a particular task rather well before it can reach safe conclusions about relevancy. One premise of our work is that an abstraction learned in one problem instance is likely to apply to other, similar problems. Learning in these subsequent problems can be accomplished with fewer state variables and therefore more efficiently. In this way an agent might learn from a comparatively easy problem a state representation that applies to a more difficult but related problem.

Our work is motivated in part by recent work on temporal abstractions and hierarchy in RL [1, 4, 7, 8]. The introduction of reusable subtasks creates an opportunity for applying dynamic state abstractions, which apply at some parts of the hierarchy but not others. In this context a flexible mechanism for the automated discovery of abstractions is particularly important, since otherwise the user must consider individually each task in a potentially large hierarchy. Furthermore, a method for discovering the conditions under which a state abstraction applies may prove useful in the discovery of the task decomposition itself. For this reason we develop our approach in the context of a non-hierarchical learning algorithm yet in a domain familiar from the hierarchical learning literature.

The main contributions of this paper are (i) our reformulation of the question of state irrelevance into a question of action optimality and (ii) the methods we give for answering this latter question. In Section 2 we describe the domain in which we develop our ideas. In Section 3 we give our definition of state irrelevance in terms of action optimality. In Section 4 we describe two distinct statistical methods for deciding whether an action is optimal. In Section 5 we show that both methods yield the desired results, but with differing levels of computational and sample complexity. In Section 6 we discuss related work, and in Section 7 we conclude.

2 Safe state abstractions in the Taxi domain

We use Dietterich's Taxi domain [4], illustrated in Figure 1, as the setting for our work. This domain has four state variables. The first two correspond to the taxi's current position in the 5 x 5 grid world. The third indicates the passenger's current location, at one of the four labeled positions (Red, Green, Blue, and Yellow) or inside the taxi. The fourth indicates the labeled position where the passenger would like to go. The domain therefore has 25 x 5 x 4 = 500 possible states. At each time step, the taxi may move north, move south, move east, move west, attempt to pick up the passenger, or attempt to put down the passenger. Actions that would move the taxi through a wall or off the grid have no effect. Every action has a reward of -1, except illegal attempts to pick up or put down the passenger, which have a reward of -10. The agent receives a reward of +20 for achieving a goal state, in which the passenger is at the destination (and not inside the taxi).

[Figure 1: The Taxi domain.]

In this paper, we consider the stochastic version of the domain. Whenever the taxi attempts to move, the resulting motion occurs in a random perpendicular direction with probability 0.2. Furthermore, once the taxi picks up the passenger and begins to move, the destination changes with probability 0.3.

Dietterich demonstrates that a handcrafted task hierarchy can facilitate learning in this domain. The crucial reusable tasks in his hierarchy are those that take the taxi to each of the four landmarks. For example, an agent can execute a task that navigates to the Red landmark whenever it must pick up a passenger there and also whenever it must deliver a passenger there. Dietterich also observes that the location of the passenger and the passenger's final destination are irrelevant to the task of travelling to the Red landmark. State abstractions such as this one are what allow his MAXQ framework to learn the Taxi domain efficiently.
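
To make these dynamics concrete, the following is a minimal sketch of a step function for the stochastic Taxi domain as described above. The grid bounds, landmark coordinates, and the omission of interior walls are simplifying assumptions for illustration, not Dietterich's exact layout.

```python
import random

# Hypothetical, simplified layout: landmark coordinates and the absence of
# interior walls are placeholders, not the exact map of Figure 1.
LANDMARKS = {'R': (0, 0), 'G': (0, 4), 'Y': (4, 0), 'B': (4, 3)}
MOVES = {'north': (-1, 0), 'south': (1, 0), 'east': (0, 1), 'west': (0, -1)}
PERPENDICULAR = {'north': ('east', 'west'), 'south': ('east', 'west'),
                 'east': ('north', 'south'), 'west': ('north', 'south')}

def step(state, action):
    """Return (next_state, reward).  state = (row, col, passenger, dest),
    where passenger is a landmark label or 'taxi'."""
    row, col, passenger, dest = state
    if action in MOVES:
        # With probability 0.2 the motion slips in a perpendicular direction.
        if random.random() < 0.2:
            action = random.choice(PERPENDICULAR[action])
        dr, dc = MOVES[action]
        nr, nc = row + dr, col + dc
        if 0 <= nr < 5 and 0 <= nc < 5:          # interior walls omitted here
            row, col = nr, nc
        # Once the passenger is aboard and the taxi moves, the destination
        # changes with probability 0.3.
        if passenger == 'taxi' and random.random() < 0.3:
            dest = random.choice(list(LANDMARKS))
        return (row, col, passenger, dest), -1
    if action == 'pickup':
        if passenger != 'taxi' and (row, col) == LANDMARKS[passenger]:
            return (row, col, 'taxi', dest), -1
        return state, -10                        # illegal pickup
    if action == 'putdown':
        if passenger == 'taxi' and (row, col) == LANDMARKS[dest]:
            return (row, col, dest, dest), +20   # goal state: episode ends
        return state, -10                        # illegal putdown
    raise ValueError(action)
```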

How might a learning algorithm discover this abstraction autonomously, based only on experience with the domain? We consider this question in a non-hierarchical framework. Even without a task decomposition, an agent can safely ignore the passenger's final destination in any state where the passenger is not inside the taxi, since the optimal action then does not depend on the final destination. Our approach learns this static state abstraction, which we can then apply to both non-hierarchical and hierarchical algorithms.

3 Defining irrelevance

Suppose without loss of generality that the $n+1$ state variables of an MDP are $X_1, X_2, \ldots, X_n = \vec{X}$ and $Y$. Let $\mathcal{X}$ denote a set of possible values for $\vec{X}$, determining a region of the state space. We wish to determine whether or not knowing the value of $Y$ affects the quality of an agent's decisions in this region of the state space. One simple sufficient condition is that the agent's learned policy $\hat{\pi}$ ignores $Y$: $\forall \vec{x} \in \mathcal{X}, \forall y_1, y_2 : \hat{\pi}(\vec{x}, y_1) = \hat{\pi}(\vec{x}, y_2)$. However, this condition is too strong in practice. If some states have more than one optimal action, then the learned policy may specify one action when $Y = y_1$ and a different one when $Y = y_2$, due to variance in the learned Q-values.

We instead examine the Q-values directly. We check that in every case there exists some action that achieves the maximum expected reward regardless of the value of $Y$: $\forall \vec{x} \in \mathcal{X} \; \exists a \; \forall y : \hat{Q}(\vec{x}, y, a) \geq \hat{V}(\vec{x}, y)$. Essentially, this condition examines the learned Q-values to determine whether a policy exists that ignores $Y$. However, our determination of whether an action maximizes the expected reward must be robust to uncertainty in the value estimates.

Learning algorithms that mix exploration and exploitation are especially likely to attain accurate value estimates for only one optimal action from a given state. For example, consider a state in the stochastic Taxi domain where the passenger is in the upper left corner and the taxi is in the upper right corner. To maximize expected reward, an agent must navigate to the passenger as quickly as possible. Due to the configuration of obstacles in the world, both moving south and moving west are optimal actions from this state, regardless of the passenger's eventual destination. The following table shows some of the learned Q-values for this situation, obtained using Q-learning with Boltzmann exploration.¹

[Table: learned Q-values for destinations Blue and Green and actions West and South; numeric values omitted.]

For each destination, the value of moving south and of moving west should be approximately the same, but the exploitation component of the learning policy caused Q-learning to converge to a correct estimate for only one of the two optimal actions. Q-learning with an exploitative policy is an extreme case, but even an algorithm that explores the domain in a more balanced fashion is likely to have different estimates for Q-values that are in truth the same, simply due to the stochastic nature of the domain. To determine whether one Q-value is greater than another, we must take into account the uncertainty in our estimate.

¹ For all the Q-learning runs in this paper, we used a starting temperature of 50, a cooling rate of [...], a learning rate of 0.25, and no discount factor.
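
Before addressing that uncertainty, the following minimal sketch shows the condition itself applied naively to a learned tabular Q-function, treating the point estimates as exact; the dictionary layout and tolerance are illustrative assumptions.

```python
def y_is_irrelevant(Q, xs, ys, actions, tol=1e-6):
    """Check whether, for every x in xs, some single action is greedy
    (within tol) for every value y of Y, i.e. the condition
    forall x  exists a  forall y:  Q(x, y, a) >= V(x, y).
    Q maps (x, y, a) to a learned value estimate.  This treats the
    estimates as exact, which is precisely the difficulty addressed
    in Section 4."""
    for x in xs:
        best = {y: max(Q[(x, y, a)] for a in actions) for y in ys}
        if not any(all(Q[(x, y, a)] >= best[y] - tol for y in ys)
                   for a in actions):
            return False
    return True
```

Section 4 replaces the naive comparison inside this loop with statistical tests that account for the variance of the estimates.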

4 Testing hypotheses

To evaluate whether a state-action value is optimal, we draw inspiration from statistical hypothesis testing. In this family of techniques, we consider a default hypothesis (called the null hypothesis) and wish to determine whether some data tends to refute this hypothesis. We calculate a certain scalar statistic of the data and determine the distribution of this statistic assuming the null hypothesis is true. Then we compute the likelihood $p$ of observing a value as extreme as the observed statistic given that distribution. We reject the null hypothesis if and only if this likelihood falls below some predetermined threshold (called the significance level), indicating that the statistic lies in the unlikely tail of the distribution.

In our framework we define for each state $(\vec{x}, y)$ and action $a$ a separate null hypothesis that $a$ is an optimal action in $(\vec{x}, y)$: $Q(\vec{x}, y, a) \geq V(\vec{x}, y)$. If for a given $\vec{x}$ we accept the null hypothesis for all $y$, then action $a$ is optimal regardless of the value of $Y$. In this case $Y$ is irrelevant given $\vec{x}$ according to our definition of irrelevance. Conversely, $Y$ is relevant given $\vec{x}$ if for every action we reject the null hypothesis for some value $y$.

4.1 Classical hypothesis testing

If we regard each state-action value as a random variable, then we can apply established statistical tests for determining whether the means of two random variables differ. This straightforward approach requires us to draw independent samples of the estimated value for each state-action pair. We can obtain this sample by repeatedly running any RL algorithm that computes these estimates until it converges to an optimal policy. After $n$ runs we have a sample of size $n$ of each state-action value. Instead of directly testing the hypothesis that $a$ is an optimal action in state $s$, we test the hypothesis that $Q(s, a) \geq Q(s, a')$ for each other action $a'$. Only if we accept all of these hypotheses do we accept the hypothesis that $a$ is optimal.

If we assume that our sample of Q-values has a Gaussian distribution (for each state-action pair), we could use a paired t test to test these hypotheses. In general, we have reason to believe that the actual distribution is somewhat skewed, since these values are the max of other values. Fortunately, the statistical literature provides a test that does not require us to know the distribution of our sample: the Wilcoxon signed ranks test [3]. This test computes a statistic of the difference between $Q(s, a)$ and $Q(s, a')$ for each run that is known to converge to a Gaussian distribution for sufficiently large $n$. It outputs the maximum significance level at which we should still accept the hypothesis that $Q(s, a) \geq Q(s, a')$. We then accept the hypothesis that $a$ is optimal in state $s$ if and only if the maximum significance level for each $a'$ is greater than our threshold.
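
To make this procedure concrete, the sketch below applies the Wilcoxon signed ranks test (via scipy.stats.wilcoxon) to per-run Q-value samples for a single state; the array layout, significance threshold, and function names are illustrative assumptions rather than the implementation used in our experiments.

```python
import numpy as np
from scipy.stats import wilcoxon

def action_is_optimal(q_samples, a, alpha=0.05):
    """Test whether action a is optimal in one state.

    q_samples: array of shape (n_runs, n_actions), where row i holds the
    converged Q-values for this state from the i-th independent RL run.
    We retain the null hypothesis Q(s, a) >= Q(s, a') for every other
    action a' unless the one-sided Wilcoxon signed ranks test rejects it.
    """
    n_runs, n_actions = q_samples.shape
    for a_prime in range(n_actions):
        if a_prime == a:
            continue
        diffs = q_samples[:, a] - q_samples[:, a_prime]
        # alternative='less': small p-values are evidence that the
        # differences are negative, i.e. that a' is strictly better than a.
        _, p = wilcoxon(diffs, alternative='less')
        if p < alpha:
            return False
    return True

def variable_is_irrelevant(q_samples_by_y, alpha=0.05):
    """q_samples_by_y: list over the values of Y of (n_runs, n_actions)
    arrays for the same x.  Y is judged irrelevant at x if some single
    action is accepted as optimal for every value of Y."""
    n_actions = q_samples_by_y[0].shape[1]
    return any(all(action_is_optimal(q, a, alpha) for q in q_samples_by_y)
               for a in range(n_actions))
```

Aggregating this per-action decision over the values of $Y$, as in variable_is_irrelevant above, yields the per-location judgments reported in Section 5.1.
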
4.2 Monte Carlo simulation

The straightforward implementation described above makes very poor use of experience data, since each time step contributes to only one of the sample solutions. Here we develop an alternate approach that draws upon recent work in Bayesian MDP models [2]. This technique regards the successor state that results from a given state-action pair as a random variable drawn from a multinomial distribution. Using Bayesian parameter estimation techniques, we start with a prior probability distribution over the parameters for each multinomial and then update these distributions given experience data. The joint distribution over the transition probabilities and one-step rewards for each state-action pair comprises the Bayesian model. This Bayesian MDP model thus represents a single probability distribution over MDPs whose mean converges in the limit to the MDP that generated the data.

In our approach we use all of the experience data to learn a single Bayesian model of the domain. We then draw sample MDPs that are independent given the model and apply Monte Carlo simulation to make probabilistic statements about the Q-values of the underlying MDP. We directly estimate the probability that an action is optimal in a given state (given our prior distribution) as the fraction of samples in which the action is in fact optimal. We then accept the hypothesis that an action is optimal unless the estimated probability of optimality is too low: we accept $\hat{Q}(\vec{x}, y, a) \geq \hat{V}(\vec{x}, y)$ if and only if $\Pr(Q(\vec{x}, y, a) = V(\vec{x}, y) \mid h) \geq p$, where $h$ denotes the observed experience data and $p$ is a significance level as in classical hypothesis testing.
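
A minimal sketch of this sampling procedure follows, assuming a small tabular domain with transition counts accumulated from experience, a known deterministic reward table, and the improper all-zero Dirichlet prior discussed in Section 5.2; the value-iteration settings and function names are illustrative choices, not the exact implementation.

```python
import numpy as np

def sample_mdp(counts):
    """Draw one MDP from the posterior over transition functions.
    counts[s, a, s2] is the number of observed transitions s, a -> s2.
    With the improper all-zero prior, unobserved successors keep
    probability zero."""
    S, A, _ = counts.shape
    P = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            observed = counts[s, a] > 0
            if observed.any():
                P[s, a, observed] = np.random.dirichlet(counts[s, a, observed])
    return P

def q_values(P, R, gamma=1.0, iters=1000, tol=1e-6):
    """Value iteration on one sampled MDP; R[s, a] is the known reward.
    The iteration cap guards against slow convergence when gamma = 1."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * P.dot(V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return R + gamma * P.dot(V)

def optimality_probabilities(counts, R, n_samples=100):
    """Estimate Pr(a is optimal in s) as the fraction of sampled MDPs in
    which Q(s, a) attains the maximum over actions."""
    S, A, _ = counts.shape
    hits = np.zeros((S, A))
    for _ in range(n_samples):
        Q = q_values(sample_mdp(counts), R)
        hits += Q >= Q.max(axis=1, keepdims=True) - 1e-9
    return hits / n_samples
```

The quantity $\max_a \min_y \widehat{\Pr}(Q(\vec{x}, y, a) = V(\vec{x}, y))$ reported in Section 5.2 can then be read off the resulting table of estimated probabilities.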

5 Results

We verified the correctness of both our statistical hypothesis testing and Monte Carlo approaches on the stochastic version of the Taxi domain. Both methods generally proceed in two phases. In the first phase, we run an established RL algorithm until convergence, perhaps multiple times. In the second phase, we use the output of the RL algorithm to accept or to reject the hypotheses that certain state variables are irrelevant in certain conditions. Evaluating all possible hypotheses of this form would be prohibitively expensive, so here we examine just two cases in the Taxi domain to demonstrate that we can discriminate between hypotheses that we should reject and hypotheses that we should accept. In the first case, the passenger is at the upper left landmark. In this case we wish to show that the passenger's destination is irrelevant to the optimal action, which is always to navigate towards the upper left landmark. In the second case, the passenger is inside the taxi. Here we wish to show that the passenger's destination is not irrelevant, since the optimal action is to navigate towards the destination.

5.1 The Wilcoxon signed ranks test

To obtain the sample Q-values necessary to apply the Wilcoxon signed ranks test, we ran 25 independent instances of Q-learning with Boltzmann exploration for [...] time steps each, enough time to ensure convergence to an optimal policy. Then for the two cases above, we applied the Wilcoxon signed ranks test to determine for each possible location of the taxi the maximum significance level at which we would conclude that the passenger's destination is relevant to the optimal policy. The following table displays the values obtained in a typical run for the first case, in which the passenger's destination is not relevant.

[Table: maximum significance levels for each of the 25 taxi locations, with the passenger waiting at the upper left landmark.]

Consider the upper right hand square, for which we obtain the maximum significance level [...]. This number means that there exists some action $a$ for which we accept, across all possible passenger destinations, the null hypothesis that $a$ is optimal, so long as we choose a significance level $p$ below this value. Only if we choose a significance level $p$ at or above this value does there exist a passenger destination and an alternative action $a'$ such that we should reject the hypothesis that $a$ is as good as $a'$. We see from the table that, on this trial, for $p$ smaller than every entry in the table, this approach correctly identifies the passenger destination as irrelevant for all 25 taxi locations (when the passenger is currently at the upper left landmark). The next table represents the case when the passenger is inside the taxi, where the destination is generally relevant.

[Table: maximum significance levels for each of the 25 taxi locations, with the passenger inside the taxi.]

All but four locations in this case have extremely low p-values, suggesting that we reject the hypothesis that the passenger destination is irrelevant (thus indicating that it is relevant) in these states. In the four locations with higher p-values the passenger destination actually is irrelevant: although the passenger is already inside the taxi, moving north is an optimal first action towards all four of the possible destinations. (Recall the possible passenger destinations as indicated in Figure 1.) These values indicate that on this trial this approach avoids false positive identifications of irrelevant state for $p >$ [...]. In ten trials, this approach never generated a p-value above [...] for a state where the null hypothesis was false, and it never generated a p-value below [...] for a state where the null hypothesis was true. Over these ten trials, a typical significance level of 0.05 would have correctly classified the relevancy of the passenger destination in every state.

5.2 Monte Carlo simulation

We also validated our Monte Carlo approach on the Taxi domain. We used prioritized sweeping [6] with $t_{\mathrm{Bored}} = 10$ to ensure that the Bayesian model had at least ten samples for each reachable input to the transition function. We allowed the agent to explore for 40,000 time steps, enough to ensure that it completed its exploration. The agent assumed that the reward function was deterministic, so it knew all the one-step rewards after visiting each state-action pair at least once. In general, if we do not make this assumption, then we must choose some prior distribution over rewards for each state-action pair. Since the Taxi domain has a deterministic reward function, we chose to avoid this complication in the work reported here. Furthermore, we initialized each parameter of the Dirichlet distributions to 0. This prior distribution is not formally a Dirichlet distribution, which assumes that each parameter is positive. However, we can still sample from these distributions by assuming that unobserved state transitions have probability 0. This improper prior has the advantage of yielding a Bayesian model whose mean is identical to the maximum likelihood model, and it is slightly more computationally efficient than the approach of Dearden et al. [2].

After the exploration phase, we sampled 100 MDPs from the learned Bayesian model. We solved each of these using value iteration and examined the same two cases as in Section 5.1. The following table shows for each of the 25 taxi locations the maximum probability at which some action is optimal across all passenger destinations, given that the passenger is still waiting at the upper left landmark. In other words, each cell contains the quantity $\max_a \min_y \widehat{\Pr}(Q(\vec{x}, y, a) = V(\vec{x}, y))$, where $\vec{x}$ corresponds to the taxi location and passenger location.

[Table: estimated optimality probabilities for each of the 25 taxi locations, with the passenger waiting at the upper left landmark.]

Although these estimated probabilities do not convey the same formal meaning as the significance values that statistical hypothesis tests output, we may interpret them in a somewhat similar fashion. Consider the taxi location with the smallest estimated probability, which is 0.20.

If we start with the null hypothesis that some action is optimal at that location across all passenger destinations, our Monte Carlo simulation gives us no reason to reject that hypothesis, since at least one action was optimal in 20 of the 100 sampled MDPs.

The next table shows the estimated probabilities for the second case, when the passenger is inside the taxi.

[Table: estimated optimality probabilities for each of the 25 taxi locations, with the passenger inside the taxi.]

Note that for all the locations where the passenger destination is in fact relevant, no action was optimal across passenger destinations in any of the 100 sampled MDPs. We can easily imagine setting a probability threshold similar in meaning to the significance level of statistical hypothesis tests. We would then reject the null hypothesis only when the estimated probability falls below that threshold. In the ten trials that we ran, a threshold of 0.05 never caused any false negatives but did lead the algorithm erroneously to classify the passenger's destination as relevant in three instances out of 1,000. (In each trial, the destination is irrelevant for each combination of four passenger locations and 25 taxi locations.)

The principal cost of the Monte Carlo approach is computational. The process of learning the Bayesian model, sampling 100 MDPs, and performing value iteration until convergence 100 times required 335 seconds on a 2.8 GHz Pentium 4 CPU, in contrast to the 9 seconds required to run 25 instances of Q-learning and to apply the Wilcoxon signed ranks test. On the other hand, the Monte Carlo approach makes more efficient use of the data, requiring only 40,000 steps of direct experience with the environment instead of the [...] steps consumed by the 25 independent Q-learning runs. Thus one method emphasizes computational efficiency and the other sample complexity.

Learning a state abstraction even from a solved task could be well worth the cost. For example, our implementation of prioritized sweeping required over 24 minutes to solve a random instance of the Taxi domain. In contrast, solving the original 5 x 5 domain, applying the Monte Carlo approach to discover when the passenger destination was irrelevant, and then using this abstraction solved the same instance in only 12.5 minutes.

As with all forms of statistical hypothesis testing, random chance will occasionally cause these procedures to accept an incorrect hypothesis or to reject a correct hypothesis. Collecting more data can reduce the likelihood of error, but our results show that our approach already discriminates fairly reliably between situations in which the passenger destination is relevant to behaving optimally and situations in which it is irrelevant.

6 Related work

Our work bears a strong resemblance to aspects of McCallum's U-tree algorithm [5], which uses statistical hypothesis testing to determine what features to include in its state representation. U-tree is an online instance-based algorithm that adds a state variable to its representation if different values of the variable predict different distributions of expected future reward. The algorithm computes these distributions of values in part from the current representation, resulting in a circularity that prevents it from guaranteeing to converge on an optimal state abstraction. In contrast, our approach explicitly employs only state abstractions that preserve an optimal policy.

Both of the methods we described in Section 4 require information obtained from a complete solution to the given task, so at present they are most likely to be useful for finding state abstractions in small problems that might apply in similar but much larger problems. We leave for future work the question of how we might apply these techniques online to tasks that are not yet fully learned. In this situation the uncertainty in the value function is much larger, and our approach will tend to assume that all state variables are irrelevant in the absence of sufficient evidence to the contrary. We also leave for future work how to determine what candidate state abstractions to test, if we cannot afford to test them all.

7 Conclusion

This paper has addressed the problem of determining what state variables are relevant to the solution of an RL task. We defined the relevancy of a state variable in terms of the existence of an action that is optimal across all values of that state variable. We described two statistical methods for determining whether an action is optimal in a given state. One method applies an established statistical hypothesis test to Q-values obtained from independent runs of an RL algorithm. This method is as computationally efficient as the RL algorithm used. The other method applies Monte Carlo simulation to a learned Bayesian model and requires far less experience data. Finally, we demonstrated that both methods accurately identify the conditions under which a certain state variable is irrelevant in the Taxi domain.

Acknowledgments

We would like to thank Greg Kuhlmann for helpful comments and suggestions. This research was supported in part by NSF CAREER award IIS-[...].

References

1. Andrew G. Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13:41-77, 2003. Special Issue on Reinforcement Learning.
2. Richard Dearden, Nir Friedman, and David Andre. Model based Bayesian exploration. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999.
3. Morris H. DeGroot. Probability and Statistics. Addison-Wesley, 2nd edition, 1986.
4. Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
5. Andrew Kachites McCallum. Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, University of Rochester, 1995.
6. Andrew W. Moore and Christopher G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13:103-130, 1993.
7. Ronald Parr and Stuart Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems 10, 1998.
8. Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181-211, 1999.
