Learning to Identify Irrelevant State Variables
Nicholas K. Jong and Peter Stone
Department of Computer Sciences, University of Texas at Austin, Austin, Texas

Abstract

When they are available, safe state abstractions improve the efficiency of reinforcement learning algorithms by allowing an agent to ignore irrelevant distinctions between states while still learning an optimal policy. Prior work investigated how to incorporate state abstractions into existing algorithms, but most approaches required the user to provide the abstraction. How to discover this kind of domain knowledge automatically remains a challenging open problem. In this paper, we introduce a general approach for testing the validity of a potential state abstraction. We reduce the problem to one of determining whether an action is optimal in every state in a given set. To decide optimality we give two statistical methods, which trade off between computational and sample complexity. One of these methods applies statistical hypothesis testing directly to learned state-action values, and the other applies Monte Carlo sampling to a learned Bayesian model. Finally, we demonstrate the ability of these methods to discriminate between safe and unsafe state abstractions in the familiar Taxi domain.

1 Introduction

Reinforcement learning (RL) addresses the problem of how an agent ought to select actions in a Markov decision problem (MDP) so as to maximize its expected reward despite not knowing the transition and reward functions beforehand. Early work in this field led to simple algorithms that guarantee convergence to optimal behavior in the limit, but the rate of convergence has proven unacceptable for large, real-world applications. One key problem is the choice of state representation. The representation must include enough state variables for the problem to be Markov, but too many state variables incur the curse of dimensionality.
Since the number of potential state variables is typically quite large for interesting problems, an important step in specifying an RL task is selecting those variables that are most relevant for learning. In this paper, we consider the task of automatically recognizing that a certain state variable is irrelevant. We define a state variable as irrelevant if an agent can completely ignore the variable and still behave optimally. For each state variable that it learns to ignore, an agent can significantly increase the efficiency of future training. The overall learning efficiency thus becomes more robust to the initial choice of state representation.
In general, an agent must learn a particular task rather well before it can reach safe conclusions about relevancy. One premise of our work is that an abstraction learned in one problem instance is likely to apply to other, similar problems. Learning in these subsequent problems can be accomplished with fewer state variables and therefore more efficiently. In this way an agent might learn from a comparatively easy problem a state representation that applies to a more difficult but related problem. Our work is motivated in part by recent work on temporal abstractions and hierarchy in RL [1, 4, 7, 8]. The introduction of reusable subtasks creates an opportunity for applying dynamic state abstractions, which apply at some parts of the hierarchy but not others. In this context a flexible mechanism for the automated discovery of abstractions is particularly important, since otherwise the user must consider individually each task in a potentially large hierarchy. Furthermore, a method for discovering the conditions under which a state abstraction applies may prove useful in the discovery of the task decomposition itself. For this reason we develop our approach in the context of a non-hierarchical learning algorithm yet in a domain familiar from the hierarchical learning literature. The main contributions of this paper are (i) our reformulation of the question of state irrelevance into a question of action optimality and (ii) the methods we give for answering this latter question. In Section 2 we describe the domain in which we develop our ideas. In Section 3 we give our definition of state irrelevance in terms of action optimality. In Section 4 we describe two distinct statistical methods for deciding whether an action is optimal. In Section 5 we show that both methods yield the desired results, but with differing levels of computational and sample complexity. In Section 6 we discuss related work, and in Section 7 we conclude.
2 Safe state abstractions in the Taxi domain

We use Dietterich's Taxi domain [4], illustrated in Figure 1, as the setting for our work. This domain has four state variables. The first two correspond to the taxi's current position in the grid world. The third indicates the passenger's current location, at one of the four labeled positions (Red, Green, Blue, and Yellow) or inside the taxi. The fourth indicates the labeled position where the passenger would like to go. The domain therefore has 25 × 5 × 4 = 500 possible states. At each time step, the taxi may move north, move south, move east, move west, attempt to pick up the passenger, or attempt to put down the passenger. Actions that would move the taxi through a wall or off the grid have no effect. Every action has a reward of -1, except illegal attempts to pick up or put down the passenger, which have a reward of -10. The agent receives a reward of +20 for achieving a goal state, in which the passenger is at the destination (and not inside the taxi). In this paper, we consider the stochastic version of the domain. Whenever the taxi attempts to move, the resulting motion occurs in a random perpendicular direction with probability 0.2. Furthermore, once the taxi picks up the passenger and begins to move, the destination changes with probability 0.3.

Figure 1: The Taxi domain.

Dietterich demonstrates that a handcrafted task hierarchy can facilitate learning in this domain. The crucial reusable tasks in his hierarchy are those that take the taxi to each of the four landmarks. For example, an agent can execute a task that navigates to the Red landmark whenever it must pick up a passenger there and also whenever it must deliver a passenger there. Dietterich also observes that the location of the passenger and the passenger's final destination are irrelevant to the task of travelling to the Red landmark. State abstractions such as this one are what allow his MAXQ framework to learn the Taxi domain efficiently.
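The stochastic motion model described above can be sketched in a few lines. This is a minimal illustration of the slip dynamics, not the authors' implementation; all names are our own:

```python
# In the stochastic Taxi domain, a movement action succeeds with
# probability 0.8 and slips to one of the two perpendicular directions
# with probability 0.1 each (0.2 total, as described in the text).

PERPENDICULAR = {
    "north": ("east", "west"),
    "south": ("east", "west"),
    "east": ("north", "south"),
    "west": ("north", "south"),
}

def move_distribution(action):
    """Return the probability of each actual direction of motion."""
    left, right = PERPENDICULAR[action]
    return {action: 0.8, left: 0.1, right: 0.1}
```

A sampler for the environment would draw the realized direction from this distribution before applying walls and grid boundaries.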
How might a learning algorithm discover this abstraction autonomously, based only on experience with the domain? We consider this question in a non-hierarchical framework. Even without a task decomposition, an agent can safely ignore the passenger's final destination in any state where the passenger is not inside the taxi, since the optimal action then does not depend on the final destination. Our approach learns this static state abstraction, which we can then apply to both non-hierarchical and hierarchical algorithms.

3 Defining irrelevance

Suppose without loss of generality that the n + 1 state variables of an MDP are X₁, X₂, ..., Xₙ (written collectively as the vector x) and Y. Let X denote a set of possible values for x, determining a region of the state space. We wish to determine whether or not knowing the value of Y affects the quality of an agent's decisions in this region of the state space. One simple sufficient condition is that the agent's learned policy π̂ ignores Y:

∀x ∈ X, ∀y₁, y₂ : π̂(x, y₁) = π̂(x, y₂).

However, this condition is too strong in practice. If some states have more than one optimal action, then the learned policy may specify one action when Y = y₁ and a different one when Y = y₂, due to variance in the learned Q-values. We instead examine the Q-values directly. We check that in every case there exists some action that achieves the maximum expected reward regardless of the value of Y:

∀x ∈ X, ∃a, ∀y : Q̂(x, y, a) ≥ V̂(x, y).

Essentially, this condition examines the learned Q-values to determine whether a policy exists that ignores Y. However, our determination of whether an action maximizes the expected reward must be robust to uncertainty in the value estimates. Learning algorithms that mix exploration and exploitation are especially likely to attain accurate value estimates for only one optimal action from a given state.
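With exact Q-values in hand, the condition above reduces to a search for a common optimal action across all values of Y. The sketch below is illustrative only (toy data, hypothetical names), with a tolerance standing in for the statistical tests developed later:

```python
def y_irrelevant(Q, xs, ys, actions, tol=1e-6):
    """Return True if, for every x in xs, some single action is optimal
    (within tol) for every value y of the candidate-irrelevant variable.
    Q maps (x, y, a) to a learned Q-value."""
    for x in xs:
        # V(x, y) = max over actions of Q(x, y, a)
        V = {y: max(Q[(x, y, a)] for a in actions) for y in ys}
        if not any(all(Q[(x, y, a)] >= V[y] - tol for y in ys)
                   for a in actions):
            return False
    return True

# Toy example: "a0" attains the maximum for both values of y,
# so Y is irrelevant here even though "a1" is also optimal when y = 0.
Q = {("x0", 0, "a0"): 5.0, ("x0", 0, "a1"): 5.0,
     ("x0", 1, "a0"): 3.0, ("x0", 1, "a1"): 2.0}
```

In practice the learned Q̂-values are noisy, which is exactly why the paper replaces the tolerance check with hypothesis tests.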
For example, consider a state in the stochastic Taxi domain where the passenger is in the upper left corner and the taxi is in the upper right corner. To maximize expected reward, an agent must navigate to the passenger as quickly as possible. Due to the configuration of obstacles in the world, both moving south and moving west are optimal actions from this state, regardless of the passenger's eventual destination. The following table shows some of the learned Q-values for this situation, obtained using Q-learning with Boltzmann exploration.¹

Dest     Action   Q
Blue     West
Blue     South
Green    West
Green    South

For each destination, the value of moving south and of moving west should be approximately the same, but the exploitation component of the learning policy caused Q-learning to converge to a correct estimate for only one of the two optimal actions. Q-learning with an exploitative policy is an extreme case, but even an algorithm that explores the domain in a more balanced fashion is likely to have different estimates for Q-values that are in truth the same, simply due to the stochastic nature of the domain. To determine whether one Q-value is greater than another, we must take into account the uncertainty in our estimate.

¹ For all the Q-learning runs in this paper, we used a starting temperature of 50, a cooling rate of , a learning rate of 0.25, and no discount factor.
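Boltzmann exploration, as used for the runs in the footnote, selects actions with probability proportional to exp(Q/τ), where the temperature τ is gradually cooled. A minimal sketch of the action-selection rule (illustrative names, not the paper's code):

```python
import math

def boltzmann_probs(q_values, temperature):
    """Softmax action probabilities: a high temperature explores almost
    uniformly, a low temperature concentrates on the greedy action."""
    # Subtract the max Q-value for numerical stability before exponentiating.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]
```

Cooling the temperature over training shifts the policy from exploration toward exploitation, which is why late in learning only one of several tied-optimal actions tends to keep an accurate estimate.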
4 Testing hypotheses

To evaluate whether a state-action value is optimal, we draw inspiration from statistical hypothesis testing. In this family of techniques, we consider a default hypothesis (called the null hypothesis) and wish to determine whether some data tends to refute this hypothesis. We calculate a certain scalar statistic of the data and determine the distribution of this statistic assuming the null hypothesis is true. Then we compute the likelihood p of observing a value as extreme as the observed statistic given that distribution. We reject the null hypothesis if and only if this likelihood falls below some predetermined threshold (called the significance level), indicating that the statistic lies in the unlikely tail of the distribution. In our framework we define for each state (x, y) and action a a separate null hypothesis that a is an optimal action in (x, y): Q(x, y, a) ≥ V(x, y). If for a given x we accept the null hypothesis for all y, then action a is optimal regardless of the value of Y. In this case Y is irrelevant given x according to our definition of irrelevance. Conversely, Y is relevant given x if for every action we reject the null hypothesis for some value y.

4.1 Classical hypothesis testing

If we regard each state-action value as a random variable, then we can apply established statistical tests for determining whether the means of two random variables differ. This straightforward approach requires us to draw independent samples of the estimated value for each state-action pair. We can obtain this sample by repeatedly running any RL algorithm that computes these estimates until it converges to an optimal policy. After n runs we have a sample of size n of each state-action value. Instead of directly testing the hypothesis that a is an optimal action in state s, we test the hypothesis that Q(s, a) ≥ Q(s, a′) for each other action a′. Only if we accept all of these hypotheses do we accept the hypothesis that a is optimal.
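The aggregation of pairwise hypotheses described above can be sketched as follows. The helper names and the toy p-value table are our own; the pairwise test itself (a t-test, Wilcoxon, or anything else) is left pluggable:

```python
def action_optimal(s, a, actions, pairwise_p, threshold=0.05):
    """Accept 'a is optimal in state s' iff, for every rival action a2,
    the p-value against the null hypothesis Q(s, a) >= Q(s, a2) stays
    above the significance threshold. pairwise_p(s, a, a2) supplies it."""
    return all(pairwise_p(s, a, a2) > threshold
               for a2 in actions if a2 != a)

# Toy table: the data gives no reason to reject "a0 >= a1" (p = 0.6),
# but strongly supports rejecting "a1 >= a0" (p = 0.01).
table = {("s0", "a0", "a1"): 0.6, ("s0", "a1", "a0"): 0.01}

def lookup(s, a, a2):
    return table[(s, a, a2)]
```

Only actions that survive every pairwise comparison are accepted as optimal, mirroring the conjunction of null hypotheses in the text.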
If we assume that our sample of Q-values has a Gaussian distribution (for each state-action pair), we could use a paired t-test to test these hypotheses. In general, we have reason to believe that the actual distribution is somewhat skewed, since these values are the max of other values. Fortunately, the statistical literature provides a test that does not require us to know the distribution of our sample: the Wilcoxon signed ranks test [3]. This test computes a statistic of the difference between Q(s, a) and Q(s, a′) for each run that is known to converge to a Gaussian distribution for sufficiently large n. It outputs the maximum significance level at which we should still accept the hypothesis that Q(s, a) ≥ Q(s, a′). We then accept the hypothesis that a is optimal in state s if and only if the maximum significance level for each a′ is greater than our threshold.

4.2 Monte Carlo simulation

The straightforward implementation described above makes very poor use of experience data, since each time step contributes to only one of the sample solutions. Here we develop an alternate approach that draws upon recent work in Bayesian MDP models [2]. This technique regards the successor state that results from a given state-action pair as a random variable drawn from a multinomial distribution. Using Bayesian parameter estimation techniques, we start with a prior probability distribution over the parameters of each multinomial and then update these distributions given experience data. The joint distribution over the transition probabilities and one-step rewards for each state-action pair comprises the Bayesian model. This Bayesian MDP model thus represents a single probability distribution over MDPs whose mean converges in the limit to the MDP that generated the data. In our approach we use all of the experience data to learn a single Bayesian model of the domain. We then draw sample MDPs that are independent given the model and apply Monte Carlo simulation to make probabilistic statements about the Q-values of the underlying MDP. We directly estimate the probability that an action is optimal in a given state (given our prior distribution) as the fraction of samples in which the action is in fact optimal. We then accept the hypothesis that an action is optimal unless the estimated probability of optimality is too low: we accept Q̂(x, y, a) = V̂(x, y) if and only if

Pr(Q(x, y, a) = V(x, y) | h) ≥ p,

where h denotes the observed experience data and p is a significance level as in classical hypothesis testing.

5 Results

We verified the correctness of both our statistical hypothesis testing and Monte Carlo approaches on the stochastic version of the Taxi domain. Both methods generally proceed in two phases. In the first phase, we run an established RL algorithm until convergence, perhaps multiple times. In the second phase, we use the output of the RL algorithm to accept or to reject the hypotheses that certain state variables are irrelevant in certain conditions. Evaluating all possible hypotheses of this form would be prohibitively expensive, so here we examine just two cases in the Taxi domain to demonstrate that we can discriminate between hypotheses that we should reject and hypotheses that we should accept. In the first case, the passenger is at the upper left landmark. In this case we wish to show that the passenger's destination is irrelevant to the optimal action, which is always to navigate towards the upper left landmark. In the second case, the passenger is inside the taxi. Here we wish to show that the passenger's destination is not irrelevant, since the optimal action is to navigate towards the destination.

5.1 The Wilcoxon signed ranks test

To obtain the sample Q-values necessary to apply the Wilcoxon signed ranks test, we ran 25 independent instances of Q-learning with Boltzmann exploration, each for enough time steps to ensure convergence to an optimal policy.
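The signed-ranks statistic and its large-n Gaussian approximation, as described in Section 4.1, can be sketched directly. This is a simplified version that ignores ties and zero differences (a production implementation such as scipy.stats.wilcoxon handles those corrections):

```python
import math

def wilcoxon_p(diffs):
    """One-sided p-value against the alternative that the paired
    differences are centered above zero, via the large-n normal
    approximation to the signed-rank statistic W+ (no tie correction)."""
    n = len(diffs)
    # Rank the differences by absolute value; W+ sums the ranks of the
    # positive differences.
    ranked = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank + 1 for rank, i in enumerate(ranked) if diffs[i] > 0)
    # Under the null hypothesis, W+ is approximately Gaussian:
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    # Upper-tail probability of the standard normal.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Here diffs would hold the per-run differences between two sampled Q-values; a small p-value is evidence that the first Q-value genuinely exceeds the second.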
Then, for the two cases above, we applied the Wilcoxon signed ranks test to determine, for each possible location of the taxi, the maximum significance level at which we would conclude that the passenger's destination is relevant to the optimal policy. The following table displays the values obtained in a typical run for the first case, in which the passenger's destination is not relevant.

[Table: maximum significance levels for each of the 25 taxi locations, passenger at the upper left landmark]

Consider the upper right hand square. The value we obtain there means that there exists some action a for which we accept, across all possible passenger destinations, the null hypothesis that a is optimal, so long as we choose a significance level p below this value. Only if we choose a larger significance level does there exist a passenger destination and an alternative action a′ such that we should reject the hypothesis that a is as good as a′. We see from the table that, on this trial, for sufficiently small p this approach correctly identifies the passenger destination as irrelevant for all 25 taxi locations (when the passenger is currently at the upper left landmark). The next table represents the case when the passenger is inside the taxi and the destination is generally relevant.
[Table: maximum significance levels for each of the 25 taxi locations, passenger inside the taxi]

All but four locations in this case have extremely low p-values, suggesting that we reject the hypothesis that the passenger destination is irrelevant (thus indicating that it is relevant) in these states. In the four locations with higher p-values the passenger destination actually is irrelevant: although the passenger is already inside the taxi, moving north is an optimal first action towards all four of the possible destinations. (Recall the possible passenger destinations as indicated in Figure 1.) These values indicate that on this trial this approach avoids false positive identifications of irrelevant state for suitably small p. In ten trials, the p-values this approach generated for states where the null hypothesis was false never overlapped with those it generated for states where the null hypothesis was true. Over these ten trials, a typical significance level of 0.05 would have correctly classified the relevancy of the passenger destination in every state.

5.2 Monte Carlo simulation

We also validated our Monte Carlo approach on the Taxi domain. We used prioritized sweeping [6] with t_Bored = 10 to ensure that the Bayesian model had at least ten samples for each reachable input to the transition function. We allowed the agent to explore for 40,000 time steps, enough to ensure that it completed its exploration. The agent assumed that the reward function was deterministic, so it knew all the one-step rewards after visiting each state-action pair at least once. In general, if we do not make this assumption, then we must choose some prior distribution over rewards for each state-action pair. Since the Taxi domain has a deterministic reward function, we chose to avoid this complication in the work reported here. Furthermore, we initialized each parameter of the Dirichlet distributions to 0. This prior distribution is not formally a Dirichlet distribution, which assumes that each parameter is positive.
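In code, drawing a transition distribution from this zero-count posterior reduces to normalized gamma draws over the observed successors only, since a Dirichlet sample can be built from independent gamma variates. A sketch under our own naming:

```python
import random

def sample_transition(counts, rng=random):
    """Draw one transition distribution from a Dirichlet posterior whose
    parameters are raw visit counts. Successor states with a zero count
    keep probability exactly 0, matching the improper zero-initialized
    prior described in the text."""
    draws = {s: rng.gammavariate(c, 1.0) if c > 0 else 0.0
             for s, c in counts.items()}
    total = sum(draws.values())
    return {s: g / total for s, g in draws.items()}
```

Each call yields one sampled multinomial for a state-action pair; repeating this for every pair yields one sampled MDP from the Bayesian model.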
However, we can still sample from these distributions by assuming that unobserved state transitions have probability 0. This improper prior has the advantage of yielding a Bayesian model whose mean is identical to the maximum likelihood model, and it is slightly more computationally efficient than the approach of Dearden et al. [2]. After the exploration phase, we sampled 100 MDPs from the learned Bayesian model. We solved each of these using value iteration and examined the same two cases as in Section 5.1. The following table shows, for each of the 25 taxi locations, the maximum probability at which some action is optimal across all passenger destinations, given that the passenger is still waiting at the upper left landmark. In other words, each cell contains the quantity

max_a min_y Pr̂(Q(x, y, a) = V(x, y)),

where x corresponds to the taxi location and passenger location.

[Table: estimated optimality probabilities for each of the 25 taxi locations, passenger at the upper left landmark]

Although these estimated probabilities do not convey the same formal meaning as the significance values that statistical hypothesis tests output, we may interpret them in a somewhat similar fashion. Consider the taxi location with the smallest estimated probability, 0.20. If we start with the null hypothesis that some action is optimal at that location across all passenger destinations, our Monte Carlo simulation gives us no reason to reject that hypothesis, since at least one action was optimal in 20 of the 100 sampled MDPs. The next table shows the estimated probabilities for the second case, when the passenger is inside the taxi.

[Table: estimated optimality probabilities for each of the 25 taxi locations, passenger inside the taxi]

Note that for all the locations where the passenger destination is in fact relevant, no action was optimal across passenger destinations in any of the 100 sampled MDPs. We can easily imagine setting a probability threshold similar in meaning to the significance level of statistical hypothesis tests. We would then reject the null hypothesis only when the estimated probability falls below that threshold. In the ten trials that we ran, a threshold of 0.05 never caused any false negatives but did lead the algorithm erroneously to classify the passenger's destination as relevant in three instances out of 1,000. (In each trial, the destination is irrelevant for each combination of four passenger locations and 25 taxi locations.)

The principal cost of the Monte Carlo approach is computational. The process of learning the Bayesian model, sampling 100 MDPs, and performing value iteration until convergence 100 times required 335 seconds on a 2.8 GHz Pentium 4 CPU, in contrast to the 9 seconds required to run 25 instances of Q-learning and to apply the Wilcoxon signed ranks test. On the other hand, the Monte Carlo approach makes more efficient use of the data, requiring only the 40,000 steps of direct experience with the environment from a single exploration phase instead of 25 complete Q-learning runs. Thus one method emphasizes computational efficiency and the other sample complexity. Learning a state abstraction even from a solved task could be well worth the cost. For example, our implementation of prioritized sweeping required over 24 minutes to solve a random instance of the Taxi domain.
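Each sampled MDP is solved by value iteration, and the optimality probability is then the fraction of samples in which an action attains the maximum value. A compact undiscounted, episodic sketch on a deterministic toy MDP (the sampled Taxi MDPs are stochastic; names are illustrative):

```python
def value_iteration(states, actions, step, goal, tol=1e-9):
    """Undiscounted value iteration for an episodic MDP with a single
    absorbing goal state. step(s, a) returns (next_state, reward)."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue
            backups = []
            for a in actions:
                nxt, reward = step(s, a)
                backups.append(reward + V[nxt])
            best = max(backups)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy 3-state corridor: each move costs -1, entering the goal pays +20,
# echoing the Taxi reward structure. Pr(a optimal in s) would then be
# estimated as the fraction of sampled MDPs in which Q(s, a) equals the
# maximum value found by this routine.
def step(s, a):
    nxt = min(2, s + 1) if a == "right" else max(0, s - 1)
    return nxt, -1.0 + (20.0 if nxt == 2 and s != 2 else 0.0)
```

Running this on the corridor gives V = 18 one step from the penultimate state and V = 19 adjacent to the goal, matching the -1-per-step, +20-at-goal rewards.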
In contrast, solving the original 5 × 5 domain, applying the Monte Carlo approach to discover when the passenger destination was irrelevant, and then using this abstraction solved the same instance in only 12.5 minutes.

As with all forms of statistical hypothesis testing, random chance will occasionally cause these procedures to accept an incorrect hypothesis or to reject a correct hypothesis. Collecting more data can reduce the likelihood of error, but our results show that our approach already discriminates fairly reliably between situations when the passenger destination is relevant to behaving optimally and when it is irrelevant.

6 Related work

Our work bears a strong resemblance to aspects of McCallum's U-tree algorithm [5], which uses statistical hypothesis testing to determine what features to include in its state representation. U-tree is an online instance-based algorithm that adds a state variable to its representation if different values of the variable predict different distributions of expected future reward. The algorithm computes these distributions of values in part from the current representation, resulting in a circularity that prevents it from guaranteeing convergence to an optimal state abstraction. In contrast, our approach explicitly employs only state abstractions that preserve an optimal policy.

Both of the methods we described in Section 4 require information obtained from a complete solution to the given task, so at present they are most likely to be useful for finding state abstractions in small problems that might apply in similar but much larger problems. We leave for future work the question of how we might apply these techniques online to tasks that are not yet fully learned. In this situation the uncertainty in the value function is much larger, and our approach will tend to assume that all state variables are irrelevant in the absence of sufficient evidence to the contrary. We also leave for future work how to determine which candidate state abstractions to test, if we cannot afford to test them all.

7 Conclusion

This paper has addressed the problem of determining which state variables are relevant to the solution of an RL task. We defined the relevancy of a state variable in terms of the existence of an action that is optimal across all values of that state variable. We described two statistical methods for determining whether an action is optimal in a given state. One method applies an established statistical hypothesis test to Q-values obtained from independent runs of an RL algorithm. This method is as computationally efficient as the RL algorithm used. The other method applies Monte Carlo simulation to a learned Bayesian model and requires far less experience data. Finally, we demonstrated that both methods accurately identify the conditions under which a certain state variable is irrelevant in the Taxi domain.

Acknowledgments

We would like to thank Greg Kuhlmann for helpful comments and suggestions. This research was supported in part by NSF CAREER award IIS.

References

1. Andrew G. Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13:41-77, 2003. Special Issue on Reinforcement Learning.
2. Richard Dearden, Nir Friedman, and David Andre. Model based Bayesian exploration. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999.
3. Morris H. DeGroot. Probability and Statistics. Addison-Wesley, 2nd edition, 1986.
4. Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
5. Andrew Kachites McCallum. Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, University of Rochester, 1995.
6. Andrew W. Moore and Christopher G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13:103-130, 1993.
7. Ronald Parr and Stuart Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems 10, 1998.
8. Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181-211, 1999.
More informationModeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model
Modeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model Ryo Nakahashi Chris L. Baker and Joshua B. Tenenbaum Computer Science and Artificial Intelligence
More informationUSE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1
Ecology, 75(3), 1994, pp. 717-722 c) 1994 by the Ecological Society of America USE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1 OF CYNTHIA C. BENNINGTON Department of Biology, West
More informationSLAUGHTER PIG MARKETING MANAGEMENT: UTILIZATION OF HIGHLY BIASED HERD SPECIFIC DATA. Henrik Kure
SLAUGHTER PIG MARKETING MANAGEMENT: UTILIZATION OF HIGHLY BIASED HERD SPECIFIC DATA Henrik Kure Dina, The Royal Veterinary and Agricuural University Bülowsvej 48 DK 1870 Frederiksberg C. kure@dina.kvl.dk
More information1 What is an Agent? CHAPTER 2: INTELLIGENT AGENTS
1 What is an Agent? CHAPTER 2: INTELLIGENT AGENTS http://www.csc.liv.ac.uk/ mjw/pubs/imas/ The main point about agents is they are autonomous: capable of acting independently, exhibiting control over their
More informationGene Selection for Tumor Classification Using Microarray Gene Expression Data
Gene Selection for Tumor Classification Using Microarray Gene Expression Data K. Yendrapalli, R. Basnet, S. Mukkamala, A. H. Sung Department of Computer Science New Mexico Institute of Mining and Technology
More informationEvolutionary Programming
Evolutionary Programming Searching Problem Spaces William Power April 24, 2016 1 Evolutionary Programming Can we solve problems by mi:micing the evolutionary process? Evolutionary programming is a methodology
More informationSurvival Skills for Researchers. Study Design
Survival Skills for Researchers Study Design Typical Process in Research Design study Collect information Generate hypotheses Analyze & interpret findings Develop tentative new theories Purpose What is
More informationObjectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests
Objectives Quantifying the quality of hypothesis tests Type I and II errors Power of a test Cautions about significance tests Designing Experiments based on power Evaluating a testing procedure The testing
More informationAI: Intelligent Agents. Chapter 2
AI: Intelligent Agents Chapter 2 Outline Agents and environments Rationality PEAS (Performance measure, Environment, Actuators, Sensors) Environment types Agent types Agents An agent is anything
More informationTwo-sided Bandits and the Dating Market
Two-sided Bandits and the Dating Market Sanmay Das Center for Biological and Computational Learning Massachusetts Institute of Technology Cambridge, MA 02139 sanmay@mit.edu Emir Kamenica Department of
More informationAgents and Environments
Agents and Environments Berlin Chen 2004 Reference: 1. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Chapter 2 AI 2004 Berlin Chen 1 What is an Agent An agent interacts with its
More informationMITOCW conditional_probability
MITOCW conditional_probability You've tested positive for a rare and deadly cancer that afflicts 1 out of 1000 people, based on a test that is 99% accurate. What are the chances that you actually have
More informationDecision Analysis. John M. Inadomi. Decision trees. Background. Key points Decision analysis is used to compare competing
5 Decision Analysis John M. Inadomi Key points Decision analysis is used to compare competing strategies of management under conditions of uncertainty. Various methods may be employed to construct a decision
More informationLec 02: Estimation & Hypothesis Testing in Animal Ecology
Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then
More informationSparse Coding in Sparse Winner Networks
Sparse Coding in Sparse Winner Networks Janusz A. Starzyk 1, Yinyin Liu 1, David Vogel 2 1 School of Electrical Engineering & Computer Science Ohio University, Athens, OH 45701 {starzyk, yliu}@bobcat.ent.ohiou.edu
More informationProgress in Risk Science and Causality
Progress in Risk Science and Causality Tony Cox, tcoxdenver@aol.com AAPCA March 27, 2017 1 Vision for causal analytics Represent understanding of how the world works by an explicit causal model. Learn,
More informationChapter 11. Experimental Design: One-Way Independent Samples Design
11-1 Chapter 11. Experimental Design: One-Way Independent Samples Design Advantages and Limitations Comparing Two Groups Comparing t Test to ANOVA Independent Samples t Test Independent Samples ANOVA Comparing
More informationOutlier Analysis. Lijun Zhang
Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based
More informationIntelligent Agents. Russell and Norvig: Chapter 2
Intelligent Agents Russell and Norvig: Chapter 2 Intelligent Agent? sensors agent actuators percepts actions environment Definition: An intelligent agent perceives its environment via sensors and acts
More informationMarcus Hutter Canberra, ACT, 0200, Australia
Marcus Hutter Canberra, ACT, 0200, Australia http://www.hutter1.net/ Australian National University Abstract The approaches to Artificial Intelligence (AI) in the last century may be labelled as (a) trying
More informationERA: Architectures for Inference
ERA: Architectures for Inference Dan Hammerstrom Electrical And Computer Engineering 7/28/09 1 Intelligent Computing In spite of the transistor bounty of Moore s law, there is a large class of problems
More informationRemarks on Bayesian Control Charts
Remarks on Bayesian Control Charts Amir Ahmadi-Javid * and Mohsen Ebadi Department of Industrial Engineering, Amirkabir University of Technology, Tehran, Iran * Corresponding author; email address: ahmadi_javid@aut.ac.ir
More informationTime Experiencing by Robotic Agents
Time Experiencing by Robotic Agents Michail Maniadakis 1 and Marc Wittmann 2 and Panos Trahanias 1 1- Foundation for Research and Technology - Hellas, ICS, Greece 2- Institute for Frontier Areas of Psychology
More informationHebbian Plasticity for Improving Perceptual Decisions
Hebbian Plasticity for Improving Perceptual Decisions Tsung-Ren Huang Department of Psychology, National Taiwan University trhuang@ntu.edu.tw Abstract Shibata et al. reported that humans could learn to
More informationBayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis
Bayesian Models for Combining Data Across Subjects and Studies in Predictive fmri Data Analysis Thesis Proposal Indrayana Rustandi April 3, 2007 Outline Motivation and Thesis Preliminary results: Hierarchical
More informationIntroduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018
Introduction to Machine Learning Katherine Heller Deep Learning Summer School 2018 Outline Kinds of machine learning Linear regression Regularization Bayesian methods Logistic Regression Why we do this
More informationArtificial Intelligence Programming Probability
Artificial Intelligence Programming Probability Chris Brooks Department of Computer Science University of San Francisco Department of Computer Science University of San Francisco p.1/25 17-0: Uncertainty
More informationBayesian Nonparametric Methods for Precision Medicine
Bayesian Nonparametric Methods for Precision Medicine Brian Reich, NC State Collaborators: Qian Guan (NCSU), Eric Laber (NCSU) and Dipankar Bandyopadhyay (VCU) University of Illinois at Urbana-Champaign
More informationSUPPLEMENTAL MATERIAL
1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across
More informationCitation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.
University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document
More informationRepresenting Problems (and Plans) Using Imagery
Representing Problems (and Plans) Using Imagery Samuel Wintermute University of Michigan 2260 Hayward St. Ann Arbor, MI 48109-2121 swinterm@umich.edu Abstract In many spatial problems, it can be difficult
More informationBelief Management for Autonomous Robots using History-Based Diagnosis
Belief Management for Autonomous Robots using History-Based Diagnosis Stephan Gspandl, Ingo Pill, Michael Reip, Gerald Steinbauer Institute for Software Technology Graz University of Technology Inffeldgasse
More informationIntelligent Machines That Act Rationally. Hang Li Bytedance AI Lab
Intelligent Machines That Act Rationally Hang Li Bytedance AI Lab Four Definitions of Artificial Intelligence Building intelligent machines (i.e., intelligent computers) Thinking humanly Acting humanly
More informationIntelligent Machines That Act Rationally. Hang Li Toutiao AI Lab
Intelligent Machines That Act Rationally Hang Li Toutiao AI Lab Four Definitions of Artificial Intelligence Building intelligent machines (i.e., intelligent computers) Thinking humanly Acting humanly Thinking
More informationEmergence of Emotional Appraisal Signals in Reinforcement Learning Agents
Autonomous Agents and Multiagent Systems manuscript No. (will be inserted by the editor) Emergence of Emotional Appraisal Signals in Reinforcement Learning Agents Pedro Sequeira Francisco S. Melo Ana Paiva
More informationExploring the Influence of Particle Filter Parameters on Order Effects in Causal Learning
Exploring the Influence of Particle Filter Parameters on Order Effects in Causal Learning Joshua T. Abbott (joshua.abbott@berkeley.edu) Thomas L. Griffiths (tom griffiths@berkeley.edu) Department of Psychology,
More informationAnalyses of Markov decision process structure regarding the possible strategic use of interacting memory systems
COMPUTATIONAL NEUROSCIENCE ORIGINAL RESEARCH ARTICLE published: 24 December 2008 doi: 10.3389/neuro.10.006.2008 Analyses of Markov decision process structure regarding the possible strategic use of interacting
More informationLecture 13: Finding optimal treatment policies
MACHINE LEARNING FOR HEALTHCARE 6.S897, HST.S53 Lecture 13: Finding optimal treatment policies Prof. David Sontag MIT EECS, CSAIL, IMES (Thanks to Peter Bodik for slides on reinforcement learning) Outline
More informationSawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.
Sawtooth Software RESEARCH PAPER SERIES MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB Bryan Orme, Sawtooth Software, Inc. Copyright 009, Sawtooth Software, Inc. 530 W. Fir St. Sequim,
More informationI. INTRODUCTION /$ IEEE 70 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 2, NO. 2, JUNE 2010
70 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 2, NO. 2, JUNE 2010 Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective Satinder Singh, Richard L. Lewis, Andrew G. Barto,
More informationExpert System Profile
Expert System Profile GENERAL Domain: Medical Main General Function: Diagnosis System Name: INTERNIST-I/ CADUCEUS (or INTERNIST-II) Dates: 1970 s 1980 s Researchers: Ph.D. Harry Pople, M.D. Jack D. Myers
More informationIntroduction to Artificial Intelligence 2 nd semester 2016/2017. Chapter 2: Intelligent Agents
Introduction to Artificial Intelligence 2 nd semester 2016/2017 Chapter 2: Intelligent Agents Mohamed B. Abubaker Palestine Technical College Deir El-Balah 1 Agents and Environments An agent is anything
More informationPlan Recognition through Goal Graph Analysis
Plan Recognition through Goal Graph Analysis Jun Hong 1 Abstract. We present a novel approach to plan recognition based on a two-stage paradigm of graph construction and analysis. First, a graph structure
More informationArtificial Intelligence
Artificial Intelligence Intelligent Agents Chapter 2 & 27 What is an Agent? An intelligent agent perceives its environment with sensors and acts upon that environment through actuators 2 Examples of Agents
More informationUsing Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s
Using Bayesian Networks to Analyze Expression Data Xu Siwei, s0789023 Muhammad Ali Faisal, s0677834 Tejal Joshi, s0677858 Outline Introduction Bayesian Networks Equivalence Classes Applying to Expression
More informationBayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm
Journal of Social and Development Sciences Vol. 4, No. 4, pp. 93-97, Apr 203 (ISSN 222-52) Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Henry De-Graft Acquah University
More informationBayesian (Belief) Network Models,
Bayesian (Belief) Network Models, 2/10/03 & 2/12/03 Outline of This Lecture 1. Overview of the model 2. Bayes Probability and Rules of Inference Conditional Probabilities Priors and posteriors Joint distributions
More informationBayesian models of inductive generalization
Bayesian models of inductive generalization Neville E. Sanjana & Joshua B. Tenenbaum Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 239 nsanjana, jbt @mit.edu
More informationSheila Barron Statistics Outreach Center 2/8/2011
Sheila Barron Statistics Outreach Center 2/8/2011 What is Power? When conducting a research study using a statistical hypothesis test, power is the probability of getting statistical significance when
More informationA Reinforcement Learning Approach Involving a Shortest Path Finding Algorithm
Proceedings of the 003 IEEE/RSJ Intl. Conference on Intelligent Robots and Systems Proceedings Las Vegas, of Nevada the 003 October IEEE/RSJ 003 Intl. Conference on Intelligent Robots and Systems Las Vegas,
More informationIntroduction. Patrick Breheny. January 10. The meaning of probability The Bayesian approach Preview of MCMC methods
Introduction Patrick Breheny January 10 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/25 Introductory example: Jane s twins Suppose you have a friend named Jane who is pregnant with twins
More informationProbabilistic Graphical Models: Applications in Biomedicine
Probabilistic Graphical Models: Applications in Biomedicine L. Enrique Sucar, INAOE Puebla, México May 2012 What do you see? What we see depends on our previous knowledge (model) of the world and the information
More informationHow Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis?
How Does Analysis of Competing Hypotheses (ACH) Improve Intelligence Analysis? Richards J. Heuer, Jr. Version 1.2, October 16, 2005 This document is from a collection of works by Richards J. Heuer, Jr.
More informationTHE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER
THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER Introduction, 639. Factor analysis, 639. Discriminant analysis, 644. INTRODUCTION
More informationFurther Properties of the Priority Rule
Further Properties of the Priority Rule Michael Strevens Draft of July 2003 Abstract In Strevens (2003), I showed that science s priority system for distributing credit promotes an allocation of labor
More informationLearning Navigational Maps by Observing the Movement of Crowds
Learning Navigational Maps by Observing the Movement of Crowds Simon T. O Callaghan Australia, NSW s.ocallaghan@acfr.usyd.edu.au Surya P. N. Singh Australia, NSW spns@acfr.usyd.edu.au Fabio T. Ramos Australia,
More informationEECS 433 Statistical Pattern Recognition
EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern
More informationCognitive modeling versus game theory: Why cognition matters
Cognitive modeling versus game theory: Why cognition matters Matthew F. Rutledge-Taylor (mrtaylo2@connect.carleton.ca) Institute of Cognitive Science, Carleton University, 1125 Colonel By Drive Ottawa,
More informationA Learning Method of Directly Optimizing Classifier Performance at Local Operating Range
A Learning Method of Directly Optimizing Classifier Performance at Local Operating Range Lae-Jeong Park and Jung-Ho Moon Department of Electrical Engineering, Kangnung National University Kangnung, Gangwon-Do,
More informationA SYSTEM FOR COMPUTER-AIDED DIAGNOSIS
p- %'IK- _'^ PROBLEM- SOLVING STRATEGIES IN A SYSTEM FOR COMPUTER-AIDED DIAGNOSIS 268-67 George Anthony Gorry June 1967 RECEIVED JUN 26 1967 Abstract A system consisting of a diagnostic program and a
More informationMODELING NONCOMPENSATORY CHOICES WITH A COMPENSATORY MODEL FOR A PRODUCT DESIGN SEARCH
Proceedings of the ASME 2015 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference IDETC/CIE 2015 August 2 5, 2015, Boston, Massachusetts, USA DETC2015-47632
More informationKatsunari Shibata and Tomohiko Kawano
Learning of Action Generation from Raw Camera Images in a Real-World-Like Environment by Simple Coupling of Reinforcement Learning and a Neural Network Katsunari Shibata and Tomohiko Kawano Oita University,
More informationA Comparison of Collaborative Filtering Methods for Medication Reconciliation
A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,
More informationDopamine neurons activity in a multi-choice task: reward prediction error or value function?
Dopamine neurons activity in a multi-choice task: reward prediction error or value function? Jean Bellot 1,, Olivier Sigaud 1,, Matthew R Roesch 3,, Geoffrey Schoenbaum 5,, Benoît Girard 1,, Mehdi Khamassi
More informationAssigning B cell Maturity in Pediatric Leukemia Gabi Fragiadakis 1, Jamie Irvine 2 1 Microbiology and Immunology, 2 Computer Science
Assigning B cell Maturity in Pediatric Leukemia Gabi Fragiadakis 1, Jamie Irvine 2 1 Microbiology and Immunology, 2 Computer Science Abstract One method for analyzing pediatric B cell leukemia is to categorize
More information26:010:557 / 26:620:557 Social Science Research Methods
26:010:557 / 26:620:557 Social Science Research Methods Dr. Peter R. Gillett Associate Professor Department of Accounting & Information Systems Rutgers Business School Newark & New Brunswick 1 Overview
More informationBayesian Networks in Medicine: a Model-based Approach to Medical Decision Making
Bayesian Networks in Medicine: a Model-based Approach to Medical Decision Making Peter Lucas Department of Computing Science University of Aberdeen Scotland, UK plucas@csd.abdn.ac.uk Abstract Bayesian
More informationFoundations of AI. 10. Knowledge Representation: Modeling with Logic. Concepts, Actions, Time, & All the Rest
Foundations of AI 10. Knowledge Representation: Modeling with Logic Concepts, Actions, Time, & All the Rest Wolfram Burgard, Andreas Karwath, Bernhard Nebel, and Martin Riedmiller 10/1 Contents Knowledge
More informationDynamic Rule-based Agent
International Journal of Engineering Research and Technology. ISSN 0974-3154 Volume 11, Number 4 (2018), pp. 605-613 International Research Publication House http://www.irphouse.com Dynamic Rule-based
More informationRational Agents (Chapter 2)
Rational Agents (Chapter 2) Agents An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators Example: Vacuum-Agent Percepts:
More informationModel-Based fmri Analysis. Will Alexander Dept. of Experimental Psychology Ghent University
Model-Based fmri Analysis Will Alexander Dept. of Experimental Psychology Ghent University Motivation Models (general) Why you ought to care Model-based fmri Models (specific) From model to analysis Extended
More information