The optimism bias may support rational action
Falk Lieder, Sidharth Goel, Ronald Kwan, Thomas L. Griffiths
University of California, Berkeley

1 Introduction

People systematically overestimate the probability of good outcomes [1] and systematically underestimate how long it will take to achieve them [2]. From an epistemic perspective, optimism is irrational because it misrepresents the information that we have been given. Yet, surprisingly, optimistic people often perform better than their more realistic peers [1]. How can it be that being irrational leads to better performance than being more rational? In this abstract, we explore a potential solution to this puzzle. Concretely, we investigate the hypothesis that overestimating the probability of achieving good outcomes can compensate for the cognitive limitations that prevent us from looking far ahead into the future and fully considering the long-term consequences of our actions. Previous work in reinforcement learning has used different notions of optimism to promote learning through exploration and thereby benefit the returns of future decisions [3-7]. Here, we investigate the immediate benefits of optimism for the returns of present actions rather than for learning and future returns, and we explore a different notion of optimism that formalizes the psychological theory that people overestimate the probability of good events relative to bad events.

2 Model

We model the decision environment E as a Markov decision process (MDP) [8]

    E = (S, A, T, γ, r),    (1)

where S is the set of states, A the set of actions, T the transition probabilities, γ = 1 the discount factor, and r the reward function. We model the agent's internal model M of the environment E by the MDP

    M = (S, A, T̂, γ, r),    (2)

whose transition probabilities T̂ may differ from the true probabilities T.
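To make the two tuples concrete, the following is a minimal sketch (our own illustration, not code from the paper) of how the environment E and the internal model M can be represented: the two share S, A, γ, and r and differ only in their transition tensors.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MDP:
    """An MDP (S, A, T, gamma, r); S and A are implicit in the array shapes:
    T[s, a, s'] are transition probabilities, r[s, a] expected rewards."""
    T: np.ndarray
    r: np.ndarray
    gamma: float = 1.0

# A two-state, two-action toy environment (hypothetical numbers).
T = np.zeros((2, 2, 2))
T[:, 0, 0] = 1.0                  # action 0 always leads to state 0
T[:, 1, 1] = 1.0                  # action 1 always leads to state 1
r = np.array([[0.0, 1.0], [0.0, 1.0]])

E = MDP(T=T, r=r)                 # true environment (Equation 1)
T_hat = T.copy()                  # subjective beliefs; may be distorted
M = MDP(T=T_hat, r=r)             # internal model (Equation 2)
```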
Concretely, we use this distortion of the transition probabilities to model optimism and pessimism according to

    T̂_α(s′ | s, a) ∝ T(s′ | s, a) · sig(V_E(s′) − V_E(s))^α,    (3)

where sig is the sigmoid function, V_E is the optimal value function of the MDP E, α = 0 corresponds to realism, α > 0 corresponds to optimism, and α < 0 corresponds to pessimism. We model the implications of bounded cognitive resources on people's performance in sequential decision-making by assuming that they can look only h steps ahead and therefore act according to the policy

    π_h(s) = argmax_a Q_h(s, a),    (4)

    Q_h(s_t, a) = E_T̂[ r(s_t, a, S_{t+1}) + max_π Σ_{i=t+1}^{t+h−1} r(S_i, π(S_i), S_{i+1}) ],    (5)

where the expectation E_T̂ is taken with respect to the subjective transition probabilities T̂. We compute this solution using backward induction with planning horizon h [9].
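A small numerical illustration of Equation 3 (the two-state example and its values are our own, not from the paper): raising the sigmoid-weighted value difference to a positive power α inflates the subjective probability of the better successor state, a negative α deflates it, and α = 0 recovers the true probabilities.

```python
import numpy as np

def sig(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def distort(T_row, V_next, V_now, alpha):
    """Equation 3: reweight the true transition probabilities T(s'|s,a)
    by sig(V_E(s') - V_E(s))**alpha and renormalize."""
    w = T_row * sig(V_next - V_now) ** alpha
    return w / w.sum()

# Hypothetical setup: from a state with value 0, an action leads to a
# good state (value +2) or a bad state (value -2) with equal probability.
T_row = np.array([0.5, 0.5])
V_next = np.array([2.0, -2.0])

p_realist = distort(T_row, V_next, 0.0, alpha=0.0)    # unchanged
p_optimist = distort(T_row, V_next, 0.0, alpha=2.0)   # good state inflated
p_pessimist = distort(T_row, V_next, 0.0, alpha=-2.0) # good state deflated
```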
Figure 1: Illustration of the MDP structure used in the simulations and the experiment.

3 Simulation of sequential decision-making with delayed rewards

One of the key challenges in decision-making is that immediate rewards are sometimes misaligned with long-term rewards. This problem is exacerbated by the limited number of steps that agents can plan ahead. Hence, decision-makers tend to underweight distant uncertain rewards relative to immediate certain rewards [10]. Our theory predicts that optimism can compensate for this problem. To illustrate this prediction, we simulated the effect of optimism on a bounded agent's performance in the sequential decision problem illustrated in Figure 1. The states s_0, …, s_100 represent the agent's progress towards the goal: they correspond to having completed 0% to 100% of the work needed to reach it. At each point in time the agent can choose between working towards the goal (a_1), which costs effort and resources (r_1 = −1), and leisure (a_0), which generates a small reward (r_0 = +1) but does not advance the agent's progress. Once the goal has been achieved (S_t = 100%), the agent can reap a large reward (r_2 = +100). This task is a finite-horizon MDP lasting 12 rounds. The agent can plan only 5 rounds ahead (h = 5), and its average rate of progress when working towards the goal for one round is 20%:

    T(s′ | s, a_1) = Binomial(s′ − s; 100, 0.2).    (6)

To simulate the effects of optimism and pessimism on decision-making in this environment, we computed the policy π_5 (Equations 4-5) for the internal models M_pessimism, M_realism, and M_optimism with T̂_α for α_pessimism = −10, α_realism = 0, and α_optimism = 10, respectively, and simulated the performance of the resulting myopic policies in the environment E. We found that the bounded agent whose model of the world was accurate (myopic realism) performed much worse than the optimal policy.
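The chain simulation can be sketched in code. Details the abstract leaves open are filled in by assumption here: the goal reward r_2 is collected through the leisure/marketing action a_0 once the agent is at 100%, and V_E is approximated by the 12-round optimal values of the true MDP. With different choices for these details the optimistic agent's first action can change, so this sketch illustrates the mechanics of Equations 3-6 rather than reproducing the exact reported results.

```python
import math
import numpy as np

N = 101                          # states s_0..s_100: percent of work completed
ROUNDS, H = 12, 5                # task length and planning horizon
R0, R1, R2 = 1.0, -1.0, 100.0    # leisure, work cost, reward at the goal

# Equation 6: one round of work advances progress by ~Binomial(100, 0.2)
# percentage points (mean 20%), capped at 100%.
pmf = np.array([math.comb(100, k) * 0.2**k * 0.8**(100 - k) for k in range(N)])
T1 = np.zeros((N, N))            # T(s' | s, a_1)
for s in range(N):
    for k in range(N):
        T1[s, min(s + k, 100)] += pmf[k]

def backward_induction(T_work, horizon):
    """Equations 4-5: h-step lookahead values and first-step policy.
    Assumption: leisure leaves the state unchanged, paying r0 before the
    goal and r2 once the goal state s_100 has been reached."""
    V = np.zeros(N)
    for _ in range(horizon):
        q0 = np.where(np.arange(N) == 100, R2, R0) + V   # leisure (a_0)
        q1 = R1 + T_work @ V                             # work (a_1)
        pi = (q1 > q0).astype(int)
        V = np.maximum(q0, q1)
    return V, pi

def distort(T, V, alpha):
    """Equation 3: reweight transitions by sig(V(s') - V(s))**alpha."""
    x = np.clip(V[None, :] - V[:, None], -30.0, 30.0)    # avoid overflow
    W = T * (1.0 / (1.0 + np.exp(-x))) ** alpha
    return W / W.sum(axis=1, keepdims=True)

V_E, _ = backward_induction(T1, ROUNDS)   # stand-in for V_E of the true MDP
_, pi_realist = backward_induction(T1, H)
_, pi_optimist = backward_induction(distort(T1, V_E, 10.0), H)
_, pi_pessimist = backward_induction(distort(T1, V_E, -10.0), H)
print("first action at 0% (0 = leisure, 1 = work):",
      pi_realist[0], pi_optimist[0], pi_pessimist[0])
```

Under these assumptions the realistic and pessimistic 5-step planners choose leisure in the initial state, and optimism shifts the subjective progress distribution toward larger jumps.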
While the optimal policy chooses action a_1 until it reaches the goal state in almost all cases, the myopic realistic agent always chose action a_0 in state s_0 (0%) and consequently never reached the goal state (s_100). By contrast, the optimistic bounded agent always chose to invest effort (a_1) in the initial state and consequently performed optimally (see Figure 2A). The pessimistic bounded agent performed at the same level as the realistic one. Note that the optimistic agent exhibited the planning fallacy [2]: while the expected completion time following the optimal policy was 4.5 months, the optimistic agent's estimate of the expected completion time was only 3.9 months. This made it worthwhile for the optimistic bounded agent to pursue the goal even though it was thinking only 5 steps ahead. Thus, the irrational planning fallacy led to rational action. Hence, according to our theory, people with a very accurate model of the world might perform worse in sequential decision problems with the chain structure shown in Figure 1 than their more optimistic peers. To test this prediction, we have planned an experiment that induces people to be either optimistic, realistic, or pessimistic and then measures their performance in the chain-structured MDP shown in Figure 1, and we conducted a pilot experiment to tune the proposed experimental design.

4 Pilot Experiment

To evaluate the effectiveness of our experimental manipulations and to determine the decisions that our model would predict for the resulting internal models, we conducted a pilot experiment.
Figure 2: A: Performance of the optimistic, realistic, and pessimistic myopic bounded agents and of the optimal policy in the decision environment shown in Figure 1. B: Simulation of the main experiment by condition, depending on how many steps people can plan ahead.

4.1 Methods

We recruited 200 adult participants on Amazon Mechanical Turk. Participants received $0.50 for about 6 minutes of work. Eight participants were excluded because their answers to the survey questions indicated that they had not performed the task. Participants solved a sequential decision problem with the structure shown in Figure 1. To convey this structure to our participants, we created a game called Product Manager. In this game participants play the manager of a car company. In each round (month) the participant decides whether the company will focus on marketing the existing product, SportsCar (a_0), or invest in the development of a new product, HoverCar (a_1). Participants started with a capital of $ and their task was to maximize the company's capital after 4, 12, 24, or 72 months. The reward for marketing the old product was drawn from a normal distribution with mean $ and standard deviation $1,000 (r_0 ∼ N(μ = $ , σ = $1,000)), the reward for investing in development was drawn from a normal distribution with mean $ and standard deviation $1,500 (r_1 ∼ N(μ = $ , σ = $1,500)), and the reward for marketing the new product was normally distributed with mean $ and standard deviation $ (r_2 ∼ N(μ = $ , σ = $ )). In each round the participant was shown the current state (e.g., "HoverCar is currently 0% developed."), the number of the current round and the total number of rounds, their current balance, and the rewards of their most recent decision. The experiment was structured into three blocks: instructions, a training block, and a survey.
The instructions introduced the game and informed participants about the number of rounds, the return on marketing the existing product, the cost of developing the new product, and the return on marketing the new product. In the training block, the participants' task was to explore the effects of investing in development versus marketing in three simulations lasting 10 rounds each. Finally, the survey asked participants to estimate the average rate of progress that occurred when they decided to invest in development and marketing, respectively. Each participant was randomly assigned to one of three experimental conditions. The three conditions differed in the rate of progress used in the simulations of the training phase and were designed to induce pessimism, realism, and optimism, respectively. In the pessimism condition, the average rate of progress in the training block was half the true rate of progress; in the realism condition, it was equal to the true rate of progress; and in the optimism condition, it was twice the true rate of progress. The true rate of progress was set such that the expected number of investments needed to reach 100% development was 80% of the total duration of the game.

4.2 Results and Discussion

To examine the effectiveness of our experimental manipulations, we compared the three groups' estimates of the average amount of progress achieved by a single investment in product development. We found that people estimated the rate of progress to be higher in the optimism condition than in the realism condition (t(123.7) = 2.30, p = 0.01) and the pessimism condition (t(126.2) = 2.59, p < 0.01). The difference between the realism condition and the pessimism condition was not
statistically significant (t(122.5) = 0.53, p = 0.30). In conclusion, our experimental manipulation was successful at inducing optimism and created significant group differences. Furthermore, we found that each group's estimate of the rate of progress was significantly higher than the rate of progress in the examples they had observed. In the pessimism condition, people overestimated the presented rate of progress by 12.4% (t(65) = 4.16, p < ). In the realism condition, participants overestimated the presented rate of progress by 7.4% (t(61) = 2.94, p = ), and in the optimism condition, people overestimated the presented rate of progress by 5.2% (t(63) = 1.66, p = 0.05). The overestimation of the frequency of positive events is consistent with the optimism bias [1]. Interestingly, the optimism bias decreased with the true frequency. Hence, at least in our study, the optimism bias could result from Bayesian inference with an optimistic prior.

5 Planned Main Experiment

The main experiment will use the paradigm of the pilot experiment with the addition of a test block following the training block. In the test block, participants will play the Product Manager game for 4, 12, 24, or 72 rounds starting at 0% progress. Participants will receive a financial bonus of up to $2 proportional to their capital at the end of the test phase.

5.1 Model Predictions

We used the subjective transition probabilities induced by our experimental manipulations in the pilot experiment to derive our model's predictions for the results of the main experiment. As shown in Figure 2B, our simulation shows that optimism shortens the number of steps that people have to plan ahead to realize that they should invest in product development. Our model predicts that when the game lasts only four rounds, participants in the optimism condition will invest but participants in the realism and pessimism conditions will not.
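The fractional degrees of freedom reported above (e.g., t(123.7)) indicate Welch's unequal-variance t-test, whose df come from the Welch-Satterthwaite approximation. A minimal sketch of that computation, with made-up estimates standing in for our data:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    se2_1, se2_2 = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(se2_1 + se2_2)
    # Welch-Satterthwaite df: generally fractional, at most n1 + n2 - 2
    df = (se2_1 + se2_2) ** 2 / (
        se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical progress estimates (as fractions) from two conditions:
optimism = [0.45, 0.52, 0.48, 0.55]
realism = [0.38, 0.42, 0.40, 0.44]
t, df = welch_t(optimism, realism)
```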
When the game lasts 12 or more rounds, the benefit of optimism over realism depends on how many steps people can plan ahead.

5.2 Next Steps

Since it appears plausible that people plan fewer than 6 steps ahead, we will refine the experimental manipulations to induce larger differences between the three groups' subjective transition probabilities, such that our model predicts a benefit of optimism in the 12-month condition even if people plan only 5 steps ahead. We will test this prediction by running the main experiment with the experimental manipulations determined through iterative piloting. We will use the data to test our theory's prediction that optimism improves people's performance in sequential decision problems in which determining the best action requires planning many steps ahead.

6 Discussion

Our theory suggests that the optimism bias might serve to improve people's decisions in environments in which the rewards for prolonged cumulative effort justify forgoing immediate gratification. According to our model, the optimism bias achieves this by compensating for the cognitive limitations that prevent us from looking far enough into the future to fully consider long-term consequences unless we underestimate how long it will take to achieve them. Hence, at least in some cases, the planning fallacy [2] helps us make better decisions, and it may be a sign of bounded rationality rather than irrationality. This abstract continues the line of work begun by [11] by generalizing the definition of optimism proposed therein from a specific class of decision problems to general decision problems and by testing its predictions empirically. We have demonstrated that this extension can capture the benefits of optimism when obtaining a high reward requires persistent effort. Furthermore, the proposed experiment will be the first to empirically test our boundedly rational theory of optimism.
The beneficial effects of optimism illustrate that for bounded agents there is a tension between epistemic rationality (having beliefs that are as accurate as possible) and instrumental rationality (choosing the actions that maximize one's expected utility). Concretely, our simulations suggest
that bounded agents may have to be epistemically irrational to achieve instrumental rationality [12]. This might be the deeper reason why we are optimistic for ourselves but not for others [2]. In conclusion, our theory suggests that optimism and the planning fallacy might not be irrational after all but may instead reflect the rational use of limited cognitive resources [13, 14].

Acknowledgments. This work was supported by ONR MURI N

References

[1] T. Sharot, "The optimism bias," Current Biology, vol. 21, no. 23, pp. R941-R945, 2011.
[2] R. Buehler, D. Griffin, and M. Ross, "Exploring the 'planning fallacy': Why people underestimate their task completion times," Journal of Personality and Social Psychology, vol. 67, no. 3, p. 366, 1994.
[3] R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proceedings of the Seventh International Conference on Machine Learning, 1990.
[4] P. Auer, "Using confidence bounds for exploitation-exploration trade-offs," Journal of Machine Learning Research, vol. 3, pp. 397-422, 2002.
[5] I. Szita and A. Lőrincz, "The many faces of optimism: A unifying approach," in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008.
[6] P. Sunehag and M. Hutter, "Rationality, optimism and guarantees in general reinforcement learning," Journal of Machine Learning Research, vol. 16, 2015.
[7] P. Sunehag and M. Hutter, "A dual process theory of optimistic cognition," in Proceedings of the 36th Annual Conference of the Cognitive Science Society (P. Bello, M. Guarini, M. McShane, and B. Scassellati, eds.), Austin, TX: Cognitive Science Society, 2014.
[8] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[9] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 1994.
[10] J. Myerson and L. Green, "Discounting of delayed rewards: Models of individual choice," Journal of the Experimental Analysis of Behavior, vol. 64, pp. 263-276, 1995.
[11] R. Neumann, A. N. Rafferty, and T. L. Griffiths, "A bounded rationality account of wishful thinking," in Proceedings of the 36th Annual Conference of the Cognitive Science Society (P. Bello, M. Guarini, M. McShane, and B. Scassellati, eds.), Austin, TX: Cognitive Science Society, 2014.
[12] F. Lieder, M. Hsu, and T. L. Griffiths, "The high availability of extreme events serves resource-rational decision-making," in Proceedings of the 36th Annual Conference of the Cognitive Science Society (P. Bello, M. Guarini, M. McShane, and B. Scassellati, eds.), Austin, TX: Cognitive Science Society, 2014.
[13] T. L. Griffiths, F. Lieder, and N. D. Goodman, "Rational use of cognitive resources: Levels of analysis between the computational and the algorithmic," Topics in Cognitive Science, vol. 7, no. 2, pp. 217-229, 2015.
[14] F. Lieder, T. L. Griffiths, and N. D. Goodman, "Burn-in, bias, and the rationality of anchoring," in Advances in Neural Information Processing Systems 25 (P. Bartlett, F. C. N. Pereira, L. Bottou, C. J. C. Burges, and K. Q. Weinberger, eds.), 2012.
More informationRAPID: A Belief Convergence Strategy for Collaborating with Inconsistent Agents
RAPID: A Belief Convergence Strategy for Collaborating with Inconsistent Agents Trevor Sarratt and Arnav Jhala University of California Santa Cruz {tsarratt, jhala}@soe.ucsc.edu Abstract Maintaining an
More informationUsing Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes
Using Eligibility Traces to Find the est Memoryless Policy in Partially Observable Markov Decision Processes John Loch Department of Computer Science University of Colorado oulder, CO 80309-0430 loch@cs.colorado.edu
More informationUNIVERSITY OF CALIFORNIA SANTA CRUZ A STOCHASTIC DYNAMIC MODEL OF THE BEHAVIORAL ECOLOGY OF SOCIAL PLAY
. UNIVERSITY OF CALIFORNIA SANTA CRUZ A STOCHASTIC DYNAMIC MODEL OF THE BEHAVIORAL ECOLOGY OF SOCIAL PLAY A dissertation submitted in partial satisfaction of the requirements for the degree of BACHELOR
More informationIntelligent Machines That Act Rationally. Hang Li Toutiao AI Lab
Intelligent Machines That Act Rationally Hang Li Toutiao AI Lab Four Definitions of Artificial Intelligence Building intelligent machines (i.e., intelligent computers) Thinking humanly Acting humanly Thinking
More informationBottom-Up Model of Strategy Selection
Bottom-Up Model of Strategy Selection Tomasz Smoleń (tsmolen@apple.phils.uj.edu.pl) Jagiellonian University, al. Mickiewicza 3 31-120 Krakow, Poland Szymon Wichary (swichary@swps.edu.pl) Warsaw School
More informationEquilibrium Selection In Coordination Games
Equilibrium Selection In Coordination Games Presenter: Yijia Zhao (yz4k@virginia.edu) September 7, 2005 Overview of Coordination Games A class of symmetric, simultaneous move, complete information games
More informationBayesian Updating: A Framework for Understanding Medical Decision Making
Bayesian Updating: A Framework for Understanding Medical Decision Making Talia Robbins (talia.robbins@rutgers.edu) Pernille Hemmer (pernille.hemmer@rutgers.edu) Yubei Tang (yubei.tang@rutgers.edu) Department
More informationNEW METHODS FOR SENSITIVITY TESTS OF EXPLOSIVE DEVICES
NEW METHODS FOR SENSITIVITY TESTS OF EXPLOSIVE DEVICES Amit Teller 1, David M. Steinberg 2, Lina Teper 1, Rotem Rozenblum 2, Liran Mendel 2, and Mordechai Jaeger 2 1 RAFAEL, POB 2250, Haifa, 3102102, Israel
More informationDopamine neurons activity in a multi-choice task: reward prediction error or value function?
Dopamine neurons activity in a multi-choice task: reward prediction error or value function? Jean Bellot 1,, Olivier Sigaud 1,, Matthew R Roesch 3,, Geoffrey Schoenbaum 5,, Benoît Girard 1,, Mehdi Khamassi
More informationPerfect Bayesian Equilibrium
Perfect Bayesian Equilibrium Econ 400 University of Notre Dame Econ 400 (ND) Perfect Bayesian Equilibrium 1 / 27 Our last equilibrium concept The last equilibrium concept we ll study after Nash eqm, Subgame
More informationKatsunari Shibata and Tomohiko Kawano
Learning of Action Generation from Raw Camera Images in a Real-World-Like Environment by Simple Coupling of Reinforcement Learning and a Neural Network Katsunari Shibata and Tomohiko Kawano Oita University,
More informationA rational analysis of curiosity
A rational analysis of curiosity Rachit Dubey (rach0012@berkeley.edu) Department of Education, University of California at Berkeley, CA, USA Thomas L. Griffiths (tom griffiths@berkeley.edu) Department
More informationAn Economic Model of the Planning Fallacy
An Economic Model of the Planning Fallacy Markus K. Brunnermeier 1, Filippos Papakonstantinou 2 and Jonathan A. Parker 3 1 Princeton University 2 Imperial College 3 Northwestern University Cornell University
More informationSubjective randomness and natural scene statistics
Psychonomic Bulletin & Review 2010, 17 (5), 624-629 doi:10.3758/pbr.17.5.624 Brief Reports Subjective randomness and natural scene statistics Anne S. Hsu University College London, London, England Thomas
More informationSearch e Fall /18/15
Sample Efficient Policy Click to edit Master title style Search Click to edit Emma Master Brunskill subtitle style 15-889e Fall 2015 11 Sample Efficient RL Objectives Probably Approximately Correct Minimizing
More informationModeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model
Modeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model Ryo Nakahashi Chris L. Baker and Joshua B. Tenenbaum Computer Science and Artificial Intelligence
More informationI. INTRODUCTION /$ IEEE 70 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 2, NO. 2, JUNE 2010
70 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 2, NO. 2, JUNE 2010 Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective Satinder Singh, Richard L. Lewis, Andrew G. Barto,
More informationUtility Maximization and Bounds on Human Information Processing
Topics in Cognitive Science (2014) 1 6 Copyright 2014 Cognitive Science Society, Inc. All rights reserved. ISSN:1756-8757 print / 1756-8765 online DOI: 10.1111/tops.12089 Utility Maximization and Bounds
More informationDesigning a Bayesian randomised controlled trial in osteosarcoma. How to incorporate historical data?
Designing a Bayesian randomised controlled trial in osteosarcoma: How to incorporate historical data? C. Brard, L.V Hampson, M-C Le Deley, G. Le Teuff SCT - Montreal, May 18th 2016 Brard et al. Designing
More informationA Cognitive Model of Strategic Deliberation and Decision Making
A Cognitive Model of Strategic Deliberation and Decision Making Russell Golman (rgolman@andrew.cmu.edu) Carnegie Mellon University, Pittsburgh, PA. Sudeep Bhatia (bhatiasu@sas.upenn.edu) University of
More informationWhen contributions make a difference: Explaining order effects in responsibility attribution
Psychon Bull Rev (212) 19:729 736 DOI 1.3758/s13423-12-256-4 BRIEF REPORT When contributions make a difference: Explaining order effects in responsibility attribution Tobias Gerstenberg & David A. Lagnado
More informationBayesian and Frequentist Approaches
Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law
More information9 research designs likely for PSYC 2100
9 research designs likely for PSYC 2100 1) 1 factor, 2 levels, 1 group (one group gets both treatment levels) related samples t-test (compare means of 2 levels only) 2) 1 factor, 2 levels, 2 groups (one
More informationSawtooth Software. The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? RESEARCH PAPER SERIES
Sawtooth Software RESEARCH PAPER SERIES The Number of Levels Effect in Conjoint: Where Does It Come From and Can It Be Eliminated? Dick Wittink, Yale University Joel Huber, Duke University Peter Zandan,
More informationCS343: Artificial Intelligence
CS343: Artificial Intelligence Introduction: Part 2 Prof. Scott Niekum University of Texas at Austin [Based on slides created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All materials
More informationarxiv: v2 [cs.ai] 26 Sep 2018
Manipulating and Measuring Model Interpretability arxiv:1802.07810v2 [cs.ai] 26 Sep 2018 Forough Poursabzi-Sangdeh forough.poursabzi@microsoft.com Microsoft Research Jennifer Wortman Vaughan jenn@microsoft.com
More informationNEW DIRECTIONS FOR POSITIVE ECONOMICS
part iv... NEW DIRECTIONS FOR POSITIVE ECONOMICS... CAPLIN: CHAP10 2008/1/7 15:51 PAGE 247 #1 CAPLIN: CHAP10 2008/1/7 15:51 PAGE 248 #2 chapter 10... LOOK-UPS AS THE WINDOWS OF THE STRATEGIC SOUL... vincent
More informationThinking and Guessing: Bayesian and Empirical Models of How Humans Search
Thinking and Guessing: Bayesian and Empirical Models of How Humans Search Marta Kryven (mkryven@uwaterloo.ca) Department of Computer Science, University of Waterloo Tomer Ullman (tomeru@mit.edu) Department
More informationForgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
Forgetful Bayes and myopic planning: Human learning and decision-maing in a bandit setting Shunan Zhang Department of Cognitive Science University of California, San Diego La Jolla, CA 92093 s6zhang@ucsd.edu
More informationOptimal Design of Biomarker-Based Screening Strategies for Early Detection of Prostate Cancer
Optimal Design of Biomarker-Based Screening Strategies for Early Detection of Prostate Cancer Brian Denton Department of Industrial and Operations Engineering University of Michigan, Ann Arbor, MI October
More informationREPORT DOCUMENTATION PAGE
REPORT DOCUMENTATION PAGE Form Approved OMB NO. 0704-0188 The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
More informationConditional behavior affects the level of evolved cooperation in public good games
Center for the Study of Institutional Diversity CSID Working Paper Series #CSID-2013-007 Conditional behavior affects the level of evolved cooperation in public good games Marco A. Janssen Arizona State
More informationUnderstanding Managerial Decision Risks in IT Project Management: An Integrated Behavioral Decision Analysis Perspective
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2009 Proceedings Americas Conference on Information Systems (AMCIS) 2009 Understanding Managerial Decision Risks in IT Project Management:
More informationOperant matching. Sebastian Seung 9.29 Lecture 6: February 24, 2004
MIT Department of Brain and Cognitive Sciences 9.29J, Spring 2004 - Introduction to Computational Neuroscience Instructor: Professor Sebastian Seung Operant matching Sebastian Seung 9.29 Lecture 6: February
More informationDynamic Simulation of Medical Diagnosis: Learning in the Medical Decision Making and Learning Environment MEDIC
Dynamic Simulation of Medical Diagnosis: Learning in the Medical Decision Making and Learning Environment MEDIC Cleotilde Gonzalez 1 and Colleen Vrbin 2 1 Dynamic Decision Making Laboratory, Carnegie Mellon
More informationSolutions for Chapter 2 Intelligent Agents
Solutions for Chapter 2 Intelligent Agents 2.1 This question tests the student s understanding of environments, rational actions, and performance measures. Any sequential environment in which rewards may
More informationCognitive Modeling. Lecture 9: Intro to Probabilistic Modeling: Rational Analysis. Sharon Goldwater
Cognitive Modeling Lecture 9: Intro to Probabilistic Modeling: Sharon Goldwater School of Informatics University of Edinburgh sgwater@inf.ed.ac.uk February 8, 2010 Sharon Goldwater Cognitive Modeling 1
More informationPartially-Observable Markov Decision Processes as Dynamical Causal Models. Finale Doshi-Velez NIPS Causality Workshop 2013
Partially-Observable Markov Decision Processes as Dynamical Causal Models Finale Doshi-Velez NIPS Causality Workshop 2013 The POMDP Mindset We poke the world (perform an action) Agent World The POMDP Mindset
More informationDescending Marr s levels: Standard observers are no panacea. to appear in Behavioral & Brain Sciences
Descending Marr s levels: Standard observers are no panacea Commentary on D. Rahnev & R.N. Denison, Suboptimality in Perceptual Decision Making, to appear in Behavioral & Brain Sciences Carlos Zednik carlos.zednik@ovgu.de
More information