Emergence of Emotional Appraisal Signals in Reinforcement Learning Agents


Autonomous Agents and Multiagent Systems manuscript No. (will be inserted by the editor)

Emergence of Emotional Appraisal Signals in Reinforcement Learning Agents

Pedro Sequeira · Francisco S. Melo · Ana Paiva

Received: date / Accepted: date

Abstract: The positive impact of emotions in decision-making has long been established in both natural and artificial agents. From the perspective of appraisal theories, emotions complement perceptual information, coloring our sensations and guiding our decision-making. However, when designing autonomous agents, is emotional appraisal the best complement to their perceptions? Mechanisms investigated in affective neuroscience provide support for this hypothesis in biological agents. In this paper, we look for similar support in artificial systems. We adopt the intrinsically motivated reinforcement learning framework to investigate different sources of information that can guide decision-making in learning agents, and an evolutionary approach based on genetic programming to identify a small set of such sources that have the largest impact on the performance of the agent in different tasks, as measured by an external evaluation signal. We then show that these sources of information: (i) are applicable in a wider range of environments than those where the agents evolved; and (ii) exhibit interesting correspondences to emotional appraisal-like signals previously proposed in the literature, pointing towards our initial hypothesis that the appraisal process might indeed provide essential information to complement perceptual capabilities and thus guide decision-making.

Keywords: Emotions · Appraisal theory · Intrinsic motivation · Genetic programming · Reinforcement learning

Accepted manuscript version. The final publication (DOI /s ) is available from the publisher.

P. Sequeira · F.S. Melo · A. Paiva
INESC-ID / Instituto Superior Técnico, Technical University of Lisbon
TagusPark, Edifício IST, Porto Salvo, Portugal
E-mail: pedro.sequeira@gaips.inesc-id.pt; {fmelo,ana.paiva}@inesc-id.pt

1 Introduction

Research in psychology, neuroscience and other related areas has established emotions as a powerful adaptive mechanism that influences cognitive and perceptual processing [6, 8, 27]. Emotions indirectly drive behaviors that lead individuals to act, achieve goals and satisfy needs. Studies show that damage to regions of the brain identified as responsible for emotional processing impairs the ability of humans and animals to properly learn aversive stimuli, plan courses of action and, more generally, make decisions that are advantageous for their well-being [3, 7, 17]. Appraisal theories of emotions [9, 10, 16, 18, 28, 32] suggest that emotions arise from evaluations of specific aspects of the individual's relationship with the environment, providing an adaptive response mechanism to situations occurring therein.

In artificial systems, the area of affective computing (AC) has also investigated the impact of emotional processing capabilities in the development of autonomous agents, often based on appraisal theories of emotions. Appraisal-inspired mechanisms were shown to improve the performance of artificial agents in terms of different metrics, such as robustness, efficiency or believability [23, 29, 31, 33]. In very general terms, computational appraisal models feature an appraisal derivation model that, together with the perceptual information acquired by the agent,^1 guides its decision process (see Fig. 1) [22, 23]. The appraisal signals^2 provided by such a module, also referred to as appraisal variables, translate information about the history of interaction of the agent with its environment that aids decision-making and focuses behavior towards dealing with the situation being evaluated [23]. In other words, such signals complement and color the agent's raw perceptions, indicating, for example, whether a perception is expected or not, or pleasant or not.

Fig. 1 General architecture of an emotional appraisal-based agent [23]: perceptions feed an appraisal derivation module; the resulting appraisal signals, together with the perceptual signal, drive the decision-making module, which selects the agent's actions.^3

^1 Perceptual information can be of an internal nature, e.g., about goals, needs or beliefs, or external, e.g., about objects or events from the environment.
^2 We adopt a rather broad definition of signal. Specifically, we refer to as an appraisal signal any emotional appraisal-based information received and processed, in this case, by the decision-making module.
^3 The diagram does not aim to provide an accurate representation of existing computational appraisal architectures for autonomous agents, but rather to highlight the point that, in such architectures, the decision-making process is driven both by perceptual information from the environment and by some form of appraisal-based information.

Although one of the driving motivations for the use of emotional appraisal-based agent architectures is the creation of better agents (e.g., agents able to successfully perform more complex tasks), one fundamental question remains mostly unaddressed in the literature: in the search for information that may complement an agent's perceptual capabilities, is the emotional appraisal process the best mechanism to provide such information? In this paper we contribute to answering this question, providing empirical evidence that appraisal-like signals may arise as natural candidates when looking for sources of information to complement an agent's perceptual capabilities. Using an evolutionary approach, we show that such signals emerge as sources of information for artificial agents, providing evolutionary advantages. We thus contribute a computational parallel to the evidence observed in biological systems, where the organisms with the most complex emotional processing capabilities are arguably those most fit to their environment [17, 18, 26].

In our study, we rely on intrinsically motivated reinforcement learning (IMRL) agents [39]. The framework of IMRL provides a principled manner to integrate multiple sources of information in the process of learning and decision-making of artificial agents [38].^4 As such, it is a framework naturally suited to our investigation. Starting from an initial population of IMRL agents, each relying on different sources of information to guide their decisions, we use genetic programming to select the agents with maximal fitness.^5 This evolutionary process allows us to identify a minimal set of informative signals that provide general and useful information for decision-making. Finally, we establish a correspondence between the identified sources of information and the information associated with appraisal variables usually identified in the specialized literature. Overall, our experimental study highlights the usefulness of appraisal-like processes in identifying different aspects of a task (the sources of information that complement an agent's perceptual capabilities) in the pursuit of more reliable artificial decision-makers.

Fig. 2 Roadmap for the study in the paper: Identification (Sec. 3), Validation (Sec. 4), Discussion (Sec. 5). We start by identifying optimal sources of information in Section 3. We validate these sources of information in Section 4 and conclude by discussing possible correspondences with appraisal dimensions of emotion in Section 5.

The paper is organized according to the roadmap sketched in Fig. 2. Section 2 introduces the required background and notation on reinforcement learning. Section 3 identifies a minimal set of signals that provide the most useful information to guide IMRL agents. Section 4 analyzes the general applicability of the identified signals in a set of scenarios inspired by the game of Pac-Man. Finally, Section 5 analyzes the identified signals in light of the appraisal theory literature and summarizes our main findings.

^4 These complementary sources of information endow the agent with a richer repertoire of behaviors that may successfully overcome agent limitations [33, 40].
^5 In our approach, we use a fitness metric that directly measures the performance of the agent in the underlying task in different scenarios.

2 Background

As discussed in Section 1, in our study we rely on reinforcement learning (RL) agents. This section reviews basic RL concepts and sets up the notation used throughout the paper. We refer to [13, 42] for a detailed overview of RL.

2.1 Learning and Decision Making

At each time step, and depending on its perception of the environment, an RL agent must choose an action from its action repertoire, in order to meet some pre-specified optimality criterion. Actions determine how the state of the environment evolves over time and, depending on that state, different actions have different values for the agent. Typically, the RL agent knows neither the value nor the effect of its actions, and must thus explore its environment and action repertoire before it can adequately select its actions.

By state of the environment we refer to any feature of the environment that may be relevant for the agent to choose its actions optimally. Ideally, the agent should be able to unambiguously perceive all such features. Sometimes, however, the agent has limited sensing capabilities and is not able to completely determine the current state of the system. When this is the case, the agent is said to have partial observability. Throughout the paper, most agents considered have partial observability.

RL agents can be modeled using the partially observable Markov decision process (POMDP) framework [14]. We denote a POMDP as a tuple M = (S, A, Z, P, O, r, γ), where:
- S is the set of all possible environment states;
- A is the action repertoire of the agent;
- Z is the set of all possible agent observations;
- P(s' | s, a) indicates the probability that the state at time step t+1, S_{t+1}, is s', given that the state at time step t, S_t, is s and the agent selected action A_t = a;
- O(z | s', a) indicates the probability that the observation of the agent at time step t+1, Z_{t+1}, is z, given that the state at time t+1 is s' and the agent selected action a at time t;
- r(s, a) represents the average reward that the agent expects to receive for performing action a in state s;
- 0 ≤ γ < 1 is a discount factor.

A POMDP evolves as follows. At each time step t = 0, 1, 2, 3, ..., the environment is in some state S_t = s. The agent selects some action A_t = a from its action repertoire, A, and the environment transitions to state S_{t+1} = s' with probability P(s' | s, a). The agent receives a reward r(s, a) ∈ R and makes a new observation Z_{t+1} = z with probability O(z | s', a), and the process repeats.^6

^6 Typical RL scenarios assume that Z = S and O(z | s, a) = δ(z, s), where δ denotes the Kronecker delta [42]. When this is the case, the parameters Z and O can be safely discarded, and the simplified model thus obtained, represented as a tuple M = (S, A, P, r, γ), is referred to as a Markov decision process (MDP).
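
To make the dynamics above concrete, the following minimal Python sketch simulates the generative loop of a POMDP. The dictionaries P, O and r, the helper sample and the initial-observation convention are assumptions introduced here for illustration only; they are not part of the paper's formalism.

```python
import random

def sample(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    u, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if u <= acc:
            return outcome
    return outcome  # guard against rounding of the probabilities

def simulate(P, O, r, s0, policy, steps=10):
    """Roll out the POMDP loop: S_{t+1} ~ P(.|s,a), Z_{t+1} ~ O(.|s',a), reward r(s,a).

    P[(s, a)] and O[(s_next, a)] are {next: prob} dictionaries, r[(s, a)] is a number,
    and policy maps the current observation to an action.
    """
    s, z, trace = s0, s0, []   # the first observation is assumed to reveal the initial state
    for _ in range(steps):
        a = policy(z)
        s_next = sample(P[(s, a)])        # environment transition
        z = sample(O[(s_next, a)])        # observation emitted from the new state
        trace.append((z, a, r[(s, a)]))   # reward depends on the previous state and action
        s = s_next
    return trace
```

In the typical fully observable setting of footnote 6, O would simply return the new state itself as the observation.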

The objective of the agent can be formalized as that of gathering as much reward as possible throughout its lifespan, usually discounted by the constant γ. This corresponds to maximizing the value

v = E[ Σ_t γ^t r(S_t, A_t) ].   (1)

The reward r(s, a) thus evaluates the immediate utility of performing action a in state s, in light of the underlying task that the agent must learn. In order to maximize the value in (1), the agent must learn a mapping that, depending on its history of observations and actions, determines the next action that the agent should take. Such a mapping, denoted as π, is known as a policy, and is typically learned through a process of trial and error.

In this paper we focus on policies that depend on the agent's current observation. In other words, our agents follow policies π : Z → A that map each observation z ∈ Z directly to an action π(z) ∈ A. If the state is fully observable, then Z = S and Z_t = S_t. In this case, there is a policy π* : S → A, referred to as the optimal policy, maximizing the value in (1). We can associate with π* a function Q* : S × A → R that verifies the recursive relation

Q*(s, a) = r(s, a) + γ Σ_{s'∈S} P(s' | s, a) max_{b∈A} Q*(s', b).   (2)

Q*(s, a) represents the value of executing action a in state s and henceforth following the optimal policy. We can use the recursion in (2) to iteratively compute Q* for all pairs (s, a) ∈ S × A. Additionally, V*(s) = max_{a∈A} Q*(s, a) represents the value obtained by an agent starting from state s and henceforth following π*.

From the above, it should be apparent that the goal of the RL agent can be restated as that of learning Q*, since from the latter it is possible to derive the optimal policy. Since RL agents typically have no knowledge of either P or r, one possibility is to explore the environment (i.e., select actions in some exploratory manner), building estimates for P and r, and then use these estimates to successively approximate Q*. After exploring its environment, the agent can then exploit its knowledge and select the actions that maximize (its estimate of) Q*. Throughout the paper, our RL agents follow a simple variation of this approach known as prioritized sweeping [24].
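
The recursion in (2) translates directly into a simple value-iteration computation when the model is known. The sketch below is an illustration of that computation only; the agents in the paper do not know P and r and instead learn estimates, as discussed next. The discount value and tolerance are placeholders.

```python
def value_iteration(S, A, P, r, gamma=0.9, tol=1e-8):
    """Iterate the recursion in (2) until the Q-values stabilize.

    P[(s, a)] is a {s_next: probability} dictionary and r[(s, a)] the average reward.
    """
    Q = {(s, a): 0.0 for s in S for a in A}
    while True:
        delta = 0.0
        for s in S:
            for a in A:
                backup = r[(s, a)] + gamma * sum(
                    p * max(Q[(s2, b)] for b in A)
                    for s2, p in P[(s, a)].items())
                delta = max(delta, abs(backup - Q[(s, a)]))
                Q[(s, a)] = backup
        if delta < tol:
            return Q

def optimal_policy(Q, S, A):
    """pi*(s) = argmax_a Q*(s, a); similarly, V*(s) = max_a Q*(s, a)."""
    return {s: max(A, key=lambda a: Q[(s, a)]) for s in S}
```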

2.2 Partial Observability and IMRL

As discussed above, our RL agents use prioritized sweeping to build estimates of P and r from which they then approximate Q*. Both P and r can be estimated by maintaining running averages of the corresponding values. For example,

P̂(s' | s, a) = n_t(s, a, s') / n_t(s, a)

models the probability of a transition from state s to state s' by means of action a as the ratio between the number of times that, by time step t, the agent experienced a transition from s to s' after selecting action a, n_t(s, a, s'), and the number of times the agent selected action a in state s, n_t(s, a).

However, as already mentioned, most agents considered in this paper have partial observability, i.e., they are unable to unambiguously determine the state of their environment and are only able to perceive some features of this state. This is similar to what occurs in nature: individuals are only able to perceive the environment in their immediate surroundings. Such limited perception necessarily impacts their decision-making process. For example, while the optimal course of action for a hungry predator is to approach its prey, this actually requires the predator to be able to figure out the position of the prey. Similarly, partial observability also impacts the ability of our RL agents to select optimal actions.

In terms of their learning algorithm, our RL agents treat each observation Z_t as the full state of the environment. They thus build a transition model P̂(z' | z, a) and a reward model r̂(z, a) that will generally provide inaccurate predictions. This model is then used to build a Q-function Q̂ : Z × A → R that the agent uses to guide its decision process. It is a well-established fact that, in scenarios with partial observability, observations alone are not sufficient for the agent to accurately track the underlying state of the system. Therefore, policies computed by treating observations as states can lead to arbitrarily poor performance [37]. Moreover, computing the best such policy is generally hard [19]. In fact, creating robust RL agents that can overcome perceptual limitations often involves significant modeling effort and expert knowledge [40].

The intrinsically motivated reinforcement learning (IMRL) framework [38, 39] proposes the use of richer reward functions that implicitly encode information to potentially overcome the agents' perceptual limitations. In fact, this approach was shown useful both to facilitate reward design [25, 35] and to mitigate agent limitations [4, 40, 41]. In this framework, the performance of RL agents in the original task provides a measure of the fitness of those agents. Different agents, each with a different reward function accounting for multiple sources of information, are then compared in terms of their fitness, and the most fit agent is selected. This selection process allows us to identify, for a given set of environments, which sources of information are most useful to maximize the fitness of RL agents in the task at hand, providing a natural framework for the study in this paper.

Formally, IMRL extends traditional RL and provides a framework to address the optimal reward problem (ORP) [40], which we now describe.
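
The running-average estimates described at the beginning of this subsection can be kept with simple counters, indexed by observations rather than states since the agent treats each Z_t as the full state. The class below is a sketch under that assumption; the names n_za, n_zaz and r_hat mirror n_t(z, a), n_t(z, a, z') and r̂.

```python
from collections import defaultdict

class ModelEstimator:
    """Count-based estimates of the transition and reward models over observations."""

    def __init__(self):
        self.n_za = defaultdict(int)        # n_t(z, a)
        self.n_zaz = defaultdict(int)       # n_t(z, a, z')
        self.r_sum = defaultdict(float)     # cumulative external signal per (z, a)

    def update(self, z, a, rho, z_next):
        self.n_za[(z, a)] += 1
        self.n_zaz[(z, a, z_next)] += 1
        self.r_sum[(z, a)] += rho

    def P_hat(self, z, a, z_next):
        """Estimated transition probability, the ratio n_t(z, a, z') / n_t(z, a)."""
        n = self.n_za[(z, a)]
        return self.n_zaz[(z, a, z_next)] / n if n else 0.0

    def r_hat(self, z, a):
        """Running average of the external evaluation signal at (z, a)."""
        n = self.n_za[(z, a)]
        return self.r_sum[(z, a)] / n if n else 0.0
```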

Let H_t be a random variable representing the history of interaction of an agent with its environment up to time-step t, and let

h_t = {z_1, a_1, ρ_1, ..., z_{t−1}, a_{t−1}, ρ_{t−1}, z_t}

denote a particular realization of H_t. Such a history corresponds to all information perceived by the agent directly from the environment: the sequence {z_τ, τ = 1, ..., t} corresponds to observations about the environment state (according to the POMDP model described in Section 2); similarly, {a_τ, τ = 1, ..., t} corresponds to the sequence of actions performed by the agent; finally, {ρ_τ, τ = 1, ..., t} corresponds to an external evaluation signal that, at each time-step t, depends only on the underlying state S_t of the environment and the action A_t performed by the agent. This signal can be either environment feedback (for example, when an agent receives a monetary prize for performing some action) or physiological feedback (for example, when an agent feels satisfied after feeding).

Given a particular finite history h, we write p_H(h | r, e) to denote the probability of an RL agent^7 observing history h in environment e when its reward function is r. We evaluate the agent's performance by means of some real-valued fitness function f : H → R, where H is the space of all possible (finite) histories. Then, given a space R of possible reward functions, a set E of possible environments, and a distribution p_E over the environments in E, the ORP seeks to determine the optimal reward function, denoted by r*, maximizing the fitness over the set E according to

r* = argmax_{r∈R} F(r),   (3)

where F(r) is the expected fitness of the RL agent using the reward function r, which is given by

F(r) = Σ_{h,e} f(h) p_H(h | r, e) p_E(e),   (4)

where each e and h is sampled according to p_E(e) and p_H(h | r, e), respectively. Throughout this paper, we specifically consider that the fitness associated with a given history h_t is given by

f(h_t) = Σ_{τ=1}^{t} ρ_τ.   (5)

From the above, it should be apparent that the signal {ρ_t, t = 1, ...} actually corresponds to an (external) reward signal that determines the fitness of the agent.^8 We thus define the function r_F : S × A → R as

r_F(s, a) = E[ρ_t | S_t = s, A_t = a],   (6)

and henceforth refer to r_F as the fitness-based reward function.

^7 Our RL agents all follow the prioritized sweeping algorithm and use the exploration policy detailed in Section 3.
^8 We note that our choice of measuring the fitness as the cumulative external evaluation signal is only one among many other possible metrics. In the context of our study, we believe this to be a good metric as it allows us to directly measure the agent's fitness from its performance in the underlying task in the environment.
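
Definition (5) amounts to summing the external signal over the agent's lifetime. The sketch below generates one history and accumulates ρ; the `env` and `agent` objects and their interfaces are hypothetical stand-ins for a scenario and a learning agent, introduced here for illustration.

```python
def rollout_fitness(env, agent, steps=100_000):
    """Run one trial, producing a history h_t = {z_1, a_1, rho_1, ..., z_t},
    and return its fitness f(h_t), i.e., the sum of the external signal (equation (5))."""
    z = env.reset()
    fitness = 0.0
    for _ in range(steps):
        a = agent.act(z)
        z_next, rho = env.step(a)      # rho depends on the hidden state S_t and action A_t
        agent.learn(z, a, rho, z_next)
        fitness += rho
        z = z_next
    return fitness
```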

The function r_F can be seen as the sparsest representation of the task to be learned by the agent, as encoded by the signal {ρ_t}. We consider throughout the paper that r_F ∈ R. The interest of considering the ORP instead of simple RL agents driven only by the reward r_F is that, in the presence of agents with limitations, the solution r* to the ORP is often a better alternative than r_F. In fact, the reward r* obtained often leads to faster learning and induces behaviors that are more robust and efficient than those induced by r_F [33, 40].

3 Identification of Optimal Sources of Information

Referring back to the roadmap in Fig. 2, we now address the baseline question driving our study: which information is (potentially) most useful to complement the perceptual capabilities of an autonomous learning agent? In other words, and referring back to the diagram of Fig. 1, we investigate possible alternatives to the appraisal derivation module that may most significantly impact agent performance (see Fig. 3). Importantly, as part of our approach, in this first set of experiments we are only interested in discovering useful sources of information, regardless of their relation with emotions or appraisal theories.

Fig. 3 General architecture for an artificial agent: perceptions feed an unspecified module (marked "?") whose processed signal, together with the perceptual signal, drives the decision-making module.

To address this general question, we consider foraging scenarios where an IMRL agent acts as a predator in an environment such as those in Fig. 6. The perceptual limitations of the agent in the different environments pose challenges that directly impact its ability to capture its prey and, consequently, its fitness. In order to identify possible sources of useful information to compensate for the agent's perceptual limitations, we start from a primitive population of agents, each endowed with a reward structure containing information about different aspects of the agent's past interactions with its environment. The fittest agents (i.e., those with the greatest ability to capture prey) are used to successively improve the population. Upon convergence, we identify the set of agents able to attain the largest fitness. The analysis of the corresponding reward structure provides the required information about which signals are potentially most useful to complement the perceptual capabilities of our IMRL agents.

3.1 Methodology

In order to determine which reward functions and, consequently, which information best complements the agent's perceptions, we adopt the genetic programming (GP) approach proposed by Niekum et al. [25]. In that work, the authors used GP in the context of IMRL and the ORP as a possible approach to identify optimal rewards for RL agents. The procedure consisted in searching for reward functions represented by programs that combine different elements of the learning domain, such as the agent's position in the environment or its hunger status.

In the context of our work, there are some appealing features in the use of GP. Recall from Section 2.2 that the ORP involves the definition of a space of reward functions R and an optimization procedure to search for the optimal reward function r*. GP facilitates the definition of the space of rewards by alleviating the need to specify an explicit parameterization. Instead, we implicitly define the space of possible rewards by specifying a set of operators and terminal nodes, the latter corresponding to constants or variables. Moreover, the optimization mechanism is implicitly defined by a selection method and mathematical operators that combine the terminal nodes, constructing richer, more complex and potentially more informative signals as the evolutionary procedure progresses. Another appealing feature of GP over other search methods (such as gradient descent [41]), in the context of our study, is its close parallel with natural evolution. In the continuation, we provide a detailed description of the setup and procedure used in this first experiment.

3.1.1 Genetic Programming

In general terms, GP aims to find a program that maximizes some measure of fitness [15]. Programs are represented as syntax trees, where nodes correspond to either operators or terminal nodes representing primitive quantities. In our case, we use as terminal nodes quantities that summarize aspects of the history of interaction of the agent with its environment. The GP approach allows for the discovery of interesting mathematical relations between such primitive quantities.

Fig. 4 shows the basic elements and operations involved in using GP to represent and evolve reward functions within IMRL. Non-operator (terminal) nodes are selected from a set T of possible terminal nodes, and represent either numerical variables or constants. Fig. 4(a) shows an example of a GP tree with a single constant terminal node, representing the reward function r = 2. Fig. 4(b) shows an example of a tree with a single variable terminal node, representing a reward function r = n_z that rewards visits to state z according to the number of times it was observed. Operators are selected from a set O of possible operators, and their arguments are represented as descendants of the operator node in the tree. GP iteratively explores possible solutions by maintaining a population of candidate programs, producing new generations of programs by means of selection, mutation and crossover.
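
As a sketch of this representation, reward programs can be encoded as nested lists whose leaves are terminals and whose internal nodes are operators; crossover swaps random sub-trees between two parents, and mutation regrows a random node. The terminal and operator names below follow the sets T and O defined in the following paragraphs; everything else (tree depth, growth probability, overflow guards) is an arbitrary choice made here for illustration.

```python
import copy
import math
import random

TERMINALS = [0, 1, 2, 3, 5,                                                   # constants in C
             'r_za', 'n_z', 'n_za', 'v_z', 'q_za', 'd_z', 'e_za', 'p_zaz']    # variables in V
UNARY = {'sqrt', 'exp', 'log'}
BINARY = {'+', '-', '*', '/'}

def random_tree(depth=3):
    """Grow a random program: either a terminal, or [operator, child, ...]."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(sorted(UNARY | BINARY))
    arity = 1 if op in UNARY else 2
    return [op] + [random_tree(depth - 1) for _ in range(arity)]

def evaluate(tree, var):
    """Evaluate a program given current values of the basic variables (a complete dict)."""
    if not isinstance(tree, list):
        return var[tree] if isinstance(tree, str) else float(tree)
    op, args = tree[0], [evaluate(t, var) for t in tree[1:]]
    if op == '+':    return args[0] + args[1]
    if op == '-':    return args[0] - args[1]
    if op == '*':    return args[0] * args[1]
    if op == '/':    return args[0] / args[1] if args[1] else 0.0
    if op == 'sqrt': return math.sqrt(abs(args[0]))
    if op == 'log':  return math.log(abs(args[0]) + 1e-9)
    return math.exp(min(args[0], 50.0))            # 'exp', clipped to avoid overflow

def all_nodes(tree, path=()):
    """Every (path, subtree) pair, where a path is a tuple of child indices."""
    nodes = [(path, tree)]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            nodes.extend(all_nodes(child, path + (i,)))
    return nodes

def replace_at(tree, path, subtree):
    """Copy of `tree` with the node at `path` replaced by `subtree`."""
    if not path:
        return copy.deepcopy(subtree)
    new = copy.deepcopy(tree)
    node = new
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(subtree)
    return new

def crossover(a, b):
    """Replace a random sub-tree of parent a by a random sub-tree of parent b."""
    path, _ = random.choice(all_nodes(a))
    _, donor = random.choice(all_nodes(b))
    return replace_at(a, path, donor)

def mutate(tree):
    """Replace a random node by a freshly grown sub-tree."""
    path, _ = random.choice(all_nodes(tree))
    return replace_at(tree, path, random_tree(depth=2))
```

Under this encoding, the tree of Fig. 4(e) discussed below would be written as ['-', ['*', 2, 'r_za'], ['+', 'n_z', 'n_za']].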

Fig. 4 Defining reward functions as genetic programs. Some examples: (a) a constant GP node; (b) a variable GP node; (c) a GP tree obtained by a crossover operation between the nodes in (a) and (b); (d) a GP tree obtained by a mutation operation made to the tree in (c); (e) a possible evolved GP tree. Bold nodes and lines indicate changes in the tree induced by the several operations. See text for a detailed explanation.

The crossover function randomly replaces some sub-tree (a node and all of its descendants) of a parent program by another sub-tree from another parent upon reproduction. Fig. 4(c) shows an example of a GP tree that could be obtained through a crossover operation between the nodes in Figs. 4(a) and 4(b), where the multiplication operator node was introduced. The resulting tree represents the reward function r = 2n_z. The mutation operator replaces some node by another selected randomly. For example, Fig. 4(d) depicts a possible GP tree obtained by a mutation operation made to the tree in Fig. 4(c), where the left node was replaced. The resulting tree represents the reward function r = n_za · n_z.^9

In our experiments, we used for primitive quantities the set T = C ∪ V, with C corresponding to the set of constants, C = {0, 1, 2, 3, 5}, and V to the set of basic variables, V = {r_za, n_z, n_za, v_z, q_za, d_z, e_za, p_zaz'}, where:

- r_za = r̂^F_t(z, a) is the agent's estimate at time t of the fitness-based reward function for performing action a after observing z. This basic variable essentially informs the agent of its performance with respect to the external signal ρ provided by its environment/designer.^10 It is a function of z, a, and the agent's history up to time t, H_t.

- n_z = n_t(z) is the number of times that z was observed up to time-step t. This signal informs the agent about the frequency of observations. When compared globally across observations, it can be used by the agent, e.g., to determine which states were observed more often or which may need further exploration. It is a function of z and the agent's history up to time t, H_t.

^9 More details on GP can be found in [15].
^10 Recall that r_F(s, a) rewards the agent in accordance with the increase/decrease in fitness caused by executing each a in each state s.

- n_za = n_t(z, a) is the number of times the agent executed action a after observing z up to time-step t. Similarly to n_z, this signal informs the agent about how frequently some action was executed after observing some state. It is a function of z, a, and the agent's history up to time t, H_t.

- v_z = V^F_t(z) is the value function associated with the reward function estimate r̂^F_t. As we have seen in Section 2.1, this function indicates the expected value (relating to fitness attainment) of having observed z and following the current policy being learned henceforth. This signal can be used to inform the agent about the fitness-based long-term utility associated with some observation. It is a function of z and the agent's history up to time t, H_t.

- q_za = Q^F_t(z, a) is the Q-function associated with the reward function estimate r̂^F_t. Like v_z, it can be used to indicate the long-term impact on the agent's fitness of executing some action given some observation. It is a function of z, a, and the agent's history up to time t, H_t.

- d_z = d̂_t(z) corresponds to an estimate of the number of actions needed to reach a goal after observing z. Goals correspond to those observations that maximize r̂^F_t, and therefore this variable denotes observations that are close to or far away from experienced situations providing maximal immediate fitness. This signal can be used by the agent in its planning mechanism to pursue courses of action that will lead to greater degrees of fitness in the long run. It is a function of z and the agent's history up to time t, H_t.

- e_za = E[ΔQ^F_t(z, a)] is the expected Bellman error associated with Q^F_t at (z, a). Given an observed transition (z, a, r, z'), the Bellman error associated with Q^F_t is given by

  ΔQ^F_t(z, a) = r̂^F_t(z, a) + γ max_{b∈A} Q^F_t(z', b) − Q^F_t(z, a).

  This signal essentially indicates the prediction error associated with some transition. If the agent receives a reward and observes a situation whose value greatly differs from the previous value attributed by Q^F_t(z, a), then this transition denotes a discrepancy between what was observed and the agent's previous model of the world. The agent can use this basic variable, e.g., to identify situations that change very often or to choose actions leading to more stable outcomes. It is a function of z, a, and the agent's history up to time t, H_t.

- p_zaz' = P̂_t(z' | z, a) corresponds to the estimated probability of observing z' when executing action a after observing z. Since the learning algorithm used by the agent averages the perceived reward function, p_zaz' is actually equivalent to

  E[P̂(Z_{t+1} | z, a)] = Σ_{z'∈Z} P̂_t(z' | z, a) · P[Z_{t+1} = z' | Z_t = z, A_t = a].

  Similarly to e_za, this signal can be used by the agent to identify the execution of actions leading to more (un)stable outcomes, i.e., the greater the number of distinct transitions z' observed so far after executing a in z, the smaller the value of p_zaz', and hence the more unreliable or erratic the pair (z, a) will be. p_zaz' is a function of z, a, and the agent's history up to time t, H_t.
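
The last two variables can be maintained incrementally from experienced transitions. The sketch below is one possible bookkeeping scheme, assuming a Q-table and count tables like those in the earlier model-estimation sketch; the discount value is a placeholder.

```python
from collections import defaultdict

class BasicVariables:
    """Incremental bookkeeping for e_za and p_zaz' (a sketch)."""

    def __init__(self, actions, gamma=0.9):
        self.actions = actions
        self.gamma = gamma                   # placeholder discount
        self.Q = defaultdict(float)          # Q^F_t(z, a)
        self.n_za = defaultdict(int)         # n_t(z, a)
        self.n_zaz = defaultdict(int)        # n_t(z, a, z')
        self.err_sum = defaultdict(float)    # cumulative Bellman error at (z, a)

    def bellman_error(self, z, a, r_hat, z_next):
        """Delta Q^F_t(z, a) for an observed transition (z, a, r, z')."""
        target = r_hat + self.gamma * max(self.Q[(z_next, b)] for b in self.actions)
        return target - self.Q[(z, a)]

    def update(self, z, a, r_hat, z_next):
        self.n_za[(z, a)] += 1
        self.n_zaz[(z, a, z_next)] += 1
        self.err_sum[(z, a)] += self.bellman_error(z, a, r_hat, z_next)

    def e_za(self, z, a):
        """Average (expected) Bellman error at (z, a)."""
        n = self.n_za[(z, a)]
        return self.err_sum[(z, a)] / n if n else 0.0

    def p_zaz(self, z, a, z_next):
        """Estimate of P(z' | z, a) at time t, from the transition counts."""
        n = self.n_za[(z, a)]
        return self.n_zaz[(z, a, z_next)] / n if n else 0.0
```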

Fig. 5 The GP approach to the ORP, as proposed in [25]. In each generation j, a population R_j contains a set of candidate reward functions r_k, k = 1, ..., K. All are evaluated according to a fitness function F(r_k) and evolve according to crossover, mutation and selection.

The variables above include all elements stored and/or computed by the learning agent, and therefore summarize the agent's history of interaction with its environment. As for the operators used by the GP algorithm, we considered the set O = {+, −, ×, /, √, exp, log}. Throughout time, and according to the fitness obtained for each reward function, the GP procedure applies the aforementioned operations to evolve relations between the primitive variables and constants in set T and the mathematical operators in set O. For example, the GP tree depicted in Fig. 4(e) represents a more complex reward function expressed by the program 2r_za − (n_z + n_za) that could be obtained after a few iterations of the GP algorithm. This example evolved function rewards the agent for fitness-inducing behaviors by means of the term 2r_za and punishes the agent as it becomes more and more familiarized with z and a, as given by −(n_z + n_za).

3.1.2 Evolutionary Procedure

Figure 5 outlines the optimization scheme for the ORP using GP, for a specific set of environments E. At each generation j, a reward function population R_j of size K contains a set of candidate reward functions r_k, k = 1, ..., K. Each r_k ∈ R_j is evaluated according to the fitness function F(r_k). When all the reward functions have been evaluated, the evolutionary procedure takes place by applying the mutation and crossover operations defined earlier and applying selection over the population in order to produce the new generation of reward functions, corresponding to population R_{j+1}. The process repeats for a number J of generations.^11

In our experiments, to run the evolutionary procedure we generate a total of 50 independent initial populations, each containing K = 100 elements, and run the evolutionary procedure for J = 50 generations for each population. For the selection method we use a steady-state procedure [43] that, in each generation j, maintains the 10 most fit elements (the reward functions with highest fitness) and generates 10 new random elements. The remaining 80 elements are generated either by mutating one element or through crossover.

^11 The first generation, corresponding to the population R_1, is randomly generated.
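
One generation of this steady-state scheme can be sketched as follows. The elite/newcomer/offspring proportions and the population size follow the text; the GP operators and the fitness estimate are passed in (for instance, the sketches given earlier and the Monte-Carlo estimate described next), and the 50/50 split between mutation and crossover for the 80 offspring is an assumption, since the text does not specify the proportion.

```python
import random

def next_generation(population, fitness, random_tree, mutate, crossover,
                    n_elite=10, n_random=10, size=100):
    """Produce R_{j+1} from R_j: keep the 10 fittest rewards, add 10 random newcomers,
    and fill the rest by mutation/crossover with rank-based parent selection."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:n_elite]
    newcomers = [random_tree() for _ in range(n_random)]

    # Parents are drawn with probability decreasing with their rank (fittest first).
    weights = list(range(len(ranked), 0, -1))
    def parent():
        return random.choices(ranked, weights=weights, k=1)[0]

    offspring = []
    while len(offspring) < size - n_elite - n_random:
        if random.random() < 0.5:                  # assumed 50/50 mutation vs. crossover
            offspring.append(mutate(parent()))
        else:
            offspring.append(crossover(parent(), parent()))
    return elites + newcomers + offspring
```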

This is done by pairing elements of the previous population according to a rank selection that chooses parents with a probability proportional to their fitness, i.e., reward functions with a greater fitness have a higher probability of being mutated or paired with another reward function.

Recall that solving the ORP implies the definition of a space of reward functions R and the determination of the optimal reward function r* for a specific scenario. By space of reward functions we refer to the set of all reward functions that can (potentially) be generated by the GP algorithm. In particular, any possible combination of the primitive quantities in T and the operators in O that may be generated throughout time by the evolutionary procedure corresponds to a possible reward function and, as such, to an element of our so-called space of reward functions. The parameterization of R is therefore implicitly defined by the sets T and O. The evolved optimal reward function is determined according to (3) over all r_k ∈ R_j, j = 1, ..., J, i.e., it corresponds to the reward function with highest fitness considering all generations of all the populations that were initialized.

As an effect of mutation and crossover, reward functions might gain sub-expressions that do not contribute to the overall fitness attained by the agent as time evolves. Because we are interested in identifying only the interesting sources of information from the optimal reward functions, in a post-hoc procedure r* is parsed for sub-expressions that may have no effect on the computed fitness. This is done by first generating all possible sub-combinations of the tree representing r*. For example, an optimal reward function defined by the program 2r_za − (n_z + n_za), depicted in Fig. 4(e), would generate the following sub-expressions: 2, 2r_za, 2r_za − n_z, 2r_za − n_za, 2 − (n_z + n_za), 2 − n_z, 2 − n_za, r_za, r_za − (n_z + n_za), r_za − n_z, r_za − n_za, n_z, (n_z + n_za) and n_za. Each sub-expression is used to form a new reward function and its fitness is estimated. The simplified optimal reward function is then selected as the shortest sub-expression (in number of nodes) whose fitness difference, in relation to the evolved optimal reward function, is not statistically significant.^12 Many simplifications involve operations with the constants 0 and 1, as they sometimes cancel the associated nodes or have no effect on the overall reward: e.g., the expression 0r_za − 1(v_z − (exp(0)q_za)) + log(1) would automatically simplify to q_za − v_z. In general, depending on the results for each scenario, other sub-expressions may be removed from r*.

3.1.3 Estimating the Reward Function Fitness

It is a computationally demanding endeavor to explicitly compute F(r), since it involves computing the expectation of f over p_H and p_E, as seen in (4). As such, in order to estimate the value F(r), corresponding to the reward function evaluation stage in Fig. 5, we run N = 200 independent Monte-Carlo trials of 100,000 time-steps each, where in each trial we simulate an RL agent driven by reward r in an environment selected randomly from the corresponding set of environments E.^13 We then approximate F(r) as the mean fitness across all observed histories, i.e.,

F(r) ≈ (1/N) Σ_{i=1}^{N} f(h_i),   (7)

where h_i is the sampled history in the ith trial.

^12 We resorted to a simple unpaired t-test to determine this statistical significance.
^13 The set E is scenario-specific. For example, in the Hungry-Thirsty scenario, E includes all possible configurations of food and water.
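
Approximation (7) is a plain Monte-Carlo average. In the sketch below, the environment distribution is taken to be uniform over the scenario's set E (the text does not state p_E explicitly), `make_agent` is a hypothetical factory that builds an RL agent driven by the candidate reward, and the rollout helper is the one sketched earlier.

```python
import random
import statistics

def estimate_fitness(reward_fn, environments, make_agent, rollout,
                     n_trials=200, steps=100_000):
    """Estimate F(r) as in (7): the mean fitness over N independent trials,
    each run in an environment drawn from the scenario's set E."""
    samples = []
    for _ in range(n_trials):
        env = random.choice(environments)        # e ~ p_E (assumed uniform here)
        agent = make_agent(reward_fn)
        samples.append(rollout(env, agent, steps))
    return statistics.mean(samples), statistics.stdev(samples)
```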

Fig. 6 Structure of the foraging environments used in the first set of experiments (environments (a) and (b)). The pairs (x:y) indicate the possible locations for the agent.

3.1.4 Scenarios

We used a total of six scenarios (see Fig. 6), either taken from the IMRL literature or modifications thereof [35, 39, 40]. We refer to [35] for a more detailed description of each environment and the associated challenges.

Hungry-Thirsty scenario: The environment is depicted in Fig. 6(a). It contains two inexhaustible resources, corresponding to food and water. Resources can be positioned in any of the environment corners (positions (1:1), (5:1), (1:5), and (5:5)), leading to a total of 12 possible configurations of food and water. The agent's fitness is defined as the amount of food consumed. However, the agent can only consume food if it is not thirsty, a condition that the agent can achieve by consuming the water resource (drinking). At each time-step after drinking, the agent becomes thirsty again with a probability of 0.2. The agent observes its position and its thirst status.

Lairs scenario: In this scenario, the layout of the environment corresponds again to Fig. 6(a). In it, the agent is a predator and there are two prey lairs positioned in different corners of the environment, resulting in 6 possible configurations. The fitness of the agent is defined as the number of prey captured.

Whenever a lair is occupied by a prey, the agent can drive the prey out by means of a Pull action. The state of the lair then transitions to prey outside, and the agent has exactly one time-step to capture the prey with a Capture action, before the prey runs away. In either case, the state of the lair transitions to empty. At every time-step there is a 0.1 probability that a prey will appear in an empty lair. In this scenario, A = {N, S, E, W, P, C}, where N, S, E and W move the agent in the corresponding direction, and P and C correspond to the Pull and Capture actions. The agent is able to observe its position in the environment and the state of both lairs.

Moving-Preys scenario: The environment for this scenario is depicted in Fig. 6(b). In this scenario, the agent is a predator and, at any time-step, there is exactly one prey available, located in one of the end-of-corridor locations (positions (3:1), (3:3) or (3:5)). The agent's fitness is again defined as the number of prey captured. Whenever the agent captures a prey, the latter disappears from the current location and a new prey randomly appears in one of the two other possible prey locations.

Persistence scenario: The environment again corresponds to the one in Fig. 6(b). In this scenario, the environment contains two types of prey, both always available. Hares are located in position (3:1) and contribute to the fitness of the agent with a value of 1. Rabbits are located in position (3:5) and contribute with a value of 0.01 to the agent's fitness. Whenever it captures a prey, the agent's position is reset to the initial position (position (3:3)). The environment also contains a fence, located in position (1:2), that prevents the agent from easily capturing hares. In order for the agent to cross over the fence toward the hare location at time t, it must persistently perform the action N for N_t consecutive time-steps.^14 Every time the agent crosses the fence upwards, the fence is reinforced, requiring an increasing number of N actions to be crossed.^15

Seasons scenario: The environment again corresponds to the one in Fig. 6(b). In this scenario the environment contains two possible types of prey. Hares appear in position (3:1) and contribute to the agent's fitness with a value of 1. Rabbits appear in position (3:5) and contribute to the fitness of the agent with a value of 0.1. As in the Persistence scenario, the agent's position is reset to (3:3) upon capturing any prey. However, unlike the Persistence scenario, in this scenario only one prey is available at each time-step, depending on the season, which changes every 5,000 time-steps. The initial season is randomly selected as either Hare Season or Rabbit Season with equal probability. Additionally, in the rabbit season, for every 10 rabbits that it captures, the agent is attacked by the rabbit farmer, which negatively impacts its fitness by a value of 1.

^14 The fence is only an obstacle when the agent is moving upward from position (1:2).
^15 Denoting by n_t(fence) the number of times that the agent has crossed the fence upwards up to time-step t, N_t is given by N_t = min{n_t(fence) + 1; 30}.
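
Two of the quantitative rules above are compact enough to transcribe directly. The sketch below encodes only the fence rule from footnote 15 and the 5,000-step season switch; the random initial season and everything else about the environment dynamics are omitted.

```python
def fence_crossings_required(n_fence):
    """Persistence scenario: N_t = min{n_t(fence) + 1, 30} consecutive N actions
    are needed to cross the fence after it has been crossed n_fence times."""
    return min(n_fence + 1, 30)

def season_at(t, initial="hare", period=5_000):
    """Seasons scenario: the available prey type switches every 5,000 time-steps."""
    other = "rabbit" if initial == "hare" else "hare"
    return initial if (t // period) % 2 == 0 else other
```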

Poisoned prey scenario: This scenario is a variation of the Seasons scenario. The scenario layout and prey positions are the same, but both rabbits and hares are always available to the agent. Rabbits contribute to the fitness of the agent with a value of 0.1. Hares, when healthy, contribute positively to the agent's fitness by an amount of 1. When poisoned, they contribute negatively to the fitness of the agent with a value of 1. As in the Seasons scenario, the health status of the hares changes every 5,000 steps.

3.1.5 Agent Description

In all scenarios, the agent is modeled as a POMDP whose state dynamics follow from the descriptions above. In all but the Lairs scenario, the agent has 4 actions available, A = {N, S, E, W}, that deterministically move it in the corresponding direction; prey are captured automatically whenever co-located with the agent. In all but the Hungry-Thirsty and Lairs scenarios, the agent is only able to observe its current (x:y) position and whether it is co-located with a prey.

All scenarios use prioritized sweeping RL agents [24] to learn a policy that treats observations as states (see Section 2). In our experiments, prioritized sweeping updates the Q-values of up to 10 state-action pairs in each iteration, using a learning rate of α = 0.3. During its lifetime, the agent uses an ε-greedy exploration strategy with a decaying exploration parameter ε_t = λ^t, with 0 < λ < 1. In all experiments, we consider a fixed discount factor γ.
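
The learning agent just described can be sketched as follows: ε-greedy action selection with the decaying schedule ε_t = λ^t, and a simplified prioritized-sweeping update that refreshes up to 10 observation-action pairs per step with learning rate α = 0.3. The values of λ and γ below are placeholders, Q is assumed to be a defaultdict(float), and `model` is assumed to expose the count-based estimates sketched earlier plus a predecessor list.

```python
import heapq
import random

def epsilon_greedy(Q, z, actions, t, lam=0.999):
    """Pick a random action with probability eps_t = lam**t, otherwise act greedily."""
    if random.random() < lam ** t:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(z, a)])

def prioritized_sweeping(Q, model, z, a, alpha=0.3, gamma=0.9, max_updates=10):
    """Simplified prioritized sweeping after experiencing (z, a): repeatedly back up the
    most urgent observation-action pairs using the learned model (a sketch).

    `model` is assumed to expose r_hat(z, a), successors(z, a) -> {z': prob},
    predecessors(z) -> iterable of (z_prev, a_prev), and an `actions` attribute."""
    queue = [(0.0, (z, a))]                      # (negative priority, pair)
    for _ in range(max_updates):
        if not queue:
            break
        _, (zq, aq) = heapq.heappop(queue)
        backup = model.r_hat(zq, aq) + gamma * sum(
            p * max(Q[(z2, b)] for b in model.actions)
            for z2, p in model.successors(zq, aq).items())
        delta = backup - Q[(zq, aq)]
        Q[(zq, aq)] += alpha * delta             # learning rate alpha = 0.3 (from the text)
        for zp, ap in model.predecessors(zq):    # propagate large changes backwards
            heapq.heappush(queue, (-abs(delta), (zp, ap)))
    return Q
```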

3.2 Results

The results of the GP experiment are summarized in Table 1. We present the average fitness estimated according to (7) and the expression, simplified using the procedure described in Section 3.1.2, obtained by the agent using the evolved optimal reward function r* selected by GP in each of the test scenarios. As a straightforward baseline for comparison, we also present the fitness obtained by an agent driven by the fitness-based reward function r_F = r_za. We note that the compared agents are similar in all aspects except the reward function. In particular, the dimensions of the transition function and Q-function learned are the same.

Table 1 Mean fitness and evolved optimal reward function r* for each scenario. For ease of analysis, we recall the set of basic variables, V = {r_za, n_z, n_za, v_z, q_za, d_z, e_za, p_zaz'}. For each scenario, we also include the performance of the fitness-based reward function r_F. The results correspond to averages over 200 independent Monte-Carlo trials.

Scenario        | Reward function            | Mean Fitness
Hungry-Thirsty  | r* = q_za − v_z − 2        | 10,… ± 6,…
                | r_F = r_za                 | 7,… ± 6,…
Lairs           | r* = q_za − v_z            | 8,… ± 1,…
                | r_F = r_za                 | 7,… ± …
Moving-Preys    | r* = −n_z²                 | 2,… ± 45.4
                | r_F = r_za                 | … ± 18.0
Persistence     | r* = q_za − v_z            | 1,… ± 11.6
                | r_F = r_za                 | … ± 1.5
Seasons         | r* = r_za + q_za − p_zaz'  | 6,… ± …
                | r_F = r_za                 | 4,… ± 1,…
Poisoned prey   | r* = 5 r_za q_za           | 5,… ± …
                | r_F = r_za                 | 1,… ± 4.1

One first observation is that, in all scenarios, the evolved reward function clearly outperforms the fitness-based reward function. Our results are in accordance with findings in previous works on the advantages of allowing additional sources of information to guide the agent's decision-making [4, 33, 39, 40]. Our results also confirm previous findings on the usefulness of an evolutionary approach to search for optimal reward functions [25]. There is, however, one key difference between our approach and that in [25]: we provide the evolutionary approach with domain-independent sources of information relating to the agent's history of interaction with the environment. We expect that the reward functions thus evolved can be applied in domains other than those used in our experiments and described in Section 3.1.

By analyzing the several simplified expressions that emerged from the evolutionary procedure in Table 1, we observe the presence of a particular sub-expression, namely q_za − v_z. Aside from the fact that 3 out of the 6 rewards can be reconstructed directly from this quantity, it is a well-known quantity in the RL literature, known as the advantage function [2]. It proved to be crucial in scenarios having a great diversity of environment configurations, such as the Hungry-Thirsty and Lairs scenarios, and also in the Persistence scenario. In the latter, it was important for the agent to ignore sub-optimal decisions when facing the obstacle in the environment, i.e., where choosing actions other than N (the one with the highest advantage) was prejudicial in terms of the future gains provided by capturing the hare. The result for the Moving-Preys scenario is given by the expression −n_z² and is quite intuitive, as it was important for the agent to explore the environment by choosing states with a low number of visits in order to capture the moving prey. In the Seasons scenario, the resulting expression gives importance both to the fitness-based reward, by means of r_za + q_za, and also to state-action pairs that provide low-probability transitions, as indicated by p_zaz'. Such a sub-expression proved useful for the agent to continue to go after the hares even when the seasons changed, thus avoiding the negative penalties from the rabbits. In the Poisoned prey scenario, a greater importance was given to the fitness-based reward by means of the sub-expression 5r_za, and the value provided by q_za ensured that the agent kept capturing hares, eventually gaining an advantage in the Healthy Season.

3.3 Discussion

We recall that the goal of our first experiment was to identify possible sources of information that could improve the agent's performance if taken into consideration in the process of decision-making.^16 Given the simplification process used to remove unnecessary sub-expressions from the optimal reward functions evolved through GP, each sub-expression indicated in Table 1 can be interpreted as a possible signal that can drive the agent's decision process, allowing it to maximize its fitness. Discarding additive and multiplicative constants, we can distill from Table 1 a set of five signals, Φ = {φ_fit, φ_adv, φ_rel, φ_prd, φ_frq}, given by:

- φ_fit = r_za corresponds to the agent's estimate of the fitness-based reward function. It evaluates the immediate impact on fitness associated with performing action a after observing z.

- φ_rel = q_za corresponds to the estimated Q-function associated with r_za. This function assesses the value of executing action a after observing z in terms of long-term impact on fitness, corresponding to the long-run counterpart of φ_fit.

- φ_adv = q_za − v_z corresponds to the estimated advantage function associated with r_za [2]. This function evaluates how good action a is in state s relative to the best action (its advantage). While φ_rel evaluates the absolute value of actions, φ_adv evaluates their relative value.

- φ_prd = p_zaz' corresponds to the agent's estimate of the transition probabilities. As discussed in Section 3.1, it provides a measure of how predictable the observation at time t+1 is, given that the agent performed action a after observing z.

- Finally, φ_frq = n_z² provides a (negative) measure of how novel z is, given the agent's observations.

The signals φ_k defined above correspond to the minimal set of sub-expressions from which we can form all the optimal reward functions for each scenario by combining them with the constants in C and using the different operators in O.^17 As noted earlier, the expression of the advantage automatically emerged as a natural candidate for our optimal sources of information, as it represents the whole optimal reward function (discarding additive constants) in 3 of the 6 tested scenarios. The expression for novelty also emerged as a natural candidate. As for the remaining signals, we opted to break down the two optimal reward functions r_za + q_za − p_zaz' and 5 r_za q_za into their smallest terms, thus ensuring that a wide range of rewards can be reconstructed.

^16 We again emphasize that our identification procedure is not guided by appraisal theories; the objective for now is precisely to identify useful signals, regardless of their connection with emotional appraisal.
^17 In our distillation process we are focused on extracting a minimal set of domain-independent informative signals. As will become clearer in the next section, apart from additive constants (which have minimal impact on the policy and can therefore be safely discarded), it will be possible to reconstruct the reward functions in Table 1 (and attain comparable degrees of fitness) as a linear combination of these signals.
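
Footnote 17 anticipates that the rewards in Table 1 can be rebuilt, up to additive constants, as linear combinations of these five signals. The sketch below computes the signals from the learner's running estimates and mixes them with a weight vector θ; the estimator interface (r_hat, Q, V, P_hat, n) is assumed to match the earlier sketches, and the weights are placeholders rather than values from the paper.

```python
def appraisal_like_signals(z, a, z_next, est):
    """The distilled set Phi, computed from the agent's internal estimates."""
    return {
        'fit': est.r_hat(z, a),                 # phi_fit: estimated fitness-based reward
        'rel': est.Q(z, a),                     # phi_rel: long-run value of (z, a)
        'adv': est.Q(z, a) - est.V(z),          # phi_adv: advantage of a at z
        'prd': est.P_hat(z, a, z_next),         # phi_prd: predictability of the transition
        'frq': est.n(z) ** 2,                   # phi_frq: (negative) novelty of z
    }

def combined_reward(signals, theta):
    """A linear combination r(z, a) = sum_k theta_k * phi_k(z, a)."""
    return sum(theta[name] * value for name, value in signals.items())
```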

Fig. 7 Architecture for an agent using the identified sources of information: the signals φ_adv, φ_rel, φ_prd, φ_frq and φ_fit, derived from the agent's perceptions, are combined into a reward signal that, together with the perceptual signal, drives the decision-making module.

It was particularly of interest to consider r_za as an independent signal given that, unlike the other basic variables, it is not learned and does not depend on the agent's experience, corresponding to an external evaluative signal. However, we note that this partitioning option is by no means unique; e.g., the reward features r_za q_za and r_za + q_za − p_zaz' are also possibilities that could be used while still assuring a minimal set of information signals. Each of the emerged signals is a function mapping observation-action-history triplets to a real value, and will henceforth be used as a source of information guiding the decision process of the agent. The updated agent architecture is depicted in Fig. 7.

Two observations are in order. First of all, in obtaining these signals, we considered a specific class of agents: our agents run prioritized sweeping with ε-greedy exploration. Had a different learning algorithm, exploration strategy or algorithm parameterization been used, it is possible that variations of the identified features could be observed. However, we would not expect these variations to dramatically change the sort of information provided by such features that is required by the agent to solve the intended tasks, especially in scenarios where the agent has partial observability.^18 Secondly, we would expect GP to yield different (and eventually more complex) signals had we considered more elaborate domains. However, one interesting aspect of our results arises precisely from the fact that the features used throughout the paper were evolved in such simple scenarios. In spite of their simplicity, and as will soon become apparent, they yield significant improvements in performance in significantly more complex settings (that even include other agents). This, in our view, is indicative that, even though simple, they are extremely informative.

4 Validation of Identified Sources

Section 3 focused on identifying general-purpose sources of information that can guide the decision process of an IMRL agent and positively impact its performance. These different sources of information emerged from the interaction of agents with several different environments and, as such, should be applicable in scenarios other than those of Section 3. This section investigates whether this is indeed so, i.e., whether the sources of information identified in Section 3 are also useful in other scenarios.

^18 Further experimental verification would be required to back up this claim; however, this is not within the scope of this paper.


More information

A Brief Introduction to Bayesian Statistics

A Brief Introduction to Bayesian Statistics A Brief Introduction to Statistics David Kaplan Department of Educational Psychology Methods for Social Policy Research and, Washington, DC 2017 1 / 37 The Reverend Thomas Bayes, 1701 1761 2 / 37 Pierre-Simon

More information

FEELING AS INTRINSIC REWARD

FEELING AS INTRINSIC REWARD FEELING AS INTRINSIC REWARD Bob Marinier John Laird 27 th Soar Workshop May 24, 2007 OVERVIEW What are feelings good for? Intuitively, feelings should serve as a reward signal that can be used by reinforcement

More information

PO Box 19015, Arlington, TX {ramirez, 5323 Harry Hines Boulevard, Dallas, TX

PO Box 19015, Arlington, TX {ramirez, 5323 Harry Hines Boulevard, Dallas, TX From: Proceedings of the Eleventh International FLAIRS Conference. Copyright 1998, AAAI (www.aaai.org). All rights reserved. A Sequence Building Approach to Pattern Discovery in Medical Data Jorge C. G.

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

I. INTRODUCTION /$ IEEE 70 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 2, NO. 2, JUNE 2010

I. INTRODUCTION /$ IEEE 70 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 2, NO. 2, JUNE 2010 70 IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, VOL. 2, NO. 2, JUNE 2010 Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective Satinder Singh, Richard L. Lewis, Andrew G. Barto,

More information

Choose an approach for your research problem

Choose an approach for your research problem Choose an approach for your research problem This course is about doing empirical research with experiments, so your general approach to research has already been chosen by your professor. It s important

More information

Commentary on The Erotetic Theory of Attention by Philipp Koralus. Sebastian Watzl

Commentary on The Erotetic Theory of Attention by Philipp Koralus. Sebastian Watzl Commentary on The Erotetic Theory of Attention by Philipp Koralus A. Introduction Sebastian Watzl The study of visual search is one of the experimental paradigms for the study of attention. Visual search

More information

UNIVERSITY OF CALIFORNIA SANTA CRUZ A STOCHASTIC DYNAMIC MODEL OF THE BEHAVIORAL ECOLOGY OF SOCIAL PLAY

UNIVERSITY OF CALIFORNIA SANTA CRUZ A STOCHASTIC DYNAMIC MODEL OF THE BEHAVIORAL ECOLOGY OF SOCIAL PLAY . UNIVERSITY OF CALIFORNIA SANTA CRUZ A STOCHASTIC DYNAMIC MODEL OF THE BEHAVIORAL ECOLOGY OF SOCIAL PLAY A dissertation submitted in partial satisfaction of the requirements for the degree of BACHELOR

More information

Dynamics of Color Category Formation and Boundaries

Dynamics of Color Category Formation and Boundaries Dynamics of Color Category Formation and Boundaries Stephanie Huette* Department of Psychology, University of Memphis, Memphis, TN Definition Dynamics of color boundaries is broadly the area that characterizes

More information

Lesson 6 Learning II Anders Lyhne Christensen, D6.05, INTRODUCTION TO AUTONOMOUS MOBILE ROBOTS

Lesson 6 Learning II Anders Lyhne Christensen, D6.05, INTRODUCTION TO AUTONOMOUS MOBILE ROBOTS Lesson 6 Learning II Anders Lyhne Christensen, D6.05, anders.christensen@iscte.pt INTRODUCTION TO AUTONOMOUS MOBILE ROBOTS First: Quick Background in Neural Nets Some of earliest work in neural networks

More information

On the diversity principle and local falsifiability

On the diversity principle and local falsifiability On the diversity principle and local falsifiability Uriel Feige October 22, 2012 1 Introduction This manuscript concerns the methodology of evaluating one particular aspect of TCS (theoretical computer

More information

Bayesian Reinforcement Learning

Bayesian Reinforcement Learning Bayesian Reinforcement Learning Rowan McAllister and Karolina Dziugaite MLG RCC 21 March 2013 Rowan McAllister and Karolina Dziugaite (MLG RCC) Bayesian Reinforcement Learning 21 March 2013 1 / 34 Outline

More information

Hoare Logic and Model Checking. LTL and CTL: a perspective. Learning outcomes. Model Checking Lecture 12: Loose ends

Hoare Logic and Model Checking. LTL and CTL: a perspective. Learning outcomes. Model Checking Lecture 12: Loose ends Learning outcomes Hoare Logic and Model Checking Model Checking Lecture 12: Loose ends Dominic Mulligan Based on previous slides by Alan Mycroft and Mike Gordon Programming, Logic, and Semantics Group

More information

Emotions of Living Creatures

Emotions of Living Creatures Robot Emotions Emotions of Living Creatures motivation system for complex organisms determine the behavioral reaction to environmental (often social) and internal events of major significance for the needs

More information

IN THE iterated prisoner s dilemma (IPD), two players

IN THE iterated prisoner s dilemma (IPD), two players IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 11, NO. 6, DECEMBER 2007 689 Multiple Choices and Reputation in Multiagent Interactions Siang Yew Chong, Member, IEEE, and Xin Yao, Fellow, IEEE Abstract

More information

AUTONOMOUS robots need to be able to adapt to

AUTONOMOUS robots need to be able to adapt to 1 Discovering Latent States for Model Learning: Applying Sensorimotor Contingencies Theory and Predictive Processing to Model Context Nikolas J. Hemion arxiv:1608.00359v1 [cs.ro] 1 Aug 2016 Abstract Autonomous

More information

EXECUTIVE SUMMARY 9. Executive Summary

EXECUTIVE SUMMARY 9. Executive Summary EXECUTIVE SUMMARY 9 Executive Summary Education affects people s lives in ways that go far beyond what can be measured by labour market earnings and economic growth. Important as they are, these social

More information

Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes

Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes Using Eligibility Traces to Find the est Memoryless Policy in Partially Observable Markov Decision Processes John Loch Department of Computer Science University of Colorado oulder, CO 80309-0430 loch@cs.colorado.edu

More information

Reinforcement learning and the brain: the problems we face all day. Reinforcement Learning in the brain

Reinforcement learning and the brain: the problems we face all day. Reinforcement Learning in the brain Reinforcement learning and the brain: the problems we face all day Reinforcement Learning in the brain Reading: Y Niv, Reinforcement learning in the brain, 2009. Decision making at all levels Reinforcement

More information

The Evolution of Cooperation: The Genetic Algorithm Applied to Three Normal- Form Games

The Evolution of Cooperation: The Genetic Algorithm Applied to Three Normal- Form Games The Evolution of Cooperation: The Genetic Algorithm Applied to Three Normal- Form Games Scott Cederberg P.O. Box 595 Stanford, CA 949 (65) 497-7776 (cederber@stanford.edu) Abstract The genetic algorithm

More information

The benefits of surprise in dynamic environments: from theory to practice

The benefits of surprise in dynamic environments: from theory to practice The benefits of surprise in dynamic environments: from theory to practice Emiliano Lorini 1 and Michele Piunti 1,2 1 ISTC - CNR, Rome, Italy 2 Università degli studi di Bologna - DEIS, Bologna, Italy {emiliano.lorini,michele.piunti}@istc.cnr.it

More information

ERA: Architectures for Inference

ERA: Architectures for Inference ERA: Architectures for Inference Dan Hammerstrom Electrical And Computer Engineering 7/28/09 1 Intelligent Computing In spite of the transistor bounty of Moore s law, there is a large class of problems

More information

CS343: Artificial Intelligence

CS343: Artificial Intelligence CS343: Artificial Intelligence Introduction: Part 2 Prof. Scott Niekum University of Texas at Austin [Based on slides created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All materials

More information

Empirical Formula for Creating Error Bars for the Method of Paired Comparison

Empirical Formula for Creating Error Bars for the Method of Paired Comparison Empirical Formula for Creating Error Bars for the Method of Paired Comparison Ethan D. Montag Rochester Institute of Technology Munsell Color Science Laboratory Chester F. Carlson Center for Imaging Science

More information

Writing Reaction Papers Using the QuALMRI Framework

Writing Reaction Papers Using the QuALMRI Framework Writing Reaction Papers Using the QuALMRI Framework Modified from Organizing Scientific Thinking Using the QuALMRI Framework Written by Kevin Ochsner and modified by others. Based on a scheme devised by

More information

Irrationality in Game Theory

Irrationality in Game Theory Irrationality in Game Theory Yamin Htun Dec 9, 2005 Abstract The concepts in game theory have been evolving in such a way that existing theories are recasted to apply to problems that previously appeared

More information

Chapter 02 Developing and Evaluating Theories of Behavior

Chapter 02 Developing and Evaluating Theories of Behavior Chapter 02 Developing and Evaluating Theories of Behavior Multiple Choice Questions 1. A theory is a(n): A. plausible or scientifically acceptable, well-substantiated explanation of some aspect of the

More information

References. Christos A. Ioannou 2/37

References. Christos A. Ioannou 2/37 Prospect Theory References Tversky, A., and D. Kahneman: Judgement under Uncertainty: Heuristics and Biases, Science, 185 (1974), 1124-1131. Tversky, A., and D. Kahneman: Prospect Theory: An Analysis of

More information

RAPID: A Belief Convergence Strategy for Collaborating with Inconsistent Agents

RAPID: A Belief Convergence Strategy for Collaborating with Inconsistent Agents RAPID: A Belief Convergence Strategy for Collaborating with Inconsistent Agents Trevor Sarratt and Arnav Jhala University of California Santa Cruz {tsarratt, jhala}@soe.ucsc.edu Abstract Maintaining an

More information

The Role of Implicit Motives in Strategic Decision-Making: Computational Models of Motivated Learning and the Evolution of Motivated Agents

The Role of Implicit Motives in Strategic Decision-Making: Computational Models of Motivated Learning and the Evolution of Motivated Agents Games 2015, 6, 604-636; doi:10.3390/g6040604 Article OPEN ACCESS games ISSN 2073-4336 www.mdpi.com/journal/games The Role of Implicit Motives in Strategic Decision-Making: Computational Models of Motivated

More information

Revised Cochrane risk of bias tool for randomized trials (RoB 2.0) Additional considerations for cross-over trials

Revised Cochrane risk of bias tool for randomized trials (RoB 2.0) Additional considerations for cross-over trials Revised Cochrane risk of bias tool for randomized trials (RoB 2.0) Additional considerations for cross-over trials Edited by Julian PT Higgins on behalf of the RoB 2.0 working group on cross-over trials

More information

Study on perceptually-based fitting line-segments

Study on perceptually-based fitting line-segments Regeo. Geometric Reconstruction Group www.regeo.uji.es Technical Reports. Ref. 08/2014 Study on perceptually-based fitting line-segments Raquel Plumed, Pedro Company, Peter A.C. Varley Department of Mechanical

More information

Real-time computational attention model for dynamic scenes analysis

Real-time computational attention model for dynamic scenes analysis Computer Science Image and Interaction Laboratory Real-time computational attention model for dynamic scenes analysis Matthieu Perreira Da Silva Vincent Courboulay 19/04/2012 Photonics Europe 2012 Symposium,

More information

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Timothy N. Rubin (trubin@uci.edu) Michael D. Lee (mdlee@uci.edu) Charles F. Chubb (cchubb@uci.edu) Department of Cognitive

More information

Motivation represents the reasons for people's actions, desires, and needs. Typically, this unit is described as a goal

Motivation represents the reasons for people's actions, desires, and needs. Typically, this unit is described as a goal Motivation What is motivation? Motivation represents the reasons for people's actions, desires, and needs. Reasons here implies some sort of desired end state Typically, this unit is described as a goal

More information

Assurance Cases for Model-based Development of Medical Devices. Anaheed Ayoub, BaekGyu Kim, Insup Lee, Oleg Sokolsky. Outline

Assurance Cases for Model-based Development of Medical Devices. Anaheed Ayoub, BaekGyu Kim, Insup Lee, Oleg Sokolsky. Outline Assurance Cases for Model-based Development of Medical Devices Anaheed Ayoub, BaekGyu Kim, Insup Lee, Oleg Sokolsky Outline Introduction State of the art in regulatory activities Evidence-based certification

More information

Competition Between Objective and Novelty Search on a Deceptive Task

Competition Between Objective and Novelty Search on a Deceptive Task Competition Between Objective and Novelty Search on a Deceptive Task Billy Evers and Michael Rubayo Abstract It has been proposed, and is now widely accepted within use of genetic algorithms that a directly

More information

Causation, the structural engineer, and the expert witness

Causation, the structural engineer, and the expert witness Causation, the structural engineer, and the expert witness This article discusses how expert witness services can be improved in construction disputes where the determination of the cause of structural

More information

Applying Appraisal Theories to Goal Directed Autonomy

Applying Appraisal Theories to Goal Directed Autonomy Applying Appraisal Theories to Goal Directed Autonomy Robert P. Marinier III, Michael van Lent, Randolph M. Jones Soar Technology, Inc. 3600 Green Court, Suite 600, Ann Arbor, MI 48105 {bob.marinier,vanlent,rjones}@soartech.com

More information

Further Properties of the Priority Rule

Further Properties of the Priority Rule Further Properties of the Priority Rule Michael Strevens Draft of July 2003 Abstract In Strevens (2003), I showed that science s priority system for distributing credit promotes an allocation of labor

More information

Intelligent Machines That Act Rationally. Hang Li Bytedance AI Lab

Intelligent Machines That Act Rationally. Hang Li Bytedance AI Lab Intelligent Machines That Act Rationally Hang Li Bytedance AI Lab Four Definitions of Artificial Intelligence Building intelligent machines (i.e., intelligent computers) Thinking humanly Acting humanly

More information

Decision Analysis. John M. Inadomi. Decision trees. Background. Key points Decision analysis is used to compare competing

Decision Analysis. John M. Inadomi. Decision trees. Background. Key points Decision analysis is used to compare competing 5 Decision Analysis John M. Inadomi Key points Decision analysis is used to compare competing strategies of management under conditions of uncertainty. Various methods may be employed to construct a decision

More information

Representing Problems (and Plans) Using Imagery

Representing Problems (and Plans) Using Imagery Representing Problems (and Plans) Using Imagery Samuel Wintermute University of Michigan 2260 Hayward St. Ann Arbor, MI 48109-2121 swinterm@umich.edu Abstract In many spatial problems, it can be difficult

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary Statistics and Results This file contains supplementary statistical information and a discussion of the interpretation of the belief effect on the basis of additional data. We also present

More information

Self-Corrective Autonomous Systems using Optimization Processes for Detection and Correction of Unexpected Error Conditions

Self-Corrective Autonomous Systems using Optimization Processes for Detection and Correction of Unexpected Error Conditions International Journal of Robotics and Automation (IJRA) Vol. 5, No. 4, December 2016, pp. 262~276 ISSN: 2089-4856 262 Self-Corrective Autonomous Systems using Optimization Processes for Detection and Correction

More information

LECTURE 5: REACTIVE AND HYBRID ARCHITECTURES

LECTURE 5: REACTIVE AND HYBRID ARCHITECTURES Reactive Architectures LECTURE 5: REACTIVE AND HYBRID ARCHITECTURES An Introduction to MultiAgent Systems http://www.csc.liv.ac.uk/~mjw/pubs/imas There are many unsolved (some would say insoluble) problems

More information

Scientific Minimalism and the Division of Moral Labor in Regulating Dual-Use Research Steven Dykstra

Scientific Minimalism and the Division of Moral Labor in Regulating Dual-Use Research Steven Dykstra 33 Scientific Minimalism and the Division of Moral Labor in Regulating Dual-Use Research Steven Dykstra Abstract: In this paper I examine the merits of a division of moral labor regulatory system for dual-use

More information

Reinforcement Learning in Steady-State Cellular Genetic Algorithms

Reinforcement Learning in Steady-State Cellular Genetic Algorithms Reinforcement Learning in Steady-State Cellular Genetic Algorithms Cin-Young Lee and Erik K. Antonsson Abstract A novel cellular genetic algorithm is developed to address the issues of good mate selection.

More information

LEARNING. Learning. Type of Learning Experiences Related Factors

LEARNING. Learning. Type of Learning Experiences Related Factors LEARNING DEFINITION: Learning can be defined as any relatively permanent change in behavior or modification in behavior or behavior potentials that occur as a result of practice or experience. According

More information

REPORT ON EMOTIONAL INTELLIGENCE QUESTIONNAIRE: GENERAL

REPORT ON EMOTIONAL INTELLIGENCE QUESTIONNAIRE: GENERAL REPORT ON EMOTIONAL INTELLIGENCE QUESTIONNAIRE: GENERAL Name: Email: Date: Sample Person sample@email.com IMPORTANT NOTE The descriptions of emotional intelligence the report contains are not absolute

More information

Reinforcement Learning : Theory and Practice - Programming Assignment 1

Reinforcement Learning : Theory and Practice - Programming Assignment 1 Reinforcement Learning : Theory and Practice - Programming Assignment 1 August 2016 Background It is well known in Game Theory that the game of Rock, Paper, Scissors has one and only one Nash Equilibrium.

More information

Remarks on Bayesian Control Charts

Remarks on Bayesian Control Charts Remarks on Bayesian Control Charts Amir Ahmadi-Javid * and Mohsen Ebadi Department of Industrial Engineering, Amirkabir University of Technology, Tehran, Iran * Corresponding author; email address: ahmadi_javid@aut.ac.ir

More information

A Monogenous MBO Approach to Satisfiability

A Monogenous MBO Approach to Satisfiability A Monogenous MBO Approach to Satisfiability Hussein A. Abbass University of New South Wales, School of Computer Science, UC, ADFA Campus, Northcott Drive, Canberra ACT, 2600, Australia, h.abbass@adfa.edu.au

More information

A Computational Framework for Concept Formation for a Situated Design Agent

A Computational Framework for Concept Formation for a Situated Design Agent A Computational Framework for Concept Formation for a Situated Design Agent John S Gero Key Centre of Design Computing and Cognition University of Sydney NSW 2006 Australia john@arch.usyd.edu.au and Haruyuki

More information

Finding Information Sources by Model Sharing in Open Multi-Agent Systems 1

Finding Information Sources by Model Sharing in Open Multi-Agent Systems 1 Finding Information Sources by Model Sharing in Open Multi-Agent Systems Jisun Park, K. Suzanne Barber The Laboratory for Intelligent Processes and Systems The University of Texas at Austin 20 E. 24 th

More information

COMP329 Robotics and Autonomous Systems Lecture 15: Agents and Intentions. Dr Terry R. Payne Department of Computer Science

COMP329 Robotics and Autonomous Systems Lecture 15: Agents and Intentions. Dr Terry R. Payne Department of Computer Science COMP329 Robotics and Autonomous Systems Lecture 15: Agents and Intentions Dr Terry R. Payne Department of Computer Science General control architecture Localisation Environment Model Local Map Position

More information

Encoding of Elements and Relations of Object Arrangements by Young Children

Encoding of Elements and Relations of Object Arrangements by Young Children Encoding of Elements and Relations of Object Arrangements by Young Children Leslee J. Martin (martin.1103@osu.edu) Department of Psychology & Center for Cognitive Science Ohio State University 216 Lazenby

More information

An Escalation Model of Consciousness

An Escalation Model of Consciousness Bailey!1 Ben Bailey Current Issues in Cognitive Science Mark Feinstein 2015-12-18 An Escalation Model of Consciousness Introduction The idea of consciousness has plagued humanity since its inception. Humans

More information

BEHAVIOR CHANGE THEORY

BEHAVIOR CHANGE THEORY BEHAVIOR CHANGE THEORY An introduction to a behavior change theory framework by the LIVE INCITE team This document is not a formal part of the LIVE INCITE Request for Tender and PCP. It may thus be used

More information

Constructivist Anticipatory Learning Mechanism (CALM) dealing with partially deterministic and partially observable environments

Constructivist Anticipatory Learning Mechanism (CALM) dealing with partially deterministic and partially observable environments Berthouze, L., Prince, C. G., Littman, M., Kozima, H., and Balkenius, C. (2007). Proceedings of the Seventh International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems.

More information

Memory-Augmented Active Deep Learning for Identifying Relations Between Distant Medical Concepts in Electroencephalography Reports

Memory-Augmented Active Deep Learning for Identifying Relations Between Distant Medical Concepts in Electroencephalography Reports Memory-Augmented Active Deep Learning for Identifying Relations Between Distant Medical Concepts in Electroencephalography Reports Ramon Maldonado, BS, Travis Goodwin, PhD Sanda M. Harabagiu, PhD The University

More information

Spatial Orientation Using Map Displays: A Model of the Influence of Target Location

Spatial Orientation Using Map Displays: A Model of the Influence of Target Location Gunzelmann, G., & Anderson, J. R. (2004). Spatial orientation using map displays: A model of the influence of target location. In K. Forbus, D. Gentner, and T. Regier (Eds.), Proceedings of the Twenty-Sixth

More information

Aliasing in XCS and the Consecutive State Problem : 1 - Effects

Aliasing in XCS and the Consecutive State Problem : 1 - Effects Aliasing in XCS and the Consecutive State Problem : - Effects Alwyn Barry Faculty of Computer Studies and Mathematics, University of the West of England, Coldharbour Lane, Bristol, BS6 QY, UK Email: Alwyn.Barry@uwe.ac.uk

More information

Partially-Observable Markov Decision Processes as Dynamical Causal Models. Finale Doshi-Velez NIPS Causality Workshop 2013

Partially-Observable Markov Decision Processes as Dynamical Causal Models. Finale Doshi-Velez NIPS Causality Workshop 2013 Partially-Observable Markov Decision Processes as Dynamical Causal Models Finale Doshi-Velez NIPS Causality Workshop 2013 The POMDP Mindset We poke the world (perform an action) Agent World The POMDP Mindset

More information

Abhimanyu Khan, Ronald Peeters. Cognitive hierarchies in adaptive play RM/12/007

Abhimanyu Khan, Ronald Peeters. Cognitive hierarchies in adaptive play RM/12/007 Abhimanyu Khan, Ronald Peeters Cognitive hierarchies in adaptive play RM/12/007 Cognitive hierarchies in adaptive play Abhimanyu Khan Ronald Peeters January 2012 Abstract Inspired by the behavior in repeated

More information

Reactive agents and perceptual ambiguity

Reactive agents and perceptual ambiguity Major theme: Robotic and computational models of interaction and cognition Reactive agents and perceptual ambiguity Michel van Dartel and Eric Postma IKAT, Universiteit Maastricht Abstract Situated and

More information

Omicron ACO. A New Ant Colony Optimization Algorithm

Omicron ACO. A New Ant Colony Optimization Algorithm Omicron ACO. A New Ant Colony Optimization Algorithm Osvaldo Gómez Universidad Nacional de Asunción Centro Nacional de Computación Asunción, Paraguay ogomez@cnc.una.py and Benjamín Barán Universidad Nacional

More information

Sexy Evolutionary Computation

Sexy Evolutionary Computation University of Coimbra Faculty of Science and Technology Department of Computer Sciences Final Report Sexy Evolutionary Computation José Carlos Clemente Neves jcneves@student.dei.uc.pt Advisor: Fernando

More information

THE IMPORTANCE OF MENTAL OPERATIONS IN FORMING NOTIONS

THE IMPORTANCE OF MENTAL OPERATIONS IN FORMING NOTIONS THE IMPORTANCE OF MENTAL OPERATIONS IN FORMING NOTIONS MARIA CONDOR MONICA CHIRA monica.chira@orange_ftgroup.com Abstract: In almost every moment of our existence we are engaged in a process of problem

More information

Implicit Information in Directionality of Verbal Probability Expressions

Implicit Information in Directionality of Verbal Probability Expressions Implicit Information in Directionality of Verbal Probability Expressions Hidehito Honda (hito@ky.hum.titech.ac.jp) Kimihiko Yamagishi (kimihiko@ky.hum.titech.ac.jp) Graduate School of Decision Science

More information

PART - A 1. Define Artificial Intelligence formulated by Haugeland. The exciting new effort to make computers think machines with minds in the full and literal sense. 2. Define Artificial Intelligence

More information

The optimism bias may support rational action

The optimism bias may support rational action The optimism bias may support rational action Falk Lieder, Sidharth Goel, Ronald Kwan, Thomas L. Griffiths University of California, Berkeley 1 Introduction People systematically overestimate the probability

More information

Oxford Foundation for Theoretical Neuroscience and Artificial Intelligence

Oxford Foundation for Theoretical Neuroscience and Artificial Intelligence Oxford Foundation for Theoretical Neuroscience and Artificial Intelligence Oxford Foundation for Theoretical Neuroscience and Artificial Intelligence For over two millennia, philosophers and scientists

More information

1. Before starting the second session, quickly examine total on short form BDI; note

1. Before starting the second session, quickly examine total on short form BDI; note SESSION #2: 10 1. Before starting the second session, quickly examine total on short form BDI; note increase or decrease. Recall that rating a core complaint was discussed earlier. For the purpose of continuity,

More information

BIOLOGY. The range and suitability of the work submitted

BIOLOGY. The range and suitability of the work submitted Overall grade boundaries BIOLOGY Grade: E D C B A Mark range: 0-7 8-15 16-22 23-28 29-36 The range and suitability of the work submitted In this session essays were submitted in a wide range of appropriate

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information