Verbal and nonverbal discourse planning

Berardina De Carolis (Department of Informatics, Intelligent Interfaces, University of Bari), Catherine Pelachaud (Department of Computer and System Science, University of Rome "La Sapienza"), Isabella Poggi (Department of Linguistics, University of Rome Three)

1 Introduction

We are designing a Conversational Agent that communicates simultaneously through speech and an expressive face. When we are engaged in face-to-face interaction, or in any communicative act where the visual modality is available along with the acoustic one, multimodal signals are at work. Our communicative acts are performed not only through words, but also through intonation, body posture, hand gestures, gaze patterns, facial expressions and so on. Thus, an important step is to define how the Agent's communicative acts are expressed as a coordinated, either sequential or simultaneous, verbal and nonverbal message. In making an Autonomous Agent capable of communicative and expressive behavior, a relevant problem to be taken into account is therefore how the Agent plans not only what to communicate, but also with what (verbal or nonverbal) signals, in what combination, and with what synchronization. This depends on several factors: (i) the available modalities (e.g., face, gaze, voice); (ii) the cognitive ease of production and processing of signals (for example, in describing an object, a gesture may be more expressive than a word); (iii) the expressivity of each signal in communicating specific meanings (for example, emotions are better conveyed by facial expression than by words); (iv) the appropriateness of signals to social situations (for example, an insulting word may be less easily excused than a scornful gaze). Finally, metacommunicative constraints may apply that, say, call for redundancy when the information to convey is particularly important or needs to be particularly clear; this leads to using verbal and nonverbal signals at the same time.

In recent years, several multimodal conversational systems [2, 3, 5, 18, 1] have been proposed. These systems integrate verbal and nonverbal signals such as deictic gestures, gaze behavior and communicative facial expressions. Cassell and Stone [4] have designed a multimodal manager whose role is to supervise the distribution of behaviors across several channels (verbal, head, hand, face, body and gaze). Our system is close to this last one, but we also consider the context in which the conversation takes place. In this paper we first describe our enriched discourse generator, explaining the two sets of rules (trigger and regulation rules) we have added. We then review the different types of gaze communicative acts. Finally, we present the variables defining the context and how they modify the computation of the display of the communicative acts.

2 Discourse Generator

In order to achieve our aim, we developed a generator whose architecture is shown in Figure 1. This architecture is inspired by classical NLG systems [11, 13], with some differences due to the need to consider two factors: keeping the content level independent of the way in which it is output (media independence) and of how resources are distributed; and including emotional and personality factors in the discourse plan. Starting from a given communicative goal, a hierarchical planner [7, 6] builds a discourse plan-tree, using a pool of plan operators and according to constraints on the domain and on the mental states of the Sender and the Addressee.
The resulting plan is a hierarchy of goals and subgoals down to the leaves, which represent the primitive communicative acts (Inform, Request, etc.). Each node of this tree carries information about the communicative goal, the role the node plays in the Rhetorical Relation (RR) [13] attached to its father node, the RR that relates the two subtrees departing from it, and the focus of the discourse. Being media independent, this plan can be translated into written text or into speech.
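To make the structure of such a plan-tree concrete, here is a minimal sketch in Python; the class name, the field names and the nucleus/satellite labels are our own illustration, not the authors' implementation, and the example nodes reproduce the "stay in jail" discourse used later in Figure 2.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class PlanNode:
    """One node of the discourse plan-tree (illustrative sketch only)."""
    goal: str                        # communicative goal, e.g. "Describe(S, A, event)"
    rr_role: Optional[str] = None    # role the node plays in the RR attached to its father
    rr: Optional[str] = None         # RR relating the two subtrees departing from the node
    focus: Optional[str] = None      # current discourse focus
    children: List["PlanNode"] = field(default_factory=list)
    signals: List[Tuple] = field(default_factory=list)  # nonverbal signals attached later, during enrichment

# Leaves are primitive communicative acts such as Inform or Request.
leaf1 = PlanNode(goal="Inform(S, A, Stay(jail))", rr_role="nucleus", focus="Stay(jail)")
leaf2 = PlanNode(goal="Inform(S, A, LengthOfStay(jail))", rr_role="satellite",
                 focus="LengthOfStay(jail)")
root = PlanNode(goal="Describe(S, A, Stay(jail))", rr="ElabObjAttr",
                focus="Stay(jail)", children=[leaf1, leaf2])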

[Figure 1: System architecture. The Communicative Goal, the Domain KB and the Context Model feed the Planner, which uses Plan Operators, Trigger rules and Regulation rules; the resulting D-Plan passes through the Goal-Media Prioritizer to produce the Enriched Plan; the Surface Realizator turns it into a Media-Instantiated Discourse for the Body Instantiator (face, 2D character, written text, ...).]

However, at this level only the referential content of the presentation is represented: information on the emotional and cognitive state that should be expressed in communicating with the User (the Addressee of the conversation) is missing.

2.1 Nonverbal communicative acts

Emotional and cognitive information very often passes through nonverbal communication. In previous work [15, 12, 17], we have drawn a semantic typology of the types of information that we can in principle convey while communicating. Focusing only on gaze expressions (although, of course, all of these types of information can also be conveyed through verbal behavior or, in many cases, by gestures or facial expressions), we have shown that gaze can make reference to concrete or abstract entities present in the physical context or mentioned in the discourse (that is, it has a deictic function); it can mention some concrete or metaphorical properties (like "huge", "small" or "subtle"); it can express emotions (fear, anger, dismay...), performatives (it can implore, defy...) and conversational information (sentence focus, turn-taking requests); it can also express metacognitive information ("I am thinking", "I am trying to remember") and, finally, Rhetorical Relations within a discourse (contrast, elaboration...) [13]. What we want to stress here is that RRs are only a (small) subset of the information we convey while talking. The generation of texts must therefore be enriched with all these other types of information.

2.2 Goal-media prioritizer

To associate verbal information with a combination of nonverbal expressions (for instance a gaze, an eyebrow movement, a gesture, etc.), we need to select the expressions to be shown and to decide, based on their priority, if and when each of them should be displayed. This is the task of the goal-media prioritizer, a notion first introduced by [4]: this module revises the plan by enriching its nodes with information about the type of nonverbal signals to employ at each stage of the conversation, their combination, and their synchronization with verbal communication. The goal-media prioritizer enriches the original plan-tree by applying two sets of rules: trigger rules and regulation rules. The first set is employed to fire a particular signal, based on the information to display, on the domain and on the context. The second one is employed to decide whether or not to display a signal, and with which intensity [8].
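As a rough illustration of this enrichment pass, the following sketch (assuming the PlanNode structure sketched above; the rule functions are placeholders of our own, not the authors' code) walks the plan-tree, lets trigger rules propose signals, and lets regulation rules decide whether, and with what intensity, each proposed signal is displayed.

def enrich(node, context, trigger_rules, regulation_rules):
    """Attach nonverbal signals to each node of the plan-tree (illustrative sketch)."""
    for trigger in trigger_rules:
        signal = trigger(node, context)        # e.g. ("Feel", "S", "Sadness") or None
        if signal is None:
            continue
        display, intensity = True, 1.0
        for regulate in regulation_rules:      # decide whether to display, and how intensely
            display, intensity = regulate(signal, context, display, intensity)
        if display:
            node.signals.append((signal, intensity))
    for child in node.children:
        enrich(child, context, trigger_rules, regulation_rules)
    return node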

The left parts of trigger rules formalize domain- and context-dependent conditions; their right parts establish the emotional, metacognitive or other kinds of signals to convey. The triggering mechanism has been built on the basis of the emotion category theory of Ortony, Clore and Collins [14], as revised by Elliott [9, 10]. The general structure of this type of rule is the following:

IF DC-Cond THEN (Signal Ag e)

where DC-Cond is a condition on the context and/or on the domain, Signal represents the type of triggered signal (feeling of an emotion, metacognitive information, deictic reference and so on), Ag is the agent that conveys the signal and e is the object of the Signal. For instance, given a node in the discourse plan that corresponds to the communicative goal of describing an event (Describe(S, A, event)), whose focus is event, two different trigger rules may be applied to deal with pleasant or unpleasant events:

IF (Focus(node) is unpleasant) THEN (Feel S Sadness)
IF (Focus(node) is pleasant) THEN (Feel S Joy)

2.3 Context

Obviously, the expression of a particular emotional state may depend on other factors that influence the way in which that emotion is signalled, the intensity (degree) of its expression, and so on. As mentioned before, this process is governed by the regulation rules, which take into account factors like the Sender's and the Addressee's personality traits, their social relationship and so on. The contextual information relevant to this decision therefore includes, in our view:

1. the Sender's display goal - what S (the Sender) wants A (the Addressee) to do (to console, to help, to advise...) while expressing her emotion;

2. the social situation - either public or intimate;

3. the Addressee's model - what the Sender thinks of the Addressee, which in turn includes:

   (a) A's cognitive capacity:
       i. understanding capacity (S will not share emotions that A cannot understand);
       ii. problem-solving capacity (S will not show sadness and ask for help if A cannot give good advice);
       iii. experience (if A never felt similar emotions, he cannot help S);
   (b) A's practical resources (if the reason I want to show that I am in love is to ask my friend for his garçonnière, I will not display it if he has no garçonnière);
   (c) A's personality, where we include [16]:
       i. A's typical goals (I will not show sadness to a selfish person, who gives more importance to his own goals than to other people's);
       ii. A's typical emotions (I will not show happiness to an envious friend; I will not show fear to someone who is apprehensive);
   (d) the social relationship between S and A, namely:
       i. role and power relationship (I will not show contempt to someone who can retaliate; I will not show anger at my boss);
       ii. social attitude - whether helping or aggressive towards each other (I will not show anger at my child if she cut up a precious suit to make me a present).

An example of how the context determines whether or not fear is displayed is the following:

IF (Feel S Fear) and (Bel S (Apprehensive A)) THEN (Goal S (Not (Display S Fear)))
S will not show fear if she does not want an apprehensive person A to get worried.

IF (Feel S Fear) and (Goal S (Help A S)) THEN (Goal S (Display S Fear))
S will display her fear if she wants to get help from A.
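Rules of this kind could be operationalized, for instance, as in the minimal Python sketch below; the context keys (valence_of, addressee_apprehensive, goal_get_help) are labels of our own and not part of the authors' system. These functions are meant to plug into the enrich() pass sketched in Section 2.2.

def trigger_event_valence(node, context):
    """IF Focus(node) is unpleasant THEN (Feel S Sadness); if pleasant, (Feel S Joy)."""
    valence = context.get("valence_of", {}).get(node.focus)
    if valence == "unpleasant":
        return ("Feel", "S", "Sadness")
    if valence == "pleasant":
        return ("Feel", "S", "Joy")
    return None

def regulate_fear_display(signal, context, display, intensity):
    """Suppress the display of fear in front of an apprehensive Addressee,
    unless the Sender's goal is to obtain help from the Addressee."""
    if signal == ("Feel", "S", "Fear"):
        if context.get("goal_get_help"):
            return True, intensity
        if context.get("addressee_apprehensive"):
            return False, 0.0
    return display, intensity

# Example context: the focused event is marked as unpleasant in the domain KB,
# and the Addressee is modelled as apprehensive.
context = {"valence_of": {"Stay(jail)": "unpleasant"},
           "addressee_apprehensive": True}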

[Figure 2: Example of enriched plan. A portion of the D-plan for (Describe S A Stay(jail)) contains the nodes (Inform S A Stay(jail)) and (Inform S A LengthOfStay(jail)) related by ElabObjAttr; in the corresponding enriched plan, (Display S A Distress), (Display S A "I'm thinking") and (Display S A Adjectival) have been attached to these nodes.]

2.4 Enriched plan

At the end of this process, a new enriched plan is generated; this is the input of the surface realizer which, according to the type of body that has been selected for the Animated Agent, defines a media-instantiated presentation. Figure 2 shows an example of the generation process, illustrating how the result of the planner is enriched and how this new plan is transformed into an annotated structure corresponding to the media-instantiated presentation. In this example, the communicative goal of the Sender is to describe to A her stay in jail. In the domain knowledge base, jail is associated with distress. To describe her stay in jail, the Sender has to inform A that she stayed in jail and for how long; but at the same time she remembers how distressful it was to stay in jail and how long it lasted. In order to maintain the independence and the possible distribution of all these components, the enriched plan is annotated with an XML-based language. In particular, the media-instantiated presentation for the same example will be represented as:

<gaze type="ImThinking"> I have been in <gaze type="Distress"> jail </gaze> for a very <gaze type="LargeAdjectival"> long </gaze> period </gaze>

The gaze types refer to the gaze semantic typology proposed in [17].
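For illustration, here is a minimal sketch of how such an annotation might be produced from enriched content; the <gaze type="..."> tag format follows the example above, while the annotate function and the span dictionary are our own assumptions rather than the authors' realizer.

from xml.sax.saxutils import escape

def annotate(text, gaze_spans, outer=None):
    """Wrap sub-spans of `text` in <gaze type="..."> tags (illustrative sketch)."""
    out = escape(text)
    for fragment, gaze_type in gaze_spans.items():
        out = out.replace(escape(fragment),
                          f'<gaze type="{gaze_type}">{escape(fragment)}</gaze>')
    if outer:
        out = f'<gaze type="{outer}">{out}</gaze>'
    return out

print(annotate("I have been in jail for a very long period",
               {"jail": "Distress", "long": "LargeAdjectival"},
               outer="ImThinking"))
# -> <gaze type="ImThinking">I have been in <gaze type="Distress">jail</gaze>
#    for a very <gaze type="LargeAdjectival">long</gaze> period</gaze>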

3 Conclusion

In this paper we have proposed a new approach to generating verbal and nonverbal behavior for an embodied agent. As we generate the discourse, we take the context into account. The context gathers information on the situation in which the conversation takes place, on the Addressee's model and on the social relationship between the Sender and the Addressee. We claim that taking the context into consideration allows us to build a more personalized and individual agent, with less robotic and repetitive behavior.

References

[1] E. André, T. Rist, and J. Mueller. Integrating reactive and scripted behaviors in a life-like presentation agent. In Proceedings of the Second International Conference on Autonomous Agents, pages 261-268, 1998.

[2] J. Cassell, T. Bickmore, M. Billinghurst, L. Campbell, K. Chang, H. Vilhjálmsson, and H. Yan. Embodiment in conversational interfaces: Rea. In Proceedings of CHI '99, pages 520-527, Pittsburgh, PA, 1999.

[3] J. Cassell, C. Pelachaud, N.I. Badler, M. Steedman, B. Achorn, T. Becket, B. Douville, S. Prevost, and M. Stone. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In Computer Graphics Proceedings, Annual Conference Series, pages 413-420. ACM SIGGRAPH, 1994.

[4] J. Cassell and M. Stone. Living hand to mouth: Psychological theories about speech and gesture in interactive dialogue systems. In AAAI '99 Fall Symposium on Psychological Models of Communication in Collaborative Systems, 1999.

[5] E. Churchill, S. Prevost, T. Bickmore, P. Hodgson, T. Sullivan, and L. Cook. Design issues for situated conversational characters. In WECC '98, the First Workshop on Embodied Conversational Characters, October 1998.

[6] B. De Carolis, F. de Rosis, D.C. Berry, and I. Michas. Evaluating plan-based hypermedia generation. In Proceedings of the 7th European Workshop on Natural Language Generation, Toulouse, France, 1999.

[7] B. De Carolis, F. de Rosis, F. Grasso, A. Rossiello, D.C. Berry, and T. Gillie. Generating recipient-centered explanations about drug prescription. Artificial Intelligence in Medicine, 8:123-145, 1996.

[8] P. Ekman. Emotion in the Human Face. Cambridge University Press, 1982.

[9] C. Elliott. The Affective Reasoner: A process model of emotions in a multi-agent system. PhD thesis, Northwestern University, The Institute for the Learning Sciences, 1992. Technical Report No. 32.

[10] C. Elliott and G. Siegle. Variables influencing the intensity of simulated affective states. In AAAI Technical Report for the Spring Symposium on Reasoning about Mental States: Formal Theories and Applications, pages 58-67, Stanford University, March 23-25, 1993. AAAI.

[11] B.J. Grosz and C.L. Sidner. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 1986.

[12] E. Magno-Caldognetto and I. Poggi. Micro- and macro-bimodality. In C. Benoit and R. Campbell, editors, Proceedings of the ESCA AVSP '97 Workshop, Rhodes, Greece, September 1997.

[13] W.C. Mann, C.M.I.M. Matthiessen, and S. Thompson. Rhetorical structure theory and text analysis. Technical Report 89-242, USC Information Sciences Institute, 1989.

[14] A. Ortony, G.L. Clore, and A. Collins. The Cognitive Structure of Emotions. Cambridge University Press, 1988.

[15] I. Poggi. Mind markers. In 5th International Pragmatics Conference, Mexico City, July 5-9, 1996.

[16] I. Poggi and C. Pelachaud. Facial performatives in a conversational system. In J. Cassell, J. Sullivan, S. Prevost, and E. Churchill, editors, Embodied Conversational Agents. MIT Press, Cambridge, MA, 2000.

[17] I. Poggi, C. Pelachaud, and F. de Rosis. Eye communication in a conversational 3D synthetic agent. AI Communications, Special Issue on Behavior Planning for Life-Like Characters and Avatars, 2000.

[18] J. Rickel and W.L. Johnson. Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Applied Artificial Intelligence, 13:343-382, 1999.