Active Affective State Detection and User Assistance with Dynamic Bayesian Networks. Xiangyang Li, Qiang Ji

Electrical, Computer, and Systems Engineering Department, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180 USA
xyl@ieee.org, qj@ecse.rpi.edu

Abstract. Intelligent user assistance systems face the challenges of incomplete, uncertain, and multimodal sensory observations, of the user's changing internal state, and of constraints on decision making. We introduce a new probabilistic framework based on dynamic Bayesian networks to dynamically model the user's internal state and profile, the sensory observations, and the contextual information. A systematic mechanism performs purposive and sufficing information integration to infer the user's affective state and provide correct assistance. Our aim is to actively infer the user's status and engage in appropriate assistance in a timely and efficient manner.

1. Introduction

Intelligent assistance systems are an important application area of user modeling, especially with the rapid development of pervasive and ubiquitous computing. For example, every year many people are injured in car accidents because drivers are in a dangerous state such as fatigue, nervousness, or confusion. If we could detect these dangerous states in a timely manner and provide assistance in the form of appropriate alerts, we might prevent many accidents. However, such systems face several challenges: 1) sensory data are often incomplete, uncertain, and drawn from sources of different modalities; 2) sensory data are dynamic, evolving over time to reflect changes in the user's state; and 3) decisions about the user's needs and the assistance to provide must be made appropriately, in a timely and efficient manner, under various constraints on time and resources. We introduce a probabilistic framework based on Dynamic Bayesian Networks (DBNs) and information theory to address these challenges simultaneously.
Firstly, a generic hierarchical probabilistic framework for user modeling is introduced to model the visual sensory observations, together with the profile and contextual information related to the user's mental state. Secondly, this framework dynamically evolves and grows to account for temporal changes in the sensory observations that result from changes in the user's internal state. DBNs allow temporal information to be incorporated systematically via temporal causality. Thirdly, the proposed framework provides a mechanism that performs purposive and sufficing information integration in order to determine the user's status. Specifically, instead of passively fusing whatever information is available, the system first formulates an initial hypothesis about the user's current internal state and then actively selects the most informative sensing or questioning strategy in order to confirm or refute the hypothesized state quickly and economically. Together, these methods help the system actively infer the user's needs and state under uncertainty over time and engage appropriate assistance in a timely and efficient manner.

The paper is organized as follows. Section 2 reviews related work, focusing on applications of Bayesian network models to plan recognition, user need inference, and affective state assessment. Section 3 describes the modeling framework proposed in our study. Active inference and assistance are introduced in Sections 4 and 5, respectively. Finally, the proposed modeling and inference framework and strategies are evaluated in experiments using subjective and simulated data.

2. User Modeling and Assistance

In (Zukerman & Albrecht, 2001; Webb et al, 2001), the authors summarize the challenges for user modeling, including the need for large datasets, the need for labeled data, concept drift, and computational complexity. Jameson (1996) reviews user and student modeling systems that manage uncertainty using numerical techniques including Bayesian networks, Dempster-Shafer theory, and fuzzy logic. The applications vary with their objectives and the situations they address. Recently, there has been a significant surge in the use of Bayesian Networks (BNs) for user modeling, intelligent tutoring, and related fields. The efforts most closely related to ours are plan recognition, user need inference, and affective state assessment.

2.1 Plan recognition

Plans are descriptions of action patterns; they encode a user's intentions and desires. Building a user model rests on the assumption that rational agents have a mental state. Huber et al (1994) provide a uniform procedure for converting plans represented in a flexible procedural language into observation models represented as probabilistic belief networks. The goals and their relations in the language, such as subgoals, AND and OR branches, and multiple goals, are mapped into variables and arcs in Bayesian networks. Albrecht et al (1997) use dynamic belief networks to represent domain features, including the actions and locations of the players, in order to identify players' plans and goals in online computer games. Pynadath and Wellman (1995) present a Bayesian framework describing the context in which the plan was generated, the mental state and planning process of the agent, and the consequences of the agent's actions. The core of their work is the Belief-Preference-Capability model of the agent's mental states, which is used to model the user. The belief is the agent's knowledge of the state of the world and its dynamics. The preference is the agent's intentions, which affect its behavior in the world.
The capability is the agent's self-model of its available actions. The authors demonstrate their method in traffic monitoring, predicting a driver's plan from observations of vehicle movements. This framework provides a general way to model the plan recognition problem; however, it is not intended for active assistance.

2.2 User need inference and assistance

Intelligent user assistance systems need the ability to adapt to the user's specific needs. In the READY system (Bohnenberger et al, 2002), the authors use DBNs in a dialog system to adjust the policy for giving instructions, based on the recognized time pressure and cognitive load of the user. The relevant observations from the user's utterances may include filled pauses, disfluencies, errors, and so on. Adaptation is realized by a rule base that maps detected situations to actions. The system focuses on intuitive adaptation of the interface based on changes in hypotheses; no active information collection strategy is considered. Microsoft is carrying out extensive research applying Bayesian networks to create intelligent software assistants. The Lumiere project is intended to help computer users with interactive interfaces (Horvitz et al, 1998). DBN user models are used to infer a computer user's goals and needs by taking the user's background, actions, and queries into account. Based on the assessment of the user's needs and the utility theory of influence diagrams, the automated assistant provides customized help. This research addresses issues of automatic assistance such as the timing and optimization of assistance. It does not, however, focus on active information fusion that dynamically selects information channels and actively determines the most appropriate assistance.

DeepListener augments speech recognition in clarification dialogs by using DBN models (Horvitz & Paek, 2000). The models infer user intentions associated with utterances, such as affirmation, negation, reflection, and so on. Costs and benefits can then be calculated for the different available actions, and the action with the highest utility is executed. Utility is calculated by assessing the cross product of situations and actions, through psychological experiments or the use of assessment tools. DeepListener, however, does not distinguish between actions and sensory tests in its utility calculation and relies heavily on immersive interaction. This may be effective at times but dangerous at others, because users vary widely in operating skills and personality and may therefore easily become resistant to such an overactive interface. The Bayesian Receptionist system (Horvitz & Paek, 1999) suffers from the same problems as DeepListener. Its central goal decomposition hierarchy uses Bayesian models at increasingly detailed levels of analysis. At each level, the system applies a greedy, entropy-based value of information calculation to select the next single piece of evidence. When the expected cost of evaluating observations exceeds the expected value, the value of information calculation terminates within the current level and the system moves to the next level of detail. Underlying the Receptionist is Quartet, a task-independent, multimodal architecture for supporting robust continuous spoken dialog (Paek & Horvitz, 2000). Its four levels of inference and decision making are the channel level, the signal level, the intention level, and the conversation level. The system uses value of information to ask questions, make recommendations, and seek out information in a dialog setting.

2.3 Affective computing

More and more HCI researchers are interested in users' emotional and mental aspects, since affective states are an important indication of the user's internal state, intentions, and needs. Affective computing focuses on recognizing emotional intelligence (Picard, 1997).
It works toward developing software that has the ability to recognize, express, and have emotions. Human beings have abundant emotions, such as sadness, happiness, guilt, pride, shame, anxiety, fear, anger, and so on. From the viewpoint of computational theory, affective computing uses pattern recognition and information retrieval technologies for affective state assessment. These techniques include rule-based systems (Pantic, 2002), discriminant analysis (Ark et al, 1999), fuzzy rules (Hudlicka & McNeese, 2002), case/instance-based learning (Scherer, 1993), linear or nonlinear regression (Moriyama et al, 1999), neural networks (Petrushin, 1999), Bayesian learning (Qi & Picard, 2002), HMMs (Cohen et al, 2000), and Bayesian networks. Most of these research efforts focus on the low-level mapping between certain sensory data and the underlying emotions. Current approaches to affective state assessment can be divided into two groups. The first group uses the sensory measures as predictor variables and applies classification algorithms without prior or contextual knowledge about these variables and the target affective states. Such algorithms lack the ability to handle the uncertainty, complexity, and incompleteness of the data when building pattern models and performing classification. The second group, represented by Bayesian network and HMM models, encodes prior knowledge and expertise in graphical network form. These models maintain a balance between global and local representations and, with the aid of their causal and uncertainty representation structure, provide powerful capabilities for handling the complex situations found in practical systems. Ball and Breese (2000) use a Bayesian network to assess the user's affective state in terms of valence and arousal, and personality in terms of dominance and friendliness. The observable data represent facial and speech information about the user. This network was implemented as a dynamic Bayesian network to capture the temporal structure of emotional state. It represents a simple emotional state assessment task.
Conati (2002) provides a dynamic Bayesian network model for assessing students' emotions in educational games. The emotional states are modeled as consequences of how the current situation (action and help) fits with the student's goals and preferences. Some body expressions and sensors are also used as evidence.

2.4 Fault detection and troubleshooting

Finally, another area of research closely related to the proposed active sensing is fault detection and troubleshooting. Fault diagnosis and troubleshooting are decision-theoretic processes that generate low-cost plans for identifying faults so that a device can be repaired efficiently. Heckerman et al (1996) apply an approach based on Bayesian networks to encode the possible faults and to identify an optimal action plan for repair. This is achieved by evaluating the cost of repair for various plans. After each action, the probabilities are updated and new potential plans are generated. The work by Breese & Heckerman (1996) continues this line of research; they use a persistent network to overcome the limitations of a single Bayesian network. Langseth & Jensen (2003) extend the traditional troubleshooting framework to model non-perfect repair actions and questions. Their system uses the expected cost of repair to choose the sequence of repair actions. The efficiency, the expected cost of repair, and the value of information for actions and questions can be used to define more complex measures of repair strategies, and to determine the repair sequence and whether to ask questions. All of these works focus on globally and statically seeking the best action sequence for problem settings where actions and questions are not repeated. However, their definitions of efficiency, cost, and value of information are useful for our task of efficiently and accurately inferring the user's state.

2.5 Summary

In conclusion, researchers have recognized the benefits of DBNs and utility theory and have begun to apply them to user modeling and HCI-related applications.
Current research in these areas, however, is limited to passive inference, is mostly affect-insensitive, and addresses static domains. Efficiency in user state inference is usually not considered, and the utility of an action usually does not vary over time. These approaches therefore fall short of the demands of HCI. Research in affective computing may help user modeling by supporting sensory data processing. However, to extract the information required for user modeling and assistance, the information from these modalities is generally not sufficient on its own and must be integrated with higher-level models of the user and the environment. Compared with the Bayesian network systems discussed above, our system currently aims at two objectives. The first is non-intrusive and active user state inference: our target is to design silent agents that use the most reliable and non-intrusive evidence to provide the user with accurate and active assistance in a pervasive and ubiquitous computing environment. The second is dynamic and active sensor selection: the selection of a sensor or sensors in such a system should not be done once and then forgotten, but needs to be continually and dynamically reevaluated. We focus on refining the choice of sensors and questions dynamically using a locally optimal strategy.

3. Context-Profile-State-Observation Model

Bayesian networks are probabilistic graphical models representing the joint probabilities of a set of random variables and their conditional independence relations (Jensen, 2001; Pearl, 1988). The nodes characterize the hypothesis/goal variables, hidden state variables, and evidence/observation variables of the physical system, while the arcs linking these nodes represent the causal relations among these variables. Hypothesis nodes represent what we want to infer, while observation nodes represent sensory observations. The intermediate hidden nodes are necessary to model the state generation process. They link the hypothesis nodes with the observation nodes and therefore influence both the variables we observe and the variables we want to infer. Nodes are often arranged hierarchically at different levels, representing information at different levels of abstraction. Static Bayesian Networks (SBNs) work with evidence and beliefs from a single time instant. As a result, SBNs are not particularly suited to modeling systems that evolve over time. DBNs have been developed to overcome this limitation. In general, a DBN is made up of interconnected time slices of SBNs, and the relationships between two neighboring time slices are modeled as a first-order Markov process, i.e., random variables at time t are affected by other variables at time t, as well as by the corresponding random variables at time t-1 only. Figure 1 illustrates this behavior. The slice at the previous time provides diagnostic support, through its temporal links, for the current slice, and it is used in conjunction with the current sensory data to infer the current hypothesis. DBNs generalize conventional systems for modeling dynamic events, such as Kalman filtering and Hidden Markov Models.

Figure 1 A generic Dynamic Bayesian Network structure consisting of 3 time slices, where H represents a collection of hypothesis nodes, S a collection of hidden nodes, Es a collection of observation nodes, and t represents time.

Bayesian networks have several advantages for modeling and inferring the user's affective state. Firstly, BNs provide a hierarchical framework to systematically represent information from different modalities at different levels of abstraction and to systematically account for their uncertainties. Furthermore, with the dependencies encoded in the graphical model, Bayesian networks can handle situations where some data entries are missing. We can also gain understanding of a problem domain and predict the consequences of intervention based on the learned causal relationships.
Secondly, the user's dynamically changing state and the surrounding situation call for a framework that not only captures beliefs about current events but also predicts the evolution of future scenarios. DBNs provide a powerful tool for this problem by offering a coherent and unified hierarchical probabilistic framework for sensory information representation, integration, and inference over time. Furthermore, through their temporal causality, DBNs let us predict the influence of possible future actions, so that we can choose the best decision accordingly. Thirdly, in many applications, the cost in terms of time, computational complexity, interruption of the user, and the expense of retrieving information from various sensors places strict constraints on implementing decisions about providing assistance. Therefore, we have to carry out our work actively along the most efficient and economical path. DBNs provide the facilities to actively and efficiently determine the utility of each sensory action over time.

Our generic framework for applying Bayesian networks to user modeling is the Context-Profile-State-Observation model. It is used to infer the user's affective state from visual observations. As shown in Figure 2, the model captures the user's profile, affective state, and contextual information in the following components.

Context component. This represents information about the specific environmental factors that can influence the user's affective state.
Affective state component. This represents the user's emotional status. Typical affective states include fatigue, confusion, frustration, fear, sadness, and anger.
Profile component. This models the user's ability and competitiveness in finishing the operations, which gives the model the capability to adapt to individual users.
Observation component. This includes sensory observations of different modalities describing user behaviors.

Figure 2 Context-Profile-State-Observation model, where self-pointing arrows indicate temporal links. (Node labels: Context; User profile; Affective state; Eye movement; Gaze; Facial expression; Head gesture; Hand gesture; Blink freq; Fixation vs. saccade; Closure time; Spatial distribution; Facial features; Tilt freq; Pose; Location; Motion.)

The user's affective state and the hidden nodes for the user's visual, audio, and behavioral status in the current time slice are influenced by the corresponding variables in the most recent time slice. The user profile could also have temporal links between time slices; in this figure, however, we consider it unchanged within a running session. The figure also outlines the causal relations among context, profile, state, and observation variables: the context and profile variables influence the user's state, and the user's state drives the evolution of visual, audio, and behavioral expressions.

4. Active User State Inference

Since we are often constrained in the time and resources we can use, and face strict requirements on the accuracy of assistance, purposive and sufficing information collection and integration are needed to infer the user's affective state in a timely and economical manner. Figure 3 shows a general view of the active user state detection system.
We are interested in how to dynamically control a system that has a repertoire of sensors (selecting actions and making decisions) so that it operates purposively. We collect observations from the most informative sensors; we can even reduce uncertainty further by asking the user questions directly. We then update the beliefs of the current affective state variables, with the previous beliefs of the context and profile variables serving as evidence.
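The belief update just described, in which the previous slice's belief is rolled forward through the temporal links and then combined with the current observations, can be illustrated as one step of forward filtering in a two-slice DBN. This is a minimal sketch, not the authors' BNT implementation. It assumes a single discrete hypothesis node, a hypothetical transition table P(h_t | h_{t-1}), and observation likelihoods that are conditionally independent given h_t; the hidden layer S and the context and profile nodes are collapsed away. The function name `filter_step` and all numbers are made up for illustration.

```python
def filter_step(belief_prev, transition, likelihoods):
    """One forward-filtering step for a single discrete hypothesis node.

    belief_prev: p(h_{t-1}) as a list of probabilities.
    transition:  transition[i][j] = P(h_t = j | h_{t-1} = i)  (hypothetical table).
    likelihoods: one vector per observed sensor, lik[j] = P(observed e | h_t = j);
                 sensors are assumed conditionally independent given h_t.
    Returns p(h_t | observations up to t).
    """
    n = len(transition[0])
    # Predict: push the previous belief through the temporal link.
    belief = [sum(b * row[j] for b, row in zip(belief_prev, transition))
              for j in range(n)]
    # Update: multiply in the likelihood of each observation actually made.
    for lik in likelihoods:
        belief = [b * l for b, l in zip(belief, lik)]
    total = sum(belief)
    if total == 0:
        raise ValueError("observations have zero probability under the model")
    return [b / total for b in belief]


# Hypothetical two-state example: index 0 = fatigued, index 1 = alert.
TRANSITION = [[0.9, 0.1],
              [0.1, 0.9]]
LONG_EYE_CLOSURE = [0.8, 0.2]  # P(long eyelid closure | state), made-up numbers

belief = [0.5, 0.5]
for t in range(3):
    belief = filter_step(belief, TRANSITION, [LONG_EYE_CLOSURE])
    print(t, [round(b, 3) for b in belief])
```

Repeated evidence of long eyelid closure pushes the belief in "fatigued" up slice by slice, while the transition table keeps it from jumping to certainty. Passing an empty likelihood list returns the pure prediction, which mirrors how the network still produces a belief when a sensor reading is missing.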

Figure 3 Active state assessment system overview.

4.1 Active inference

Mathematically, the user affective state inference problem can be viewed as a hypothesis detection problem, with hypothesis H = {h_1, h_2, ..., h_n} representing the possible user states. The sensory observations E come from m diverse sensors, i.e., E = {E_1, E_2, ..., E_m}. The goal is to estimate the posterior probability that H = h is true given E, i.e., P(H = h | E). According to Shannon's measure of entropy, the entropy of a distribution over hypothesis H_t is:

$$\mathrm{ENT}(H_t) = -\sum_{h_t} p(h_t)\log p(h_t) \qquad (1)$$

This entropy is zero when H_t is unambiguous, i.e., when one state has probability 1; it reaches its highest value when the probability is spread evenly over all states. The benefit of a piece of evidence to a hypothesis can be measured by its expected potential to reduce the uncertainty about the hypothesis, called mutual information, i.e., the difference in entropy before and after the sensory action. Given a sensor E with states e and the hypothesis distribution of the last time slice, H_{t-1}, its mutual information with respect to hypothesis H can be denoted I(E):

$$I(E) = \mathrm{ENT}(H_t) - \sum_{e} p(e)\,\mathrm{ENT}(H_t \mid e) = -\sum_{h_t} p(h_t \mid h_{t-1})\log p(h_t \mid h_{t-1}) + \sum_{e}\Big[p(e \mid h_{t-1})\sum_{h_t} p(h_t \mid h_{t-1}, e)\log p(h_t \mid h_{t-1}, e)\Big] \qquad (2)$$

Here every distribution is conditioned on the belief carried over from slice t-1, written h_{t-1}; thus p(h_t) in (1) is the predicted p(h_t | h_{t-1}), and p(e) is p(e | h_{t-1}). This formula is fundamental for dynamically computing the potential of E to reduce the uncertainty in H. It extends easily to the case where multiple sensors E = {E_1, ..., E_n}, a subset of the m available sensors, are instantiated simultaneously:

$$I(E) = I(E_1,\ldots,E_n) = \mathrm{ENT}(H_t) - \sum_{e_1}\cdots\sum_{e_n} p(e_1,\ldots,e_n)\,\mathrm{ENT}(H_t \mid e_1,\ldots,e_n) = -\sum_{h_t} p(h_t \mid h_{t-1})\log p(h_t \mid h_{t-1}) + \sum_{e_1}\cdots\sum_{e_n}\Big[p(e_1,\ldots,e_n \mid h_{t-1})\sum_{h_t} p(h_t \mid h_{t-1}, e_1,\ldots,e_n)\log p(h_t \mid h_{t-1}, e_1,\ldots,e_n)\Big] \qquad (3)$$

The probabilities in the above equation are readily available from forward and backward inference propagation based on the hypothesis beliefs of the last time slice. For example, p(h_t | h_{t-1}, e_1, ..., e_n) is the posterior probability of the hypothesis state in the current time slice given a configuration of sensor states. Thus the mutual information of each sensing option can be calculated conveniently.
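Equations (1) to (3) can be checked numerically on a toy model. The sketch below rests on simplifying assumptions that are ours, not the paper's: one discrete hypothesis node whose predicted prior p(h_t | h_{t-1}) is computed from a hypothetical transition table, sensors attached directly to that node (no hidden layer S), and sensors conditionally independent given h_t so that p(e_1, ..., e_n | h_t) factorizes. Logarithms are base 2, so I(E) is in bits. The helper names and all probabilities are hypothetical.

```python
import itertools
import math


def entropy(dist):
    """Shannon entropy in bits, Eq. (1): -sum_h p(h) log2 p(h)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def predict(belief_prev, transition):
    """Predicted prior p(h_t | h_{t-1}) = sum_i b_{t-1}(i) * transition[i][h_t]."""
    n = len(transition[0])
    return [sum(b * row[j] for b, row in zip(belief_prev, transition))
            for j in range(n)]


def mutual_information(prior, cpts):
    """I(E_1..E_n) as in Eqs. (2)-(3) for sensors attached directly to H_t.

    prior: predicted p(h_t | h_{t-1}) as a list.
    cpts:  one table per sensor, cpt[h][e] = P(e | h_t); the sensors are
           assumed conditionally independent given h_t (our simplification).
    """
    expected_posterior_entropy = 0.0
    # Enumerate every joint configuration (e_1, ..., e_n) of the chosen sensors.
    for config in itertools.product(*(range(len(cpt[0])) for cpt in cpts)):
        joint = [p_h * math.prod(cpt[h][e] for cpt, e in zip(cpts, config))
                 for h, p_h in enumerate(prior)]
        p_config = sum(joint)  # p(e_1, ..., e_n | h_{t-1})
        if p_config > 0:
            posterior = [j / p_config for j in joint]  # p(h_t | h_{t-1}, e_1..e_n)
            expected_posterior_entropy += p_config * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy


TRANSITION = [[0.9, 0.1], [0.1, 0.9]]    # hypothetical P(h_t | h_{t-1})
prior = predict([0.5, 0.5], TRANSITION)  # states: fatigued, alert
eyelid = [[0.9, 0.1], [0.1, 0.9]]        # hypothetical, informative sensor
flat = [[0.5, 0.5], [0.5, 0.5]]          # carries no information about H_t
print(round(mutual_information(prior, [eyelid]), 3))  # 0.531
print(mutual_information(prior, [flat]))              # zero, up to rounding
```

Adding the uninformative sensor to a set leaves I(E) unchanged, while a second independent noisy reading of the same cue raises it. Note that enumerating joint configurations grows exponentially with the number of sensors, which is the cost that motivates the greedy selection discussed in Section 4.2.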

Acquiring information incurs cost. The cost may include the cost of information retrieval, the time to bring the information from its source into the fusion system, the computation time for sensory data processing, and the hardware execution time. We consider the cost C of selecting E, a set of n sensors, assuming the costs of different sensors are incorporated with equal importance, using the following formula:

$$C(E) = \frac{\sum_{i=1}^{n} C_i}{\sum_{j=1}^{m} C_j} \qquad (4)$$

where C_i is the cost of acquiring information from sensor i. Combining the uncertainty-reducing potential and the information acquisition cost, we form the expected utility of sensor set E as:

$$EU(E) = \alpha I(E) - (1-\alpha)\,C(E) \qquad (5)$$

where α is the coefficient balancing the two terms. The optimal sensor action can be found using the following decision rule:

$$E^{*} = \arg\max_{E} EU(E) \qquad (6)$$

Thus we can search for the best sensory action by examining the utilities of all sensor configurations.

4.2 Some issues in active inference

In the previous section, mutual information provides a convenient criterion for active sensor selection. Complicated situations may emerge in this selection process, and we have to handle them. Specifically, we discuss the issues of inference degradation, the impact of the sensory state distribution, the selection of multiple sensors, and multiple hypothesis variables.

One issue we encounter with active sensing is inference degradation, defined as the loss of inference ability when a small set of sensors is selected repeatedly. In active fusion, because of the chosen definition of mutual information, a sensor tends to yield a higher value if its conditional probability table has very uneven entries for different configurations of parent states. The mutual information criterion is therefore likely to keep applying the same sensing action in order to maximize the differences among the states of the hypothesis node, favoring a particular set of sensors for a particular hypothesis variable. We observed in our implementation that this tendency intensifies as time passes.
This is reasonable, since our principal objective in active fusion is to seek the sensor that most efficiently distinguishes the subject's various affective states. On the other hand, when one or a very small set of sensors dominates the selection, the advantage of fusing multimodal information for better accuracy and robustness is lost. Furthermore, part of these sensors' superiority in mutual information comes from using the expected potential. In the expected potential calculation, all possible states of a sensor are inspected and the whole network structure (including the conditional probabilities) strongly affects the result, but afterwards only one state is instantiated for the selected sensor. In this sense, the criterion is biased toward certain sensors in the selection process.

To reduce the inference degradation effect, mutual information should not be the only factor in determining a sensor's benefit. An obvious solution is to force more sensors into engagement, which we can achieve by monitoring the sensors' utilization history. In our study we use a taboo list to exclude recently selected sensors from the current selection decision. This taboo list has one parameter, its length. A list length of 1 means that the list contains one entry at any time, namely the sensor selected in the most recent time slice. With this approach, history information is taken into consideration; we examine the effect of the taboo list in our experiments. Another possible solution is to adjust the utility of a sensor dynamically. For example, the utility of a sensor can be reduced over time when it is selected more often than others, which avoids repeatedly selecting the same sensor.

Further to the above discussion, we prefer sensors whose predicted probability distributions carry less uncertainty. We first determine the most likely expression predicted for the subject's internal status. A sensor that may catch such an expression should be a strong verifier of what is going on in the user's mind, and such a sensor shows a clear distinction among its possible states. Conversely, when a sensor has a number of states with almost equal probabilities of appearing, it shows weak correlation with the underlying status. In other words, a sensor showing high probability for a certain state is a better indicator of the subject's current internal status than sensors that have equal probabilities for a set of states. We therefore use split information to penalize such sensors, as in the GainRatio index for decision trees (Mitchell, 1997). Split information gives a large value when the sensor has many states that are evenly distributed (i.e., a uniform probability distribution). Since the situation becomes more complex when there is more than one sensor, we consider only the single-sensor case here to simplify the description. Given sensor E, the new mutual information is calculated using the following formulae:

$$I'(E) = \frac{I(E)}{SI(E)}, \qquad SI(E) = -\sum_{e} p(e)\log p(e) \qquad (7)$$

Another concern is efficient multiple sensor selection for the case where more than one sensor can be activated in the same time slice.
In this case we have to consider the mutual information and combined cost of every possible combination of sensors in order to find the optimal one. This is a typical NP-hard problem, whose computation grows rapidly with the number of sensors. On the other hand, a limit on the number of sensors that can be turned on together at each time may reduce the number of combinations. Finding an affordable search algorithm in this situation is a challenge for future research. In our research, we use a greedy or myopic search strategy (Jensen, 2001), which calculates the utility of each sensor and ranks the sensors by their utilities; we then choose the required number of sensors from the top.

Finally, we have to consider the case where several affect hypotheses exist and we are interested in assessing them simultaneously. There are different views about whether different human affects can coexist. We can use multiple binary affect nodes, since this approach can also accommodate mutually exclusive affective states by putting a constraint on the positive states of these nodes. Let I(H_j, E) denote the mutual information of sensor set E with respect to hypothesis j. We can then rewrite the formula for a sensor's mutual information as:

$$I''(E) = \sum_{j} w_j\, I(H_j, E) \qquad (8)$$

where w_j is a weight distinguishing the importance of different affects. There are various ways to set this weight. In our experiments, we set it to the current belief in the hypothesis's positive state, i.e., w_j = p(H_j = positive). This gives more importance to the suspected hypotheses.
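The selection rules of Sections 4.1 and 4.2 can be combined in a short sketch: the normalized cost of Eq. (4), the utility of Eq. (5) with cost entering as a penalty, the arg max of Eq. (6) approximated greedily by ranking single sensors, an optional split-information penalty (Eq. 7), the belief-weighted mutual information of Eq. (8), and a taboo list. The sensor names, MI values, costs, α = 0.7, and the use of a `collections.deque` as the taboo list are hypothetical choices for illustration; in a real system the MI inputs would come from a computation such as Eq. (2) or (3).

```python
import math
from collections import deque


def split_information(p_e):
    """SI(E) = -sum_e p(e) log2 p(e); large when E's states are near-uniform (Eq. 7)."""
    return -sum(p * math.log2(p) for p in p_e if p > 0)


def weighted_mi(mi_per_hypothesis, p_positive):
    """Eq. (8): I''(E) = sum_j w_j * I(H_j, E), with w_j = p(H_j = positive)."""
    return sum(w * mi for w, mi in zip(p_positive, mi_per_hypothesis))


def select_sensors(mi, costs, alpha=0.7, k=1, taboo=(), p_e=None):
    """Greedy (myopic) ranking of single sensors by EU = alpha*I - (1-alpha)*C.

    mi:    sensor name -> mutual information (possibly already weighted by Eq. 8)
    costs: sensor name -> raw acquisition cost; C is normalized as in Eq. (4)
    taboo: sensors selected in the most recent slices; they are skipped
    p_e:   optional sensor name -> predicted p(e); if given, I is replaced by
           the gain ratio I / SI of Eq. (7)
    Assumes I and C have been scaled to comparable ranges, as the text requires.
    """
    total_cost = sum(costs.values())
    utility = {}
    for sensor, info in mi.items():
        if sensor in taboo:
            continue
        if p_e is not None:
            si = split_information(p_e[sensor])
            # SI = 0 means E's state is already certain, so I(E) = 0 as well.
            info = info / si if si > 0 else 0.0
        utility[sensor] = alpha * info - (1 - alpha) * costs[sensor] / total_cost
    return sorted(utility, key=utility.get, reverse=True)[:k]


# Hypothetical numbers for four information sources.
MI = {"eyelid": 0.45, "gaze": 0.30, "facial": 0.25, "query": 0.50}
COSTS = {"eyelid": 1.0, "gaze": 1.0, "facial": 2.0, "query": 6.0}

taboo = deque(maxlen=1)  # taboo list of length 1
history = []
for t in range(4):
    chosen = select_sensors(MI, COSTS, taboo=taboo)
    taboo.extend(chosen)
    history.append(chosen[0])
print(history)  # ['eyelid', 'gaze', 'eyelid', 'gaze']
```

Without the taboo list the eyelid cue would be chosen in every slice; with a list of length 1 the selection alternates between the two best sensors, which is the intended counter to inference degradation. Supplying predicted state distributions through `p_e` lets a sensor with a sharply peaked prediction overtake one with a higher raw MI but near-uniform states. Ranking single sensors and taking the top k is the greedy approximation described above, not an exhaustive search over sensor subsets.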

After considering all the above issues, we can choose the appropriate form of the mutual information calculation from equations 7 and 8, and then substitute it into equations 5 and 6 to find the best sensor at each time slice.

4.3 Reliability model of information sources

Another issue to be aware of when evaluating the utility of an information source is the reliability of the acquired data, whether obtained through a sensory action or via a user query. Reliability can significantly affect the quality of the information, which in turn affects the accuracy of decisions. A reliability model can be incorporated into the original Bayesian model. DBNs make this easy: we add an information node layer between the hidden layer and the observation layer, as shown in Figure 4. The conditional probabilities that specify the links between the information layer and the observation layer characterize the reliability of the sensory data with respect to the information it yields. We then use the same inference algorithm to process the Bayesian networks with the reliability model. The new expected utility, after incorporating reliability, is calculated with the same formula on the new network structure, with the original evidence variables replaced by the evidence variables associated with the reliability model. The final utility of a sensory action is thus determined by its mutual information, its cost, and its reliability.

Figure 4 The reliability model of information sources (nodes H, S, I_1 ... I_m, E_1 ... E_n), where an information layer (Is) is inserted between the sensor layer (Es) and the hidden state layer (S).

5. Decision on Assistance

There are two key questions to answer when deciding on assistance: when we should provide help, and what help we should provide. The first question normally requires control thresholds on the probability distributions of the affective state variables. First we calculate a State Level (SL):

$$SL = \sum_{h_t} w(h_t)\, p(h_t) \qquad (9)$$

where w(h_t) is the weight for a state of an affect, indicating that state's level of seriousness.
Then we can set an Engaging Threshold (ET) on SL: if SL is greater than ET, we provide assistance to the user. For example, the weight could be set to zero for the negative state of an affect such as fatigue. In practice, an SL smoothed over time is a more reliable indicator; in our experiments, the SLs are smoothed over three time slices. Moreover, we may want to use a set of state levels when there are different affect variables that are not mutually exclusive. We can then calculate an SL for each affect variable, called an individual SL, and an overall SL summarizing awareness of the subject's status, which is the average of all individual SLs.

The type of assistance to provide depends on the utility of the assistance, which represents the optimal trade-off between its benefit and its cost. The benefit concerns the beneficial consequences of the assistance. One measure of benefit is the assistance's potential to return the user from an anomalous state to his or her normal state. The benefit can be calculated by assessing the cross product of situations and assistances, through psychological experiments on a population of users or with assessment tools such as unidimensional or multidimensional scaling. The cost includes the computational cost, the potential for annoying the user, the physical cost, and the cost of not providing or delaying the assistance. Let A_j represent the jth assistance under consideration, and let G_B(A_j, h) and G_C(A_j, h) represent the benefit and cost of assistance j, respectively, given the user's current state h. In probabilistic terms, the benefit as the potential to return the user from an anomalous state h to the normal state h_0 is G_B(A_j, h) = p(h_0 | A_j, h). This gives us a way to obtain the benefit from probabilistic expertise.

Figure 5 An assistance model based on the user's current status for determining what assistance to provide (nodes: State, Task, Cause, Tolerance, Utility, Assistance).

The utility of assistance is also affected by the user's current status, including the affective state, the current task goal, the cause, and the user's tolerance to assistance, as shown in Figure 5. Task shows the user's current interest, such as choosing an icon or button. Cause explains why the subject is in the current state. Tolerance acts as a control switch determining the degree of intervention the user would accept.
A utility form considering all these factors needs much more research effort. In this paper, we just give a simple utility calculation based on the beliefs of the user's affective state. The expected utility of assistance $A_j$ may then be defined as:

$$EU(A_j) = \sum_{h} G_B(A_j, h)\, p(h) - \sum_{h} G_C(A_j, h)\, p(h) \qquad (10)$$

As in the sensor utility definition, benefit and cost should be scaled to maintain the same value range. The best assistance is then determined via:

$$A^{*} = \arg\max_{A_j} EU(A_j) \qquad (11)$$

In a practical system, providing accurate and timely assistance is a much more complex task, largely due to the involvement of a human subject in the loop. The above strategy is just one simple way to control assistance engagement decisions. More complicated approaches, e.g., building up a tolerance-to-interruption model of the subject, or a closed-loop feedback control scheme, could be developed to achieve better effect. However, research in this area remains a significant challenge.

Assistance could be provided in proactive and reactive modes. Proactive assistance is normally slight and unnoticeable, aiming to prevent a serious situation from happening in the future. One example is to zoom in on a display if we find that the focus of an operator is on a detail of a map. Reactive assistance may pause or stop the whole system and user operations with the aid of an additional dialogue interface or audio/video facilities. Reactive assistance occurs when the current state of the user is already dangerous and needs immediate correction. Both reactive and proactive assistance have to be conservative and sparing, because inappropriate or inaccurate assistance is almost unavoidable in assistance systems at the current research level, and at the same time it is very difficult to know the users' tolerance to inaccurate interruptions.

6. Evaluation

We use subjective parameters and simulated data to evaluate this framework. The task in the evaluation is to detect whether a computer operator exhibits any of the fatigue, nervousness and confusion affects, using visual cues about facial expression, eyelid, gaze, and query.

6.1 Bayesian network and parameterization

The dynamic Bayesian network is shown in two time slices in Figure 6. The implementation is in MATLAB using the BNT toolkit (Murphy, 2001). The inference algorithm uses the junction tree engine. A description of the discrete variables in the model is given in Table 1. We use three separate nodes for the affective states because we do not stipulate that these states are exclusive of each other. The assistance node here is meant to show the impact of the chosen assistance on the mental states, although this impact is very hard to estimate in practice.

Figure 6 DBN network structure used in the evaluation model, which uses five visual cues and direct query to assess three unsafe mental states of a user.
The parameters of this Bayesian network include the prior probabilities for the context and profile nodes, and the conditional probabilities for the links. Here we focus more on the working mechanism than on a high-fidelity model. In our experimentation, the prior probabilities are all set even, 0.5 for each state, since the root nodes are all binary. The conditional probabilities of mental affects between two consecutive time slices are also called transitional probabilities, as in an HMM. Normally the transitional probability between the same states in two slices, e.g., positive to positive for fatigue, is high if we consider that a user's mental state should remain relatively stable. For transient affects, such as confusion, which may come and go quickly, this probability may be lower. The transitional probability between opposite states, positive to negative or negative to positive, is correspondingly much lower. In our experimentation, the transitional probability between the same states of fatigue is 0.9, while it is 0.85 for the other two affects. Other conditional probability values are obtained subjectively from expertise. For example, the probability that the gaze fixation ratio is high is highest when the subject is fatigued, not nervous and not confused. However, we still have too many conditional probabilities in terms of all combinations of states in parent and child nodes. In our survey we instead first investigate the conditional probability on a single parent, e.g., the probability of the gaze fixation ratio being high given that the subject is fatigued. The noisy-OR assumption and its extensions, especially the one by Srinivas (1993), have been used to reduce the number of probabilities to estimate from these one-on-one probabilities. While such extensions can deal with arbitrary input and output nodes, the physical meaning behind the n-ary variables and the output function is very opaque to represent and interpret. In our evaluation we simply use the following equation to approximate the combinatorial conditional probabilities:

$$p(C \mid A_1, \ldots, A_n) \propto \prod_{i=1}^{n} P(C \mid A_i) \qquad (10)$$

where $A_1, \ldots, A_n$ are the parents of $C$. A normalization step over the states of $C$ is applied to the products so that they satisfy the requirements of a conditional probability distribution. The assumption here is that these affective states function independently to raise the external expressions in the visual sensory channels. The final conditional probability is proportional to the product of all the related probabilities conditional on a single parent.

Table 1 Variables used in the evaluation model. Numbers in parentheses are sensor indices.

Component            Variable                        States                                Notes
Assistance           Assistance                      null/warning/emphasis/simplification
Context              Context                         complex/simple                        The surrounding environment of the subject.
Profile              Physical condition              strong/weak                           Physical condition of the subject.
                     Skill                           strong/weak                           Computer skill of the subject.
Mental state         Fatigue                         positive/negative
                     Nervousness                     positive/negative
                     Confusion                       positive/negative
Sensory observation  Facial expression (11)          neutral, happiness, sadness,          Expressions from the FACS system.
                                                     anger, surprise, disgust, fear
                     Eyelid PERCLOS (12)             high/normal/low                       Percentage of eyelid closure over the pupil over time.
                     Eyelid AECS (13)                fast/normal/slow                      Average eye closure/open speed over time.
                     Gaze spatial distribution (14)  high/even/low                         The ratio of gaze staying outside the computer screen in the overall time.
                     Gaze fixation ratio (15)        high/even/low                         The ratio of gaze fixation over time.
                     Query (16)                      fatigued/confused/no                  Answers to direct questioning of the form "are you ...?" among fatigued, confused, comfortable.

6.2 Experimental data, scenarios and settings

In the experimentation, we compare the results of different sensor activation and assistance strategies, covering passive inference, active inference and the assistance process. These settings are listed in Table 2. In the active fusion settings among the first six, the costs of sensors 11-15 are all set to zero, i.e., α is 1 in equation 5. In the last setting, we show the case of using a set of non-zero sensor costs. We choose to assign larger costs to the sensors ranked top in the mutual information calculation.

Table 2 Seven different settings used in evaluation.

Setting              Notes
Passive Fusion I     Randomly select 1 sensor.
Active Fusion I      Actively select 1 sensor without Taboo List.
Active Fusion II     Actively select 1 sensor with Taboo List.
Passive Fusion II    Randomly select two sensors at each time slice.
Active Fusion III    Actively select two sensors at each time slice.
Assistance           Provide assistance when in active fusion III.
Various Costs        Use different sensor costs in calculating the utility values of sensors.

In the assistance setting, all ETs in this set of experiments are set to 0.8. Based on the probability form for determining the benefit of assistance discussed in Section 5 and on subjective expertise about the assistance cost, the benefit and cost of assistance for each affect are assigned in Table 3. Query provides a more accurate estimate of the user's mental state, but it is very intrusive to the subject and is thus associated with a high cost. We demonstrate the function of query by turning it on only once, as the last confirmation before any assistance. When the individual or overall state levels exceed the predefined Engaging Threshold (ET), an answer from the query channel is retrieved and used to update the status of the subject. If, after this, any state level is still significant, the utilities of the assistances are calculated and the one with the highest value is chosen to instantiate the corresponding state in the assistance node. We assume the assistance stays engaged for a certain time (five slices in our experimentation). Then we can simulate and observe the effect realized by the causal link between the affect state nodes and the assistance node.

Table 3 The benefit and cost of assistance to affect states.

                  Benefit ($G_B$)                Cost ($G_C$)
Assistance        Fatigued  Nervous  Confused    Fatigued  Nervous  Confused
Warning           0.56      0.33     0.13        0.07      0.33     0.57
Emphasis          0.25      0.60     0.31        0.43      0.07     0.36
Simplification    0.19      0.07     0.56        0.50      0.60     0.07

Different replications (scenarios) are used: fatigue, nervousness, confusion and normal. At the beginning of each scenario, the beliefs for the positive and negative states of each affect variable are all set to 0.5. In each scenario, the selected sensor is instantiated using the corresponding sensor state for the current time slice. To determine these sensor states, we first do a forward propagation using the same Bayesian network, setting probabilities for the mental state variables. For example, for the scenario where the subject is fatigued, the probability for the positive state of the fatigue variable is set to 99%, while the probabilities for the positive states of the other two affective state variables are set to 1%. After propagation, each sensor has a probability distribution associated with its states. In reliable sensor channels, each sensor turns out the state with the highest probability. In terms of reliability, such a sensory channel always catches a true observation from the subject that is the most indicative signal of the underlying affect. In other words, there is no noise from these sensor channels.
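The product approximation of Section 6.1 and the way reliable sensor readings are generated can be illustrated together. The sketch below is a simplified, hypothetical example, not the authors' BNT code. It builds $p(C \mid A_1, \ldots, A_n)$ for one sensor node with the three affects as parents by multiplying single-parent conditionals and normalizing. It then marginalizes over the affects, treating them as independent. That independence is an assumption: the paper propagates through the full DBN. Finally, it reports the most probable sensor state, as a noise-free channel would. All single-parent probabilities are made-up illustrative values, not the ones elicited by the authors.

```python
import itertools
import math

SENSOR_STATES = ("high", "even", "low")  # states of one sensor node C, e.g. gaze fixation ratio

# Hypothetical single-parent conditionals P(C | A_i = a_i) over SENSOR_STATES,
# keyed by affect and by whether that affect is in its positive state.
# Illustrative numbers only; they are not the values elicited in the paper.
SINGLE_PARENT = {
    "fatigue":     {True: [0.7, 0.2, 0.1], False: [0.2, 0.4, 0.4]},
    "nervousness": {True: [0.2, 0.3, 0.5], False: [0.3, 0.4, 0.3]},
    "confusion":   {True: [0.3, 0.3, 0.4], False: [0.3, 0.4, 0.3]},
}


def combined_cpt_row(parent_states):
    """p(C | A_1..A_n): product of single-parent conditionals, normalized over C."""
    row = [1.0] * len(SENSOR_STATES)
    for affect, positive in parent_states.items():
        row = [r * p for r, p in zip(row, SINGLE_PARENT[affect][positive])]
    total = sum(row)
    return [r / total for r in row]


def sensor_marginal(p_positive):
    """P(C) = sum over affect configurations of p(configuration) * p(C | configuration).

    The affects are treated as independent with the given positive-state
    probabilities (a simplification of propagating through the full DBN).
    """
    affects = list(p_positive)
    marginal = [0.0] * len(SENSOR_STATES)
    for config in itertools.product((True, False), repeat=len(affects)):
        weight = math.prod(p_positive[a] if pos else 1.0 - p_positive[a]
                           for a, pos in zip(affects, config))
        row = combined_cpt_row(dict(zip(affects, config)))
        marginal = [m + weight * r for m, r in zip(marginal, row)]
    return marginal


def reliable_reading(p_positive):
    """A noise-free channel reports the sensor state with the highest probability."""
    marginal = sensor_marginal(p_positive)
    return SENSOR_STATES[max(range(len(marginal)), key=marginal.__getitem__)]


fatigue_scenario = {"fatigue": 0.99, "nervousness": 0.01, "confusion": 0.01}
print(reliable_reading(fatigue_scenario))  # high
```

The normalization inside `combined_cpt_row` is the step the text mentions. Without it, the product of single-parent probabilities would not sum to one over the child's states.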

Figure 7 The beliefs for passive and active inferences, with one and two sensors activated: (a) fatigue; (b) normal.
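Equations (10) and (11) can be evaluated directly with the benefit and cost values of Table 3. In this sketch, the belief of the positive state of each affect is used as the weight $p(h)$ for that affect. This is an assumption made for illustration: the three affects are not mutually exclusive, so these weights need not sum to one. The belief values in the usage lines are hypothetical.

```python
# Benefit G_B(A_j, h) and cost G_C(A_j, h) from Table 3, indexed [assistance][affect].
BENEFIT = {
    "Warning":        {"fatigued": 0.56, "nervous": 0.33, "confused": 0.13},
    "Emphasis":       {"fatigued": 0.25, "nervous": 0.60, "confused": 0.31},
    "Simplification": {"fatigued": 0.19, "nervous": 0.07, "confused": 0.56},
}
COST = {
    "Warning":        {"fatigued": 0.07, "nervous": 0.33, "confused": 0.57},
    "Emphasis":       {"fatigued": 0.43, "nervous": 0.07, "confused": 0.36},
    "Simplification": {"fatigued": 0.50, "nervous": 0.60, "confused": 0.07},
}


def expected_utility(assistance, belief):
    """Equation (10): sum_h G_B(A_j, h) p(h) - sum_h G_C(A_j, h) p(h)."""
    benefit = sum(BENEFIT[assistance][h] * p for h, p in belief.items())
    cost = sum(COST[assistance][h] * p for h, p in belief.items())
    return benefit - cost


def best_assistance(belief):
    """Equation (11): the assistance A_j with the largest expected utility."""
    return max(BENEFIT, key=lambda a: expected_utility(a, belief))


belief = {"fatigued": 0.9, "nervous": 0.1, "confused": 0.1}  # hypothetical beliefs
for assistance in BENEFIT:
    print(assistance, round(expected_utility(assistance, belief), 3))
print(best_assistance(belief))  # Warning
```

With a dominant fatigue belief, Warning wins because it has the largest benefit and the smallest cost for fatigue in Table 3. Shifting the dominant belief to nervousness or confusion selects Emphasis or Simplification instead.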

6.3 Experimental results

We use the seven different inference settings to process the three affective scenarios, i.e., fatigue, nervousness and confusion, and the normal scenario. The posterior probability for the positive state of each affect variable in each time slice is recorded, as well as the calculated SLs. Hereafter we call this probability the belief of the corresponding affect, e.g., the belief of fatigue. This belief and the information entropy associated with the belief distribution are the measures used to evaluate the various active settings. The experimentation produced a large number of results. In this section, we summarize them in the following categories: comparison of active fusion with passive fusion, sensor sequences, usage of the taboo list against inference degradation, reliability of sensory channels, and the assistance process.

6.3.1 Active fusion versus passive fusion

Figure 7 shows the belief curves for the fatigue and normal scenarios, using passive and active fusion settings respectively. As the curves show, active fusion settings (on the right) detect the underlying status of the subject more quickly.
In the fatigue scenario, when only one sensor is selected in each time slice, the belief of fatigue rises fast to reach around 0.9, while the beliefs for the other affects remain at relatively low levels. The belief of fatigue reaches almost 1 when two sensors are selected each time. In fact, the results for the other scenarios, nervousness and confusion, show the same features. In the normal scenario, all three probabilities drop below 0.5. Although the corresponding passive fusion settings detect the same trends in these beliefs, they are not as efficient as active fusion.

Figure 8 The difference between the information entropies in active and passive settings of the underlying affect in affective scenarios, activating one (a) and two (b) sensors respectively.

Now we examine the difference between the information entropies (see equation 1) associated with the affect state belief distributions in active and passive fusion settings in each time slice. In each of the three affective scenarios, we focus only on the difference for the subject's underlying affect, i.e., fatigue for the fatigue scenario, and so forth. Ignoring the other affects, we plot this difference against the time slice for the settings using one and two sensors respectively, as shown in Figure 8. In each scenario, subtracting the entropy of the belief distribution of the underlying affect in the active setting from that in the passive setting produces a set of points. From the figure, we see that most points lie above the X-axis, meaning that in most time slices the uncertainty reduction for the underlying affect of each scenario is more significant in active fusion than in passive fusion. However, we notice some exceptions, though they occur only in a minority of all cases, such as in the fatigue scenario using one sensor. In these cases, the points lie below the X-axis, meaning that passive fusion gives higher certainty for the underlying affect. Finally, the superiority of the active setting over the passive setting is more evident in the beginning time slices; in the late stage, passive settings are more likely to have higher certainty in some cases. This is possibly due to what we call inference degradation.

Table 4 The number of time slices needed to reach the target threshold on SL for three affective scenarios with different settings, where n/a indicates the threshold is never reached during the 25 time slices.

               One sensor (threshold = 0.8)    Two sensors (threshold = 0.9)
Scenario       Passive    Active               Passive    Active
Fatigue        7          5                    7          3
Nervousness    n/a        14                   n/a        6
Confusion      16         6                    11         4

We could also examine the uncertainty reduction abilities by setting a target threshold on the underlying affect belief, and comparing, across settings, the number of time slices needed to first reach this threshold. This is a good measure since in practice we could regard this threshold as the control threshold for assistance engagement. Moreover, we prefer to use the individual SL instead of the affect belief because it provides a more reliable estimator.
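The two comparison measures used in this section can be sketched as follows: the per-slice entropy difference plotted in Figure 8, and the number of time slices needed to first reach a target threshold (Table 4). Equation 1 is not reproduced in this excerpt. The sketch assumes it is the Shannon entropy of an affect's binary belief distribution, computed here in bits; the base does not change the sign of the difference. Whether "reaching" the threshold means `>=` or `>` is also an assumption, and `>=` is used. The sequences in the usage lines are illustrative, not experimental data.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of the belief distribution (p, 1 - p) of one affect."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_differences(passive_beliefs, active_beliefs):
    """Passive-setting entropy minus active-setting entropy, per time slice.

    Positive values (points above the X-axis) mean that active fusion left less
    uncertainty about the underlying affect in that slice.
    """
    return [binary_entropy(p) - binary_entropy(a)
            for p, a in zip(passive_beliefs, active_beliefs)]


def first_slice_reaching(levels, threshold):
    """1-based index of the first slice whose level reaches the threshold, else None (n/a)."""
    for t, level in enumerate(levels, start=1):
        if level >= threshold:
            return t
    return None


passive = [0.5, 0.55, 0.62, 0.70, 0.78, 0.83]  # illustrative sequences only
active = [0.5, 0.65, 0.76, 0.84, 0.90, 0.93]
print([round(d, 3) for d in entropy_differences(passive, active)])
print(first_slice_reaching(passive, 0.8), first_slice_reaching(active, 0.8))  # 6 4
```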
Here we set the target threshold as 0.8 for activating one sensor and 0.9 for two sensors. We can draw a similar conclusion from Table 4.

6.3.2 Sensor sequences in active fusion and the impact of sensor costs

We want to examine the sensor sequence selected in active fusion. Active fusion selects the sensors with the highest utility in each time slice, and this utility may change over time even within the same scenario. Table 5 shows the sensor sequences for different scenarios with and without sensor costs. Because of the way we assign the initial beliefs, the first sensor selected is the same for all scenarios: AECS. Then, as the affective hypothesis beliefs change, different sensors may be selected, based on the merit of mutual information and sensor cost. However, we notice that not all sensors are selected. More specifically, only sensors 12 (PERCLOS), 13 (AECS) and 15 (gaze fixation ratio) are ever selected across all scenarios. We also notice that in the late time slices the sensor sequence is fixed, with certain sensors repeated.

We further investigate the impact of sensor cost on inference performance. In Figure 9, the beliefs for the affect variables differ between the two graphs for the fatigue scenario, since different sensors are selected due to the impact of sensor costs. In the setting on the right, which assigns sensor costs, sensor 15 (gaze fixation ratio) is selected over sensor 13 (AECS) because it has a lower cost. Although the mutual information value of the gaze fixation ratio is not the highest, the information from this sensor yields better beliefs for the affect hypotheses: the belief of fatigue is higher while the belief of nervousness is much lower. This reminds us that the mutual information is just an expected benefit calculated on a probability distribution. A sensor with the highest value of mutual information is not absolutely superior to others. In this sense, we should not be too rigid in using this information.

Figure 9 The belief curves for the fatigue scenario, where different sensor costs change the sensor sequence selection in active fusion.

Table 5 Sensor sequences in each scenario in the active fusion setting where one sensor is selected in each time slice, with (a) no sensor cost and (b) larger cost for sensors ranking top in the mutual information calculation.

Time slice  Fatigued (a)  Fatigued (b)  Nervous (a)  Nervous (b)  Confused (a)  Confused (b)  Normal (a)  Normal (b)
1           13            13            13           13           13            13            13          13
2           13            13            13           13           12            12            12          12
3           13            15            13           13           12            12            15          15
13          13            15            13           13           12            12            15          15
14          13            15            13           12           12            12            12          12
25          13            15            13           12           12            12            15          15

6.3.3 Usage of taboo list

Recall that inference degradation refers to the repetition of sensors in active fusion. We use the taboo list in active fusion to force more sensors into selection in order to counter this problem. In Figure 10, we compare the information entropy of the belief distribution in the active fusion setting using a taboo list of length 1 and in the setting without the taboo list, activating one sensor in each time slice. Similarly, we plot the difference obtained by subtracting the information entropy with the taboo list from that without the taboo list, for the underlying affect in each of the three affective scenarios. From the resulting points, we observe that the taboo list improves the performance of the fatigue and nervousness scenarios in late time slices. However, it yields higher uncertainty for the confusion scenario from the beginning. Thus, rather than drawing a general conclusion about this use of the sensors' utilization history, we advise careful deployment of this method to alleviate inference degradation.
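The taboo-list mechanism can be sketched as a greedy selection loop. In this hypothetical sketch, the per-slice sensor utilities are given inputs. In the paper they come from mutual information and sensor cost via equation 5, which is not reproduced here, and they change as the beliefs are updated. The taboo list holds the most recently selected sensors, and any sensor on it is skipped. The function name `select_sensors` and the utility numbers in the usage lines are illustrative only.

```python
from collections import deque


def select_sensors(utilities_per_slice, taboo_length=1):
    """Greedy one-sensor-per-slice selection with a taboo list.

    utilities_per_slice: list of dicts mapping sensor id -> utility for that slice.
    taboo_length: number of most recent selections barred from the next pick
                  (0 disables the taboo list).
    """
    taboo = deque(maxlen=taboo_length)  # a maxlen of 0 keeps the deque empty
    chosen = []
    for utilities in utilities_per_slice:
        allowed = {s: u for s, u in utilities.items() if s not in taboo}
        if not allowed:  # every sensor is taboo: fall back to the full set
            allowed = utilities
        best = max(allowed, key=allowed.get)
        chosen.append(best)
        taboo.append(best)
    return chosen


# Illustrative, static utilities for sensors 12 (PERCLOS), 13 (AECS), 15 (gaze fixation ratio).
slices = [{12: 0.30, 13: 0.50, 15: 0.40}] * 4
print(select_sensors(slices, taboo_length=0))  # [13, 13, 13, 13]
print(select_sensors(slices, taboo_length=1))  # [13, 15, 13, 15]
```

Without the taboo list, the same top-ranked sensor is repeated, which is the inference degradation pattern described above. With a list of length 1, the runner-up sensor is forced in on alternate slices. As the confusion scenario shows, this forced diversity does not always help.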