Stochastic Extension of the Attention-Selection System fo the icub Technical Repot (in pepaation) H. Matinez, M. Lungaella, and R. Pfeife Atificial Intelligence Laboatoy Depatment of Infomatics Univesity of Zuich, Switzeland Abstact To povide humanoid obots with means to act in the envionment, they need to be endowed with some kind of mechanism to select and extact meaningful infomation. In this epot, we descibe a stochastic extension of the attention-selection system of the icub platfom. A supe-diffusive stochastic pocess esticted by a multimodal saliency map, dives the obot s visual exploation based in the actual salient conditions. Ou esults show that such a stochastic extension yields visual exploative behavio which (a) is faste than to the standad winne-take-all saliency-diven scheme, and (b) does not equie stoing the location of the pevious focus of attention. Intoduction The attention system seves as a font gate to select fom an abundance of steaming sensoy input and thus is an impotant peequisite fo the ontogenesis of sensoy-moto coodinated behavios such as eaching and gasping. In this epot, we descibe a biologically-inspied stochastic attention selection mechanism and its integation in the icub s multimodal attention famewok (Ruesch et al., 008). Based on existing esults that demonstate that human eye saccades can be modeled as a supe-diffusive pocess (Bockmann et al., 1999), we conjectued that the incopoation of a Levy-flight andom walk stategy in the attention selection mechanism, would lead to an incease of the pefomance of the obot's in tems of: (a) speed of the visual exploation pocess, and (b) the elated enegy consumption. Using manually geneated images, we pefomed a systematic paamete selection fo the Levy-flight-based attention selection algoithm. We conducted an expeiment in ode to evaluate the Levy Flight-based attention selection algoithm while embedded in the icub's multi-modal attention and sensoy-moto famewok. We also compaed the pefomance against the icub's existing winne-take-all saliency-diven attention selection mechanism. In this expeiment, the obot pefomed a visual exploation task, which integates the low-level sensoy pocessing, multi-modal sensoy integation, bottom-up saliency, and attention selection mechanism. The esults demonstate that the stochastic Levy Flight-based attention selection mechanism enables the obot to exploe its envionment up to thee times faste compaed to the standad winne-take-all saliency-diven attention mechanism. Mateials and Methods The Robot. Ou expeimental test bed was the 6 degees of feedom (DOF) icub obot head with the attention system descibed in (Ruesch et al., 008). This attention system builds a saliency map based on visual and auditoy infomation. The visual pat consists of a seies of filtes that extact infomation such as colo, movement, oientation and intensity fom the scene. The auditoy pat computes the saliency in a time-fequency domain, a high saliency is assigned to shot and long tones, as well as to the absence of fequencies in a boad band noise, and to tempoally modulated tones compaed to stationay tones. In both cases (visual and audito, each point in the space is accumulated to build a visual saliency map and an auditoy saliency map espectively. Using these maps, a final multimodal saliency map is geneated by taking the 1
highest value at each point. Finally, given this infomation, a winne-take-all with inhibition-ofetun algoithm is used to select the elevant point in the envionment to focus on. The inhibition is impotant in ode to avoid that the obot keeps looking at the same point all the time. This implementation uses two functions to modulate (1) how much time the obot can spend looking at something befoe it is inhibited fom the multimodal saliency map, and () how much time it should be inhibited. The Attention Selection Algoithm. The winne-take-all with inhibition-of-etun algoithm can dive the obot to cyclical attention behavios. In ode to avoid this situation, we poposed the implementation of a stochastic pocess esticted by the saliency map as the mechanism in chage to select the point of attention. The Levy- flight andom walk has been demonstated to poduce a distibution simila to the one poduced by human eye saccades and also minimizes the time needed to pocess the visual envionment (Bockmann et al., 1999; Bockmann et al., 000). Boccignone (000) poposed an algoithm to constain the andom walk using the saliency map, in this algoithm, the diection of the jump is fist selected based on a nomal distibution. In contast, in ou Levy flight-based algoithm, all the jump chaacteistics ae esticted by the saliency map. 1: Compute the saliency map of the image exp( x, y)))/ x : Compute φ( = exp( β ( ))/ β( )) 3: image cente; n 0 4: epeat 5: Cuent fixation ; accepted false 6: while not accepted do 7: Geneate Dφ( andomly a p( = x + y + D jump 8: Compute sˆ sˆ( ) sˆ( ) sˆ( x, y ) = s s y, y ))) ( x xs ) ( y ys ) exp( ) σ 9: if 0 ŝ then 10: Stoe ; ; accepted tue; n n + 1 11: else 1: Geneate a andom numbe ρ 13: if ρ exp( sˆ / T ) then 14: Stoe ; ; accepted tue; n n + 1 15: until n<k Figue 1: Poposed algoithm. The main stuctue of the algoithm is as follows: (1) compute a function φ( which tansfoms the saliency data, and calculate the pobability density function of the jump p(. This function is made by the multiplication of φ( with a Levy pobability density function (point 7 in Fig. 1 p(); () p( is used to andomly select the next point of attention, (3) this point of attention has to be compaed against an acceptance citeia based in the value of the saliency of the actual neighbohoods. If the saliency weighted addition of the points close to the chosen point, is geate than, the saliency weighted addition of the points close to actual attention point, then the selected point is accepted like the point of attention. Howeve, if it is not the case, a
andom numbe between 0 and 1 is geneated. If this is lowe than a function that depends on the weighted saliency and on a tempeatue T, the point is accepted. The T value detemines the amount of andomness in the acceptance pocess. In Fig. 1 (step ), the thee diffeent types of φ( that we used in this eseach ae shown. With all these functions, it is necessay to set the initial values of the vaious paametes. The paamete D is pat of the Levy pobability density function, with this vaiable it is possible to contol the volatility of the andom walk, the highe D, the highe the pobability fo big jumps. β is a paamete to scale the diffeence between the actual and the next point of attention. If β is high the algoithm is moe likely to select a point with high saliency and if β is low the algoithm behaves moe like Levy andom walk. All those paametes have to be selected appopiately to allow the obot to attend the envionment smoothly accoding with the data fom the saliency map. Tuning and Validation of the Algoithm. In ode to establish optimal values fo these paametes and also optimal function φ(, this algoithm was tested using the static image shown in Fig.. To evaluate the pefomance of the algoithm, we pefomed 10 000 iteations of the test and poduced a statistics figue (Fig. 4). To calculate this figue, we calculated fequency map, based on the numbe of times that it takes fo the algoithm to visit a point in the visual scene. Figue : Test Image. Image used to tune and test the poposed algoithm. Figue 3: Saliency Map. Saliency map poduced by the attention system fo the Test Image. 1: Compute the saliency map of the image : Compute the fequency map f( 3: Initialize measue=0; 4: Compute fo each column the mean and the standad deviation. (fstd, sstd, fmean, smean) 5: fo i=1: Total of column 6: measue + = ( fstd ( i) sstd ( i)) + ( fmean( i) smean( i)) + ( f ( ) 7: End 8: 0. 5 measue = measue columni Figue 4: Statistics Figue. Measue of the diffeence between the saliency and fequency maps. This statistics figue shown in Fig. 4 is the tool used to compae the similaity between the saliency map and the fequency map. It povides statistical measuements ove the space to 3
establish when the algoithm is capable of pefoming attention selection that is close to the saliency map. Based on this esult, we selected the appopiate φ( function as well as the othe paametes of the algoithm. Fig. 5 to Fig. 7 show the esults fo diffeent combinations of paametes and φ( functions. D= T=600 β =0.03 exp( x, y ))) φ(y )= β ( x, y ))) D= T=00 β =0.01 exp( x, y))) φ(y )= β ( x, y ))) D= T=00 β =0.0 exp( x, y))) φ(y )= β ( x, y ))) Figue 5. Fequency Maps fo Φ1. Diffeent fequency maps developed, using the function Φ1 fo diffeent values of paametes using the test image. D= T=1000 φ(y )= D=0.50 T=3800 φ(y )= D=1.40 T=5000 φ(y )= Figue 6. Fequency Maps fo Φ. Diffeent fequency maps developed, using the function Φ fo diffeent values of paametes using the test image. D=1.10 T=4500 β =-1/55 φ(y )= exp( s ( ))/ )) D=1.70 T=600 β =-1/55 φ(y )= exp( s ( ))/ )) D=0.5 T=600 β=-1/55 φ(y )= exp( s ( ))/ )) Figue 7: Fequency Maps fo Φ3. Diffeent fequency maps developed, using the function Φ3 fo diffeent values of paametes using the test image. 4
The best pefomance was found fo φ( equal to saliency and the paametes D equal to and a tempeatue of 1000. Expeiment and Results Using the paamete selection esults descibed above, we conducted an expeiment in ode to evaluate the Levy Flight-based attention selection algoithm while embedded in the icub's multimodal attention and sensoy-moto famewok, and to compae the pefomance against the icub's existing winne-take-all saliency-diven attention selection mechanism. Fo the obot s visual envionment, we have aanged a black backgound with up to 8 coloed elements, as shown in Fig. 8. We stat with 5 elements and incementally expand the egion to the left, by adding up to 3 elements Fo each goup of elements, the obot had 5 diffeent tials to exploe the scene. Duing each tial, we compute thee diffeent measues of pefomance: minimum time to visit all objects, mean powe spent by the algoithm, and the diffeence between the saliency map and the fequency map using the statistical measuement pesented in Fig. 4. Figue 8: Expeimental setup. icub obot head with six degees of feedom in the expeimental envionment. In ode to compae the elation between the saliency map and the fequency of attention, we computed an aggegated saliency map duing each tial, which is basically the aggegation of the saliency map pojection in the egosphee, multiplied by a fixed scale facto. Using this aggegated saliency map and statistical measuement descibed in Fig. 4, we compute thee measues of pefomance, as descibed above. Fig. 9 illustates the esults of the similaity measue between the fequency map and the aggegated saliency map using the Levy flight-based and winnetake-all selection algoithm. Fig. 10 shows the amount of Mean Powe employed by both algoithms. Lastly, Fig. 11 shows the minimum time needed fo both algoithms to look at all objects in the envionment. 5
Figue 9. Similaity between the algoithms. Mean and standad deviation of the similaity measue between the fequency map and the aggegated saliency map fo both algoithms winne take all with inhibition of etun and the esticted Levy fly andom walk. Figue 10. Mean Powe of the algoithms. Mean and standad deviation of the Mean Powe employed fo both algoithms winne take all with inhibition of etun and the esticted Levy fly andom walk. Figue 11: Minimum time of the algoithms. Mean and standad deviation of the Minimum time needed to look at all the objects in the envionment fo both algoithms winne take all with inhibition of etun and the esticted Levy fly andom walk. These esults demonstate that the obot conducts a qualitatively faste exploation when using the stochastic Levy Flight-based attention selection mechanism compaed to the standad winne-take-all saliency-diven attention mechanism, as expected fom the esult in (Bockmann et al., 000), while keeping simila behavio in tems of the similaity with the saliency map and mean powe consumption. Conclusions and Futue Wok This eseach pesents a novel appoach to dive the attention of the obot using the Levy fly andom walk esticted to the saliency map, implemented on the icub head that uns in eal-time. This enfoces the exploation of the envionment but also keeps the attention of the obot in the salient points in the envionment, without having to keep tack of what has been looked at by the obot, This is one majo diffeence between the novel appoach and the standad algoithm employed to dive the attention of the attention systems. Fo futue wok, it would inteesting to exploe how the attention system can be exploited to povide elevant infomation fo an specific task like gasping. Futhemoe, because the attention system in itself only delives etino-centic coodinates, and theefoe does not povide any 3D position infomation, we have begun the development of a eal-time depth peception algoithm fo the icub. Depth infomation is equied fo efficient eaching and gasping. The implementation elies on a vegence-based eal-time steeo-vision algoithm which includes an on-line calibation scheme (Matinez et al., in pepaation). We expect that compaed to a static steeo-vision setup, ou appoach will lead to highe pecision in depth estimation. Refeences Ruesch, J., Lopes, M., Benadino, A., Hoenstein, J., Santos-Victo, J. and Pfeife, R. (008). Multimodal saliency-based bottom-up attention: a famewok fo the humanoid obot icub. Poc. of IEEE Int. Conf. on Robotics and Automation, pp. 96-967. Bockmann, D. and Geisel, T: (1999). Ae human scanpaths Levy-flights? Poc. of 9th Int. Conf. on Atificial Neual Netwoks, 1: 63-68. 6
Bockmann, D. and Geisel, T. (000). The ecology of gaze shifts. Neuocomputation, 3/33: 643. Boccignone, G. and Feao, M. (004). Modelling gaze shift as a constained andom walk. Physic A, 331: pp. 07-18 Matinez, H., Lungaella, M., and Pfeife, R. (in pepaation). Impoving eal-time depth peception though a vegence-based steeo-vision algoithm (Technical Repot). 7