
Neurally Inspired Mechanisms for the Dynamic Visual Attention Map Generation Task

Maria T. Lopez (1), Miguel A. Fernandez (1), Antonio Fernandez-Caballero (1), and Ana E. Delgado (2)

(1) Departamento de Informatica, E.P.S.A., Universidad de Castilla-La Mancha, 02071 Albacete, Spain. {mlopez, miki, caballer}@info-ab.uclm.es
(2) Departamento de Inteligencia Artificial, Facultad de Ciencias and E.T.S.I. Informatica, UNED, 28040 Madrid, Spain. {adelgado}@dia.uned.es

Abstract. A model for dynamic visual attention is briefly introduced in this paper. We propose a PSM (problem-solving method) for a generic "Dynamic Attention Map Generation" task, which obtains a Dynamic Attention Map from a dynamic scene. Our approach enables tracking objects that hold attention in accordance with a set of characteristics defined by the observer. This paper mainly focuses on those subtasks of the model inspired by neuronal mechanisms, namely accumulative computation and lateral interaction. The subtasks which incorporate these biologically plausible capacities are called "Working Memory Generation" and "Thresholded Permanency Calculation".

1 Introduction

Visual attention models may be classified as space-based models - e.g., the spotlight metaphor [12] and the zoom-lens metaphor [1] - and object-based models - for instance, discrete objects [11, 14]. Our approach falls into the object-based category, where visual attention always focuses on discrete objects or coherent groups of visual information. In order to select an object that captures our attention, we first have to determine which image features should be combined to obtain the object. This process is carried out in two stages: firstly, features are obtained to generate simple shapes; secondly, these simple shapes are combined into more complex ones. Once the objects have been obtained, the next process - attention based on objects - consists in selecting one of the shapes generated.
Vecera [14] introduced a model to obtain objects separated from the background in static images by combining bottom-up (scene-based) and top-down (task-based) processes. The bottom-up process obtains the borders that form the objects, whereas the top-down process uses known shapes stored in a database, which are compared to the shapes previously obtained in the bottom-up process. A first plausible neuronal bottom-up architecture was proposed by Koch and Ullman [9]. Their approach is also based on feature integration [13].

J. Mira and J.R. Alvarez (Eds.): IWANN 2003, LNCS 2686, pp. 694-701, 2003. Springer-Verlag Berlin Heidelberg 2003

In Itti, Koch and

Niebur [10], a visual attention system inspired by the neural architecture of the early visual system and by the architecture proposed by Koch and Ullman [9] was introduced. This system combines multi-scale image features into a unique saliency map. The most salient locations of the scene are selected from the highest activity of the saliency map using a winner-take-all (WTA) algorithm. Another example based on the saliency map idea is the guided search (GS) model proposed by Wolfe [15]. This model incorporates two visual selection stages. The first one, the pre-attentive stage, highly parallel in space, performs the computation of simple visual features. The second stage proceeds spatially in a sequential way, using more complex visual representations - including combinations of features - to perform its calculations. The value of the features obtained is a combination of bottom-up and top-down knowledge. Attention is directed to the location that shows the highest value in the Dynamic Attention Map. The guided search ends if that location contains the target looked for. If this is not the case, the search continues through the Dynamic Attention Map in descending order, finishing when the target is found or when attention falls under a threshold.

In this paper we introduce a model of dynamic visual attention that combines bottom-up and top-down processes. Bottom-up is related to the first step of the proposed architecture, where the input image is segmented using dynamic criteria by means of neurally inspired accumulative computation [2-4] and lateral interaction [5-8]. The observer may indicate how to tune system parameters to define the attention focus using top-down processes. These processes are of a static nature, during the configuration of the feature selection and of the attention focus processing, or of a dynamic nature, when the observer modifies parameters to centre the focus on the elements of interest.
2 Model of dynamic visual attention

Our approach defines a PSM for the generation of a Dynamic Attention Map on a dynamic scene, which is able to obtain the objects that will keep the observer's attention in accordance with a set of predefined characteristics. Figure 1 shows the result of applying our model to the "Dynamic Attention Map Generation" task, where attention has been paid to moving elements belonging to class "car".

Fig. 1. Input and output of the "Dynamic Attention Map Generation" task

The different subtasks proposed to obtain the Dynamic Attention Map are depicted in figure 2, whilst figure 3 shows the inferential scheme of the model introduced. Next, these subtasks are briefly described:

Fig. 2. Decomposition of the "Dynamic Attention Map Generation" task into subtasks

a) Subtask "Segmentation in Grey Level Bands" segments the 256-grey-level input images into n (static role) grey level bands.

b) Subtask "Motion Features Extraction" calculates for each pixel a set of dynamic features, e.g. the presence or absence of motion, the speed, or the acceleration. The output of this subtask is formed by those pixels that possess certain dynamic features. Parameters M_t (static roles) indicate the range of values for motion features that pixels must possess to be marked as activated at the output of the subtask.

c) Subtask "Shape Features Extraction" calculates certain shape features, such as the size, the width, the height, the width-to-height ratio, the value size/(width*height), and so on. It obtains those elements that possess a set of predefined features. Parameters S_t (static roles) indicate the range of values for shape features that elements must possess to be labelled as activated at the output of this subtask. The elements that do not have the defined features are marked as inhibited. All remaining pixels, which do not belong to any element, are labelled as neutral.

d) The idea of subtask "Motion & Shape (M&S) Features Integration" is to generate a unique Motion and Shape (M&S) Interest Map by integrating motion features and shape features in accordance with the criteria defined by the observer. These criteria correspond to C_t (static role).

e) Subtask "Working Memory Generation" segments the image into regions composed of connected pixels whose brightness pertains to a common interval. Each of these regions or silhouettes of a uniform grey level represents an element of the scene.

f) In subtask "Thresholded Permanency Calculation", firstly the persistency of its input is accumulated; then, a further thresholding is performed. P_t are the static roles for this subtask.

Fig. 3. Inferential scheme of the model

From now on, this paper focuses on the generation of the Working Memory and of the Dynamic Attention Map from the image segmented in grey level bands and from the Motion and Shape Interest Map. As already mentioned, the Motion and Shape Interest Map is calculated frame by frame, and is composed of activated pixels pertaining to pixels or elements of interest, inhibited pixels included in pixels or elements of no interest, and neutral pixels. In other words, this paper will show the processes related to lateral interaction and accumulative computation, whose neuronal basis has already been demonstrated in previous papers by the same authors [2-8]. Hence, these subtasks may be implemented with a neural inspiration. The next section is devoted to describing both subtasks in detail.
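As a concrete illustration of subtask a), the grey-level band segmentation can be sketched in NumPy. This is a minimal sketch under stated assumptions: uniform band widths and the function name are ours, since the paper only specifies that the 256 grey levels are split into n bands.

```python
import numpy as np

def segment_grey_level_bands(image, n_bands):
    """Subtask a) sketch: split a 256-grey-level image into n binary
    band images B_i, where B_i(x, y) = 1 iff the pixel falls in band i.
    Uniform band widths are an assumption; the paper only fixes n."""
    band_width = 256 // n_bands
    band_index = np.clip(image.astype(int) // band_width, 0, n_bands - 1)
    # One binary image per band, stacked along the first axis.
    return np.stack([(band_index == i).astype(np.uint8)
                     for i in range(n_bands)])

# Toy 2x2 frame: brightnesses 10, 20, 100, 250 fall into bands 0, 0, 3, 7.
frame = np.array([[10, 20], [100, 250]], dtype=np.uint8)
bands = segment_grey_level_bands(frame, n_bands=8)
```

Each binary image B_i is then processed in parallel, as described in section 3.1.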

3 Accumulative computation and lateral interaction subtasks

3.1 Subtask "Working Memory Generation"

The process of obtaining the Working Memory consists in superimposing, just as done with transparencies, the image segmented in grey level bands of the current frame (at time instant t), that is to say, Grey Level Bands (dynamic role of inference "Segmentation in Grey Level Bands"), with the Motion and Shape Interest Map (dynamic role of inference "Motion and Shape Features Integration") constructed at the previous frame (at time instant t-1). Thus, at time instant t, all scene elements associated to pixels or elements of interest appear in the Working Memory. This process may be observed in more detail in figure 4, where in the Motion and Shape Interest Map a black pixel means neutral, a grey pixel means inhibited and a white pixel means activated. Starting from the image segmented in bands, some processes are performed in parallel for each band, so there are as many images as the number of bands the image has been segmented into. These images, B_i(x,y,t), are binary: 1 if the brightness of the pixel belongs to band i, and 0 if it does not. For each one of the binary images, B_i(x,y,t), a process is carried out to fill the shapes of any band that includes any activated value in the classified Motion and Shape Interest Map, MS(x,y,t). For this purpose, S_i(x,y,t), the value of the filled shape at band i for coordinate (x,y) at time instant t, is initially obtained as explained next.
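Before giving the formal definition, this fill-and-sum procedure can be sketched in Python. This is a minimal sketch under stated assumptions: the integer encoding of the neutral/inhibited/activated labels and the breadth-first fill are implementation choices of ours; the paper specifies only the equations that follow.

```python
from collections import deque
import numpy as np

NEUTRAL, INHIBITED, ACTIVATED = 0, 1, 2  # assumed encoding of MS labels

def fill_band(b_i, ms, value):
    """Flood-fill (8-connectivity) the regions of binary band image b_i
    that touch an activated pixel of the M&S Interest Map, never crossing
    inhibited positions. Returns S_i, with `value` on filled pixels."""
    h, w = b_i.shape
    s_i = np.zeros_like(b_i, dtype=int)
    seeds = deque(zip(*np.where((b_i == 1) & (ms == ACTIVATED))))
    for x, y in seeds:
        s_i[x, y] = value
    while seeds:  # lateral interaction: spread to 8-connected neighbours
        x, y = seeds.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                a, b = x + dx, y + dy
                if (0 <= a < h and 0 <= b < w and s_i[a, b] == 0
                        and b_i[a, b] == 1 and ms[a, b] != INHIBITED):
                    s_i[a, b] = value
                    seeds.append((a, b))
    return s_i

def working_memory(bands, ms):
    """W(x,y,t) = sum over i of S_i(x,y,t); bands[i] plays B_{i+1}."""
    return sum(fill_band(b, ms, i + 1) for i, b in enumerate(bands))

# One band with two blobs; only the blob touching an activated seed at
# (0,0) is filled, and the inhibited pixel at (1,1) blocks the spread.
b1 = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])
ms = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 0]])
wm = working_memory([b1], ms)
```

The equations below give the precise definition of this process.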
It is assigned the value of the band if the corresponding coordinate in the Motion and Shape Interest Map is activated:

S_i(x,y,t) = i,  if B_i(x,y,t) = 1 and MS(x,y,t) = activated
S_i(x,y,t) = 0,  otherwise

Afterwards, by means of lateral interaction, the same value i is assigned to all positions connected to B_i(x,y,t) - using a connectivity of 8 pixels - whenever their value in the Motion and Shape Interest Map does not indicate that they belong to a non-valid shape:

S_i(a,b,t) = i,  if B_i(a,b,t) = 1 and MS(a,b,t) != inhibited, for all (a,b) in [x-1, x+1] x [y-1, y+1]
S_i(a,b,t) = 0,  otherwise

Finally, to generate the Working Memory, all the obtained images are summed up:

W(x,y,t) = sum over i = 1..n of S_i(x,y,t)

Notice that, at a given time instant t, scene elements whose shape features do not correspond to those defined by the observer may appear in the Working Memory. This may be due to the fact that these elements have not yet been labelled. But, if the

features of these elements do not correspond to those indicated as interesting by the observer, these elements will appear as inhibited in the Motion and Shape Interest Map at t+1. This way, at t+1 these elements will disappear from the Working Memory. In order to only obtain elements that really fulfil the features defined by the observer, some accumulative computation mechanisms are introduced, as explained in section 3.2.

Fig. 4. Subtask "Working Memory Generation": band separation into B_i(x,y,t), shape filling into S_i(x,y,t), and shape combination into the Working Memory

3.2 Subtask "Thresholded Permanency Calculation"

Subtask "Thresholded Permanency Calculation" uses the Working Memory as input and produces the final Dynamic Attention Map as output. This subtask firstly calculates the so-called permanency value associated to each pixel. If the value of the Working Memory is activated, the charge value is incremented at each image frame by a value d_c up to a maximum. If this is not the case, the charge value is decremented by a value d_d down to a minimum. In a second step, the charge values are thresholded to get the Dynamic Attention Map. Thus, the Dynamic Attention Map will only have activated those pixels which have been activated in the Working Memory during several successive time instants, namely "threshold / charge" (see figure 5). This is shown by means of the following formulas:

Q(x,y,t) = min(Q(x,y,t-dt) + d_c, Q_max),  if W(x,y,t) = 1
Q(x,y,t) = max(Q(x,y,t-dt) - d_d, Q_min),  if W(x,y,t) = 0

A(x,y,t) = 1,  if Q(x,y,t) > theta
A(x,y,t) = 0,  otherwise

where d_c is the charge constant, d_d is the discharge constant, Q_max is the maximum charge value, Q_min is the minimum charge value, theta is the threshold value and A(x,y,t) is the value of the Dynamic Attention Map at coordinate (x,y) at time instant t.

Fig. 5. Subtask "Thresholded Permanency Calculation"

The accumulative computation process, followed by the thresholding stage, enables maintaining active, in a stable way, a set of pixels which pertain to a group of scene elements of interest to the observer. Thus, the state of this "memory" defines the attention focus of the system, and is the input to the observer's system. The observer may then modify the parameters that configure the mechanisms of extraction and selection of shapes and/or pixels of interest.

4 Conclusions

A model of dynamic visual attention capable of segmenting objects in a scene has been introduced in this paper. The model enables focusing the attention at each moment on shapes that possess certain characteristics and eliminating shapes that are of no interest. The features used are related to the motion and shape of the elements present in the dynamic scene. The model may be used to observe real environments indefinitely in time. In this paper we have highlighted those subtasks of the generic "Dynamic Attention Map Generation" task related to biologically plausible methods. These mechanisms, namely accumulative computation and lateral interaction, have so far

been proven to be inherently neuronal. The subtasks which incorporate these biologically plausible capacities, called "Working Memory Generation" and "Thresholded Permanency Calculation", enable maintaining a stable system of dynamic visual attention precisely because they make use of accumulative computation and lateral interaction processes.

References

1. Eriksen, C.W., St. James, J.D.: Visual attention within and around the field of focal attention: A zoom lens model. Perception and Psychophysics 40 (1986) 225-240
2. Fernandez, M.A., Mira, J.: Permanence memory: A system for real time motion analysis in image sequences. IAPR Workshop on Machine Vision Applications, MVA'92 (1992) 249-252
3. Fernandez, M.A., Mira, J., Lopez, M.T., Alvarez, J.R., Manjarres, A., Barro, S.: Local accumulation of persistent activity at synaptic level: Application to motion analysis. In: Mira, J., Sandoval, F. (eds.): From Natural to Artificial Neural Computation, IWANN'95, LNCS 930. Springer-Verlag (1995) 137-143
4. Fernandez, M.A.: Una arquitectura neuronal para la deteccion de blancos moviles (A neural architecture for moving target detection). Unpublished Ph.D. dissertation (1995)
5. Fernandez-Caballero, A., Mira, J., Fernandez, M.A., Lopez, M.T.: Segmentation from motion of non-rigid objects by neuronal lateral interaction. Pattern Recognition Letters 22:14 (2001) 1517-1524
6. Fernandez-Caballero, A., Mira, J., Delgado, A.E., Fernandez, M.A.: Lateral interaction in accumulative computation: A model for motion detection. Neurocomputing 50C (2003) 341-364
7. Fernandez-Caballero, A., Fernandez, M.A., Mira, J., Delgado, A.E.: Spatio-temporal shape building from image sequences using lateral interaction in accumulative computation. Pattern Recognition 36:5 (2003) 1131-1142
8. Fernandez-Caballero, A., Mira, J., Fernandez, M.A., Delgado, A.E.: On motion detection through a multi-layer neural network architecture. Neural Networks 16:2 (2003) 205-222
9. Koch, C., Ullman, S.: Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology 4 (1985) 219-227
10. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 1254-1259
11. Moore, C.M., Yantis, S., Vaughan, B.: Object-based visual selection: Evidence from perceptual completion. Psychological Science 9 (1998) 104-110
12. Posner, M.I., Snyder, C.R.R., Davidson, B.J.: Attention and the detection of signals. Journal of Experimental Psychology: General 109 (1980) 160-174
13. Treisman, A.M., Gelade, G.: A feature-integration theory of attention. Cognitive Psychology 12 (1980) 97-136
14. Vecera, S.P.: Toward a biased competition account of object-based segregation and attention. Brain and Mind 1 (2000) 353-385
15. Wolfe, J.M.: Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review 1 (1994) 202-238