Computational Cognitive Science. The Visual Processing Pipeline. Lecture 15: Visual Attention.

Lecture 15: Visual Attention
School of Informatics, University of Edinburgh
keller@inf.ed.ac.uk
November 11, 2016

Reading: Itti et al. (1998).

The rest of the course deals with human visual cognition. We will focus on high-level visual processing (not visual neuroscience):

Visual attention: How do we decide which parts of an image to focus on?
Visual search: How do we search for a target in an image?
Object recognition: How do we identify objects in an image?

We will introduce computational models in all three domains.

When we view an image, we do not see all of it equally sharply: only the fovea, a small area in the center of the retina, is in focus. [Image from http://eyetracking.me/]

In order to take in the whole image, we have to move our eyes, alternating fixations (stationary periods) with saccades (rapid movements). [Figure from Henderson (2003): a scene overlaid with a scan path; the numbers are fixation durations in milliseconds (471, 340, 216, 113, 119, 205, 161, 238, 355, 154, 250).]

Visual Saliency

How do we determine where to look? We need to work out which areas are interesting, i.e., attract visual attention. We attend to areas that are visually salient: an area is salient if it stands out, i.e., is different from the rest of the image. The visual system computes a saliency map of the image and then moves the eyes to the most salient regions in turn (Itti et al. 1998).

Visual Features

Saliency can be tested using visual search experiments, in which participants have to find a target item among a number of distractors. Examples of visual features that can make a target salient are color, orientation, and intensity. Saliency makes the target pop out from its distractors if it differs from them in one of these features. [Figure from Itti (2007): pop-out because of color.] The pop-out effect does not occur if the target differs from the distractors in two features at once (a conjunction target).

[Figures from Itti (2007): pop-out because of orientation; no pop-out for a conjunction target.]

Itti et al.'s (1998) computational model of saliency:
- compute feature maps for color, intensity, and orientation at different scales;
- compute center-surround differences and apply a normalization;
- combine the maps across scales into conspicuity maps;
- compute the saliency map as a linear combination of the conspicuity maps;
- a winner-take-all operator predicts the attended locations.

This model mainly works for free viewing. In the next lectures, we will talk about models that can also account for visual search data.

Feature maps are computed at nine spatial scales (1:1 to 1:256) by low-pass filtering (blurring) and subsampling the image. A center-surround operator is used to detect locations that stand out from their surroundings. It is implemented as the difference between finer and coarser scales: the center is a pixel at scale $c \in \{2, 3, 4\}$; the surround is the corresponding pixel at scale $s = c + \delta$, with $\delta \in \{3, 4\}$. The across-scale difference between two maps (interpolation to the finer scale, followed by point-by-point subtraction) is denoted by $\ominus$.

Intensity

At each spatial scale, a set of feature maps is computed from the red, green, and blue color values $(r, g, b)$ of the pixels. For the intensity maps, compute the intensity image $I = (r + g + b)/3$ and apply the center-surround operator:

$\mathcal{I}(c, s) = |I(c) \ominus I(s)|$

with $c \in \{2, 3, 4\}$ and $s = c + \delta$, $\delta \in \{3, 4\}$.

Color

For the color maps, compute four color values: $R = r - (g + b)/2$ for red, $G = g - (r + b)/2$ for green, $B = b - (r + g)/2$ for blue, and $Y = (r + g)/2 - |r - g|/2 - b$ for yellow. Then compute the color maps, again using center-surround:

$\mathcal{RG}(c, s) = |(R(c) - G(c)) \ominus (G(s) - R(s))|$
$\mathcal{BY}(c, s) = |(B(c) - Y(c)) \ominus (Y(s) - B(s))|$

These are based on color opponencies (which exist in the visual cortex).

Orientation

For the orientation maps, compute Gabor pyramids $O(\sigma, \theta)$, where $\sigma \in [0..8]$ is the scale and $\theta \in \{0°, 45°, 90°, 135°\}$ is the preferred orientation. Then compute the orientation maps, again using center-surround:

$\mathcal{O}(c, s, \theta) = |O(c, \theta) \ominus O(s, \theta)|$

In total, 42 feature maps are computed: six for intensity, 12 for color, and 24 for orientation.
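To make this concrete, here is a minimal sketch in Python (NumPy/SciPy) of the pyramid and the intensity and color feature maps; the Gabor-based orientation channel is omitted for brevity. The function names and details such as the filter width are my own simplifications, not Itti et al.'s implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(img, levels=9):
    """Scales 0..8 (1:1 to 1:256): repeatedly blur and subsample by 2."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])
    return pyr

def upsample_to(x, shape):
    """Interpolate a coarse map back up to a finer scale's shape."""
    return zoom(x, (shape[0] / x.shape[0], shape[1] / x.shape[1]), order=1)

def feature_maps(rgb):
    """rgb: float array of shape (H, W, 3) in [0, 1]; H, W >= 256."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I = (r + g + b) / 3
    R = r - (g + b) / 2                      # red opponency channel
    G = g - (r + b) / 2                      # green
    B = b - (r + g) / 2                      # blue
    Y = (r + g) / 2 - np.abs(r - g) / 2 - b  # yellow
    pyr_I, pyr_RG, pyr_BY = (gaussian_pyramid(x) for x in (I, R - G, B - Y))
    maps = {'I': [], 'RG': [], 'BY': []}
    for c in (2, 3, 4):
        for s in (c + 3, c + 4):
            shape = pyr_I[c].shape
            # intensity: center minus interpolated surround
            maps['I'].append(np.abs(pyr_I[c] - upsample_to(pyr_I[s], shape)))
            # double opponency: (R-G) center against (G-R) surround, so the
            # across-scale difference becomes a sum of the (R-G) pyramids
            maps['RG'].append(np.abs(pyr_RG[c] + upsample_to(pyr_RG[s], shape)))
            maps['BY'].append(np.abs(pyr_BY[c] + upsample_to(pyr_BY[s], shape)))
    return maps  # 6 intensity + 6 RG + 6 BY maps (orientation omitted)
```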

Before the feature maps are combined, a normalization operator $N(\cdot)$ is applied, which promotes maps with a small number of strong peaks and suppresses maps with many similar peaks.

[Example figure: feature maps and the resulting conspicuity maps $\bar{C}$, $\bar{I}$, $\bar{O}$ for a sample image.]

The feature maps are combined into three conspicuity maps for intensity, color, and orientation, all at the same scale ($\sigma = 4$). For intensity and color we get:

$\bar{I} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(\mathcal{I}(c, s))$

$\bar{C} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} \left[ N(\mathcal{RG}(c, s)) + N(\mathcal{BY}(c, s)) \right]$

where the operator $\bigoplus$ reduces each map to scale 4 and performs point-by-point addition. For orientation, we first combine the six feature maps for each angle and then add the results to obtain a single conspicuity map:

$\bar{O} = \sum_{\theta \in \{0°, 45°, 90°, 135°\}} N\left( \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(\mathcal{O}(c, s, \theta)) \right)$

The overall saliency map is then computed by normalizing and averaging the three conspicuity maps:

$S = \frac{1}{3} \left( N(\bar{I}) + N(\bar{C}) + N(\bar{O}) \right)$

Why do we normalize each conspicuity map separately? Because similar features compete strongly for saliency, while different features contribute independently to saliency.
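A minimal sketch of $N(\cdot)$ and the combination step, continuing the previous sketch (intensity and color only, so the saliency map here averages two conspicuity maps instead of three). The local-maximum window size is an arbitrary illustrative choice, not a value from the paper.

```python
import numpy as np
from scipy.ndimage import maximum_filter, zoom

def normalize_map(m, M=1.0, win=7):
    """N(.): rescale to [0, M], then multiply by (M - mbar)^2, where mbar is
    the mean of all local maxima other than the global maximum. Maps with one
    dominant peak are promoted; maps with many similar peaks are suppressed."""
    m = (m - m.min()) / (m.max() - m.min() + 1e-12) * M
    peaks = m[(maximum_filter(m, size=win) == m) & (m > 0)]
    if peaks.size == 0:
        return m
    others = peaks[peaks < peaks.max()]
    mbar = others.mean() if others.size else 0.0
    return m * (M - mbar) ** 2

def saliency_map(maps, scale4_shape):
    """Normalize each feature map, reduce to scale 4, add into conspicuity
    maps, then normalize and average the conspicuity maps."""
    to4 = lambda x: zoom(x, (scale4_shape[0] / x.shape[0],
                             scale4_shape[1] / x.shape[1]), order=1)
    i_bar = sum(to4(normalize_map(m)) for m in maps['I'])
    c_bar = sum(to4(normalize_map(m)) for m in maps['RG'] + maps['BY'])
    return (normalize_map(i_bar) + normalize_map(c_bar)) / 2
```

Usage: `S = saliency_map(maps, maps['I'][-1].shape)`, since the last feature map in each channel was computed with center scale $c = 4$.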

Now we can predict sequences of fixations from a saliency map:
- the maximum of $S$ is the most salient location, which becomes the focus of attention (FOA); all other locations are ignored (inhibited);
- then the saliency around the FOA is reset, so that the second most salient location becomes the new FOA.

The second property is crucial: it results in inhibition of return, so that the FOA does not immediately return to the previously most salient location. Itti et al. (1998) implement this mechanism as a winner-take-all neural network, which also allows them to simulate fixation durations.
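The full model implements this with a biologically plausible winner-take-all network; the sketch below is a much simpler greedy approximation of the same idea (pick the maximum, suppress a disc around it, repeat), which reproduces the sequence of attended locations but not fixation durations. The inhibition radius, in pixels of the scale-4 saliency map, is an arbitrary choice.

```python
import numpy as np

def fixation_sequence(S, n_fix=5, ior_radius=3):
    """Greedy winner-take-all with inhibition of return on saliency map S."""
    S = S.copy()
    yy, xx = np.indices(S.shape)
    fixations = []
    for _ in range(n_fix):
        y, x = np.unravel_index(np.argmax(S), S.shape)  # current FOA
        fixations.append((y, x))
        # inhibition of return: zero out a disc around the FOA
        S[(yy - y) ** 2 + (xx - x) ** 2 <= ior_radius ** 2] = 0.0
    return fixations
```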

Evaluation

One way to test the model is to add noise to the image and check whether it is still able to pick out the salient locations correctly; see the sketch below.
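A minimal sketch of such a check, assuming a saliency function like the one sketched earlier is passed in; the noise level and tolerance are arbitrary illustrative values.

```python
import numpy as np

def survives_noise(img, saliency_fn, sigma=0.1, tol=4, seed=0):
    """Check whether the most salient location stays (roughly) in place
    when Gaussian pixel noise is added to the image."""
    rng = np.random.default_rng(seed)
    noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
    s_clean, s_noisy = saliency_fn(img), saliency_fn(noisy)
    p = np.unravel_index(np.argmax(s_clean), s_clean.shape)
    q = np.unravel_index(np.argmax(s_noisy), s_noisy.shape)
    return np.hypot(p[0] - q[0], p[1] - q[1]) <= tol
```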

Evaluation reported by Itti et al. (1998):
- the saliency model can reproduce human performance in pop-out tasks (including the conjunction target);
- it was also tested on images of traffic signs, red soda cans, and emergency triangles (though no details are given in the paper);
- it outperforms spatial frequency models.

The paper contains no evaluation of saliency against eye-tracking data; however, there is a lot of subsequent work on this topic (e.g., Borji et al. 2013).

Strengths:
- a simple feed-forward architecture generates complex behavior;
- massively parallel implementation (biologically plausible);
- very successful as a model of early visual processing.

Weaknesses:
- the model can only detect regions that are salient in color, intensity, or orientation;
- other features (e.g., T-junctions, line terminations) and conjunctions of features are not accounted for;
- motion is important for saliency, but is not modeled;
- the normalization function $N(\cdot)$ plays a crucial role without being theoretically well-founded;
- there is no notion of an object in the model (saliency is a property of a point), yet objectness is crucial for human scene perception.

Summary

Attention selects the part of the visual input that is fixated and processed in detail. Attention is directed to visually salient areas of an image, i.e., areas that differ from the rest of the image. The saliency model is based on color, orientation, and intensity maps computed at various spatial scales; center-surround differences are applied, and the maps are normalized and combined into a single saliency map. A winner-take-all mechanism then predicts the attended locations. The model is robust to noise and models human fixation behavior.

References

Borji, Ali, Dicky N. Sihite, and Laurent Itti. 2013. Quantitative analysis of human-model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing 22(1):55–69.

Henderson, John. 2003. Human gaze control during real-world scene perception. Trends in Cognitive Sciences 7(11):498–504.

Itti, Laurent. 2007. Visual salience. Scholarpedia 2(9):3327.

Itti, Laurent, Christof Koch, and Ernst Niebur. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11):1254–1259.