Computational Cognitive Science

Computational Cognitive Science Lecture 15: Visual Attention Chris Lucas (Slides adapted from Frank Keller's) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 14 November 2017 1 / 28

1 Introduction: The Visual Processing Pipeline; Attention and Visual Features. 2 The Saliency Model: Model Architecture; Feature Maps; Saliency Map; Inhibition of Return. 3 Results: Robustness to Noise; Strengths and Limitations. Reading: Itti, Koch, and Niebur (1998). 2 / 28

The Visual Processing Pipeline The rest of the course will deal with human visual cognition. We will focus on high-level visual processing (not visual neuroscience): Visual attention: How do we decide which parts of an image to focus on? Visual search: How do we search for a target in an image? Object recognition: How do we identify objects in an image? We will introduce computational models in all three domains. 3 / 28

The Visual Processing Pipeline When we view an image, we actually see this: Only the fovea, a small area in the center of the retina, is in focus. Image from http://eyetracking.me/ 4 / 28

The Visual Processing Pipeline In order to take in the whole image, we have to move our eyes: fixations (stationary periods) and saccades (rapid movements). [Figure: scanpath over a natural scene with fixation durations in ms; image from Henderson, 2003] How do we determine where to look? We need to work out which areas are interesting, i.e., attract visual attention. 5 / 28

Visual Saliency We attend to the areas that are visually salient. An area is salient if it stands out, i.e., if it is different from the rest of the image. The visual system computes a saliency map of the image, and then moves the eyes to the most salient regions in turn (Itti et al., 1998). 6 / 28

Visual Features Saliency can be tested using visual search experiments: participants have to find a target item among a number of distractors. Examples of visual features that can make a target salient: color; orientation; intensity. Saliency can make the target pop out from its distractors if it differs in one of these features. The pop-out effect doesn't occur if the target differs from the distractors in two aspects (conjunction target). 7 / 28

Visual Features Pop-out because of color (Itti, 2007): 8 / 28

Visual Features Pop-out because of orientation (Itti, 2007): 9 / 28

Visual Features No pop-out: conjunction target (Itti, 2007): 10 / 28

Model Architecture Itti et al.'s (1998) computational model of saliency: compute feature maps for color, intensity, and orientation at different scales; compute center-surround differences and apply a normalization; combine the maps across scales into conspicuity maps; the saliency map is a linear combination of the conspicuity maps; a winner-takes-all operator predicts attended locations. This model mainly works for free viewing. In the next lectures we will talk about models that can account for search data. 11 / 28

Model Architecture 12 / 28

Feature Maps Feature maps are computed at nine spatial scales (1:1 to 1:256) by low-pass filtering (blurring) and subsampling the image. A center-surround operator is used to detect locations that stand out from their surroundings: this is implemented as the difference between finer and coarser scales; the center is a pixel at scale c ∈ {2, 3, 4}; the surround is the corresponding pixel at scale s = c + d, with d ∈ {3, 4}; the across-scale difference between two maps is denoted ⊖ (the coarser map is interpolated to the finer scale and subtracted point by point). 13 / 28
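
To make the pyramid and the cross-scale operator concrete, here is a minimal Python/NumPy sketch (not Itti et al.'s implementation; the dyadic blur-and-subsample scheme and sigma = 1.0 are illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(img, levels=9):
    """Spatial scales 0..8 (1:1 down to 1:256): repeatedly blur and subsample by 2."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])
    return pyr

def across_scale_diff(fine, coarse):
    """Cross-scale difference: interpolate the coarser map up to the finer
    scale and take the absolute point-by-point difference."""
    factors = (fine.shape[0] / coarse.shape[0],
               fine.shape[1] / coarse.shape[1])
    return np.abs(fine - zoom(coarse, factors, order=1))
```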

Intensity At each spatial scale, a set of feature maps is computed based on the red, green, and blue color values (r, g, b) of the pixels. Intensity map: compute the intensity I = (r + g + b)/3 and then the intensity feature maps using the center-surround operator: I(c, s) = |I(c) ⊖ I(s)| with c ∈ {2, 3, 4} and s = c + d, d ∈ {3, 4}. 14 / 28
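
A hedged sketch of the intensity channel, reusing gaussian_pyramid and across_scale_diff from the sketch above:

```python
import numpy as np

def intensity_maps(rgb):
    """Six intensity feature maps I(c, s), with c in {2, 3, 4} and s = c+3 or c+4."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    pyr = gaussian_pyramid((r + g + b) / 3.0)    # I = (r + g + b) / 3
    return {(c, c + d): across_scale_diff(pyr[c], pyr[c + d])
            for c in (2, 3, 4) for d in (3, 4)}
```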

Color Color maps: compute four color values R = r − (g + b)/2 for red, G = g − (r + b)/2 for green, B = b − (r + g)/2 for blue, and Y = (r + g)/2 − |r − g|/2 − b for yellow. Then compute color feature maps, again using center-surround: RG(c, s) = |(R(c) − G(c)) ⊖ (G(s) − R(s))| and BY(c, s) = |(B(c) − Y(c)) ⊖ (Y(s) − B(s))|. These are based on color opponencies (which exist in the visual cortex). 15 / 28
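
The colour channel sketched in the same style, again reusing the helpers above; the opponent maps are built per scale and then differenced across scales:

```python
import numpy as np

def color_maps(rgb):
    """Twelve colour feature maps: RG(c, s) and BY(c, s) opponency maps."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    R = r - (g + b) / 2.0
    G = g - (r + b) / 2.0
    B = b - (r + g) / 2.0
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b
    pR, pG, pB, pY = (gaussian_pyramid(x) for x in (R, G, B, Y))
    maps = {}
    for c in (2, 3, 4):
        for d in (3, 4):
            s = c + d
            maps[('RG', c, s)] = across_scale_diff(pR[c] - pG[c], pG[s] - pR[s])
            maps[('BY', c, s)] = across_scale_diff(pB[c] - pY[c], pY[s] - pB[s])
    return maps
```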

Orientation Orientation maps: compute Gabor pyramids O(σ, θ), where σ ∈ {0, ..., 8} is the scale and θ ∈ {0°, 45°, 90°, 135°} is the preferred orientation. Then compute orientation feature maps, again using center-surround: O(c, s, θ) = |O(c, θ) ⊖ O(s, θ)| In total, 42 feature maps are computed: 6 for intensity, 12 for color, and 24 for orientation. 16 / 28
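
A sketch of the orientation channel. The paper uses oriented Gabor pyramids; below, a small real-valued Gabor kernel is built by hand and applied to every level of the intensity pyramid (kernel size, wavelength, and sigma are illustrative values, not the paper's):

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(theta, size=9, wavelength=4.0, sigma=2.0):
    """Real part of a Gabor kernel tuned to orientation theta (in radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # coordinate along the carrier
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def orientation_maps(rgb):
    """24 orientation feature maps O(c, s, theta) for four preferred orientations."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i_pyr = gaussian_pyramid((r + g + b) / 3.0)
    maps = {}
    for deg in (0, 45, 90, 135):
        kern = gabor_kernel(np.deg2rad(deg))
        o_pyr = [np.abs(convolve(level, kern)) for level in i_pyr]
        for c in (2, 3, 4):
            for d in (3, 4):
                maps[(c, c + d, deg)] = across_scale_diff(o_pyr[c], o_pyr[c + d])
    return maps
```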

Example [Figure: an input image with its conspicuity maps C̄, Ī, Ō] 17 / 28

Saliency Map Before we combine feature maps, a normalization operator N(·) is applied, which promotes maps with a small number of strong peaks and suppresses maps with many similar peaks. 18 / 28
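
Following the description in Itti et al. (1998), one way to realize N(·) is: rescale the map to a fixed range [0, M], find its local maxima, and multiply the whole map by (M − m̄)², where m̄ is the mean of the local maxima other than the global one. A minimal sketch, with a 3×3 neighbourhood as an assumed definition of "local":

```python
import numpy as np
from scipy.ndimage import maximum_filter

def normalize_map(m, M=1.0):
    """N(.): rescale to [0, M], then multiply by (M - mbar)^2, where mbar is
    the mean of the local maxima other than the global maximum."""
    m = m - m.min()
    if m.max() > 0:
        m = m * (M / m.max())
    local_max = (m == maximum_filter(m, size=3)) & (m > 0)
    peaks = m[local_max]
    if peaks.size <= 1:
        return m                       # a single strong peak: leave the map as is
    other = peaks[peaks < M]
    mbar = other.mean() if other.size else M
    return m * (M - mbar) ** 2
```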

Saliency Map The feature maps are combined into three conspicuity maps for intensity, color, and orientation, all at the same scale (σ = 4). For intensity and color, we get: Ī = ⊕_{c=2}^{4} ⊕_{s=c+3}^{c+4} N(I(c, s)) and C̄ = ⊕_{c=2}^{4} ⊕_{s=c+3}^{c+4} [N(RG(c, s)) + N(BY(c, s))], where the operator ⊕ reduces each map to scale 4 and performs point-by-point addition. 19 / 28
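
A sketch of the across-scale addition and the intensity and colour conspicuity maps, reusing normalize_map from above; the reduction to scale 4 is done by simple bilinear interpolation:

```python
import numpy as np
from scipy.ndimage import zoom

def resize_to(m, shape):
    """Bring a map to the target (scale-4) resolution by interpolation."""
    return zoom(m, (shape[0] / m.shape[0], shape[1] / m.shape[1]), order=1)

def conspicuity_intensity(i_maps, shape):
    """Intensity conspicuity map: across-scale addition of the normalized I(c, s)."""
    return sum(resize_to(normalize_map(m), shape) for m in i_maps.values())

def conspicuity_color(c_maps, shape):
    """Colour conspicuity map: across-scale addition of the normalized RG and BY maps."""
    return sum(resize_to(normalize_map(m), shape) for m in c_maps.values())
```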

Saliency Map For orientation, we first combine the six feature maps for a given angle and then add over angles to get a single conspicuity map: Ō = Σ_{θ ∈ {0°, 45°, 90°, 135°}} N(⊕_{c=2}^{4} ⊕_{s=c+3}^{c+4} N(O(c, s, θ))). The overall saliency map is then computed by normalizing and averaging the three conspicuity maps: S = (1/3) (N(Ī) + N(C̄) + N(Ō)). Why do we normalize each conspicuity map separately? Similar features compete strongly for saliency, while different ones contribute independently to saliency. 20 / 28
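
Tying the pieces together, a sketch of the orientation conspicuity map and the overall saliency map S, using the helpers defined above (all parameter choices remain illustrative):

```python
import numpy as np

def conspicuity_orientation(o_maps, shape):
    """Orientation conspicuity map: per-orientation across-scale sums, each normalized, then added."""
    total = np.zeros(shape)
    for deg in (0, 45, 90, 135):
        per_theta = sum(resize_to(normalize_map(m), shape)
                        for (c, s, t), m in o_maps.items() if t == deg)
        total += normalize_map(per_theta)
    return total

def saliency_map(rgb):
    """S = 1/3 (N(intensity) + N(colour) + N(orientation)) at spatial scale 4."""
    shape = gaussian_pyramid(rgb[..., 0])[4].shape   # resolution of scale 4
    i_bar = conspicuity_intensity(intensity_maps(rgb), shape)
    c_bar = conspicuity_color(color_maps(rgb), shape)
    o_bar = conspicuity_orientation(orientation_maps(rgb), shape)
    return (normalize_map(i_bar) + normalize_map(c_bar) + normalize_map(o_bar)) / 3.0
```

Given an (H, W, 3) float image img with values in [0, 1], S = saliency_map(img) returns a coarse saliency map at the scale-4 resolution.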

Example 21 / 28

Inhibition of Return Now we can predict sequences of fixations from a saliency map: the maximum of S is the most salient location, which becomes the focus of attention (FOA); all other locations are ignored (inhibited); then the saliency around the FOA is reset, so that the second most salient location becomes the new FOA. The last property is crucial: it results in inhibition of return, so that the FOA doesn't immediately return to the most salient location. Itti et al. (1998) implement this using a winner-take-all neural network, which allows them to simulate fixation durations. 22 / 28
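
The paper implements this with a winner-take-all neural network; the sketch below substitutes a much simpler greedy loop (so it predicts fixation locations but not durations), with an assumed inhibition radius:

```python
import numpy as np

def predict_fixations(S, n_fixations=5, radius=2):
    """Greedy stand-in for WTA + inhibition of return: repeatedly pick the
    most salient location, then zero out a disc around it."""
    S = S.copy()
    yy, xx = np.mgrid[:S.shape[0], :S.shape[1]]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(S), S.shape)
        fixations.append((int(y), int(x)))
        S[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 0.0   # inhibit the FOA
    return fixations
```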

Inhibition of Return 23 / 28

Robustness to Noise Test the model by adding noise to the image and check whether it is still able to pick out the salient locations correctly. 24 / 28
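
A toy version of such a test, assuming the saliency_map sketch above; the synthetic image, noise level, and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
img = np.zeros((256, 256, 3))
img[120:136, 120:136, 0] = 1.0            # one red patch: the salient target
noisy = np.clip(img + rng.normal(scale=0.1, size=img.shape), 0.0, 1.0)

S_clean, S_noisy = saliency_map(img), saliency_map(noisy)
print(np.unravel_index(np.argmax(S_clean), S_clean.shape))   # peak without noise
print(np.unravel_index(np.argmax(S_noisy), S_noisy.shape))   # peak with noise
```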

Evaluation Evaluation reported by Itti et al. (1998): the saliency model can reproduce human performance in pop-out tasks (including the conjunction-target case); it was also tested on images of traffic signs, red soda cans, and emergency triangles (though few details are given in the paper); it outperforms spatial frequency models. There is no evaluation of saliency against eye-tracking data; however, there is a lot of subsequent work on this topic, such as Borji, Sihite, and Itti (2013). 25 / 28

Strengths and Limitations Strengths: a simple feed-forward architecture generates complex behavior; massively parallel implementation (biologically plausible); very successful as a model of early visual processing. Weaknesses: it can only detect regions that are salient based on color, intensity, or orientation; other features (e.g., T-junctions, line terminations) or conjunctions of features are not accounted for in the model; motion is important for saliency, but is not modeled; the normalization function N(·) plays a crucial role without being theoretically well-founded; there is no notion of an object in the model (saliency is a property of a point), but objectness is crucial for human scene perception. 26 / 28

Summary Attention selects the part of the visual input which is fixated and processed in detail; attention is directed to visually salient areas in an image, i.e., areas that are different from the rest of the image; the saliency model is based on color, orientation, and intensity maps computed at various spatial scales; center-surround differences are applied, and the maps are normalized and combined into a single saliency map; a winner-takes-all mechanism then predicts attended locations; the model is robust to noise and models human fixation behavior. 27 / 28

References Borji, A., Sihite, D. N., & Itti, L. (2013). Quantitative analysis of human-model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 22(1), 55-69. Henderson, J. (2003). Human gaze control in real-world scene perception. Trends in Cognitive Sciences, 7, 498-504. Itti, L. (2007). Visual salience. Scholarpedia, 2(9), 3327. Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254-1259. 28 / 28