Computational Cognitive Science Lecture 19: Contextual Guidance of Attention Chris Lucas (Slides adapted from Frank Keller's) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 20 November 2018 1 / 24

1 Introduction: Visual Saliency; Contextual Guidance
2 Contextual Guidance Model: Model Architecture; Local Features; Global Features; Context
3 Results: Eye-tracking Data; Evaluation
Reading: Torralba, Oliva, Castelhano, and Henderson (2006). 2 / 24

Visual Saliency We attend to the areas that are visually salient. An area is salient if it stands out, i.e. is different from the rest of the image. The visual system computes a saliency map of the image, and then moves the eyes to the most salient regions in turn (Itti, Koch, & Niebur, 1998). 3 / 24
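To make the saliency-map account concrete, here is a minimal Python sketch of the selection step only: repeatedly fixate the most salient location, then suppress it (inhibition of return). This is not Itti et al.'s implementation, which builds the map from colour, intensity, and orientation feature maps; here the map is simply given as an array.

```python
# Minimal sketch: winner-take-all selection with inhibition of return
# over a precomputed saliency map.
import numpy as np

def scanpath_from_saliency(saliency, n_fixations=5, inhibition_radius=3):
    """Return (row, col) fixation points in order of decreasing saliency."""
    s = saliency.astype(float).copy()
    h, w = s.shape
    fixations = []
    for _ in range(n_fixations):
        r, c = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((r, c))
        # Suppress a square neighbourhood around the chosen location.
        r0, r1 = max(0, r - inhibition_radius), min(h, r + inhibition_radius + 1)
        c0, c1 = max(0, c - inhibition_radius), min(w, c + inhibition_radius + 1)
        s[r0:r1, c0:c1] = -np.inf
    return fixations

# Toy example: a 10x10 map with two salient blobs.
toy = np.zeros((10, 10))
toy[2, 3], toy[7, 8] = 1.0, 0.8
print(scanpath_from_saliency(toy, n_fixations=2))   # [(2, 3), (7, 8)]
```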

Contextual Guidance However, visual search in real-world scenes is different from visual search in artificial stimuli (arrays of lines, etc.). Example: person search. The saliency map (white area) is quite different from the actual fixations. 4 / 24

Contextual Guidance In real-world scenes, factors in addition to saliency guide search: contextual knowledge: where an object is located (e.g., a person typically is on the floor); semantic knowledge: which objects are related to each other and co-occur (e.g., computer and monitor occur together); schematic knowledge: which objects occur in a given type of scene (e.g., a kitchen typically contains an oven); task knowledge: the viewing strategy required by a given task (e.g., search vs. memorization). We will look at the Contextual Guidance Model (Torralba et al., 2006), which combines contextual knowledge with saliency. 5 / 24

Model Architecture 6 / 24

Model Architecture The Contextual Guidance Model (CGM) combines saliency with scene gist to model visual search. Intuitively, scene gist represents what type of scene we're dealing with (indoor scene, street scene, landscape, etc.). 7 / 24

Model Architecture The CGM computes the probability that target object O is present at point X = (x, y) in the image: p(O = 1, X | L, G) = 1/p(L | G) · p(L | O = 1, X, G) · p(X | O = 1, G) · p(O = 1 | G) (1) where L is a set of local image features at X and G is a set of global image features representing gist. 8 / 24

Model Architecture Components of the CGM in Equation (1): 1/p(L | G) is a saliency model (implemented differently from Itti et al., but the same idea); p(L | O = 1, X, G) enhances the features of X that belong to the target object; p(X | O = 1, G) is the contextual prior, which provides information about likely target locations; p(O = 1 | G) is the probability that O is present in the scene. Note that unlike Itti's model, the CGM is fully probabilistic. 9 / 24

Model Architecture The CGM basically combines two components: a saliency map and a context map. In the implementation of the CGM, Equation (1) is simplified to: S(X) = 1/p(L | G) · p(X | O = 1, G) (2) Contextually modulated saliency S(X) is saliency combined with a prior over target locations, conditioned on scene gist. 10 / 24
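A minimal sketch of Equation (2), assuming the saliency map p(L | G) and the context map p(X | O = 1, G) are already available as arrays. The exponent gamma used to weight the saliency term is an assumption for illustration; the slides only say the two maps are combined as a weighted product.

```python
# Minimal sketch of contextually modulated saliency (Equation 2).
import numpy as np

def modulated_saliency(p_l_given_g, p_x_given_o_g, gamma=0.5):
    """S(X) = p(L|G)^(-gamma) * p(X|O=1,G), normalised to sum to 1."""
    saliency = p_l_given_g ** (-gamma)   # rare local features -> high saliency
    s = saliency * p_x_given_o_g         # weight by likely target locations
    return s / s.sum()

# Toy example: a 4x4 image where local features are rare in one corner
# and the context prior favours the bottom half.
p_l = np.full((4, 4), 0.25); p_l[0, 3] = 0.01
p_x = np.zeros((4, 4)); p_x[2:, :] = 1.0 / 8
print(modulated_saliency(p_l, p_x).round(3))
```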

Local Features Probabilistic definition of saliency: local image features are salient when they are statistically distinguishable from the background. Compute local image features (vector L): compute orientation features separately for the three color channels, at 6 orientations and 4 scales (steerable pyramid); model the resulting distribution using a multivariate power-exponential distribution (a generalization of the Gaussian); then compute p(L | G), the distribution of local features conditioned on the global features. 11 / 24
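A minimal sketch of this probabilistic notion of saliency, with two simplifications: the local features are random stand-ins rather than steerable-pyramid outputs, and the fitted distribution is a Gaussian rather than the multivariate power-exponential used in the CGM. Locations whose features are unlikely under the fit come out as salient.

```python
# Minimal sketch: saliency as improbability of local features under a fitted distribution.
import numpy as np

def local_saliency(features):
    """features: (H, W, D) array of local feature vectors.
    Returns an (H, W) map of negative log-probability under a Gaussian fit."""
    h, w, d = features.shape
    flat = features.reshape(-1, d)
    mu = flat.mean(axis=0)
    cov = np.cov(flat, rowvar=False) + 1e-6 * np.eye(d)   # regularised covariance
    inv = np.linalg.inv(cov)
    diff = flat - mu
    # Squared Mahalanobis distance ~ negative log-likelihood up to a constant.
    nll = 0.5 * np.einsum('nd,dk,nk->n', diff, inv, diff)
    return nll.reshape(h, w)

# Toy usage: random "features" standing in for oriented pyramid outputs.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 6))
print(local_saliency(feats).shape)   # (8, 8)
```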

Local Features Examples of saliency maps: 12 / 24

Global Features Before computing saliency or performing object recognition, the visual system computes a global summary of the image (gist). This can be simulated by pooling the outputs of local feature detectors across large regions of the visual field: compute luminance (intensity) as the mean of the red, green, and blue values; compute orientation features for luminance at 6 orientations and 4 scales (steerable pyramid); compute the average of each feature over 4 × 4 non-overlapping windows in the image; reduce the resulting vector G using principal component analysis. 13 / 24
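A minimal sketch of the pooling step just described, assuming the oriented filter responses have already been computed; the final PCA reduction (fit over a training set of images) is omitted.

```python
# Minimal sketch: pool filter outputs over a coarse 4x4 grid to form the gist vector.
import numpy as np

def gist_vector(filter_responses, grid=4):
    """filter_responses: list of (H, W) oriented filter output maps.
    Returns the pooled gist features (before the PCA reduction used in the CGM)."""
    pooled = []
    for resp in filter_responses:
        h, w = resp.shape
        bh, bw = h // grid, w // grid
        # Average each map over a grid x grid set of non-overlapping windows.
        blocks = resp[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw)
        pooled.append(blocks.mean(axis=(1, 3)).ravel())
    # 6 orientations x 4 scales x 16 windows = 384 numbers in the CGM's case.
    return np.concatenate(pooled)

# Toy usage: luminance as the mean of R, G, B; crude shifted-difference maps
# stand in for the 24 steerable-pyramid responses.
rng = np.random.default_rng(1)
rgb = rng.random((64, 64, 3))
luminance = rgb.mean(axis=2)
responses = [np.abs(np.roll(luminance, k, axis=0) - luminance) for k in range(1, 25)]
print(gist_vector(responses).shape)   # (384,)
```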

Global Features [Figure: computation of global features. The luminance channel is decomposed using a steerable pyramid with 6 orientations and 4 scales; the magnitudes of the multiscale oriented filter outputs are sampled and pooled to give G.] 14 / 24

Context The context component of the CGM associates a scene type (i.e., a set of global features) with likely target locations. Example: when searching for people in a street scene, locations in the bottom half of the scene are likely. Assume the expected target location is a weighted mixture of the target locations in all scenes: p(X, G | O = 1) = Σ_{n=1}^{N} P(n) p(X | n) p(G | n) where we assume that the scenes are clustered into N prototypes: P(n) is the weight, p(X | n) the distribution of target locations, and p(G | n) the distribution of global features for prototype n. 15 / 24

Context Implementation of the context model: context only predicts the vertical location of the target object (the horizontal location is unconstrained); so we can approximate p(X | O = 1, G) as p(y | O = 1, G); the model parameters can be estimated using the expectation maximization (EM) algorithm (this infers the prototypes); train the model on images containing three types of target objects: people in street scenes, and paintings and mugs in indoor scenes (around 300 images per object type); the number of prototypes is set to N = 4; then compute the modulated saliency map S(X) as a weighted product of the saliency map and the context map. 16 / 24
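A minimal sketch of the context component described above, with made-up parameters standing in for the prototype means, mixture weights, and per-prototype target-location distributions that EM would estimate from the training images.

```python
# Minimal sketch: p(y | O=1, G) as a mixture over N scene prototypes,
# weighted by how well the image's gist matches each prototype.
import numpy as np

def context_map(gist, proto_means, proto_weights, y_means, y_stds, height=64):
    """Return p(y | O=1, G) over image rows for one image."""
    # Responsibility of each prototype: P(n) p(G|n), with an isotropic Gaussian for p(G|n).
    d2 = ((gist[None, :] - proto_means) ** 2).sum(axis=1)
    resp = proto_weights * np.exp(-0.5 * d2)
    resp /= resp.sum()
    # Mix the per-prototype Gaussians over vertical position y.
    y = np.arange(height)
    dens = np.exp(-0.5 * ((y[None, :] - y_means[:, None]) / y_stds[:, None]) ** 2)
    p_y = (resp[:, None] * dens).sum(axis=0)
    return p_y / p_y.sum()

# Toy usage with N = 4 prototypes and a 2-D gist vector (all values invented).
rng = np.random.default_rng(2)
p_y = context_map(gist=np.array([0.2, 0.8]),
                  proto_means=rng.random((4, 2)), proto_weights=np.full(4, 0.25),
                  y_means=np.array([10.0, 25.0, 40.0, 55.0]), y_stds=np.full(4, 5.0))
print(p_y.argmax())   # most likely target row under this context
```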

Context Context prototypes for people in street scenes: [Figure: scene prototypes selected for the people search task in urban scenes; the top row shows images from the training set.] 17 / 24

Eye-tracking Data The CGM predicts fixation locations, so it can be evaluated against eye-tracking data. Collect eye-tracking data from participants performing visual search: the task was to count the number of people, mugs, or paintings present in the image; a total of 72 images were used, with up to six targets each; about half the images contained no target; participants could take up to 10 s for the task; accuracy was the same for target-present and target-absent conditions. 18 / 24

Evaluation Use the eye-tracking data to evaluate the CGM: compare how well the CGM predicts fixation locations relative to saliency alone for the three search tasks; the model outputs a probability value for each image location; apply a threshold to this probability so that the model selects a fixed percentage of the image (here 20%); then count how many fixations fall within the selected region; chance baseline: 20% correct; upper limit: consistency across participants; also check how fixation number influences performance (hypothesis: the CGM models early stages of visual search). 19 / 24
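A minimal sketch of this evaluation measure: select the top 20% of image locations under a model's map and compute the proportion of human fixations that fall inside (chance level is therefore 20%).

```python
# Minimal sketch: fraction of fixations landing in the model's top-20% region.
import numpy as np

def fixations_in_top_region(model_map, fixations, proportion=0.2):
    """fixations: list of (row, col); returns the fraction inside the selected region."""
    thresh = np.quantile(model_map, 1.0 - proportion)   # keep the highest 20% of pixels
    selected = model_map >= thresh
    hits = sum(selected[r, c] for r, c in fixations)
    return hits / len(fixations)

# Toy usage: a map whose values grow towards the bottom rows, and three fixations.
toy_map = np.tile(np.linspace(0, 1, 10)[:, None], (1, 10))
print(fixations_in_top_region(toy_map, [(9, 2), (8, 5), (1, 1)]))   # ~0.67
```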

Evaluation Example: saliency model vs. fixations for painting and mug search. [Figure: two panels, painting search and mug search.] 20 / 24

Evaluation Example: CGM vs. fixations for painting and mug search. [Figure: panels for painting search and mug search, comparing the saliency model with the full model.] 21 / 24

Evaluation CGM evaluation by fixation number. [Figure: percentage of fixations within the predicted region (30–100%) plotted against fixation number (1–5), for target-present and target-absent images in the people, painting, and mug search tasks; curves show consistency across participants, the full model, and the saliency model.] 22 / 24

Summary Information about scene context is important for visual search; the Contextual Guidance Model combines saliency with context, both conditioned on scene gist, to compute likely fixation locations; gist is essentially an orientation/intensity map of the scene at a coarse scale; context is modeled as a distribution over likely vertical locations of the target object; the CGM successfully models eye-tracking data on visual search in photorealistic scenes. 23 / 24

References Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259. Torralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113(4), 766–786. 24 / 24