Methods for comparing scanpaths and saliency maps: strengths and weaknesses

O. Le Meur, olemeur@irisa.fr
T. Baccino, thierry.baccino@univ-paris8.fr
Univ. of Rennes 1
http://www.irisa.fr/temics/staff/lemeur/
July 2011

Introduction
Definition (Scanpath [Noton and Stark(1971)]): A scanpath is a particular sequence of eye movements when a particular visual pattern is viewed.
The visual scanpath is often held as a marker of attention.
For the purpose of this presentation, we will consider a scanpath as being:
- any eye-movement data collected by an eye-tracking apparatus;
- any path stemming from a computational model (a saliency algorithm with inhibition of return (IOR) [Koch and Ullman(1985)], for instance).

Introduction
The overall scanpath pattern is influenced and shaped by a combination of:
1 Top-down cognitive factors (expectations, goals, memory...).
2 Bottom-up processes involving visual sensory input.
Example: impact of the visual quality on the deployment of visual attention.
Different methods can be used to evaluate the similarity between scanpaths.

Agenda: 1 Introduction, 2, 3, 4 Methods involving scanpaths and saliency maps, 5, 6

Agenda, section 2: String edit; Vector-based metric

Three principal methods
These three methods have been described in the chapter proposal:
- String edit [Levenshtein(1966)];
- Mannan's metric [Mannan et al.(1995)];
- Vector-based metric [Jarodzka et al.(2010)].

Three principal methods: String edit (Levenshtein distance)
Definition (String edit, Levenshtein distance [Levenshtein(1966)]): This technique was originally developed to account for the edit distance between two words. The dissimilarity is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character. For scanpaths, each fixation is first coded as the letter of the area of interest (AOI) it falls in.
Advantages:
+ Easy to compute
+ Keeps the order of fixations
Drawbacks:
- How many areas of interest should we use (7, 12, 15, 25...)?
- It does not take into account fixation duration...
Figure: parrot picture with a 5 × 3 grid overlaid.
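The string-edit comparison can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names, the 500 x 300 pixel image size and the two scanpaths are made up, and the 5 x 3 grid simply mirrors the one overlaid on the parrot picture.

```python
def fixations_to_string(fixations, width, height, cols=5, rows=3):
    """Code each (x, y) fixation as the letter of the grid cell it falls in."""
    letters = []
    for x, y in fixations:
        col = min(int(x / width * cols), cols - 1)
        row = min(int(y / height * rows), rows - 1)
        letters.append(chr(ord("A") + row * cols + col))
    return "".join(letters)


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Hypothetical image size and scanpaths, for illustration only.
path_1 = [(50, 50), (250, 150), (450, 250)]
path_2 = [(50, 50), (150, 150), (250, 150), (450, 250)]
s1 = fixations_to_string(path_1, 500, 300)
s2 = fixations_to_string(path_2, 500, 300)
distance = levenshtein(s1, s2)
```

Changing `cols` and `rows` changes the strings and therefore the distance, which is the grid-size drawback listed above.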

Three principal methods: Vector-based metric (1/2)
Definition (Vector-based metric [Jarodzka et al.(2010)]): The vector-based metric represents the scanpath as a sequence of vectors. For example, a scanpath with n fixations is represented by a set of n - 1 vectors.
This representation is interesting because it preserves:
- the shape of the scanpath;
- the length of the scanpath (almost);
- the direction of the scanpath saccades;
- the position of fixations;
- the duration of fixations.

Three principal methods: Vector-based metric (2/2)
The vector-based metric is composed of three steps:
1 Scanpath simplification: small consecutive saccadic vectors are merged; consecutive vectors having similar directions are merged.
2 Temporal alignment: similarity matrix M; adjacency matrix A; find the shortest path.
3 Scanpath comparison, providing 5 measures: difference in shape (vector difference); difference in amplitude of saccade; difference in spatial position; difference in direction; difference in duration.
Advantages:
+ No pre-defined AOIs
+ Alignment of scanpaths (based on their shapes or on other dimensions)
Drawbacks:
- Eye movements such as smooth pursuit are not handled
- It compares only two scanpaths
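The vector representation and the simplification step can be sketched as below. This covers step 1 only, and it is a simplified reading of it: the function names, the merge rule and both thresholds (20 pixels, 30 degrees) are assumptions chosen for illustration, not the values used in [Jarodzka et al.(2010)].

```python
import math


def saccade_vectors(fixations):
    """Turn n fixations (x, y) into the n - 1 saccade vectors linking them."""
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(fixations, fixations[1:])]


def angle_between(u, v):
    """Absolute angle in degrees between two 2-D vectors."""
    diff = math.atan2(v[1], v[0]) - math.atan2(u[1], u[0])
    diff = (diff + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return abs(math.degrees(diff))


def simplify(vectors, min_amplitude=20.0, max_angle=30.0):
    """Merge a vector into its predecessor when it is short or nearly parallel.

    Both thresholds are illustrative assumptions, not published values.
    """
    merged = []
    for v in vectors:
        if merged:
            last = merged[-1]
            is_short = math.hypot(v[0], v[1]) < min_amplitude
            if is_short or angle_between(last, v) < max_angle:
                merged[-1] = (last[0] + v[0], last[1] + v[1])
                continue
        merged.append(v)
    return merged


# Made-up scanpath with five fixations, hence four saccade vectors.
fixations = [(0, 0), (100, 0), (200, 10), (205, 12), (200, 200)]
vectors = saccade_vectors(fixations)
simplified = simplify(vectors)
```

Because merging adds vectors, the simplified scanpath still starts and ends at the same points as the original one.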

Agenda, section 3: From a fixation map to a saliency map; Divergence of Kullback-Leibler; ROC analysis

Three principal methods
These three methods have been described in the chapter proposal:
- Correlation-based measure;
- Divergence of Kullback-Leibler;
- ROC analysis.

Three principal methods: From a fixation map to a saliency map
Discrete fixation map f^i for the i-th observer (M is the number of fixations):
$$f^i(x) = \sum_{k=1}^{M} \delta\left(x - x_f(k)\right) \qquad (1)$$
Continuous saliency map S (N is the number of observers):
$$S(x) = \left( \frac{1}{N} \sum_{i=1}^{N} f^i(x) \right) * G_\sigma(x) \qquad (2)$$
Figure: (a) Original; (b) Fixation map; (c) Saliency map; (d) Heat map.
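A minimal pure-Python sketch of equations (1) and (2) follows: per-observer fixation maps are averaged, then convolved with a Gaussian. The function names, the 32 x 24 map size, the value of sigma, the truncation of the kernel at three standard deviations and the zero padding at the borders are all assumptions made for illustration.

```python
import math


def fixation_map(fixations, width, height):
    """Equation (1): count of fixations landing on each pixel."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y in fixations:
        grid[y][x] += 1.0
    return grid


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian, truncated at three standard deviations."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(d * d) / (2 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(grid, kernel):
    """Convolve every row with the kernel, treating outside pixels as zero."""
    radius = len(kernel) // 2
    width = len(grid[0])
    return [[sum(kernel[d + radius] * row[x + d]
                 for d in range(-radius, radius + 1)
                 if 0 <= x + d < width)
             for x in range(width)]
            for row in grid]


def transpose(grid):
    return [list(column) for column in zip(*grid)]


def saliency_map(observers, width, height, sigma=2.0):
    """Equation (2): mean of the fixation maps, smoothed by a 2-D Gaussian."""
    mean = [[0.0] * width for _ in range(height)]
    for fixations in observers:
        fmap = fixation_map(fixations, width, height)
        for y in range(height):
            for x in range(width):
                mean[y][x] += fmap[y][x] / len(observers)
    kernel = gaussian_kernel(sigma)
    # A 2-D Gaussian is separable: blur the rows, then the columns.
    return transpose(blur_rows(transpose(blur_rows(mean, kernel)), kernel))


# Two hypothetical observers on a 32 x 24 pixel image.
observers = [[(10, 10)], [(10, 10), (20, 8)]]
smap = saliency_map(observers, 32, 24)
```

In practice sigma is usually tied to the viewing setup (for example, about one degree of visual angle); the value used here is arbitrary.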

Three principal methods: Divergence of Kullback-Leibler
Definition (Divergence of Kullback-Leibler): The Kullback-Leibler divergence is used to estimate the overall dissimilarity between two probability density functions. Let us define two discrete distributions R and P with probability density functions r_k and p_k. The KL-divergence between R and P is given by the relative entropy of P with respect to R:
$$KL(R, P) = \sum_{k} p_k \log\left(\frac{p_k}{r_k}\right) \qquad (3)$$
The KL-divergence is only defined if r_k and p_k both sum to 1 and if r_k > 0 for any k such that p_k > 0.
Example with three maps (a), (b) and (c): KL(c, b) = 3.33 and KL(b, c) = 7.06, so the measure is not symmetric.
Advantages:
+ Easy to use
Drawbacks:
- Not bounded
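Equation (3) translates almost directly into code. The sketch below is illustrative only: the function names and the two 4-bin distributions are made up, and the maps are assumed to have been flattened into lists beforehand.

```python
import math


def normalise(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def kl_divergence(r, p):
    """KL(R, P) = sum_k p_k * log(p_k / r_k), following equation (3)."""
    if any(pk > 0 and rk <= 0 for rk, pk in zip(r, p)):
        raise ValueError("undefined: r_k must be > 0 wherever p_k > 0")
    # Terms with p_k = 0 contribute nothing to the sum.
    return sum(pk * math.log(pk / rk) for rk, pk in zip(r, p) if pk > 0)


# Two hypothetical distributions, e.g. flattened and normalised maps.
r = normalise([1, 1, 1, 1])
p = normalise([7, 1, 1, 1])
forward = kl_divergence(r, p)
backward = kl_divergence(p, r)
```

Swapping the arguments gives a different value, and the result grows without bound as some r_k approaches zero where p_k is positive, which is why the measure is listed as not bounded.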

Three principal methods: ROC analysis (1/2)
Definition (ROC): The Receiver Operating Characteristic (ROC) analysis provides a comprehensive and visually attractive framework to summarize the accuracy of predictions.
The problem is here limited to a two-class prediction (binary classification). Pixels of the ground truth, as well as those of the prediction, are labeled either as fixated or not fixated.
- Hit rate (TP)
- ROC curve
- AUC (Area Under Curve): AUC = 1, perfect; AUC = 0.5, random.

Three principal methods: ROC analysis (2/2)
Figure: (a) Reference; (b) Predicted; (c) Classification.
A ROC curve plotting the true positive rate (hit rate) as a function of the false positive rate is usually used to present the classification result.
Advantages:
+ Invariant to monotonic transformation
+ Well-defined upper bound
Drawbacks: ...
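The ROC curve and its AUC can be sketched as follows, assuming the reference map has already been binarized into fixated (1) and not fixated (0) pixels and that the predicted map is thresholded at each of its distinct values. The function names and the six-pixel example are hypothetical.

```python
def roc_points(truth, scores):
    """(false positive rate, true positive rate) for every threshold.

    truth: 0/1 labels per pixel (1 = fixated); scores: predicted saliency.
    """
    positives = sum(truth)
    negatives = len(truth) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(truth, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(truth, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up flattened maps with six pixels.
truth = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
area = auc(roc_points(truth, scores))
```

Only the ranking of the scores matters here, which is the invariance to monotonic transformation listed among the advantages.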

Agenda, section 4: Methods involving scanpaths and saliency maps; Receiver Operating Analysis

Four principal methods
These four methods have been described in the chapter proposal:
- Receiver Operating Analysis;
- Normalized Scanpath Saliency [Parkhurst et al.(2002), Peters et al.(2005)];
- Percentile [Peters and Itti(2008)];
- The Kullback-Leibler divergence [Itti and Baldi(2005)].

Four principal methods: Receiver Operating Analysis (1/3)
ROC analysis is performed between a continuous saliency map and a set of fixations.
Human fixations only [Torralba et al.(2006), Judd et al.(2009)]: in this case, the hit rate is measured as a function of the threshold used to binarize the saliency map.
Figure: (a) HitRate = 100%; (b) HitRate = 50%.
This method is not sensitive to the false alarm rate.
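A hit-rate computation of this kind might look like the sketch below. It assumes the threshold is chosen so that a given fraction of the most salient pixels is kept, which is one common way of setting it; the function name, the 3 x 3 map and the fixations are invented for illustration.

```python
def hit_rate(saliency, fixations, salient_fraction):
    """Share of fixations falling on pixels at or above the threshold.

    The threshold is the value of the pixel ranked at `salient_fraction`
    of the map; with tied values, slightly more pixels may be kept.
    """
    values = sorted((v for row in saliency for v in row), reverse=True)
    keep = max(1, int(round(salient_fraction * len(values))))
    threshold = values[keep - 1]
    hits = sum(1 for x, y in fixations if saliency[y][x] >= threshold)
    return hits / len(fixations)


# Hypothetical 3 x 3 saliency map and two fixations given as (x, y).
saliency = [[0.1, 0.2, 0.3],
            [0.4, 0.9, 0.5],
            [0.6, 0.7, 0.8]]
fixations = [(1, 1), (0, 0)]
strict = hit_rate(saliency, fixations, 0.2)
loose = hit_rate(saliency, fixations, 1.0)
```

Keeping the whole map always yields a hit rate of 1, whatever the model: this is the lack of sensitivity to false alarms noted above.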

Four principal methods: Receiver Operating Analysis (2/3)
The ROC analysis is here performed between a continuous saliency map and a set of fixations.
Human fixations plus a set of control points [Einhäuser and König(2003), Tatler et al.(2005)]:
- by selecting the control points from a uniform or random distribution;

Four principal methods: Receiver Operating Analysis (3/3)
The ROC analysis is here performed between a continuous saliency map and a set of fixations.
Human fixations plus a set of control points [Einhäuser and König(2003), Tatler et al.(2005)]:
- by selecting locations randomly from a distribution of all fixation locations for that observer that occurred at the same time, but on other images.
This method accounts for center bias: the control points share the same systematic tendencies as the fixations...
It underestimates the salience of areas which are more or less centered in the image...
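The control-point variant can be sketched as below. The AUC is computed here as the probability that a fixated location has a higher saliency value than a control location, which is equivalent to the area under the ROC curve. The function names and data are hypothetical, and the sketch pools fixations from other images without matching observer or fixation time, which is a simplification of the procedure described above.

```python
import random


def auc_from_samples(fixated, control):
    """Probability that a fixated value exceeds a control value (ties count half)."""
    wins = sum(1.0 if f > c else 0.5 if f == c else 0.0
               for f in fixated for c in control)
    return wins / (len(fixated) * len(control))


def shuffled_control(fixations_other_images, n, seed=0):
    """Draw n control locations from fixations recorded on other images."""
    rng = random.Random(seed)
    pool = [point for image in fixations_other_images for point in image]
    return [rng.choice(pool) for _ in range(n)]


def sample(saliency, points):
    """Saliency values at the given (x, y) locations."""
    return [saliency[y][x] for x, y in points]


# Hypothetical 3 x 3 saliency map, fixations on it, and fixations on other images.
saliency = [[0.1, 0.2, 0.3],
            [0.4, 0.9, 0.5],
            [0.6, 0.7, 0.8]]
fixations = [(1, 1), (2, 2)]
other_images = [[(0, 0), (1, 0)], [(2, 0)]]
control = shuffled_control(other_images, n=4)
score = auc_from_samples(sample(saliency, fixations), sample(saliency, control))
```

Replacing `shuffled_control` with locations drawn uniformly over the image gives the variant of the previous slide, which does not correct for center bias.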


(1/3)
No saliency prediction can perform better than the inter-observer dispersion.
Dispersion between observers:
- prior knowledge, experience, task, cultural difference...
- face, text, low-level visual features...
The dispersion can be evaluated by a one-against-all (leave-one-out) approach.
Example: inter-observers congruency based on the Hit Rate metric [Torralba et al.(2006)]
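A leave-one-out congruency measure based on the hit rate can be sketched as follows. Everything here is illustrative: the function names, the 4 x 4 grid and the three observers are made up, the reference map is a raw fixation count rather than a smoothed saliency map, and ties between equally salient pixels are broken by position.

```python
def count_map(observers, width, height):
    """Number of fixations per pixel, pooled over the given observers."""
    grid = [[0.0] * width for _ in range(height)]
    for fixations in observers:
        for x, y in fixations:
            grid[y][x] += 1.0
    return grid


def salient_pixels(saliency, salient_fraction):
    """Coordinates of the top fraction of pixels (ties broken by position)."""
    pixels = [(-v, y, x) for y, row in enumerate(saliency) for x, v in enumerate(row)]
    keep = max(1, round(salient_fraction * len(pixels)))
    return {(x, y) for _, y, x in sorted(pixels)[:keep]}


def congruency(observers, width, height, salient_fraction):
    """Mean hit rate of each observer against the map built from all the others."""
    rates = []
    for i, fixations in enumerate(observers):
        others = observers[:i] + observers[i + 1:]
        region = salient_pixels(count_map(others, width, height), salient_fraction)
        rates.append(sum(1 for point in fixations if point in region) / len(fixations))
    return sum(rates) / len(rates)


# Three hypothetical observers on a 4 x 4 grid; the third one partly disagrees.
observers = [[(0, 0), (1, 1)], [(0, 0), (1, 1)], [(3, 3), (0, 0)]]
score = congruency(observers, 4, 4, salient_fraction=3 / 16)
```

A score close to 1 means that observers look at the same places; the lower the score, the larger the dispersion.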

Inter-observers congruency for Judd's database [Judd et al.(2009)]: 1000 pictures, 15 observers; congruency based on the hit rate.

(3/3)
The inter-observer dispersion can be used:
- to define the upper bound of a prediction;
- to normalize the metric (nAUC, as proposed by [Zhao and Koch(2011)]).
Comparison of four state-of-the-art models (Hit Rate) using two datasets of eye movements.
Figure: results on N. Bruce's database and on O. Le Meur's database.

Predicting the dispersion between observers
There exist two computational models to predict the dispersion between observers:
- Visual Clutter [Rosenholtz et al.(2007)], based on the entropy of wavelet subbands;
- IOVC (Inter-Observers Visual Congruency) [Le Meur et al.(2011)]: face detection; color harmony; depth of field; scene complexity (entropy, number of regions, contours).

Figure: pictures with the highest predicted congruency.
Figure: pictures with the lowest predicted congruency.

Agenda, section 6: Focal-ambient dichotomy

Focal-ambient dichotomy: Recent findings about two distinct populations of fixations
Velichkovsky and his colleagues [Velichkovsky(2002), Unema et al.(2005), Pannasch et al.(2011)] jointly analyzed the fixation duration and the subsequent saccade amplitude:
- (short) fixations with subsequent large-amplitude saccades: ambient mode;
- (long) fixations with subsequent small-amplitude saccades: focal mode;
- ad hoc threshold to classify the fixations (5 degrees);
- larger proportion of focal fixations.
Automatic classification of visual fixations based on K-means [Follet et al.(2011)]:
- two populations of fixations, similar to previous studies;
- automatic classification gives a threshold of 6 degrees;
- 70% of focal fixations and 30% of ambient fixations.
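Both classification schemes can be sketched in a few lines. The first function applies a fixed amplitude threshold; the second derives a threshold with a one-dimensional K-means (k = 2). Clustering on saccade amplitude alone is an assumption made here for brevity, as are the function names and the amplitudes; the features actually used in [Follet et al.(2011)] may differ.

```python
def classify_fixations(next_saccade_amplitudes, threshold_deg=5.0):
    """Label each fixation from the amplitude (degrees) of the saccade that follows it."""
    return ["ambient" if amplitude > threshold_deg else "focal"
            for amplitude in next_saccade_amplitudes]


def two_means_threshold(values, iterations=50):
    """1-D K-means with k = 2; returns the midpoint between the two centroids."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        boundary = (low + high) / 2
        small = [v for v in values if v <= boundary]
        large = [v for v in values if v > boundary]
        if not small or not large:
            break
        new_low = sum(small) / len(small)
        new_high = sum(large) / len(large)
        if (new_low, new_high) == (low, high):
            break  # centroids are stable
        low, high = new_low, new_high
    return (low + high) / 2


# Made-up amplitudes (degrees) of the saccades following seven fixations.
amplitudes = [1.0, 2.0, 1.5, 9.0, 11.0, 2.5, 10.0]
threshold = two_means_threshold(amplitudes)
labels = classify_fixations(amplitudes, threshold)
```

With real data the learned threshold can be compared with the ad hoc value of 5 degrees, as done on the slide above.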

Focal-ambient dichotomy: Automatic classification of visual fixations based on K-means [Follet et al.(2011)]
Figure: focal and ambient fixation-density maps, (a) Focal; (b) Ambient; (c) Focal; (d) Ambient.
Is there a correlation between model-predicted saliency and these maps?
- Both are correlated to model-predicted saliency;
- Focal maps are more bottom-up than ambient ones;
- Ambient maps are less correlated to the center map.


References
Einhäuser, W., König, P., 2003. Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience 17, 1089-1097.
Follet, B., Le Meur, O., Baccino, T., 2011. Features of ambient and focal fixations on natural visual scenes, in: ECEM.
Itti, L., Baldi, P., 2005. Bayesian surprise attracts human attention, in: Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, pp. 1-8.
Jarodzka, H., Holmqvist, K., Nyström, M., 2010. A vector-based, multidimensional scanpath similarity measure, in: Proceedings of the 2010 Symposium on Eye-Tracking Research and Applications.
Judd, T., Ehinger, K., Durand, F., Torralba, A., 2009. Learning to predict where people look, in: ICCV.
Koch, C., Ullman, S., 1985. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology 4, 219-227.
Le Meur, O., Baccino, T., Roumy, A., 2011. Prediction of the Inter-Observers Visual Congruency (IOVC) and application to image ranking. ACM TO BE PUBLISHED xx, xxxx.
Levenshtein, 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, 707-710.

Mannan, S., Ruddock, K.H., Wooding, D.S., 1995. Automatic control of saccadic eye movements made in visual inspection of briefly presented 2D images. Spatial Vision 9, 363-386.
Noton, D., Stark, L., 1971. Scanpaths in saccadic eye movements while viewing and recognizing patterns. Vision Research 11, 929-942.
Pannasch, S., Schulz, J., Velichkovsky, B., 2011. On the control of visual fixation durations in free viewing of complex images. Attention, Perception & Psychophysics 73, 1120-1132.
Parkhurst, D., Law, K., Niebur, E., 2002. Modeling the role of salience in the allocation of overt visual attention. Vision Research 42, 107-123.
Peters, R., Itti, L., 2008. Applying computational tools to predict gaze direction in interactive visual environments. ACM Transactions on Applied Perception 5.
Peters, R.J., Iyer, A., Itti, L., Koch, C., 2005. Components of bottom-up gaze allocation in natural images. Vision Research 45, 2397-2416.
Rosenholtz, R., Li, Y., Nakano, L., 2007. Measuring visual clutter. Journal of Vision 7, 1-22.
Tatler, B.W., Baddeley, R.J., Gilchrist, I.D., 2005. Visual correlates of fixation selection: effects of scale and time. Vision Research 45, 643-659.

Torralba, A., Oliva, A., Castelhano, M., Henderson, J., 2006. Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological Review 113, 766-786.
Unema, P., Pannasch, S., Joos, M., Velichkovsky, B.M., 2005. Time course of information processing during scene perception: The relationship between saccade amplitude and fixation duration. Visual Cognition 12, 473-494.
Velichkovsky, B., 2002. Heterarchy of cognition: The depths and the highs of a framework for memory research. Memory 10, 405-419.
Zhao, Q., Koch, C., 2011. Learning a saliency map using fixated locations in natural scenes. Journal of Vision 11, 1-15.