Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition


Stefan Mathe, Cristian Sminchisescu
Presented by Mit Shah

Motivation
- Annotations in current computer vision are subjectively defined.
- What about the intermediate levels of computation?

Motivation
- There is a lack of large-scale datasets that provide recordings of the workings of the human visual system.

Previous Work
- Studies of gaze patterns in humans (e.g. a person browsing reddit follows the F-shaped pattern)
- Inter-observer consistency
- Bottom-up features and human fixations
- Models of saliency
- Uses of saliency maps: action recognition, object localization, scene classification
- Previous datasets: at most a few hundred videos, recorded under free-viewing conditions

Contributions
1. Extended the existing large-scale Hollywood-2 and UCF Sports datasets with eye movement recordings.
2. Introduced dynamic consistency and alignment measures: AOI Markov dynamics and temporal AOI alignment.
3. Trained an end-to-end automatic visual action recognition system.

Data Collection: Hollywood-2 Movie Dataset
- 12 action classes: answering phone, driving a car, eating, fighting, etc.
- 69 movies, 823/884 train/test split, 487k frames, about 20 hours of video
- The largest and most challenging dataset of its kind

Data Collection: UCF Sports Action Dataset
- 150 videos from television broadcasts, covering 9 sports action classes
- Diving, golf swinging, kicking, etc.

Data Collection: Extending the Two Datasets
- Recording environment: SMI iView X HiSpeed 1250 tower-mounted eye tracker
- Recording protocol: 19 human subjects divided into 3 tasks: action recognition, context recognition, and free viewing
- Many other specifications: timings/durations and breaks

Static & Dynamic Consistency: Action Recognition by Humans
- Goal and importance of the human recognition study
- Human errors: co-occurring actions, false positives, mislabeled videos

Static Consistency Among Subjects
- How well do the regions fixated by different human subjects agree on a frame-by-frame basis?
- Evaluation protocol: leave-one-subject-out prediction (see the sketch below).
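A minimal sketch of such a leave-one-subject-out protocol, assuming fixations are available as per-subject arrays of pixel coordinates; the Gaussian blur width, the random-negative AUC scoring, and all names here are illustrative choices, not the authors' exact procedure:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(fixations, shape, sigma=25):
    """Empirical saliency map from fixation points (assumed inside the frame)."""
    m = np.zeros(shape, dtype=float)
    for r, c in fixations:
        m[int(r), int(c)] += 1.0
    m = gaussian_filter(m, sigma)          # blur width is a hypothetical choice
    return m / (m.sum() + 1e-12)

def auc_score(saliency, positive_fixations, n_negatives=1000, rng=None):
    """AUC: how well the map separates true fixations from random locations."""
    rng = np.random.default_rng(rng)
    pos = saliency[positive_fixations[:, 0], positive_fixations[:, 1]]
    neg = saliency[rng.integers(0, saliency.shape[0], n_negatives),
                   rng.integers(0, saliency.shape[1], n_negatives)]
    # Probability that a random positive location outranks a random negative one.
    return (pos[:, None] > neg[None, :]).mean()

def leave_one_out_consistency(per_subject_fixations, frame_shape):
    """Score each subject's fixations against a map built from the other subjects."""
    scores = []
    for s, fix_s in enumerate(per_subject_fixations):
        others = np.vstack([f for i, f in enumerate(per_subject_fixations) if i != s])
        smap = fixation_density_map(others, frame_shape)
        scores.append(auc_score(smap, fix_s))
    return float(np.mean(scores))
```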

Static Consistency Among Subjects (results figure)

The Influence of Task on Eye Movements
- Within the task group S_A: for each subject s, derive saliency maps from S_A \ {s} and predict the fixations of s, giving n_a prediction scores.
- Across task groups: derive saliency maps from S_A and evaluate the average prediction score for each subject s in S_B, giving n_b prediction scores.
- Hypothesis test: independent 2-sample t-test with unequal variances; p-value >= 0.5?
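The hypothesis test named on the slide is the independent two-sample t-test with unequal variances (Welch's t-test); a minimal sketch of how the two groups of prediction scores could be compared, with illustrative function and variable names:

```python
import numpy as np
from scipy import stats

def task_influence_test(scores_within_task, scores_across_task):
    """Compare inter-subject fixation prediction scores within vs. across tasks.

    scores_within_task:  n_a prediction scores for held-out subjects of group S_A.
    scores_across_task:  n_b prediction scores for subjects of group S_B, evaluated
                         under saliency maps derived from S_A.
    Uses Welch's t-test (independent two-sample t-test with unequal variances).
    """
    t, p = stats.ttest_ind(scores_within_task, scores_across_task, equal_var=False)
    return t, p

# Hypothetical usage with made-up score arrays:
# t, p = task_influence_test(np.array([...]), np.array([...]))
# A large p-value means the two score distributions cannot be told apart,
# i.e. the task does not measurably change where subjects look.
```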

The Influence of Task on Eye Movements: Results (figure)

Dynamic Consistency Among Subjects
- The spatial distribution of fixations is highly consistent across subjects.
- Is there also significant consistency in the order of fixations?
- Approach: automatic discovery of areas of interest (AOIs) and two metrics: AOI Markov dynamics and temporal AOI alignment.

Scanpath Representation
- Human fixations are tightly clustered.
- Assign each fixation to the closest AOI and trace out the scan path (see the sketch below).
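A minimal sketch of this representation, assuming fixations and AOI centroids are given as coordinate arrays; the encoding as integer labels is an illustrative choice, not necessarily the exact representation used in the paper:

```python
import numpy as np

def scanpath_string(fixations, aoi_centroids):
    """Represent a subject's scanpath as a sequence of AOI labels.

    fixations:     (T, 2) array of fixation coordinates, in temporal order.
    aoi_centroids: (K, 2) array of AOI centre coordinates.
    Returns a list of AOI indices, one per fixation.
    """
    # Distance from every fixation to every AOI centre, then nearest AOI wins.
    dists = np.linalg.norm(fixations[:, None, :] - aoi_centroids[None, :, :], axis=2)
    return dists.argmin(axis=1).tolist()

# Example: three fixations mapped onto two AOIs.
# scanpath_string(np.array([[10, 10], [90, 95], [12, 8]]),
#                 np.array([[10, 10], [90, 90]]))   # -> [0, 1, 0]
```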

Automatically Finding AOIs
- Cluster the fixations of all subjects in each frame.
- Start k-means with 1 cluster and successively increase the number of clusters until the sum of squared errors drops below a threshold.
- Each fixation is assigned to the closest AOI at the time of its creation.
- Link centroids from successive frames into tracks; each resulting track becomes an AOI.
(A sketch of the per-frame clustering step follows.)
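A sketch of the per-frame clustering step using scikit-learn's KMeans; the SSE threshold and the maximum number of clusters are free parameters here, not values taken from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def find_frame_aois(fixations, sse_threshold, max_k=10):
    """Cluster one frame's fixations into AOIs with an adaptive number of clusters.

    fixations:     (N, 2) array of all subjects' fixations for this frame (N >= 1).
    sse_threshold: stop growing k once the within-cluster sum of squared errors
                   (KMeans inertia) drops below this value.
    Returns (centroids, labels).
    """
    for k in range(1, min(max_k, len(fixations)) + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(fixations)
        if km.inertia_ < sse_threshold:
            break
    return km.cluster_centers_, km.labels_

# Linking the per-frame centroids of successive frames into tracks (e.g. by
# nearest-neighbour matching) would then turn each track into one AOI.
```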

Automatically Finding AOIs (example figure)

AOI Markov Dynamics
- Model transitions of human visual attention between AOIs from the human fixation strings.
- Estimate the probability of transitioning to AOI b at time t given that AOI a was fixated at time t-1.
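A minimal maximum-likelihood sketch of the transition estimate, assuming scanpaths are already encoded as AOI-label sequences; no smoothing beyond row-normalisation is assumed, and the paper's exact estimator is not reproduced:

```python
import numpy as np

def aoi_transition_matrix(scanpaths, n_aois):
    """First-order Markov transition probabilities between AOIs.

    scanpaths: list of AOI-label sequences (one per subject), e.g. [[0, 1, 0], ...].
    n_aois:    number of distinct AOIs.
    Returns P where P[a, b] estimates Pr(fixate AOI b at t | AOI a at t-1).
    """
    counts = np.zeros((n_aois, n_aois), dtype=float)
    for path in scanpaths:
        for a, b in zip(path[:-1], path[1:]):
            counts[a, b] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalise each row; rows with no observed transitions stay at zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```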

Temporal AOI Alignment
- Why the longest common subsequence (LCS)?
- It is able to handle gaps and missing elements between scanpaths.
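A plain dynamic-programming LCS, shown as a sketch of the alignment idea; any length normalisation the authors apply on top of it is omitted:

```python
def lcs_length(seq_a, seq_b):
    """Length of the longest common subsequence of two AOI-label sequences.

    Tolerates gaps and missing elements: labels may be skipped in either
    sequence, but the matched labels must keep their relative order.
    """
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if seq_a[i - 1] == seq_b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

# lcs_length([0, 1, 2, 1], [0, 2, 1])  # -> 3 (common subsequence 0, 2, 1)
```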

Evaluation Pipeline
- Interest point operator: input is a video, output is a set of spatio-temporal coordinates.
- Descriptors: a space-time generalization of HoG, plus MBH computed from optical flow.
- Visual dictionary: cluster descriptors into 4000 visual words using k-means.
- Classifiers: RBF-χ² kernels combined in a Multiple Kernel Learning (MKL) framework.
(A bag-of-words sketch follows.)
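A sketch of the bag-of-words quantisation and an exponential chi-squared kernel, with illustrative parameter choices; the descriptors themselves, the MKL combination, and the exact kernel settings from the paper are not reproduced here:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_dictionary(descriptors, n_words=4000):
    """Cluster local space-time descriptors into a visual vocabulary."""
    return MiniBatchKMeans(n_clusters=n_words, random_state=0).fit(descriptors)

def bag_of_words(descriptors, dictionary):
    """Quantise one video's descriptors into a normalised visual-word histogram."""
    words = dictionary.predict(descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-12)

def rbf_chi2_kernel(H1, H2, gamma=1.0):
    """Exponential chi-squared kernel between two sets of histograms.

    gamma is a free parameter here (often set to the inverse of the mean
    chi-squared distance); the value used in the paper is not assumed.
    """
    # chi2[i, j] = 0.5 * sum_k (H1[i, k] - H2[j, k])^2 / (H1[i, k] + H2[j, k])
    diff = H1[:, None, :] - H2[None, :, :]
    summ = H1[:, None, :] + H2[None, :, :] + 1e-12
    chi2 = 0.5 * (diff ** 2 / summ).sum(axis=2)
    return np.exp(-gamma * chi2)
```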

Human Fixation Studies: Human vs. Computer Vision Operators
- Use human fixations as an interest point detector and compare against computer vision operators.
- Finding: low correlation between the two. Why?

Impact of Human Saliency Maps on Computer Visual Action Recognition
- Saliency maps encoding only the weak surface structure of fixations (no time ordering) can be used to boost the accuracy of contemporary methods.

Saliency Map Prediction
- Features: static features and motion features.
- Evaluation metrics: AUC and spatial KL divergence (see the sketch below).
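A minimal sketch of the spatial KL divergence between an empirical (human) and a predicted saliency map; the divergence direction and the epsilon regularisation are illustrative choices, and AUC can be computed as in the consistency sketch above:

```python
import numpy as np

def spatial_kl_divergence(predicted, empirical, eps=1e-12):
    """KL divergence from the predicted to the empirical saliency distribution.

    Both maps are normalised to probability distributions over pixels first;
    lower values mean the prediction better matches human fixation density.
    """
    p = empirical / (empirical.sum() + eps)
    q = predicted / (predicted.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```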

Automatic Visual Action Recognition (results)

Conclusions
- Combining human and computer vision.
- Extended the Hollywood-2 and UCF Sports datasets with eye movement data.
- Evaluated static and dynamic consistency of human fixations.
- Turned human fixations into learnt saliency maps.
- Built an end-to-end action recognition system.

Thanks!