Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling


AAAI-13, July 16, 2013. Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling. Sheng-hua ZHONG 1, Yan LIU 1, Feifei REN 1,2, Jinghuan ZHANG 2, Tongwei REN 3. 1 Department of Computing, The Hong Kong Polytechnic University; 2 School of Psychology, Shandong Normal University; 3 Software Institute, Nanjing University. 1

Outline Introduction to video saliency detection Spatio-temporal attention technique Experiments and results Conclusion and future work 2

Outline Introduction to video saliency detection Spatio-temporal attention technique Experiments and results Conclusion and future work 3

Introduction to Saliency Map
Definition of saliency map:
- The most famous attention model, referring to the allocation of processing resources
- A measure of conspicuity that calculates the likelihood of a location attracting attention [Koch & Ullman, Hum Neurobiol, 1985]
Motivation for constructing saliency maps:
(a) original image; (b) eye-fixation locations; (c) ground truth of saliency map
- Provides predictions about which regions are likely to attract observers' attention
- Useful for image/video representation (Wang et al., ICME, 2007), object detection and recognition (Yu et al., ACMMM, 2010), object tracking (Yilmaz et al., CSUR, 2006), and robotics control (Jiang & Crookes, AAAI, 2012) 4

Video Saliency Detection
Definition of video saliency map:
- Calculates the salient degree of each location in both the spatial and the temporal domain [Li et al., AAAI, 2012]
- Not much work has been extended to video sequences, where motion plays an important role
Two-pathway simulation:
- The video saliency detection procedure is divided into spatial and temporal channels [Marat et al., IJCV, 2009], corresponding to the magnocellular and parvocellular pathways
Classical optical flow model in saliency detection:
- The classical optical flow model is the most widely used motion detection approach in video saliency detection
- The independent calculation of each frame pair leads to high computational complexity
- The continuous motion of the prominent object cannot be popped out 5
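The classical optical-flow baseline the slide criticizes can be sketched in a few lines: estimate flow for each frame pair independently, then take the flow magnitude as the temporal saliency map. Below is a minimal Horn-Schunck sketch in NumPy; it is illustrative only (not the authors' implementation), and `alpha`, the iteration count, and the wrapping boundary handling are arbitrary demo choices:

```python
import numpy as np

def horn_schunck(prev, curr, alpha=1.0, n_iter=50):
    """Minimal Horn-Schunck optical flow between two grayscale frames.

    Sketch of the classical optical flow (COF) baseline; alpha and
    n_iter are arbitrary demo values, and boundaries wrap for brevity.
    """
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    Ix = np.gradient(prev, axis=1)   # horizontal image gradient
    Iy = np.gradient(prev, axis=0)   # vertical image gradient
    It = curr - prev                 # temporal derivative
    u = np.zeros_like(prev)
    v = np.zeros_like(prev)

    def neighbor_avg(f):
        # 4-neighbor average (wraps at the borders).
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0)
                + np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0

    for _ in range(n_iter):
        u_bar, v_bar = neighbor_avg(u), neighbor_avg(v)
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v

def temporal_saliency(prev, curr):
    """Flow magnitude, scaled to [0, 1], as a crude temporal saliency map."""
    u, v = horn_schunck(prev, curr)
    mag = np.hypot(u, v)
    return mag / (mag.max() + 1e-12)
```

Because every frame pair is processed from scratch, the cost grows with the number of pairs and nothing ties the flow of one pair to the next, which is exactly the pair of drawbacks listed above.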

Outline Introduction to video saliency detection Spatio-temporal attention technique Experiments and results Conclusion and future work 6

Framework of Spatio-temporal Attention Technique
(Framework diagram: the input video frames I_1, ..., I_m feed two branches. Spatial branch: feature map extraction, then activation maps, then normalization, producing the spatial saliency map. Temporal branch: motion estimation over the temporal sequence, producing an optical flow field, then normalization, producing the temporal saliency map. The two maps are combined by spatio-temporal saliency fusion into the output saliency map FSmap_m.) 7

Temporal Saliency Map Construction
Basic idea:
- Emphasize the dynamic continuity of neighboring locations in the same frame
- Emphasize the dynamic continuity of the same locations in the temporal domain

The flow field $(u, v)$ and an auxiliary flow field $(\hat u, \hat v)$ are estimated jointly:

$$\operatorname*{arg\,min}_{u,v,\hat u,\hat v} E(u,v,\hat u,\hat v) = \sum_{i,j} \Big\{ \sum_{k=1}^{n} f_D\big[ I_m(i,j) - I_{m-k}(i - k u_{i,j},\, j - k v_{i,j}) \big]$$
$$+\ \lambda_1 \big[ f_S(u_{i,j} - u_{i+1,j}) + f_S(u_{i,j} - u_{i,j+1}) + f_S(v_{i,j} - v_{i+1,j}) + f_S(v_{i,j} - v_{i,j+1}) \big]$$
$$+\ \lambda_2 \big[ (u_{i,j} - \hat u_{i,j})^2 + (v_{i,j} - \hat v_{i,j})^2 \big] + \lambda_3 \sum_{(i',j') \in N_{i,j}} \big( |\hat u_{i,j} - \hat u_{i',j'}| + |\hat v_{i,j} - \hat v_{i',j'}| \big) \Big\}$$
$$\text{s.t. } u_{i,j}^2 + v_{i,j}^2 \le \theta_o \big( u_{i^*,j^*}^2 + v_{i^*,j^*}^2 \big), \quad (i^*, j^*) = \operatorname*{arg\,max}_{(i,j)} \big( u_{i,j}^2 + v_{i,j}^2 \big)$$

Term by term:
- Data term over the $n$ preceding frames: dynamic continuity of the same locations in the temporal domain
- $\lambda_1$ term: dynamic continuity of neighboring locations in the same frame
- $\lambda_2$ term: coupling between the flow field and the auxiliary flow field
- $\lambda_3$ term: smoothness within a region in the auxiliary flow field
- Constraint: based on the observation standard deviation of the human visual system 8
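The energy on this slide combines a multi-frame data term, spatial smoothness, coupling to an auxiliary flow field, and smoothness of that auxiliary field. A rough NumPy evaluation (not minimization) of an energy with the same shape is sketched below; the Charbonnier penalty for f_D and f_S, the nearest-neighbor warp, and the lambda weights are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def charbonnier(x, eps=1e-3):
    # Robust penalty, used here for both data (f_D) and spatial (f_S) terms.
    return np.sqrt(x ** 2 + eps ** 2)

def dynamic_consistency_energy(frames, u, v, u_aux, v_aux,
                               lam1=0.1, lam2=0.1, lam3=0.1):
    """Evaluate an energy of the same shape as the slide's objective.

    frames: list of 2-D grayscale arrays, last entry is the current frame.
    (u, v): flow field; (u_aux, v_aux): auxiliary flow field.
    """
    I_m = frames[-1]
    h, w = I_m.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Data term over the preceding frames (nearest-neighbor warp).
    data = 0.0
    for k in range(1, len(frames)):
        I_prev = frames[-1 - k]
        yi = np.clip(np.round(yy - k * v).astype(int), 0, h - 1)
        xi = np.clip(np.round(xx - k * u).astype(int), 0, w - 1)
        data += charbonnier(I_m - I_prev[yi, xi]).sum()
    # Spatial smoothness of (u, v): continuity of neighboring locations.
    smooth = sum(charbonnier(np.diff(f, axis=a)).sum()
                 for f in (u, v) for a in (0, 1))
    # Coupling between the flow and the auxiliary flow.
    couple = ((u - u_aux) ** 2 + (v - v_aux) ** 2).sum()
    # Region smoothness of the auxiliary field (4-neighbor total variation).
    aux_smooth = sum(np.abs(np.diff(f, axis=a)).sum()
                     for f in (u_aux, v_aux) for a in (0, 1))
    return data + lam1 * smooth + lam2 * couple + lam3 * aux_smooth
```

A smooth, consistent flow over a static scene scores lower than a noisy one, which is the behavior the minimization exploits.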

Dynamic Consistent Saliency Detection 9

Spatial Saliency Map and Spatio-Temporal Saliency Fusion
Spatial saliency map construction:
- Extraction: multiple low-level visual features are extracted at multiple scales
- Activation: activation maps are built from the multiple low-level feature maps
- Normalization: the saliency map is constructed as a normalized combination of the activation maps
Spatio-temporal saliency fusion:
- Different fusion methods can be utilized, such as mean fusion, max fusion, and multiplicative fusion
- The max fusion method has the best performance [Marat et al., IJCV, 2009]:
FSmap = max(SSmap, TSmap)
SSmap: spatial saliency map; TSmap: temporal saliency map; FSmap: spatio-temporal saliency map. 10
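The fusion step is simple enough to state directly. A sketch of the three candidate fusions, assuming both maps are first normalized to [0, 1] (the talk does not specify the normalization):

```python
import numpy as np

def normalize(m):
    """Scale a map to [0, 1] (a simple stand-in for the normalization step)."""
    m = m.astype(np.float64)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def fuse(ssmap, tsmap, method="max"):
    """Fuse spatial and temporal saliency; 'max' is the variant adopted here."""
    s, t = normalize(ssmap), normalize(tsmap)
    if method == "max":
        return np.maximum(s, t)   # FSmap = max(SSmap, TSmap)
    if method == "mean":
        return (s + t) / 2.0
    if method == "mult":
        return s * t
    raise ValueError(f"unknown fusion method: {method}")
```

Max fusion keeps a location salient if either channel flags it, so a strong motion cue is never washed out by a weak spatial response (and vice versa).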

Outline Introduction to video saliency detection Spatio-temporal attention technique Experiments and results Conclusion and future work 11

Experiment Setting
Datasets:
- Hollywood2 natural dynamic human scene videos dataset [Marszałek et al., CVPR, 2009]: ten different natural environments, including house, road, bedroom, and so on
- Three typical CNN headline news videos: each video clip is approximately 30 seconds long, the frame rate is 30 frames/second, and the resolution is 640 × 360
- Subset of the largest real-world action video dataset with human fixations: 12 categories, 884 video clips (including answering phone, driving car, eating, and so on), fixations from 16 subjects; the first 5 video clips from every category are used
Compared algorithms:
- Temporal saliency detection models: classical optical flow model (COF) [Horn & Schunck, AI, 1981] [Black & Anandan, CVIU, 1996]; spatially continuous optical flow model (SOF) [Sun et al., CVPR, 2010]
- Spatial saliency detection models: Itti saliency model (Itti) [Itti et al., PAMI, 1998]; graph-based visual saliency (GBVS) [Harel et al., NIPS, 2007] 12

Experiments on Natural Dynamic Scene Videos
Dataset: Hollywood2 natural dynamic human scene videos dataset
Experiments on face detection:
- Higher-level visual cortex regions influence human attention in a top-down manner
- Humans often fixate on people and faces
- The face detection region is often added into the saliency map as a high-level feature [Judd et al., ICCV, 2009] [Mathe & Sminchisescu, ECCV, 2012] 13

Face Saliency Detection on Natural Dynamic Scene Videos
Table. Face saliency detection on natural dynamic scene videos
Model | Average Saliency Value | Average Detection Accuracy
DCOF  | 0.6501 | 0.8252
COF   | 0.6018 | 0.7537
SOF   | 0.6393 | 0.7782
(a) original frame image; (b) face detection result; (c) saliency map of DCOF; (d) saliency map visualization; (e) saliency map of SOF; (f) saliency map visualization 14
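The two face metrics in this table are not formally defined in the talk. One plausible reading, sketched below, takes the mean saliency inside the detected face box as the "Average Saliency Value" and the fraction of face pixels above a threshold as the "Detection Accuracy"; both definitions and the threshold are assumptions:

```python
import numpy as np

def face_region_saliency(saliency, box):
    """Mean saliency inside a detected face box (x, y, w, h).

    An assumed reading of 'Average Saliency Value'; the exact metric
    definition is not given in the talk.
    """
    x, y, w, h = box
    return float(saliency[y:y + h, x:x + w].mean())

def face_detection_accuracy(saliency, box, thresh=0.5):
    """Fraction of face pixels whose saliency exceeds a threshold
    (an assumed proxy for 'Average Detection Accuracy')."""
    x, y, w, h = box
    patch = saliency[y:y + h, x:x + w]
    return float((patch >= thresh).mean())
```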

Experiments on News Headline Videos
Dataset: three typical CNN headline news videos
Compared algorithms: temporal saliency detection models COF and SOF
Experiments: efficiency comparison; effectiveness comparison
Table. Efficiency comparison on the news headline videos
Model | Running Time per Frame (s) | Output Frame Ratio
DCOF | 33.12 | 0.4
COF  | 46.24 | 1
SOF  | 53.88 | 1
15

Experiments on News Headline Videos Figure. Temporal saliency detection result. 16

Experiments on Eye-tracking Action Videos
Dataset: largest real-world action video dataset with human fixations
Compared algorithms:
- Temporal saliency detection models: COF, SOF
- Spatio-temporal saliency detection models: (Itti, GBVS) + (COF, SOF)
Two experiments:
- Sample video with eye-tracking fixations
- Average receiver operating characteristic (ROC) areas and curves 17

ROC Area Comparison
The area under the ROC curve demonstrates the performance of a saliency model.
Action | DCOF | COF | SOF
Answer phone | 0.6098 | 0.5303 | 0.5910
Drive car | 0.5233 | 0.4817 | 0.5195
Eat | 0.6902 | 0.6598 | 0.6644
Fight | 0.6045 | 0.5535 | 0.6005
Get out car | 0.5260 | 0.4874 | 0.5212
Hand shake | 0.6993 | 0.6485 | 0.6934
Hug | 0.6402 | 0.5602 | 0.5996
Kiss | 0.5833 | 0.5120 | 0.5503
Run | 0.5535 | 0.5104 | 0.5496
Sit down | 0.5183 | 0.4761 | 0.5074
Sit up | 0.5171 | 0.4871 | 0.5006
Stand up | 0.5602 | 0.5269 | 0.5601
18
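ROC areas of this kind are typically computed per frame by treating fixated pixels as positives and all remaining pixels as negatives, then measuring how well the saliency value separates the two classes. A standard sketch follows (the talk does not spell out its exact protocol, e.g. any negative sampling):

```python
import numpy as np

def saliency_auc(saliency, fixations):
    """ROC area for one frame: saliency at fixated pixels (positives)
    versus saliency at all other pixels (negatives).

    fixations: iterable of (row, col) pixel coordinates.
    """
    sal = saliency.ravel()
    labels = np.zeros(sal.shape, dtype=bool)
    idx = np.ravel_multi_index(np.asarray(fixations).T, saliency.shape)
    labels[idx] = True
    pos, neg = sal[labels], sal[~labels]
    # Mann-Whitney form: AUC = P(pos > neg) + 0.5 * P(pos == neg).
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.5 matches chance, which is why values such as 0.48-0.69 in the table represent the meaningful range for fixation prediction.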

ROC Curve Comparison ROC curve is plotted as the False Positive Rate vs. Hit Rate 19

Outline Introduction to video saliency detection Spatio-temporal attention technique Experiments and results Conclusion and future work 20

Conclusion and Future Work
Conclusion:
- Emphasizes the dynamic consistency of neighboring locations in the same frame and of the same locations in the temporal domain
- Effective prominent object detection and coverage
- Better efficiency and less storage space
Future work:
- Jointly optimize spatial and temporal saliency detection 21

Reference
Black, M. J. & Anandan, P. 1996. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1), pages 75–104.
Harel, J., Koch, C., and Perona, P. 2007. Graph-based visual saliency. In Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS), pages 545–552.
Horn, B. & Schunck, B. 1981. Determining optical flow. Artificial Intelligence, 17, pages 185–203.
Jiang, J.R. & Crookes, D. 2012. Visual saliency estimation through manifold learning. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, pages 2003–2010.
Judd, T., Ehinger, K., Durand, F., and Torralba, A. 2009. Learning to predict where humans look. In Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV), pages 2106–2113.
Itti, L., Koch, C., and Niebur, E. 1998. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), pages 1254–1259.
Koch, C. & Ullman, S. 1985. Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, pages 219–227.
Li, B., Xiong, W.H., and Hu, W.M. 2012. Visual saliency map from tensor analysis. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, pages 1585–1591.
Marszałek, M., Laptev, I., and Schmid, C. 2009. Actions in context. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
Marat, S., Ho Phuoc, T., Granjon, L., Guyader, N., Pellerin, D., and Guérin-Dugué, A. 2009. Modelling spatio-temporal saliency to predict gaze direction for short videos. International Journal of Computer Vision, 82(3), pages 231–243.
Mathe, S. & Sminchisescu, C. 2012. Dynamic eye movement datasets and learnt saliency models for visual action recognition. In Proceedings of the European Conference on Computer Vision (ECCV), pages 842–856.
Sun, D., Roth, S., and Black, M. J. 2010. Secrets of optical flow estimation and their principles. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). 22

Reference
Wang, T., Mei, T., Hua, X.-S., Liu, X., and Zhou, H.-Q. 2007. Video collage: A novel presentation of video sequence. In Proceedings of the 2007 IEEE International Conference on Multimedia and Expo (ICME), pages 1479–1482.
Yilmaz, A., Javed, O., and Shah, M. 2006. Object tracking: A survey. ACM Computing Surveys, 38(4), article 13.
Yu, H., Li, J., Tian, Y., and Huang, T. 2010. Automatic interesting object extraction from images using complementary saliency maps. In Proceedings of the 18th ACM International Conference on Multimedia (ACMMM), pages 891–894. 23

Q & A Thank You! 24