Hierarchical Convolutional Features for Visual Tracking

Similar documents
Action Recognition. Computer Vision Jia-Bin Huang, Virginia Tech. Many slides from D. Hoiem

Attentional Masking for Pre-trained Deep Networks

Shu Kong. Department of Computer Science, UC Irvine

Efficient Deep Model Selection

Shu Kong. Department of Computer Science, UC Irvine

Beyond R-CNN detection: Learning to Merge Contextual Attribute

Progressive Attention Guided Recurrent Network for Salient Object Detection

Rich feature hierarchies for accurate object detection and semantic segmentation

Medical Image Analysis

Improving the Interpretability of DEMUD on Image Data Sets

CSE Introduction to High-Perfomance Deep Learning ImageNet & VGG. Jihyung Kil

Multi-attention Guided Activation Propagation in CNNs

HHS Public Access Author manuscript Med Image Comput Comput Assist Interv. Author manuscript; available in PMC 2018 January 04.

Y-Net: Joint Segmentation and Classification for Diagnosis of Breast Biopsy Images

Selecting Patches, Matching Species:

Computational modeling of visual attention and saliency in the Smart Playroom

Neuro-Inspired Statistical. Rensselaer Polytechnic Institute National Science Foundation

Group Behavior Analysis and Its Applications

Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling

Object Detectors Emerge in Deep Scene CNNs

Multi-Scale Salient Object Detection with Pyramid Spatial Pooling

arxiv: v1 [cs.cv] 17 Aug 2017

CS-E Deep Learning Session 4: Convolutional Networks

The University of Tokyo, NVAIL Partner Yoshitaka Ushiku

arxiv: v1 [cs.cv] 2 Jun 2017

Task-driven Webpage Saliency

arxiv: v1 [cs.cv] 28 Dec 2018

Object recognition and hierarchical computation

Facial Expression Classification Using Convolutional Neural Network and Support Vector Machine

Differential Attention for Visual Question Answering

Putting Context into. Vision. September 15, Derek Hoiem

Convolutional Neural Networks (CNN)

Visual Saliency Based on Multiscale Deep Features Supplementary Material

Automated diagnosis of pneumothorax using an ensemble of convolutional neural networks with multi-sized chest radiography images

A Deep Multi-Level Network for Saliency Prediction

The Impact of Visual Saliency Prediction in Image Classification

Towards The Deep Model: Understanding Visual Recognition Through Computational Models. Panqu Wang Dissertation Defense 03/23/2017

3D Deep Learning for Multi-modal Imaging-Guided Survival Time Prediction of Brain Tumor Patients

SALIENCY refers to a component (object, pixel, person) in a

Flexible, High Performance Convolutional Neural Networks for Image Classification

Skin cancer reorganization and classification with deep neural network

Automatic Diagnosis of Ovarian Carcinomas via Sparse Multiresolution Tissue Representation

Image Captioning using Reinforcement Learning. Presentation by: Samarth Gupta

A Data-driven Metric for Comprehensive Evaluation of Saliency Models

Chair for Computer Aided Medical Procedures (CAMP) Seminar on Deep Learning for Medical Applications. Shadi Albarqouni Christoph Baur

Deep Networks and Beyond. Alan Yuille Bloomberg Distinguished Professor Depts. Cognitive Science and Computer Science Johns Hopkins University

Cervical cytology intelligent diagnosis based on object detection technology

ITERATIVELY TRAINING CLASSIFIERS FOR CIRCULATING TUMOR CELL DETECTION

Automated Volumetric Cardiac Ultrasound Analysis

Quantifying Radiographic Knee Osteoarthritis Severity using Deep Convolutional Neural Networks

arxiv: v1 [stat.ml] 23 Jan 2017

Learning Convolutional Neural Networks for Graphs

The Nottingham eprints service makes this work by researchers of the University of Nottingham available open access under the following conditions.

An Integrated Model for Effective Saliency Prediction

LSTD: A Low-Shot Transfer Detector for Object Detection

Motivation: Attention: Focusing on specific parts of the input. Inspired by neuroscience.

arxiv: v3 [cs.cv] 26 May 2018

Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM

CS6501: Deep Learning for Visual Recognition. GenerativeAdversarial Networks (GANs)

Look, Perceive and Segment: Finding the Salient Objects in Images via Two-stream Fixation-Semantic CNNs

Multi-Level Net: a Visual Saliency Prediction Model

VIDEO SALIENCY INCORPORATING SPATIOTEMPORAL CUES AND UNCERTAINTY WEIGHTING

using deep learning models to understand visual cortex

Holistically-Nested Edge Detection (HED)

HUMAN age estimation from faces is an important research. Attended End-to-end Architecture for Age Estimation from Facial Expression Videos

Networks and Hierarchical Processing: Object Recognition in Human and Computer Vision

A Novel Capsule Neural Network Based Model For Drowsiness Detection Using Electroencephalography Signals

Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition

Design of Palm Acupuncture Points Indicator

Weakly Supervised Coupled Networks for Visual Sentiment Analysis

Mammographic Breast Density Classification by a Deep Learning Approach

Delving into Salient Object Subitizing and Detection

Research Article Multiscale CNNs for Brain Tumor Segmentation and Diagnosis

Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction

Object-Level Saliency Detection Combining the Contrast and Spatial Compactness Hypothesis

Network Dissection: Quantifying Interpretability of Deep Visual Representation

Deep Visual Attention Prediction

Social Group Discovery from Surveillance Videos: A Data-Driven Approach with Attention-Based Cues

B657: Final Project Report Holistically-Nested Edge Detection

A convolutional neural network to classify American Sign Language fingerspelling from depth and colour images

The Importance of Time in Visual Attention Models

On the Selective and Invariant Representation of DCNN for High-Resolution. Remote Sensing Image Recognition

arxiv: v2 [cs.cv] 29 Jan 2019

Simultaneous Estimation of Food Categories and Calories with Multi-task CNN

Efficient Salient Region Detection with Soft Image Abstraction

DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation

Age Estimation based on Multi-Region Convolutional Neural Network

A NEW HUMANLIKE FACIAL ATTRACTIVENESS PREDICTOR WITH CASCADED FINE-TUNING DEEP LEARNING MODEL

arxiv: v2 [cs.cv] 19 Dec 2017

DeSTIN: A Scalable Deep Learning Architecture with Application to High-Dimensional Robust Pattern Recognition

arxiv: v1 [cs.cv] 3 Sep 2018

Interpreting CNN Models for Apparent Personality Trait Regression

Automatic Beautification for Group-photo Facial Expressions using Novel Bayesian GANs

DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation

International Journal of Advanced Computer Technology (IJACT)

Face Gender Classification on Consumer Images in a Multiethnic Environment

EDGE DETECTION. Edge Detectors. ICS 280: Visual Perception

An Artificial Neural Network Architecture Based on Context Transformations in Cortical Minicolumns

Transcription:

Hierarchical Convolutional Features for Visual Tracking Chao Ma Jia-Bin Huang Xiaokang Yang Ming-Husan Yang SJTU UIUC SJTU UC Merced ICCV 2015

Background Given the initial state (position and scale), estimate the unknown states in the subsequence frames Model-free Single target visual tracking 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 2

Real-Applications with Tracking Images from Google Search 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 3

Challenges I 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 4

Challenges II Challenges = significant appearance variations over time!!! 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 5

Convolutional Neural Networks Show significant advantages on a wide range of computer vision problems: image classification, object detection, object recognition et al. AlexNet (NIPS 12) 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 6

Typical Tracking Framework Incrementally learn classifiers to separate targets from background (online learning to adapt to appearance changes) MIL (CVPR 09), Struck (ICCV 11), CT (ECCV 12), ASLA (CVPR 12), MEEM (ECCV 14), etc. 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 7

Existing CNN Trackers DLT (NIPS'13), LHF (TIP'15), DeepTrack (BMVC'14), CNN-SVM (ICML'15), MDNet (CVPR 16) This figure credits to Li et al. in the DeepTrack (BMVC 14) 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 8

Issues of Existing CNN Trackers Only use the last (fully-connected) layer of the CNN network for classification Too coarse to localize target precisely Sample target states with binary labels (positive and negative) Ambiguity in labeling the spatially over-correlated samples MDNet (CVPR 16): negative mining Struck (ICCV 11): structure output 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 9

Issues of Existing CNN Trackers Only use the last (fully-connected) layer of the CNN network for classification Too coarse to localize target precisely Sample target states with binary labels (positive and negative) Ambiguity in labeling the spatially over-correlated samples MDNet (CVPR 16): negative mining Struck (ICCV 11): structure output 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 10

Our Observations Earlier layers retain higher spatial resolution for precise localization. Latter layers capture more semantic information and are robust to appearance changes. Exploit the rich hierarchies for robust visual tracking. 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 11

Toy Example Layer conv5 robust to appearance change: insensitive to the sharp step edge Layer conv3 is useful for precise localization: sensitive to the edge position 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 12

Feature Visualization using VGG-Net-19 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 13

Flowchart of Our Approach 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 14

Issues of Existing CNN Trackers Only use the last (fully-connected) layer of the CNN network for classification Too coarse to localize target precisely Sample target states with binary labels (positive and negative) Ambiguity in labeling the spatially over-correlated samples MDNet (CVPR 16): negative mining Struck (ICCV 11): structure output 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 15

Alleviating Sampling Ambiguity Adaptive correlation filters regress the deep features with soft labels decaying from 1 to 0 Computational efficiency using FFT Convolutional theorem: convolutional filter? correlation filter? Best exploit the contextual cues K. Zhang et al, Fast Visual Tracking via Dense Spatio-Temporal Context Learning, in ECCV 14 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 16

Correlation Filters Correlation filters learning in the spatial domain: Vertical circular shifts of input x with corresponding soft labels generated by a Gaussian function. The first five figures credit to the KCF tracker by Henrisque et al. Use FFT to learn correlation filter in the frequency domain as 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 17

Implementation Details: Feature Interpolation Problem: deeper layers with lower spatial resolution due to the pooling pool5-4 in VGG-Net is of spatial size 7 x 7, which is 1/32 of the input image 224 x 224 Solution: resize each CNN layers with bilinear interpolation Affirm that deconvolution is usually helpful for finer position inference Different conclusion without feature interpolation M. Danelljan et al. Convolutional Features for Correlation Filter Based Visual Tracking. In ICCV 2015 workshop 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 18

Coarse-to-Fine Inference For the l-th CNN layer with channel D, the response map is: Given the location target in the (l-1)-th layer:, locate the 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 19

Model Update Use a moving average scheme to update the numerator and denominator of separately as: 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 20

Experimental Setting Datasets: OTB-50, and OTB-100 Yi Wu et al, Online Object Tracking: A Benchmark, in CVPR, 2013 Yi Wu et al, Object Tracking Benchmark, TPAMI, 2015 Metrics: Distance precision rate Overlap success (intersection of union) rate Validation schemes: OPE: one-pass evaluation TRE: temporal robustness evaluation SRE: spatial robustness evaluation Fix parameters for all sequences 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 21

Overall Results on OTB-50 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 22

Overall Results on OTB-100 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 23

Attribute Evaluation on OTB-50 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 24

Attribute Evaluation on OTB-100 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 25

Ablation Studies Single layer (c5,c4 and c3), combination of the conv5-4 and conv4-4 layers (c5-c4), and concatenation of three layers (c543) 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 26

Qualitative Results I 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 27

Qualitative Results II 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 28

Failure Cases 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 29

Public Sources on This Work Project webpage https://sites.google.com/site/chaoma99/iccv15_tracking Source code https://github.com/jbhuang0604/cf2 Further release the results of nine baseline trackers on OTB- 100 https://sites.google.com/site/chaoma99/iccv15_tracking 5/6/2016 Hierarchical Convolutional Features for Visual Tracking 30

Thanks