
Putting Context into Vision Derek Hoiem September 15, 2004

Questions to Answer What is context? How is context used in human vision? How is context currently used in computer vision? Conclusions

Context What is context? Any data or meta-data not directly produced by the presence of an object Nearby image data

Context What is context? Any data or meta-data not directly produced by the presence of an object Nearby image data Scene information

Context What is context? Any data or meta-data not directly produced by the presence of an object Nearby image data Scene information Presence, locations of other objects

How do we use context?

Attention Are there any live fish in this picture?

Clues for Function What is this?

Clues for Function What is this? Now can you tell?

Low-Res Scenes What is this?

Low-Res Scenes What is this? Now can you tell?

More Low-Res What are these blobs?

More Low-Res The same pixels! (a car)

Why is context useful? Objects defined at least partially by function Trees grow in ground Birds can fly (usually) Door knobs help open doors

Why is context useful? Objects defined at least partially by function Context gives clues about function Not rooted in the ground: not a tree Object in sky: {cloud, bird, UFO, plane, superman} Door knobs always on doors

Why is context useful? Objects defined at least partially by function Context gives clues about function Objects like some scenes better than others Toilets like bathrooms Fish like water

Why is context useful? Objects defined at least partially by function Context gives clues about function Objects like some scenes better than others Many objects are used together and, thus, often appear together Kettle and stove Keyboard and monitor
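The scene-preference idea above ("toilets like bathrooms, fish like water") can be sketched as a toy re-scoring rule. All scene labels, object names, and probabilities below are invented for illustration:

```python
# Toy sketch: re-scoring local detector outputs with scene-object
# co-occurrence priors. The entries below are made up for the example.

# hypothetical P(object present | scene)
scene_prior = {
    ("bathroom", "toilet"): 0.9,
    ("bathroom", "fish"): 0.01,
    ("underwater", "fish"): 0.8,
    ("underwater", "toilet"): 0.001,
}

def rescore(detector_score, scene, obj):
    """Combine a local detector score with a scene-based prior
    (naive product, assuming the two are independent)."""
    return detector_score * scene_prior.get((scene, obj), 0.1)

# the same ambiguous local evidence (score 0.5) is read differently
# depending on the scene context:
print(rescore(0.5, "bathroom", "toilet"))   # high: toilets like bathrooms
print(rescore(0.5, "bathroom", "fish"))     # low: fish do not
```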

How is context used in computer vision?

Neighbor-based Context A Markov Random Field (MRF) incorporates contextual constraints between sites (or nodes) and their neighbors
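A minimal sketch of how such neighbor constraints act in practice, using Iterated Conditional Modes (ICM) on a binary labeling with a Potts smoothness term. The energies and the choice of ICM are illustrative assumptions, not any specific paper's model:

```python
# Sketch of MRF-style neighbor context: each site pays a data cost for
# disagreeing with its observation and a Potts penalty (beta) for each
# 4-neighbor whose label differs. ICM greedily relabels sites.

def icm(noisy, beta=1.0, iters=5):
    h, w = len(noisy), len(noisy[0])
    labels = [row[:] for row in noisy]       # initialize with the noisy labels
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                best, best_cost = labels[i][j], float("inf")
                for cand in (0, 1):
                    cost = 0.0 if cand == noisy[i][j] else 1.0   # data term
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj                  # 4-neighbors
                        if 0 <= ni < h and 0 <= nj < w and labels[ni][nj] != cand:
                            cost += beta                         # Potts penalty
                    if cost < best_cost:
                        best_cost, best = cost, cand
                labels[i][j] = best
    return labels

# an isolated flipped label is overridden by its neighbors' context:
print(icm([[1, 1, 1], [1, 0, 1], [1, 1, 1]]))
```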

Blobs and Words Carbonetto 2004 Neighbor-based context (MRF) useful even when training data is not fully supervised Learns models of objects given captioned images

Discriminative Random Fields Kumar 2003 Using data surrounding the label site (not just at the label site) improves results Buildings vs. Non-Buildings

Multi-scale Conditional Random Field (mCRF) He 2004 Combines the raw image, independent data-based labels, local context, and scene context

mCRF Final decision based on Classification (local data-based) Local labels (what relation nearby objects have to each other) Image-wide labels (captures coarse scene context)
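The three-term decision can be sketched as a product of per-label potentials. The labels, probabilities, and the exponential agreement weighting below are invented for illustration, not He et al.'s actual potentials:

```python
import math

# Sketch of an mCRF-style site decision: multiply a local classifier term,
# a nearby-label agreement term, and an image-wide (scene) prior term.

def site_posterior(local_prob, neighbor_labels, scene_prior, label):
    local = local_prob[label]                                  # classification term
    agreement = sum(1 for n in neighbor_labels if n == label)  # local-label term
    scene = scene_prior[label]                                 # image-wide term
    return local * math.exp(agreement) * scene                 # unnormalized score

local_prob = {"sky": 0.4, "building": 0.6}    # weak local evidence
neighbors = ["sky", "sky", "building"]        # nearby label hypotheses
scene_prior = {"sky": 0.5, "building": 0.5}   # uninformative scene term

scores = {lab: site_posterior(local_prob, neighbors, scene_prior, lab)
          for lab in ("sky", "building")}
print(max(scores, key=scores.get))   # "sky": neighbor labels overturn local evidence
```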

mCRF Results

Neighbor-based Context References P. Carbonetto, N. de Freitas and K. Barnard, A Statistical Model for General Contextual Object Recognition, ECCV, 2004 S. Kumar and M. Hebert, Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification, ICCV, 2003 X. He, R. Zemel and M. Carreira-Perpiñán, Multiscale Conditional Random Fields for Image Labeling, CVPR, 2004

Scene-based Context Average pictures containing heads at three scales

Context Priming Torralba 2001/2003 Local evidence (what everyone uses): pose, scale, location, object presence from image measurements Context (generally ignored): pose and shape priming, scale selection, focus of attention, object priming

Getting the Gist of a Scene Simple representation Spectral characteristics (e.g., Gabor filters) with coarse description of spatial arrangement PCA reduction Probabilities modeled with mixture of Gaussians (Torralba 2003) or logistic regression (Murphy 2003)
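A rough sketch of a gist-style descriptor, with simple row/column difference filters standing in for the Gabor bank and a 2x2 grid standing in for the coarse spatial arrangement (the real model would also apply PCA to the result):

```python
# Sketch of a gist descriptor: oriented edge energy pooled over a coarse
# spatial grid. Difference filters substitute for Gabor filters here.

def gist(img):
    h, w = len(img), len(img[0])
    desc = []
    for bi in range(2):              # 2x2 coarse spatial grid
        for bj in range(2):
            ex = ey = 0.0
            for i in range(bi * h // 2, (bi + 1) * h // 2):
                for j in range(bj * w // 2, (bj + 1) * w // 2):
                    if j + 1 < w:    # difference along x (vertical edges)
                        ex += abs(img[i][j + 1] - img[i][j])
                    if i + 1 < h:    # difference along y (horizontal edges)
                        ey += abs(img[i + 1][j] - img[i][j])
            desc += [ex, ey]
    return desc   # 8-D energy-by-region vector; PCA would reduce it further

# a scene with a horizontal boundary puts its energy in the y-difference
# channels of the blocks touching the boundary:
img = [[0] * 4] * 2 + [[1] * 4] * 2
print(gist(img))
```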

Context Priming Results Object presence, focus of attention, and scale selection (small to large)

Using the Forest to See the Trees Murphy (2003) An AdaBoost patch-based object detector gives detector confidence at each location/scale; the gist of the scene (via boosted regression) gives the expected location/scale of the object; the two are combined by logistic regression into an object probability at each location/scale
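The combination step can be sketched as logistic regression over a detector score and the distance from the gist-predicted location. The weights here are invented rather than learned:

```python
import math

# Sketch of combining local detector confidence with a gist-based expected
# location via a logistic model (weights are illustrative, not trained).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def object_probability(det_score, loc, expected_loc, w=(3.0, -2.0, -1.0)):
    """P(object at loc) from the detector score and the distance of loc
    from the location the scene gist predicts."""
    dist = abs(loc - expected_loc)
    return sigmoid(w[0] * det_score + w[1] * dist + w[2])

# identical detector scores, but context favors the expected location:
p_near = object_probability(0.6, loc=0.5, expected_loc=0.5)
p_far = object_probability(0.6, loc=0.9, expected_loc=0.5)
print(p_near > p_far)   # True
```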

Object Detection + Scene Context Results (keyboard, screen, people) Often doesn't help that much May be due to poor use of context: assumes independence of context and local evidence; only uses expected location/scale from context

Scene-based Context References E. Adelson, On Seeing Stuff: The Perception of Materials by Humans and Machines, SPIE, 2001 B. Bose and E. Grimson, Improving Object Classification in Far-Field Video, ECCV, 2004 K. Murphy, A. Torralba and W. Freeman, Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes, NIPS, 2003 U. Rutishauser, D. Walther, C. Koch, and P. Perona, Is bottom-up attention useful for object recognition?, CVPR, 2004 A. Torralba, Contextual Priming for Object Detection, IJCV, 2003 A. Torralba and P. Sinha, Statistical Context Priming for Object Detection, ICCV, 2001 A. Torralba, K. Murphy, W. Freeman, and M. Rubin, Context-Based Vision System for Place and Object Recognition, ICCV, 2003

Object-based Context

Mutual Boosting Fink (2003) Filters applied to a local window and a contextual window of the raw image produce confidences; object likelihoods (e.g., eyes and faces) serve as context for one another
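The contextual-window idea can be sketched as follows: one object's second-stage score mixes its local evidence with the other object's first-stage confidence averaged over a wider window. The maps, weights, and window size are invented for illustration:

```python
# Sketch of mutual boosting's contextual window: the face score at a
# location uses the eye detector's confidence in a neighborhood as an
# extra feature (equal weighting here is an arbitrary choice).

def contextual_feature(conf_map, i, j, r=1):
    """Mean confidence of the other object's detector in a (2r+1)^2 window."""
    vals = [conf_map[a][b]
            for a in range(max(0, i - r), min(len(conf_map), i + r + 1))
            for b in range(max(0, j - r), min(len(conf_map[0]), j + r + 1))]
    return sum(vals) / len(vals)

def face_score(local_score, eye_conf_map, i, j):
    # local evidence plus contextual evidence from the eye detector
    return 0.5 * local_score + 0.5 * contextual_feature(eye_conf_map, i, j)

eye_conf = [[0.0, 0.0, 0.0],
            [0.0, 0.9, 0.9],
            [0.0, 0.9, 0.9]]
# the same weak local face score is boosted where eye responses cluster:
print(face_score(0.4, eye_conf, 2, 2) > face_score(0.4, eye_conf, 0, 0))   # True
```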

Mutual Boosting Results Learned features; first-stage classifier (MIT+CMU)

Contextual Models using BRFs Torralba 2004 Template features Build structure of CRF using boosting Other objects' locations and likelihoods propagate through the network

Labeling a Street Scene

Labeling an Office Scene (F = local evidence, G = compatibility)

Object-based Context References M. Fink and P. Perona, Mutual Boosting for Contextual Inference, NIPS, 2003 A. Torralba, K. Murphy, and W. Freeman, Contextual Models for Object Detection using Boosted Random Fields, AI Memo 2004-013, 2004

What else can be done?

Scene Structure Improve understanding of scene structure Floor, walls, ceiling Sky, ground, roads, buildings

Semantics vs. Low-level [Diagram: high-dimensional low-level image statistics are reduced to statistics useful to object detection (OD), then to semantics, and finally to the low-dimensional presence of an object]

Putting it all together [Diagram combining scene gist, context priming, scene recognition, basic structure identification, neighbor-based, object-based, and scene-based context, and object detection]

Summary Neighbor-based context Using nearby labels essential for complete labeling tasks Using nearby labels useful even without completely supervised training data Using nearby labels and nearby data is better than just using nearby labels Labels can be used to extract local and scene context

Summary Scene-based context Gist representation suitable for focusing attention or determining likelihood of object presence Scene structure would provide additional useful information (but difficult to extract) Scene label would provide additional useful information

Summary Object-based context Even simple methods of using other objects' locations improve results (Fink) Using BRFs, systems can automatically learn to find easier objects first and to use those objects as context for other objects

Conclusions General Few object detection researchers use context Context, when used effectively, can improve results dramatically A more integrated approach to use of context and data could improve image understanding

References E. Adelson, On Seeing Stuff: The Perception of Materials by Humans and Machines, SPIE, 2001 B. Bose and E. Grimson, Improving Object Classification in Far-Field Video, ECCV, 2004 P. Carbonetto, N. de Freitas and K. Barnard, A Statistical Model for General Contextual Object Recognition, ECCV, 2004 M. Fink and P. Perona, Mutual Boosting for Contextual Inference, NIPS, 2003 X. He, R. Zemel and M. Carreira-Perpiñán, Multiscale Conditional Random Fields for Image Labeling, CVPR, 2004 S. Kumar and M. Hebert, Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification, ICCV, 2003 J. Lafferty, A. McCallum and F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML, 2001 K. Murphy, A. Torralba and W. Freeman, Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes, NIPS, 2003 U. Rutishauser, D. Walther, C. Koch, and P. Perona, Is bottom-up attention useful for object recognition?, CVPR, 2004 A. Torralba, Contextual Priming for Object Detection, IJCV, 2003 A. Torralba and P. Sinha, Statistical Context Priming for Object Detection, ICCV, 2001 A. Torralba, K. Murphy, and W. Freeman, Contextual Models for Object Detection using Boosted Random Fields, AI Memo 2004-013, 2004 A. Torralba, K. Murphy, W. Freeman, and M. Rubin, Context-Based Vision System for Place and Object Recognition, ICCV, 2003