Vision: Overview. Alan Yuille

Why is Vision Hard? Complexity and Ambiguity of Images. Range of Vision Tasks. There are more distinct 10x10 images (256^100 ≈ 6.7 x 10^240) than the total number of images seen by all humans throughout history, roughly 3 x 10^21 (50 billion people, living about 2 billion seconds, at 30 images per second).
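These counts are easy to verify; a quick sketch (assuming a lifetime of roughly 2 billion seconds, about 63 years, which reproduces the 3 x 10^21 figure):

```python
import math

# Number of distinct 10x10 images with 256 gray levels per pixel: 256**100.
# Work in log10 to avoid overflowing a float.
log10_images = 100 * math.log10(256)
print(round(log10_images, 1))   # 240.8, i.e. about 6.7 x 10^240 images

# Images seen by all humans: 50 billion people x ~2 billion seconds of life
# (about 63 years) x 30 images per second.
images_seen = 50e9 * 2e9 * 30
print(f"{images_seen:.0e}")     # 3e+21
```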

Why does Vision seem easy? Because we devote roughly half our cortex to vision. Understanding vision means understanding half the cortex.

Bayes and Vision. The history of Bayes and vision dates to the early 1980s and before (Ulf Grenander's pattern theory, 1960s). Vision as an inverse inference problem: decode images by inverting image formation. As argued by Gibson and Marr, this requires knowledge about the world: natural constraints (Marr), ecological constraints (Gibson). Bayesian formulations are natural: constraints are priors and can be learnt from examples.
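The "constraints are priors" point is just Bayes' rule; a minimal sketch with made-up numbers, where an ambiguous image measurement fits two world states equally well and a learned prior decides:

```python
# Two world states could have produced the same image measurement; the
# likelihood is flat, so the (learned) prior determines the posterior.
# States and probabilities here are illustrative, not from the talk.
prior = {"convex": 0.7, "concave": 0.3}
likelihood = {"convex": 0.5, "concave": 0.5}   # image fits both equally well

unnorm = {s: prior[s] * likelihood[s] for s in prior}
z = sum(unnorm.values())
posterior = {s: p / z for s, p in unnorm.items()}
print(posterior)   # {'convex': 0.7, 'concave': 0.3}: the prior breaks the tie
```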

Bayes for Vision. Courtesy of Pawan Sinha (MIT). The likelihood is not enough.

When priors are violated: Who is bigger?

Models for Generating Images: Grammars (Grenander, Fu, Mjolsness, Biederman). Simple to complex grammars: easy to hard inference.

Analysis by Synthesis. Analyze an image by inverting image formation. Proposals and verification.
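The propose-and-verify loop can be caricatured in a few lines; everything below (the one-parameter "scene", the renderer, the scoring function) is a toy stand-in for the real machinery:

```python
# Analysis-by-synthesis as a propose-and-verify loop: synthesize an image
# from each proposed scene, keep the scene whose synthesis best matches
# the observation. All names and values are illustrative.

def render(scene):                 # forward model: scene -> image
    return [scene] * 4             # a flat 4-pixel "image"

def score(image, observed):        # verification: negative squared error
    return -sum((a - b) ** 2 for a, b in zip(image, observed))

observed = [2, 2, 3, 2]
proposals = [0, 1, 2, 3]           # bottom-up proposals for the scene
best = max(proposals, key=lambda s: score(render(s), observed))
print(best)                        # 2: the scene whose synthesis fits best
```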

Can we do this for Real Images? Image Parsing: learn probabilistic models of the visual patterns that can appear in images. Interpret/understand an image by decomposing it into its constituent parts.

Vision Goals and Tasks. Vision is often formalized as low, middle, and high level. This seems to map onto different parts of the visual cortex (V1, V2, ..., IT). (Poggio's talk.) High-level vision relates very naturally to other aspects of cognition: reasoning, language.

Some Vision Goals (S.-C. Zhu et al.): Understanding objects, scenes, and events. Reasoning about the functions and roles of objects and the goals and intentions of agents; predicting the outcomes of events.

Converting Parse Graphs to Language. Illustration: Perona and Fei-Fei Li.

Reasoning about Objects in 3D Space. Understanding the 3D scene structure enables reasoning.

Graphical Models. Graphical models give a nice way to formulate vision problems. Different types of interactions.

Cue Combination. How to couple different visual cues? Uncertainty: the statistical relations between the different factors that cause the image. Which factors are independent? Which are not? Studies (e.g., by Larry Maloney) show that humans can combine visual and other perceptual cues optimally and make optimal decisions.
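The standard model behind such optimality results is maximum-likelihood fusion of independent Gaussian cues, weighting each cue by its inverse variance; a sketch with invented numbers:

```python
# Optimal (maximum-likelihood) fusion of two independent Gaussian cues:
# weight each cue by its inverse variance. Numbers are illustrative.

def fuse(mu1, var1, mu2, var2):
    w1, w2 = 1 / var1, 1 / var2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    var = 1 / (w1 + w2)    # fused estimate is more reliable than either cue
    return mu, var

# e.g. a reliable visual depth cue and a noisier haptic cue:
mu, var = fuse(10.0, 1.0, 14.0, 4.0)
print(mu, var)             # 10.8 0.8: pulled toward the reliable cue
```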

How optimal are humans? Vision researchers can design ideal observer models. These can be used to benchmark human performance against an ideal observer who knows how the stimuli are generated. Humans typically perform poorly compared to the benchmark. But for certain tasks, like motion estimation, it appears that humans use an ecological prior, based on the statistics of natural images, instead of the prior used by the experimenter to construct the stimuli. Conjecture: humans are optimal for ecological stimuli.
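The benchmarking logic can be sketched on a toy binary task: stimuli are drawn from a known prior, the ideal observer decodes with that true prior, and a second observer decodes with a mismatched one (all numbers below are assumed for illustration):

```python
import random

random.seed(0)
true_prior = {"left": 0.8, "right": 0.2}   # experimenter's generative prior
mismatched = {"left": 0.5, "right": 0.5}   # observer's assumed prior
noise = 0.35                               # chance the binary cue is flipped

def decode(cue, prior):
    # MAP decision: pick the state maximizing prior * likelihood of the cue
    like = {s: (1 - noise) if s == cue else noise for s in prior}
    return max(prior, key=lambda s: prior[s] * like[s])

def accuracy(prior, trials=10000):
    correct = 0
    for _ in range(trials):
        s = "left" if random.random() < true_prior["left"] else "right"
        cue = s if random.random() > noise else ("right" if s == "left" else "left")
        correct += decode(cue, prior) == s
    return correct / trials

# The observer using the true generative prior outperforms the mismatched one.
print(accuracy(true_prior) >= accuracy(mismatched))   # True
```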

Optimal vs. Slow and Smooth. Optimal for the experiment vs. slow-and-smooth (e.g., Hongjing Lu). Humans perform orders of magnitude worse than an ideal model which knows the probabilistic model that generated the stimuli. But humans perform similarly to a model that assumes a prior of slow and smooth motion, similar to the prior measured from the statistics of natural image sequences.

Stochastic Grammars and Compositional Models. Objects are composed of parts; these are composed of subparts, and so on. Semantic parts, e.g., the arms and legs of humans. Structured graphical models, probability defined over these models, inference algorithms. Learning the models with, or without, known graph structure. Talks by Geman, Bienenstock, Yuille, Zhu, Poggio.

Key Idea: Compositionality. Objects and images are constructed by compositions of parts: ANDs and ORs. The probability models are built by combining elementary models by composition. Efficient inference and learning.

Why compositionality? (1) Ability to transfer between contexts and generalize or extrapolate (e.g., from cow to yak). (2) Ability to reason about the system, intervene, do diagnostics. (3) Allows the system to answer many different questions based on the same underlying knowledge structure. (4) Scales up to multiple objects by part sharing. An embodiment of faith that the world is knowable, that one can tease things apart, comprehend them, and mentally recompose them at will. The world is compositional, or God exists.

Horse Model (ANDs only). Nodes of the graph represent parts of the object. Parts can move and deform. y: (position, scale, orientation).

AND/OR Graphs for Horses. Introduce OR nodes and switch variables. Settings of the switch variables alter the graph topology, allowing different parts for different viewpoints/poses: mixtures of models with shared parts.
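How switch variables change topology can be shown with a tiny AND/OR graph; the part names and structure below are invented for illustration:

```python
# AND/OR graph sketch: an AND node composes all of its children; an OR
# node carries a switch variable selecting exactly one child, so different
# switch settings yield different graph topologies (e.g. different poses).

horse = ("AND", ["body",
                 ("OR", ["head-left", "head-right"]),     # switch: viewpoint
                 ("OR", ["legs-standing", "legs-running"])])  # switch: pose

def instantiate(node, switches):
    """Expand the graph into a concrete part list for given switch settings."""
    if isinstance(node, str):
        return [node]
    kind, children = node
    if kind == "AND":
        return [p for c in children for p in instantiate(c, switches)]
    return instantiate(children[switches.pop(0)], switches)  # OR: pick one child

print(instantiate(horse, [1, 0]))  # ['body', 'head-right', 'legs-standing']
```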

AND/OR Graphs for Baseball. Enables RCMs to deal with objects with multiple poses and viewpoints (~100). Inference and learning by bottom-up and top-down processing.

Results on Baseball Players. Performed well on benchmarked datasets. Zhu, Chen, Lin, Lin, Yuille, CVPR 2008, 2010.

Unsupervised Structure Learning. Task: given 10 training images, with no labeling, no alignment, and highly ambiguous features, estimate the graph structure (nodes and edges) and estimate the parameters. Correspondence is unknown: a combinatorial explosion problem.

The Dictionary: From Generic Parts to Object Structures. A unified representation (RCMs) and learning. Bridges the gap between generic features and specific object structures.

Bottom-up Learning: Suspicious Coincidence, Composition Clustering, Competitive Exclusion.
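The first filter, suspicious coincidence, keeps part pairs that co-occur far more often than independence predicts, i.e. P(a, b) >> P(a) P(b); a sketch with invented counts:

```python
# Suspicious-coincidence scoring: a pair of parts is a candidate
# composition when its co-occurrence rate greatly exceeds chance.
# All counts below are illustrative.

n_images = 1000
count = {"wheel": 200, "handlebar": 150}        # per-part occurrence counts
count_pair = {("wheel", "handlebar"): 120}      # same-image co-occurrences

def coincidence(a, b):
    # P(a,b) / (P(a) P(b)), computed as count_ab * N / (count_a * count_b)
    return count_pair[(a, b)] * n_images / (count[a] * count[b])

print(coincidence("wheel", "handlebar"))        # 4.0: worth composing
```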

Dictionary Size, Part Sharing, and Computational Complexity.

Level | Suspicious Coincidence | Composition Clusters | Competitive Exclusion | More Sharing | Seconds
1     | 167,431                | 14,684               | 262                   | 48           | 117
2     | 2,034,851              | 741,662              | 995                   | 116          | 254
3     | 2,135,467              | 1,012,777            | 305                   | 53           | 99
4     | 236,955                | 72,620               | 30                    | 2            | 9

Top-down Refinement. Fill in missing parts. Examine every node from top to bottom.

Part Sharing for Multiple Objects. Strategy: share parts between different objects and viewpoints.

Learning Shared Parts. An unsupervised learning algorithm to learn parts shared between different objects. Zhu, Chen, Freeman, Torralba, Yuille 2010. Structure induction: learning the graph structures and learning the parameters. Supplemented by supervised learning of masks.

Many Objects/Viewpoints. 120 templates: 5 viewpoints & 26 classes.

Learn a Hierarchical Dictionary. Low level to mid level to high level. Learn by suspicious coincidences.

Part Sharing decreases with Levels.

Multi-View Single-Class Performance: comparable to the state of the art. (a) Multi-view Motorbike dataset: RCMs AP=73.8, compared (precision-recall) with Thomas et al. CVPR06 and Savarese and Fei-Fei ICCV07. (b) Weizmann Horse dataset: RCMs AUC=98.0 vs. Shotton et al. BMVC08 AUC=93. (c) LabelMe Multi-view Car dataset: RCMs AP=66.6 vs. HOG+SVM AP=63.2.

Conclusions. Principles: Recursive Composition. Composition -> complexity decomposition. Recursion -> universal rules (self-similarity). Recursion and composition -> sparseness. A unified approach: object detection, recognition, parsing, matching, image labeling. Statistical models, machine learning, and efficient inference algorithms. Extensible models: easy to enhance. Scaling up: shared parts, compositionality. Trade-offs: sophistication of representation vs. features. The Devil is in the Details.