Vision as Bayesian inference: analysis by synthesis?

Schwarz Andreas, Wiesner Thomas 1 / 70

Outline: Introduction; Motivation; Problem Description; Bayesian Formulation; Generative Models; Letters, Text; Faces; Generating Proposals; AdaBoost; Training; DDMCMC; Examples; Relation to neuroscience 2 / 70

Motivation: Natural images contain an overwhelming number of visual patterns. Vision algorithms that work on artificial stimuli almost never generalize to natural images. 3 / 70

What is the main problem? (1) Complexity of the image: hundreds of objects, overlapping objects, different objects. 4 / 70

What is the main problem? (2) Ambiguity: similar objects can result in different images; different objects can result in similar images. 5 / 70


Approach: A possible solution is based on Bayesian inference, using probability distributions. Vision is an inverse inference problem: understand how the image was generated. 8 / 70

Image Parsing: The natural tasks are segmentation and object detection/recognition. The approach we want to present performs segmentation and recognition simultaneously. 9 / 70

Bayesian Approach: Basic Idea. Formulate the problem as Bayesian inference that combines segmentation, detection, and recognition. Use top-down generative models, which describe how objects and regions generate the image intensities. Use bottom-up proposals based on low-level cues, which guide the search through the parameter space. 10 / 70

Bayesian Approach: Requirements. Crucial: work with the raw image intensities and compare different models. Bring it together: combine bottom-up cues and top-down generative models using the DDMCMC algorithm, which is guaranteed to converge. 11 / 70

Bayesian Approach: The scene contains the whole image. The interpretation includes regions, faces, and text (letters). 12 / 70

Bayesian formulation 13 / 70

Bayesian formulation 14 / 70

Bayesian Formulation Formulation of the Prior: Formulation of the likelihood: Whole model: 15 / 70


Generative Models: generate the image. Two kinds are covered: text (letters) and faces. 17 / 70

Generative Models (text,letters) 18 / 70

Generative Models (text, letters): The template has boundaries; control points (25); shape parameters. 19 / 70

Generative Models (text, letters) 20 / 70

Generative Models (text, letters) Index Likelihood Prior Prior 21 / 70

Generative Models (faces): PCA is used to obtain a representation of faces. Additional features can be added. 22 / 70
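A minimal sketch of a PCA face representation, assuming faces are given as flattened, aligned image vectors. The function names, the number of components, and the random placeholder data are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def fit_pca(faces, n_components):
    """faces: (n_samples, n_pixels) array of flattened face images."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def encode(face, mean, components):
    """Project one face onto the principal directions."""
    return components @ (face - mean)

def decode(coeffs, mean, components):
    """Generate an image from a low-dimensional coefficient vector."""
    return mean + components.T @ coeffs

rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 64))  # placeholder data, not real face images
mean, components = fit_pca(faces, n_components=5)
coeffs = encode(faces[0], mean, components)
reconstruction = decode(coeffs, mean, components)
```

The `decode` direction is what makes this a generative model: a short coefficient vector produces a full image that can be compared against the observed intensities.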


Generating Proposals: AdaBoost algorithm. The strong classifier combines weak classifiers, each with its own weight. It returns a binary decision, e.g. face or no face. 24 / 70
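The strong classifier can be sketched as the sign of a weighted vote of weak classifiers. The three threshold tests and their weights below are made-up placeholders; in practice both would come out of training.

```python
def strong_classifier(x, weak_classifiers, alphas):
    """Return +1 (e.g. face) or -1 (no face) from a weighted vote."""
    score = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return 1 if score >= 0 else -1

# Hypothetical weak classifiers: each thresholds a single feature.
weak = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.2 else -1,
    lambda x: 1 if x[2] < 0.9 else -1,
]
alphas = [0.8, 0.3, 0.4]  # illustrative weights, not learned values

decision = strong_classifier([0.7, 0.1, 0.5], weak, alphas)
```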

Generating Proposals: Adapt AdaBoost to return conditional probabilities. 25 / 70
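One common way to turn a boosting score into a conditional probability is a logistic link on the weighted vote. Whether the talk used exactly this form is an assumption; the function name is illustrative.

```python
import math

def positive_probability(score):
    """Map the weighted vote sum(alpha_i * h_i(x)) to a value in (0, 1).

    Assumed form: exp(2 * score) / (1 + exp(2 * score)),
    written in two branches to avoid overflow for large |score|.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * score))
    e = math.exp(2.0 * score)
    return e / (1.0 + e)

p_face = positive_probability(0.9)      # confident "face"
p_unsure = positive_probability(0.0)    # undecided vote
```

A score of zero maps to 0.5, and the probabilities for a score and its negation sum to one, so the binary decision is recovered by thresholding at 0.5.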

AdaBoost Training: Features and weights are learned offline by supervised training. Text training uses street signs; face training uses the FERET database. 26 / 70

AdaBoost results 27 / 70

AdaBoost results 28 / 70

Generating Proposals: Summary. AdaBoost combines weak classifiers into a strong classifier. Training is supervised and happens offline. It delivers conditional probabilities to the generative models. 29 / 70


DDMCMC: Data-driven methods that exploit image characteristics to speed up MCMC. First, let's have a look at segmentation. 31 / 70

Segmentation: Image lattice with points (i, j); the image is defined on the lattice. For any point either ... The lattice is partitioned into K disjoint regions. Problem: this kind of partition into regions alone is not yet an image segmentation. 32 / 70
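A small sketch of what "K disjoint regions on the lattice" means, assuming the partition is stored as an integer label map. The array values and helper names are illustrative.

```python
import numpy as np

def regions_from_labels(labels):
    """Turn a label map into a dict: region index -> set of (i, j) points."""
    return {
        int(k): {(int(i), int(j)) for i, j in np.argwhere(labels == k)}
        for k in np.unique(labels)
    }

def is_partition(regions, shape):
    """Check that the regions are pairwise disjoint and cover the lattice."""
    height, width = shape
    lattice = {(i, j) for i in range(height) for j in range(width)}
    union = set().union(*regions.values())
    disjoint = sum(len(r) for r in regions.values()) == len(union)
    return disjoint and union == lattice

labels = np.array([[0, 0, 1],
                   [0, 2, 1],
                   [2, 2, 1]])
regions = regions_from_labels(labels)  # here K = 3
ok = is_partition(regions, labels.shape)
```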

Segmentation: Define each region R as a realization from a probabilistic model, where Θ are the parameters of a model indexed by ℓ. Consider a segmentation W: its hidden variables are K, the number of regions, and the properties R, ℓ, Θ of each region. Assume that I is the image and W a semantic representation of the world. 33 / 70

Segmentation 34 / 70

Bayesian Framework: Posterior: the probability of this particular W, given the image. Likelihood: how likely the image is, given this W. Prior: the probability of this particular representation. 35 / 70

What we want to do: In the partition space, we want to find the segmentation that most likely represents the image. 36 / 70
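A sketch of how posterior, likelihood, and prior are usually combined, with W the representation and I the image as in the preceding slides. Bayes' rule links the three terms, and the segmentation sought is taken here to be the posterior maximizer, which is an assumption about the lost slide formulas:

```latex
p(W \mid I) \;=\; \frac{p(I \mid W)\, p(W)}{p(I)} \;\propto\; p(I \mid W)\, p(W),
\qquad
W^{*} \;=\; \arg\max_{W}\; p(W \mid I).
```

Since p(I) does not depend on W, it can be ignored when comparing candidate segmentations.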

Recap Segmentation: Probability of Segmentation Likelihood 37 / 70

Search through the solution space: the space of all K-partitions. 38 / 70

How to search? Enumeration of all possible segmentations: takes much too long. Greedy search such as gradient descent/ascent: gets stuck in local minima and maxima. Stochastic search: also takes too long. MCMC-based search: let's have a look. 39 / 70

MCMC Requirements: Ergodic: from an initial segmentation W0, any other state W can be visited in finite time. Aperiodic: ensured by random dynamics. Balance: every move is reversible. 40 / 70
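These requirements can be illustrated with a generic Metropolis-Hastings sampler, whose acceptance rule enforces the balance condition. The five-state toy target, the ring-shaped proposal, and all names are assumptions for illustration; in DDMCMC the proposal would be driven by image cues and the states would be segmentations.

```python
import math
import random

def metropolis_hastings(log_target, propose, log_q, w0, n_steps, rng):
    """log_q(w_to, w_from) is the log probability of proposing w_to from w_from."""
    w = w0
    samples = []
    for _ in range(n_steps):
        w_new = propose(w, rng)
        # Acceptance ratio: target ratio times reverse/forward proposal ratio.
        log_alpha = (log_target(w_new) + log_q(w, w_new)) - (
            log_target(w) + log_q(w_new, w)
        )
        if rng.random() < math.exp(min(0.0, log_alpha)):
            w = w_new
        samples.append(w)
    return samples

# Toy target over states 0..4 (unnormalized weights, purely illustrative).
weights = [1, 2, 4, 2, 1]
log_target = lambda w: math.log(weights[w])
propose = lambda w, rng: (w + rng.choice([-1, 1])) % 5  # step on a ring
log_q = lambda w_to, w_from: math.log(0.5)              # symmetric proposal

samples = metropolis_hastings(log_target, propose, log_q, 0, 20000, random.Random(0))
```

Every state is reachable from every other one along the ring, and rejected proposals keep the chain in place, which breaks any fixed period.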

DDMCMC Behind the scenes 41 / 70

DDMCMC with bottom-up-driven top-down models: behind the scenes 42 / 70

DDMCMC Types of moves: Jump moves (discrete): what is the model for that region? Splitting and merging of regions; switching the model for a region (e.g. texture model to spline model). Diffusion processes (continuous changes): altering the boundary shape. 43 / 70
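The move types can be sketched on a one-dimensional toy problem, where a segmentation of a signal of given length is a sorted list of boundary positions. This is a simplification chosen for illustration: real DDMCMC moves act on 2D regions, use data-driven proposal probabilities, and are accepted or rejected by the sampler, none of which is shown here. The diffusion step is also discretized to one-pixel shifts.

```python
import random

def split(boundaries, length, rng):
    """Jump move: add one boundary, splitting a region in two."""
    candidates = [p for p in range(1, length) if p not in boundaries]
    if not candidates:
        return list(boundaries)
    return sorted(boundaries + [rng.choice(candidates)])

def merge(boundaries, rng):
    """Jump move: drop one boundary, merging two neighboring regions."""
    if not boundaries:
        return list(boundaries)
    drop = rng.choice(boundaries)
    return [b for b in boundaries if b != drop]

def diffuse(boundaries, length, rng):
    """Diffusion-like move: shift one boundary by a single position."""
    if not boundaries:
        return list(boundaries)
    i = rng.randrange(len(boundaries))
    new_pos = boundaries[i] + rng.choice([-1, 1])
    lo = boundaries[i - 1] + 1 if i > 0 else 1
    hi = boundaries[i + 1] - 1 if i + 1 < len(boundaries) else length - 1
    if lo <= new_pos <= hi:
        return boundaries[:i] + [new_pos] + boundaries[i + 1:]
    return list(boundaries)  # shift would collide with a neighbor: stay put

rng = random.Random(1)
start = [3, 7]                  # three regions on a signal of length 10
after_split = split(start, 10, rng)
after_merge = merge(start, rng)
after_diffuse = diffuse(start, 10, rng)
```

Split and merge change the number of regions, which is why they are jumps between spaces of different dimension, while the diffusion-like move keeps the number of regions fixed.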

DDMCMC Types of moves Bottom-up proposals drive top-down generative models which compete with each other to explain the image. [Tu et al. 2003] 44 / 70


More Visually Split & Merge 47 / 70

More Visually Diffusion 48 / 70

Generic Image Parsing 49 / 70

Summary: generative and discriminative. Discriminative methods: edge cues, binarization cues, face region cues (AdaBoost), text region cues (AdaBoost), shape affinity cues, region affinity cues, model parameters and pattern type. Generative models: top-down processing is strong. 50 / 70

Bottom-up processing is getting increasingly good. 51 / 70

Future: We are on a good path with combined bottom-up and top-down processing, but algorithms still do not perform like humans. Natural image processing is a complex task. 52 / 70


Examples (1) 54 / 70

Examples (2) 55 / 70


Relation to Neuroscience: analysis-by-synthesis approach; forward and backward pathways in the brain. 57 / 70

Relation to Neuroscience: fMRI; ERP (evoked response potentials). 58 / 70

Relation to Neuroscience: The human lateral occipital cortex (LOC) increases its activity during the perception of object completion. Later findings refine this: during object completion, activity decreases in the primary visual cortex. Several other studies support this theory. 59 / 70

Summary: Natural image processing is still quite a complex task. It is a long way until our algorithms perform as well as humans. 60 / 70

Thank you for your attention! 61 / 70

References
Trends in Cognitive Sciences, in Probabilistic models of cognition, Vol. 10, No. 7 (July 2006), pp. 301-308, doi:10.1016/j.tics.2006.05.002
Trends in Cognitive Sciences, in Probabilistic models of cognition, supplementary material, Vol. 10, No. 7 (July 2006), pp. 301-308, doi:10.1016/j.tics.2006.05.002
Image Parsing: Unifying Segmentation, Detection, and Recognition, International Journal of Computer Vision, Vol. 63, No. 2 (2005), pp. 113-140
Data-driven Markov Chain Monte Carlo, presentation, S.C. Zhu, Stat232B: Stat Computing and Inference
MCMC estimation in MLwiN, William J. Browne
Data-Driven Markov Chain Monte Carlo, presentation, Tomasz Malisiewicz
62 / 70