Recognizing Scenes by Simulating Implied Social Interaction Networks


MaryAnne Fields and Craig Lennon
Army Research Laboratory, Aberdeen, MD, USA
Christian Lebiere and Michael Martin
Carnegie Mellon University, Pittsburgh, PA, USA
To appear in the Proceedings of the 8th International Conference on Intelligent Robots and Applications (ICIRA 2015)

Exploiting Cognitive Context

OBJECTIVE & BENEFITS
- Exploit cognitive context to augment bottom-up perceptual approaches
- Leverage activation mechanisms in ACT-R to provide contextual expectations
- Develop techniques to exchange information between cognitive and perceptual systems
- Benefits include improved object and scene recognition, and support for active perception

TECHNICAL APPROACH TO OVERCOME BARRIERS
- Establish a feedback loop between perceptual and cognitive systems via the World Model
- Encode Spatially-organized Hierarchical Object Graphs (SHOGs) from the perceptual system
- Augment context via semantic priming in ACT-R
- Share contextual information from ACT-R with the perceptual system

STATE OF THE ART & BARRIERS
- Perceptual systems tend to feed forward to cognitive systems that provide little feedback.
- General world knowledge, ontologies, goals and preceding cues create expectancies in ACT-R that have been used in perception as context for anticipating and resolving ambiguities about objects or scenes.
- Our main challenge is to provide the cognitive system with usable information based on the semantic label distributions for objects and regions generated by our perceptual approach.

Robot Readable World by Timo Arnall http://berglondon.com/blog/2012/02/06/robot-readable-world-the-film/ -- Embedded Video Removed (see URL above) --

Public Spaces
- Object recognition is not usually sufficient for scene recognition; configurations are required to disambiguate.
- Developed a room simulator to create notional SHOGs containing tables and chairs
  - These SHOGs code social affordances
  - Social immediacy is operationalized in terms of object proximity and orientation
- Developed an approach to encode semantic perception knowledge structures (SHOGs) to cognitive models
  - Instance-based learning in ACT-R
  - Global graph properties = scene gist
  - Local graph properties = exemplars of object in context (scene content, interobject configuration, affordances, etc.)
  - Centrality guides order of object encoding (attention)
- Demonstrated the utility of relational features in discriminating spaces with similar objects, and the similarity of KNN to ACT-R partial-matching and blending mechanisms (Fields, Lennon, Lebiere, & Martin, in press)

METRICS
- Confusions and error rates
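The idea that centrality guides the order of object encoding can be illustrated with a minimal sketch. The slides do not say which centrality measure was used, so degree centrality is an assumption here, and the node names and edge list are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def encoding_order(edges):
    """Nodes sorted from most to least central; ties are broken by name."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))


# Hypothetical cafe cluster: one table linked to three chairs,
# plus one link between two of the chairs.
edges = [
    ("table1", "chair1"),
    ("table1", "chair2"),
    ("table1", "chair3"),
    ("chair1", "chair2"),
]
order = encoding_order(edges)
print(order)
```

In this toy graph the table is encoded first because it touches every other node, which is one plausible reading of centrality acting as an attention ordering.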

Scene Classification
- Indoor scene recognition remains challenging
- Current methods use object or parts recognition, along with the co-occurrence of salient features, to recognize interior scenes
- Rooms that contain collections of commonplace objects (e.g., tables & chairs) are vexing
- We tested a method to classify scenes based on how arrangements of constituent objects might impact social interactions
- Chairs acted as surrogates for imagined humans so we could define social affordances based on spatial layout.
- We compared the impact of affordance-based vs object-based features on room classification performance.
- We compared pattern-matching mechanisms in ACT-R to k-nearest neighbor (KNN) classification to provide common ground.
- We examined how classifier performance changed depending on training set size and noise level.

[Figure: café layout with four tables. Only affordance in this instance of a café is proximity.]

Experiment
- We simulated 5 highly confusable room-types (café, boardroom, conversation areas, instructional rooms, theaters)
- Canonical room-types (except boardroom) were populated with a variable number of chairs and tables, ranging from:
  - 2-4 rows
  - 2-4 sections within each row
  - 2-6 chairs grouped with 1 table (or focus point) within each section
- Each room was generated in a fashion that allowed testing of the robustness of classification to 2 levels of noise (low, high) in:
  - social dynamics (chairs shifted and rotated from their canonical positions)
  - object identification (chairs mislabeled as tables or tables as chairs)

Noise level   x (left/right)   y (front/back)   Rotation   Labeling error
Low           [-6, 6] in.      [0, 6] in.       15         0.05
High          [-12, 12] in.    [0, 12] in.      45         0.20

- We created 2 feature sets
  - object-based node counts: chairs, tables
  - affordance-based binary link counts (View Angle: 190, View Distance: 60):
    - proximity edges (60)
    - mutual visibility edges (potential eye-contact based on orientation)
- We created 100 simulated rooms of each room-type x room-size combination for a total of 18,500 instances at each level of noise.
- Classifier robustness was further tested across 3 training set sizes (1%, 10%, 100%)
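The affordance-based link counts described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulator: chairs are represented as hypothetical `(x, y, heading_deg)` tuples, a proximity edge is assumed to mean two chairs within the view distance of each other, and a mutual visibility edge is assumed to mean each chair lies inside the other's view cone and range. Only the two thresholds (190 and 60) come from the slide.

```python
import math
from itertools import combinations

VIEW_ANGLE_DEG = 190.0  # view angle from the slide
VIEW_DISTANCE = 60.0    # view distance from the slide (unit as given there)


def sees(a, b):
    """True if chair b lies inside chair a's view cone and view distance."""
    ax, ay, heading = a
    bx, by, _ = b
    dx, dy = bx - ax, by - ay
    if math.hypot(dx, dy) > VIEW_DISTANCE:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the angular offset into [-180, 180) before comparing.
    offset = (bearing - heading + 180.0) % 360.0 - 180.0
    return abs(offset) <= VIEW_ANGLE_DEG / 2.0


def affordance_features(chairs):
    """Count proximity and mutual-visibility edges over all chair pairs."""
    proximity = 0
    visibility = 0
    for a, b in combinations(chairs, 2):
        if math.hypot(a[0] - b[0], a[1] - b[1]) <= VIEW_DISTANCE:
            proximity += 1
        if sees(a, b) and sees(b, a):
            visibility += 1
    return {"proximity_edges": proximity, "mutual_visibility_edges": visibility}


# Two chairs facing each other across a table (hypothetical coordinates).
facing = [(0.0, 0.0, 0.0), (30.0, 0.0, 180.0)]
# Two chairs in consecutive rows, both facing the same way.
rows = [(0.0, 0.0, 90.0), (0.0, 30.0, 90.0)]
print(affordance_features(facing))
print(affordance_features(rows))
```

The two examples show why orientation matters: both pairs are close enough for a proximity edge, but only the facing pair supports potential eye contact.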

Classifiers

Context
- Subtle interplay of environment, an agent's relevant knowledge and the agent's goals
- Some mechanisms underlying this interplay are an inherent part of ACT-R
- Similar to ML techniques, but integrated in a unified cognitive architecture

KNN
- Requires a training set with quantitative features, associated labels, and a similarity metric (Euclidean distance in this case).
- Assumes the feature space is continuous enough that a point within it is likely to have the same label as points near it.
- Classifies new observations according to the modal label of the K closest training set points.
- We set neighborhood sizes of k = 1, 5, and 10 (for the 1%, 10% and 100% training sets, respectively)

ACT-R
- Classification based on retrieval of knowledge patterns (chunks) from declarative memory
- Chunks are data structures associating small sets of data items
- Retrieval governed by statistical quantities reflecting history, associations, similarities.
- Classification reflects the entire training set
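The KNN procedure described above (Euclidean distance, modal label of the K closest points) can be sketched in a few lines. The feature vectors and labels below are hypothetical toy values, not data from the experiment.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Return the modal label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs. Distance is Euclidean.
    On a tied vote the label whose first member is nearest wins, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (proximity_edges, mutual_visibility_edges) feature vectors.
train = [
    ((4, 6), "cafe"),
    ((5, 6), "cafe"),
    ((12, 1), "theater"),
    ((11, 0), "theater"),
    ((10, 2), "theater"),
]
print(knn_classify(train, (5, 5), k=3))
print(knn_classify(train, (11, 1), k=3))
```

`math.dist` requires Python 3.8 or later; on older versions the distance would need to be computed by hand.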

Classification Error
- Both classifiers (KNN, ACT-R) recognized rooms more accurately by using affordance-based features rather than object-based features
- Both classifiers responded similarly to the degree of noise present in the stimuli (high, low), especially for the object-based features
  - Low noise stimuli tended to reduce classification errors relative to high noise stimuli.
  - However, for affordance-based features, high noise improved performance marginally in the ACT-R classifier while still decreasing it slightly for KNN.
- Both classifiers were robust to decreases in training-set size (1%, 10%, 100%).
  - They performed best with full sampling (i.e., 100%).
  - Performance at 10% sampling was nearly as good.
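The three sampling levels could be set up roughly as in the sketch below. The slides give the fractions, the matching k values, and the 18,500 instance total, but not the sampling procedure, so uniform random sampling without replacement and the helper name `subsample` are assumptions.

```python
import random

# Training-set fraction -> KNN neighborhood size, as reported in the slides.
K_FOR_FRACTION = {0.01: 1, 0.10: 5, 1.00: 10}


def subsample(instances, fraction, seed=0):
    """Draw a uniform random subset without replacement (assumed procedure)."""
    rng = random.Random(seed)
    size = max(1, round(len(instances) * fraction))
    return rng.sample(instances, size)


# Stand-in for the 18,500 simulated rooms at one noise level.
instances = list(range(18500))
sizes = {fraction: len(subsample(instances, fraction)) for fraction in K_FOR_FRACTION}
print(sizes)
```

Fixing the seed keeps the subsets reproducible, so the two classifiers can be compared on identical training sets.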

Confusions
- A close similarity in classifier performance can be seen in confusion patterns, too.
- Social affordances were more effective than pure object-based feature sets
- Confusion pairs for object-based features
  - Theater/conversation: no tables
  - Instructional/café: tables
- Confusion pairs for affordance-based features
  - Theater/instructional: same social structure except for tables
  - Café/conversation: same social structure except for tables
- Boardroom is not very confusable in either feature set because of its unique structure

[Figure: Room-type confusions for each classifier for full sampling with low noise.]
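Both reported metrics, confusions and error rates, can be tallied from paired true and predicted labels. The sketch below uses invented labels purely to show the bookkeeping; it does not reproduce any result from the study.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count each (true, predicted) pair; off-diagonal pairs are confusions."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def error_rate(true_labels, predicted_labels):
    """Fraction of instances whose predicted label differs from the true one."""
    if not true_labels:
        raise ValueError("need at least one instance")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


# Hypothetical labels for five rooms.
true_labels = ["cafe", "cafe", "theater", "theater", "boardroom"]
predicted = ["cafe", "conversation", "instructional", "theater", "boardroom"]
counts = confusion_counts(true_labels, predicted)
print(counts[("cafe", "conversation")], error_rate(true_labels, predicted))
```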

Classifier Comparison

Similarities between ACT-R memory retrieval and KNN
- Each chunk in declarative memory corresponds to a training instance.
- The partial matching mechanism is akin to the distance computation in KNN
- Blending and KNN classify by summing over instances

Differences between ACT-R Model & KNN Algorithm
- Ratio similarity vs linear distance
- Manhattan distance vs Euclidean distance

ACT-R memory retrieval is more general than the KNN voting process
- ACT-R activation equation captures recency, frequency, and semantic priming effects
- Blending operates over all instances in memory rather than the most similar K of them
  - Broadens the experience base upon which the decision is made
  - Removes the need for modelers to specify a proper value for the K parameter
  - More similar examples have a higher impact than more distant ones because of the weighting term
- Process of aggregating answers in blending is more general than the KNN voting process
  - Can also average over values for which similarity functions are defined (e.g., numbers)
  - Can find consensus values among symbolic chunks for which similarities are defined.

Embedding generalizations of machine learning algorithms such as KNN, RL, and Bayesian Learning in cognitive architectures enables them to be integrated with other cognitive mechanisms.
- Flexible ways of reflecting cognitive context in perception and decision making
- Leverage knowledge about the semantics of the domain
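The contrast with KNN can be made concrete with a simplified, blending-style classifier that weights every stored instance by similarity instead of voting among the K nearest. This is a sketch in the spirit of partial matching and blending, not ACT-R's implementation: it omits base-level activation (recency, frequency), spreading activation, and noise. The exact ratio-similarity form, the parameter names `mismatch_penalty` and `temperature`, their default values, and the toy data are all assumptions.

```python
import math
from collections import defaultdict


def ratio_similarity(a, b):
    """Assumed ratio form for non-negative values: 0 when equal, toward -1 as they diverge."""
    hi = max(a, b)
    return 0.0 if hi == 0 else min(a, b) / hi - 1.0


def blend_classify(instances, query, mismatch_penalty=1.0, temperature=0.25):
    """Weight every instance by a softmax over summed per-feature similarities."""
    # Summing per-feature terms is the Manhattan-like aggregation noted above.
    activations = [
        mismatch_penalty * sum(ratio_similarity(q, f) for q, f in zip(query, feats))
        for feats, _ in instances
    ]
    top = max(activations)  # subtract the max for numerical stability
    weights = [math.exp((a - top) / temperature) for a in activations]
    total = sum(weights)
    scores = defaultdict(float)
    for (_, label), weight in zip(instances, weights):
        scores[label] += weight / total
    return max(scores, key=scores.get), dict(scores)


# Hypothetical (proximity_edges, mutual_visibility_edges) instances.
instances = [
    ((4, 6), "cafe"),
    ((5, 6), "cafe"),
    ((12, 1), "theater"),
    ((11, 0), "theater"),
    ((10, 2), "theater"),
]
label, scores = blend_classify(instances, (5, 5))
print(label, scores)
```

No K has to be chosen: distant instances still contribute, but their softmax weight shrinks quickly, which mirrors the point that more similar examples have a higher impact.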

Next Steps
- Revise room simulator to include perceptual errors and metric info in notional SHOGs
  - Mislabeled, missing, hallucinated
  - Metric distances, sizes, orientations
- Incorporate incremental perception
- Incorporate recency, frequency, and semantic priming effects
- Map SHOG network properties to Gestalt principles where possible
- Explore feature sets co-developed with the perceptual system
- Integrate with semantic perception algorithms