ERA: Architectures for Inference
Dan Hammerstrom
Electrical and Computer Engineering
7/28/09

Intelligent Computing

In spite of the transistor bounty of Moore's law, there is a large class of problems that computers still do not solve well. These problems involve the transformation of data across the boundary between the real world and the digital world. They occur wherever a computer is sampling and acting on real-world data, which includes almost all embedded computing applications. Our lack of general solutions to these problems, outside of specialized niches, constitutes a significant barrier to computer usage and to huge potential markets.

These are difficult problems that require computers to find complex structures and relationships, through space and time, in massive quantities of low-precision, ambiguous, noisy data. AI pursued solutions in this area but ran into scaling problems, among other things. Artificial Neural Networks (ANNs) extended computational intelligence in a number of important ways, primarily by adding the ability to incrementally learn and adapt. However, ANNs also had trouble scaling, and they were often difficult to apply to many problems. Traditional rule-based knowledge systems are now evolving into probabilistic structures where inference, generally based on Bayes' rule, becomes the key computation.

Bayesian Networks

We now have Bayesian networks. A major contribution to this effort was the work of Judea Pearl: Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988 (revised 1997). These systems are far less brittle, and they also more faithfully model aspects of animal behavior, since animals learn from their surroundings and appear to perform a kind of probabilistic inference over learned knowledge as they interact with their environment. Bayesian networks are structured, graphical representations of the probabilistic relationships among several random variables; the graph structure is an explicit representation of conditional dependence, encoded by the network edges. The fundamental computation thus becomes probabilistic inference, and a proper implementation of inference could be applied to *any* problem, potentially making it a "silver bullet" algorithm.

A Simple Bayesian Network

Each node has a CPT, or Conditional Probability Table. Shown below is the CPT for node D, whose parents are B and C; there are similar tables for A, B, and C.

P(d | b, c):

            d1    d2
  b1, c1   0.5   0.5
  b2, c1   0.3   0.7
  b1, c2   0.9   0.1
  b2, c2   0.8   0.2
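To make the computation concrete, here is a minimal Python sketch of evaluating such a network by brute-force enumeration. Only the CPT for D comes from the slide; the graph shape (B and C conditioned on A, D conditioned on B and C) and the numbers for A, B, and C are invented for illustration.

```python
# Minimal sketch of a four-node network A -> B, A -> C, (B, C) -> D.
# Only the CPT for D is from the slide; the other tables are assumed.
from itertools import product

P_A = {1: 0.6, 2: 0.4}                                       # P(a)      (assumed)
P_B = {(1, 1): 0.7, (2, 1): 0.3, (1, 2): 0.2, (2, 2): 0.8}   # P(b | a), keyed (b, a) (assumed)
P_C = {(1, 1): 0.5, (2, 1): 0.5, (1, 2): 0.9, (2, 2): 0.1}   # P(c | a), keyed (c, a) (assumed)
P_D = {(1, 1, 1): 0.5, (2, 1, 1): 0.5,                       # P(d | b, c), keyed (d, b, c),
       (1, 2, 1): 0.3, (2, 2, 1): 0.7,                       # taken from the CPT above
       (1, 1, 2): 0.9, (2, 1, 2): 0.1,
       (1, 2, 2): 0.8, (2, 2, 2): 0.2}

def joint(a, b, c, d):
    """Chain-rule factorization implied by the graph structure."""
    return P_A[a] * P_B[(b, a)] * P_C[(c, a)] * P_D[(d, b, c)]

# Exact inference by enumeration: P(d | b=1) for both values of d.
num = {d: sum(joint(a, 1, c, d) for a, c in product((1, 2), repeat=2))
       for d in (1, 2)}
z = sum(num.values())
print({d: num[d] / z for d in (1, 2)})
```

This posterior is exactly what belief propagation or sampling would compute; those methods simply scale far better than enumeration.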

Pattern Recognition as Noisy Channel Decoding

Inference: A Simple Example

The inference problem: choose the most likely y, based on P[y | x]. We need to infer the most likely original message, given the data we received and our knowledge of the statistics of channel errors and of the messages being generated.
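A minimal sketch of this MAP decoding view: choose the y that maximizes P(y | x) ∝ P(x | y) P(y). The binary symmetric channel (per-bit flip probability EPS) and the uniform message prior are assumptions for illustration, not details from the slide.

```python
# Minimal MAP decoding sketch: pick the y maximizing P(y | x) ∝ P(x | y) P(y).
# The binary symmetric channel (flip prob EPS) and the prior are assumptions.
from itertools import product

EPS = 0.1                                           # assumed per-bit flip probability
MESSAGES = list(product((0, 1), repeat=4))
PRIOR = {y: 1 / len(MESSAGES) for y in MESSAGES}    # assumed uniform message prior

def likelihood(x, y):
    """P(x | y) for a memoryless binary symmetric channel."""
    flips = sum(xi != yi for xi, yi in zip(x, y))
    return (EPS ** flips) * ((1 - EPS) ** (len(x) - flips))

def map_decode(x):
    # The normalizer P(x) is the same for every y, so it can be ignored.
    return max(MESSAGES, key=lambda y: likelihood(x, y) * PRIOR[y])

print(map_decode((1, 0, 1, 1)))   # -> (1, 0, 1, 1): the closest message wins
```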

Assumption: Bayesian Inference as a Fundamental Computation in Future Systems

In mapping inference computations to hardware, there are a number of issues to be considered, including:
- the type and degree of parallelism (multiple independent threads versus data parallelism),
- arithmetic precision,
- inter-thread communication, and
- local storage requirements.

There are several variations on the basic Bayesian techniques across a number of different fields, from communication theory and pattern recognition to computer vision, robotics, and speech recognition. For this review of inference as a computational model, however, three general families of algorithms are considered: (1) inference by analysis, (2) inference by random sampling, and (3) inference using distributed representations.

Analytic Inference

Analytic techniques constitute the most widely used approach to inference over Bayesian networks. Most Bayesian networks are evaluated using Bayesian Belief Propagation (BBP), developed by Pearl. Typically, data are input to the network by setting certain variables to known or observed values ("evidence"); belief propagation is then performed to find the probability distributions of the free variables. Analytic techniques require significant precision and dynamic range, generally in the form of floating-point representations. This, plus their limited parallelism, makes them good candidates for multi-core architectures, but not necessarily for more advanced nano-scale computation.
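To give the flavor of BBP, here is a minimal sum-product message pass on an invented three-node chain (all numbers assumed). On a chain or tree, this single backward sweep yields the exact posterior; Pearl's full algorithm generalizes the idea with messages flowing in both directions.

```python
# Minimal sum-product sketch on the chain A -> B -> C (all values invented).
# With evidence on C, a backward message lambda(.) is passed toward A;
# on a chain this yields the exact posterior, as in Pearl's BBP.
P_A  = [0.7, 0.3]                  # P(a)     (assumed)
P_BA = [[0.9, 0.1], [0.2, 0.8]]    # P(b | a) (assumed), rows indexed by a
P_CB = [[0.6, 0.4], [0.1, 0.9]]    # P(c | b) (assumed), rows indexed by b

c_obs = 1                          # evidence: C = 1

# Message from C to B: lambda_C(b) = P(c_obs | b)
lam_c = [P_CB[b][c_obs] for b in range(2)]
# Message from B to A: lambda_B(a) = sum_b P(b | a) * lambda_C(b)
lam_b = [sum(P_BA[a][b] * lam_c[b] for b in range(2)) for a in range(2)]
# Posterior: P(a | c_obs) proportional to P(a) * lambda_B(a)
post = [P_A[a] * lam_b[a] for a in range(2)]
z = sum(post)
print([p / z for p in post])
```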

Random Sampling

Another approach to inference is random sampling. There are a number of techniques, most of which fall under the general category of Monte Carlo simulation. Again, evidence is input by setting some nodes to known values; random samples of the free variables are then generated. Two commonly used techniques are Adaptive Importance Sampling and Markov Chain Monte Carlo simulation. These techniques use adaptive sampling to perform an algorithmic search of the model's state space. For large, complex Bayesian structures, random sampling is often the only way to evaluate the network. However, it suffers from the fact that as the size of the network increases, increasingly large sample sets are required to obtain sufficiently accurate statistics.
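As a sketch of the sampling approach, here is likelihood weighting, one of the simpler importance-sampling variants (the adaptive and MCMC methods named above are more elaborate). The two-node network and its numbers are invented for illustration.

```python
# Minimal likelihood-weighting sketch (a simple importance-sampling variant)
# on an invented two-node network A -> B, estimating P(a=1 | b=1).
import random

P_A  = [0.7, 0.3]                   # P(a)     (assumed)
P_BA = [[0.9, 0.1], [0.2, 0.8]]     # P(b | a) (assumed), rows indexed by a

def estimate(n_samples=100_000, b_obs=1):
    weights = [0.0, 0.0]
    for _ in range(n_samples):
        a = 0 if random.random() < P_A[0] else 1   # sample the free variable
        weights[a] += P_BA[a][b_obs]               # weight by evidence likelihood
    return weights[1] / sum(weights)               # estimate of P(a=1 | b=1)

random.seed(0)
print(estimate())   # exact answer: (0.3*0.8) / (0.7*0.1 + 0.3*0.8) ~ 0.774
```

Note how the required sample count grows with the network: this is exactly the scaling weakness described above.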

An example of hardware/device support for such a system is the work of Prof. Krishna Palem (Rice University) on probabilistic CMOS (PCMOS). Prof. Palem has shown that PCMOS can provide significant performance benefits in implementing Monte Carlo random sampling: PCMOS logic is used to generate random numbers, thereby accelerating the sampling process. Monte Carlo techniques are computationally intensive and so still tend to have scaling limitations. On the plus side, however, massively parallel evaluations are possible and arithmetic-precision requirements are relaxed, so such techniques map cleanly onto simpler, massively parallel, low-precision computing structures. These techniques may also benefit from morphic cores with hardware-accelerated random-number generation.

Distributed Data Representation Networks

Networks based on distributed data representations (DDR) are actually a different way to structure Bayesian networks, and although analytic and sampling techniques can be used on these structures, they also allow different kinds of massively parallel execution. The use of DDR is very promising, but it is also the most limited in successful demonstrations on real applications. Computing with DDRs can be thought of as the computational equivalent of spread-spectrum communication: in a distributed representation, meaning is not carried by single symbolic units but results from the interaction of a group of units, typically configured in a network structure, and each unit can often participate in several representations. Representing data in this manner more easily allows incremental, integrative, decentralized adaptation, and the computational and communication loads are spread more evenly across the system. Distributed representation also appears to be an important computational principle in neural systems.
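One minimal way to see the idea is a sparse binary code in which each symbol activates a small random subset of units, composites are formed by superposition, and readout is by overlap. The scheme and parameters below are illustrative assumptions, not a specific model from the talk.

```python
# Minimal sketch of a sparse distributed representation (illustrative only).
# Each symbol is a random sparse binary code; a composite pattern is the
# elementwise OR of its members, and readout is by overlap with each code.
import random

N, K = 2000, 40       # number of units and active bits per code (assumed)
random.seed(0)

def encode(_symbol):
    """Assign a symbol a random sparse code: a set of K active unit indices."""
    return frozenset(random.sample(range(N), K))

codes = {s: encode(s) for s in ["cat", "dog", "car"]}
composite = codes["cat"] | codes["dog"]   # superpose two symbols

for s, code in codes.items():
    overlap = len(code & composite)       # ~K for members, ~K*K/N by chance
    print(f"{s}: overlap {overlap}/{K} -> "
          f"{'member' if overlap > K // 2 else 'absent'}")
```

Because each unit can appear in many codes, the representational load is spread across the whole population, which is the property the slide emphasizes.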

Biological neural circuits perform inference over huge knowledge structures in fractions of a second, using extremely slow, unreliable devices. It is not clear exactly how they manage this incredible trick, considering that exact Bayesian inference is so computationally intense (NP-hard). There is some speculation that DDR plays an important role; however, more theory and algorithm development is needed. This is an active area of research, and no doubt much progress will be made by the time many Emerging Research Devices become commercially available. One hypothesis is that sparsely connected networks lead naturally to distributed representation, and subsequently to significant factorization of the inference process; this, coupled with massively parallel hardware, may enable entirely new levels of inference capability.

One such model was developed by Lee and Mumford, who proposed a hierarchical Bayesian inference model of the primate visual cortex.

[Figure: the Lee and Mumford visual cortex model]

Another example is the work of Jeff Hawkins and Dileep George at Numenta, Inc. Their model starts with an approximation to a general Bayesian module, which can then be combined into a hierarchy to form what they call a Hierarchical Temporal Memory (HTM). Issues related to hardware architectures for Bayesian inference, and how they may be implemented with emerging devices, are now being studied.

Architecture

Mapping Bayesian networks to a multi-core implementation is straightforward: just implement each node as a task. In a simple SMP-based multi-core machine, that would most certainly provide good performance. However, this approach breaks down as we scale to very large networks. Bayesian networks tend to be storage intensive, so implementation issues such as data-structure organization, memory management, and cache utilization also become important. A potentially serious performance constraint is access to primary memory, and it is not yet clear how effective caching will be in ameliorating this delay. Another is limited inter-thread bandwidth in distributed-memory systems, since BBP inference can require extensive message passing. As we scale to the very large networks required to solve complex problems, a variety of optimizations become possible and, in fact, necessary.
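A minimal sketch of this node-as-task mapping: nodes in the same topological level of an invented feed-forward network are submitted to a thread pool and evaluated concurrently. The network, the update functions, and the level schedule are all illustrative assumptions.

```python
# Minimal sketch of "each node is a task": nodes at the same depth of an
# invented feed-forward network are evaluated in parallel with a thread pool.
from concurrent.futures import ThreadPoolExecutor

# node -> (parents, local update function); structure and functions assumed
NET = {
    "A": ([], lambda: 0.6),
    "B": (["A"], lambda a: 0.9 * a),
    "C": (["A"], lambda a: 0.5 * a),
    "D": (["B", "C"], lambda b, c: 0.5 * b + 0.3 * c),
}
LEVELS = [["A"], ["B", "C"], ["D"]]   # topological levels of the graph

beliefs = {}
with ThreadPoolExecutor() as pool:
    for level in LEVELS:
        # All nodes in a level depend only on earlier levels, so they can
        # run concurrently; this is the simple SMP task mapping in the text.
        tasks = {n: pool.submit(NET[n][1], *(beliefs[p] for p in NET[n][0]))
                 for n in level}
        beliefs.update({n: t.result() for n, t in tasks.items()})

print(beliefs)
```

In a large network, the per-level synchronization and the shared `beliefs` table are exactly where the memory-bandwidth and message-passing constraints described above begin to bite.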

One promising massively parallel approach is associative processing, which has been shown to approximate Bayesian inference and which has the potential for huge levels of parallelism. With morphic cores in heterogeneous multi-core structures, such massively parallel implementations of Bayesian networks become relevant. Another interesting variation is to eliminate synchrony: inter-module update messages arrive at random times, and computation within a module proceeds at its own pace, updating its internal estimates when it receives update messages and otherwise continuing without them. More study is needed to explore radical new implementation technologies, such as analog-based soft constraint satisfaction, and how they may be used to do inference.
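As a sketch of the asynchronous variation, the toy below runs two modules in their own threads; each folds in an inter-module update message when one has arrived and otherwise keeps computing at its own pace. The consensus-style update rule is an invented stand-in for a real inference update.

```python
# Minimal sketch of eliminating synchrony: each module polls its inbox,
# folds in update messages when present, and otherwise keeps iterating.
# The damping update rule and the two-module setup are invented.
import queue
import random
import threading
import time

def module(name, inbox, outbox, steps=200):
    estimate = random.random()
    for _ in range(steps):
        try:
            msg = inbox.get_nowait()             # update message available?
            estimate = 0.5 * (estimate + msg)    # fold it into local estimate
        except queue.Empty:
            pass                                 # none yet: continue anyway
        outbox.put(estimate)                     # send our estimate onward
        time.sleep(random.uniform(0, 0.001))     # modules run at their own pace
    print(f"{name}: {estimate:.3f}")

q_ab, q_ba = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=module, args=("M1", q_ab, q_ba)),
           threading.Thread(target=module, args=("M2", q_ba, q_ab))]
for t in threads: t.start()
for t in threads: t.join()
```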

Algorithm Family Summary

Technique                  Parallelism (Threads)  Inter-Thread Communication  Computational Precision  Storage/Node  State of the Art
Analytic                   Moderate               Moderate                    High                     Moderate      Mature
Random Sampling            High                   Low                         Moderate                 Moderate      Mature
Distributed / Hierarchies  High                   Low                         Low                      Low           Preliminary