An Introduction to Biologically-Inspired Visual Recognition


Universität Hamburg, Department Informatik
Knowledge Technology, WTM

Seminar Paper: Biologically-Inspired Artificial Intelligence
Sebastian Starke, Matr.Nr


Abstract

This paper introduces the topic of visual recognition from a biologically-inspired point of view. Since humans can efficiently detect and recognize edges, structures and motion, and are able to classify objects in highly complex and dynamic real-world environments, the paper briefly explains the anatomy and functionalities of the primate visual system in order to give an intuitive understanding of the underlying processes. Subsequently, the opportunities and limitations of the resulting computational approaches are described. Edge detection, a fundamental lower-level problem, is considered by comparing different conventional algorithms with novel biologically-inspired approaches that use multilevel surround inhibition or hexagonal pixels processed by spiking neural networks. Finally, the hierarchical vision architecture HMAX for the higher-level problem of object recognition is discussed, together with the improvements that can be obtained by sparsity-regularization.

Contents

1 Introduction and Motivation
2 The Primate Visual System
2.1 Anatomy and Functionalities
2.2 Inspirations for Computational Visual Recognition
3 Edge Detection
3.1 Conventional Operators
3.2 Multilevel Surround Inhibition
3.3 Hexagonal Pixels and Spiking Neural Networks
4 Object Recognition
4.1 HMAX: The Standard Model
4.2 Sparsity-Regularization
5 Conclusion
Bibliography

1 Introduction and Motivation

"Here lay a way to formulate the purpose of vision: building a description of the shapes and positions of things from images. Of course, that is by no means all that vision can do; it also tells about the illumination and about the reflectances of the surfaces that make the shapes, their brightness and colors and visual textures, and about their motion. But these things seemed secondary; they could be hung off a theory in which the main job of vision was to derive a representation of shape." (David Marr)

This quote captures the problem of vision in an intuitive but very meaningful way. The brain's visual system provides remarkably good abilities in visual information processing by applying highly complex neuronal processing, and it seems to perform this processing almost effortlessly. The ultimate goal of computational image processing is therefore to find algorithms and models that imitate this underlying behaviour as well as possible. Nature often provides very good solutions for problems, so that it is beneficial to adopt the underlying patterns and processes where possible. For example, the depth and distance of objects are commonly determined by stereo vision, using two cameras and integrating the correlated information of two separate images. This principle is clearly adopted from nature: two eyes see independently, but are connected to the same visual system for neuronal processing. However, in contrast to cameras, which merely give a raw projection of the visual information of the environment, the visual system analyses lower-level features like edges, shapes and forms, textures and colors, motion and depth, and provides subjective awareness of higher-level categories like objects and their interpretation. Fig. 1 shows two examples of illusory contour perception that intuitively demonstrate such visual information processing in the brain. Although there is no change in intensity or true luminance at certain invisible contours, the brain perceives specific shapes, even at different depth layers.

Figure 1: Illusory Contour Perception: An illusory white square overlapping four perceived circles (left image) and the famous Kanizsa Triangle (right image)

In algorithmic design, biologically-inspired approaches are more likely to meet the demands for robustness, selectivity and speed that the primate visual system fulfils and that many applications require, such as edge and motion detection, object and activity recognition or vision-based navigation. The neuronal information processing in the brain has therefore been studied extensively in recent years. Intrinsically, it can be modelled as a complex hierarchical processing system that is subdivided into various layers consisting of cells of different complexity [Hubel and Wiesel, 1968]. Each of these layers fulfils specific functionalities [Masland, 2001] which are activated by spiking neurons or spatio-temporal receptive fields of retinal ganglion cells [Hosoya et al., 2005, Hubel and Wiesel, 1968]. These receptive fields have been found to react to different stimuli like contrast change or orientation features [Hosoya et al., 2005, Kandel and Schwartz, 1981].

Earlier research on the human retina has also shown that the cone photoreceptors are arranged in a hexagonal lattice. Nevertheless, many common approaches to edge detection still use rectangular lattices, which raises the question of why, given that nature provides another well-working solution. This observation shows that the neighbourhood lattice integrated around a specific pixel or cell has an important influence on the behaviour and efficiency of the visual processing model, both in accuracy and in computational complexity. Therefore, [Kerr et al., 2011] imitated the mechanism of hexagonally shaped receptive fields, modelled by hexagonal pixels which are then processed by spiking neural networks. Also, the Canny-Edge-Detector [Canny, 1983] could be extended by a neighbourhood-based and biologically-inspired technique, namely multilevel surround inhibition, which suppresses texture edges that do not represent important object boundaries [Papari et al., 2007].

Moreover, conventional algorithms for object recognition and classification typically require many very well chosen training samples while still lacking adaptivity. Therefore, hierarchical models, especially HMAX (Hierarchical Model and X), have been studied extensively in recent years. They try to mimic the nature of the visual cortex and thereby perform generic object recognition [Hubel and Wiesel, 1968, Riesenhuber and Poggio, 1999]. Biologically-inspired sets of features have been discovered that achieve higher robustness and selectivity while requiring only very few examples in the learning phase [Serre et al., 2005, Ghodrati et al., 2012]. It was also possible to apply sparsity-regularization to HMAX and thus to obtain sparser firing patterns [Zhang et al., 2014]. A major outcome of this modification is that no supervised labelling is needed for learning higher-level features of objects. Finally, it will be shown how the mentioned approaches and improvements are able to outperform several state-of-the-art algorithms on a variety of test images from different recognition and classification categories.

2 The Primate Visual System

The primate visual system has been studied for decades, but only in recent years has research on the visual cortex had a remarkable impact from a neurological or psychological point of view. Most of these investigations were carried out on the brain of the macaque monkey, in which several sensory-motor and cortical areas were found to be homologous to the corresponding areas in the human brain. Still, there is high uncertainty about the functionalities and the interplay of certain cortical areas with respect to visual awareness and consciousness, as well as about the general information processing under various stimuli. This chapter is largely based on recent reviews of the primate visual system found in [Krüger et al., 2013, Tong, 2003, Yang, 2007].

2.1 Anatomy and Functionalities

The stimuli of the environment are captured by the eye, which from a functional point of view can be compared to a camera. From the retina, with its hexagonally arranged photoreceptors and its ganglion cells, the impulses are transmitted through the optic nerves, which meet at the optic chiasm. Because the nerve fibres partially cross there, each half of the visual field is processed by the opposite cerebral hemisphere. Furthermore, around 90% of the nerve fibres of the optic tract project to the lateral geniculate nucleus (LGN), which in turn is connected via the optic radiation to the primary visual cortex, also known as V1 or striate cortex. This area is currently the most extensively studied part of the visual cortex. The remaining 10% are transmitted to various areas of the extrastriate cortex, which covers the area around V1. The most important regions in the extrastriate cortex are V2 to V4 as well as MT (V5, middle temporal area) and MST (medial superior temporal area).
Since the visual cortex as a whole covers the areas of psychological visual information processing, any damage along the pathway from the eyeball to this region (the precortical processing) will result in a loss of conscious visual perception. The majority of information transmission between layers in the visual cortex is sent by feedforward connections from V1 to V4. Areas beyond V4 typically provide feedback projections to V1 and the LGN, or are even bidirectionally connected. Fig. 2 gives a higher-level description of the anatomy of the brain from the eyeball to the visual cortex, while Fig. 3 shows the locations and connectivity of cortical areas in the primate visual cortex.

Figure 2: Higher-level anatomy of the brain from [Pearson-Education, 2004]

Figure 3: Cortical areas in the primate visual cortex from [Krüger et al., 2013]

All information transmission is carried by cells with receptive fields which are activated under various stimuli. They were presented in the famous paper of Hubel and Wiesel [Hubel and Wiesel, 1968]. These cells are located throughout the brain to enable neuronal processing and can roughly be divided into simple and complex cells. Simple cells respond to light patterns that have a particular orientation, size and position. Owing to their clearly structured excitatory and inhibitory regions, they respond best to a bar or an edge for which most light falls into the excitatory region and only little into the inhibitory region. From a technical point of view, it was possible to show that Gabor-Wavelet transformations approximate the behaviour of simple cells. Complex cells are also sensitive to the orientation, but not to the position of the stimulus, which has proven to be most effective if it is flickering or moving. Furthermore, the size of the receptive fields increases with their hierarchical distance, from the early layers in the striate cortex to the later layers in the extrastriate cortex. Fig. 4 illustrates the impulse responses of simple and complex cells to different stimuli.

Figure 4: Stimuli input impulses in simple and complex cells from [Hubel and Wiesel, 1968, Yang, 2007]

Considering scene interpretation by the visual cortex, the receptive field cells in V1 mainly process visual information related to edges, line endings, motion, color or disparity. This cortical area is therefore responsible for the raw perception of vision. V2 is also sensitive to orientation, color and disparity, but shows a more sophisticated behaviour by processing correlated perception in terms of texture-defined and illusory contours or relative depth between shapes. The subsequent cortical layers are highly interconnected. They continue integrating the visual information transmitted from V1 and V2 into higher-level responses that provide direction and speed of motion, connectivity of shapes, curvature selectivity or luminance-invariant perception of hue. These later layers (e.g. MT/MST) are not solely responsible for perception, but also influence motor control, such as smooth eye movements, by feedback to earlier layers. From a higher-level perspective on the functionality of the visual cortex, there are two important pathways of visual information processing, namely the ventral pathway (V1 → V2 → (V3) → V4) and the dorsal pathway (V1 → V2 → (MT)).
The ventral pathway, also known as the what-stream, is responsible for object recognition in the inferior temporal cortex (IT). Its subareas TEO (posterior inferior IT) and TE (anterior IT) are able to process highly complex visual features while providing position- and size-invariant shape selectivity. The dorsal pathway, described as the where-stream, mostly projects into the frontal lobe, bridging between the visual and sensorimotor cortical areas. The corresponding receptive fields typically react to visual motion, optical flow, and first- or second-order disparity signals. The cortical areas in the dorsal pathway thus provide higher-level responses of cognitive reactive behaviour to various stimuli of the environment.

2.2 Inspirations for Computational Visual Recognition

The exact functionality of many areas in the primate visual cortex and the connectivity between them are still largely theoretical and sometimes even controversial, given the differing results of lesion experiments. However, with a focus on biologically-inspired approaches to computational visual processing, the neurological behaviour within the primate visual cortex offers some interesting design principles. The various layers from V1 and V2 along the dorsal and ventral pathways roughly form an ordered sequence, which suggests the use of hierarchical models. In terms of computational complexity, receptive fields suggest sparse and spatially partitioned neural processing, while the complexity of features increases generically with each subsequent layer. This generic approach comes with another beneficial principle regarding learning efficiency: robust layers that have already been designed, with features proven to be invariant, can be inherited. Also, since the size of receptive fields increases at layers that are sensitive to more sophisticated features, adopting this principle helps to avoid the general problem of overfitting.

Moreover, even the world from which the visual stimuli originate is hierarchical. Objects and their complex and simple features share the same hierarchical layers in the environment. Consequently, they are correlated with each other and can similarly be processed by a correspondingly limited subset of layers in the designed computational hierarchical model. Furthermore, different visual processing tasks like the selection and connection of edges and shapes, color perception or the recognition of motion can be processed separately and can also be used for different purposes at higher layers.
This separation of information channels also provides robustness in case certain visual information is unavailable. Additionally, it results in a more efficient combinatorial feature representation, since creating a unique pattern of features for each object would lead to an explosion in combinatorial complexity. Also, previously unseen and therefore unknown objects can be learned and represented easily by combinations of features provided by separate lower layers (how such combinations are formed is also known as the binding problem). Moreover, while most computational approaches to visual recognition are based on pure feed-forward architectures, the visual cortex indicates that feedback connections play an important role as well. This is because vision is processed not only from the stimuli of the environment, but also from any prior knowledge that has been learned. Lastly, the question remains whether nature itself preferably suggests optimizing the functionalities of the layers in the visual cortex or their hierarchical collaboration and connectivity. This is the trade-off in designing biologically-inspired approaches that try to imitate the behaviour of the brain.

3 Edge Detection

From both a biological and a computational point of view, edge detection is a fundamental task that enables any higher-level visual processing for scene interpretation. In the primate visual cortex, this visual information is mainly processed in V1. Edges in images basically depict the boundaries of objects, which are typically characterized by abrupt changes in local intensity. Furthermore, solving this problem can also be used to filter out noisy high-frequency content in the image while preserving important structural properties. In order to provide some historical and technical background, this chapter first reviews some conventional operators used for edge detection. Afterwards, the biologically-inspired concept of multilevel surround inhibition as well as another approach, which processes hexagonally shaped pixel lattices by spiking neural networks, are discussed.

3.1 Conventional Operators

Over the last decades, there have been numerous publications concerning the problem of edge detection. The most prominent approaches are the Roberts-Cross-Operator [Roberts, 1963], the Sobel-Operator [Sobel and Feldman, 1968], the Laplacian-of-Gaussian [Marr and Hildreth, 1980] and the famous Canny-Edge-Detector [Canny, 1983]. These are either purely gradient-based, calculating an approximation of the first-order derivative, or, in a more sophisticated manner, try to find zero-crossings of the second-order derivative convolved with a smoothing filter. Both kinds of models, however, aim at finding the direction vector that denotes the maximum rate of change.

The Roberts-Cross-Operator was one of the first solutions for edge detection and provides a simple and fast computation of the gradient. The partial first-order derivatives $g_x$ and $g_y$ are efficiently computed using the two 2×2 convolution kernel masks depicted in Fig. 5.
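As a minimal sketch of how such kernel masks yield a gradient estimate, the following NumPy snippet applies a pair of masks to an image and combines the two partial derivatives into a Euclidean magnitude and an arctangent direction. The mask values are the common textbook forms of the Roberts cross and Sobel masks and are an assumption here, since the masks of the original figures are not reproduced in this text; the helper names `correlate_valid` and `gradient` are hypothetical.

```python
import numpy as np

# Assumed textbook masks; sign and orientation conventions vary between sources.
ROBERTS_X = np.array([[1, 0], [0, -1]], dtype=float)
ROBERTS_Y = np.array([[0, 1], [-1, 0]], dtype=float)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def correlate_valid(image, mask):
    """Slide `mask` over `image` without padding (cross-correlation)."""
    mh, mw = mask.shape
    oh, ow = image.shape[0] - mh + 1, image.shape[1] - mw + 1
    out = np.zeros((oh, ow))
    for i in range(mh):
        for j in range(mw):
            out += mask[i, j] * image[i:i + oh, j:j + ow]
    return out


def gradient(image, mask_x, mask_y):
    """Return gradient magnitude and direction (radians) for a mask pair."""
    gx = correlate_valid(image, mask_x)
    gy = correlate_valid(image, mask_y)
    return np.hypot(gx, gy), np.arctan2(gy, gx)


# Toy image: a vertical step edge between columns 2 and 3.
step = np.zeros((5, 6))
step[:, 3:] = 1.0

sobel_mag, sobel_dir = gradient(step, SOBEL_X, SOBEL_Y)
roberts_mag, roberts_dir = gradient(step, ROBERTS_X, ROBERTS_Y)
```

On this toy image, both operators respond only in the columns next to the step. The loop-based correlation is chosen for readability; a library convolution routine would be the more usual choice in practice.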
The gradient magnitude $\mathrm{Mag}(\nabla g)$ is then given by (1), and the angle of the gradient direction with the maximum rate of change is calculated by (2).

$$\mathrm{Mag}(\nabla g) = |\nabla g(x, y)| = \sqrt{g_x^2 + g_y^2} \approx |g_x| + |g_y| \tag{1}$$

$$\mathrm{Ang}(\nabla g) = \arctan\left(\frac{g_y}{g_x}\right) \tag{2}$$

According to the masks, this operator responds optimally to edges that run at an angle of 45° within the pixel neighbourhood.

Figure 5: Convolution kernel masks ($g_x$ and $g_y$) used by the Roberts-Cross-Operator

Furthermore, the Sobel-Operator extends this approach by using the 3×3 convolution kernel masks shown in Fig. 6, but instead of approximating the partial first-order derivatives diagonally, the gradients $g_x$ and $g_y$ are computed with respect to the X- and Y-axis. Therefore, this operator has its maximum response on edges that run horizontally or vertically. The gradient magnitude and the direction with the maximum rate of change are computed in the same way as for the Roberts-Cross-Operator, by (1) and (2).

Figure 6: Convolution kernel masks ($g_x$ and $g_y$) used by the Sobel-Operator

The major disadvantage of both the Roberts-Cross-Operator and the Sobel-Operator is that their overall performance depends on the edge orientation, meaning that they can perform very poorly for unfavourably oriented edges, in particular when the image is corrupted by Gaussian-distributed white noise. The Laplacian-of-Gaussian (LoG, also called the Marr-Hildreth-Operator), in contrast, obtains rotation invariance by considering zero-crossings of the second derivative and additionally applies a smoothing filter, realized as a convolution with a Gaussian kernel mask. The Laplacian operator at a certain pixel of the intensity image is given by (3), and the corresponding Gaussian filtering function is defined as (4).

$$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} \tag{3}$$

$$G(x, y) = \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \tag{4}$$

Interchanging the order of differentiation and convolution in (5) then gives rise to (6), where $c$ is used to normalize the sum of the mask elements.

$$\nabla^2 \big( G(x, y) * g(x, y) \big) = h(x, y) * g(x, y) \tag{5}$$

$$h(x, y) = \nabla^2 G(x, y) = c \left( \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \right) \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \tag{6}$$

Because of its shape, the resulting function is also known as the Mexican-Hat-Operator. A 5×5 approximation yields the convolution kernel mask shown in Fig. 7.
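The Mexican-hat shape of the LoG can be made concrete by sampling the analytic Laplacian of the Gaussian on a small grid. The sketch below is only an illustration under stated assumptions: the function name `log_kernel`, the default `sigma` and the way the mask is forced to sum to zero (subtracting the mean instead of choosing a constant $c$) are choices made here and are not taken from the cited papers.

```python
import numpy as np


def log_kernel(size=5, sigma=1.0):
    """Sample the Laplacian of a 2-D Gaussian on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    h = (r2 - 2.0 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2.0 * sigma ** 2))
    # Assumption: shift the mask so that its elements sum to zero, which makes
    # the response vanish on regions of constant intensity.
    return h - h.mean()


kernel = log_kernel(5, 1.0)
```

The resulting mask has a strongly negative centre surrounded by a positive ring. Convolving an image with it and locating the zero-crossings of the result gives the edge candidates of the Marr-Hildreth scheme.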
In fact, all edge detection algorithms are in some way inspired by the processing of the primate visual system, since they integrate the direct neighbourhood around each pixel. Nevertheless, the Laplacian-of-Gaussian was the first approach that was directly biologically-inspired, because it applies a convolution with a smoothing filter, a processing step that is also performed by the LGN. The main effect of the smoothing filter is a remarkable improvement in robustness to outlier noise, a robustness that the primate visual system also provides.

Figure 7: 5×5 approximation of the LoG convolution kernel mask

Lastly, the Canny-Edge-Detector is considered one of the best edge detectors due to its high robustness to noise and its reliable representation of edges. In contrast to the other algorithms, a multi-stage process is applied. First, the image is smoothed by a Gaussian convolution filter. Afterwards, a gradient-based operator like the Roberts-Cross-Operator or the Sobel-Operator is used to highlight regions of the intensity image with high gradient magnitudes, which give rise to edges. What distinguishes this algorithm from all others is that it connects those regions of high gradient magnitude, aiming to return continuous lines. This is achieved by non-maximum suppression, which sets to zero all pixels that are not on the top of a ridge, and by two hysteresis thresholds. The high threshold marks the beginning of an edge, and the low threshold decides where the edge ends, so that an edge is followed as long as the gradient magnitude stays above the low threshold. This also suppresses noise which might otherwise be detected as an additional edge.

Fig. 8 illustrates the performance of the four presented conventional edge detection algorithms in the presence of noise. While the Roberts-Cross-Operator and the Sobel-Operator tend to detect either too few or too many edges, the Canny-Edge-Detector succeeds in filtering many noisy edge fragments and returns more separable regions. The Laplacian-of-Gaussian overall detects fewer edges than the Canny-Edge-Detector, but provides clearly structured regions and traceable lines of salient regions in the image.

Figure 8: Conventional edge detectors from [Juneja and Sandhu, 2009]
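The hysteresis step of the Canny-Edge-Detector can be sketched in isolation as follows. This is a simplified illustration rather than a full Canny implementation: smoothing, gradient computation and non-maximum suppression are assumed to have produced the `magnitude` array already, and the function name `hysteresis` as well as the use of 8-connectivity are assumptions made for this sketch.

```python
from collections import deque

import numpy as np


def hysteresis(magnitude, low, high):
    """Keep pixels >= high, plus pixels >= low that connect to them."""
    strong = magnitude >= high
    candidate = magnitude >= low
    edges = strong.copy()
    rows, cols = magnitude.shape
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        i, j = queue.popleft()
        # Grow the edge into 8-connected neighbours above the low threshold.
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and candidate[ni, nj] and not edges[ni, nj]):
                    edges[ni, nj] = True
                    queue.append((ni, nj))
    return edges


# One strong pixel, two weak pixels attached to it, one isolated weak pixel.
demo = np.array([[0.9, 0.5, 0.5, 0.0, 0.5]])
demo_edges = hysteresis(demo, low=0.4, high=0.8)
```

In the toy example, the two weak pixels adjacent to the strong one are kept as part of the same edge, while the isolated weak pixel on the right is discarded as noise.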

3.2 Multilevel Surround Inhibition

While noise can be suppressed by smoothing or hysteresis thresholding, a major drawback shared by all presented conventional edge detection algorithms is their inability to distinguish luminance or intensity changes caused by an object's texture from those caused by object boundaries. Studies on the human visual system have discovered a mechanism called non-classical receptive field inhibition (surround suppression) [Nothdurft et al., 1999, Papari et al., 2006] which performs such additional visual processing. Further experimental studies on the human visual system in [Papari et al., 2006] have also shown that regions of similar edge stimuli are more likely to be perceived as texture than as object contours. Multilevel surround inhibition is a biologically-inspired technique which aims to imitate this mechanism in order to obtain stronger region boundaries and object contours and to suppress texture edges. [Papari et al., 2007] applied this technique as an additional computational step to the Canny-Edge-Detector, taking into account the amount of texture $T(g_{x,y})$ around a pixel $g_{x,y}$, controlled by a parameter $\alpha$ that defines the inhibition strength. The influence of $\alpha$ can be considered an additional thresholding trade-off: too strong inhibition discards weak contours, while too weak inhibition does not reliably suppress texture edges. Fig. 9 illustrates the effect of setting different levels for thresholding and inhibition strength.

Figure 9: Effect of different levels for thresholding and inhibition strength: high to low from left to right from [Papari et al., 2007]

While the hysteresis thresholding of the Canny-Edge-Detector produces two binary maps, multilevel surround inhibition generalizes this concept by generating $n$ thresholds $t_k$, $1 \le k \le n$, on the gradient magnitude, each of which is associated with an inhibition level $\alpha_k$, $1 \le k \le n$.
Afterwards, the $n$ obtained binary maps are combined by an iterative connectivity-based algorithm. In more detail, $T(g_{x,y})$ is computed as a weighted average of the gradient magnitudes in a local neighbourhood, so that it is high where many similar edge stimuli occur close together. Accordingly, (7) denotes the inhibited gradient magnitude $\mathrm{Mag}_I(g_{x,y})$, and (8) describes the computation of the regions $Q_k$ which are required in order to combine the binary maps.

$$\mathrm{Mag}_I(g_{x,y}) = \mathrm{Mag}(\nabla g_{x,y}) - \alpha\, T(g_{x,y}) \tag{7}$$

$$Q_k = \{\, [\mathrm{Mag}, T]^T \mid \mathrm{Mag} - \alpha T > t_k \,\} \tag{8}$$
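The inhibition of (7) and the thresholding behind (8) can be illustrated with a small NumPy sketch. It makes simplifying assumptions that are not taken from [Papari et al., 2007]: the texture term uses a uniformly weighted square neighbourhood (centre pixel excluded) instead of the weighting of the original method, the iterative combination of the binary maps is omitted, and the names `texture_term`, `inhibited_magnitude` and `binary_maps` are hypothetical.

```python
import numpy as np


def texture_term(magnitude, radius=2):
    """Mean gradient magnitude in a (2r+1) x (2r+1) window, centre excluded."""
    padded = np.pad(magnitude, radius, mode="edge")
    rows, cols = magnitude.shape
    total = np.zeros((rows, cols))
    window = 2 * radius + 1
    for di in range(window):
        for dj in range(window):
            if di == radius and dj == radius:
                continue  # the pixel itself does not inhibit itself
            total += padded[di:di + rows, dj:dj + cols]
    return total / (window ** 2 - 1)


def inhibited_magnitude(magnitude, alpha, radius=2):
    """Inhibited magnitude: Mag - alpha * T."""
    return magnitude - alpha * texture_term(magnitude, radius)


def binary_maps(magnitude, thresholds, alphas, radius=2):
    """One binary map per (t_k, alpha_k) pair."""
    return [inhibited_magnitude(magnitude, a, radius) > t
            for t, a in zip(thresholds, alphas)]


texture = np.ones((7, 7))      # dense, uniform edge responses (texture-like)
isolated = np.zeros((7, 7))    # a single isolated response (contour-like)
isolated[3, 3] = 1.0
```

With `alpha = 1.0`, the uniform `texture` response is inhibited to zero everywhere, whereas the isolated response keeps its full magnitude because its surround contains no other edge stimuli.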

Fig. 10 illustrates that most texture edges detected by the standard Canny-Edge-Detector are successfully suppressed by multilevel surround inhibition. The important object boundaries are clearly visible, which can significantly ease the tasks of segmentation and object recognition.

Figure 10: Improvement of multilevel surround inhibition over the standard Canny-Edge-Detector from [Papari et al., 2007]

3.3 Hexagonal Pixels and Spiking Neural Networks

In the human visual system, the cone photoreceptors in the retina are arranged in a hexagonal lattice, from where the perceived visual stimuli of the environment are channelled via the LGN to the visual cortex. This stream of information is conveyed by receptive field cells that respond with action potentials, also called spikes. Inspired by this mechanism, [Kerr et al., 2011] presented a novel approach to edge detection based on spiking neural networks, which models the behaviour of the hexagonally arranged, near-circular receptive field cells in the human visual system. The images are converted from a standard rectangular lattice into a hexagonal lattice pixel representation using the technique presented in [Middleton and Sivaswamy, 2001]. A significant property of hexagonal lattices is that the distance from a centre pixel to each of its neighbours is identical (normalized to 1). A hexagonal pixel is created by clustering 56 sub-pixels from a corresponding rectangular pixel block, as illustrated in Fig. 11. Each hexagonal pixel has to fulfil the properties that there is no overlap or gap between neighbouring sub-pixels of multiple hexagonal pixels and that all six edges are of approximately the same length. The pixel intensity is calculated as the average over all 56 sub-pixels. This transformation enabled a higher sampling efficiency and hence a better computational performance.

Figure 11: Hexagonal pixel from [Kerr et al., 2011]

Furthermore, a spiking neural network, shown in Fig. 12, was used which is based on the conductance-based I&F (integrate-and-fire) model. It behaves similarly to the spiking neuron model proposed in [Hodgkin and Huxley, 1952], but has a lower computational complexity. The applied model consists of 3 different layers. The receptor layer represents the cone photoreceptors, each of which corresponds to a hexagonal pixel in the image. These receptive fields are connected to the intermediate layer, consisting of 4 different direction-selective neurons, which are finally integrated by a single neuron in the output layer. Depending on their firing rate, the neurons in the output layer generate the corresponding edge image.

Figure 12: Model of the Spiking Neural Network from [Kerr et al., 2011]

For experimental studies, [Kerr et al., 2011] compared their hexagonal SNN to a square SNN proposed in [Wu et al., 2007], which applies square receptive fields to conventional images based on square pixels. The results are shown in Fig. 13. The upper row shows the output over the whole image, while the lower row depicts a zoomed section marked in the original image. The performance over all edge types was measured using the Figure of Merit (FoM) [Baddeley et al., 1979], which considers missing valid edge points and false-positive classifications due to noise fluctuations. The hexagonal SNN obtained a better signal-to-noise ratio than the square SNN, overall and for all edge types, and it also yielded notably better results in areas of high noise.

Figure 13: Performance of hexagonal SNN over square SNN from [Kerr et al., 2011]
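To give an impression of how a conductance-based integrate-and-fire neuron turns input strength into a firing rate, the following sketch integrates a single neuron with constant excitatory and inhibitory conductances using the forward Euler method. It is a generic textbook-style model and not the network of [Kerr et al., 2011]: all parameter values, the absence of a refractory period and the function name `count_spikes` are assumptions made purely for illustration.

```python
def count_spikes(g_ex, g_ih=0.0, duration_ms=100.0, dt=0.1):
    """Count spikes of a conductance-based I&F neuron (forward Euler)."""
    # Assumed parameters in arbitrary but consistent units.
    c_m = 1.0                              # membrane capacitance
    g_l = 0.1                              # leak conductance
    e_l, e_ex, e_ih = -70.0, 0.0, -75.0    # reversal potentials (mV)
    v_th, v_reset = -60.0, -70.0           # threshold and reset potential (mV)

    v = v_reset
    spikes = 0
    for _ in range(int(round(duration_ms / dt))):
        # Each conductance pulls the membrane towards its reversal potential.
        current = g_l * (e_l - v) + g_ex * (e_ex - v) + g_ih * (e_ih - v)
        v += dt / c_m * current
        if v >= v_th:       # threshold crossing: emit a spike and reset
            spikes += 1
            v = v_reset
    return spikes


rates = {g: count_spikes(g) for g in (0.0, 0.01, 0.05, 0.2)}
```

In this toy setting, a weak excitatory conductance never drives the membrane potential to threshold, while stronger excitation yields progressively more spikes, and added inhibition lowers the count again. A rate code of this kind is what allows an output neuron to express edge strength through its firing rate.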

4 Object Recognition

While the primary visual cortex, i.e. V1, is mainly responsible for edge detection and the perception of lower-level features, the whole connectivity along the ventral pathway accomplishes the task of object recognition by integrating higher-level features. Object recognition means finding a proper segmentation of edges in order to determine prominent contour lines, which are required to perform classification using distinctive features. Conventional algorithms solve this problem by computing several subimages at different positions and scales to achieve transformation invariance before performing classification. This section discusses a biologically-inspired hierarchical model for object recognition, namely HMAX, introducing the standard model of the architecture as well as the improvements which can be obtained by sparsity-regularization.

4.1 HMAX: The Standard Model

Inspired by the work of [Hubel and Wiesel, 1968] on the macaque monkey brain, [Riesenhuber and Poggio, 1999] proposed a computational hierarchical model which aims to imitate the mechanism of object recognition in the visual cortex. The name HMAX (Hierarchical Model and X), however, was assigned by [Tarr, 1999]. Basically, HMAX is a straight feed-forward architecture which models the main properties along the ventral pathway, i.e. from V1 to IT. It does not include local feedback loops, which are not necessarily needed for its basic processing but are known to play a key role in cortical areas. The modelled properties are an increasing size and complexity of receptive fields and stimuli-selective neurons as well as a position-, scale- and orientation-invariant perception of features and pattern selectivity. This is obtained by simple cells that have the same orientation-selective receptive fields and are located at different positions, but are connected to the same corresponding complex cells.
Additionally, the output passed from simple to complex cells is computed by one of two pooling mechanisms (MAX or SUM) in order to suppress noise and give high relevance to strong and important afferent inputs. MAX performs a nonlinear maximum operation which takes the strongest afferent input as the postsynaptic response. The key idea is to keep the variation of cell responses small and thereby to match the best stimulus feature. Moreover, MAX responses have been shown to remain selective to objects that appear together with multiple other objects, because minor afferents are ignored. SUM performs an equally weighted linear summation of the afferent inputs to obtain an isotropic response. In order to enable the perception of higher-level features such as poses or facial expressions as well as invariance to illumination or perspective, HMAX also includes a learning network based on Gaussian Radial Basis Functions (GRBF) [Poggio, 1990]. This network learns from a set of samples of view-tuned unit (VTU) cells with differently weighted input-output pairs. However, the model is mainly based on MAX operations, while SUM operations are applied specifically to the higher-level VTU cells, whose afferents are already particularly sensitive to specific stimulus patterns.

In more detail, Fig. 14 shows the hierarchical architecture proposed in [Riesenhuber and Poggio, 1999]. The model basically consists of five layers, namely S1, C1, S2, C2 and VTU. The S1 layer models the visual processing from the retina, with a receptive field size of greyscale pixels (5°), to the cortical area V1, resembling the properties and the mechanism of simple cells that are sensitive to differently oriented bars or edges. The S1 units are two-dimensional Gaussian filters oriented at 0°, 45°, 90° and 135°, where each pixel is centered and normalized such that an S1 activity between -1 and 1 is obtained. The responses of simple S1 cells of the same orientation within a certain pooling range are then MAX-pooled by complex C1 cells of larger receptive field size while preserving feature specificity. C1 cells are then pooled either by S2 cells, which are sensitive to co-responding features of C1 cells, or by larger C2 cells, which are sensitive to the same features as the C1 cells but are required to combine features of higher complexity. S2 can be considered the feature dictionary of HMAX, whose entries are combinations of C1 cells of different orientations. S2 cells are in turn MAX-pooled by C2 cells in order to achieve size and position invariance. Accordingly, the mechanism of the C2 layer can be compared to the visual processing of the higher layers in the extrastriate cortex, i.e. V4 or IT. Lastly, the C2 cells feed into the VTU layer, pooled by SUM operations, in order to achieve both GRBF-based learning and an invariant recognition of complex features of different objects.

Figure 14: Hierarchical architecture of HMAX from [Riesenhuber and Poggio, 1999]
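The S1, C1 and VTU stages described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of [Riesenhuber and Poggio, 1999]: the filter size, the Gaussian widths, the pooling range of 5, the 16x16 test image and all function names (oriented_filter, s1_layer, c1_layer, vtu_response) are hypothetical choices made for the example, and the S2 and C2 stages are left out. Each filter is shifted to zero mean and scaled to unit norm, and each response is divided by the norm of the image patch, so that S1 activities stay between -1 and 1.

```python
import numpy as np

ORIENTATIONS = (0, 45, 90, 135)  # degrees, as in the S1 layer described above


def oriented_filter(theta_deg, size=7, sigma_long=3.0, sigma_short=1.0):
    """Elongated 2-D Gaussian whose long axis points along theta_deg.

    The filter is shifted to zero mean and scaled to unit norm.
    Size and widths are illustrative assumptions.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    t = np.deg2rad(theta_deg)
    u = x * np.cos(t) + y * np.sin(t)    # coordinate along the bar
    v = -x * np.sin(t) + y * np.cos(t)   # coordinate across the bar
    g = np.exp(-(u ** 2 / (2 * sigma_long ** 2) + v ** 2 / (2 * sigma_short ** 2)))
    g -= g.mean()
    return g / np.linalg.norm(g)


def s1_layer(image, filters):
    """Patch-normalized filter responses, shape (n_orientations, H', W')."""
    size = filters[0].shape[0]
    h, w = image.shape
    out = np.zeros((len(filters), h - size + 1, w - size + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[i:i + size, j:j + size]
            norm = np.linalg.norm(patch)
            if norm == 0.0:
                continue  # empty patch: leave the response at zero
            for k, f in enumerate(filters):
                out[k, i, j] = np.sum(f * patch) / norm
    return out


def c1_layer(s1, pool=5):
    """MAX-pool each orientation map over non-overlapping pool x pool ranges."""
    n, h, w = s1.shape
    h2, w2 = h // pool, w // pool
    blocks = s1[:, :h2 * pool, :w2 * pool].reshape(n, h2, pool, w2, pool)
    return blocks.max(axis=(2, 4))


def vtu_response(features, center, sigma=0.5):
    """Gaussian radial basis function tuned to a stored feature vector."""
    diff = np.asarray(features) - np.asarray(center)
    return float(np.exp(-np.sum(diff ** 2) / (2 * sigma ** 2)))


def bar_image(col, size=16):
    """Test image containing a single vertical bar in column `col`."""
    img = np.zeros((size, size))
    img[:, col] = 1.0
    return img


FILTERS = [oriented_filter(t) for t in ORIENTATIONS]

# The same bar at two positions; the strongest C1 response per orientation
# serves as a small position-tolerant feature vector.
features_a = c1_layer(s1_layer(bar_image(8), FILTERS)).max(axis=(1, 2))
features_b = c1_layer(s1_layer(bar_image(9), FILTERS)).max(axis=(1, 2))

print("preferred orientation:", ORIENTATIONS[int(np.argmax(features_a))])
print("VTU response to shifted bar:", vtu_response(features_b, features_a))
```

Because the maximum keeps only the strongest afferent, shifting the bar by one pixel leaves the feature vector unchanged, which is the position tolerance that MAX pooling is meant to provide. A SUM pooling stage would replace `blocks.max(...)` by `blocks.sum(...)`.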

4.2 Sparsity-Regularization

One major disadvantage of the standard HMAX model is that the lower-level features are static rather than adaptively learned. The neurons of the S1 layer fire for every input that matches a certain stimulus, regardless of its activity level, which tends to produce seemingly random mid-level features that in turn propagate the same behaviour to the extraction of higher-level features. This is contrary to the actual processing in the brain, where many stimuli, especially for lower-level features but more generally at almost all stages of the ventral pathway, are suppressed, and neurons fire only occasionally when a certain amount of activity is reached [Abdou and Pratt, 1997, Carlson et al., 2011]. This shows that sparse firing plays a prominent role in designing models that mimic the visual processing in the brain. Sparsity in general means suppressing and filtering out information of minor relevance in order to reduce noise and hence to rely on prominent features with high impact. [Waydo and Koch, 2008] first applied a sparse coding mechanism to the output of HMAX, leading to a sparse invariant representation of objects. Based on this, [Zhang et al., 2014] proposed an advanced sparsity-regularized model for HMAX in which either standard sparse coding (SSC) [Pasupathy and Connor, 2002] or independent component analysis (ICA) [Hurri et al., 2009] is applied to every S layer in order to let mid-level and higher-level features emerge. Both SSC and ICA are linear unsupervised learning models, where feature extraction with SSC provides a larger dictionary size and ICA succeeds better in inferring different feature maps.
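To make the notion of sparsity concrete, the following sketch computes a sparse code for an input by minimizing a reconstruction error plus an L1 penalty with the iterative shrinkage-thresholding algorithm (ISTA). This is a generic textbook formulation given as an assumption for illustration; it is not claimed to be the exact SSC procedure of [Zhang et al., 2014], and the names soft_threshold, sparse_code and the penalty weight lam are hypothetical. With an identity dictionary the solution reduces to plain soft-thresholding, which makes the effect easy to check: weak inputs are set exactly to zero while strong inputs survive.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink every entry towards zero by lam; entries below lam become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_code(x, dictionary, lam=0.1, n_iter=200):
    """Minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a using ISTA."""
    # Squared spectral norm = Lipschitz constant of the gradient of the
    # quadratic term; 1 / lipschitz is a safe step size.
    lipschitz = np.linalg.norm(dictionary, 2) ** 2
    a = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        grad = dictionary.T @ (dictionary @ a - x)
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a


# Illustrative input: two strong and two weak activations.
x = np.array([0.9, 0.05, -0.4, 0.02])
codes = sparse_code(x, np.eye(4), lam=0.1)
print(codes)  # the two weak activations are suppressed to exactly zero
```

For a learned, non-orthogonal dictionary the same function applies unchanged; only the dictionary argument differs.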
While the original HMAX consists of S1 and S2 layers of different sizes which are MAX-pooled over all receptive field positions to produce single higher-level features at the final C layer, the S1 and S2 bases (filters) in the sparse HMAX are of the same size and are each learned by SSC or ICA. Layers above C2 are also permitted, which makes it possible to extract multiple and more complex features. Fig. 15 shows the model of Sparse HMAX, which consists of six layers, adding S3 and C3 to the original HMAX.

Figure 15: Sparse HMAX from [Zhang et al., 2014]

A major outcome of Sparse HMAX is the ability to learn and robustly recognize higher-level features from unlabeled training images, which directly mimics a prominent capability of the brain. Accordingly, this strongly corresponds to the behaviour of the human cortical areas ITC (inferior temporal cortex) and MTL (medial temporal lobe). For the experiments, the model was trained with images from different classification categories; Fig. 16 illustrates the features that were learned by the S2 and S3 bases. The classification was done by a linear multiclass SVM, and the resulting ROC curves are depicted in Fig. 17. Compared to the original HMAX, Sparse HMAX offers a strikingly more reliable classification accuracy, which could be increased from 44±1.5 up to 76.13±0.85, and it also manages to outperform many other state-of-the-art models for object recognition. However, it can be considered most efficient on large-scale datasets due to the nature of sparsity.

Figure 16: Learned features at S2 (top row) and S3 (bottom row) bases from [Zhang et al., 2014]

Figure 17: Most selective learned features depicted in Fig. 16 (top row) with corresponding ROC curve (bottom row, vertical axis: true-positive rate, horizontal axis: false-positive rate) from [Zhang et al., 2014]
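For readers unfamiliar with the ROC curves referred to in Fig. 17, the following sketch shows how the true-positive rate and the false-positive rate are obtained by sweeping a decision threshold over classifier scores. The scores and labels are made-up illustrative values, not data from [Zhang et al., 2014]; the function names roc_points and auc are hypothetical, and tied scores are not treated specially.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (fpr, tpr) arrays for thresholds swept from high to low score."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")   # highest score first
    hits = labels[order]
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~hits) / (~hits).sum()))
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))


# Made-up scores for five test images; label 1 marks the target category.
example_fpr, example_tpr = roc_points([0.9, 0.8, 0.7, 0.6, 0.4], [1, 0, 1, 1, 0])
print(list(zip(example_fpr, example_tpr)))
print("AUC:", auc(example_fpr, example_tpr))
```

A curve that rises quickly towards a true-positive rate of 1 at a low false-positive rate, and hence an area close to 1, indicates a highly selective feature.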

5 Conclusion

This paper introduced the general concepts and methods of biologically-inspired visual recognition. To give an initial background on visual processing in the brain, the higher-level anatomy of the visual system and the functionalities of the most important cortical areas, from V1 along the ventral and dorsal pathways, were explained. The important role of simple and complex receptive field cells was also pointed out. Accordingly, the main inspirations one can derive for computational approaches are a hierarchical extraction from lower- to higher-level features, separate and generic processing of distinct stimuli, and sparse firing patterns. Considering the lower-level problem of edge detection, which mainly takes place in the cortical area V1, conventional algorithms typically face the problem of capturing noisy texture edges within the true boundaries of an object. Multilevel surround inhibition, an extension of the Canny edge detector, has been shown to suppress such texture edges by applying different thresholding levels and a parameter α which controls the inhibition strength according to the amount of texture around a pixel. Inspired by the hexagonally arranged cone photoreceptors of the retina, traditional images with rectangular pixel lattices were converted to a hexagonal representation and processed by a spiking neural network. It could be shown that the signal-to-noise ratio of detected edges with different orientations was improved. The approach also yielded a higher robustness to highly noisy image content. Furthermore, the higher-level problem of object recognition was presented by means of HMAX, which aims to imitate the hierarchical processing along the ventral pathway.
It mainly represents a strictly feed-forward architecture in which the outputs of simple or complex cells are pooled by a MAX or SUM operator in order to generically extract increasingly complex features. The extension by sparsity-regularization (Sparse HMAX) finally obtained sparse firing patterns together with the possibility of adding more layers to the network, with the outcome of a dramatically reduced misclassification rate. In conclusion, biologically-inspired approaches can yield remarkably good results that outperform conventional ones, but finding a convex solution might not always be possible because the underlying processing scheme lacks transparency. However, such approximate solutions are mostly sufficient, since nature itself shares the same limitations but still achieves striking capabilities.

References

[Abdou and Pratt, 1997] Abdou, I. E. and Pratt, W. K. (1997). Quantitative design and evaluation of enhancement/thresholding edge detectors. Proceedings of the IEEE, Vol. 67, No. 5, pp.

[Baddeley et al., 1979] Baddeley, R., Abbott, L. F., Booth, M., Sengpiel, F., and Freeman, T. (1979). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proceedings of the Royal Society B - Biological Sciences, Vol. 264, pp.

[Canny, 1983] Canny, J. F. (1983). Finding edges and lines in images. MIT Press, Masters thesis.

[Carlson et al., 2011] Carlson, E. T., Rasquinha, R. J., Zhang, K., and Connor, C. E. (2011). A sparse object coding scheme in area V4. Current Biology, Vol. 21, pp.

[Ghodrati et al., 2012] Ghodrati, M., Khaligh-Razavi, S., Ebrahimpour, R., Rajaei, K., and Pooyan, M. (2012). How Can Selection of Biologically Inspired Features Improve the Performance of a Robust Object Recognition Model? PLoS One.

[Hodgkin and Huxley, 1952] Hodgkin, A. and Huxley, A. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology, Vol. 117, pp.

[Hosoya et al., 2005] Hosoya, T., Baccus, S., and Meister, M. (2005). Dynamic predictive coding by the retina. Nature Neuroscience, Vol. 436, pp.

[Hubel and Wiesel, 1968] Hubel, D. and Wiesel, T. (1968). Receptive fields and functional architecture of monkey striate cortex. J. Physiol.

[Hurri et al., 2009] Hurri, J., Hoyer, P. O., and Hyvarinen, A. (2009). Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. Springer-Verlag.

[Juneja and Sandhu, 2009] Juneja, M. and Sandhu, P. S. (2009). Performance Evaluation of Edge Detection Techniques for Images in Spatial Domain. International Journal of Computer Theory and Engineering, Vol. 1, No. 5.

[Kandel and Schwartz, 1981] Kandel, E. and Schwartz, J. (1981). Principles of neural science. Elsevier.

[Kerr et al., 2011] Kerr, D., Coleman, S., McGinnity, M., Wu, Q., and Clogenson, M. (2011). Biologically Inspired Edge Detection. IEEE, 11th International Conference on Intelligent Systems Design and Applications.

[Krüger et al., 2013] Krüger, N., Janssen, P., Kalkan, S., Lappe, M., Leonardis, A., Piater, J., Rodríguez-Sánchez, A., and Wiskott, L. (2013). Deep Hierarchies in the Primate Visual Cortex: What Can We Learn For Computer Vision? Transactions on Pattern Analysis and Machine Intelligence.

[Marr and Hildreth, 1980] Marr, D. and Hildreth, E. (1980). Theory of Edge Detection. Proceedings of the Royal Society of London, Series B, Biological Sciences, Vol. 207, No. 1167, pp.

[Masland, 2001] Masland, R. H. (2001). The fundamental plan of the retina. Nature Neuroscience, Vol. 4, pp.

[Middleton and Sivaswamy, 2001] Middleton, L. and Sivaswamy, J. (2001). Edge Detection in a Hexagonal-Image Processing Framework. Image and Vision Computing, Vol. 19, pp.

[Nothdurft et al., 1999] Nothdurft, H., Gallant, J., and van Essen, D. (1999). Response modulation by texture surround in primate area V1: Correlates of popout under anesthesia. Visual Neuroscience, Vol. 16, pp.

[Papari et al., 2007] Papari, G., Campisi, P., and Petkov, N. (2007). Multilevel Surround Inhibition. A Biologically Inspired Contour Detector. SPIE, Vol.

[Papari et al., 2006] Papari, G., Campisi, P., Petkov, N., and Neri, A. (2006). A multiscale approach to contour detection by texture suppression. SPIE, In Proceedings of Alg. and Syst., Vol. 6064A.

[Pasupathy and Connor, 2002] Pasupathy, A. and Connor, C. E. (2002). Population coding of shape in area V4. Nature Neuroscience, Vol. 5, pp.

[Pearson-Education, 2004] Pearson-Education (2004). Inc., publishing as Benjamin Cummings.

[Poggio, 1990] Poggio, T. (1990). A theory of how the brain might work. Cold Spring Harbor Symp. Quant. Biol., Vol. 55, pp.

[Riesenhuber and Poggio, 1999] Riesenhuber, M. and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature America.

[Roberts, 1963] Roberts, L. (1963). Machine perception of three-dimensional solids. MIT Press.

[Serre et al., 2005] Serre, T., Wolf, L., and Poggio, T. (2005). Object Recognition with Features Inspired by Visual Cortex. CVPR, Vol. 2.

[Sobel and Feldman, 1968] Sobel, I. and Feldman, G. (1968). A 3x3 isotropic gradient operator for image processing. Presented at a talk at the Stanford Artificial Intelligence Project.

[Tarr, 1999] Tarr, M. (1999). News on Views: Pandemonium Revisited. Nature Neuroscience, Vol. 2, pp.

[Tong, 2003] Tong, F. (2003). Primary Visual Cortex and Visual Awareness. Nature Reviews, Neuroscience, Vol. 4.

[Waydo and Koch, 2008] Waydo, S. and Koch, C. (2008). Unsupervised learning of individuals and categories from images. Neural Computation, Vol. 20, pp.

[Wu et al., 2007] Wu, Q., McGinnity, M., Maguire, L., Belatreche, A., and Blackin, B. (2007). Edge Detection Based on Spiking Neural Network Model. Springer, Proceedings of the International Conference on Intelligent Computing.

[Yang, 2007] Yang, L. (2007). Biologically inspired visual models by sparse and unsupervised learning. Student Scholar Archive, Paper 260.

[Zhang et al., 2014] Zhang, J., Hu, X., and Zhang, B. (2014). Sparsity-Regularized HMAX for Visual Recognition. PLoS One.


More information

Parallel streams of visual processing

Parallel streams of visual processing Parallel streams of visual processing RETINAL GANGLION CELL AXONS: OPTIC TRACT Optic nerve Optic tract Optic chiasm Lateral geniculate nucleus Hypothalamus: regulation of circadian rhythms Pretectum: reflex

More information

Ch 5. Perception and Encoding

Ch 5. Perception and Encoding Ch 5. Perception and Encoding Cognitive Neuroscience: The Biology of the Mind, 2 nd Ed., M. S. Gazzaniga,, R. B. Ivry,, and G. R. Mangun,, Norton, 2002. Summarized by Y.-J. Park, M.-H. Kim, and B.-T. Zhang

More information

Edge Detection Techniques Based On Soft Computing

Edge Detection Techniques Based On Soft Computing International Journal for Science and Emerging ISSN No. (Online):2250-3641 Technologies with Latest Trends 7(1): 21-25 (2013) ISSN No. (Print): 2277-8136 Edge Detection Techniques Based On Soft Computing

More information

Continuous transformation learning of translation invariant representations

Continuous transformation learning of translation invariant representations Exp Brain Res (21) 24:255 27 DOI 1.17/s221-1-239- RESEARCH ARTICLE Continuous transformation learning of translation invariant representations G. Perry E. T. Rolls S. M. Stringer Received: 4 February 29

More information

Spectrograms (revisited)

Spectrograms (revisited) Spectrograms (revisited) We begin the lecture by reviewing the units of spectrograms, which I had only glossed over when I covered spectrograms at the end of lecture 19. We then relate the blocks of a

More information

Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention

Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Tapani Raiko and Harri Valpola School of Science and Technology Aalto University (formerly Helsinki University of

More information

Position invariant recognition in the visual system with cluttered environments

Position invariant recognition in the visual system with cluttered environments PERGAMON Neural Networks 13 (2000) 305 315 Contributed article Position invariant recognition in the visual system with cluttered environments S.M. Stringer, E.T. Rolls* Oxford University, Department of

More information

Parallel processing strategies of the primate visual system

Parallel processing strategies of the primate visual system Parallel processing strategies of the primate visual system Parallel pathways from the retina to the cortex Visual input is initially encoded in the retina as a 2D distribution of intensity. The retinal

More information

Information Processing During Transient Responses in the Crayfish Visual System

Information Processing During Transient Responses in the Crayfish Visual System Information Processing During Transient Responses in the Crayfish Visual System Christopher J. Rozell, Don. H. Johnson and Raymon M. Glantz Department of Electrical & Computer Engineering Department of

More information

Biologically Motivated Local Contextual Modulation Improves Low-Level Visual Feature Representations

Biologically Motivated Local Contextual Modulation Improves Low-Level Visual Feature Representations Biologically Motivated Local Contextual Modulation Improves Low-Level Visual Feature Representations Xun Shi,NeilD.B.Bruce, and John K. Tsotsos Department of Computer Science & Engineering, and Centre

More information
