Volume 3, Issue 9, September 2013 ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper. Available online at: www.ijarcsse.com

Visual Attention Models and Saliency Map
Geet kiran Kaur, Harsimranjeet Kaur, Yasmeen Kaur
Computer Science & Engg, PTU, India

Abstract: The saliency map is the core concept of various computational visual attention models. Visual attention models try to replicate the working of the human visual system. In this paper, we classify various saliency detection methods as biological, computational, or hybrid, and present the basic idea of the algorithms proposed by various authors. These algorithms detect the salient regions of an image, where human beings would focus their attention during free viewing, and depict them in the form of a saliency map: a grey-scale image in which bright regions denote the salient parts and the intensity denotes the saliency coefficient.

Keywords: Visual Computational Model, Saliency Detection, Computer Vision, Saliency Map

I. INTRODUCTION
Selective attention is what evolution has favored for the human need to deal with a high amount of sensory data at every moment. The data arrives at a rate on the order of 10^8 bits per second, far too much to be processed completely at once, so the possible actions at any one time are restricted and the brain has to prioritize. Many modern technical systems face the same problem: systems based on computer vision have to deal with millions of pixel values for each frame, and in many problems the computational complexity of the interpretation is very high. This is especially difficult if the system has to operate in real time; cognitive systems and robotics are two areas where real-time performance is critical.
Different features like color, intensity, and orientation are computed in parallel to detect feature-dependent saliencies.

Fig 1: Visual Attention Model (visual input of many megabytes per second passes through an attention bottleneck of about 40 bits per second to visual cognition; the selected information is determined by top-down, goal-oriented selection and bottom-up, stimulus-driven selection)

Every stage director manipulates his audience by exploiting the concepts of human selective attention. A person in the dark illuminated by a sudden spotlight, or a voice from the audience side by a character hidden there, not only keeps our interest alive but also guides our gaze, telling us where the current action takes place. The brain has a mechanism that determines which part of the multitude of sensory data is currently of most interest; this is called selective attention. Visual attention is sometimes compared with a spotlight in a dark room: as one can obtain an impression of the content of the room by moving a spotlight around, analogously one can obtain a detailed impression of a scene by scanning it with quick eye movements.

2013, IJARCSSE All Rights Reserved Page 1192
II. SALIENCY
When we look at the night sky, the moon appears salient: in spite of the many stars in the sky, our attention goes to the moon first. Likewise, a red rose amidst other flowers is salient. This is due to the fact that the amount of information our optic nerve receives, something on the order of 10^8 bits per second, far exceeds our brain's processing capacity, so the brain processes selective information from the environment. The information registered by our visual system that affects our behavior appears as salient. The biased-competition hypothesis suggests that stimuli in the visual field activate populations of neurons that engage in competitive interactions, most likely at the intracortical level of the brain. When observers attend to visual stimulation at a given location, this competition is biased in favour of the neurons encoding information at the attended area. Therefore, neurons with receptive fields at the attended locations either remain active or become more active, while others are suppressed. When judging whether a region of a picture is salient, three properties are considered. First, salient regions are rare, or in other words sparse: they differ from their surroundings and occur rarely; for example, a blue rose in a bunch of red roses is salient. Second, salient regions should be repeatable: if the same scene is photographed 50 times from different angles, the same regions in the image should appear salient each time. Third, salient regions should be compact rather than distributed, i.e. localizable.

Fig 2: a) Original Image b) Saliency Map

III. SALIENCY DETECTION METHODS
Saliency estimation methods can broadly be classified as biologically based, purely computational, or a combination of the two, i.e. hybrid.
The goal of any method, in general, is to detect contrast, rarity, or unpredictability of a central region with respect to its surroundings, either locally or globally, using one or more low-level features such as color, intensity, and orientation. Biologically based methods attempt to mimic known models of the visual system for detecting saliency. Purely computational methods predominantly rely on principles of spectral-domain processing, information theory, or signal processing. Some of these algorithms detect saliency over multiple scales, while others operate on a single scale. In some cases individual feature maps are created separately and then combined to obtain the final saliency map, while in others a combined-feature saliency map is computed directly.

Fig 3: Classification of Computational Models (Saliency Detection Methods: Biologically Inspired Saliency Model, Purely Computational Model, Hybrid Approach)

IV. BIOLOGICALLY INSPIRED SALIENCY MODEL
Biologically based methods attempt to mimic known models of the visual system for detecting saliency. Here we give an overview of some biologically inspired saliency models.
a) A model of saliency-based visual attention for rapid scene analysis [1]: The main idea is that several features are computed in parallel and their values are fused into a representation called a saliency map. Such models generally include the following steps. First, one or several image pyramids are computed from the input image to enable the computation of features at different scales. Then image features are computed; commonly used features include intensity, color, orientation, and depth. Every feature channel is subdivided into several feature types (for example, r, g, b maps for color). Differences of Gaussians, or center-surround mechanisms, are used to collect within-map contrast into feature maps. The feature maps are summed up to form feature-dependent maps called conspicuity maps. Finally, the conspicuity maps are normalized, weighted, and combined to form the saliency map. The saliency map is usually visualized as a gray-scale image in which the brightness of a pixel is proportional to its saliency.

b) Shifts in selective visual attention [2]: This paper presented a bottom-up saliency model that answers how neuron-like elements in the brain can account for visual attention; the Itti et al. model is an implementation of it. It proposed, first, that various features like color, orientation, direction of movement, and disparity are computed in parallel to form an early representation; second, that there exists a selective mapping from this early representation to a more central, non-topographic representation, such that at any instant the central representation contains the properties of a single region, called the selected region; and third, that selection rules determine which location is mapped to the central representation. Using the conspicuity maps of locations in the early representation, a winner-take-all network is used to implement the major rule.
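The center-surround step described above can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' implementation: a box blur stands in for the Gaussian pyramid levels, a single intensity channel stands in for the full set of feature channels, and the absolute fine-minus-coarse difference serves as the within-map contrast, normalized to [0, 1].

```python
# Minimal sketch of an Itti-style center-surround contrast map (illustrative
# only): blur the intensity image at a fine scale ("center") and a coarse
# scale ("surround"), take the absolute difference, and normalize.
import numpy as np

def box_blur(img, radius):
    """Box blur standing in for a Gaussian pyramid level (an assumption
    for simplicity; the original model uses dyadic Gaussian pyramids)."""
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (size * size)

def center_surround_saliency(intensity, center_r=1, surround_r=4):
    center = box_blur(intensity, center_r)      # fine scale
    surround = box_blur(intensity, surround_r)  # coarse scale
    fmap = np.abs(center - surround)            # within-map contrast
    rng = fmap.max() - fmap.min()
    return (fmap - fmap.min()) / rng if rng > 0 else fmap

# A flat image with one bright square: contrast concentrates around the square.
img = np.zeros((32, 32))
img[12:20, 12:20] = 1.0
smap = center_surround_saliency(img)
```

In the full model, one such map is computed per feature type and scale, and the maps are normalized and summed into conspicuity maps before the final combination.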
c) Modeling attention to salient proto-objects [3]: This paper is based on Itti's model of visual attention, which is in turn an implementation of the saliency-map-based model of bottom-up attention by Koch and Ullman (1985). Its extension is a mechanism for extracting an image region around the focus of attention (FOA) that corresponds to the approximate extent of a proto-object at that location. In order to estimate the proto-object region from the maps and saliency location computed so far, it introduces a feedback connection in the saliency hierarchy: it looks for the conspicuity map that is the main contributor to the winning location, and activation is spread to the proto-object at that location. Networks of linear threshold units (LTUs) are used for this purpose.

d) The dynamic representation of scenes [4]: This paper proposed a coherence theory of attention, which asserts that unattended structures are volatile and that focused attention is needed to stabilize them sufficiently to allow the perception of change; it suggests that this dynamic representation underlies our perception of scenes. The other component of this assertion is that vision makes use of a virtual representation, a dynamic form of representation in which, whenever a representation is required, attention provides detailed, coherent descriptions of objects.

Fig 4: Itti's Model of Visual Attention

e) Unsupervised extraction of visual attention objects in color images [5]: This paper attempts to develop a generic framework to automatically extract a viewer's attention objects based on human visual attention mechanisms, without requiring full semantic understanding of the image. It is realized in a two-stage process. The first stage generates the saliency map by Itti's model [1], which encodes the attention value at every location in the image. In the second stage, a few attention seeds are first selected according to the saliency map.
Then, a Markov
random field (MRF) model integrating the attention value and the low-level features is employed to sequentially grow the attention objects starting from those selected attention seeds.

V. COMPUTATIONAL ATTENTION MODEL
a) Saliency based on information maximization [6]: In this paper, saliency is determined by quantifying the self-information of each local image patch. Even for a small image patch, the probability distribution resides in a high-dimensional space, and there is insufficient data in a single image to produce a reasonable estimate of it. For this reason a representation based on independent components is employed, which affords an independence assumption: ICA is performed on a large sample of 7x7 RGB patches drawn from natural images to determine a suitable basis. For a given image, an estimate of the distribution of each basis coefficient is learned across the entire image through non-parametric density estimation. The probability of observing the RGB values of a patch centered at any image location may then be evaluated by independently considering the likelihood of each corresponding basis coefficient; the product of these likelihoods yields the joint likelihood of the entire set of basis coefficients, and its negative logarithm is the self-information.

b) Top-down control of visual attention in object detection [7]: This paper defines saliency in terms of the likelihood of finding the local features in the image. It suggests that the saliency of an object is large when its features are not expected at that location, and it proposes that a probabilistic definition of saliency fits more naturally with object recognition and detection. The probability is approximated by fitting a Gaussian to the distribution of local features in the image.

c) Salient region detection and segmentation: In this work, saliency is determined as the local contrast of an image region with respect to its neighborhood at various scales.
This is evaluated as the distance between the average feature vector of the pixels of an image sub-region and the average feature vector of the pixels of its neighborhood. This allows obtaining a combined feature map at a given scale by using feature vectors for each pixel, instead of combining separate saliency maps for scalar values of each feature. At a given scale, the contrast-based saliency value c(i, j) for a pixel at position (i, j) in the image is determined as the distance D between the average feature vectors of the inner region and the outer region.

Fig 5: Computational Saliency Detection Model

VI. HYBRID APPROACH
a) Graph-based visual saliency [8]: This paper proposed an approach exploiting the computational power, topographical structure, and parallel nature of graph algorithms to achieve natural and efficient saliency computations. It defines Markov chains over various image maps and treats the equilibrium distribution over map locations as activation and saliency values. It takes a unified approach to the activation and normalization steps of saliency computation by using dissimilarity and saliency to define edge weights on graphs, which are interpreted as Markov chains.
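The graph-based idea above can be sketched as follows. This is an illustrative toy version under simplifying assumptions, not Harel et al.'s implementation: edge weights between locations combine feature dissimilarity with a Gaussian spatial falloff, the weight matrix is row-normalized into a Markov transition matrix, and power iteration recovers the equilibrium distribution, which serves as the activation map.

```python
# Toy sketch of graph-based saliency activation: a Markov chain over map
# locations whose equilibrium distribution concentrates mass at locations
# that are dissimilar from their surroundings.
import numpy as np

def gbvs_activation(feature_map, sigma=2.0, iters=200):
    h, w = feature_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    f = feature_map.ravel().astype(float)

    # Edge weight = feature dissimilarity damped by spatial distance.
    dissim = np.abs(f[:, None] - f[None, :])
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    weights = dissim * np.exp(-d2 / (2 * sigma ** 2))

    # Row-normalize into a Markov transition matrix (epsilon keeps rows valid).
    trans = weights + 1e-12
    trans /= trans.sum(axis=1, keepdims=True)

    # Power iteration toward the equilibrium (stationary) distribution.
    pi = np.full(len(f), 1.0 / len(f))
    for _ in range(iters):
        pi = pi @ trans
    return pi.reshape(h, w)

# A flat feature map with one odd patch: the chain spends more time there.
fmap = np.zeros((8, 8))
fmap[3:5, 3:5] = 1.0
act = gbvs_activation(fmap)
```

Because the weight matrix is symmetric, the chain is reversible and the equilibrium mass at a node is proportional to its total edge weight, so locations standing out from their neighborhood accumulate the most activation.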
b) Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search [9]: This paper demonstrates the mandatory role of scene context for search tasks in real-world images and proposes a computational instantiation of a Bayesian model of attention. Attentional mechanisms driven by image saliency and contextual guidance emerge as a natural consequence of the probabilistic framework, providing an integrated and formal scheme in which local and global features can be combined automatically to guide subsequent object detection and recognition. The approach suggests that a robust holistic representation of scene context can be computed from the same ensemble of low-level features used to construct other low-level image representations (e.g., junctions, surfaces), and that it can be integrated with saliency computation early enough to guide the deployment of attention and the first eye movements toward likely locations of target objects. From an algorithmic point of view, early contextual control of the focus of attention is important because it avoids expending computational resources on analyzing spatial locations with a low probability of containing the target on the basis of prior experience. In the contextual guidance model, task-related information modulates the selection of the image regions that are relevant. The paper demonstrated the effectiveness of the contextual guidance model for predicting the locations of the first few fixations in three different search tasks, performed on various types of scene categories (urban environments, a variety of rooms) and for various object-size conditions.

c) SUN: A Bayesian framework for saliency using natural statistics [10]: The authors of this paper propose a definition of saliency by considering what the visual system is trying to optimize when directing attention.
This paper suggests a Bayesian framework from which bottom-up saliency emerges naturally as the self-information of visual features, and overall saliency (incorporating top-down information with bottom-up saliency) emerges as the pointwise mutual information between the features and the target when searching for a target.

VII. CONCLUSION
Saliency detection is an emerging area in the field of image processing; it is of great interest to scholars as it can be used as a preprocessing step in various other applications to reduce cost and time. In this paper we have tried to explain some of the saliency detection algorithms, which detect the salient part of an image in the form of a grayscale image in which the intensity of the whiteness depicts the saliency quotient.

REFERENCES
[1]. L. Itti, C. Koch, and E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, November 1998.
[2]. C. Koch and S. Ullman, Shifts in selective visual attention: Towards the underlying neural circuitry, Human Neurobiology, vol. 4, no. 4, pp. 219-227, 1985.
[3]. D. Walther and C. Koch, Modeling attention to salient proto-objects, Neural Networks, vol. 19, no. 9, pp. 1395-1407, August 2006.
[4]. R. A. Rensink, The dynamic representation of scenes, Visual Cognition, 2000.
[5]. J. Han, K. N. Ngan, M. Li, and H.-J. Zhang, Unsupervised extraction of visual attention objects in color images, IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 141-145, January 2006.
[6]. N. Bruce and J. Tsotsos, Saliency based on information maximization, Advances in Neural Information Processing Systems, vol. 18, pp. 155-162, 2006.
[7]. A. Oliva, A. Torralba, M. Castelhano, and J. Henderson, Top-down control of visual attention in object detection, International Conference on Image Processing, pp. 253-256, 2003.
[8]. J. Harel, C. Koch, and P. Perona,
Graph-based visual saliency, Advances in Neural Information Processing Systems, vol. 19, p. 545, 2007.
[9]. A. Torralba, A. Oliva, M. S. Castelhano, and J. M. Henderson, Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search, Psychological Review, vol. 113, pp. 766-786, 2006.
[10]. L. Zhang, M. H. Tong, T. K. Marks, H. Shan, and G. W. Cottrell, SUN: A Bayesian framework for saliency using natural statistics, Journal of Vision, vol. 8, no. 7, pp. 1-20, 2008.