Methods for comparing scanpaths and saliency maps: strengths and weaknesses


Olivier Le Meur · Thierry Baccino

Published online: 7 July 2012
© Psychonomic Society, Inc.

O. Le Meur, Université de Rennes 1, IRISA, Campus universitaire de Beaulieu, Rennes, France (olemeur@irisa.fr)
T. Baccino, Université de Paris VIII, Laboratoire Cognitions Humaine et Artificielle, CHArt/LUTIN, 2, rue de la Liberté, St Denis Cedex 02, France (thierry.baccino@univ-paris8.fr)

Abstract  In this article, we are interested in the computational modeling of visual attention. We report methods commonly used to assess the performance of these kinds of models. We survey the strengths and weaknesses of common assessment methods based on diachronic eye-tracking data. We then illustrate the use of some methods to benchmark computational models of visual attention.

Keywords  Saliency · Visual attention · Eye movement

Introduction

Eye tracking is a well-known technique for analyzing visual perception and attention shifts and for assessing user interfaces. Until now, however, data analysis in eye-tracking studies has focused on synchronic indicators, such as fixations (duration, number, etc.) or saccades (amplitude, velocity, etc.), rather than diachronic indicators (scanpaths or saliency maps). Synchronic means that an event occurs at a specific point in time, while diachronic means that the event is considered over time. In this article, we focus on diachronic measures and review different ways of analyzing sequences of fixations represented as scanpaths or saliency maps.

Visual scanpaths depend on bottom-up and top-down factors, such as the task users are asked to perform (Simola, Salojärvi, & Kojo, 2008), the nature of the stimuli (Yarbus, 1967), and the intrinsic variability of subjects (Viviani, 1990). Being able to measure the difference (or similarity) between two visual behaviors is fundamental both for differentiating the impact of different factors and for understanding what governs our cognitive processes. It also plays a key role in assessing the performance of computational models of overt visual attention, for example, by evaluating how well saliency-based models predict where observers look.

In this study, we survey common methods for evaluating the difference/similarity between scanpaths and between saliency maps. In the first section, we describe state-of-the-art methods commonly used to compare visual scanpaths. We then consider comparison methods that involve either two saliency maps or one saliency map plus a set of visual fixations; we first define how human saliency maps are computed and list some of their most important properties. The strengths and weaknesses of each method are emphasized. In the fourth section, we address inter-observer variability, which reflects the natural dispersion of fixations between observers watching the same stimuli. It is important to understand the mechanisms underlying this phenomenon, since this dispersion can be used as an upper bound for a prediction. To illustrate the latter point and the use of common similarity metrics, we compare ground-truth and model-predicted saliency maps. Finally, we broaden the scope of this article by raising a fundamental question: Do all visual fixations have the same meaning and role, and is it possible to classify fixations as being bottom-up, cognitive, top-down, or semantic?

Methods for comparing scanpaths

Different metrics are available for comparing two scanpaths, using either distance-based methods (the string edit technique or the Mannan distance) or vector-based methods. Distance-based methods compare scanpaths only on their spatial characteristics, while vector-based approaches perform the comparison across several dimensions (frequency, time, etc.). These metrics are more or less complex and relevant depending on the situation to be analyzed. However, there is no consensus in the community on the use of a given metric. In this section, we present three metrics: the string edit metric, Mannan's metric, and a vector-based metric.

String edit metric

The idea of the string edit metric is that a sequence of fixations on different areas of interest (AOIs) can be translated into a sequence of symbols (numbers or letters) forming strings that are then compared. The comparison is carried out by calculating a string edit distance (often called the Levenshtein distance) that gives a measure of the similarity of the strings (Levenshtein, 1966). This technique was originally developed to measure the edit distance between two words: the distance is the minimum number of deletions, insertions, or substitutions necessary for the two words to become identical (which is also called the alignment procedure). The metric takes as input two strings (coding AOIs) and computes the minimum number of edits needed to transform one string into the other. A cost is associated with each transformation and each character, and the goal is to find the series of transformations that minimizes the cost of aligning one sequence with the other; the total cost is the edit distance between the two strings. When the cost is minimal, the similarity between the two strings is maximal (when the two strings are identical, the distance is equal to 0). Conversely, the distance increases with the cost and, therefore, with the dissimilarity between the two strings. Figure 1 illustrates the method, and a code sketch is given below. The Levenshtein distance is the most common way to compare scanpaths (Josephson & Holmes, 2002; Privitera & Stark, 2000) and has been widely used for assessing the usability of Web pages (Baccino, 2004).

Fig. 1  Computation of a string edit distance to align the sequences ABCDE and ABAA recorded on a Web page. First, areas of interest are segmented and coded by letters (A, B, C, ...). Second, the substitution operations are carried out. The total cost is equal to 3 (the minimum number of substitutions) and is normalized by the length of the longer string (here, 5), yielding an edit distance between the two strings of d = 1 − 3/5 = 0.4
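To make the computation concrete, here is a minimal sketch of the Levenshtein distance using the Wagner–Fischer dynamic-programming recurrence, applied to the AOI strings of Fig. 1. This is an illustration of the technique, not a reference implementation.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    # d[i][j]: distance between the first i symbols of s1 and the first j of s2.
    d = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(len(s1) + 1):
        d[i][0] = i                       # delete all of s1[:i]
    for j in range(len(s2) + 1):
        d[0][j] = j                       # insert all of s2[:j]
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(s1)][len(s2)]

# AOI sequences from Fig. 1: the distance is 3; normalizing by the length of
# the longer string gives the quantity d = 1 - 3/5 = 0.4 reported in Fig. 1.
dist = levenshtein("ABCDE", "ABAA")
print(dist, 1 - dist / max(len("ABCDE"), len("ABAA")))
```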

The string edit distance can be computed using a dynamic programming technique (the Wagner–Fischer algorithm; Wagner & Fischer, 1974) that incrementally computes optimal alignments (minimizing the cost). The Levenshtein distance is not the only string edit distance that can be used for scanpaths. Others are described below:

- LCS is the length of the longest common subsequence, which represents the score obtained by allowing only addition and deletion, not substitution;
- the Damerau–Levenshtein distance allows addition, deletion, substitution, and the transposition of two adjacent characters; and
- the Hamming distance allows only substitution (and hence applies only to strings of the same length).

The advantage of the string edit technique is that it is easily computed and preserves the order of fixations. It is also possible to compare observed scanpaths with predicted scanpaths when certain visual profiles are expected from the cognitive model used by the researcher (Chanceaux, Guérin-Dugué, Lemaire, & Baccino, 2009). However, several drawbacks have to be underlined:

- Since the string edit method is based on a comparison of the sequences of fixations occurring in predefined AOIs, the question is how to define these AOIs. There are two options: automatically gridded AOIs or content-based AOIs. The former are built by putting a grid of equally sized areas across the visual material; for the latter, the meaningful regions of the stimulus must be chosen subjectively.
- Whatever AOIs are constructed, the string edit method takes only the quantized spatial positions of the visual fixations into account. Hence, some small differences in scanpaths may change the string, while others produce the same string.
- The string edit method is limited when certain AOIs have not been fixated, so there is a good deal of missing data.

Mannan's metric

The Mannan distance (Mannan, Ruddock, & Wooding, 1995, 1996, 1997) is another metric for comparing scanpaths, but on the basis of their spatial properties rather than their temporal dimension: the order of fixations is completely ignored. The Mannan distance compares the similarity between scanpaths by calculating the distance between each fixation in one scanpath and its nearest neighbor in the other scanpath. A similarity index Is relates the average linear distance between the two scanpaths, D, to the distance Dr obtained from randomized scanpaths of the same sizes. These randomly generated scanpaths are used to weight the sequence of real fixations, taking into account the fact that real scanpaths may convey a random component. The similarity index is given by

$$I_s = 100\left(1 - \frac{D}{D_r}\right),$$

where D is a measure of distance given by

$$D^2 = \frac{n_2 \sum_{i=1}^{n_1} d_{1i}^2 + n_1 \sum_{j=1}^{n_2} d_{2j}^2}{2\, n_1 n_2 \left(a^2 + b^2\right)}.$$

Here, n_1 and n_2 are the numbers of fixations in the two traces; d_{1i} is the distance between the ith fixation in the first trace and its nearest neighbor in the second trace; d_{2j} is the distance between the jth fixation in the second trace and its nearest neighbor in the first trace; a and b are the side lengths of the image; and Dr is the distance between two sets of random locations. The values returned by the algorithm range from 0 (random scanpaths) to 100 (identity). The main drawbacks of this technique are the following (a code sketch of the index is given after this list):
- The Mannan distance does not take the temporal order of the fixation sequences into account. This means that two sequences of fixations having a reversed order but an identical spatial configuration give the same Mannan distance.
- A difficult problem occurs when the two scanpaths have very different sizes (their numbers of fixations are very different): the Mannan distance may indicate a great similarity although the shapes of the scanpaths are definitely different.
- The Mannan distance is not tolerant to high variability between scanpaths.
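The sketch below illustrates the similarity index under the definitions above. The number of random scanpath pairs used to estimate Dr (here, 100) and the uniform sampling of the random fixations are assumptions of this sketch, not prescriptions from Mannan et al.

```python
import numpy as np

def mean_nn_distance(p, q, a, b):
    """Normalized mean squared nearest-neighbor distance D^2."""
    n1, n2 = len(p), len(q)
    d1 = np.array([np.min(np.linalg.norm(q - p[i], axis=1)) for i in range(n1)])
    d2 = np.array([np.min(np.linalg.norm(p - q[j], axis=1)) for j in range(n2)])
    return (n2 * np.sum(d1**2) + n1 * np.sum(d2**2)) / (2 * n1 * n2 * (a**2 + b**2))

def mannan_index(p, q, a, b, n_random=100):
    """Is = 100 * (1 - D / Dr); 0 ~ random, 100 ~ identical."""
    rng = np.random.default_rng(0)
    d = np.sqrt(mean_nn_distance(p, q, a, b))
    # Dr: average distance between random scanpaths of the same sizes.
    dr = np.mean([np.sqrt(mean_nn_distance(
        rng.uniform([0, 0], [a, b], (len(p), 2)),
        rng.uniform([0, 0], [a, b], (len(q), 2)), a, b))
        for _ in range(n_random)])
    return 100 * (1 - d / dr)

p = np.array([[100, 120], [340, 200], [400, 380]])   # fixations, in pixels
q = np.array([[110, 130], [350, 210], [390, 400], [200, 300]])
print(mannan_index(p, q, a=640, b=480))
```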

Vector-based metric

An interesting method was recently proposed by Jarodzka, Holmqvist, and Nyström (2010). Each scanpath is viewed as a sequence of geometric vectors that correspond to the subsequent saccades of the scanpath; the vector representation captures the length and the direction of each saccade. A saccade is defined by a starting position (fixation n) and an ending position (fixation n + 1), so a scanpath with n fixations is represented by a set of n − 1 vectors. Several properties can therefore be preserved, such as the shape of the scanpath, the scanpath length, and the position and duration of fixations. The sequences to be compared are aligned according to their shapes (although this alignment can be performed on other dimensions: length, durations, angle, etc.). Each vector of one scanpath corresponds to one or more vectors of the other scanpath, such that the path in the matrix of similarity between the vectors, going from (1, 1) (similarity between the first vectors) to (n, m) (similarity between the last vectors), is the shortest one. Once the scanpaths are aligned, various measures of similarity between vectors (or sequences of vectors) can be used, such as the average difference in amplitude, the average distance between fixations, and the average difference in duration. For example, Fig. 2 shows two scanpaths, A and B (the first saccade going upward). The alignment procedure attempts to match the five vectors (for the five consecutive saccades) of the subject scanpath with the four vectors of the model scanpath: saccades 1 and 2 of scanpath A are aligned with saccade 1 of scanpath B, saccade 3A is aligned with saccade 2B, and so on. Once the scanpaths are aligned, similarity measures are computed for each alignment. Jarodzka et al.'s (2010) procedure ends up with five measures of similarity (difference in shape, amplitude, and direction between saccade vectors, distance between fixations, and fixation durations).

Fig. 2  Alignment using saccadic vectors. The alignment procedure attempts to match the five vectors of the two scanpaths. The best match is the following: 1, 2/1; 3/2; 4/3; 5/4

This vector-based alignment procedure has a number of advantages over the string edit method. The first is that it does not need predefined AOIs (and is, therefore, not dependent on a quantization of space). The second is that it can align scanpaths not only on the spatial dimension, but on any dimension available in saccade vectors (angle, duration, length, etc.); for example, Lemaire, Guérin-Dugué, Baccino, Chanceaux, and Pasqualotti (2011) used the spatial distance between saccades, the angle between saccades, and the difference of amplitude to realize the alignment. Third, this alignment technique provides more detailed information on the type of (dis)similarity of two scanpaths, according to the dimensions chosen. Lastly, the measure deals with temporal issues: not only fixation durations, but also shifts in time and variable scanpath lengths. The major drawbacks are the following (a simplified code sketch follows the list):

- This measure compares only two scanpaths, whereas the overall aim is sometimes to compare whole groups of subjects with each other.
- It presumes that only fixations and saccades occur. Other eye movements, such as smooth pursuit, are not handled, although smooth pursuit is important when a video is watched. The problem may be solved if smooth pursuit can be represented as a series of short vectors that are not clustered into one long vector.
- The alignment procedure is independent of the stimulus content. However, the chosen dimensions may be weighted by semantic values carefully selected by the researcher.
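To make the representation concrete, here is a simplified sketch: scanpaths are converted into saccade vectors, a dynamic-programming pass finds the cheapest path through the matrix of vector differences, and a similarity measure (here, the average difference in amplitude) is read off the alignment. This illustrates the principle under simplified assumptions; it is not Jarodzka et al.'s (2010) reference implementation.

```python
import numpy as np

def saccade_vectors(fixations):
    """n fixations (x, y) -> n - 1 saccade vectors."""
    f = np.asarray(fixations, dtype=float)
    return f[1:] - f[:-1]

def align(u, v):
    """Cheapest path through the matrix of vector differences,
    from cell (0, 0) to cell (n - 1, m - 1)."""
    n, m = len(u), len(v)
    cost = np.array([[np.linalg.norm(u[i] - v[j]) for j in range(m)]
                     for i in range(n)])
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0:
                    acc[i, j] = min(acc[i, j], acc[pi, pj] + cost[i, j])
    # Backtrack to recover the aligned pairs of saccade indices.
    path, (i, j) = [(n - 1, m - 1)], (n - 1, m - 1)
    while (i, j) != (0, 0):
        i, j = min(((i - 1, j), (i, j - 1), (i - 1, j - 1)),
                   key=lambda c: acc[c] if c[0] >= 0 and c[1] >= 0 else np.inf)
        path.append((i, j))
    return path[::-1]

def mean_amplitude_difference(u, v, path):
    """Average difference in saccade amplitude over the alignment."""
    return np.mean([abs(np.linalg.norm(u[i]) - np.linalg.norm(v[j]))
                    for i, j in path])

a = saccade_vectors([(100, 400), (120, 200), (300, 180), (420, 340)])
b = saccade_vectors([(90, 380), (310, 170), (400, 330)])
path = align(a, b)
print(path, mean_amplitude_difference(a, b, path))
```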
Methods for comparing saliency maps

Comparing two scanpaths requires taking a number of factors, such as the temporal dimension or the alignment procedure, into account. To sidestep these problems, another kind of method can be used. In this section, we focus on approaches involving two bidimensional maps. We first briefly describe how visual fixations are used to compute a continuous saliency map. We then describe three common methods used to evaluate the degree of similarity between two saliency maps: a correlation-based measure, the Kullback–Leibler divergence, and receiver operating characteristic (ROC) analysis.

From a discrete fixation map to a continuous saliency map

A discrete fixation map f_i for the ith observer is classically defined as

$$f_i(\mathbf{x}) = \sum_{k=1}^{M} \delta\big(\mathbf{x} - \mathbf{x}_f(k)\big),$$

where \mathbf{x} is a vector representing the spatial coordinates, \mathbf{x}_f(k) is the spatial coordinates of the kth visual fixation, and M is the number of visual fixations of the ith observer. δ(·) is the Kronecker symbol (δ(t) = 1 if t = 0; otherwise, δ(t) = 0). For N observers, the final fixation map f is given by

$$f(\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} f_i(\mathbf{x}).$$

A saliency map S is then deduced by convolving the fixation map f with an isotropic bidimensional Gaussian function,

$$S(\mathbf{x}) = f(\mathbf{x}) * g_\sigma(\mathbf{x}),$$

where σ is the standard deviation of the Gaussian function. It is commonly accepted that σ should be set to 1 degree of visual angle, an estimate of the size of the fovea; the corresponding standard deviation in pixels depends on the experimental setup (size of the screen and viewing distance). It is also implicitly assumed that a fixation can be approximated by a Gaussian distribution (Bruce & Tsotsos, 2006; Velichkovsky, Pomplun, Rieser, & Ritter, 1996). A code sketch of this construction follows.
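The sketch below illustrates the construction under the definitions above; the pixels-per-degree value, which converts 1 degree of visual angle into a Gaussian standard deviation in pixels, is an assumption that depends on the experimental setup.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(fixations_per_observer, height, width, pixels_per_degree=30.0):
    """Average the observers' discrete fixation maps, then convolve with a
    Gaussian whose sigma matches about 1 degree of visual angle."""
    fmap = np.zeros((height, width))
    for fixations in fixations_per_observer:
        f_i = np.zeros((height, width))        # discrete fixation map f_i
        for x, y in fixations:                 # fixation coordinates in pixels
            f_i[int(y), int(x)] += 1
        fmap += f_i
    fmap /= len(fixations_per_observer)        # map f, averaged over N observers
    return gaussian_filter(fmap, sigma=pixels_per_degree)   # S = f * g_sigma

observers = [[(320, 240), (100, 80)], [(310, 250), (420, 300)]]
S = saliency_map(observers, height=480, width=640)
print(S.shape, float(S.max()))
```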

An example of fixation and saliency maps is given in Fig. 3. A heat map, which is a simple colored representation of the continuous saliency map, is also shown: red areas pertain to salient areas, whereas blue areas are nonsalient. Note that the fixation map illustrated in Fig. 3 is not exactly the one defined by the previous formula.

Fig. 3  From left to right: (a) original, (b) fixation map with red fixation points, (c) heat map (red spots represent the most visually salient areas of the picture), and (d) saliency map

Throughout this section, we use the two continuous saliency maps shown in Fig. 4 to illustrate the comparison methods. Both maps are obtained from the visual fixations of 3 observers.

Fig. 4  Heat maps and continuous saliency maps obtained from the fixations of two groups of 3 observers

Fixation duration is not taken into account in the computation of the continuous saliency map: Itti (2005) showed that there was no significant correlation between model-predicted saliency and fixation duration. Fixation duration is often considered to reflect the depth and the speed of visual processing in the brain: the longer the fixation duration, the deeper the visual processing (Henderson, 2007; Just & Carpenter, 1976). Total fixation time (the cumulative duration of fixations within a region) can be used to gauge the amount of total cognitive processing engaged with the fixated information (Rayner, 1998). A number of factors influence the duration of fixations. Among these, the visual quality of the displayed stimuli plays an important role, as suggested by Mannan et al.'s (1995) experiment: they presented filtered and unfiltered photos to observers and reported a significant increase in fixation duration for the filtered photos. Another factor is the number of objects present in the scene: Irwin and Zelinsky (2002) observed that the duration of fixations increases with the number of objects in the scene.

Correlation-based measures

The Pearson correlation coefficient r between two maps H and P is defined as

$$r_{H,P} = \frac{\mathrm{cov}(H, P)}{\sigma_H \, \sigma_P},$$

where cov(H, P) is the covariance between H and P, and σ_H and σ_P represent the standard deviations of maps H and P, respectively. The linear correlation coefficient has a value between −1 and +1. A value of 0 indicates that there is no linear correlation between the two maps, and values close to 0 indicate a poor correlation between the two sets. A value of +1 indicates a perfect correlation. The sign of r is helpful in determining whether the data share the same structure: a value of −1 also indicates a perfect correlation, but with the data varying in opposite directions. This indicator is very simple to compute and is invariant to linear transformations. Several studies have used this metric to assess the performance of computational models of visual attention (Jost, Ouerhani, von Wartburg, Müri, & Hügli, 2005; Le Meur, Le Callet, Barba, & Thoreau, 2006; Rajashekar, van der Linde, Bovik, & Cormack, 2008).
Correlations are usually reported with the degrees of freedom (the total population minus 2) in parentheses and the significance level. For instance, the two continuous saliency maps illustrated in Fig. 4 are strongly correlated, r = .826, p < .001. Note that Spearman's rank correlation can also be used to measure the similarity between two sets of data (Toet, 2011). A sketch of the Pearson computation is given below.
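A minimal sketch of the computation, assuming the two maps are stored as arrays of the same size; the example data are synthetic.

```python
import numpy as np

def pearson_cc(h, p):
    """r = cov(H, P) / (sigma_H * sigma_P), computed on pixel values."""
    h, p = h.ravel(), p.ravel()
    cov = np.mean((h - h.mean()) * (p - p.mean()))
    return cov / (h.std() * p.std())

rng = np.random.default_rng(0)
H = rng.random((48, 64))
P = 0.7 * H + 0.3 * rng.random((48, 64))   # a prediction that mimics H
print(pearson_cc(H, P))                    # close to +1 for similar maps
```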

The Kullback–Leibler divergence

The Kullback–Leibler (KL) divergence is used to estimate the overall dissimilarity between two probability density functions. Let us define two discrete distributions, R and P, with probability density functions r_k and p_k. The KL-divergence between R and P is given by the relative entropy of P with respect to R:

$$KL(R, P) = \sum_k p_k \log \frac{p_k}{r_k}.$$

The KL-divergence is defined only if r_k and p_k both sum to 1 and if r_k > 0 for any k such that p_k > 0. The KL-divergence is not a distance, since it is not symmetric and does not satisfy the triangle inequality. It is nonlinear and varies in the range from zero to infinity, a zero value indicating that the two probability density functions are strictly equal. The fact that the KL-divergence does not have a well-defined upper bound is a strong drawback.

In our context, we have to compare two bidimensional saliency maps, H and P. We first transform these maps into two bidimensional probability density functions by dividing each location of a map by the sum of all its pixel values:

$$p_H(\mathbf{x}) = \frac{H(\mathbf{x}) + \varepsilon}{\sum_i \big(H(i) + \varepsilon\big)}, \qquad p_P(\mathbf{x}) = \frac{P(\mathbf{x}) + \varepsilon}{\sum_i \big(P(i) + \varepsilon\big)},$$

where ε is a small constant to avoid division by zero. If we consider the example of Fig. 4, we can compute the KL-divergence by first taking the saliency map (b) as the reference and, second, taking the saliency map (d) as the reference. The two values differ: since the KL-divergence is not symmetric, the results are not the same. They indicate that the overall dissimilarity is highest when the continuous saliency map (d) is taken as the reference. A code sketch follows.
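A minimal sketch of the KL computation between two maps, following the transformation into probability distributions described above (the value of ε is an arbitrary choice, and the maps are synthetic).

```python
import numpy as np

def to_distribution(m, eps=1e-6):
    """Turn a saliency map into a probability distribution."""
    m = m.ravel() + eps                    # eps avoids division by zero
    return m / m.sum()

def kl_divergence(reference, prediction):
    """KL(R, P) = sum_k p_k * log(p_k / r_k)."""
    r, p = to_distribution(reference), to_distribution(prediction)
    return float(np.sum(p * np.log(p / r)))

rng = np.random.default_rng(0)
H, P = rng.random((48, 64)), rng.random((48, 64))
# Asymmetric: swapping the reference changes the value.
print(kl_divergence(H, P), kl_divergence(P, H))
```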
Receiver operating characteristic analysis

The ROC analysis (Green & Swets, 1966) is probably the most popular and most widely used method in the community for assessing the degree of similarity of two saliency maps. ROC analysis classically involves two sets of data: the first is the ground truth (also called the actual values), and the second is the prediction (also called the outcomes). Here, we perform the ROC analysis between two maps; a second method commonly encountered in the literature, which involves fixation points and a saliency map, is described in the Hybrid Methods section. The continuous saliency maps are processed by a binary classifier applied to every pixel: the image pixels of the ground truth, as well as those of the prediction, are classified as fixated (or salient) or as not fixated (not salient). A simple threshold operation is used for this purpose. However, two different processes are used, depending on whether the ground truth or the prediction is considered:

- Thresholding the ground truth: The continuous saliency map is thresholded with a constant threshold in order to keep a given percentage of image pixels. For instance, we can keep the top 2 %, 5 %, 10 %, or 20 % salient pixels of the map, as illustrated by Fig. 5. This threshold is called TG_x (G for the ground truth and x indicating the percentage of the image considered as being fixated).
- Thresholding the prediction: The threshold is systematically moved between the minimum and the maximum values of the map. A low threshold labels most of the map as salient and corresponds to an overdetection, whereas a high threshold keeps only the most salient areas of the map. This threshold is called TP_x (P for the prediction and x indexing the threshold).

Fig. 5  Thresholded saliency maps keeping the top percentage of salient areas. From left to right: 2 %, 5 %, 10 %, and 20 %

For each pair of thresholds, four numbers featuring the quality of the classification are computed: the true positives (TPs), the false positives (FPs), the false negatives (FNs), and the true negatives (TNs). The true positive number, for example, is the number of fixated pixels in the ground truth that are also labeled as fixated in the prediction. Figure 5 illustrates the thresholding operation on the parrot picture (Fig. 3). The first continuous saliency map (b) of Fig. 4 is thresholded to keep 20 % of the image (TG_20) and is compared with the second continuous saliency map (d) of Fig. 4. The classification result is illustrated by Fig. 6: the red and uncolored areas represent pixels having the same label in both maps, that is, correct classifications (TPs and TNs, respectively); the green areas represent pixels that are fixated but labeled as nonfixated (FNs); and the blue areas represent pixels that are nonfixated but labeled as fixated (FPs).

Fig. 6  Classification result (on the right) when a 20 % thresholded ground truth (left picture) and a prediction (middle picture) are considered. Red areas are true positives, green areas are false negatives, and blue areas are false positives. Other areas are true negatives

A confusion matrix is often used to visualize the classifier's performance (see Fig. 7c). An ROC curve that plots the TP rate (TPR) as a function of the FP rate (FPR) is usually used to display the classification results for the set of thresholds used. The TPR, also called sensitivity or recall, is defined as TPR = TP/(TP + FN), whereas the FPR is given by FPR = FP/(FP + TN). The ROC area, or area under the curve (AUC), provides a measure of the overall performance of the classification: a value of 1 indicates a perfect classification, and the chance level is .5. The ROC curve for the example of Fig. 6 is given in Fig. 7. There are different methods to compute the AUC. The simplest ones are based on the left and right Riemann sums; the left Riemann sum is illustrated in Fig. 7. A more accurate approximation can be obtained with the trapezoidal rule: rather than summing the areas of rectangles, the AUC is given by summing the areas of trapezoids. A code sketch of the whole procedure is given below.

Fig. 7  Pseudo-code to perform an ROC analysis between two maps (a), ROC curve (b), and the confusion matrix (c). The AUC is approximated here by a left Riemann sum, as illustrated in panel b
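The sketch below implements the procedure under the definitions above: the ground truth is binarized at a fixed percentage (the top 20 % is an arbitrary choice here), the prediction threshold is swept from its maximum to its minimum, and the AUC is approximated with the trapezoidal rule. The example maps are synthetic.

```python
import numpy as np

def roc_auc(ground_truth, prediction, top_percent=20, n_thresholds=100):
    gt_thresh = np.percentile(ground_truth, 100 - top_percent)
    actual = ground_truth >= gt_thresh             # top x % labeled as fixated
    tpr, fpr = [], []
    for t in np.linspace(prediction.max(), prediction.min(), n_thresholds):
        predicted = prediction >= t
        tp = np.sum(actual & predicted)
        fp = np.sum(~actual & predicted)
        fn = np.sum(actual & ~predicted)
        tn = np.sum(~actual & ~predicted)
        tpr.append(tp / (tp + fn))                 # sensitivity / recall
        fpr.append(fp / (fp + tn))                 # false positive rate
    return np.trapz(tpr, fpr)                      # trapezoidal AUC

rng = np.random.default_rng(0)
H = rng.random((48, 64))
P = H + 0.2 * rng.random((48, 64))                 # a prediction close to H
print(roc_auc(H, P))                               # high for a good prediction
```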
Hybrid methods

So far, we have focused on similarity metrics involving either two scanpaths or two saliency maps. In this section, we describe methods based on a saliency map and a set of fixation points. We call these methods hybrid, since they mix two types of information. Four approaches are presented: ROC analysis, normalized scanpath saliency, the percentile metric, and the Kullback–Leibler divergence.

Receiver operating characteristic analysis

The ROC analysis is performed here between a continuous saliency map and a set of fixations. The method tests how the saliency at the points of human fixation compares with the saliency at nonfixated points. As in the previous section, the continuous saliency map is thresholded to keep a given percentage of the pixels of the map, and each pixel is labeled as either fixated or not fixated. For each threshold, the observer's fixations are laid down on the thresholded map, and the TPs (fixations that fall on fixated areas) and the FNs (fixations that fall on nonfixated areas) are determined (as illustrated by Fig. 8). A curve that shows the TPR (or hit rate) as a function of the threshold can then be plotted; the percentage of the image considered to be salient is in the range 0 %–100 %. If the fixated and nonfixated locations cannot be discriminated, the AUC will be 0.5. This first analysis method is used in studies such as Tatler, Baddeley, and Gilchrist (2005) and Torralba, Oliva, Castelhano, and Henderson (2006). Although interesting, it is not sensitive to the false alarm rate.

Fig. 8  Example of ROC analysis. Red and green dots are the fixations of 2 observers for the parrots image, drawn on a thresholded saliency map. On the left-hand side, the hit rate is 100 %, whereas it is 50 % for the example on the right-hand side

To deal with this limitation, a set of control points, corresponding to nonfixated points, can be generated. Two methods commonly encountered in the literature are discussed here. The first and simplest consists of selecting control points from either a uniform or a random distribution. This solution does not take into account the fact that fixations are distributed neither evenly nor randomly throughout the scene, as illustrated by Fig. 9. The second method, proposed by Einhauser and Konig (2003) and Tatler et al. (2005), defines control points by choosing locations randomly from the distribution of all fixation locations that occurred at the same time, but on other images. This way of defining the control points is important for several reasons. First, since the control fixations come from the same observer, the same biases, systematic tendencies, or peculiarities of the observer are taken into account, as illustrated by Fig. 10; these factors then have a limited influence on the classification results. Among them, the most important is the central bias illustrated in Fig. 9.

Fig. 10  The red dots correspond to the visual fixations of 1 observer viewing the parrots image. The other colors correspond to fixations of the same observer, but for three different pictures. Control fixations are chosen from this set of fixations

Fig. 9  Distribution of the first five fixations of 5 observers, combined across seven pictures

A number of factors can explain this central tendency. The center might reflect an advantageous viewing position for extracting visual information (Renninger, Verghese, & Coughlan, 2007; Tatler, 2007); however, Tatler noted that the distribution of low-level visual features over the picture has no significant impact on this bias. In addition, the tendency to look at the center of images is particularly difficult to remove: different strategies were tried by Bindemann (2010) to reduce this bias, but without success, and this author concluded that the screen-based central fixation bias might be an inescapable feature of scene viewing under laboratory conditions. Second, the set of control points has to stem from the same time interval as the set that is analyzed. Indeed, bottom-up and top-down influences are not similar over time: bottom-up influences are maximal just after the stimulus onset, whereas top-down influences tend to increase with viewing time, leading to a stronger dispersion between observers (see the Measuring a Realistic Upper Bound section). Although the second method is more robust than the first, it has a serious flaw: it underestimates the salience of areas that are more or less centered in the image.

In a similar fashion to the method in the Receiver Operating Characteristic Analysis section, the control points and the fixation points are then used to plot an ROC curve. For each threshold, the observer's fixations and the control ones are laid down on the thresholded map, the TPR (fixations that fall on fixated areas) and the FPR are determined, and the AUC is computed from the resulting ROC curve. A confidence interval can be computed by using a nonparametric bootstrap technique (Efron & Tibshirani, 1993). Many samples having the same size as the original set of human fixations are generated by sampling with replacement; these samples are called bootstrap samples, and, in general, 1,000 of them are created. Each bootstrap sample is used as a set of control fixations, and the ROC area between the continuous saliency map and the points of human fixation plus the control points is computed. A bootstrap percentile confidence interval is then determined from the percentiles of the bootstrap distribution, leaving off α/2 of each tail of the distribution, where α is the confidence level. A code sketch of this procedure follows.
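A minimal sketch of the control-point analysis with a bootstrap percentile interval. The rank-based AUC formulation and the synthetic example data are assumptions of this sketch; in practice, the control pool would contain fixations recorded on other images.

```python
import numpy as np

def auc_from_samples(pos, neg):
    """Rank-based AUC: probability that a fixated point outscores a control."""
    scores = np.r_[pos, neg]
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = len(pos), len(neg)
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc(saliency, fixations, control_pool, n_boot=1000, alpha=0.05):
    rng = np.random.default_rng(0)
    pos = np.array([saliency[y, x] for x, y in fixations])
    aucs = []
    for _ in range(n_boot):
        # Each bootstrap sample of control points has the size of the
        # human-fixation set and is drawn with replacement from the pool.
        idx = rng.integers(0, len(control_pool), len(fixations))
        neg = np.array([saliency[y, x] for x, y in control_pool[idx]])
        aucs.append(auc_from_samples(pos, neg))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(np.mean(aucs)), (float(lo), float(hi))

rng = np.random.default_rng(1)
S = rng.random((480, 640))
fixations = [(320, 240), (100, 90), (500, 400)]
pool = rng.integers(0, [640, 480], (200, 2))   # stand-in for fixations on other images
print(bootstrap_auc(S, fixations, pool))
```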

Sometimes, the quality of the classification is summarized by the equal error rate (EER). The EER is the location on an ROC curve at which the false alarm rate equals the miss rate, that is, FPR = 1 − TPR. As with the AUC, the EER is used to compare the accuracy of predictions; in general, the system with the lowest EER is the most accurate.

Normalized scanpath saliency

The normalized scanpath saliency (NSS; Peters, Iyer, Itti, & Koch, 2005) is a metric that involves a saliency map and a set of fixations.

The idea is to measure the saliency values at the fixation locations along a subject's scanpath. The first step is to standardize the saliency values so that they have zero mean and unit standard deviation:

$$Z_{SM}(\mathbf{x}) = \frac{SM(\mathbf{x}) - \mu}{\sigma},$$

where Z_{SM} is the standardized saliency map and

$$\mu = \frac{1}{|I|} \sum_{t \in I} SM(\mathbf{x}_t), \qquad \sigma = \sqrt{\frac{1}{|I|} \sum_{t \in I} \big(SM(\mathbf{x}_t) - \mu\big)^2},$$

with the operator |·| indicating the number of pixels of the picture. For a given coordinate, the quantity Z_{SM}(\mathbf{x}_i) represents the distance between the saliency value at \mathbf{x}_i and the average saliency, expressed in units of the standard deviation. This value is negative when the saliency value at the fixation location is below the mean, and positive when it is above. To account for the fact that we do not focus accurately on a particular point, the NSS value for a given fixation location is computed over a small neighborhood centered on that location:

$$NSS\big(\mathbf{x}_f(k)\big) = \sum_{j \in \pi} K_h\big(\mathbf{x}_f(k) - \mathbf{x}_j\big)\, Z_{SM}(\mathbf{x}_j),$$

where K is a kernel with bandwidth h and π is a neighborhood. The final NSS is the average of NSS(\mathbf{x}_f(k)) over all M fixations of an observer:

$$NSS = \frac{1}{M} \sum_{k=1}^{M} NSS\big(\mathbf{x}_f(k)\big).$$

Figure 11 illustrates the computation of the NSS value for a scanpath composed of eight visual fixations. In this example, the average NSS value is 0.3, indicating a good correspondence between the model-predicted saliency map and the observer's scanpath.

Fig. 11  Example of normalized scanpath saliency computation. The heat map is a normalized version of the model-predicted saliency map with zero mean and unit standard deviation. A scanpath composed of eight fixations (gray circles; the black one is the first fixation) is overlaid on the standardized map. The normalized salience is extracted for each location; values are shown in black next to the fixations

Percentile

In 2008, Peters and Itti designed a metric called the percentile (Peters & Itti, 2008). A percentile value P(\mathbf{x}_f(k)) is computed for each fixation location \mathbf{x}_f(k). This score is the ratio between the number of locations in the saliency map with values smaller than the saliency value at point \mathbf{x}_f(k) and the total number of locations:

$$P\big(\mathbf{x}_f(k)\big) = 100 \times \frac{\big|\{\mathbf{x} \in X : SM(\mathbf{x}) < SM(\mathbf{x}_f(k))\}\big|}{|X|},$$

where X is the set of locations of the saliency map SM, \mathbf{x}_f(k) is the location of the kth fixation, and |·| indicates set size. The final score is the average of P(\mathbf{x}_f(k)) over all fixations of an observer. By definition, the percentile metric has a well-defined upper bound (100 %), indicating the highest similarity between the fixation points and the saliency map. The chance level is 50 %. Both hybrid scores are illustrated in the code sketch below.
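The following sketch computes both hybrid scores on a synthetic map; for simplicity, the NSS value is read at the fixated pixel rather than integrated over a kernel neighborhood.

```python
import numpy as np

def nss(saliency, fixations):
    """Average standardized saliency at the fixation locations."""
    z = (saliency - saliency.mean()) / saliency.std()   # standardized map Z_SM
    return float(np.mean([z[y, x] for x, y in fixations]))

def percentile_score(saliency, fixations):
    """For each fixation, the percentage of map locations with a smaller
    saliency value; averaged over fixations (chance level: 50 %)."""
    return float(np.mean([100.0 * np.mean(saliency < saliency[y, x])
                          for x, y in fixations]))

rng = np.random.default_rng(0)
SM = rng.random((480, 640))                    # model-predicted saliency map
fixations = [(320, 240), (101, 90), (500, 399)]
print(nss(SM, fixations), percentile_score(SM, fixations))
```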
The Kullback–Leibler divergence

The KL-divergence, defined in The Kullback–Leibler Divergence section, is used here to compute the dissimilarity between the histogram of saliency sampled at eye fixations and that sampled at random locations. Itti and Baldi (2009) were the first to use this method, drawing the set of control points (the nonfixated points) from a uniform spatial distribution. However, human fixations are not randomly distributed, since they are governed by various factors, such as the central bias explained earlier. To be more agnostic to this kind of mechanism, Zhang, Tong, Marks, Shan, and Cottrell (2008) measured the KL-divergence between the saliency distribution at the fixated points of a test image and the saliency distribution at the same pixel locations of a randomly chosen image from the test set. To evaluate the variability of the score, the evaluation was repeated 100 times with 100 different sets of control points.

Contrary to the KL-divergence method of The Kullback–Leibler Divergence section, a good prediction here has a high KL-divergence score. Indeed, since the reference distribution represents chance, the saliency computed at human-fixated locations should be higher than that computed at random locations.

Measuring a realistic upper bound

Most of the methods mentioned above have a well-defined theoretical upper bound. When the performance of a computational model is assessed, it might then seem reasonable to aim at this upper bound: for instance, in an ROC analysis, an AUC equal or close to 1 would indicate a very good performance. In our context, this goal is almost impossible to reach. Indeed, there is a natural dispersion of fixations among different subjects looking at the same image. This dispersion, also called inter-observer congruency (IOC; note that a high dispersion corresponds to a lack of congruency), is contingent on a number of factors. First, Tatler et al. (2005) showed that the consistency between the visual fixations of different subjects is high just after the stimulus onset but progressively decreases over time. Among the reasons that might explain this variability, the most probable concerns the time courses of bottom-up and top-down mechanisms: just after the stimulus onset, our attention is mostly steered by low-level visual features, whereas top-down mechanisms become more influential after several seconds of viewing. The second factor concerns the visual content itself. When there is nothing that stands out from the background, the IOC is small; on the contrary, a visual scene composed of salient areas will presumably attract visual attention, leading to high congruency. The presence of particular features, such as human faces, human beings, or animals, tends to increase the consistency between observers: a number of studies have shown that we are able to identify and recognize human faces and animals very quickly in a natural scene (Delorme, Richard, & Fabre-Thorpe, 2010; Rousselet, Macé, & Fabre-Thorpe, 2003). Whereas human faces and animals have the ability to attract attention, decreasing the dispersion between observers, this ability is modulated by novelty or even emotion. Althoff and Cohen's (1999) study is a good example of this point: investigating the effect of memory, or prior experience, on eye movements, they found that the visual scanpaths made when famous faces were viewed were more variable than those made when nonfamous faces were viewed. A third factor that could account for the variance between people might be related to cultural differences. Nisbett (2003) compared the visual scan patterns of American and Asian populations and found that Asian people tend to look more at the background and spend less time on focal objects than do American people. However, a recent study casts doubt on the influence of cultural differences on oculomotor behavior (Rayner, Castelhano, & Yang, 2009).

The IOC can be measured by using a one-against-all approach, also called leave-one-out (Torralba et al., 2006). It consists of computing the degree of similarity between the fixations of one observer and those of the other subjects; the final value is obtained by averaging the degree of similarity over all subjects. In this article, the ROC metric is used to compute the degree of similarity, as proposed by Torralba et al. (2006).
The first step consists of building a saliency map from the visual fixations of all observers except one (the ith observer). This map is thresholded so that the most fixated areas are set to 1 and the other areas are set to 0. To assess the degree of similarity between the ith observer and the other subjects, the hit rate (as described in the Receiver Operating Characteristic Analysis section), that is, the percentage of the ith observer's fixations that fall into the fixated regions, is computed. Iterating over all subjects and averaging the scores gives the IOC. Figure 12 illustrates this method, and a code sketch is given below.

Figure 13a illustrates the IOC computation as a function of the number of fixations for two different pictures. The first picture (parrots) represents two parrots that are visually salient; the second picture (stream) is a mountain landscape in which no element strongly attracts attention. For both pictures, the congruency decreases over time, as expected. The highest congruency, observed at the beginning of viewing, is likely due to bottom-up influences and the central bias. It is interesting to emphasize that the congruency for the parrots image is significantly higher than that observed for the stream image. This observation is mainly due to the attractiveness of the two contents: when there is nothing in the scene that catches attention, observers are not unconsciously constrained, and they simply do not explore the visual scene in the same way. Figure 13b, extracted from Le Meur, Baccino, and Roumy (2011), shows other examples of IOC values.

Assessing the IOC is fundamental to evaluating the performance of a saliency algorithm, although this is overlooked most of the time. An absolute score between a set of fixations and a predicted map is interesting but is not sufficient to draw any conclusions. A low prediction score does not systematically indicate that the saliency model performs poorly: such a statement would be true if the dispersion between the observers were low, but false otherwise. Therefore, it is much more relevant to compare the performance of computational models with the IOC (Judd, Ehinger, Durand, & Torralba, 2009; Torralba et al., 2006) or to express the performance directly by normalizing the similarity score by the IOC (Zhao & Koch, 2011). The normalized score would then be close to 1 for a good prediction.
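A minimal sketch of the leave-one-out IOC under the definitions above; the top percentage kept by the adaptive binarization and the Gaussian width are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ioc(fixations_per_observer, height, width, top_percent=20, sigma=30.0):
    scores = []
    for i, held_out in enumerate(fixations_per_observer):
        fmap = np.zeros((height, width))
        for j, fixations in enumerate(fixations_per_observer):
            if j != i:                          # all observers except the ith
                for x, y in fixations:
                    fmap[int(y), int(x)] += 1
        smap = gaussian_filter(fmap, sigma=sigma)
        salient = smap >= np.percentile(smap, 100 - top_percent)  # binarization
        hits = [salient[int(y), int(x)] for x, y in held_out]
        scores.append(np.mean(hits))            # hit rate of the ith observer
    return float(np.mean(scores))

observers = [[(320, 240), (100, 80)], [(310, 250), (420, 300)],
             [(330, 230), (110, 90)]]
print(ioc(observers, height=480, width=640))
```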

Fig. 12  Inter-observer congruency measurement. On the left, the spatial coordinates of the visual fixations of each observer are given. By considering all fixations except those from the ith observer, a heat map is computed (on the right). After an adaptive binarization, we count the number of fixations of the ith observer that fall into the salient regions (white regions in the bottom image)

Fig. 13  Inter-observer congruency (IOC) as a function of the number of fixations for two different pictures. (a) Error bars represent the standard error of the mean. (b) Examples of IOC extracted from pictures used by Le Meur, Baccino, and Roumy (2011)

Example: performance of state-of-the-art computational models

In this section, we examine the performance of some of the most prominent saliency models proposed in the literature. The quality of the predicted saliency maps is given here by two metrics: the hit rate (see the Receiver Operating Characteristic Analysis section) and the NSS (see the Normalized Scanpath Saliency section). These are hybrid metrics, since they involve a set of visual fixations and a map. We believe that such metrics are the best way to assess the relevance of a predicted saliency map: as compared with saliency-map-based methods, hybrid methods are nonparametric, whereas human saliency maps are obtained by convolving a fixation map with a 2-D Gaussian function parameterized by its mean and standard deviation. Note that instead of the hit rate, we could also have used a full ROC analysis.

To perform the analysis, we use two eye-tracking data sets that are freely available on the Internet; they are described in the Eye-Tracking Data Sets section. We present and comment on each model's performance in the Benchmark section.

Eye-tracking data sets

Eye tracking is nowadays a common solution for studying visual perception. Since 2000, some eye-tracking data sets have been freely downloadable from the Internet for scientific purposes. Table 1 gives the main characteristics of the most important data collections on the Web. They are composed of stimuli representing landscape, outdoor, or indoor scenes, and some of them contain high-level information such as people, faces, animals, and text. These data sets can be used to evaluate the performance of computational models. Note, however, that there is only an implicit consensus on how to set up an eye-tracking test: no document accurately describes what must be done and what must be avoided in the experimental setting. For instance, should observers perform a task when viewing the stimuli or not? Do the methods used to identify fixations and saccades from the raw eye data give similar results? We have to be aware that these data sets have been collected in different environments and with different apparatus and settings. To evaluate the performance of saliency models, it is therefore highly recommended that more than one data set be used, in order to strengthen the findings.

Table 1  Eye-tracking data sets freely available on the Web

Data set   Subjects   Task                    Pictures          Presentation time
Le Meur    30         Free viewing            –                 – s
Kootstra   31         Free viewing            100               –
Bruce      20         Free viewing            –                 – s
MIT        15         Free viewing            –                 – s
DOVES      29         Free viewing            101 (grayscale)   5 s
FIFA       27         Free viewing + task     –                 – s
Ehinger    14         Task                    912               –

Benchmark

We compare the performance of four state-of-the-art models: Itti's model (Itti, Koch, & Niebur, 1998), Le Meur's model (Le Meur et al., 2006), Bruce's model (Bruce & Tsotsos, 2009), and Judd's model (Judd et al., 2009). (For a brief review of saliency models, see Le Meur & Le Callet, 2009.) Two eye-tracking data sets (Le Meur and Bruce; see Table 1) are used. The degree of similarity between the ground truth and the model-predicted saliency is evaluated by using the ROC analysis (hit rate) and the NSS metric. Figure 14 gives the ROC curves indicating the performance of the different saliency models, averaged over all testing images; the method used here is the one described at the beginning of the Receiver Operating Characteristic Analysis section. The upper bound, that is, the inter-observer variability, was computed by the method proposed by Torralba et al. (2006) and described in the Measuring a Realistic Upper Bound section. Table 2 gives the average NSS values over the two tested data sets.

Table 2  NSS scores (average ± SEM) for four state-of-the-art saliency models on the Le Meur and Bruce data sets

Model      Le Meur data set   Bruce data set
Itti       0.60 ± –           0.99 ± 0.05
Le Meur    0.77 ± –           – ± 0.03
Bruce      0.60 ± –           – ± 0.04
Judd       0.82 ± –           – ± 0.05

Note. SEM is the standard error of the mean. A high average NSS value indicates a good prediction. The predictions of computational models may depend on the data set: for instance, Itti's model performs much better on Bruce's data set than on Le Meur's data set, whereas Judd's model gives similar results for both data sets.

Under the ROC metric, Judd's model has the highest performance, as illustrated by Fig. 14. This result was expected, since this model uses specific detectors (face detection, for instance) that improve the ability to detect salient areas; in addition, it uses a function that favors the center of the picture, in order to take the central bias into account. However, the results are more contrasted under the NSS metric shown in Table 2. On average, across both databases, Judd's model is still the highest performing, but on Bruce's data set, Itti's model performs the best, with a value of 0.99, whereas Judd's model performs below this value. The model ranking is therefore dependent on the metric used, and it is fundamental to use more than one metric when assessing the performance of computational models of visual attention.

Fig. 14  Models' performance tested on (a) the Le Meur data set (top) and (b) the Bruce data set (bottom). All models perform better than chance and worse than humans. Judd's model gives the best performance, on average
Limitation: Do visual fixations have the same meaning?

Current computational models of visual attention focus on identifying the fixated locations of salient areas. From an input picture, a model computes a topographic map indicating the most visually interesting parts, and this prediction is then compared with the ground-truth fixations. The evaluation methodology seems appropriate; unfortunately, an important point is overlooked. By making this kind of comparison, most researchers have implicitly assumed that fixations, whatever their durations, saccade amplitudes, and start times, are all similar. In this section, we emphasize the fact that different populations of fixations may exist.

Fixations differ in both their durations and their saccade amplitudes during real-world scene viewing. Antes (1974) was among the first researchers to report these variations, observing that fixation duration increases while saccade size decreases over the course of scene inspection. This early observation has been confirmed by a number of studies (Over, Hooge, Vlaskamp, & Erkelens, 2007; Tatler & Vincent, 2008).

The variation in the duration of visual fixations is contingent on factors such as the quality of the stimulus and the number of objects in the scene (as explained in the From a Discrete Fixation Map to a Continuous Saliency Map section). However, this variance might also be explained by functional differences between fixations. To investigate this point, Velichkovsky and colleagues (Unema, Pannasch, Joos, & Velichkovsky, 2005; Velichkovsky, 2002) conjointly analyzed fixation duration and the subsequent saccade amplitude. They found a nonlinear distribution indicating that (1) short fixations are associated with long saccades and, conversely, (2) longer fixations are associated with shorter saccades (Fig. 6 in Unema et al., 2005). This dichotomy permits us to disentangle focal and ambient fixations, using the terminology introduced by Trevarthen. Ambient processing is characterized by short fixations associated with long saccades; this mode might be used to extract contextual information in order to identify the whole scene. Focal processing is characterized by long fixations with short saccades; this mode may be related to recognition and conscious understanding processes. Pannasch, Schulz, and Velichkovsky (2011) proposed classifying fixations on the basis of the amplitude of the preceding saccade: if the preceding saccade amplitude is greater than a threshold, the fixation is assumed to belong to the ambient visual-processing mode; otherwise, the fixation belongs to the focal mode (a code sketch of this rule is given below). The authors chose a threshold equal to 5 degrees of visual angle, a choice justified by the size of the parafoveal region in which visual acuity is good. Recently, Follet, Le Meur, and Baccino (2011) proposed an automatic solution for classifying visual fixations into focal and ambient groups. From this classification, they computed two saliency maps, one composed of focal fixations and the other based on ambient fixations. By comparing these maps with model-predicted saliency maps, they found that focal fixations are more bottom-up and more centered than ambient ones.
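As an illustration of Pannasch et al.'s (2011) rule, the sketch below labels each fixation from the amplitude of its preceding saccade; the pixels-per-degree conversion is an assumption of this sketch that depends on the experimental setup.

```python
import numpy as np

def classify_fixations(fixations, pixels_per_degree=30.0, threshold_deg=5.0):
    """Label each fixation from the amplitude of its preceding saccade."""
    f = np.asarray(fixations, dtype=float)
    amplitudes = np.linalg.norm(f[1:] - f[:-1], axis=1) / pixels_per_degree
    labels = ["unclassified"]       # the first fixation has no preceding saccade
    labels += ["ambient" if amp > threshold_deg else "focal" for amp in amplitudes]
    return labels

fixations = [(100, 100), (400, 120), (420, 130), (430, 135)]
print(classify_fixations(fixations))   # long first saccade -> ambient; then focal
```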

Conclusion

This article has provided an extensive overview of the different ways of analyzing diachronic variables from eye-tracking data, because they are generally underused by researchers. These diachronic indicators are scanpaths or saliency maps generated to represent the sequence of fixations over time. They are usually provided by eye-tracking software for illustrative purposes, but no real means to compare them are given. This article aims to fill that gap by providing different methods for comparing diachronic variables and calculating relevant indices that might be used in experimental and applied settings. Diachronic variables give a more complete description of the visual attention time course than do synchronic variables and may inform us about the underlying cognitive processes. The ultimate step would be to relate the visual behavior recorded with eye trackers accurately to the concurrent thoughts of the user. Despite the many analysis methods examined here, some variables are still ignored (fixation duration, pupil diameter, etc.), and it is very challenging to study the way these variables can be taken into account within diachronic data. A great improvement was recently made by combining eye movements with other techniques, such as fMRI or EEG. For example, the development of eye-fixation-related potentials (EFRPs), which attempt to associate the displacement of the eye with some brain-wave components (Baccino, 2011; Baccino & Manunta, 2005), is a first step in that direction. Other signals should also be explored, such as electrodermal response (EDR) or electrocardiography (ECG). We are confident that researchers in this area will find new ways to go further, in order to reach a more complete understanding of human behavior.

Requirements

Free software computing some of these diachronic indicators can be found at the following address: http://www.irisa.fr/temics/staff/lemeur/

References

Althoff, R. R., & Cohen, N. J. (1999). Eye-movement-based memory effect: A reprocessing effect in face perception. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25.

Antes, J. R. (1974). The time course of picture viewing. Journal of Experimental Psychology, 103.

Baccino, T. (2004). La lecture électronique [Digital reading]. Grenoble: Presses Universitaires de Grenoble, Coll. Sciences et Technologies de la Connaissance.
Baccino, T. (2011). Eye movements and concurrent ERPs: EFRP investigations in reading. In S. Liversedge, I. D. Gilchrist, & S. Everling (Eds.), Handbook on eye movements. Oxford: Oxford University Press.

Baccino, T., & Manunta, Y. (2005). Eye-fixation-related potentials: Insight into parafoveal processing. Journal of Psychophysiology, 19.

Bindemann, M. (2010). Scene and screen center bias early eye movements in scene viewing. Vision Research, 50.

Bruce, N. D. B., & Tsotsos, J. K. (2006). Saliency based on information maximisation. Advances in Neural Information Processing Systems, 18.

Bruce, N. D. B., & Tsotsos, J. K. (2009). Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, 9(3).

Chanceaux, M., Guérin-Dugué, A., Lemaire, B., & Baccino, T. (2009). Towards a model of information seeking by integrating visual, semantic and memory maps. In B. Caputo & M. Vincze (Eds.), ICVW 2008. Heidelberg: Springer.

Delorme, A., Richard, G., & Fabre-Thorpe, M. (2010). Key visual features for rapid categorization of animals in natural scenes. Frontiers in Psychology, 1, 21.

Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York: Chapman and Hall.

Einhauser, W., & Konig, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17.

Follet, B., Le Meur, O., & Baccino, T. (2011). New insights into ambient and focal visual fixations using an automatic classification algorithm. i-Perception, 2.

Green, D., & Swets, J. (1966). Signal detection theory and psychophysics. New York: Wiley.


More information

Natural Scene Statistics and Perception. W.S. Geisler

Natural Scene Statistics and Perception. W.S. Geisler Natural Scene Statistics and Perception W.S. Geisler Some Important Visual Tasks Identification of objects and materials Navigation through the environment Estimation of motion trajectories and speeds

More information

Outlier Analysis. Lijun Zhang

Outlier Analysis. Lijun Zhang Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based

More information

UC Merced Proceedings of the Annual Meeting of the Cognitive Science Society

UC Merced Proceedings of the Annual Meeting of the Cognitive Science Society UC Merced Proceedings of the Annual Meeting of the Cognitive Science Society Title Eyes Closed and Eyes Open Expectations Guide Fixations in Real-World Search Permalink https://escholarship.org/uc/item/81z9n61t

More information

Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths

Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths Stefan Mathe 1,3 and Cristian Sminchisescu 2,1 1 Institute of Mathematics of the Romanian Academy of

More information

Object-based Saliency as a Predictor of Attention in Visual Tasks

Object-based Saliency as a Predictor of Attention in Visual Tasks Object-based Saliency as a Predictor of Attention in Visual Tasks Michal Dziemianko (m.dziemianko@sms.ed.ac.uk) Alasdair Clarke (a.clarke@ed.ac.uk) Frank Keller (keller@inf.ed.ac.uk) Institute for Language,

More information

Understanding eye movements in face recognition with hidden Markov model

Understanding eye movements in face recognition with hidden Markov model Understanding eye movements in face recognition with hidden Markov model 1 Department of Psychology, The University of Hong Kong, Pokfulam Road, Hong Kong 2 Department of Computer Science, City University

More information

Real-time computational attention model for dynamic scenes analysis

Real-time computational attention model for dynamic scenes analysis Computer Science Image and Interaction Laboratory Real-time computational attention model for dynamic scenes analysis Matthieu Perreira Da Silva Vincent Courboulay 19/04/2012 Photonics Europe 2012 Symposium,

More information

Finding Saliency in Noisy Images

Finding Saliency in Noisy Images Finding Saliency in Noisy Images Chelhwon Kim and Peyman Milanfar Electrical Engineering Department, University of California, Santa Cruz, CA, USA ABSTRACT Recently, many computational saliency models

More information

Chapter 1: Introduction to Statistics

Chapter 1: Introduction to Statistics Chapter 1: Introduction to Statistics Variables A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship

More information

Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition

Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition Stefan Mathe, Cristian Sminchisescu Presented by Mit Shah Motivation Current Computer Vision Annotations subjectively

More information

Knowledge-driven Gaze Control in the NIM Model

Knowledge-driven Gaze Control in the NIM Model Knowledge-driven Gaze Control in the NIM Model Joyca P. W. Lacroix (j.lacroix@cs.unimaas.nl) Eric O. Postma (postma@cs.unimaas.nl) Department of Computer Science, IKAT, Universiteit Maastricht Minderbroedersberg

More information

Does scene context always facilitate retrieval of visual object representations?

Does scene context always facilitate retrieval of visual object representations? Psychon Bull Rev (2011) 18:309 315 DOI 10.3758/s13423-010-0045-x Does scene context always facilitate retrieval of visual object representations? Ryoichi Nakashima & Kazuhiko Yokosawa Published online:

More information

Changing expectations about speed alters perceived motion direction

Changing expectations about speed alters perceived motion direction Current Biology, in press Supplemental Information: Changing expectations about speed alters perceived motion direction Grigorios Sotiropoulos, Aaron R. Seitz, and Peggy Seriès Supplemental Data Detailed

More information

The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements

The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements John Henderson, Myriam Chanceaux, Tim Smith To cite this version: John Henderson, Myriam Chanceaux,

More information

Contribution of Color Information in Visual Saliency Model for Videos

Contribution of Color Information in Visual Saliency Model for Videos Contribution of Color Information in Visual Saliency Model for Videos Shahrbanoo Hamel, Nathalie Guyader, Denis Pellerin, and Dominique Houzet GIPSA-lab, UMR 5216, Grenoble, France Abstract. Much research

More information

A Visual Saliency Map Based on Random Sub-Window Means

A Visual Saliency Map Based on Random Sub-Window Means A Visual Saliency Map Based on Random Sub-Window Means Tadmeri Narayan Vikram 1,2, Marko Tscherepanow 1 and Britta Wrede 1,2 1 Applied Informatics Group 2 Research Institute for Cognition and Robotics

More information

Supporting Information

Supporting Information 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Supporting Information Variances and biases of absolute distributions were larger in the 2-line

More information

A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions

A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions FINAL VERSION PUBLISHED IN IEEE TRANSACTIONS ON IMAGE PROCESSING 06 A Locally Weighted Fixation Density-Based Metric for Assessing the Quality of Visual Saliency Predictions Milind S. Gide and Lina J.

More information

Recurrent Refinement for Visual Saliency Estimation in Surveillance Scenarios

Recurrent Refinement for Visual Saliency Estimation in Surveillance Scenarios 2012 Ninth Conference on Computer and Robot Vision Recurrent Refinement for Visual Saliency Estimation in Surveillance Scenarios Neil D. B. Bruce*, Xun Shi*, and John K. Tsotsos Department of Computer

More information

Saliency in Crowd. 1 Introduction. Ming Jiang, Juan Xu, and Qi Zhao

Saliency in Crowd. 1 Introduction. Ming Jiang, Juan Xu, and Qi Zhao Saliency in Crowd Ming Jiang, Juan Xu, and Qi Zhao Department of Electrical and Computer Engineering National University of Singapore Abstract. Theories and models on saliency that predict where people

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training. Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze

More information

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011 Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011 I. Purpose Drawing from the profile development of the QIBA-fMRI Technical Committee,

More information

How does image noise affect actual and predicted human gaze allocation in assessing image quality?

How does image noise affect actual and predicted human gaze allocation in assessing image quality? How does image noise affect actual and predicted human gaze allocation in assessing image quality? Florian Röhrbein 1, Peter Goddard 2, Michael Schneider 1,Georgina James 2, Kun Guo 2 1 Institut für Informatik

More information

Computational modeling of visual attention and saliency in the Smart Playroom

Computational modeling of visual attention and saliency in the Smart Playroom Computational modeling of visual attention and saliency in the Smart Playroom Andrew Jones Department of Computer Science, Brown University Abstract The two canonical modes of human visual attention bottomup

More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Pupil Dilation as an Indicator of Cognitive Workload in Human-Computer Interaction

Pupil Dilation as an Indicator of Cognitive Workload in Human-Computer Interaction Pupil Dilation as an Indicator of Cognitive Workload in Human-Computer Interaction Marc Pomplun and Sindhura Sunkara Department of Computer Science, University of Massachusetts at Boston 100 Morrissey

More information

SUN: A Bayesian Framework for Saliency Using Natural Statistics

SUN: A Bayesian Framework for Saliency Using Natural Statistics SUN: A Bayesian Framework for Saliency Using Natural Statistics Lingyun Zhang Tim K. Marks Matthew H. Tong Honghao Shan Garrison W. Cottrell Dept. of Computer Science and Engineering University of California,

More information

The Importance of Time in Visual Attention Models

The Importance of Time in Visual Attention Models The Importance of Time in Visual Attention Models Degree s Thesis Audiovisual Systems Engineering Author: Advisors: Marta Coll Pol Xavier Giró-i-Nieto and Kevin Mc Guinness Dublin City University (DCU)

More information

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Timothy N. Rubin (trubin@uci.edu) Michael D. Lee (mdlee@uci.edu) Charles F. Chubb (cchubb@uci.edu) Department of Cognitive

More information

Classical Psychophysical Methods (cont.)

Classical Psychophysical Methods (cont.) Classical Psychophysical Methods (cont.) 1 Outline Method of Adjustment Method of Limits Method of Constant Stimuli Probit Analysis 2 Method of Constant Stimuli A set of equally spaced levels of the stimulus

More information

Supplementary Materials

Supplementary Materials Supplementary Materials 1. Material and Methods: Our stimuli set comprised 24 exemplars for each of the five visual categories presented in the study: faces, houses, tools, strings and false-fonts. Examples

More information

Adding Shape to Saliency: A Computational Model of Shape Contrast

Adding Shape to Saliency: A Computational Model of Shape Contrast Adding Shape to Saliency: A Computational Model of Shape Contrast Yupei Chen 1, Chen-Ping Yu 2, Gregory Zelinsky 1,2 Department of Psychology 1, Department of Computer Science 2 Stony Brook University

More information

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

More information

Saliency in Crowd. Ming Jiang, Juan Xu, and Qi Zhao

Saliency in Crowd. Ming Jiang, Juan Xu, and Qi Zhao Saliency in Crowd Ming Jiang, Juan Xu, and Qi Zhao Department of Electrical and Computer Engineering National University of Singapore, Singapore eleqiz@nus.edu.sg Abstract. Theories and models on saliency

More information

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition

More information

Reveal Relationships in Categorical Data

Reveal Relationships in Categorical Data SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction

More information

Evaluating Visual Saliency Algorithms: Past, Present and Future

Evaluating Visual Saliency Algorithms: Past, Present and Future Journal of Imaging Science and Technology R 59(5): 050501-1 050501-17, 2015. c Society for Imaging Science and Technology 2015 Evaluating Visual Saliency Algorithms: Past, Present and Future Puneet Sharma

More information

PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks

PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks Marc Assens 1, Kevin McGuinness 1, Xavier Giro-i-Nieto 2, and Noel E. O Connor 1 1 Insight Centre for Data Analytic, Dublin City

More information

T. R. Golub, D. K. Slonim & Others 1999

T. R. Golub, D. K. Slonim & Others 1999 T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have

More information

Cultural Differences in Cognitive Processing Style: Evidence from Eye Movements During Scene Processing

Cultural Differences in Cognitive Processing Style: Evidence from Eye Movements During Scene Processing Cultural Differences in Cognitive Processing Style: Evidence from Eye Movements During Scene Processing Zihui Lu (zihui.lu@utoronto.ca) Meredyth Daneman (daneman@psych.utoronto.ca) Eyal M. Reingold (reingold@psych.utoronto.ca)

More information

Saliency Prediction with Active Semantic Segmentation

Saliency Prediction with Active Semantic Segmentation JIANG et al.: SALIENCY PREDICTION WITH ACTIVE SEMANTIC SEGMENTATION 1 Saliency Prediction with Active Semantic Segmentation Ming Jiang 1 mjiang@u.nus.edu Xavier Boix 1,3 elexbb@nus.edu.sg Juan Xu 1 jxu@nus.edu.sg

More information

Objects do not predict fixations better than early saliency: A re-analysis of Einhäuser et al. s data

Objects do not predict fixations better than early saliency: A re-analysis of Einhäuser et al. s data Journal of Vision (23) 3():8, 4 http://www.journalofvision.org/content/3//8 Objects do not predict fixations better than early saliency: A re-analysis of Einhäuser et al. s data Ali Borji Department of

More information

Reactive agents and perceptual ambiguity

Reactive agents and perceptual ambiguity Major theme: Robotic and computational models of interaction and cognition Reactive agents and perceptual ambiguity Michel van Dartel and Eric Postma IKAT, Universiteit Maastricht Abstract Situated and

More information

Visual Saliency with Statistical Priors

Visual Saliency with Statistical Priors Int J Comput Vis (2014) 107:239 253 DOI 10.1007/s11263-013-0678-0 Visual Saliency with Statistical Priors Jia Li Yonghong Tian Tiejun Huang Received: 21 December 2012 / Accepted: 21 November 2013 / Published

More information

Lecturer: Rob van der Willigen 11/9/08

Lecturer: Rob van der Willigen 11/9/08 Auditory Perception - Detection versus Discrimination - Localization versus Discrimination - - Electrophysiological Measurements Psychophysical Measurements Three Approaches to Researching Audition physiology

More information

Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling

Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling AAAI -13 July 16, 2013 Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling Sheng-hua ZHONG 1, Yan LIU 1, Feifei REN 1,2, Jinghuan ZHANG 2, Tongwei REN 3 1 Department of

More information

Lecturer: Rob van der Willigen 11/9/08

Lecturer: Rob van der Willigen 11/9/08 Auditory Perception - Detection versus Discrimination - Localization versus Discrimination - Electrophysiological Measurements - Psychophysical Measurements 1 Three Approaches to Researching Audition physiology

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Searching in the dark: Cognitive relevance drives attention in real-world scenes

Searching in the dark: Cognitive relevance drives attention in real-world scenes Psychonomic Bulletin & Review 2009, 16 (5), 850-856 doi:10.3758/pbr.16.5.850 Searching in the dark: Cognitive relevance drives attention in real-world scenes JOHN M. HENDERSON AND GEORGE L. MALCOLM University

More information

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Still important ideas Contrast the measurement of observable actions (and/or characteristics)

More information

Learning to classify integral-dimension stimuli

Learning to classify integral-dimension stimuli Psychonomic Bulletin & Review 1996, 3 (2), 222 226 Learning to classify integral-dimension stimuli ROBERT M. NOSOFSKY Indiana University, Bloomington, Indiana and THOMAS J. PALMERI Vanderbilt University,

More information

The 29th Fuzzy System Symposium (Osaka, September 9-, 3) Color Feature Maps (BY, RG) Color Saliency Map Input Image (I) Linear Filtering and Gaussian

The 29th Fuzzy System Symposium (Osaka, September 9-, 3) Color Feature Maps (BY, RG) Color Saliency Map Input Image (I) Linear Filtering and Gaussian The 29th Fuzzy System Symposium (Osaka, September 9-, 3) A Fuzzy Inference Method Based on Saliency Map for Prediction Mao Wang, Yoichiro Maeda 2, Yasutake Takahashi Graduate School of Engineering, University

More information

Can Saliency Map Models Predict Human Egocentric Visual Attention?

Can Saliency Map Models Predict Human Egocentric Visual Attention? Can Saliency Map Models Predict Human Egocentric Visual Attention? Kentaro Yamada 1, Yusuke Sugano 1, Takahiro Okabe 1 Yoichi Sato 1, Akihiro Sugimoto 2, and Kazuo Hiraki 3 1 The University of Tokyo, Tokyo,

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

The Role of Color and Attention in Fast Natural Scene Recognition

The Role of Color and Attention in Fast Natural Scene Recognition Color and Fast Scene Recognition 1 The Role of Color and Attention in Fast Natural Scene Recognition Angela Chapman Department of Cognitive and Neural Systems Boston University 677 Beacon St. Boston, MA

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

Dynamic Eye Movement Datasets and Learnt Saliency Models for Visual Action Recognition

Dynamic Eye Movement Datasets and Learnt Saliency Models for Visual Action Recognition Dynamic Eye Movement Datasets and Learnt Saliency Models for Visual Action Recognition Stefan Mathe 1,3 and Cristian Sminchisescu 2,1 1 Institute of Mathematics of the Romanian Academy (IMAR) 2 Faculty

More information

(Visual) Attention. October 3, PSY Visual Attention 1

(Visual) Attention. October 3, PSY Visual Attention 1 (Visual) Attention Perception and awareness of a visual object seems to involve attending to the object. Do we have to attend to an object to perceive it? Some tasks seem to proceed with little or no attention

More information

Prediction of Successful Memory Encoding from fmri Data

Prediction of Successful Memory Encoding from fmri Data Prediction of Successful Memory Encoding from fmri Data S.K. Balci 1, M.R. Sabuncu 1, J. Yoo 2, S.S. Ghosh 3, S. Whitfield-Gabrieli 2, J.D.E. Gabrieli 2 and P. Golland 1 1 CSAIL, MIT, Cambridge, MA, USA

More information

COMPARATIVE STUDY ON FEATURE EXTRACTION METHOD FOR BREAST CANCER CLASSIFICATION

COMPARATIVE STUDY ON FEATURE EXTRACTION METHOD FOR BREAST CANCER CLASSIFICATION COMPARATIVE STUDY ON FEATURE EXTRACTION METHOD FOR BREAST CANCER CLASSIFICATION 1 R.NITHYA, 2 B.SANTHI 1 Asstt Prof., School of Computing, SASTRA University, Thanjavur, Tamilnadu, India-613402 2 Prof.,

More information

Detection of terrorist threats in air passenger luggage: expertise development

Detection of terrorist threats in air passenger luggage: expertise development Loughborough University Institutional Repository Detection of terrorist threats in air passenger luggage: expertise development This item was submitted to Loughborough University's Institutional Repository

More information

Introduction to Computational Neuroscience

Introduction to Computational Neuroscience Introduction to Computational Neuroscience Lecture 11: Attention & Decision making Lesson Title 1 Introduction 2 Structure and Function of the NS 3 Windows to the Brain 4 Data analysis 5 Data analysis

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 5, 6, 7, 8, 9 10 & 11)

More information

Computational Cognitive Science

Computational Cognitive Science Computational Cognitive Science Lecture 15: Visual Attention Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 14 November 2017 1 / 28

More information

An Evaluation of Motion in Artificial Selective Attention

An Evaluation of Motion in Artificial Selective Attention An Evaluation of Motion in Artificial Selective Attention Trent J. Williams Bruce A. Draper Colorado State University Computer Science Department Fort Collins, CO, U.S.A, 80523 E-mail: {trent, draper}@cs.colostate.edu

More information

Do you have to look where you go? Gaze behaviour during spatial decision making

Do you have to look where you go? Gaze behaviour during spatial decision making Do you have to look where you go? Gaze behaviour during spatial decision making Jan M. Wiener (jwiener@bournemouth.ac.uk) Department of Psychology, Bournemouth University Poole, BH12 5BB, UK Olivier De

More information

Human Learning of Contextual Priors for Object Search: Where does the time go?

Human Learning of Contextual Priors for Object Search: Where does the time go? Human Learning of Contextual Priors for Object Search: Where does the time go? Barbara Hidalgo-Sotelo, Aude Oliva, Antonio Torralba Department of Brain and Cognitive Sciences and CSAIL, MIT MIT, Cambridge,

More information

Relative Influence of Bottom-up & Top-down Attention

Relative Influence of Bottom-up & Top-down Attention Relative Influence of Bottom-up & Top-down Attention Matei Mancas 1 1 Engineering Faculty of Mons (FPMs) 31, Bd. Dolez, 7000 Mons, Belgium Matei.Mancas@fpms.ac.be Abstract. Attention and memory are very

More information

Representational similarity analysis

Representational similarity analysis School of Psychology Representational similarity analysis Dr Ian Charest Representational similarity analysis representational dissimilarity matrices (RDMs) stimulus (e.g. images, sounds, other experimental

More information

Computational Cognitive Science. The Visual Processing Pipeline. The Visual Processing Pipeline. Lecture 15: Visual Attention.

Computational Cognitive Science. The Visual Processing Pipeline. The Visual Processing Pipeline. Lecture 15: Visual Attention. Lecture 15: Visual Attention School of Informatics University of Edinburgh keller@inf.ed.ac.uk November 11, 2016 1 2 3 Reading: Itti et al. (1998). 1 2 When we view an image, we actually see this: The

More information

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5 PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science Homework 5 Due: 21 Dec 2016 (late homeworks penalized 10% per day) See the course web site for submission details.

More information

The synergy of top-down and bottom-up attention in complex task: going beyond saliency models.

The synergy of top-down and bottom-up attention in complex task: going beyond saliency models. The synergy of top-down and bottom-up attention in complex task: going beyond saliency models. Enkhbold Nyamsuren (e.nyamsuren@rug.nl) Niels A. Taatgen (n.a.taatgen@rug.nl) Department of Artificial Intelligence,

More information

INCORPORATING VISUAL ATTENTION MODELS INTO IMAGE QUALITY METRICS

INCORPORATING VISUAL ATTENTION MODELS INTO IMAGE QUALITY METRICS INCORPORATING VISUAL ATTENTION MODELS INTO IMAGE QUALITY METRICS Welington Y.L. Akamine and Mylène C.Q. Farias, Member, IEEE Department of Computer Science University of Brasília (UnB), Brasília, DF, 70910-900,

More information

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015

More information

Influence of Low-Level Stimulus Features, Task Dependent Factors, and Spatial Biases on Overt Visual Attention

Influence of Low-Level Stimulus Features, Task Dependent Factors, and Spatial Biases on Overt Visual Attention Influence of Low-Level Stimulus Features, Task Dependent Factors, and Spatial Biases on Overt Visual Attention Sepp Kollmorgen 1,2., Nora Nortmann 1., Sylvia Schröder 1,2 *., Peter König 1 1 Institute

More information

Differential Viewing Strategies towards Attractive and Unattractive Human Faces

Differential Viewing Strategies towards Attractive and Unattractive Human Faces Differential Viewing Strategies towards Attractive and Unattractive Human Faces Ivan Getov igetov@clemson.edu Greg Gettings ggettin@clemson.edu A.J. Villanueva aaronjv@clemson.edu Chris Belcher cbelche@clemson.edu

More information

AC : USABILITY EVALUATION OF A PROBLEM SOLVING ENVIRONMENT FOR AUTOMATED SYSTEM INTEGRATION EDUCA- TION USING EYE-TRACKING

AC : USABILITY EVALUATION OF A PROBLEM SOLVING ENVIRONMENT FOR AUTOMATED SYSTEM INTEGRATION EDUCA- TION USING EYE-TRACKING AC 2012-4422: USABILITY EVALUATION OF A PROBLEM SOLVING ENVIRONMENT FOR AUTOMATED SYSTEM INTEGRATION EDUCA- TION USING EYE-TRACKING Punit Deotale, Texas A&M University Dr. Sheng-Jen Tony Hsieh, Texas A&M

More information

A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes

A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes Renwu Gao 1(B), Faisal Shafait 2, Seiichi Uchida 3, and Yaokai Feng 3 1 Information Sciene and Electrical Engineering, Kyushu

More information

Visual Task Inference Using Hidden Markov Models

Visual Task Inference Using Hidden Markov Models Visual Task Inference Using Hidden Markov Models Abstract It has been known for a long time that visual task, such as reading, counting and searching, greatly influences eye movement patterns. Perhaps

More information

NeuroMem. RBF Decision Space Mapping

NeuroMem. RBF Decision Space Mapping NeuroMem RBF Decision Space Mapping Version 4.2 Revised 01/04/2017 This manual is copyrighted and published by General Vision Inc. All rights reserved. Contents 1 Introduction to a decision space... 3

More information

Stats 95. Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures

Stats 95. Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures Stats 95 Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures Stats 95 Why Stats? 200 countries over 200 years http://www.youtube.com/watch?v=jbksrlysojo

More information