A neural network model of modal/amodal completion


UNIVERSITÀ DEGLI STUDI DI TRIESTE
Sede Amministrativa del Dottorato di Ricerca
XVII Ciclo del Dottorato di Ricerca in Psicologia

A neural network model of modal/amodal completion

Doctoral candidate: Erica Mengotti
Coordinator of the doctoral board: Prof. Corrado Caudek, Università degli Studi di Trieste
Supervisor: Prof. Carlo Semenza, Università degli Studi di Trieste

CONTENTS

Abbreviations
Introduction

Chapter 1: Psychophysics of modal/amodal completion
  1.1 Introduction
  1.2 Modal and Amodal Completion
    1.2.1 Modal/Illusory Contours
    1.2.2 Amodal/Occluded Contours
    1.2.3 Microgenesis
  1.3 Identity Hypothesis
  1.4 Evidence against the Identity Hypothesis
  1.5 Amodal Continuation
  Summary

Chapter 2: Neurophysiology of modal/amodal completion
  2.1 Introduction
  2.2 Overview of the visual system
    Ventral system areas
  2.3 Horizontal/lateral connections
    2.3.1 V1 horizontal connections
    2.3.2 Nonclassical receptive field
    2.3.3 Association field
  2.4 Feedback modulations
    2.4.1 Figure-ground activity in V1
    2.4.2 Border ownership
    Temporally compact brain
  2.5 Neural correlates of modal/amodal completion
    Single cells neurophysiology
    Lesion studies
  2.6 Human neuroimaging studies
    Locus of illusory contour processing
    Feedback hypothesis
    Comparison between modal and amodal dynamics
  Summary

Chapter 3: Computational models of modal/amodal completion
  3.1 Introduction
  Model of Grossberg and Mingolla
  Model of Heitger and colleagues
  Model of Neumann and Sepp
  Model of Li
  Conclusions

Chapter 4: New computational model of modal/amodal completion
  4.1 Introduction
  Description and equations of the computational model
    LGN ON and OFF Channels
    Simple Cells
    Complex Cells
    Lateral connections
    Feedback from higher areas
  Simulations
    Modal/amodal completion differences
    Amodal continuation differences
  Conclusions

Chapter 5: Conclusions

Glossary
Bibliography

ABBREVIATIONS

BCS = boundary contour system
CRF = classical receptive field
ERP = event-related potentials
FCS = feature contour system
fMRI = functional magnetic resonance imaging
IC = illusory contour
IT = inferotemporal cortex
LGN = lateral geniculate nucleus
LOC (or LOR) = lateral occipital cortex or lateral occipital complex; considered the homologue of macaque IT cortex
LOR (or LOC) = lateral occipital region
MEG = magnetoencephalography
MRF = minimum response field
msec = millisecond
MST = medial superior temporal area
MT = middle temporal area, visual motion processing area, corresponds to V5
OP = occipital pole
PET = positron emission tomography
rCBF = regional cerebral blood flow
RF = receptive field
RT = response time
SR = support ratio
STP = superior temporal polysensory area
V1 = primary visual cortex or striate cortex
V2-V3-V4-V5 = extrastriate visual cortex
VOT = ventral occipitotemporal cortex
VT = ventral temporal

INTRODUCTION

Very recent developments in psychophysical and neurophysiological research on modal/amodal completion highlight new evidence for different mechanisms involved in these two types of completion. Both psychophysical and neurophysiological experiments show systematic shape dissimilarities between modal and amodal contours. However, the current biologically plausible computational models of contour interpolation do not implement such differences. In this thesis, a novel biologically plausible computational model of modal/amodal completion is presented. This model is based on recent psychophysical findings presented in Chapter 1, and on physiological experimental results presented in Chapter 2. Some of the elements of the model structure are already present in previous computational models of visual interpolation, reviewed in Chapter 3. The details of the model structure are given in Chapter 4, together with some simulation data obtained with the proposed model. The key properties of the model are excitatory long-range interactions between cells with collinear receptive fields, and a modulating feedback from higher cortical areas. The results of the simulations show that the model can qualitatively reproduce empirical data on modal and amodal contour completion. Chapter 4 further examines the behaviour of the model for different input images. The results suggest a functional role of higher-area feedback modulations in the shape differences between modal and amodal contour completion, and in the dissimilarities found in amodal continuation displays.

CHAPTER 1
Psychophysics of modal/amodal completion

1.1 Introduction

The ability to recognise visual objects is a crucial component of everyday interaction with the environment. Detecting objects in a visual scene can be of vital importance. For example, potential prey have to recognise their predator, even if partly hidden from view, to be able to flee. Furthermore, some prey try to hide themselves through camouflage, i.e. minimising the number of visual cues that distinguish them from the environment, and the predator strives to break the camouflage by exploiting the available visual cues.

The visual system is constantly faced with the problem of identifying objects from incomplete visual information. Every day, we confront the problem of recognising objects bounded by edges that are not fully defined. In many cases, the lack of visual specification stems from partial occlusion by surrounding objects; in other cases, incomplete edge patterns arise from low luminance contrast between the object and the background environment. One of the more remarkable qualities of the human visual system is its ability to compensate for the missing information. Even though we constantly perceive partly occluded objects, we rarely notice that the visual information we receive is incomplete. Phenomenologically, we seem to complete occluded shapes immediately and effortlessly, so that we see whole, uninterrupted objects. In some way our brain makes sense of the information still available from the visible part of the object, and the visual system extrapolates complete information about contours so that we experience these objects as unified wholes.

One of the fundamental problems of vision science is understanding how the visual system recovers object and surface structure from the fragments of the image on the retinae. It is now well known that the visual system contains mechanisms for contour completion (or interpolation) that go beyond the information present in the two eyes by actively connecting image regions that are physically disconnected on the retinae.

This chapter is not intended to be a comprehensive review of all the research on visual completion. The focus is primarily on the completion of two-dimensional static contours, rather than on volumetric or kinetic completion. The basic outline of the chapter is as follows. In Paragraph 1.2, the definition of modal and amodal completion is given and a brief overview of the most important studies of modal (Paragraph 1.2.1) and amodal (Paragraph 1.2.2) interpolation is presented. At the same time, some empirical methods used in the study of visual completion are introduced. In Paragraph 1.2.3, experiments on the developmental time course of visual completion are described. In Paragraph 1.3, the hypothesis of identity between modal and amodal completion is discussed. The recent evidence against the identity hypothesis is analysed in Paragraph 1.4. Finally, a particular type of amodal completion, and how it is utilised to determine the effects of spatiotemporal context on amodal completion, is presented in Paragraph 1.5.

1.2 Modal and Amodal Completion

Two broad classes of visual completion have been distinguished on the basis of the phenomenological states they induce. The first class of visual completion leads to a clear visual impression of a contour or a surface in locations where there is no local image contrast to support this percept (Fig. 1.1). Observers perceive a contrast border (called illusory contour) in image regions that contain no contrast. The interpolated structure appears to have resulted from direct stimulation of the visual modality, like real image contrast, and this type of completion is thus known as modal completion (Michotte, Thines, and Crabbe, 1964). One of the best known examples of modal completion is the famous Kanizsa triangle (Kanizsa, 1979) illustrated in Fig. 1.2.

Fig. 1.1 Fig. 1.2

The ecological conditions that give rise to modal completion require very specific relationships to exist between a foreground object and the background, i.e. the same luminance and colour. These are the same conditions satisfied by camouflage (Fig. 1.3), a relatively rare occurrence in natural scenes.

Fig. 1.3 (Katydid camouflage)

The second class of visual completion is more common and occurs whenever portions of an object are hidden behind another object. An example of this form of completion can be seen in Fig. 1.4. Distant image fragments appear to form a single object behind an occluding surface and observers report the impression of a single entity (in this case, a black oval form) continuing behind a nearer occluding item (the grey rectangle). This is known as amodal completion because, despite the vivid perception of object unity, observers do not actually see a contour (i.e., a contrast border) in image regions where the completion occurs; they merely 'know it is there' (Koffka, 1935). Occlusion occurs in virtually every real-world scene and therefore amodal completion is a fundamental visual process. This mechanism probably evolved to allow the observer to recognise and manipulate objects that are partially hidden from view.

Fig. 1.4

1.2.1 Modal/Illusory contours

Modal (or illusory) contours were first reported by Schumann (1900) and their occurrence has been studied by many researchers (for reviews, see Petry and Meyer, 1987). The surface delineated by illusory contours (like the Kanizsa triangle) elicits several perceptual phenomena: this surface is often perceived as brighter than the surrounding region (Kanizsa, 1979), it appears located nearer than the surround (Coren, 1972; Dresp and Grossberg, 1999), and it is delimited by physically nonexistent contours. Because such a surface is perceived when the physical stimulus has no contrast or luminance gradients, it is referred to as an illusory figure. Illusory contours are not only present in Kanizsa-type figures, but can also be generated by phase-shifted abutting gratings (Fig. 1.5) or by gaps in background gratings that appear as a surface occluding those gratings (Fig. 1.6).

Fig. 1.5 Fig. 1.6

The strength of Kanizsa-type illusory contours (i.e., how visible the border is) depends on the extent of the real edges relative to the total edge length (Fig. 1.7). Interpolated contour strength appears to be a linear function of the support ratio (SR), the ratio of the physically specified length 2r to the total edge length l: SR = 2r/l. Illusory contour strength increases with support ratio, i.e. observers report more visible contours when the gap between the inducers is small compared to the total edge length (Fig. 1.8). This relationship holds over a wide variety of displays and, if the support ratio is held constant, interpolated contour strength remains approximately constant across a broad range of stimulus sizes (Lesher and Mingolla, 1993; Ringach and Shapley, 1996; Shipley and Kellman, 1992a).
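As a worked illustration of the support ratio, the short sketch below (Python; the function name and the example values are my own and are not taken from the studies cited above) computes SR for two hypothetical Kanizsa-type edges.

```python
# Minimal sketch: support ratio SR = 2r / l, where 2r is the physically
# specified edge length contributed by the two inducers and l is the total
# edge length. Function name and example values are illustrative only.
def support_ratio(specified_length: float, total_edge_length: float) -> float:
    """Return SR = (physically specified length) / (total edge length)."""
    if total_edge_length <= 0:
        raise ValueError("total edge length must be positive")
    return specified_length / total_edge_length

# Two hypothetical edges of the same total length l = 4 deg:
print(support_ratio(2.0, 4.0))  # 0.5 -> larger gap, weaker illusory contour
print(support_ratio(3.2, 4.0))  # 0.8 -> smaller gap, stronger illusory contour
```

Empirically, rated contour clarity grows roughly linearly with SR, and remains approximately constant when SR is held fixed across different stimulus sizes.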

Fig. 1.7 Fig. 1.8

On the other hand, Lesher and Mingolla (1993) reported that for line-end inducers (Fig. 1.9) the illusory contour strength varies as an inverted-U shaped function of the number and density of line ends (Fig. 1.10).

Fig. 1.9 Fig. 1.10

The inclusion of small elements in an illusory-figure display substantially alters the appearance of the figure (Sambin, 1974). The illusory square seen in figure 1.11a depends on the dots, as can be verified by comparison with figure 1.11b, where an illusory circle is generally seen. However, these four dots alone do not function as elements inducing illusory contours, producing no illusory figure when presented alone (Fig. 1.11c). In order to induce an illusory figure, the inducers (e.g., the three notched circles in the Kanizsa triangle) must have physically specified edges leading into tangent discontinuities (Shipley and Kellman, 1990), and interpolation is highly dependent on the orientations and positions of the inducing elements. A dot contains no tangent discontinuities or oriented edges, so its presence should not affect interpolation. Shipley and Kellman (2003) called receiving elements those elements capable of modifying an illusory contour without inducing it. Receiving elements, such as dots, can alter contour formation and can increase the apparent clarity of an illusory edge.

Fig. 1.11

1.2.2 Amodal/Occluded contours

Amodal completion interpolates the missing parts of a partly occluded object. This process is a critical component of object recognition. As an example, in the Bregman-Kanizsa display (Figure 1.12a) each fragment appears as complete within itself. As a consequence, it is very difficult to recognise the represented elements. It is very different, however, when an occluder is superimposed on the fragments (Fig. 1.12b). In this case, the boundary appears to continue amodally behind the black occluding region and the partially occluded Bs are recognisable.

Fig. 1.12

One mechanistic interpretation of this phenomenon is that the occluder has visible contrast with the background, thus it pops forward in front of the Bs, allowing the fragments to complete amodally behind the occluder. This completed representation is forwarded to the object recognition system (Grossberg, 1994). Without an occluder that contrasts with the background, no object surface is seen in front of the Bs, so the Bs cannot complete amodally and are harder to recognise.

Research over the past few decades has provided convincing evidence that the visual system does treat partly occluded objects as though they are complete. Amodal completion greatly influences performance in many perceptual tasks, including visual search (He and Nakayama, 1992; Rensink and Enns, 1998), primed matching (Bruno, Bertamini, and Domini, 1997; Sekuler, 1994; Sekuler and Palmer, 1992), and shape discrimination (Nakayama, Shimojo, and Silverman, 1989; Ringach and Shapley, 1996). In these tasks, observers typically behave as if responding to the amodal completion of the stimulus, and not only to the visible, unoccluded fragments.

In the visual search paradigm, observers have to determine the presence or absence of a pre-defined target as quickly as possible, while maintaining a high accuracy level. The target is distinguishable among a set of distractor items on the basis of one property or a combination of properties. In some cases (e.g., when the target and non-targets are of distinct colours or forms, as in Fig. 1.13), the time taken to determine the target's presence hardly alters as the number of non-targets in a display is augmented. This effect, termed pop-out, can provide an index of parallel search, without the need for examining each element one by one (serial search). If the visual system represents only physically specified regions, then it should represent an occluded circle as a notched disk (i.e., a mosaic representation). Consequently, a partly occluded circle should pop out of a display containing complete circles, just as an incomplete circle would pop out of a group of circles. Thus, search times should not vary much as a function of the distractor set size (parallel search). Conversely, if the occluded circle is represented as complete, then it should be quite difficult to detect among a field of circle distractors, and a slow, serial search will be required to detect it (Fig. 1.14).

Fig. 1.13 Fig. 1.14

Using a visual search paradigm, Rensink and Enns (1998) found that response times (RTs) when searching for a notched square among unnotched squares were very fast if the notched square and the disk were kept separate (Fig. 1.15, Condition 1A - Mosaic). However, displacing the square towards the disk caused a dramatic slowdown in search (Fig. 1.15, Condition 1B - Occlusion), indicating that the targets no longer contained a distinctive visual feature. When the notched square was adjacent to the disk (Condition 1B - Occlusion), search was inefficient because the notched target was rendered similar to a complete square by amodal completion and therefore it was much harder to find. On the other hand, search was efficient when the target was separate from the disk and completion was not possible (Condition 1A - Mosaic).
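The efficiency contrast between the Mosaic and Occlusion conditions is usually quantified as the slope of response time against display set size. The sketch below (Python, with invented RT values; it is not Rensink and Enns' data or analysis) shows how such slopes separate parallel from serial search.

```python
# Illustrative only: flat RT-vs-set-size slopes (a few ms/item) indicate
# parallel "pop-out" search; steep slopes indicate serial search, as when
# amodal completion removes the target's distinctive notch.
import numpy as np

set_sizes = np.array([4, 8, 12, 16])            # items per display
rt_mosaic = np.array([520, 525, 532, 536])      # ms, notch stays visible (invented)
rt_occluded = np.array([540, 640, 735, 845])    # ms, notch completed (invented)

for label, rts in (("mosaic", rt_mosaic), ("occlusion", rt_occluded)):
    slope, intercept = np.polyfit(set_sizes, rts, 1)   # least-squares line fit
    print(f"{label}: {slope:.1f} ms/item (intercept {intercept:.0f} ms)")
# mosaic: ~1.4 ms/item -> efficient, parallel search
# occlusion: ~25 ms/item -> inefficient, serial search
```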

Fig. 1.15 (from Rensink and Enns, 1998)

Numerous other studies indicate that the visual system ultimately represents partly occluded objects as completed forms (Gold, Murray, Bennett, and Sekuler, 2000; He and Nakayama, 1992; Sekuler, 1994; Sekuler and Palmer, 1992; see also Sekuler and Murray, 2001, for a review). These results may be taken as objective, albeit indirect, evidence that the visual system treats occluded objects as if they were complete.

1.2.3 Microgenesis

Despite the phenomenological experience of visual completion as instantaneous and effortless, research also indicates that completion takes measurable time. Sekuler and Palmer (1992) used the primed matching paradigm to examine the time course of amodal completion. In the primed matching paradigm, observers view a priming stimulus and then judge whether a pair of test stimuli have the same shape or different shapes. The time taken to correctly identify "same" pairs depends on the representational similarity of the test shapes to the prime. For example, if observers are primed with a circle, they will be faster to respond "same" to a test pair of circles than to a test pair of notched circles (Fig. 1.16a). Conversely, if the prime is a notched circle, observers will be faster to respond "same" to a pair of notched circles than to a pair of circles (Fig. 1.16b). This paradigm has been used to explore the developmental time course, or microgenesis, of visual completion in occlusion displays (Fig. 1.16c). When the priming stimulus is partly occluded, observers describe the prime as a circle partly hidden behind a square (complete representation). However, the pattern of information that reaches the eye is also consistent with the interpretation of a notched circle adjacent to a square (mosaic representation).

Fig. 1.16

Sekuler and Palmer (1992) presented the prime for varying amounts of time and found that with long prime durations ( msec) partly occluded objects primed observers' "same" and "different" responses like complete objects, but with short prime durations (50 msec) partly occluded objects primed observers' responses like mosaic objects, or like an intermediate representation. The results of Sekuler and Palmer's experiments suggested that amodal completion requires a measurable amount of time ( msec) and that it is a two-stage sequential process. In the first stage, a literal description of the visible parts (mosaic) is produced, while in the second stage the occluded surface is completed. When processing is interrupted early, priming comes from the preliminary mosaic representation and therefore favours matches with mosaic tests. On the other hand, when processing is allowed to proceed further, the system reaches the complete representation, which primes the complete tests.

In a later primed matching study, Shore and Enns (1997) manipulated the amount of occlusion in their stimuli and found shorter completion times for smaller amounts of occlusion. Guttman, Sekuler and Kellman (2003) found that completion time increases with the size of the occluded region and varies from less than 75 msec to over 200 msec, depending on how much of the stimulus is occluded. Bruno, Bertamini, and Domini (1997) found that time to completion decreases when stimulus displays contain stereo cues congruent with the perceived depth ordering of the target and occluder.

As with the primed matching paradigm, the shape discrimination paradigm provides researchers with an objective method of determining how boundary completion (either modally or amodally completed contours) is processed by the visual system. This method can be used for studying the relationship between modal/illusory and amodal/occluded contours, and also allows us to explore the microgenesis of completion (Murray, Sekuler, and Bennett, 2001; Ringach and Shapley, 1996). In Ringach and Shapley's study, observers were asked to judge whether a deformed illusory Kanizsa square appeared to have convex or concave vertical sides (i.e., appeared "fat" or "thin"). The deformed illusory squares were created by rotating the pacman inducers by a small angle so that the sides of the squares appeared to bend, creating the appearance of a fat or thin figure (illusory condition, Fig. 1.17). In a second condition, observers discriminated between fat and thin amodally completed squares that looked like deformed squares seen through four holes in an occluding surface (occluded condition, Fig. 1.18). In the control condition, all the inducers faced in the same direction (fragmented condition) and the subjects judged whether they were tilted up or down (Fig. 1.19). Locally, the illusory and fragmented stimuli were very similar, and were distinguished only by the orientations of the inducers.

Fig. 1.17 Fig. 1.18 Fig. 1.19

Ringach and Shapley (1996) investigated the presentation time necessary to use the illusory or occluded contour information in this shape discrimination task. They found that for durations of about 100 msec and longer, observers were better at discriminating between fat and thin stimuli in the illusory and occluded conditions than in the fragmented condition. Ringach and Shapley argued that this was because observers used the illusory and occluded contours to discriminate between fat and thin stimuli, just as they would use luminance-defined contours. In their experiments, no significant difference emerged between modally and amodally completed Kanizsa squares (pacmen and pacmen with circles, respectively) in the precision with which the shape discrimination task could be done. However, to achieve the same level of performance the exposure duration had to be 50 msec longer for amodal completion. Gegenfurtner, Brown, and Rieger (1997) found an even larger difference in the speed of processing between modally and amodally completed Kanizsa triangles. However, in Ringach and Shapley's experiment only the target was presented, whereas in Gegenfurtner's task the target shape (triangle) was displayed amidst a set of distractors. Davis and Driver (1994) have shown that this requires a serial search for amodally completed illusory figures, but can be done in parallel for modally completed figures. Therefore the visual search necessary to find the target might account for the difference in results.

Rauschenberger and Yantis (2001) used a visual search paradigm (Paragraph 1.2.2) to determine how the size of the occluded region affects the time required for amodal completion. They found that time to completion increased with the size of the occluded region and argued that early in processing, the representation of a partly occluded object differs from that of its complete counterpart; as processing proceeds, the representations of occluded and complete shapes become similar. Moreover, when circles were occluded by 37%, there was no difference in search efficiency between the adjacent and separate conditions for both 100 msec and 250 msec presentations (Fig. 1.20). Beyond this amount of occlusion, Rauschenberger and Yantis (2001) found no clear evidence of completion, although the longest duration they tested (250 msec) may simply not have been long enough to reveal the effects of completion for a highly occluded object. Indeed, Guttman, Sekuler and Kellman (2003), using the primed matching paradigm, found that even highly occluded objects can be completed, given enough time, and confirmed that the time necessary for completion rises with the amount of occlusion.

Fig. 1.20 (adapted from Rauschenberger and Yantis, 2001)

Although studies have generally been consistent in concluding that completion requires time, the estimate of precisely how much time is required has varied considerably from one study to the next. Sekuler and Palmer (1992) estimated that msec was required for completion; Ringach and Shapley's (1996) estimates ranged from msec; Murray et al.'s (2001) estimate was approximately 75 msec; Guttman et al.'s (2003) estimates ranged from less than 75 msec to over 400 msec; and Bruno et al. (1997) reported finding no measurable minimum time for completion. These variations strongly suggest that completion does not require a fixed amount of time, but that time to completion varies considerably depending on the context of completion (e.g., amount of occlusion, presence of additional depth cues), task requirements, individual differences, and stimulus variables.

1.3 Identity hypothesis

Despite the fact that modal and amodal completion elicit very different phenomenological states, both involve the connection of disjoint image fragments into a coherent representation of objects, surfaces, and contours. Figure 1.21 depicts partly occluded and illusory squares having equivalent physically specified contours and gaps (Fig. 1.21c). Phenomenally, the shape of the interpolated contours is the same in the two images. This suggested the possibility that a common mechanism may underlie both forms of completion, thereby generating identical shapes in the two cases.

Fig. 1.21

Shipley and Kellman (1992b) referred to the hypothesis of a single interpolation process as the identity hypothesis. They tested this hypothesis using two sets of equivalent partially occluded and illusory figures, such as those depicted in Figure 1.22. The observers were asked to rate the strength of perceived unity of visible parts in the partly occluded figures and the strength of perceived clarity of edges in the illusory figures. All studies showed nearly perfect correlations between perceived unity and perceived edge clarity when the physically specified parts of the figures were identical. Shipley and Kellman (1992b) argued that this provided evidence for a common contour completion process.

Fig. 1.22 (adapted from Shipley and Kellman, 1992)

The identity hypothesis claims that partly occluded and illusory contours arise from a common boundary interpolation process that operates independently of completion type (Kellman, Guttman, and Wickens, 2001; Kellman, Yin, and Shipley, 1998; Shipley and Kellman, 1992b; Kellman and Shipley, 1991). On this hypothesis, identical contour shapes are interpolated in both modal and amodal completion. The perceptual differences between these phenomena reside only in the depth ordering of the interpolated contours, i.e. whether they appear in front of or behind other surfaces, and not in the mechanism that interpolates boundaries.

Kellman and colleagues presented a series of demonstrations and experiments that provided support for a common mechanism in illusory and occluded object perception. One source of evidence involved bistable displays, such as that shown in figure 1.23a. Despite consisting of a single, physically homogeneous region, the surface appears to split into two separate objects, one of which appears to occlude the other (Petter, 1956; Kanizsa, 1979; Tommasi, Bressan, and Vallortigara, 1995). The two objects have weak or absent depth order information and, with prolonged viewing, the depth relationships can appear to flip, causing the objects to alternate between a modal appearance (when nearer) and an amodal one (when occluded), as highlighted in figures 1.23b and 1.23c. These changes of depth order do not alter the shapes of the interpolated boundaries in bistable displays. When reversals occur, it is not unit formation that changes but only the depth relations between the units formed.

Fig. 1.23

Another source of evidence is reported by Kellman, Yin, and Shipley (1998). They claimed that the experimental results of Ringach and Shapley (1996) gave support to the identity hypothesis. Indeed, using the fat/thin discrimination task, Ringach and Shapley (see Paragraph 1.2.3) found that sensitivity to different degrees of rotation showed the same patterns for occluded (amodally completed) and illusory (modally completed) figures.

In addition, Kellman, Yin, and Shipley (1998) presented a new class of figures in which illusory and occluded contours can join (Fig. 1.24). They called this type of display quasimodal because the completed boundaries are neither modal nor amodal. The existence of quasimodal displays seems to argue against the notion that edge formation can be strictly separated into different processes of modal and amodal completion, providing evidence for the identity hypothesis.

Fig. 1.24 (from Kellman, Yin, and Shipley, 1998)

The identity hypothesis has the advantage of simplicity. There would be little reason to posit two distinct mechanisms of contour interpolation if modal and amodal completion always generated the same contour shapes. Biologically plausible computational models of visual completion typically incorporate, implicitly, the notion of identity between modal and amodal contours. All these models share a dependence on pairs of physically given edges on either side of the gap, and the real edge borders are the same for illusory and occluded contours (Fig. 1.21). Therefore, neural-style models of contour interpolation operate independently of whether the completion is modal or amodal.

1.4 Evidence against the Identity hypothesis

The first source of evidence casting doubt on the identity hypothesis was reported by Ringach and Shapley (1996), but ignored by Kellman and colleagues. Although Ringach and Shapley found identical shape discrimination performance for modal and amodal displays at some display durations, they also reported a substantial difference in modal and amodal discrimination when the display durations were sufficiently short. This significant difference in dynamics between modal and amodal completion implies that there are some fundamental differences between modal and amodal interpolation processes.

Anderson, Singh, and Fleming (2002) provided new and strong evidence challenging the notion that modal and amodal completion are driven by a common mechanism. In their experiments they demonstrated that the modal boundary completion process exhibits a strong dependence on the prevailing luminance relationships of the scene, whereas the amodal completion process does not. Perhaps most importantly, using stereoscopic stimuli, they showed that differences in the shapes of modally and amodally completed displays could be observed. In their experiments, exactly the same two images were used to create both the modal and amodal versions of the display (Fig. 1.25). Relative depth was inverted by simply interchanging the two eyes' views. The percepts reported by observers are illustrated in figure 1.26. The interpolated contours are very different in the two percepts, providing direct support for an asymmetry between modal and amodal contour completion.

Fig. 1.25 (from Anderson, Singh, and Fleming, 2002)

Fig. 1.26 (adapted from Anderson, Singh, and Fleming, 2002)

More recently, Singh (2004) demonstrated that the identity hypothesis was incorrect, finding systematic differences between the shapes of corresponding modal and amodal contours. Singh used stereoscopic versions of chromatically homogeneous figures (Fig. 1.27a) and a modified Kanizsa configuration (Fig. 1.28a), the same type of stimuli used by Shipley and Kellman (1992) in support of the identity hypothesis. Participants adjusted the shape of a comparison display in order to match the shape of the perceived interpolated contours. Results revealed that participants perceived partly occluded contours to be systematically more angular (i.e., closer to a corner) than the corresponding illusory contours (Fig. 1.27d, 1.28d), providing a critical counterexample to the claim of a common boundary completion process.

Fig. 1.27 (from Singh, 2004)

Fig. 1.28 (from Singh, 2004)

Almost all the results that were considered as support for the identity hypothesis seem to have been shown to be incorrect. The error of positing an identity between modal and amodal completion could have been made because the initial experiments (Kellman, Yin, and Shipley, 1998; Shipley and Kellman, 1992b; Kellman and Shipley, 1991) focused mainly on comparing the strengths of modal and amodal contours, rather than their interpolated shapes or their dynamics. Moreover, the studies that found the identity hypothesis to hold used stimuli that represent special cases in which the two forms of completion exhibit similar properties, thereby making any shape differences less likely to be detected.

1.5 Amodal continuation

The contour completion mechanism requires the presence of two real edge fragments on both sides of the gap. However, it can happen that an object is occluded at only one of its extremities instead of in the middle (Figure 1.29). In such cases, the object is usually perceived to be larger than its visible portion. This phenomenon has been described by Kanizsa (1979) and termed amodal continuation. Observers reliably misperceive the size of the visible area of the grey rectangle. The grey rectangle on the left appears to most closely match the middle rectangle on the right, even though the second from the top is the correct physical match. In general, observers overestimate the area of the grey rectangle by 8% (Kanizsa, 1979). Kanizsa argued that this illusion results from boundary extension, where one form appears to extend under another. The extension occurs despite the absence of a second visible edge to join to; thus this phenomenon is different from completion or interpolation, in which two visible contours join behind an occluder.

Fig. 1.29

Shimojo and Nakayama (1990) used perceived direction in an apparent motion paradigm to find further evidence of amodal continuation and to infer the location of occluded boundaries. The phenomenon of apparent motion is the illusory percept of smooth and continuous motion when a sequence of frames is presented at a proper rate and duration. This is the basis of motion picture technology. For example, when the two frames shown in figure 1.30 alternate, the squares appear to be moving. The direction of motion is determined by the smallest distance that a square must cover: horizontal in figure 1.30 and vertical in figure 1.31. If the horizontal and vertical distances are equated (Fig. 1.32), the motion of the small squares is potentially ambiguous.

Fig. 1.30 Fig. 1.31 Fig. 1.32 Fig. 1.33
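The proximity rule just described can be caricatured as a toy function (Python; the rule's formulation, the parameter names and the "shrink" term are my own simplification, not Shimojo and Nakayama's analysis): the perceived direction follows the smaller effective separation, and amodal continuation behind an occluder is modelled as a reduction of the phenomenal vertical distance.

```python
# Toy sketch of the proximity rule for ambiguous apparent-motion quartets.
# All parameter values are hypothetical and for illustration only.
def preferred_direction(horizontal_gap: float, vertical_gap: float,
                        vertical_shrink: float = 0.0) -> str:
    """Return the motion direction with the smaller effective separation.

    vertical_shrink stands in for the reduction in phenomenal vertical
    distance produced by amodal continuation under an occluder.
    """
    effective_vertical = vertical_gap - vertical_shrink
    if abs(effective_vertical - horizontal_gap) < 1e-9:
        return "ambiguous"
    return "vertical" if effective_vertical < horizontal_gap else "horizontal"

print(preferred_direction(3.0, 3.0))                        # ambiguous (cf. Fig. 1.32)
print(preferred_direction(3.0, 3.0, vertical_shrink=0.8))   # vertical bias (cf. Fig. 1.33)
```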

In Shimojo and Nakayama's experiments, the two frames shown in figure 1.32 (equal distances) alternated in apparent motion. Without an occluder, the direction of motion was bistable: the squares could appear to move vertically or horizontally. When an occluding rectangle was placed in the display (Fig. 1.33), observers showed a bias to perceive vertical motion, and a substantial increase in the vertical spacing was necessary before horizontal motion was reported. This suggested that the vertical pair were represented as being closer than the horizontal pair. Shimojo and Nakayama concluded that this phenomenal proximity was the result of the boundaries of the squares extending vertically under the rectangle. The amodal continuation decreased the phenomenal separation between the vertical elements, biasing the apparent motion in that direction.

Using the method of ambiguous apparent motion, Joseph and Nakayama (1999) studied whether the amodal representation of a partly occluded object is affected by the recent visual experience of seeing it when it was fully visible. In other words, they aimed to determine whether amodal representations possess memory, in the empirical sense of being influenced by the previously visible attributes of the entire object. In Joseph and Nakayama's study, observers saw bars of variable length that were partly occluded by a moving rectangle (Fig. 1.34a). After a random delay ranging from 0 to 2 sec (Fig. 1.34b), the squares that remained visible underwent an apparent motion sequence (Fig. 1.34c), with stimuli alternating between frames 1 and 2. Subjects indicated whether the perceived motion was in the horizontal or vertical direction. The subjects showed a bias favouring motion in the direction of the elongated bars, i.e. a greater amodal continuation was perceived in the case of long bars. These results show that the amodal representation of an object depends on the object being seen before partial occlusion, i.e. seeing an object before the occlusion event can influence the way in which the visual system interprets and processes occlusion cues.

Fig. 1.34

Summary

In this chapter, a review of studies on contour completion has been presented. The research has shown that biologically plausible computational models of contour completion must take into account several empirical facts concerning contour interpolation. Among these are the following:
1. Contour completion takes time.
2. Time to completion depends on factors such as the amount of occlusion.
3. The shape of completion is different for illusory and occluded contours, i.e. the identity hypothesis is incorrect.
4. The shape of completion depends on past perceptual experience.

CHAPTER 2
Neurophysiology of modal/amodal completion

2.1 Introduction

The visual system has the most complex neural circuitry of all the sensory systems. Much of our knowledge of visual system organisation derives from anatomical and physiological studies of the brains of monkeys. Indeed, similar visual areas and processing stages are known to exist in different monkey species and in humans, and a vast number of physiological experiments have been carried out in animals in order to understand how brain mechanisms give rise to visual perception.

In this chapter, after a brief review of the relevant anatomical structures of the primate and human visual systems, the flow of visual information and the neural structures involved in its processing are described (Paragraph 2.2). In Paragraph 2.3, the way in which contextual information from the surround modulates the neuronal response is examined. The effects of feedback from higher visual areas on visual processing are presented in Paragraph 2.4. Finally, several aspects of the physiological characteristics of modal and amodal contour perception are considered, both in the monkey and in the human visual system (Paragraph 2.5).

2.2 Overview of the visual system

The visual cortex is composed of numerous functional areas that contain a representation of the contralateral visual hemifield (Fig. 2.1) or, in the case of the inferotemporal cortex (IT), of the entire visual field. From the retina, almost all the visual information is passed to the neurones of the lateral geniculate nucleus (LGN) and, with very few exceptions, is sent to the primary visual cortex (V1), also called striate cortex. After processing in area V1, feedforward connections transmit the activity to other visual cortical areas (V2, V3, V4, V5/MT), also called extrastriate areas. These multiple, functionally distinct visual areas are organised in two major processing pathways: the dorsal and the ventral stream (Fig. 2.2), both of which originate in the primary visual cortex (V1).

Fig. 2.1 (from Palmer, 1999)

The dorsal stream relays visual information to the parietal cortex and is implicated in motion and spatial perception. This pathway is crucial in space vision for assessing the spatial relationships among objects and visual guidance toward them, therefore it is also called the where system. The ventral stream projects to the temporal cortex and is involved in colour and form perception. This pathway is critical for the visual recognition of objects, therefore it is also called the what system.

Fig. 2.2 (from Kandel, Schwartz, and Jessell, 2002)

Within each of the two processing streams, it is possible to view the visual information as ascending through a hierarchy of cortical areas (hierarchical model). The information is processed in a sequence of steps corresponding to the different levels of the hierarchy, and at each step cells process inputs of increasing complexity. In particular, the ventral stream (i.e. object recognition) has been seen as a bottom-up process in which low-level inputs provided by the retina are transformed into more useful representations as the recognition process goes along (Fig. 2.3). Indeed, in primary visual cortex (V1), cells respond to basic features such as orientation and retinal location (Hubel & Wiesel, 1962). Cells in the intermediate ventral area (V4) appear to respond maximally to features of a medium level of complexity, such as angles and vertices (Pasupathy & Connor, 1999, 2001). Cells in the inferior temporal cortex (IT) are more sensitive to very complex patterns (Kobatake and Tanaka, 1994).

Fig. 2.3

Ventral system areas

The first cortical stage of the ventral system hierarchy is the primary visual cortex (V1), located in the back of the occipital lobe. Area V1 receives its main feedforward inputs from the lateral geniculate nucleus (LGN). About 90% of the projections from the eye are channelled through the LGN to V1. The remaining 10% of the retinal fibres project to various subcortical structures, and form a pathway from the retina through the superior colliculus to the pulvinar (see Fig. 2.1). V1 neurones have small receptive fields (RFs). The receptive field is the region of the lower area to which a neurone is connected by way of feedforward connections (Fig. 2.4). In visual perception, the receptive field is considered the area of the retina and visual space that, when stimulated, produces a change in the response of the neurone. These small receptive fields are well suited to extracting local features of a contour, such as the direction and position of short line segments.

Fig. 2.4

In recent years, the functionality of different visual areas has also been studied by deactivating them with localised lesions or by cooling parts of the brain, causing a temporary disability. Using this method, it has been shown that lesions to area V1 cause devastating visual loss. V1 damage robs higher visual areas of their necessary input, and therefore V1 is necessary for normal visual function and awareness of the visual stimulus. Small lesions in V1 lead to scotomas, i.e. phenomenal blindness restricted to corresponding regions of the visual field.

From V1, information is transmitted to extrastriate visual areas for further analysis. Feedforward projections from V1 can conduct information very quickly. Minimal response latencies in V2 (45 msec) are only 10 msec longer than in V1 (35 msec). In general, response latencies at any hierarchical level are about 10 msec longer than those at the previous level, reflecting processing time at the previous stage.

At area V2, the next visual stage, the characteristics of the neurones are rather similar to those of V1, except that V2 neurones have larger receptive fields. It had been believed that, like V1 neurones, V2 cells were selective only for a narrow subset of shape characteristics, such as the orientation of lines or edges. However, Kobatake and Tanaka (1994) demonstrated that V2 cells are not specialised only for the orientation of lines or edges. In their experiments, these authors found that a few V2 neurones selectively responded to complex stimuli. Furthermore, studying the role of visual area V2 in shape analysis, Hegdé and Van Essen (2000, 2003) found that one-third of the cells in area V2 of the alert macaque responded well to various complex shape characteristics. In many V2 cells the most effective complex stimulus elicited a significantly larger response than the most effective line segment. These results indicate that a substantial proportion of V2 cells convey information not only about stimulus orientation but also about other characteristics, including shape, size, and spatial frequency. Moreover, Ito and Komatsu (2004) found that a fairly large number of V2 neurones selectively responded to angle stimuli. On the other hand, one half of their sample showed significant orientation selectivity for long bars, although the responses were submaximal. This means that if they had only been looking for selectivity for straight lines, a large fraction of the sample would have been classified as selective for line segment orientation.

The next stage along the ventral visual pathway after area V2 is the cortical visual area V4. Area V4 is an important intermediate stage in the ventral stream, and provides the major input to the final stages in the inferotemporal cortex (IT). Initial observations on V4 cells indicated that these cells were selective for colour, and it was thought that they were devoted exclusively to colour vision. Subsequently, a larger number of V4 neurones were found to be selective also for the orientation of bars. Moreover, using the stimuli shown in figure 2.5, Pasupathy and Connor (1999, 2001) found that most V4 neurones appeared to respond maximally to features of a medium level of complexity, such as angles and vertices. Lesions in area V4 impair a monkey's ability to discriminate patterns and shapes but only minimally affect the ability to distinguish colour. Indeed, Girard, Lomber, and Bullier (2002) examined the pattern discrimination abilities of macaque monkeys during cooling deactivation of area V4, and did not find significant deficits in simple hue discriminations, but only simple shape discrimination deficits that did not attenuate with time. These results show that area V4 is an important step in shape and form processing along the ventral visual stream leading to the inferotemporal cortex.

Fig. 2.5

Cells at the end of the ventral pathway, in the inferotemporal cortex (IT), show a high level of integration and selectivity for global shape. The response properties of IT neurones are those we might expect from an area involved in a later stage of pattern recognition. The receptive fields are very large (Fig. 2.6) with respect to those in the previous stages, and this may be related to position invariance, that is, the ability to recognise the same object anywhere in the visual field. The most prominent visual input to IT is from area V4, so IT cells show most of the V4 functional properties. For example, IT neurones are sensitive to both shape and colour, although the strength of the response varies for different combinations of shape and colour. Most interesting is the finding that some inferotemporal cells respond only to specific types of complex stimuli, such as a hand or a face. Moreover, whereas some neurones respond preferentially to faces, others respond preferentially to specific facial expressions. Lesions of the inferotemporal area can selectively impair face or object recognition.

Fig. 2.6 (adapted from Rolls, 1994)

2.3 Horizontal/lateral connections

The hierarchical model emphasises the role of feedforward connections, but anatomical studies indicate that cortical processing is not strictly hierarchical (Bullier and Nowak, 1995). Neurones do not simply transfer information from a lower to a higher visual area through feedforward connections; they also process it. This operation is achieved by two sets of connections: the horizontal or lateral connections, which link neighbouring neurones within a cortical area, and the feedback connections, which transmit information from a higher to a lower hierarchical level, in the direction opposite to feedforward connections. Processing between areas and processing within an area are inevitably related, because whatever message is sent by a cortical neurone to neurones of another cortical area is also transmitted by horizontal connections to neighbouring neurones within the same area.

2.3.1 V1 horizontal connections

Subsequent research has shown that neurones in V1 analyse not only the attributes of local features, such as orientation, but also more global characteristics. V1 neurones are sensitive to complex stimuli occupying larger areas than indicated by their receptive fields (RFs), and this sensitivity depends on the geometry of the stimulus components (Gilbert et al., 1996; Kapadia et al., 1995, 2000). For example, a V1 neurone's response (measured in spikes/sec) to a bar in the centre of its RF can be augmented by flanking iso-oriented, collinear bars (Fig. 2.7b) that do not elicit a response if presented alone, as shown in Fig. 2.7c.

Fig. 2.7 (adapted from Kapadia, Westheimer, and Gilbert, 2000)

Human observers exhibit a comparable increased ability to detect a dim bar in the presence of flanks, and the geometric constraints limiting such facilitation are identical for V1 neurones of monkeys and for human performance. It has been proposed that V1 selectivity to such stimuli is mediated by long-range horizontal connections intrinsic to V1 (Gilbert et al., 1996). V1 horizontal connections are long-range, reciprocal, intra-laminar projections (Fig. 2.8) that extend over distances of 6-8 mm parallel to the cortical surface. Using biochemical tracer injections to track where the horizontal axons of a particular V1 cell project, it has been determined that horizontal connections exhibit modular specificity, i.e. they preferentially link cortical cells with similar functional properties, such as the same preferred orientation.

Fig. 2.8 (from Bosking et al., 1997)

Consistent with the orientation specificity of horizontal connections, collinear facilitation (i.e., the enhancement of the cell response to an optimally oriented low-contrast stimulus by flanking co-oriented and coaxial high-contrast stimuli) is highly dependent on the orientation of the flanking stimuli (Fig. 2.9) and decreases as the flank is rotated away from iso-orientation with the central stimulus (Kapadia et al., 1995; Gilbert et al., 1996).

36 Fig. 2.9 (from Gilbert et al., 1996) Bosking and colleagues (1997) found that in the tree shrew V1 horizontal connections outside a radius of 500 µm exhibit not only modular specificity, but also specificity for axis of projection. Neurone axons extend for longer distances along the axis of the preferred orientation of the cell. Parallel studies in flanking facilitation showed a similar pattern of responses when the flank stimulus in shifted from collinearity (Fig. 2.10). Fig (from Gilbert et al., 1996) These results, combined with the evidence that horizontal connections are largely reciprocal, indicate that individual neurones receive input from other neurones whose receptive fields have similar orientation preference and are displaced along an axis in visual space that corresponds to their preferred orientation. This intra-cortical circuitry enables V1 cells to integrate information over a relatively large portion of the visual field, and is suited to mediating a wide variety of contextual influences that show orientation dependency. The distances traversed by the horizontal connections relate cells with widely separated receptive fields. Consequently, the cellular targets of horizontal connections effectively integrate input over an area of cortex that represents an extent of visual field roughly an order of magnitude 36

37 in area larger than the cells own receptive fields. Therefore, horizontal connections are thought to be the physiological substrate of some of the contextual interactions. Experimental evidence indicates that the conduction velocity of monkey V1 horizontal axons tend to be slow (median 0.3 m/s) and the slow nature of these connections is confirmed measuring directly by electrical stimulation (Girard, Hupé, Bullier, 2001). This is consistent with the thin caliber of horizontal axons and their unmyelinated character (the myelination of the axon is a mechanism for increasing conduction velocity, in fact conduction in myelinated axons is faster than in nonmyelinated axons of the same diameter) Nonclassical receptive field V1 neurones are primarily driven by feedforward inputs, and feedforward connections determine the classical receptive field (CRF) or minimum response field (MRF) of the cell. One method to measure the size of the CRF consists in placing short line segments at different positions in the visual field and register the firing rate of the neurone. In this way it is possible to determine the visual field positions that elicit a response (i.e. an increase in firing rate) of the neurone. The resultant region is the classical receptive field. In contrast to feedforward connections, horizontal axons do not drive their target neurones but only elicit subthreshold responses, thus exerting a modulatory effect. Namely, contextual stimuli do not by themselves activate neurones, but when presented jointly with a stimulus within the CRF can facilitate responses several-fold. This broader region of the visual field, that includes the CRF, is defined as nonclassical receptive field. The contextual influences are not only excitatory. Kapadia, Westheimer, and Gilbert (2000) studied the spatial arrangement of contextual interactions both in the response properties of neurones in the primary visual cortex of alert monkeys and in human perception. They found that contextual influences are not uniform but rather highly dependent on the spatial positioning of the surrounding stimuli relative to the CRF of the cell. For example, while flanks positioned along the axis exert an excitatory effect (Fig. 2.7), flanking stimuli located symmetrically with respect to the RF have an inhibitory modulation (Fig. 2.11). 37
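To put these figures in perspective, the following back-of-the-envelope sketch (Python; my own illustration, using only the velocity and distances quoted above) estimates the time a signal needs to travel along a horizontal axon.

```python
# Rough arithmetic only: propagation time along an unmyelinated horizontal
# axon conducting at roughly 0.3 m/s over the 6-8 mm spanned by long-range
# horizontal connections.
def conduction_time_ms(distance_mm: float, velocity_m_per_s: float) -> float:
    """Travel time in milliseconds for a given axonal distance and velocity."""
    return distance_mm / velocity_m_per_s  # mm / (m/s) numerically equals ms

for d_mm in (6.0, 8.0):
    print(f"{d_mm} mm at 0.3 m/s: ~{conduction_time_ms(d_mm, 0.3):.0f} ms")
# 6 mm -> ~20 ms, 8 mm -> ~27 ms: horizontal propagation alone is slow compared
# with the 1-2 msec inter-areal conduction times discussed later in this chapter.
```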

2.3.2 Nonclassical receptive field

V1 neurones are primarily driven by feedforward inputs, and feedforward connections determine the classical receptive field (CRF) or minimum response field (MRF) of the cell. One method of measuring the size of the CRF consists in placing short line segments at different positions in the visual field and registering the firing rate of the neurone. In this way it is possible to determine the visual field positions that elicit a response (i.e. an increase in firing rate) of the neurone. The resultant region is the classical receptive field. In contrast to feedforward connections, horizontal axons do not drive their target neurones but only elicit subthreshold responses, thus exerting a modulatory effect. Namely, contextual stimuli do not by themselves activate neurones, but when presented jointly with a stimulus within the CRF they can facilitate responses several-fold. This broader region of the visual field, which includes the CRF, is defined as the nonclassical receptive field.

The contextual influences are not only excitatory. Kapadia, Westheimer, and Gilbert (2000) studied the spatial arrangement of contextual interactions both in the response properties of neurones in the primary visual cortex of alert monkeys and in human perception. They found that contextual influences are not uniform but rather highly dependent on the spatial positioning of the surrounding stimuli relative to the CRF of the cell. For example, while flanks positioned along the axis exert an excitatory effect (Fig. 2.7), flanking stimuli located symmetrically with respect to the RF have an inhibitory modulation (Fig. 2.11).

Fig. 2.11 (adapted from Kapadia, Westheimer, and Gilbert, 2000)

Kapadia and colleagues compared physiological and psychophysical results by means of two-dimensional maps of the receptive field and the surround effects, one obtained from psychophysical experiments and the other from cell recording data (Fig. 2.12). In these maps, blue zones indicate excitatory interactions (located along the ends of receptive fields), while red zones indicate inhibitory interactions (strongest along the orthogonal axis). The basic features of the two maps are quite similar; a strong candidate for the anatomical substrate of the contextual interactions found in human observers is therefore likely to be the intrinsic, long-range horizontal connections formed by cells in the primary visual cortex (V1).

Fig. 2.12 (from Kapadia, Westheimer, and Gilbert, 2000)

2.3.3 Association field

The fact that horizontal connections link cells with similar orientation preferences but separated receptive fields suggests that they may represent one mechanism of contour integration. Field, Hayes, and Hess (1993) had earlier coined the term association field to account for the local grouping, or association, of small segments in neighbouring regions of the visual field (Fig. 2.13).

Fig. 2.13 (from Hess and Field, 1999)

Field and colleagues based the definition of the association field on psychophysical results showing that human observers are much better at detecting a contour composed of multiple segments when the segments are aligned with the path of the contour than when they are aligned orthogonally to the path (Fig. 2.14). The ability to detect contours composed of small oriented line segments among an array of distractor elements depends upon both the orientation and the position of the elements (Field, Hayes, and Hess, 1993; Hess and Field, 1999). Indeed, subjects could not detect the contour reliably when the elements along the path differed in orientation by more than 30°. The association field reflects the property that one element can be strongly associated only with collinearly and smoothly arranged elements. Field and colleagues proposed that the underlying mechanism of this grouping could be the horizontal connections between cortical V1 neurones.
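The grouping constraints described above can be caricatured as a pairwise association weight between oriented elements. The sketch below (Python) is my own formulation, loosely in the spirit of the association field and of collinear facilitation; the Gaussian form, the parameter values and the placement of the 30° cut-off are assumptions, not Field and colleagues' model.

```python
# Illustrative association weight between two oriented elements: strong for
# nearby, roughly collinear pairs; weak when either element deviates from the
# axis joining them or when their relative orientation exceeds about 30 deg.
import numpy as np

def association_weight(dx, dy, theta_i, theta_j,
                       sigma_dist=3.0, sigma_angle=np.deg2rad(15)):
    """Weight between element i at the origin (orientation theta_i, radians)
    and element j at offset (dx, dy) with orientation theta_j.
    All parameter values are illustrative assumptions."""
    dist = np.hypot(dx, dy)
    if dist == 0:
        return 0.0
    axis = np.arctan2(dy, dx)                 # orientation of the joining axis
    # smallest angular differences (orientations are defined modulo pi)
    d_i = np.abs((theta_i - axis + np.pi / 2) % np.pi - np.pi / 2)
    d_j = np.abs((theta_j - axis + np.pi / 2) % np.pi - np.pi / 2)
    d_rel = np.abs((theta_i - theta_j + np.pi / 2) % np.pi - np.pi / 2)
    if d_rel > np.deg2rad(30):                # beyond ~30 deg: no association
        return 0.0
    return (np.exp(-dist**2 / (2 * sigma_dist**2))
            * np.exp(-(d_i**2 + d_j**2) / (2 * sigma_angle**2)))

# Collinear, nearby elements associate strongly; flankers orthogonal to the
# joining axis do not, mirroring the psychophysical pattern described above.
print(association_weight(2.0, 0.0, 0.0, 0.0))            # high weight
print(association_weight(2.0, 0.0, np.pi/2, np.pi/2))    # ~0
```

A lateral-connection kernel of this general kind is also the sort of ingredient used by the excitatory long-range interactions in the model presented in Chapter 4.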

Fig. 2.14 (from Hess and Field, 1999)

2.4 Feedback modulations

The common assumption that the surround interactions of V1 neurones are due mainly to long-range horizontal connections has recently been challenged. Angelucci and colleagues (Angelucci et al., 2000; Angelucci and Bullier, 2003) reviewed the evidence for and against this assumption and concluded that horizontal connections are too slow and cover too little of the visual field to subserve all the functions of the suppressive surrounds of V1 neurones. They showed that the extent of visual space covered by horizontal connections is commensurate with the region of the classical receptive field (CRF) and the proximal surround. By contrast, these connections are not sufficiently extensive to account for the mean size of V1 cells' surround fields or for longer-range centre-surround interactions. Therefore, beyond the CRF, the most likely substrate for surround modulations is feedback connections from higher-order areas. In fact, inactivation of higher areas leads to a major decrease in the strength of the suppressive surround of neurones in lower-order areas, supporting the hypothesis that feedback connections play a role in surround modulation effects.

The influence of feedback connections on the response of neurones has long been known. Evidence for top-down (i.e., feedback) modulation of activity in lower-tier visual areas has been shown in the later part of macaque V1 neurones' responses during figure-ground segregation (Lamme, 1995; Hupé et al., 1998; Lamme, Zipser, and Spekreijse, 1998; Lamme, Rodriguez-Rodriguez, and Spekreijse; Lamme and Roelfsema, 2000; Roelfsema et al., 2002) (Paragraph 2.4.1) and in V2 and V4 responses reflecting border ownership coding (Zhou, Friedman, and von der Heydt, 2000; von der Heydt, 2004) (Paragraph 2.4.2).

2.4.1 Figure-ground activity in V1

Lamme and colleagues compared the responses of V1 neurones with receptive fields inside a figure with those of neurones with receptive fields placed in the background. The stimulus used was a textured square on a similarly textured ground (Fig. 2.15a).

Fig. 2.15 (from Roelfsema et al., 2002)

Lamme and colleagues found not only that V1 neurones responded better when the receptive field was inside the figure than when it was in the background (Fig. 2.15b), but also that the enhancement was spatially uniform within the figure (even far from the borders) and was not limited to small figures. Based on these observations, Lamme suggested that the interior enhancement effect could be related to whether the neurone's receptive field was encoding a part of the figure or a part of the background. However, they failed to find figure-ground effects in V1 when animals were anaesthetised or in animals with V2 lesions. These modulations occur considerably later than the initial onset of activity in the same V1 neurones.

Marcus and van Essen (2002) found an enhancement of neural activity also inside illusory figures, although it was less evident than for real figures, and the latency of the effect for illusory figures was longer than that for texture-defined figures. The identification of a region as a figure requires global image processing (the system needs to evaluate an area at least the size of the figure). The long latency and the global processing suggest that feedback from extrastriate cortices, including IT, plays a role in mediating these effects; the later part of V1 neural activity reflects this higher-level processing. These findings suggest that figure-ground segregation and object recognition are intertwined: they cannot progress in a simple bottom-up serial fashion, but have to happen concurrently and interactively, in constant feedforward and feedback loops that involve the entire hierarchical circuit of the visual system (Lee et al., 1998).

2.4.2 Border ownership

The visual system tends to interpret contrast borders as occluding contours and to assign them to the figure in front. For example, in figure 2.16A1 the light-dark border is perceived as belonging to the light square, while the grey region, which does not own the border, is perceived as extending behind the square, forming the background. Border ownership indicates which region is seen as figure (the one owning the border) and which side as background (the one without the border). Zhou et al. (2000) investigated the neuronal correlates of this perceptual process at early stages of the visual pathway (V1, V2 and V4). They recorded cell activity while a border between figure and ground was placed within the classical receptive field (CRF) of the cell (the ellipse in Fig. 2.16A,B). Border ownership was varied, so that the figure lay on one side of the CRF or the other, without changing the local input to the neurone. Slightly more than half of the recorded neurones in areas V2 and V4 exhibited significantly different responses as a function of border ownership. This effect was less evident in V1. The border ownership modulations emerged soon (<25 msec) after the response onset, and the differences were found also for large figure sizes. These results show that, besides the local contrast border information, most V2 and V4 neurones encode the side to which the border belongs ("border ownership coding"); therefore, visual stimulation far from the receptive field centre does influence neuronal activity, indicating mechanisms of global context integration.

Fig. 2.16 (from Zhou et al., 2000)

2.4.3 Temporally compact brain

Information from higher-order areas can influence activity in lower-order areas only if the high-level information is activated very fast. This implies that feedback modulation is possible only if the transfer of information, both feedforward and feedback, is sufficiently rapid. Indeed, Girard, Hupé, and Bullier (2001) found that feedback axons between V2 and V1 are as fast as feedforward axons, and that both are much faster than horizontal connections. For example, given the short distance between areas V1 and V2, conduction times through feedforward and feedback connections between these two areas are of the order of 1 or 2 msec for most axons. Moreover, neuronal responses in higher-order areas of the macaque visual system are delayed by only a few milliseconds relative to the responses of neurones in areas V1 and V2. The rapid conduction velocity of feedback connections is consistent with the lack of delay observed in the effect of inactivating area MT on the responses of neurones in areas V1, V2 and V3, or of inactivating area V2 on the responses of V1 neurones (Hupé et al., 2001). These results led Bullier and colleagues to the assertion that the visual cortex is temporally compact, i.e. conduction times are short even between areas several centimetres apart in the brain and separated by several levels in the hierarchy (Girard, Hupé, and Bullier, 2001; Bullier, 2001).

2.5 Neural correlates of modal/amodal completion

The vivid perception of illusory contours suggests that they may be represented explicitly in the visual system. Neurophysiological studies on contour completion have therefore been carried out to find its neural correlates and to understand the underlying neural mechanisms. Single-cell recording has provided direct evidence that the early visual areas (V1 and V2) are involved in representing illusory contours. Neural correlates of illusory contour perception have also been found in higher visual areas. Nevertheless, the locus and the mechanism of contour completion remain hard to identify. Most research on boundary interpolation has been done on modal/illusory contours, whereas far fewer findings are available on amodal/occluded completion.

2.5.1 Single cells neurophysiology

In a pioneering experiment, von der Heydt, Peterhans, and Baumgartner (1984) found that V2 neurones responded to illusory contour stimuli as if the contours were contrast borders: 44% of V2 cells signalled the orientation of an illusory contour defined by abutting gratings (von der Heydt and Peterhans, 1989) and 32% responded to moving Kanizsa-type figures (Peterhans and von der Heydt, 1989). However, they did not see significant signals in area V1; thus the authors suggested that only V2 neurones were able to detect illusory contours and that modal completion evolved at the V2 level. Peterhans and von der Heydt (1989) also reported that cells in V2 responded vigorously to illusory displays that generated a percept of a modally completed bar, but did not respond to displays that generated a percept of an amodally completed bar, providing physiological evidence of differences between modal and amodal completion mechanisms. Subsequent experiments demonstrated that some V1 neurones also respond to illusory contours (Grosof et al., 1993; Lee and Nguyen, 2001). In their experiment, Lee and Nguyen (2001) presented a sequence of displays instead of a single Kanizsa-type square (Fig. 2.17).

First, four complete circular discs were shown; then they were abruptly transformed into four notched discs, producing the illusion that a square had appeared in front of the four circular discs. The two steps were then repeated in sequence. This manipulation was more effective in evoking the illusory contour response than simply presenting the illusory square. Among other stimuli, Lee and Nguyen also used occluded (Fig. 2.18b) and real contour displays (Fig. 2.18c).

Fig. 2.17 (adapted from Lee and Nguyen, 2001)

Fig. 2.18 (adapted from Lee and Nguyen, 2001)

Results showed that the cells responded significantly more to the illusory contour than to the amodal/occluded condition (Fig. 2.19, left), further evidence for different underlying mechanisms. In addition, the authors found that the illusory contour elicited a response only if it was placed at precisely the same location at which a real contour elicited the maximum response (Fig. 2.19, right).

Fig. 2.19 (adapted from Lee and Nguyen, 2001)

With regard to the timing of modal contour processing, the response to the illusory contour was delayed relative to the response to real contours by 55 msec, and illusory contour responses in V2 had a shorter response latency than those in V1. Thus, Lee and Nguyen concluded that contour completion in V1 might arise under feedback modulation from V2. On the basis of these results, Lee (2002, 2003) proposed a possible mechanism of contour completion: an oriented V2 neurone, once activated by the illusory contour, feeds the information back to excite the V1 neurones of the appropriate orientation under its spatial coverage (Fig. 2.20).

Fig. 2.20 (adapted from Lee, 2003)

Amodal/occluded contours have a phenomenally nonvisual character that would seem to place them in the domain of higher-order visual areas. Nevertheless, Sugita (1999) found neuronal correlates of amodal completion within area V1, recording the responses of single cells to a moving bar hidden by patches of different disparities (Fig. 2.21).

Over 10% of V1 neurones responded to partly occluded contours. These cells did not respond to the bar when a patch of colour identical to the background was placed on the receptive field (Fig. 2.21a), so that the bar appeared as two unconnected segments. They also did not respond when the patch had a visible grey colour (Fig. 2.21b). However, the cells began to respond again when the patch had crossed disparity, so that it appeared to be in front of the bar (Fig. 2.21c), but not when the patch had uncrossed disparity, so that it appeared to be behind the bar (Fig. 2.21d).

Fig. 2.21 (adapted from Sugita, 1999)

These results indicate that these cells were receiving inputs from other disparity-sensitive cells. Moreover, the response latency for the bar behind the patch was not different from that for the complete bar. Sugita concluded that these responses were likely to be mediated by horizontal connections within V1, or by feedback signals coming from areas very close to V1.

2.5.2 Lesion studies

Single-cell recording experiments have determined that the early visual cortex (V1 and V2) participates in the representation of illusory contours, but they did not identify the locus or the mechanism of contour completion. Several lesion studies have suggested that illusory contour sensitivity occurs at higher processing stages. Behavioural investigations with lesioned monkeys indicate the critical roles of the macaque V4 area (De Weerd et al., 1996; Merigan, 1996; Merigan and Pham, 1996) and the inferotemporal (IT) area (Huxlin and Merigan, 1998; Huxlin et al., 2000; Merigan and Saunders, 2004) for illusory contour detection.

After lesions in V4 or IT cortex, the monkeys' ability to see illusory contours was severely impaired. Huxlin and colleagues (2000) found that monkeys with bilateral IT lesions could not discriminate shapes defined by illusory contours. The monkeys were still able to perform the task with shapes defined by luminance or chromatic cues; therefore the deficit appeared to be due to the illusory contour completion rather than to the shape discrimination aspects of the task. Merigan and Saunders (2004) demonstrated that unilateral IT lesions did not disrupt the perception of illusory contours. However, the performance of monkeys with bilateral IT lesions dropped from 95% correct to near chance (Fig. 2.22, right). These results suggest that the IT cortex may be critical to the visibility of illusory contours and that the activations registered in areas V1 and V2 may reflect feedback modulation from higher-order areas. Human neuroimaging studies support this notion of the involvement of higher visual areas in modal/amodal completion.

Fig. 2.22 (from Merigan and Saunders, 2004)

2.6 Human neuroimaging studies

Representations of modal and amodal contours have been identified in the human visual cortex by means of noninvasive imaging techniques, such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), which detect correlates of the brain's neuronal activation. When a stimulus causes significant neural activity, an additional supply of oxygenated blood is delivered, and the proportion of oxygenated to deoxygenated blood varies compared with a control condition.

The change in the ratio of oxygenated to deoxygenated blood alters the local magnetic field, and this can be detected in the fMRI signals. These signals are an indirect measure of neural activity, and by measuring changes in the local blood oxygenation level it is possible to localise brain activations at high spatial resolution. Positron emission tomography (PET) is a brain imaging technique in which a radioactive substance is injected into the blood, so that blood flow to active brain regions can be detected and used to reconstruct a map of brain activity during the experiments. Another noninvasive technique is the recording of event-related potentials (ERP), which are a direct consequence of the electrical activity of neurones and allow observation of the underlying neural processes on a millisecond time scale, but with poor spatial resolution.

2.6.1 Locus of illusory contour processing

Using the PET technique, Ffytche and Zeki (1996) investigated illusory contour perception and found an increase of blood flow in the extrastriate cortex roughly corresponding to V2 and adjacent visual areas. Modal contour perception was therefore associated with activity in early visual areas, particularly in area V2, consistent with the findings of Peterhans and von der Heydt (1989) in monkeys. However, in this study perception of real and illusory contours was not directly contrasted in the same individuals; the comparison was made between two different groups of subjects, one group seeing the real stimuli and the other the illusory ones. It is quite possible that the large residual variability introduced by this procedure masked any true difference between the conditions, making it difficult to claim with certainty whether the activated regions were truly specific for illusory contours. Using the PET technique, Larsson and colleagues (1999) directly contrasted real and illusory contour perception in the same subjects. They found that only one region, located in the right fusiform gyrus, was activated more strongly by perception of illusory rather than real contours (in Fig. 2.23 the fusiform gyrus is the pink zone). By contrast, V2 was not specifically activated by illusory contours compared with real ones. These results were consistent with previous findings of Hirsch and colleagues (1995), who found that perception of illusory contours activated regions of the cortex in the right fusiform gyrus more strongly than real contours.

Fig. 2.23 (pink: fusiform gyrus, yellow: lingual gyrus, green: parahippocampal gyrus)

Mendola and colleagues (1999) employed fMRI to obtain maps of visual cortex while subjects viewed different types of contours, both real and illusory (Fig. 2.24). They found that illusory figures elicited significant responses in the lateral occipital region (LOR). The lateral occipital region (LOR), or lateral occipital complex (LOC), is considered the homologue of the macaque IT cortex and has large, bilateral receptive fields. The authors hypothesised that these large receptive fields were the most effective substrate for grouping elements across large distances in the image plane. Relatively small signals were also detected in V1 and V2. However, since fMRI does not have the temporal resolution to establish the sequence of processing in these different areas, these data were inadequate to determine whether the response was due to feedforward activity or to feedback modulation.

Fig. 2.24 (adapted from Mendola et al., 1999)

Evidence that the earliest illusory contour sensitivity occurs within the lateral occipital complex (LOC) has also been provided by other studies (Murray et al., 2002; Brighina et al., 2003). Another result shared by many experiments is the lateralisation of the modal completion process (Hirsch et al., 1995; Larsson et al., 1999; Csibra, Davis, and Johnson, 2001; Murray et al., 2002; Brighina et al., 2003), i.e. the mechanism responsible for illusory contour perception appears preferentially lateralised to the right cerebral hemisphere.

2.6.2 Feedback hypothesis

Since fMRI does not have the temporal resolution to determine the timing of illusory contour sensitivity relative to sensory response onset, spatiotemporal patterns of cortical activation during illusory contour processing were estimated using a combination of high-density electrical mapping, source analysis, and fMRI (Murray et al., 2002), or magnetoencephalography (MEG) (Halgren et al., 2003). The high temporal resolution of these electrophysiological techniques permitted the assessment of the relative timing of illusory contour processes while recording simultaneously over the entire scalp. Murray and colleagues (2002) found that the earliest modulation related to illusory contour presence versus absence (the illusory contour effect, or IC effect) began over lateral-occipital (LOC) scalp bilaterally, following cortical response onset at 40 msec. The IC effect was larger over the right than the left hemisphere after central presentations of the stimuli, but not after lateral presentations. Furthermore, in the case of lateral presentations the latency of the IC effect shifted later (about 120 msec), indicating that retinotopic position alters IC processing. On the basis of their results, the authors proposed that the IC sensitivity observed previously in early visual areas (V1 and V2) was unlikely to occur during the initial phase of processing. Rather, V2 and V1 activation in illusory contour perception predominantly reflects feedback modulation from higher-tier LOC areas, where IC sensitivity first occurs. Halgren and colleagues (2003) determined the spatiotemporal dynamics of illusory contour processing using magnetoencephalography, and their results confirmed the feedback hypothesis of Murray and colleagues (2002). Halgren and colleagues found the first strongly significant modulation associated with perception of illusory contours in the lateral occipital region (LOR), with a peak at 155 msec; the modulation then spread back from this location toward the occipital pole (OP), as well as ventrally to involve ventral occipital and temporal cortices over the next 180 msec, eventually involving ventral orbitofrontal cortex at 325 msec (Fig. 2.25).

Fig. 2.25 (adapted from Halgren et al., 1999)

Figure 2.25 shows the estimated distribution of cortical modulation due to illusory contours. Activity is displayed on lateral and ventral views of cortical surfaces that have been inflated so that the sulcal cortex (darker grey) can be visualised. Initially, significant modulation is present only in small regions at the right occipital and temporal poles. Subsequently, a highly significant modulation that distinguishes IC presence versus absence peaks at 155 msec after stimulus onset. This modulation is most extensive and significant in the right anterior LOR. Over the next 20 msec the modulation becomes weaker in LOR while remaining strong in OP and ventral occipitotemporal cortex (VOT), so that at 195 msec these structures predominate. Over the next 30 msec the pattern changes further, with VOT losing modulation while the ventral temporal (VT) and lateral temporal lobe become more active. At 235 msec after stimulus onset, modulation has spread back toward OP, in visual areas V1 and V2. It is also present in the fusiform and parahippocampal gyri of VT, as well as in the inferior temporal gyrus and the superior and inferior temporal sulci, especially in the left hemisphere. Thus, by 235 msec, modulation is focused in OP plus VT, as well as the superior temporal sulcus and gyrus. In the following 35 msec the OP and VT modulations become weaker while LOR and VOT become stronger, so that by 265 msec LOR is again active; it soon fades, however, and modulation becomes more concentrated in the right VOT.

The feedback hypothesis also finds support from neuropsychological studies reporting that perception of illusory shapes was abnormal in patients with parietal lesions only when the lesion extended posteriorly into the LOR (Vuilleumier, Valenza, and Landis, 2001).

2.6.3 Comparison between modal and amodal dynamics

Murray and colleagues (2004) compared the spatiotemporal dynamics of modal/illusory contour and amodal/occluded contour processing, in order to settle the controversy regarding whether both completion processes are based on a common neural mechanism. They used high-density electrical mapping, spatiotemporal topographic analyses, and the local autoregressive average distributed linear inverse source estimation. The results showed a common initial neural mechanism for both types of completion process, during an early time window, which manifested as a modulation in response strength within higher-tier visual areas (Fig. 2.26, upper row), including the LOC and parietal structures; differential mechanisms were evident at a subsequent time period, with amodal completion relying on continued strong responses in these structures (Fig. 2.26, lower row). This provides further evidence for differences between modal and amodal contour completion processes.

Fig. 2.26 (adapted from Murray et al., 2004)

2.7 Conclusions

In this chapter, a review of physiological studies on contour completion has been presented. On the basis of this research, any subsequent computational model of modal/amodal contour completion must take into account the following new results:

5. The interpolation mechanism cannot be implemented only with properties of early visual areas (V1 and V2), such as lateral connections, but has to include feedback mechanisms from an object recognition area (LOR or IT).

6. The timing of the feedback influence is different for modal and amodal completion.

CHAPTER 3

Computational models of modal/amodal completion

3.1 Introduction

Several neural-style computational models of visual processing have been proposed. Biologically plausible computational models of contour interpolation are based on empirical neurophysiological data and attempt to implement computer simulations of how contour integration is realised in biological vision systems. While details vary, these models typically use mechanisms that incorporate principles of long-range interactions, or recurrent feedforward-feedback processing, or both. Neural network models have focused on perceptual grouping and contour enhancement (Grossberg and Mingolla, 1985a, 1985b, 1987a, 1987b; Grossberg, 1994, 1997, 2003; Grossberg, Mingolla, and Ross, 1997), preattentive segmentation and pop-out (Li, 1998, 1999, 2000, 2001; Yen and Finkel, 1998), recurrent interaction between early visual areas V1 and V2 (Neumann and Sepp, 1999; Neumann, 2003; Hansen, Sepp, and Neumann, 2001), the role of the laminar architecture of V1 in contrast-sensitive perceptual grouping (Grossberg and Raizada, 2000; Grossberg and Williamson, 2001), and the computation of corners and junctions (Heitger et al., 1998; Hansen and Neumann, 2004).

There are two basic ideas about how interpolation proceeds: some models are based on the bipole, or higher-order operator, notion (Grossberg and Mingolla, 1985a, b; Heitger et al., 1998; Neumann and Sepp, 1999), while other models suggest an intracortical mechanism within the early visual cortex and use algorithms based on horizontal interactions (Field et al., 1993; Yen and Finkel, 1998; Kellman et al., 2001; see also Paragraph 2.3.3). Bipole-type units require real edge inputs on both sides of a gap (Fig. 3.1a); the activation of the bipole, centred over a discontinuity in the edge input, is then used to construct a contour that spans the gap. Neural networks using lateral interactions are based on the idea that, at an early level of cortical visual processing (V1 or V2), orientation-sensitive units are activated by the stimulus and then transmit activation to their neighbours along paths described by particular geometric relationships (Fig. 3.1b). Some models combine aspects of both the bipole and the network scheme (Li, 1998).

Fig. 3.1

This chapter reviews some biologically motivated models of contour perception and the basic mechanisms they use to implement the grouping process. Paragraph 3.2 examines the model of Grossberg and Mingolla and its subsequent implementations. Paragraph 3.3 describes the model of Heitger and colleagues. Paragraph 3.4 presents the model of Neumann and colleagues, and Paragraph 3.5 the model of Li.

3.2 Model of Grossberg and Mingolla

The model of Grossberg and Mingolla (1985a, 1985b, 1987a, 1987b) provides a neural network theory of biological vision that attempts to explain modal and amodal contour formation and related phenomena (e.g. filled-in brightness). In the model, cortical processing is accomplished by two parallel but interacting subsystems: the boundary contour system (BCS), which generates boundary representations, and the feature contour system (FCS), which generates representations of surface features such as brightness and colour. These two subsystems interact: the FCS discounts the illuminant and fills in surface properties (i.e. brightness, colour and depth), and the FCS signals spread out until they are blocked by the boundaries generated by the BCS. Figure 3.2 depicts the macrocircuit of the model: BCS stages are denoted by octagonal boxes, FCS stages by rectangular boxes. The description of the model given in this paragraph considers only the static, monocular BCS processing properties.

Fig. 3.2 (from Gove et al., 1995)

In the model, the input from the retina is sent to LGN ON and OFF cells. These ON and OFF cells have antagonistic surrounds (Fig. 3.3a) and extract the contours of the image. The LGN cell outputs activate the first stage of cortical BCS processing, which contains simple cells (Fig. 3.3b). Simple cells are oriented local contrast detectors that also respond to contrast polarity, i.e. the direction of contrast (light-dark or dark-light). LGN activations provide input to pairs of like-oriented simple cells sensitive to opposite directions of contrast. The outputs from these pairs of like-oriented simple cells are added at like-oriented complex cells (Fig. 3.3c). By pooling outputs from oppositely polarised simple cells, complex cells respond to both polarities, as do all subsequent BCS cell types in the model. Therefore, complex cells and subsequent BCS cell types are insensitive to the direction of contrast.

Fig. 3.3 (adapted from Gove et al., 1995)

Complex cells activate hypercomplex cells (also called end-stopped complex cells) through an on-centre/off-surround network, or spatial competition, whose off-surround carries out an end-stopping operation (Fig. 3.4a). During the spatial competition, complex cells excite hypercomplex cells of the same orientation and position while inhibiting hypercomplex cells of similar orientation at nearby positions. One role of this spatial competition is to sharpen the neural responses to oriented luminance edges. Another role is to initiate the process, called end-cutting, whereby boundaries are formed at line ends with orientations perpendicular or oblique to the orientation of the line itself, as in the case of illusory contours generated by phase-shifted abutting gratings (Fig. 1.5, Paragraph 1.2.1). The hypercomplex cells provide input to higher-order hypercomplex cells that compete across orientations at each position (Fig. 3.4b). This competition acts to sharpen orientational responses at each position. It also completes the end-cutting operation that was initiated at the hypercomplex-cell level.

Fig. 3.4 (adapted from Gove et al., 1995)

Outputs from the higher-order hypercomplex cells feed into bipole cells, which initiate long-range boundary grouping and completion (Fig. 3.5). Bipole cells have two oriented receptive fields and fire only if both of their receptive fields are sufficiently activated by hypercomplex-cell inputs whose orientation is similar to that of the bipole cell's receptive field. For example, a horizontal bipole cell is excited by activation of horizontal hypercomplex cells that provide input to its receptive field. A horizontal bipole cell is also inhibited by activation of vertical hypercomplex cells. This spatial impenetrability operation (Grossberg, 1987; Grossberg and Mingolla, 1987b) prevents collinear grouping from occurring across regions where noncollinear elements are present.

Fig. 3.5 (adapted from Grossberg, 1997)

Output signals from bipole cells feed back to the hypercomplex cells after undergoing two types of competitive processing. Bipole cell outputs compete across orientations, to determine which orientation is receiving the largest amount of cooperative support (Fig. 3.6a), and across nearby positions, to select the best spatial location for the emerging boundary. Hypercomplex cells that receive the most cooperative support from bipole grouping (Fig. 3.6b) further excite the corresponding bipole cells. This cycle of bottom-up and top-down interaction between hypercomplex cells and bipole cells rapidly converges to a final boundary segmentation that completes the statistically most favoured boundaries and suppresses the less favoured ones. This cooperative-competitive feedback circuit has been called the CC loop.
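The defining property of a bipole cell, namely that it fires only when both of its oriented lobes receive sufficient collinear support, can be sketched in a few lines. The following toy one-dimensional version only illustrates that rule; the lobe length, threshold, and input format are placeholders, not the parameters of the BCS.

```python
import numpy as np

def bipole_response(edge_signal, center, lobe_len=6, threshold=0.5):
    """Toy 1-D bipole unit: it pools edge (hypercomplex-like) activity from
    two collinear lobes, one on each side of `center`, and fires only if
    BOTH lobes are sufficiently active. A single active lobe is not enough,
    so bipoles can bridge a gap between two inducers but cannot extrapolate
    past an isolated edge end."""
    left = edge_signal[max(center - lobe_len, 0):center]
    right = edge_signal[center + 1:center + 1 + lobe_len]
    if left.sum() > threshold and right.sum() > threshold:
        return left.sum() + right.sum()   # grouping signal spanning the gap
    return 0.0

# Real edges on both sides of a gap: the bipole centred on the gap fires.
signal = np.zeros(21)
signal[2:8] = 1.0      # left inducer
signal[13:19] = 1.0    # right inducer
print(bipole_response(signal, center=10))   # > 0: the boundary is completed

# Only one inducer: the bipole stays silent (no one-sided completion).
signal[13:19] = 0.0
print(bipole_response(signal, center=10))   # 0.0
```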

Fig. 3.6 (adapted from Gove et al., 1995)

The CC loop is capable of generating modal contours in response to Kanizsa-type illusory figures and amodal contours in response to displays of occlusion. The cooperative-competitive process builds a coherent boundary grouping that spans the gap between the real edge inputs. Figure 3.7 illustrates this process for amodal contour completion. First, the notched disks activate the simple, complex and hypercomplex cells at the positions of the real edges. Then, hypercomplex cells activate the bipole cell spanning the two notched disks (Fig. 3.7a). The bipole cell activates the hypercomplex cell near the middle of the illusory contour through the feedback pathway (Fig. 3.7b). After this occurs, another bipole cell in the network can be activated by two of the three active hypercomplex cells (Fig. 3.7c). The remainder of the illusory contour can then rapidly form in parallel (Fig. 3.7d). Because complex cells pool inputs from oppositely polarised simple cells, bipole cells can also form real or illusory contours from oppositely polarised inducers.

Fig. 3.7

In the case of occlusion displays, the series of CC-loop processing stages is similar (Fig. 3.8).

Fig. 3.8

The boundaries completed within the BCS do not generate visible contrasts within the BCS. In this sense, all boundaries are invisible (Grossberg, 1994, 1997). Boundaries created by the BCS are made visible by the contrast-polarity-sensitive FCS through diffusive filling-in of brightness and colour signals. FCS activity is not orientation selective and is contained by the boundaries created by the BCS: the brightness or colour signal spreads out until it is blocked by a boundary generated by the BCS. Illusory contours are perceived only when the FCS indicates that the BCS represents a boundary between regions of two locally different brightnesses. Moreover, in illusory figures the strength of the illusory contour should depend on the activation level of the FCS: the greater the contrast between figure and background, the stronger the spread of brightness, and the more clearly the boundary between the two regions will be perceived. In more recent versions of the BCS/FCS model, the early ideas have been extended and the model has been generalised to the multiscale binocular case (Grossberg, 1994; Grossberg and McLoughlin, 1997; Grossberg and Howe, 2003; Grossberg and Swaminathan, 2004) and to motion (Francis and Grossberg, 1996; Grossberg, Mingolla, and Viswanathan, 2001).

3.3 Model of Heitger and colleagues

The model of Heitger, von der Heydt, Peterhans, Rosenthaler, and Kübler (1998) is based on the results of electrophysiological investigations of the response properties of V1 end-stopped cells (Peterhans and von der Heydt, 1989; von der Heydt and Peterhans, 1989). End-stopped cells respond best when a properly oriented line-end or corner is centred in their receptive field (Fig. 3.9a), but they are inhibited if an edge extends across the receptive field (Fig. 3.9b) (Heitger et al., 1992; Peterhans and von der Heydt, 1991; Peterhans, von der Heydt, and Baumgartner, 1986). In the model, end-stopped units are used to signal the terminations of edges and lines that evoke illusory contours in Kanizsa-type figures or grating displays (Fig. 3.10).

Fig. 3.9

Fig. 3.10 (adapted from Heitger et al., 1998)

The architecture of this model consists of a set of hierarchically organised filtering stages (Fig. 3.11). The model's implementation begins with filters called S-operators, whose functionality closely resembles that of the simple cells of V1. The model postulates six pairs of S-operators at each location, oriented 30° apart, with odd-symmetric and even-symmetric receptive fields. These filters measure the oriented contrast of the input. In the second stage of the model, C-operators are analogous to the complex cells of V1. The activation of a C-operator is calculated as the root-mean-square combination of the responses of the S-operators:

$$C = \sqrt{S_{odd}^{2} + S_{even}^{2}}$$

C-operators, like complex cells, respond to any appropriately oriented edge or line within their receptive fields and do not differentiate bright lines from dark ones. Thus, C-operators localise oriented luminance discontinuities without regard to the direction of contrast (Fig. 3.12B).

Fig. 3.11 (from Heitger et al., 1998)

Fig. 3.12 (from Heitger et al., 1998)

The third stage of the model combines the output of the C-operators to form end-stopped operators (ES). These operators provide a representation of edge and line terminations, corners, and strongly curved contours. There are two types of end-stopped operators. The single-stopped operators have one excitatory and one inhibitory zone, constructed by taking the difference of the responses of two identical C-operators positioned end to end (Fig. 3.13a); these operators respond maximally to a line along the orientation of the operator that terminates between the two zones. The double-stopped operators have inhibitory zones on either side of a central excitatory area (Fig. 3.13b) and respond best to a small disk. As with the S- and C-operators, the model includes end-stopped operators oriented every 30°.

Fig. 3.13
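A minimal sketch of the two operations just described, assuming a generic quadrature (odd/even) filter pair in place of the actual S-operators of Heitger and colleagues: the C-operator combines the paired responses as the square root of the sum of their squares, and a single-stopped operator is the rectified difference of two such C-operator responses displaced end to end, so that it peaks at line terminations. Filter sizes, the wavelength and the displacement are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def quadrature_pair(size=15, sigma=2.5, wavelength=6.0, theta=0.0):
    """Odd/even filter pair standing in for a pair of S-operators; theta is
    the direction along which the filters oscillate."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    u = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.sin(2 * np.pi * u / wavelength), env * np.cos(2 * np.pi * u / wavelength)

def c_operator(image, theta=0.0):
    """Polarity-invariant response: C = sqrt(S_odd^2 + S_even^2)."""
    s_odd, s_even = quadrature_pair(theta=theta)
    r_odd = convolve2d(image, s_odd, mode="same")
    r_even = convolve2d(image, s_even, mode="same")
    return np.sqrt(r_odd**2 + r_even**2)

def single_stopped(image, theta=0.0, shift=6):
    """Single-stopped operator: rectified difference of two identical
    C-operator responses positioned end to end along the contour."""
    c = c_operator(image, theta)
    excitatory = np.roll(c, shift, axis=1)    # lobe sampling one side of the centre
    inhibitory = np.roll(c, -shift, axis=1)   # lobe sampling the other side
    return np.maximum(excitatory - inhibitory, 0.0)

# A thin horizontal line that stops in the middle of the image: the strongest
# end-stopped response appears just beyond the line's right termination.
img = np.zeros((40, 60))
img[20, 5:30] = 1.0
es = single_stopped(img, theta=np.pi / 2, shift=6)
print(np.unravel_index(es.argmax(), es.shape))
```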

The result of the oriented end-stopped operators is the localisation of key-points (Fig. 3.12C), defined as points where the summed response of the single- and double-stopped operators reaches a local maximum. In simulations, these key-points correspond to corners, line-ends, junctions, and strong variations of curvature. Many of these key-points arise from the occlusion of one edge by another; therefore they play a critical role in initiating edge interpolation processes. The key-point information is then elaborated in a grouping stage. Two different grouping rules are used to deal with different occlusion features, namely para grouping and ortho grouping, depending on the direction of the end-stopped units with respect to the orientation of the grouping field (Fig. 3.10). Para grouping generates illusory contours in Kanizsa-type illusory figures (Fig. 3.10a), while ortho grouping produces illusory contours orthogonal to line ends (Fig. 3.10b). The grouping field is defined as a bipole filter, similar to that used in the model of Grossberg and Mingolla (1985). The grouping algorithm is a convolution of this filter with the responses of the single-stopped operators at the key-points. The numerical result of this computation is assigned to the image point corresponding to the centre of the grouping operator (Fig. 3.12D). Once the para and ortho grouping responses have been determined, they are added to the responses of the C-operators to luminance edges (Fig. 3.12E). The final, perceived contours correspond to the maxima in the combined representation of the C-operator and grouping outputs (Fig. 3.12F). The output image includes both the real contours and the illusory contours typically perceived by human observers.

The capability of the model to successfully generate illusory contours has been demonstrated for a variety of classical psychophysical stimuli, such as the Kanizsa triangle, as well as for images of natural scenes. However, the model of Heitger and colleagues is not without drawbacks. For example, some of the operations appear to have been designed to produce a particular result, with little theoretical or empirical justification. Furthermore, the grouping process depends on some complex weighting structures for cross-orientation inhibition, necessary to suppress grouping when signals of multiple orientations exist at a single location. Finally, this model has a strictly feedforward scheme. Although the authors have stressed purely feedforward processing as a virtue, the feedforward scheme used in the model is difficult to reconcile with findings suggesting that feedback plays an important role in figure-ground segregation (Lamme, 1995; Zipser et al., 1996; Murray et al., 2002, 2004).

3.4 Model of Neumann and Sepp

Based on the finding that feedback projections primarily serve as a modulation mechanism, Neumann and Sepp (1999) developed a computational model for contour processing in which feedforward and feedback interactions between areas V1 and V2 locally enhance those initial V1 activities that are consistent with the feedback V2 responses (Fig. 3.14).

Fig. 3.14 (adapted from Hansen et al., 2001)

In the lower area (V1) the local contrast orientation is initially measured by cells with oriented receptive fields, such as cortical simple and complex cells. The resulting activations are fed forward to the higher area (V2), where they are subsequently integrated by cells utilising oriented long-range integration (Fig. 3.15).

Fig. 3.15 (adapted from Hansen et al., 2001)

The units of the higher area are bipole-shaped curvature templates (Fig. 3.16a). These bipole templates are very similar to the bipole cells used in the model of Grossberg and Mingolla (1985a). An additional connection pattern of parallel and near-parallel orientations is used to model inhibitory contributions (Fig. 3.16b).

Fig. 3.16 (from Neumann and Sepp, 1999)

Due to their increased receptive field size, the bipole templates bridge gaps, including those corresponding to perceived illusory contours. The higher area generates an activity pattern that is propagated backwards via the descending feedback pathway. In V1, responses of cells that match the position and orientation of activated V2 bipole cells are selectively enhanced, while those that do not are inhibited (Fig. 3.15). In this way, the feedback activation signals the degree of consistency between local measurements in V1 and model expectations represented by the bipole templates in V2. A gain-control mechanism, accompanied by competitive interactions in an on-centre/off-surround scheme, selectively enhances salient input activations while suppressing spurious and inconsistent signals. This facilitates the segmentation of the surface layout and figure-ground segregation. The model accounts for a number of different phenomena, such as illusory contour generation from abutting gratings, and it performs contour enhancement and grouping of noisy fragmented shapes in both artificial and natural images (Fig. 3.17).

Fig. 3.17 (adapted from Neumann and Sepp, 1999)
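The modulatory character of the feedback described above (enhancement of consistent feedforward activity followed by competitive normalisation, rather than a driving input) can be sketched as follows. This is not the authors' equation set: the gain factor, the normalisation pool and the array layout are assumptions, made only to show that feedback multiplies existing activity and therefore cannot create responses where there is no feedforward signal.

```python
import numpy as np

def modulating_feedback(v1, v2_feedback, gain=2.0, eps=0.01):
    """Schematic V2 -> V1 modulation in the spirit of Neumann and Sepp (1999):
    V1 responses consistent with the V2 grouping are multiplicatively
    enhanced, and a divisive normalisation across orientation channels then
    suppresses the unenhanced, inconsistent responses. Arrays are
    (orientation, rows, cols) activity maps."""
    enhanced = v1 * (1.0 + gain * v2_feedback)            # modulation, not addition
    pool = enhanced.sum(axis=0, keepdims=True) + eps      # competition across orientations
    return enhanced / pool

# Two orientation channels at one location with similar feedforward activity;
# the V2 grouping supports only the first one.
v1 = np.array([[[0.20]], [[0.18]]])
fb = np.array([[[1.00]], [[0.00]]])
print(modulating_feedback(v1, fb).ravel())   # the supported channel wins
# With no feedforward signal at all, feedback alone produces nothing:
print(modulating_feedback(np.zeros_like(v1), fb).ravel())
```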

The model correctly predicts illusory contour strength for different Kanizsa-type illusory figures (Hansen, Sepp, and Neumann, 2001), and it localises corners and junction configurations such as L- and T-junctions (Hansen, Sepp, and Neumann, 2001; Hansen and Neumann, 2004). Spatial locations of corners and junctions are detected by the model as the points at which V1 cells have significant responses in more than one orientation (red circles in figure 3.19c).

Fig. 3.19 (from Hansen et al., 2001)

3.5 Model of Li

Zhaoping Li has developed a neural model of contour enhancement in primary visual cortex (Li, 1998, 1999, 2000, 2001) that uses only known V1 elements, operations, and connection patterns. The elements of the model are orientation-selective cells, local cortical circuits, and horizontal intracortical connections. The network contains coupled oscillators defined by pairs of reciprocally connected inhibitory interneurons and excitatory cells. Two different connection patterns are used for the excitatory and inhibitory neurones, which extend either collinear to the reference orientation (for the excitatory population) or orthogonal to it (for the inhibitory population), as depicted in Figure 3.20. The connection patterns (Fig. 3.21) are designed to fulfil a number of criteria, such as no spontaneous pattern generation, higher responses at region borders, and enhancement of smooth contours. Nevertheless, some artefacts occur, such as high responses in homogeneous regions.

Fig. 3.20 (from Li, 1999)

Fig. 3.21 (from Li, 1999)

The model has been successfully applied to a number of different tasks, such as contour enhancement in noisy displays (Fig. 3.22) (Li, 1998) and pre-attentive texture segmentation (Li, 1999, 2000).

Fig. 3.22 (from Li, 1998)

3.6 Conclusions

The recent neurophysiological findings about modal/amodal completion challenge the existing models. The model of Grossberg and colleagues uses the same mechanism for both illusory and occluded contours; therefore the same shape is obtained in either case. This result is at odds with the psychophysical evidence of different perceived shapes for the two types of completion (Singh, Paragraph 1.4) and with the physiological findings (Lee and Nguyen, Paragraph 2.5.1; Murray et al., Paragraph 2.6.3). The model of Heitger and colleagues implements illusory contour formation in a strictly feedforward manner. This is in contrast with the physiological evidence of feedback from higher cortical areas, both from lesion studies (see Paragraph 2.5.2) and from neuroimaging in human observers (see Paragraph 2.6.2). The model of Neumann and colleagues includes a feedback mechanism, but only from V2 and not from higher areas; furthermore, this model does not consider amodal completion. The model of Li includes only mechanisms of area V1 and does not address illusory contour completion: it aims only at modelling the aspects of contour enhancement observed in V1 and leaves out contour interpolation, which is most likely completed by higher visual centres that are absent from the model.

Moreover, none of these models can deal with amodal continuation (Paragraph 1.5), because in all of them the interpolation mechanism is implemented with more or less similar bipole units requiring input from both sides of the gap, whereas in amodal continuation only one side of the contour is physically defined. Finally, none of these models can approach the problem of angle occlusion.

In the next chapter a new computational model for modal/amodal completion will be proposed, which includes mechanisms following recent findings from physiological and psychophysical experiments and which can cope with both occluded angles and amodal continuation.

CHAPTER 4

New computational model of modal/amodal completion

4.1 Introduction

In this chapter, a biologically plausible computational model for modal/amodal completion will be presented. The key properties of the model are motivated by physiological findings: the model uses localised receptive fields for oriented contrast processing, similar to the simple and complex cells of V1 and V2; it incorporates horizontal long-range connections similar to those found in early visual areas (see Paragraph 2.3); and it includes feedback from higher areas whose strength is not sufficient to drive cell responses but only to modulate cell activities, as found in many neurophysiological and neuroimaging studies (see Paragraphs 2.4 and 2.6.2).

Fig. 4.1

The model architecture (Fig. 4.1) is defined by a sequence of modules (LGN, V1, V2, V4, and IT) resembling the known visual processing areas for object recognition. Each module receives inputs from a small region of the preceding module, allowing the receptive field sizes of the model neurones to increase gradually through the pyramidal structure of the network. Feedback projections, coming from the higher area (IT) towards V2, generate an object-based top-down modulation of V2 cell activities. The computational principles used and the mathematical description of each stage are laid out in Paragraph 4.2. Simulations of modal/amodal completion and amodal continuation are presented in Paragraph 4.3. The presented simulations are mainly directed towards determining the perceived shape of the contours; therefore they put the stress on the feedback modulations rather than on the feedforward pathway. For this reason, the feedback signals for the V2 level are generated by hand, leaving the implementation of the learning phase of V4 and IT to a subsequent development of the model.

4.2 Description and equations of the computational model

For each level of the network, the convolution filters will be displayed graphically, and the output of each stage will be shown for the same input image (Fig. 4.2). This figure usually elicits in humans the percept of a white square occluding an angle of a black diamond. The input figure is a 100x100 array with values in the range [0, 1]: the white area pixels have value 1, the black area pixels have value 0, and the background pixels all have value 0.5. The output of each network level will be represented using an activity-based scale, with the maximum signal strength represented by white and the minimum signal strength by black. Wherever possible, the model differential equations are solved at equilibrium in response to a constant input in order to speed up the processing. These approximations do not affect the reliability of the results. The equations used for the LGN and for the simple and complex cells in V1 and V2 are modified versions of those used in the models of Grossberg and McLoughlin (1997) and Gove, Grossberg and Mingolla (1995). Helpful suggestions for the equations of simple and complex cells came also from Lee (1996). All the phases of the simulations are implemented using MATLAB.
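As a concrete illustration of the input format just described, the sketch below builds a 100x100 array with a 0.5 background, a black diamond and a white square occluding one of the diamond's corners. The geometry of the actual stimulus of figure 4.2 is not reproduced exactly: positions and sizes here are placeholders, and the sketch is written in Python rather than in the MATLAB used for the thesis simulations.

```python
import numpy as np

def make_occlusion_display(size=100):
    """Builds a size x size input in [0, 1]: grey background (0.5), a black
    diamond (0), and a white square (1) drawn last so that it occludes one
    corner of the diamond. All coordinates are illustrative."""
    img = np.full((size, size), 0.5)
    rows, cols = np.mgrid[0:size, 0:size]
    # Black diamond: points whose city-block distance from the centre
    # is within the half-diagonal.
    centre_r, centre_c, half_diag = 55, 45, 25
    img[np.abs(rows - centre_r) + np.abs(cols - centre_c) <= half_diag] = 0.0
    # White square occluding the diamond's upper-right corner.
    img[25:55, 50:80] = 1.0
    return img

img = make_occlusion_display()
print(img.shape, float(img.min()), float(img.max()))   # (100, 100) 0.0 1.0
```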

Fig. 4.2

4.2.1 LGN ON and OFF Channels

This first stage of processing involves two types of cell, namely ON and OFF cells, which have a receptive field with an antagonistic centre-surround structure. In the ON cells the centre is excitatory and the surround inhibitory, whereas in the OFF cells the centre is inhibitory and the surround excitatory (Fig. 4.3).

Fig. 4.3

Fig. 4.4

The ON channel shows an enhanced response at image locations of high intensity relative to the surrounding positions, and the OFF channel shows an enhanced response at locations of low intensity relative to their surrounding image positions. The LGN ON activities $X^{+}_{ij}$ and OFF activities $X^{-}_{ij}$ obey membrane, or shunting, equations with centre-surround interactions (Grossberg, 1983). The responses of LGN cells are modelled by convolution of the initial input stimulus I with a difference-of-Gaussians (DOG) operator (Fig. 4.4), i.e. the inputs $I_{ij}$ at position (i, j) are filtered by convolution kernels defined by the difference of two isotropic two-dimensional Gaussians.

The equations used are the following:

ON cells:

$$\frac{d}{dt} X^{+}_{ij} = -\alpha_1 X^{+}_{ij} + (U_1 - X^{+}_{ij})\,C_{ij} - (X^{+}_{ij} + L_1)\,S_{ij} \qquad (1)$$

OFF cells:

$$\frac{d}{dt} X^{-}_{ij} = -\alpha_1 X^{-}_{ij} + (U_1 - X^{-}_{ij})\,S_{ij} - (X^{-}_{ij} + L_1)\,C_{ij} \qquad (2)$$

where the decay constant is $\alpha_1 = 10$ and the upper and lower activity bounds are $U_1 = 1$ and $L_1 = 1$. The centre input $C_{ij}$ and the surround input $S_{ij}$ are defined by Gaussian kernels:

$$C_{ij} = \sum_{(p,q)} C_{pq}\, I_{i+p,\,j+q} \qquad (3)$$

$$S_{ij} = \sum_{(p,q)} S_{pq}\, I_{i+p,\,j+q} \qquad (4)$$

with

$$C_{pq} = \frac{A_1}{2\pi\sigma_c^2}\, \exp\!\left(-\frac{p^2 + q^2}{2\sigma_c^2}\right) \qquad (5)$$

$$S_{pq} = \frac{A_2}{2\pi\sigma_s^2}\, \exp\!\left(-\frac{p^2 + q^2}{2\sigma_s^2}\right) \qquad (6)$$

To give the ON and OFF signals the same total strength, $A_1 = 1$ and $A_2$ is set accordingly; the widths of centre and surround are given by $\sigma_c = 0.5$ and $\sigma_s = 1.5$, and the size of the DOG filter is 9x9. At equilibrium the ON channel activity $X^{+}_{ij}$ is defined by

$$X^{+}_{ij} = \frac{\sum_{(p,q)} (U_1 C_{pq} - L_1 S_{pq})\, I_{i+p,\,j+q}}{\alpha_1 + \sum_{(p,q)} (C_{pq} + S_{pq})\, I_{i+p,\,j+q}} \qquad (7)$$

and the OFF channel activity $X^{-}_{ij}$ by

$$X^{-}_{ij} = \frac{\sum_{(p,q)} (U_1 S_{pq} - L_1 C_{pq})\, I_{i+p,\,j+q}}{\alpha_1 + \sum_{(p,q)} (C_{pq} + S_{pq})\, I_{i+p,\,j+q}} \qquad (8)$$

The outputs of the ON and OFF channels are half-wave rectified using $[x]^{+} = \max(0, x)$ in order to avoid negative values. These outputs are shown in figure 4.5.
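A compact sketch of this stage, using the equilibrium solution (Eqs. 7-8) directly: the centre and surround drives are Gaussian convolutions of the input, the ON and OFF activities are their shunting-normalised differences, and the outputs are half-wave rectified. The kernel normalisation standing in for $A_1$ and $A_2$, and the toy input, are assumptions; the sketch is written in Python, not taken from the thesis MATLAB code.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(sigma, size=9):
    """Isotropic 2-D Gaussian on the 9x9 support used in the text,
    normalised to unit sum (one way of giving centre and surround
    the same total strength)."""
    r = np.arange(size) - size // 2
    p, q = np.meshgrid(r, r)
    g = np.exp(-(p**2 + q**2) / (2 * sigma**2))
    return g / g.sum()

def lgn_on_off(I, alpha=10.0, U=1.0, L=1.0, sigma_c=0.5, sigma_s=1.5):
    """Equilibrium ON/OFF channels of the shunting equations (Eqs. 1-8):
    a difference-of-Gaussians numerator divided by a luminance-dependent
    term, followed by half-wave rectification."""
    C = convolve2d(I, gaussian_kernel(sigma_c), mode="same")   # centre drive
    S = convolve2d(I, gaussian_kernel(sigma_s), mode="same")   # surround drive
    denom = alpha + C + S
    on = np.maximum((U * C - L * S) / denom, 0.0)
    off = np.maximum((U * S - L * C) / denom, 0.0)
    return on, off

# Toy input: a bright square on a mid-grey background. The ON channel
# responds just inside the square's border, the OFF channel just outside,
# and uniform regions give essentially no response.
I = np.full((60, 60), 0.5)
I[20:40, 20:40] = 1.0
on, off = lgn_on_off(I)
print(float(on.max()), float(off.max()))
```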

Fig. 4.5

The vertical right border of the occluding square shows a higher activation because of the higher contrast between the white square and the black diamond, compared with the contrast between the square and the background.

4.2.2 Simple Cells

Simple cells in V1 and V2 have elongated receptive fields, appropriate for detecting oriented local contrast of a defined contrast polarity (i.e. direction of contrast) (Fig. 4.6a). However, the receptive field elongation also creates uncertainty about the positions of the image contrasts that activate the cell. This positional uncertainty becomes stronger during the processing of image line ends (Fig. 4.6b) and corners (Fig. 4.6c).

Fig. 4.6

Simple cells can have odd-symmetric (Fig. 4.7a) or even-symmetric receptive fields (Fig. 4.7b), as well as different sizes, spatial frequencies, and spatial scales. Their response profiles can be modelled as a family of self-similar two-dimensional (2D) Gabor wavelets (Lee, 1996; Fig. 4.7), with the general functional form:

$$G(x, y) = \frac{1}{2\pi\sigma\beta}\, e^{-\pi\left[\frac{(x - x_0)^2}{\sigma^2} + \frac{(y - y_0)^2}{\beta^2}\right]}\, e^{\,i[\xi_0 x + \nu_0 y]} \qquad (9)$$

where $(x_0, y_0)$ is the centre of the receptive field in the spatial domain and $(\xi_0, \nu_0)$ is the optimal spatial frequency of the filter in the frequency domain; $\sigma$ and $\beta$ are the standard deviations of the elliptical Gaussian along the x and y axes.

Fig. 4.7 (from Lee, 1996)

In the current implementation, simple cells are implemented for two polarities (dark/light and light/dark), four orientations $\theta = k\pi/K$ with $k = 0, 1, \ldots, K-1$ ($K = 4$), and two spatial scales for each orientation. Therefore, corresponding to each orientation k and position (i, j), there are two simple cells sensitive to opposite contrast polarities (upper index +/-): one for light/dark contrasts ($S^{-}_{ijk}$) and one for dark/light contrasts ($S^{+}_{ijk}$). Even- and odd-symmetric simple-cell receptive fields centred at location (i, j) with orientation k are defined by modifying the even and odd Gabor kernels of Grossberg and McLoughlin (1997), adding half-wave rectification to avoid double peaks of activity and generalising the formulas to different orientations:

$$S^{odd,\pm}_{ijk} = \left[\sum_{(p,q)} s^{odd,\pm}_{pqk}\, X^{+}_{i+p,\,j+q} + \sum_{(p,q)} s^{odd,\mp}_{pqk}\, X^{-}_{i+p,\,j+q}\right]^{+} \qquad (10)$$

$$S^{even,\pm}_{ijk} = \left[\sum_{(p,q)} s^{even,\pm}_{pqk}\, X^{+}_{i+p,\,j+q} + \sum_{(p,q)} s^{even,\mp}_{pqk}\, X^{-}_{i+p,\,j+q}\right]^{+} \qquad (11)$$

where $s^{odd,+}_{pqk}$ and $s^{odd,-}_{pqk}$ indicate the rectified positive and negative parts of the kernel $s^{odd}_{pqk}$ (and similarly for the even kernels), with

$$s^{odd}_{pqk} = \sin(2p)\, \exp\!\left(-\frac{p^2}{2\sigma_p^2} - \frac{q^2}{2\sigma_q^2}\right) \qquad (12)$$

$$s^{even}_{pqk} = \cos(2p)\, \exp\!\left(-\frac{p^2}{2\sigma_p^2} - \frac{q^2}{2\sigma_q^2}\right) \qquad (13)$$

where $[x]^{+} = \max(0, x)$; $-S_1 \le p, q \le S_1$ (scale 1) and $-S_2 \le p, q \le S_2$ (scale 2), with $S_1 = 4$ and $S_2 = 6$, so that $S_1$ and $S_2$ define the size of the simple-cell input field; $\sigma_p$ and $\sigma_q$ define the standard deviations of the x and y dimensions of the simple-cell input field, with $\sigma_p = 1$ and $\sigma_q = 3$ for scale 1, and $\sigma_p = 1.2$ and $\sigma_q = 4.5$ for scale 2. Equations (12) and (13) define the vertically oriented odd and even filters; a filter orientation other than vertical is obtained by applying a translation and rotation of the coordinate axes before calculating the filter values. The filters implemented in these simulations are displayed in figure 4.8. Figure 4.9 shows the activations of the simple cells (scale 2) obtained with the input shown in figure 4.5. The activations of the scale 1 simple cells are quite similar to those of scale 2 and are therefore not shown. The activations of odd and even simple cells are also broadly similar, except for some small double activations in the odd cells.
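The oriented odd and even kernels of Eqs. (12)-(13), and a simplified polarity-split simple-cell stage driven by the ON and OFF maps, can be sketched as follows. The way the ON and OFF channels are combined here (a single ON-minus-OFF drive rectified into two polarities) only approximates Eqs. (10)-(11), and the axis convention and spatial frequency are assumptions; the scale-2 parameters from the text are used.

```python
import numpy as np
from scipy.signal import convolve2d

def oriented_kernels(theta, half=6, sigma_p=1.2, sigma_q=4.5):
    """Odd/even oriented kernels in the spirit of Eqs. (12)-(13): a sine or
    cosine carrier across the preferred axis under an anisotropic Gaussian."""
    r = np.arange(-half, half + 1)
    x, y = np.meshgrid(r, r)
    p = x * np.cos(theta) + y * np.sin(theta)      # across the preferred edge
    q = -x * np.sin(theta) + y * np.cos(theta)     # along the preferred edge
    env = np.exp(-(p**2 / (2 * sigma_p**2) + q**2 / (2 * sigma_q**2)))
    return np.sin(2 * p) * env, np.cos(2 * p) * env

def simple_cells(on, off, theta):
    """Polarity-sensitive simple cells: one oriented drive computed from the
    ON-minus-OFF input, rectified separately into the two contrast polarities
    (a simplification of Eqs. 10-11)."""
    k_odd, k_even = oriented_kernels(theta)
    d_odd = convolve2d(on - off, k_odd, mode="same")
    d_even = convolve2d(on - off, k_even, mode="same")
    return {("odd", "+"): np.maximum(d_odd, 0), ("odd", "-"): np.maximum(-d_odd, 0),
            ("even", "+"): np.maximum(d_even, 0), ("even", "-"): np.maximum(-d_even, 0)}

# Example: with theta = 0 the kernels oscillate along x and therefore respond
# to vertical borders; the two odd polarities pick out the left and right
# borders of a bright square, respectively.
I = np.full((60, 60), 0.5)
I[20:40, 20:40] = 1.0
on, off = np.maximum(I - 0.5, 0), np.maximum(0.5 - I, 0)   # stand-in for the LGN stage
S = simple_cells(on, off, theta=0.0)
print(float(S[("odd", "+")].max()), float(S[("odd", "-")].max()))
```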

77 Fig. 4.8 Fig

4.2.3 Complex Cells

Cortical complex cells are selective for orientation but insensitive to local contrast polarity; their responses are therefore generated by summing the half-wave rectified responses of the even and odd simple cells of both polarities:

$$c_{ijk} = \left[S^{odd,+}_{ijk} + S^{odd,-}_{ijk} + S^{even,+}_{ijk} + S^{even,-}_{ijk}\right]^{+} \qquad (14)$$

The output of the complex cell level is shown in figure 4.10, in which all four orientations are superimposed for simplicity.

Fig. 4.10
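Eq. (14) then amounts to a one-line pooling step; a sketch, assuming the dictionary layout of the simple-cell sketch above:

```python
import numpy as np

def complex_cells(S):
    """Eq. (14): a complex cell sums the rectified responses of the odd and
    even simple cells of both contrast polarities at its position, so its
    output no longer depends on the direction of contrast. `S` is the
    dictionary returned by simple_cells() above."""
    total = S[("odd", "+")] + S[("odd", "-")] + S[("even", "+")] + S[("even", "-")]
    return np.maximum(total, 0.0)   # outer rectification (terms are already non-negative)
```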

4.2.4 Lateral connections

It has been proposed that long-range lateral connections provide the means for contextual modulation of cell responses by surrounding cells (Kapadia et al., 1996; Gilbert et al., 1996; Bosking et al., 1997; see Paragraph 2.3.1). In the present model, the long-range lateral connections modulate the activities of neighbouring like-oriented complex cells, and they extend over a distance of about ±2 receptive field lengths along the main axis direction of the cell (Fig. 4.11); thus the size of this filter is about four times the size of the receptive field of a complex cell.

Fig. 4.11

The equation used to model the long-range connections for a cell at location (0, 0) with orientation k is the following:

$$L_{ijk} = \frac{\omega_s}{2\pi\cdot 2^2}\, \exp\!\left(-\frac{4\,(i\cos k + j\sin k)^2 + (i\sin k + j\cos k)^2}{8\cdot 2^2}\right) \qquad (15)$$

$L_{ijk}$ gives the influence from the cell at location (i, j); the parameter $\omega_s$ depends on the scale, with $\omega_1 = 1.7$ and $\omega_2 = 2.3$. The resulting lateral connection filters for scale 1 are shown in figure 4.12; those for scale 2 are similar but more stretched along the axis.

Fig. 4.12
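Assuming the anisotropic-Gaussian reading of Eq. (15) given above, the lateral-connection influence on a whole orientation map can be sketched as a convolution with an elongated kernel; the filter half-width and the removal of the central position are illustrative choices, not parameters from the thesis.

```python
import numpy as np
from scipy.signal import convolve2d

def lateral_kernel(theta, omega=1.7, half=12):
    """Long-range lateral-connection weights for a cell of orientation theta:
    a Gaussian that falls off quickly across the cell's main axis and slowly
    along it, scaled by the per-scale weight omega, so that facilitation
    comes mainly from (roughly) collinear neighbours."""
    r = np.arange(-half, half + 1)
    x, y = np.meshgrid(r, r)
    along = x * np.cos(theta) + y * np.sin(theta)
    across = -x * np.sin(theta) + y * np.cos(theta)
    w = omega / (2 * np.pi * 4) * np.exp(-(4 * across**2 + along**2) / (8 * 4))
    w[half, half] = 0.0          # no self-facilitation from the cell's own position
    return w

def lateral_input(complex_map, theta, omega=1.7):
    """Facilitatory input contributed by like-oriented neighbours: the
    orientation map convolved with the lateral kernel."""
    return convolve2d(complex_map, lateral_kernel(theta, omega), mode="same")
```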

Figure 4.13 illustrates the activation of the complex cells when the facilitatory influences due to the lateral connections are added. The resulting pattern for scale 2 shows a faint, subthreshold extension of the diamond contours into the area of the occluding square, consistent with the effect of amodal continuation (Paragraph 1.5).

Fig. 4.13

4.2.5 Feedback from higher areas

In the present model, the interaction between lateral connection activation and feedback modulation is designed in the following way. The complex cell activations are fed forward to the higher levels (Fig. 4.1), through the V4 area, until they reach the IT area. Here, the feedforward signals activate the units responding to the figures or objects present in the visual input. In the IT area, the firing cells generate a modulatory feedback that is propagated backwards via the descending pathway towards the lower level V2. In V2 the top-down influences are summed with the lateral connection signals of the active complex cells. The silent V2 cells that receive sufficient input from both lateral connections and feedback modulations become active. These activated V2 cells in turn send activation to IT through the feedforward pathway, forming a recurrent loop.

Fig. 4.14

The structure of the model allows the formation of an occluded contour starting from the visible border near the occluded figure, where the cells are activated by real border signals (Fig. 4.14a, green cell). The lateral connections of the green cell (green shaded zone) reach the neighbouring red cell, but the lateral connection signals alone are not sufficient to drive the red cell. When a feedback modulation is added (Fig. 4.14b), the red cell can reach the activation threshold (Fig. 4.14c), though at a slightly weaker level because of the absence of real input. Then the red cell's collaterals and the feedback signals activate the neighbouring blue cell (Fig. 4.14d), and so on, until the sum of collaterals and feedback signals is no longer sufficient to drive the next cell and the loop stops. In the present simulations, the feedback signals are generated by hand, taking as feedback the boundaries of the complete occluded figures. Figure 4.15 shows the top-down signals for this input image.

Fig. 4.15

The equation used for the implementation of the influences of the feedback and the lateral connections is the following:

$$\frac{d}{dt} w_{ijk} = -D\, w_{ijk} + \alpha \sum_{(p,q)} L_{pqk}\, w_{pqk} + \beta_s\, FB_{ijk} \qquad (16)$$

where the decay constant is $D = 1$; $\sum_{(p,q)} L_{pqk}\, w_{pqk}$ is the activation due to the lateral connections; $\alpha = 0.09$ is the weight of the lateral connection influence; $FB_{ijk}$ is the activation due to the feedback; and $\beta_{s=1} = 0.8$ and $\beta_{s=2} = 0.9$ are the weights of the feedback modulations for scale 1 and scale 2, respectively. Without feedback modulation the occluded contours do not form (Fig. 4.16), and the same is true when only the feedback modulations are present (Fig. 4.17). In figure 4.17, after the recurrent loop, the visible part of the diamond is more active than the square because the feedback signals contain only the information regarding the diamond shape; due to normalisation, the square is hardly visible.

Fig. 4.16
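The recurrent interaction of Eq. (16) can be sketched as a simple iterative update in which each cell's activity decays, accumulates lateral input from like-oriented neighbours, and is pushed by the hand-generated feedback map. The integration step, the iteration count and the activation threshold are assumptions introduced only for the sketch.

```python
import numpy as np

def complete_contours(w0, lateral_fn, FB, alpha=0.09, beta=0.9,
                      D=1.0, dt=0.2, n_steps=200, threshold=0.3):
    """Iterates Eq. (16), dw/dt = -D*w + alpha*(lateral input) + beta*FB,
    for one orientation channel. `lateral_fn(w)` returns the facilitatory
    input from like-oriented neighbours (e.g. lateral_input() above) and
    `FB` is the hand-generated top-down map. Cells whose final activity
    exceeds `threshold` are counted as active."""
    w = w0.copy()
    for _ in range(n_steps):
        dw = -D * w + alpha * lateral_fn(w) + beta * FB
        w = np.maximum(w + dt * dw, 0.0)
    return w, w > threshold

# The three conditions discussed in the text, for a complex-cell map `c`, a
# lateral-input function `lat` and a feedback map `fb` built by hand:
#   complete_contours(c, lat, np.zeros_like(c))   # lateral only: no completion
#   complete_contours(np.zeros_like(c), lat, fb)  # feedback only: no real input
#   complete_contours(c, lat, fb)                 # both: the occluded contour forms
```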

When both lateral connections and feedback signals are present, the occluded contours do form (Fig. 4.18), after a small number of cycles.

Simulations

The simulation described in the previous paragraph demonstrates a first important property of the proposed model: it is capable of completing occluded angles (see Fig. 4.18). This type of completion cannot be obtained with the other biologically plausible computational models of contour interpolation. The reason is that those models use bipole cells to complete the missing contours, and bipole cells need input on both sides of their receptive field in order to become active. In the case of an occluded angle, a bipole cell receives input on only one side and therefore remains silent.
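To make the contrast concrete, a bipole unit can be caricatured as an AND-like combination of its two receptive-field lobes. The min rule below is only a schematic stand-in, since published bipole formulations differ in their details.

```python
import numpy as np

def bipole_response(left_lobe_input, right_lobe_input):
    """Schematic bipole rule: the cell fires only if BOTH lobes receive input
    (a simplified AND-like stand-in; actual bipole formulations vary)."""
    return np.minimum(left_lobe_input, right_lobe_input)

# At an occluded corner the interpolated edge has contour support on one side
# only, so a bipole cell bridging it stays silent:
print(bipole_response(0.8, 0.0))   # -> 0.0
```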

In the following paragraphs, two further simulations are presented, demonstrating two other important properties of the proposed model. The first concerns the differences in contour shape between modal and amodal completion of angles (Singh, 2004; see Paragraph 1.4). The second deals with the differences found in the amodal continuation of a partly occluded object when different representations of the entire object are seen before the occlusion event (Joseph and Nakayama, 1999; see Paragraph 1.5).

Modal/amodal completion differences

Singh (2004; see Paragraph 1.4) found that amodally occluded angles are perceived as more angular (i.e. closer to a corner) than the corresponding illusory shapes. Moreover, Murray and colleagues (2004; see Paragraph 2.6.3) found that amodal completion relies on continued strong responses in higher visual areas, whereas modal contour interpolation does not. If the parameter β_s of the model is reduced, which corresponds to a weaker influence of the feedback on the V2 complex cells, a shorter contour continuation is obtained, and thus a less angled interpolated contour (Fig. 4.19). Figure 4.20 shows the output of the model for β_{1,2} = [0.8, 0.9] (as an example of amodal completion), and figure 4.21 the output for β_{1,2} = [0.62, 0.72] (mimicking modal completion). In the second case the angle is not completed, and it could therefore be perceived as less angular than in the first case.
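In simulation terms, mimicking modal completion only requires weakening the feedback weights passed to the update step sketched after equation (16). The snippet below is a usage sketch reusing the hypothetical update_v2(), together with an initial complex-cell map w0, a scale-1 kernel L1, and a feedback template FB from the earlier sketches; the iteration count is likewise an assumption.

```python
# Feedback weights from the text: stronger for amodal, weaker for modal completion.
beta_amodal = (0.80, 0.90)   # scales 1 and 2 (amodal case, Fig. 4.20)
beta_modal  = (0.62, 0.72)   # weaker feedback, mimicking modal completion (Fig. 4.21)

w_amodal, w_modal = w0.copy(), w0.copy()
for _ in range(50):                                        # iteration count is an assumption
    w_amodal = update_v2(w_amodal, L1, FB, beta=beta_amodal[0])
    w_modal  = update_v2(w_modal,  L1, FB, beta=beta_modal[0])
# The weaker feedback yields a shorter interpolated continuation, so the
# occluded angle is not completed in the modal case.
```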

Amodal continuation differences

Joseph and Nakayama (1999; see Paragraph 1.5) found that the object seen before the occlusion event can influence the shape of the perceived occluded figure: in their experiments, observers perceived a greater amodal continuation when they had seen longer bars before occlusion than when they had seen shorter ones. The same result can be obtained in the model by using different images for the feedback. Figure 4.22 shows the input figure used in this simulation, and figure 4.23 illustrates the two feedback activations used (Fig. 4.23a1, b1), together with reversed occlusion-relation displays that highlight the ratio of the different area sizes (Fig. 4.23a2, b2).
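In the model, this manipulation amounts to changing only the feedback image while the bottom-up input stays fixed. A minimal sketch follows; the template-drawing helper bar_template and all sizes are hypothetical.

```python
import numpy as np

def bar_template(height, width, bar_length, bar_width, y0, x0):
    """Hypothetical helper: a binary feedback map containing a single bar."""
    fb = np.zeros((height, width))
    fb[y0:y0 + bar_width, x0:x0 + bar_length] = 1.0
    return fb

# Same occluded input, two different top-down hypotheses about the hidden object:
fb_long  = bar_template(64, 64, bar_length=48, bar_width=4, y0=30, x0=8)
fb_short = bar_template(64, 64, bar_length=24, bar_width=4, y0=30, x0=8)

# Feeding fb_long rather than fb_short into the recurrent loop extends the
# amodal continuation further behind the occluder, as in figures 4.26 and 4.27.
```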

Some outputs of the model stages are shown in figure 4.24 (ON and OFF channels) and figure 4.25 (complex cells). The different occluded shapes obtained in the model simulations for the two different feedback activations are depicted in figures 4.26 and 4.27.


Conclusions

The simulations presented here demonstrate three important properties of the proposed model:

1. the model can complete occluded angles, because its completion mechanism does not rely on bipole units requiring real input on both sides of the receptive field;
2. the model can produce different shapes for modal and amodal completion, because the interpolated form also depends on the strength of the feedback modulation;
3. the model can generate different amodal continuation displays, because the shape of the completion also depends on the form of the feedback modulation activated in the higher areas.

None of these characteristics is present in the other biologically plausible computational models of modal/amodal interpolation.
