[February 2011; In press, Quarterly Journal of Experimental Psychology] Investigating the role of response in spatial context learning.


Tal Makovski
Yuhong V. Jiang

Department of Psychology and Center for Cognitive Sciences, University of Minnesota

Keywords: contextual cueing, perception and action, visual search, touch response, eye movement

Running title:

Send correspondence to:
Tal Makovski
Department of Psychology, University of Minnesota
N18 Elliott Hall, Minneapolis, MN 55455
Email: tal.makovski@gmail.com
Tel: 61-64-9483
Fax: 61-66-079

Acknowledgements

This study was supported in part by NIH 071788. We thank Khena Swallow and Ming Bao for help with eye tracking, Ameante Lacoste, Birgit Fink, and Sarah Rudek for comments, and Eric Bressler, Jacqueline Caston, and Jen Decker for help with data collection. Correspondence should be sent to Tal Makovski, N18 Elliott Hall, Minneapolis, MN 55455. Email: tal.makovski@gmail.com.

Abstract

Recent research has shown that simple motor actions, such as pointing or grasping, can modulate the way we perceive and attend to our visual environment. Here we examine the role of action in spatial context learning. Previous studies using keyboard responses have revealed that people are faster locating a target on repeated visual search displays ("contextual cueing"). However, this learning appears to depend on the task and response requirements. In Experiment 1, participants searched for a T-target among L-distractors, and responded either by pressing a key or by touching the screen. Comparable contextual cueing was found in both response modes. Moreover, learning transferred between keyboard and touch screen responses. Experiment 2 showed that learning occurred even for repeated displays that required no response, and this learning was as strong as learning for displays that required a response. Learning on no-response trials cannot be accounted for by oculomotor responses, as learning was observed when eye movements were discouraged (Experiment 3). We suggest that spatial context learning is abstracted from motor actions.

Embedded (or "grounded") cognition theories postulate that cognition should not be studied independent of the body and environment that accommodates it (cf. Barsalou, 2008). Instead, body states, motor actions, and their interactions with the environment all take part in basic cognitive processes. This idea of a unified cognitive system, where an action is more than the mere outcome of cognition (Song & Nakayama, 2009), also underlies the vision-for-action hypothesis (Milner & Goodale, 2006). According to this hypothesis, the dorsal visual pathway, commonly described as carrying "where" information, should instead be described as carrying "how" information, as objects are represented in relation to the observer and to the actions directed to them (Goodale & Milner, 1992).
These theories suggest that vision is determined not only by the visual input, but also by the motor output directed to it. Along a similar theoretical vein, the premotor theory of attention (Rizzolatti, Riggio, Dascola, & Umiltà, 1987) states that covert attention is driven by the need to overtly direct a motor action to an object. In his review on attention, Allport suggested that perceptual filtering of distractors is not the main function of attention. Rather, people selectively attend to targets to ensure that a successful response is made to them (Allport, 1989).

That perception and attention are affected by whether or not an action is carried out raises the possibility that learning and memory may be modulated by motor actions as well. Indeed, a few findings indicate that this may be the case. In research on the production effect, memory for words spoken out loud is enhanced relative to memory for words read silently (MacLeod, Gopie, Hourihan, Neary, & Ozubko, 2010). In addition, it has been shown that negative adjectives were remembered better when participants shook their heads during encoding, and positive adjectives were remembered better when participants nodded their heads during encoding (Förster & Strack, 1996). Even a simple motor action can influence cognition: shapes presented after a go response were considered more pleasant than shapes presented after a no-go response (Buttaccio & Hahn, 2010).

In the present study we further examine the interaction between vision and action. In particular, we test how an overt motor action made toward a search target modulates contextual learning between the target and its surrounding spatial context (Chun & Jiang, 1998).

The role of action in visual learning

People are highly efficient at extracting regularities embedded in the environment. Learning is revealed for many aspects of the environment, including repeated spatial layouts (Chun & Jiang, 1998), motion trajectories (Chun & Jiang, 1999; Makovski, Vázquez, & Jiang, 2008), target locations (Miller, 1988; Umemoto, Scolari, Vogel, & Awh, 2010), visual or movement sequences (Nissen & Bullemer, 1987; Swallow & Zacks, 2008), and object-to-object associations (Fiser & Aslin, 2002; Turk-Browne, Junge, & Scholl, 2005).

For at least some types of visual statistical learning, making an overt motor response is not necessary. For example, consistent associations between objects can be learned when observers are simply asked to look at a series of shapes (Fiser & Aslin, 2002). In other types of learning, however, motor response appears to be an integral component, as shown in the Serial Reaction Time task (SRT; Nissen & Bullemer, 1987). In this task, participants press one of several keys to report the corresponding position of a stimulus on the screen. Unbeknown to them, the positions of the stimulus over several trials sometimes form a repeated sequence (e.g., positions "143431 143431"). Participants usually respond faster to repeated sequences than to unrepeated ones, even in situations where they are unaware of the repetition (Nissen & Bullemer, 1987). Although participants can learn the motor sequence of finger movements independent of the perceptual sequence of stimuli on the screen (Willingham, 1999), or the perceptual sequence on the screen independent of the motor sequence (Remillard, 2003), covarying the motor and perceptual sequences may facilitate learning of complex sequences (Dennis, Howard, & Howard, 2006). Ziessler and Nattkemper (2001) suggest that response-to-stimulus association governs learning in the SRT task.
Specifically, in line with the notion of response-effect learning (e.g., Hommel, Müsseler, Aschersleben, & Prinz, 2001), participants learn to predict the upcoming stimulus based on their current response.

Unlike the SRT task, spatial context learning in visual search does not involve a predictive association between a repeated display and a motor response, or between a motor response and the next display. In this task, participants are asked to press a key to report the orientation of a T-target embedded among L-distractors. Unbeknown to them, some search displays are repeated in the experiment, presenting consistent associations between the spatial layout of the search display and the target's location (Chun & Jiang, 1998). However, the target's orientation, and hence the associated key press, is randomized, preventing participants from associating a repeated display with a specific response. Nonetheless, participants are often faster searching through repeated displays than through new ones, revealing contextual cueing (Brady & Chun, 2007; Chun & Jiang, 1998; Kunar, Flusberg, Horowitz, & Wolfe, 2007).

Although learning in contextual cueing cannot be attributed to the acquisition of consistent stimulus-response associations, there is indirect evidence that this type of learning is not fully abstracted from the task and responses that participants made. First, when the same spatial displays are used for either a change-detection or a visual search task, learning does not fully transfer between the two tasks (Jiang & Song, 2005). Second, whereas young children (6-13 years old) fail to show contextual cueing when making a keyboard response to the target's orientation (Vaidya, Huger, Howard, & Howard, 2007), they do show robust learning when they simply touch the target's location on a touch screen (Dixon, Zelazo, & De Rosa, 2010).
Although these two studies differ in several aspects (e.g., whether the search stimuli were fish or letters), it is possible that some types of motor response promote greater learning than other types. Specifically, touching a target on the screen involves a task-relevant, viewer-centered spatial component that is absent in a keypress task. Moreover, touching a screen may facilitate learning because vision is superior near the hands (Abrams, Davoli, Du, Knapp, & Paull, 2008). Therefore visual learning may be enhanced when the hands are positioned near the screen rather than near the keyboard. Third, the representation of spatial context is viewpoint specific, as contextual cueing acquired from a virtual 3-D search display does not transfer after a 30° change in viewpoint (Chua & Chun, 2003). The viewpoint-dependent characteristic suggests that what is learned in contextual cueing may depend on the viewer's perspective and goals toward the target. Finally, in adults, whereas successful detection of a target leads to learning of the search display, learning is absent when search is interrupted before the target is detected and a response is made (Shen & Jiang, 2006).

The studies reviewed above suggest that statistical regularity alone is insufficient to fully account for what is learned in contextual cueing. It is possible that making an action toward the target is an integral part of the learning. Furthermore, making a detection response, as opposed to no response, constitutes a change in the participant's current activity, and this change may trigger a re-checking or updating process that enhances the learning and memory of concurrently presented visual information (Swallow & Jiang, 2010, 2011; Swallow, Zacks, & Abrams, 2009). The main purpose of this study is to directly evaluate the role of motor actions in spatial context learning. We test whether learning is specific to the kind of motor response made to the targets (keyboard vs. touch-screen response), and whether withholding a response to targets hinders learning.

Experiment 1

Experiment 1 investigated the specificity of contextual cueing to different response modes toward the search target. The ideomotor principle (cf. Stock & Stock, 2004) states that movements are associated with their contingent sensory effects.
Consistent with this principle, Hommel and colleagues proposed that an action is automatically associated with all active event codes to form an associative structure (action-effect binding; e.g., Elsner & Hommel, 2001; Hommel, Alonso, & Fuentes, 2006; Hommel et al., 2001). If action codes are integrated with perceptual representations in spatial context learning, then contextual cueing acquired in one response mode may not readily transfer to tasks involving a different response mode. Alternatively, contextual cueing may be largely abstracted from the action made to the targets. If this is the case, then it should transfer between different response modes.

To address this issue, in Experiment 1 we first tested whether spatial context learning was affected by response mode. During training, all participants completed a touch screen version and a keyboard version of the standard contextual cueing task in separate sessions. We examined whether contextual cueing was greater in the touch task than in the keypress task. After the training session, we tested whether learning acquired from touch transferred to keypress and vice versa.

Method

Participants: All participants were students from the University of Minnesota. They were 18 to 35 years old and had normal color vision and normal or corrected-to-normal visual acuity. Twelve participants (mean age 19.4 years; 5 males) completed Experiment 1.

Equipment: Participants were tested individually in a normally lit room and sat unrestricted at about 55 cm from a 15-inch ELO touch screen monitor (resolution: 1024 x 768 pixels; refresh rate: 75 Hz). The experiment was programmed with the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) implemented in MATLAB (www.mathworks.com).

Stimuli: The search items were a white T and Ls (1.75° x 1.75°) presented against a black background. Participants searched for a T rotated 90° to the right or to the left. The orientation of the T was randomly selected provided that there were equal numbers of left and right Ts in each condition of each block. Eleven L-shaped stimuli served as distractors. They were rotated in one of the four cardinal orientations (the offset at the junction of the two segments was 0.175°). All items were positioned in an imaginary 8 by 6 matrix (28° x 21°) and the position of each item was slightly off the center of a cell to reduce co-linearity between items.

Touch task: In the touch task, participants initiated each trial by touching a small white fixation point (0.4° x 0.4°) at the bottom of the screen. Then the search display appeared and remained until participants touched the target's position (Figure 1, left). Responses falling within an imaginary square (1.05° x 1.05°) surrounding the center of the target were considered correct. Each incorrect response was followed by a red minus sign for 2000 ms.

Press task: In the keypress task, each trial started with a small white fixation point (0.4° x 0.4°) appearing at the bottom of the screen for 500 ms, followed by the search array (Figure 1, left). To make the keypress task similar to the touch task in terms of task requirements (detecting the T), participants were asked to press the spacebar as soon as they detected the target. The time from the search display onset to the spacebar response was taken as the search RT.
To make sure that participants had accurately located the target, we erased the display upon the spacebar response and asked participants to report whether the T was rotated to the left (by pressing the N key) or to the right (by pressing the M key). The second response was unspeeded and provided us with an accuracy measure. Each incorrect response was followed by a red minus sign for 2000 ms.

Figure 1. Left: A schematic illustration of the trial sequences used in the press and touch tasks of Experiment 1. Right: A schematic illustration of the old and new conditions. Items are not drawn to scale.
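The touch-scoring rule described above (a touch counts as correct if it lands within a 1.05° x 1.05° imaginary square around the target's center) can be sketched as follows. This is an illustrative Python sketch, not the original MATLAB task code; the pixels-per-degree value and the function name are assumptions for illustration.

```python
# Illustrative sketch of the touch-scoring rule. The study itself used
# MATLAB/Psychophysics Toolbox; the calibration constant below is assumed.

PIXELS_PER_DEGREE = 35.0  # hypothetical display calibration


def touch_is_correct(touch_xy, target_xy, square_deg=1.05):
    """A touch is correct if it falls inside an imaginary square of side
    `square_deg` (degrees of visual angle) centred on the target."""
    half_px = (square_deg / 2) * PIXELS_PER_DEGREE
    dx = abs(touch_xy[0] - target_xy[0])
    dy = abs(touch_xy[1] - target_xy[1])
    return dx <= half_px and dy <= half_px
```

With the assumed calibration, the acceptance region is about 18 pixels on each side of the target's center, so a touch a few pixels off still scores as correct while a touch on a neighboring item does not.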

Design: We used a within-subject design for all experimental factors. All participants completed two sessions of the experiment separated by one week. Each session started with a training phase in one task (e.g., touch), followed by a testing phase in the other task (e.g., press). Training and testing tasks for session 1 were reversed in session 2. The order of the two sessions (touch training first or press training first) was randomized across participants. The spatial configurations used for the two sessions were unrelated.

Participants were familiarized with the tasks and procedure prior to each session with 10 practice trials in each task. This was followed by the training phase, which consisted of 576 trials, divided into 24 blocks of 24 trials each. Half of the trials in each block involved old displays: these were 12 unique search displays that were repeated across blocks, once per block. The old displays were generated at the beginning of each session and were not the same across the two sessions. The other half of the trials involved new displays: they were 12 unique displays newly generated at the beginning of each block, with the constraint that the search target's location was repeated across blocks (see Chun & Jiang, 1998). To avoid biases toward searching a particular quadrant, the target's location was evenly distributed across the four visual quadrants in both the new and old conditions.

Within each session, after completing the training phase in one task (e.g., touch), participants were tested in the other task (e.g., keypress) using old displays shown earlier in the session. They were familiarized with 10 practice trials in the new task before the testing phase began. The testing phase consisted of four 24-trial blocks. Half of the trials within each block were the old displays used in the training phase and half were newly generated (new) displays.
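The old/new display logic described above can be sketched as follows. This is a minimal Python sketch under stated assumptions (the original experiment was programmed in MATLAB; quadrant balancing and the off-center jitter are omitted, and all function and variable names are hypothetical):

```python
# Sketch of the contextual cueing design: old layouts repeat across blocks,
# new layouts are regenerated each block but keep a fixed set of target
# locations (Chun & Jiang, 1998). Names are illustrative, not from the study.
import random

GRID_W, GRID_H = 8, 6     # the 8 x 6 imaginary matrix of cells
N_ITEMS = 12              # 1 T-target + 11 L-distractors per display
N_PER_COND = 12           # 12 old and 12 new displays per block

ALL_CELLS = [(x, y) for x in range(GRID_W) for y in range(GRID_H)]


def make_display(target_cell):
    """One search display: a target location plus 11 distractor locations."""
    distractors = random.sample([c for c in ALL_CELLS if c != target_cell],
                                N_ITEMS - 1)
    return {"target": target_cell, "distractors": distractors}


# Fixed target locations for both conditions; old layouts are generated once
# per session and then reused in every block.
old_targets = random.sample(ALL_CELLS, N_PER_COND)
new_targets = random.sample([c for c in ALL_CELLS if c not in old_targets],
                            N_PER_COND)
old_displays = [make_display(t) for t in old_targets]


def make_block():
    """One block: the 12 repeated old displays plus 12 freshly generated
    new displays whose target locations stay fixed across blocks."""
    trials = old_displays + [make_display(t) for t in new_targets]
    random.shuffle(trials)
    return trials
```

Because `old_displays` is built once and reused, the layout-to-target association is constant for old trials, while new trials preserve only the target location; this is the contrast that isolates context learning from target-location learning.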
The test phase enabled us to assess whether there was any transfer of learning across the touch and keypress tasks. A comparison between the amount of learning in the test phase and in the training phase allowed us to examine whether learning was significantly weakened by a change in task. Figure 1 (right) depicts a schematic illustration of the display conditions.

Explicit recognition test: Memory of the search displays was tested at the end of session 2 in a surprise recognition test. Twelve new displays were randomly intermixed with 12 old displays from session 2. These 24 displays were presented one at a time for participants to make an unspeeded old/new response.

Results

Accuracy

Accuracy was relatively high in both the training and testing phases of the experiment and for both the touch and keypress tasks (Table 1). Accuracy was not affected by any of the experimental factors or their interactions (all p's > .30) except for the main effect of task: touch responses were more accurate than keypress responses, F(1, 11) = 13.51, p < .01, ηp² = .51. This difference was likely due to the fact that participants could touch the T while it was on the screen, but they had to report the T's orientation after the display was cleared in the keypress task.

Table 1: Percent correct as a function of task (touch, press), experiment phase (training, testing), and condition (old, new). Standard errors of the mean are presented in parentheses.

        Trained with touch                     Trained with keypress
        Training (touch)   Testing (press)     Training (press)   Testing (touch)
Old     98.5 (0.5)         92.5 (2.6)          95.2 (1.9)         99.5 (0.3)
New     98.8 (0.3)         92.4 (2.9)          94.5 (1.9)         99.1 (0.5)

RT

In the RT analysis, we excluded incorrect trials as well as trials exceeding three standard deviations of each participant's mean for each condition. This RT outlier cutoff removed 1.5% of the correct trials.

1. Training phase

To reduce statistical noise, the 24 training blocks were binned into 6 epochs (Figure 2, left; 1 epoch = 4 blocks). A repeated-measures ANOVA on task (touch or press), condition (old or new), and epoch (1 to 6) revealed a significant main effect of epoch, F(5, 55) = 9.09, p < .01, ηp² = .73, reflecting faster RTs as the experiment progressed. Touch responses were numerically but not statistically slower than keypress responses, F(1, 11) = 1.81, p > .20. Contextual cueing was revealed by significantly faster responses in the old than in the new condition, F(1, 11) = 5.4, p < .05, ηp² = .33, and this effect did not interact with task, F < 1. There was no difference between the old and new conditions in epoch 1 (F < 1), but old trials were responded to faster than new trials in epochs 2-6, F(1, 11) = 7.9, p < .02, ηp² = .4. The interaction between epoch and condition, however, failed to reach significance, F(5, 55) = 2.05, p < .09, ηp² = .16. None of the other interactions were significant (all F's < 1).

Figure 2: Mean RT in the training (left) and testing (right) phases of Experiment 1. Error bars show ±1 S.E. of the difference between old and new displays in each task.

2. Testing phase

To test whether learning was specific to response mode, we compared responses to old and new search displays in a testing phase, during which the task changed (Figure 2, right). A repeated-measures ANOVA on task during testing (touch, press) and condition (old, new) revealed a main effect of task, as touch responses were slower than keypress responses, F(1, 11) = 9.35, p < .05, ηp² = .46, possibly because it took longer to move the limb than to press down a key.
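The RT preprocessing and learning measures used in these analyses (the three-standard-deviation outlier cutoff, the binning of 24 blocks into 6 epochs, and the contextual cueing score, including the normalized index mentioned in the footnote) can be sketched as follows. A minimal illustration assuming per-condition RT lists; the function names are hypothetical, not from the original analysis code.

```python
# Sketch of the RT preprocessing and contextual cueing measures described
# in the text; names and data layout are assumptions for illustration.
from statistics import mean, stdev


def exclude_outliers(rts, n_sd=3):
    """Drop RTs more than n_sd standard deviations from the condition mean."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]


def bin_into_epochs(block_means, blocks_per_epoch=4):
    """Average consecutive block means into epochs (24 blocks -> 6 epochs)."""
    return [mean(block_means[i:i + blocks_per_epoch])
            for i in range(0, len(block_means), blocks_per_epoch)]


def contextual_cueing(rt_new, rt_old):
    """Raw cueing (new - old) and the normalized index
    (new - old) / mean(new, old)."""
    raw = rt_new - rt_old
    return raw, raw / mean([rt_new, rt_old])
```

The normalized index divides the old/new difference by the average RT of the two conditions, which is what makes cueing magnitudes comparable across response modes with different overall latencies.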
Critically, we observed significantly faster responses in the old than in the new condition, F(1, 11) = 4.9, p < .05, ηp² = .31, suggesting that there was transfer of learning from the training to the testing phase. This transfer was observed from touch to press and vice versa, as the interaction between task and condition was not significant, F < 1 (see Footnote 1).

Was contextual cueing during the testing phase comparable to that shown in the training phase? To answer this question, we compared the testing phase with the second half of the training phase of that session. We calculated contextual cueing (new − old RT) and examined the effects of phase (training vs. testing) and task (touch vs. press). An ANOVA revealed no significant effects of phase, task, or their interaction, all F's < 1. This suggests that contextual cueing in the testing phase was comparable to that in the training phase, regardless of whether subjects were trained in the touch task or the keypress task.

Explicit recognition

Participants were equally likely to identify a display as old when it was an old display (M = 55.6%, S.E. = 4.1) as when it was a new display (M = 56.3%, S.E. = 4.6), F < 1, suggesting that learning was largely implicit (Chun & Jiang, 2003; Reber, 1989).

Discussion

Experiment 1 addressed the possibility that response codes are incorporated in contextual cueing. The results from the training phase revealed that learning was unaffected by the type of response made to the target: Search was faster in the old condition than in the new condition regardless of whether the task was to touch the screen or to press a key. This finding rebuts the idea that a touch-screen task, which inherently involves a task-relevant, viewer-centered spatial component, promotes learning more than a distal keyboard task. Furthermore, learning readily transferred between response types, providing no evidence for the idea that response codes may be incorporated in contextual cueing. Previous studies on young children have found a discrepancy between touch screen responses and keyboard responses in contextual cueing.
A study that used touch screen response revealed a significant contextual cueing effect in 5-9 year olds (Dixon et al., 009), whereas one that used keyboard response did not find significant learning in 6-13 year olds (Vaidya et al., 007). Although response type may have contributed to the discrepancy, other factors may also be important. For example, children might have found the search items in Dixon et al. s study (fish) more interesting than in Vaidya et al. s study (letters). In addition, new displays were randomly intermixed with old displays during training in Vaidya et al. s study, whereas only old displays were shown during training in Dixon et al. s study. These differences could make it more difficult for children to acquire contextual cueing in Vaidya et al. s study. Experiment 1 of the current study used a within-subject design and similar stimuli and task requirements between the two response types. This design revealed that contextual cueing was largely independent of the format of responses to the target. Experiment Experiment 1 showed that spatial context learning is not specific to particular types of response: learning transferred between touch and keypress responses toward the target. In both 1 Given that RT was faster in the press task than in the touch task there was a concern that any interaction between response mode and learning was masked by this overall latency difference. To address this concern, we calculated a normalized index of contextual cueing as (RT(new)-RT(old))/mean (RT(new)+RT(old)).The normalized learning index also failed to reveal an effect of response mode on contextual cueing in either the training phase or the testing phase (F s<1).

tasks, however, participants made a speeded response upon target detection. Thus, it remains unclear whether learning is contingent on making a response to the target. Previous neuroscience research has shown that responding to targets changes brain activity. For example, when monkeys pressed a lever upon detecting a vertical-bar target, neurons in the locus coeruleus (LC) transiently increased their activity (Aston-Jones, Rajkowski, Kubiak, & Alexinsky, 1994). In humans, target detection leads to increased activity in parietal and dorsolateral prefrontal regions (Beck, Rees, Frith, & Lavie, 2001) and early visual areas (Ress & Heeger, 2003). Behavioral studies have shown that secondary materials presented concurrently with a target response are better remembered and learned than those presented with distractors (Makovski, Swallow, & Jiang, 2011; Seitz & Watanabe, 2003; Swallow & Jiang, 2010, 2011). Although these brain responses and behavioral gains may be induced by target detection independent of the responses made to the targets, it is also possible that they are partly driven by the response. Indeed, the LC response was time-locked to the animal's response rather than to stimulus onset (Clayton, Rajkowski, Cohen, & Aston-Jones, 2004). In addition, shapes presented concurrently with or immediately after a no-go response were judged less pleasant than shapes presented with or after a go response (Buttaccio & Hahn, 2010). If producing a motor response to targets modifies the encoding and memory of a visual display, then contextual cueing may depend on making a response, even though the specific mode of response (touch or press) does not matter. The other possibility is that contextual cueing is largely abstracted from motor actions: learning is independent of whether participants have responded to the search target.
Experiment 2 examines the role of response by comparing the magnitude of contextual cueing between displays that received a response and displays that did not. Experiment 3 tests the same question under conditions where eye movements are discouraged. In Experiment 2, we asked participants to search for a T among Ls and to press the spacebar if the T was tilted to the left ("response" trials), and to withhold the response if the T was tilted to the right ("no-response" trials). This compound search task (Duncan, 1985) required participants to detect and attend to the T on both types of trials, but to respond to the T only on the response trials. If detecting targets is sufficient for contextual learning, then contextual cueing should be comparable between the response and no-response conditions. In contrast, an overt response to the target may facilitate learning by triggering a re-checking or updating process (analogous to that induced by segmenting events, Swallow et al., 2009). In addition, the need to withhold a response may result in inhibition of learning (Buttaccio & Hahn, 2010), reducing the learning of no-response displays, and possibly leading to negative transfer when participants need to respond to these displays again.

Method

Participants: Nineteen new participants (mean age 21.1 years; 6 males) completed Experiment 2.

Equipment and stimuli: The equipment and stimuli were identical to those used in Experiment 1 except that a 17-inch CRT monitor replaced the touch screen. Additionally, in the training phase, the T was rotated by 15°, 30°, or 45° to the right or to the left, while the L-shaped distractors were randomly rotated (0° to 345° in steps of 15°) (Footnote 2). In the testing phase, the T was always vertical and the L-shaped distractors were randomly rotated in one of the four cardinal orientations. There was no offset between the two segments of the L-shaped distractors.

Training phase, compound search task: The training phase consisted of a compound search task in which participants were asked to find the T and identify its orientation. The task was to press the spacebar only when the T was rotated to the right (for half of the participants; or left, for the other half). Each trial started with a small white fixation point (0.4° × 0.4°) at the center of the screen for 500 ms, followed by the search array. To ensure that participants attended to the search display for the same amount of time on response and no-response trials, the search array was presented for a fixed duration of 500 ms. The search array was followed by a blank interval of 1000 ms (Figure 3, top). Participants were instructed to find the T and to press the spacebar on response trials. Responses made within 1500 ms from the onset of the search display were recorded. After this time window, accuracy feedback (a green plus or a red minus) was displayed for 400 ms.

Testing phase, simple search task: The purpose of the testing phase was to assess whether learning had occurred on response and no-response trials. This was done by measuring search RTs on the two types of old displays learned under either response or no-response conditions, and comparing them with RTs to the new displays. In the testing phase, each trial started with a small white fixation point (0.4° × 0.4°) for 500 ms. Then the search display appeared. Participants searched for the T, which was present on every trial, and pressed the spacebar upon detection. The time from the onset of the search display to the spacebar response was used for RT analysis.
After the response, all items turned into random digits (1-6) and participants entered the digit that occupied the target's location (Figure 3, bottom). A green plus followed correct responses (400 ms) and a red minus followed incorrect responses (2000 ms).

Footnote 2: We used three possible angles in each orientation to minimize the possibility that observers would use a single memory template for search, such as a 45°-left-tilted T. In debriefing, all participants confirmed that their strategy was to first find the T and only then to determine its orientation.

Figure 3. Trial sequence in the training (top) and testing (bottom) phases of Experiment 2. Panel A (training): press only if the T is tilted to the left (or right). Panel B (testing): find the T and respond to every display.

Design: Participants were familiarized with the compound-search task in 40 practice trials. Then, in the training phase, they completed 20 blocks of 32 trials of the compound-search task. The 32 trials of each block were unique displays that were presented once per block, in a random order. For each participant, 16 displays were associated with a response (old-response) and 16 displays were associated with no response (old-no-response), and this assignment was repeated across blocks. We did not include new displays because learning could not be assessed during training for the old-no-response trials. After completing the training phase, participants did a short practice block of 5 simple search trials before the testing phase began. The testing phase consisted of four 48-trial blocks of the simple search task. The search displays were the 16 old-response and 16 old-no-response displays from the training phase, randomly intermixed with 16 new displays. We asked subjects to respond to all types of displays to obtain an RT measure, but the displays differed in the way they had been learned previously. All displays were presented for 3 additional blocks in the testing phase to increase statistical power. Finally, to control for the repetition of target locations, the same 16 target locations were used in all three conditions (old-no-response, old-response, new).

Explicit recognition test: Participants' explicit memory was tested at the end of the experiment in a surprise recognition test. There were 64 trials. Participants were asked to decide whether they had seen each display in the main experiment. The 64 displays consisted of the 16

displays in each of the three conditions shown in the testing session (old-no-response, old-response, new), randomly intermixed with 16 additional novel displays.

Results

Training phase

Data from three participants were excluded because their false alarm rate on no-response trials in the training phase was higher than 35%; the average false alarm rate of the remaining sixteen participants was 10.7% (S.E. = 1.1). Participants correctly executed a response on 83.3% (S.E. = 1.6) of the response trials, with a mean RT of 737 ms (S.E. = 13.5). Figure 4A shows mean RT for detecting the target on response trials, and overall accuracy, as a function of training epoch (1 epoch = 4 blocks).

Testing phase: Accuracy

Accuracy in the testing phase was not affected by display type: 97.8% for the old-no-response condition, 97.5% for the old-response condition, and 96.5% for the new condition, F(2, 30) = 1.66, p > .2. In the RT analysis, we excluded incorrect trials as well as trials exceeding three standard deviations of each participant's mean in each condition. The latter cutoff eliminated 1.2% of the correct trials.

Testing phase: RT

Figure 4B shows RT in the testing phase of the experiment. A repeated-measures ANOVA found a significant effect of condition, F(2, 30) = 4.1, p < .05, ηp² = .22. Planned contrasts showed that whereas the old-no-response and old-response conditions were not significantly different (p > .7), they were both significantly faster than the new condition, t(15) = 2.30, p < .05 for new vs. old-response; t(15) = 2.60, p < .05 for new vs. old-no-response. This finding suggests that the omission of a target detection response during training was not detrimental to spatial context learning.
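The per-participant, per-condition outlier cutoff described above can be sketched as follows; the RT values and the helper name `trim_outliers` are hypothetical illustrations, not the study's analysis code:

```python
from statistics import mean, stdev

def trim_outliers(rts, n_sd=3.0):
    """Drop RTs more than n_sd standard deviations from the mean of
    this participant-by-condition cell (two-sided cutoff)."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= n_sd * s]

# Hypothetical cell of correct-trial RTs (ms) with one extreme value;
# the 3000 ms trial exceeds the 3-SD cutoff and is removed.
cell = [700, 710, 690, 705, 695, 715, 685, 720, 680, 700,
        710, 690, 705, 695, 715, 685, 720, 680, 700, 3000]
trimmed = trim_outliers(cell)
```

Note that the mean and standard deviation are computed within each cell before trimming, so a single extreme trial inflates the cutoff it is judged against; with very few trials per cell a 3-SD criterion can therefore fail to exclude any trial at all.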

Figure 4. Left: Mean RT for hit responses (top) and overall proportion correct (bottom) in the training phase of Experiment 2. Right: Mean RT in the testing phase of Experiment 2. Error bars show ±1 S.E.

Explicit recognition

In the recognition task, participants committed high false alarm rates in the new (59%) and novel (54.3%) conditions, which were similar to the hit rates for the old-no-response (53.1%) and old-response (57%) conditions, F < 1. These findings confirm that learning was largely implicit both for old displays that received a response and for those that received no response.

Discussion

Experiment 2 showed that contextual cueing did not depend on making an overt motor response to the search target. Search displays that did not receive a motor response were learned to the same extent as search displays that did. We found no evidence that refraining from responding to the target reduced learning or produced inhibition. This finding provides further evidence that spatial context learning is abstracted from responses. Not only is it invariant across two different response modes to the target (touch or press, Experiment 1), but it is also comparable for displays involving a detection response and ones where a response was omitted. The advantages afforded by target detection observed in previous behavioral and neuroscience studies (e.g., Aston-Jones et al., 1994; Swallow & Jiang, 2010, 2011) are likely triggered by detecting the target, rather than by making an overt motor response to it.

Experiment 3

The results so far have clearly shown that neither the need for nor the nature of a manual response affects spatial context learning. Before we can conclude that overt motor action is not critical for such learning, we need to consider the role of oculomotor responses.
Previous research has shown that spatial context learning results not only in faster manual responses at detecting the target, but also in faster eye movements toward the target (Peterson & Kramer, 2001). Even though contextual cueing has been observed when the search display was too brief for eye movements (Chun & Jiang, 1998), this finding was obtained when subjects made a manual detection response to the target. Experiment 2 revealed contextual cueing in the absence of a manual detection response; however, subjects may have made saccadic eye movements toward the target (Footnote 3). To address whether contextual cueing depends on any type of motor response, Experiment 3 repeated Experiment 2's design under conditions where eye movements were prevented or discouraged. In Experiment 3a, participants were discouraged from making any eye movements, and compliance with this requirement was checked with an eye tracker. In Experiment 3b, the search display was presented briefly, for 150 ms, a duration long enough for shifting covert attention but shorter than a typical saccade latency (Nakayama & Mackeben, 1989). If an overt motor response is a critical component of spatial context learning, then in the absence of eye movements or manual responses, contextual cueing should be eliminated.

Footnote 3: We thank Dr. Dominic Dwyer and an anonymous reviewer for raising this possibility.

Experiment 3a

Method

Participants: Performing a search task for a long time without moving their eyes turned out to be difficult for naïve subjects. Only eight (out of 14) participants (mean age 23.8 years; 6 males) were able to complete Experiment 3a.

Equipment, stimuli, procedure and design: The main novelty of this experiment was the use of an ISCAN ETL-300R eye tracker to ensure fixation. A chinrest was used to fix head position at a viewing distance of 75 cm (thereby the perceived size of the stimuli was reduced by ~17%). Participants were told to fixate the white fixation point (0.35° × 0.35°) that remained on the screen throughout a trial. A trial would not start until participants had fixated within a 1.04° × 1.04° imaginary box surrounding the center for at least 60% of the preceding 500 ms sliding time window. The same criterion (fixating within the 1.04° fixation box for at least 60% of a sliding 500 ms time window) was used to indicate whether fixation had been broken during a trial, and was based on the notion that subjects would not be able to execute a saccade toward the target without being detected. Violating this stringent criterion triggered a warning message ("You moved your eyes in the last trial!") 500 ms after the accuracy feedback. Compliance with the eye-fixation requirement was checked during both the training and testing phases, and 15-30 minutes of practice at maintaining fixation were given prior to the start of the experiment. We did not administer the explicit recognition test. In all other respects the experiment was identical to Experiment 2.

Results

Training phase

Except for one subject who broke fixation on 65% of the trials in the training phase, the average rate of fixation breaks was 28.9% (S.E. = 4.8). Fixation breaks were more frequent on no-response trials (38.4%) than on response trials (28.4%), probably because these trials lasted longer, providing participants more opportunities to break fixation.
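The sliding-window fixation criterion can be sketched as follows; the function name, the sample format, and the gaze values are assumptions for illustration, not the actual eye-tracker analysis code. Gaze samples are assumed to be expressed in degrees relative to the fixation point, so the 1.04° × 1.04° box corresponds to ±0.52° from center:

```python
def fixation_ok(samples, box_half_deg=0.52, min_fraction=0.6):
    """Sliding-window fixation check (sketch). 'samples' holds the (x, y)
    gaze positions, in degrees relative to the fixation point, recorded
    over the preceding 500 ms window. Fixation counts as maintained if at
    least min_fraction of the samples fall inside the central box."""
    inside = sum(1 for x, y in samples
                 if abs(x) <= box_half_deg and abs(y) <= box_half_deg)
    return inside / len(samples) >= min_fraction

# 7 of 10 samples near center (70% >= 60%): fixation maintained.
ok = fixation_ok([(0.1, 0.0)] * 7 + [(2.0, 1.5)] * 3)
```

A tolerance of this kind (60% of samples rather than 100%) allows for measurement noise and small gaze fluctuations, which is why, as noted below, a registered fixation break does not necessarily mean the subject looked at the target.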
The average false alarm rate was 11.3% (S.E. = 2.5) and participants correctly executed a response on 79.5% (S.E. = 2.0) of the response trials, with a mean RT of 79 ms (S.E. = 1.5).

Testing phase: Accuracy

Accuracy in the testing phase was not affected by display type: 94.3% for the old-no-response condition, 92.8% for the old-response condition, and 93.6% for the new condition, F < 1. In the RT analysis, we excluded incorrect trials as well as trials exceeding three standard deviations of each participant's mean in each condition (2.5% of the correct trials).

Testing phase: RT

Figure 5a shows RT in the testing phase of the experiment. A repeated-measures ANOVA revealed a significant effect of condition, F(2, 14) = 5.4, p < .02, ηp² = .44. Planned contrasts replicated the previous findings: whereas the old-no-response and old-response conditions were not significantly different (p > .79), they were both significantly faster than the new condition, ps < .02. This pattern of results held when we excluded trials where fixation was broken during testing (4.8%). It is important to note that a fixation break (according to the stringent criterion we used) does not necessarily mean that subjects moved their eyes toward the target. Instead, it could have resulted from measurement noise or small gaze fluctuations. To confirm that fixation breaks were not associated with eye movements and learning, we measured contextual cueing for individual displays. This analysis showed that the amount of fixation breaks during training did

not correlate with the size of contextual cueing (as indexed by (RT(new) − RT(old)) / mean(RT(new), RT(old))): r(16) = -.09, p > .33, for no-response displays, and r(16) = -.03, p > .7, for response displays. A recent conference report also demonstrated contextual cueing in participants whose eye movements were minimized with the use of an eye tracker (Le Dantec & Seitz, 2010).

Figure 5. Experiment 3a's (top) and 3b's (bottom) mean RT in the testing phase. Error bars show ±1 S.E.

Experiment 3b

Experiment 3a replicated the findings of Experiment 2 under conditions where eye movements were discouraged. Specifically, even when oculomotor responses were minimized, similar learning effects were found regardless of whether the task required a motor response. However, because of the high rate of fixation breaks, we deemed it necessary to seek converging evidence for these findings. In Experiment 3b we repeated the previous design but restricted the presentation duration of the search display to 150 ms. Because this duration is shorter than saccade latency, subjects could not have moved their eyes to the target (Chun & Jiang, 1998).

Method

Participants: Sixteen participants (mean age 21.0 years; 2 males) completed Experiment 3b.

Equipment, stimuli, procedure and design: The main change in Experiment 3b was to reduce the presentation duration in the training phase from 500 ms to 150 ms. A pilot study showed that the target-detection task used in Experiment 2 was too difficult when the display was presented this briefly. We therefore simplified the search task by lowering the set size from 12 to 8. We also increased the difference between the target T and the distractor Ls (the items were now elongated to 1.75° × 1.8°) and simplified the T and Ls (the T was rotated by 15° to the right or to the left, while the L-shaped distractors were randomly rotated in one of the four cardinal orientations). Finally, we asked participants to fixate the white fixation point (0.4° × 0.4°) that remained on the screen throughout a trial. We did not use an eye tracker, nor did we administer the explicit recognition test. In all other respects the experiment was identical to Experiment 2.

Results

Training phase

The average false alarm rate on no-response trials was 8.7% (S.E. = 1.1). Accuracy on response trials was 89.1% (S.E. = 2.1) and mean RT was 64 ms (S.E. = 18.5).

Testing phase: Accuracy

Accuracy in the testing phase was not affected by display type: 95.1% for the old-no-response condition, 95.3% for the old-response condition, and 95.4% for the new condition, F < 1. In the RT analysis, we excluded incorrect trials as well as trials exceeding three standard deviations of each participant's mean in each condition (1.7% of the correct trials).

Testing phase: RT

Figure 5b shows RT in the testing phase of the experiment. Planned contrasts showed that whereas the old-no-response and old-response conditions were not significantly different (p > .8), they were both significantly faster than the new condition, t(15) = 2.18, p < .05 for new vs. old-response; t(15) = 2.94, p < .01 for new vs. old-no-response. Direct comparison between these data and Experiment 2's findings revealed no effects of presentation duration, all Fs < 1.
These findings confirm that spatial context learning does not require a motor response, as learning was observed in the absence of manual responses or eye movements to the target.

Discussion

Two control experiments were conducted to address the possibility that common eye movements underlie learning in both the response and no-response conditions. The results of these experiments argue against this idea. In both experiments, subjects were discouraged from moving their eyes, either by an eye tracker or by the use of briefly presented displays. However, in neither experiment was learning enhanced when participants made an overt manual detection response to the target. While these findings do not rule out the possibility that covert programming of saccadic eye movements, or covert detection responses, are important for spatial context learning, they strengthen the idea that learning is abstracted from overt motor responses.

General Discussion

According to a unified concept of perception, cognition and action, motor responses are more than the mere outcome of a serial process going from perception to action (e.g., Song &

Nakayama, 2009). Furthermore, the notion that action plays an important role in both perception and memory has gained ample support (Elsner & Hommel, 2001; Hommel et al., 2001; Hommel et al., 2006; Forster & Stark, 1996; MacLeod et al., 2010; Milner & Goodale, 2006). Therefore, the goal of the present study was to investigate the role of overt motor responses in implicit visual learning. Specifically, we asked whether learning is affected by different types of motor response to a search target, and whether making an overt response to the target facilitates learning. Our results showed that learning was comparable for displays that received a touch response and a keypress response. In addition, spatial context learned through one type of response was effectively used even when the response type had changed. Finally, learning took place even when participants made no manual or oculomotor response to the target. These data provide converging evidence that action plays a limited role in spatial context learning.

Remillard (2003) differentiated among three types of learning in the SRT task: perceptual-based learning of target locations, response-based learning of keypress locations, and effector-based learning of finger movements. The results of Experiment 1 suggest that the task-specificity of contextual cueing is due neither to motor specificity (effector-based learning) nor to response specificity. Rather, contextual cueing is a form of perceptual-based learning, as learning transferred across the touch and press conditions even though both the motor response and the spatial representation of the response had changed. If contextual cueing is perception-based, how do we account for the learning specificity to different tasks observed in the past? One reason is that when tasks changed in previous studies, perceptual encoding of the visual input also changed.
Consider the finding that contextual cueing does not fully transfer between change detection and visual search tasks (Jiang & Song, 2005). In visual search, the target's location is strongly associated with its local context, that is, with items that are near the target (Brady & Chun, 2007; Jiang & Wagner, 2004; Olson & Chun, 2002). In change detection, the target is associated with the global layout of the items (Jiang, Olson, & Chun, 2000; Jiang & Song, 2005). Although the same spatial layout is presented to participants, it is represented differently depending on the task.

That learning does not require a response (Experiments 2 and 3) raises the question of why contextual cueing was not found on trials where search was interrupted before the target was successfully located and responded to (Shen & Jiang, 2006). In an interrupted search, the target is not found, so one cannot associate a search layout with the target's location. Although participants could still associate a search layout with where targets were not found (Geyer, Shi, & Müller, 2010), this information is likely too limited to facilitate search. The lack of learning in interrupted search, therefore, may be due to the absence of target detection, rather than to the absence of a response.

Why was learning not affected by action? One possible explanation is that the association between the spatial context and the target's location requires a form of representation that is abstracted from the viewer's motor response. That is, unlike action-effect binding, where action and perceptual codes are learned to form an associative structure (e.g., Elsner & Hommel, 2001; Hommel et al., 2001; Hommel et al., 2006), in contextual cueing the spatial locations of the search target and the other items are represented in relation to each other. This kind of relational coding may take place before an action code (or a no-action code, Kühn & Brass, 2010) is generated.
Alternatively, it is possible that our measurements were not sensitive enough to detect finer motor representations that might play a role in contextual cueing. In particular, some interactions between action and perception are found when the perceptual stimulus and the required action share common properties (e.g., the stimulus-response compatibility effect; Kornblum,

Hasbroucq, & Osman, 1990). Thus, it is possible that testing compatibility effects in contextual cueing might reveal that some motor response codes are incorporated into learning. Presently this possibility does not have strong support. Although a keypress response is arbitrary and not highly related to the search task, a touch response and a saccadic eye movement have a viewer-centered spatial component. However, learning acquired with a touch response was not weakened when tested with a keypress response, and contextual cueing was observed in the absence of eye movements. These findings suggest that even a spatially directed motor response is not a critical component of spatial context learning.

Nonetheless, we made an attempt to examine the possibility that motor response codes might be incorporated in contextual cueing when a spatial compatibility component was introduced between perception and action. In this follow-up experiment, we compared learning of repeated displays under a condition where the motor response was spatially compatible with the target display (press with the right hand for targets presented on the right and with the left hand for targets presented on the left) and a condition where the motor response was spatially incompatible with the target display (press with the left hand for targets presented on the right and with the right hand for targets presented on the left). The results showed that although the effects of compatibility and learning were both highly significant, Fs(1, 19) > 14, ps < .01, ηp² > .43, they did not interact, F < 1. That is, contextual cueing was comparable for the spatially compatible and spatially incompatible conditions. These results provide further evidence for the perceptual nature of spatial context learning.
Future research testing other types of compatibility effects, particularly those with a stronger motor component, is important for determining whether motor response codes are sometimes incorporated into learning. To conclude, while it has been shown that vision-for-action differs from vision-for-perception and that action codes are integrated into perceptual representations, the present results show that spatial context learning is relatively independent of simple actions made to the search target. Such learning transfers across response modes and occurs in the absence of an overt response to the target.