Episodic and prototype models of category learning

DOI 10.1007/s10339-011-0403-2

RESEARCH REPORT

Richard J. Tunney · Gordon Fernie

Received: 7 October 2010 / Accepted: 28 March 2011
© Marta Olivetti Belardinelli and Springer-Verlag 2011

R. J. Tunney (corresponding author) · G. Fernie
University of Nottingham, Nottingham, UK
e-mail: richard.tunney@nottingham.ac.uk

Abstract The question of what processes are involved in the acquisition and representation of categories remains unresolved despite several decades of research. Studies using the well-known prototype distortion task (Posner and Keele in J Exp Psychol 77:353-363, 1968) delineate three candidate models. According to exemplar-based models, we memorize each instance of a category and when asked to decide whether novel items are category members or not, the decision is explicitly based on a similarity comparison with each stored instance. By contrast, prototype models assume that categorization is based on the similarity of the target item to an implicit abstraction of the central tendency or average of previously encountered instances. A third model suggests that the categorization of prototype distortions does not depend on pre-exposure to study exemplars at all and instead reflects properties of the stimuli that are easily learned during the test. The four experiments reported here found evidence that categorization in this task is predicated on the first and third of these models, namely the exemplar-based model and transfer at test. We found no evidence for the second candidate model, which assumes that categorization is based on implicit prototype abstraction.

Keywords Categorization · Implicit learning

A fundamental aspect of human cognition is the ability to acquire knowledge of categories. Precisely how the mind represents categories has received a substantial amount of attention, and recently theories of categorization have been informed by studies involving amnesic patients and brain imaging. 
These, however, tend to contradict the evidence obtained using traditional psychological methods. The present research is concerned with two classes of theory of categorization in particular: episodic models and prototype models. Both prototype and episodic models assume that categorization decisions are based on similarity. According to episodic models, we memorize each instance of a category (e.g., Hintzman 1986). When asked to decide whether novel items are category members or not, the decision is based on a comparison of the item with each stored exemplar. In effect, categorization is little more than a form of episodic memory.[1] By contrast, prototype models assume that categorization decisions are made by judging the similarity of the item to a prototype, rather than by comparing it to the study items stored in memory (Smith and Minda 2002). A prototype is usually defined as an abstraction of the central tendency or an average of previously encountered exemplars. Because the exemplars themselves need not be stored in memory, some proponents of prototype models have claimed that prototype abstraction is an implicit process or that it results in an implicit representation (e.g., Reber et al. 2003; Squire and Knowlton 1995).

[1] Strictly speaking, exemplar models are a subordinate category of episodic models; we refer to them synonymously because exemplar models require some form of episodic memory to store the instances, although this need not be veridical.

Prototype distortion tasks

Prototype distortion tasks (Posner and Keele 1968) have been highly influential in developing our understanding of

category learning, and this paper is concerned with the most common experimental preparation known as the A, not A task that was designed to test between prototype and exemplar models of categorization (Ashby and Maddox 2005). We present evidence that calls for a re-evaluation of this task and present the first studies that directly test the claim that prototype abstraction involves implicit knowledge. In the most common prototype distortion task, a prototype stimulus is formed by generating a random pattern of dots; category members are then created by distorting the co-ordinates of each dot of the prototype stimulus by some probability. Similar category members are relatively low distortions of the original co-ordinates; dissimilar items are higher distortions (see Fig. 1). In the A, not A procedure the participants are first shown a set of high distortions. Learning is incidental in the sense that the instructions do not mention the existence of a category. Later, participants are asked to categorize previously unseen high distortions, low distortions, the prototype, and random patterns. Some proponents of prototype abstraction have argued that exemplar and prototype models make different predictions about which items are most likely to be endorsed as category members (e.g., Smith and Minda 2002). It is assumed that if the exemplar model were correct then the previously unseen high distortions are the items most likely to be endorsed as category members because these are assumed to be most similar to the study items. The prototype model predicts that the original prototype used to generate the stimuli and the low distortions are the items most likely to be endorsed as category members, despite the fact that they are less similar to the study items than are the high distortions. 
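The stimulus construction described above can be illustrated with a short Python sketch. This is a simplified illustration and not the Posner et al. (1967) procedure: the function names, the grid size, and the rule that each dot moves with probability move_prob by at most max_shift units are all assumptions made here for clarity.

```python
import math
import random


def make_prototype(n_dots=9, grid=50.0, rng=None):
    """Generate a random prototype: n_dots (x, y) positions on a square grid."""
    rng = rng or random.Random()
    return [(rng.uniform(0, grid), rng.uniform(0, grid)) for _ in range(n_dots)]


def distort(prototype, move_prob, max_shift, rng=None):
    """Return a category member by displacing each dot with probability move_prob.

    Assumed rule (hypothetical): a displaced dot moves by a uniform random
    offset of at most max_shift on each axis. Low distortions use small
    values of move_prob and max_shift; high distortions use larger ones.
    """
    rng = rng or random.Random()
    member = []
    for x, y in prototype:
        if rng.random() < move_prob:
            x += rng.uniform(-max_shift, max_shift)
            y += rng.uniform(-max_shift, max_shift)
        member.append((x, y))
    return member


def mean_displacement(pattern_a, pattern_b):
    """Average Euclidean distance between corresponding dots of two patterns."""
    total = sum(math.hypot(ax - bx, ay - by)
                for (ax, ay), (bx, by) in zip(pattern_a, pattern_b))
    return total / len(pattern_a)


if __name__ == "__main__":
    rng = random.Random(2011)
    prototype = make_prototype(rng=rng)
    low = distort(prototype, move_prob=0.3, max_shift=2.0, rng=rng)
    high = distort(prototype, move_prob=0.9, max_shift=8.0, rng=rng)
    print("low distortion :", round(mean_displacement(prototype, low), 2))
    print("high distortion:", round(mean_displacement(prototype, high), 2))
```

A physical measure such as mean_displacement only describes how the stimuli were built. Whether participants perceive the test items as similar to the study items or to the prototype is an empirical matter, which is why the experiments below rely on rated similarity instead.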
The results of numerous studies confirm this pattern of results, known as the prototype enhancement effect, and this is generally taken as support for the prototype model (e.g., Knowlton and Squire 1993; Kolodny 1994; Posner and Keele 1968; Squire and Knowlton 1995). That is, participants abstract the central tendency of the high distortions during study and arrive at a representation of a prototype that closely matches the prototype that was used to create the stimuli. Category decisions are then based on the similarity of the test items to the abstracted prototype rather than to the study items. A key problem with this model is that the similarity of the test items to either the study items or the prototype is based on an assumption rather than empirical evidence. In the experiments that follow, we ask whether this assumption is correct. Evidence from studies of amnesic patients and brain imaging supports the prototype account and suggests that behavioral dissociations between prototype and exemplar knowledge are products of separate neural processes. Both Squire and Knowlton (1995) and Kolodny (1994) reasoned that if the prototype was abstracted during the study episode, then episodic memory would not be needed to store the study exemplars. It follows that category learning should be a preserved function in patients with amnesia. Indeed, both studies found similar patterns of categorization in amnesic patients and controls. As expected, the amnesic patients performed poorly relative to controls in subsequent recognition tests of the study exemplars. The conclusion from studies using amnesic patients is that category knowledge can be acquired in the absence of explicit memory for study exemplars. Thus, any episodic model would appear untenable because it does not prima facie predict the pattern of results in normal or amnesic participants.

Fig. 1 Example stimuli used in prototype distortion tasks

Episodic models of the prototype enhancement effect and amnesia

Nonetheless, episodic models have had some success in accounting for the prototype enhancement effect and dissociations between recognition and categorization in amnesic patients. Shin and Nosofsky (1992) asked participants to rate the similarity of each of the test items to one another. These similarity ratings were then used to model the probability that each test item would be endorsed as a category member and compared with a model based on the similarity of each test item to prototype items. The exemplar model provided a better fit to the empirical data than the prototype model. Episodic models can simulate apparent dissociations between categorization and recognition so long as it is

assumed that amnesia results in a reduced but not entirely eliminated capacity to memorize exemplars. For example, Nosofsky and Zaki (1998) demonstrated that variables that affect memory in healthy participants, such as a long retention interval, result in impaired recognition but relatively spared categorization. The message from studies of this kind is that dissociations between categorization and recognition do not rule out episodic models of category learning (see also Kinder and Shanks 2001, 2003).

Abstraction as an implicit process

A common inference derived from dissociations between categorization and recognition in healthy and amnesic populations is that prototype abstraction and subsequent categorization rely upon implicit processes (Squire and Knowlton 1995). Studies using fMRI seem to support this view. For example, Reber et al. (2003) reported differences in brain activity when categorizing test items following either incidental or intentional learning instructions, which they concluded reflected differences in implicit and explicit processes, respectively. Intentional learning resulted in activation of the hippocampus; by contrast, incidental learning resulted in deactivation of the posterior occipital cortex. Because the hippocampus was not associated with incidental learning, Reber et al. (2003) interpreted this as an implicit process (see also Aizenstein et al. 2000; Little et al. 2006). However, a recent re-evaluation of these studies asks whether these dissociations result from differences in instructions and materials rather than from differences in the underlying neural processes (Gureckis et al. 2010). 
Whatever the neural substrates might be, we feel that no conclusions can be reached regarding awareness in this paradigm because the claim that prototype abstraction is an implicit process is based entirely on the neuropsychological evidence showing that categorization does not appear to involve the hippocampus, rather than on any assessment of awareness itself. Subjective reports of confidence are a common means to assess levels of implicit and explicit knowledge in other category learning paradigms such as artificial grammar learning (Dienes et al. 1995; Tunney 2005, 2010), along with decision-making (Juslin et al. 2000), perception (Kunimoto et al. 2001), and memory (Henson et al. 2000), but have yet to be applied to prototype distortion tasks. Several techniques to assess levels of awareness based on subjective confidence have been used, of which the most popular is the zero-correlation criterion (Dienes et al. 1995). This approach defines awareness in terms of the relationship between confidence and accuracy. If participants are aware of the knowledge they use to categorize items, then they should be more confident when they make correct decisions than when they make incorrect ones. Thus, confidence should be correlated with accuracy. Conversely, if discriminations are made without awareness, then participants should be just as confident when they make incorrect decisions as when they make correct ones, and the correlation between confidence and accuracy will be close to zero. Data from experiments using this criterion suggest that categorization can be implicit under some circumstances (see Dienes 2008). However, Dienes and Scott (2005) make a distinction between awareness that a decision is correct and awareness of the reasons why that might be. Minimally, a correlation between confidence and accuracy implies only awareness that an item belongs to a category, and not necessarily knowledge of the structural reasons that define category membership. 
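The logic of the zero-correlation criterion can be sketched in Python as a per-participant difference score (mean confidence for correct decisions minus mean confidence for incorrect decisions), followed by a one-sample t test of those scores against zero. The function names and the trial format are assumptions made for this illustration; the sketch is not the analysis code used in the experiments.

```python
from math import sqrt
from statistics import mean, stdev


def difference_score(trials):
    """Confidence-accuracy difference score for one participant.

    trials: iterable of (confidence, was_correct) pairs, with confidence on
    a 50 (guess) to 100 (certain) scale. Returns None when the participant
    has no correct or no incorrect decisions, because the score is then
    undefined.
    """
    correct = [conf for conf, ok in trials if ok]
    incorrect = [conf for conf, ok in trials if not ok]
    if not correct or not incorrect:
        return None
    return mean(correct) - mean(incorrect)


def one_sample_t(scores, mu=0.0):
    """One-sample t statistic and degrees of freedom for scores against mu."""
    n = len(scores)
    t = (mean(scores) - mu) / (stdev(scores) / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical data for three participants, for illustration only.
    participants = [
        [(90.0, True), (70.0, True), (60.0, False), (50.0, False)],
        [(80.0, True), (75.0, False), (65.0, True), (60.0, False)],
        [(55.0, True), (85.0, True), (70.0, False), (50.0, False)],
    ]
    scores = [difference_score(p) for p in participants]
    print(scores, one_sample_t(scores))
```

Under this criterion, a mean difference score that does not differ from zero despite above-chance categorization is read as implicit knowledge, whereas a score reliably above zero is read as explicit judgement knowledge.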
But if participants have structural knowledge of the category, then the information used to make confidence ratings should be the same as that used to categorize items. Tunney (2005) demonstrated that confidence ratings are closely related to structural knowledge in artificial grammar learning and could be used to discriminate between competing theories of categorization. Because confidence ratings were closely related to the similarity of the test items to the study items, but not to the rules of the grammar, Tunney concluded that categorization in artificial grammar learning was based on explicit memory for exemplars rather than rule abstraction. In the experiments that follow, we directly test claims of the implicitness of judgement knowledge in prototype distortion tasks.

Transfer at test

There remains a third model of categorization in prototype distortion tasks, namely transfer at test (Palmeri and Flanery 1999; Zaki and Nosofsky 2004). Palmeri and Flanery (1999) noted that the episodic model could only explain preserved categorization in amnesic patients when recognition remained above chance. Given that at least one amnesic patient's recognition performance was too poor to support categorization on an episodic basis, either the episodic model is wrong or participants can learn something of the category structure from the test items without recourse to any representation of the study items. To test this, Palmeri and Flanery (1999) simulated amnesia in healthy participants by substituting the study period with an irrelevant word-identification task and told the participants afterward that the patterns had been presented subliminally. The resulting pattern of categorization appeared to closely match that of the true amnesic patients reported earlier (Knowlton and Squire 1993; Squire and Knowlton 1995). 
Palmeri and Flanery (1999) reasoned that since in their experiment categorization could only proceed on the basis of information obtained from the test items this was also likely to be true of the amnesic patients

(see also Palmeri and Flanery 1999, 2002; Zaki and Nosofsky 2004, 2007). This raises the obvious question of which properties of the test items can give rise to the prototype enhancement effect without any pre-exposure to the category exemplars. In the method described by Posner et al. (1967) to create dot patterns, the dots are arranged in concentric rings (although these are not seen by the participants) and the amount of distortion refers to the probability that each of the dots that make up a pattern will move to an outer ring. The consequence of this is that high distortions are more dispersed relative to the center of the pattern than are the low distortions. It is this attribute of dot patterns that is thought to underlie transfer at test (Zaki and Nosofsky 2004). Much of the evidence that has appeared to favor prototype abstraction as the psychological mechanism responsible for the prototype enhancement effect (Smith 2002; Smith and Minda 2002) might be due instead to simple transfer at test (Zaki and Nosofsky 2007). Indeed, Zaki and Nosofsky demonstrated that much of the computational modeling that has supported the abstraction model is redundant given that the prototype enhancement effect occurs in the absence of pre-exposure.

Overview of experiments

The experiments that follow are designed to discriminate between the three candidate models of categorization in the prototype distortion task. We first asked two groups of participants to provide similarity ratings for each of the test items to either the study items or the prototype. In the subsequent experiments, regression analyses using these similarity ratings were conducted separately for each participant to determine whether their representation of the category was best described as prototype-based or episodic-based. 
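The per-participant regression approach can be sketched as follows: fit one regression per participant with both similarity measures as simultaneous predictors of endorsement, then test the resulting coefficients against zero across participants. The sketch below is an illustration under stated assumptions and not the authors' analysis code: it uses ordinary least squares with standardized coefficients, computed from the closed-form solution for two predictors, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation. Raises ZeroDivisionError if either variable is
    constant, e.g. for a participant who endorsed or rejected every item."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_betas(y, x1, x2):
    """Standardized coefficients for y regressed on x1 and x2 together.

    For one participant, y could hold the endorsement of each test item
    (1 = endorsed, 0 = rejected), x1 the rated similarity of each item to
    the prototype, and x2 its rated similarity to the study exemplars.
    """
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


def t_against_zero(coefficients):
    """Second step: one-sample t test of the participants' coefficients
    against zero. Returns (t, degrees of freedom)."""
    n = len(coefficients)
    return mean(coefficients) / (stdev(coefficients) / sqrt(n)), n - 1
```

Running standardized_betas once per participant gives two lists of coefficients per condition; t_against_zero applied to each list then asks whether that source of similarity reliably predicts endorsement in that condition.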
To determine whether participants' representation of the category resulted from exposure to exemplars or from transfer at test, each experiment directly compared the performance of a pre-exposed group with that of untrained controls using a fake-subliminal procedure similar to those used by Palmeri and Flanery (1999) and Zaki and Nosofsky (2004). To our knowledge, no previous study has formally made these comparisons. Finally, we asked the participants to report how confident they were in each decision so that we could directly assess whether their representation of the category is predicated on implicit or explicit knowledge.

Similarity ratings

The exemplar and abstraction models both predict that test items are categorized on the basis of similarity. Exemplar models predict that test items are categorized on the basis of their similarity to the study items. Abstraction models predict that test items are categorized on the basis of their similarity to the prototype. To test this very basic assumption behind dot-pattern learning, two groups of participants were asked to rate the similarity of the test items to either the prototype or the study exemplars. To our knowledge, no study has obtained similarity ratings of the test items to the study items. This will enable us to test in the experiments that follow whether categorization is based on prototype abstraction or memory for study items.

Method

Participants

Forty-eight members of the University of Nottingham community participated in this study in return for a payment of either 7.00 or 10 (approx 8.80 and 12.60, respectively). Ten were men and 14 were women. Their mean age was 22.58 years (sd = 3.79).

Stimuli

The stimuli were the same as those used by Knowlton and Squire (1993) and were constructed using the method described by Posner et al. (1967). 
These consist of one prototype item used to generate the category members, 20 low distortions, 40 high distortions, and 40 non-members (in effect random patterns as they are each a high distortion of a new prototype). Each stimulus is composed of a pattern of nine white dots inside a white box on a black background. Half of the high distortions are used as study exemplars and the remaining half as test items.

Procedure

Participants were presented with two patterns on screen at the same time and were asked to indicate how similar they were on a scale of 1-9 (where 1 = very different and 9 = identical). Each pair appeared for 5,000 ms before the participants were prompted to make their similarity rating. After this, the pair disappeared and the screen blanked for 1,000 ms before the next pair appeared. In the prototype condition, the pattern on the left was always the prototype and participants made 82 ratings (20 low distortions, 20 high distortions, 40 non-members, and the prototype was presented twice). The exemplar condition was slightly different because there were 3,360 comparisons to be made (i.e., comparing each of the 84 test items to each of the 40 study exemplars). To reduce the time taken for each participant and to reduce possible fatigue, the test items were split into 4 subsets

each with an equal number of each item type so that each participant only had to make 840 comparisons. These participants were paid more for their time (10) and were at liberty to take breaks between blocks of 84 comparisons. There was an enforced 10-min break after 420 comparisons.

Results

The mean similarity ratings for each item type by each condition are shown in Table 1. The similarity ratings of the test items to the prototype show the expected linear relationship. The similarity ratings of the test items to the study items show a pattern that was not predicted. A number of previous studies using these stimuli have assumed that, because high distortions are used as study items, the high distortions used at test are more similar to the study items than any other type of item (Knowlton and Squire 1993; Smith and Minda 2002). Contrary to this assumption, the prototype and the low distortions were rated as more similar to the study items than were the high distortions (Table 1). Thus, the basic claim that the participants' failure to endorse high distortions as category members more than low distortions or the prototype items is evidence against episodic models would appear to be untrue. This is an important finding and calls for a general re-evaluation of previous research. Nonetheless, the following experiments use this specific set of stimuli so that the results can be generalized to other studies. Regression analyses using the Lorch and Myers (1990) method allow the competing models to be tested on an item-by-item basis while continuing to use this specific set of stimuli.

Experiment 1

The aim of Experiment 1 is to replicate the prototype enhancement effect and to determine, by comparison with an untrained control group, the extent to which it can be explained by simple transfer at test (Palmeri and Flanery 1999) or whether there is any benefit from pre-exposure to category exemplars. If so, is the representation that results from pre-exposure best described by a prototype or an episodic model, and is this information available to awareness? 
The stimuli and design of the experiment were
The stimuli and design of the experiment were Table 1 Mean similarity ratings of test items to the prototype and the study exemplars Prototype Low distortions High distortions Nonmembers M se M se M se M se Prototype 9.00 0.00 5.99 0.22 3.94 0.15 2.41 0.09 Exemplar 4.22 0.33 4.00 0.14 3.71 0.12 2.85 0.08 identical to the one used by Knowlton and Squire (1993), with the exception that we included a Control group and asked the participants to make confidence ratings for each decision. Method Participants Forty-eight members of the University of Nottingham community took part in this study in return for a payment of 7.00 (approx 8.80). Fourteen were men and thirty-four were women. Their mean age was 24.04 years (sd = 5.92). Stimuli The stimuli were the same as those described earlier and used by Knowlton and Squire (1993). Design and procedure This was a mixed-model design with Exposure (Preexposed vs. Control) as the between-subjects factor and Item (Prototype, Low Distortion, High Distortion, and Non-members) as the within-subject factor. During the study period, the pre-exposed participants were presented with 40 high distortions of the prototype in the center of a computer display. The participants were told that the study was an experiment on visual attention. Participants were instructed to look for the dot closest to the center of the screen but were not given any instructions about the presence of a category or how to encode the items. To encourage their attention to the task, a small camera was mounted on top of the display and participants were told that their eye-movements would be recorded. Study trials consisted of a 3,000 ms white fixation cross, followed by a study item that appeared for 5,000 ms. Each study item appeared in a white box approximately 4 00 9 4 00. Control participants were informed that they were taking part in an experiment on subliminal perception and visual attention. Control study trials consisted of a 3,000 ms white fixation cross. 
After this, a black screen was displayed for 1,000 ms, then the screen flashed white for 50 ms, a black screen with an empty 4″ × 4″ white box flashed for 50 ms, followed by another white screen for 50 ms before an empty black screen returned for 4,000 ms. There was a 5-min break between study and test. The test procedure was identical in both conditions. The study items consisted of 40 high distortions of the prototype. The test items consisted of 20 high distortions, 20 low distortions, 40 category non-members, and 4 repetitions of the prototype item itself. These were presented in a randomized order. The participants were told that all of the

items that they had just seen were instances of a category and that they would now see some new items, some of which belonged to the category and some did not. Each test trial consisted of a 3,000 ms fixation cross. Each test item appeared for 5,000 ms followed by a prompt to indicate whether the item was a category member or not. After this, participants made a confidence rating on a continuous scale that ranged from 50% (guess) to 100% (certain).

Results

The mean endorsement rates for each item type for each condition are shown in Fig. 2. Three participants in the Control condition who made more than 95% stereotyped responses (i.e., endorsing or rejecting all items) were eliminated from all the analyses. The remaining data were entered into a split-plot ANOVA with Exposure (Pre-exposed vs. Control) as the between-subjects factor and Item (Prototype, Low distortion, High distortion, and Non-members) as the within-subjects factor.[2, 3] There was a main effect of Item (F(1.51, 64.70) = 20.73, MSE = 0.08, P < 0.01, ηp² = 0.36); the effect of Exposure was marginal (F(1, 43) = 3.40, MSE = 0.08, P = 0.07, ηp² = 0.07). There was no interaction between the two (F(1.51, 64.70) = 0.57, MSE = 0.08, P = 0.52, ηp² = 0.01). The linear contrast across all levels of Item was reliable, confirming the prototype enhancement effect (F(1, 43) = 25.85, MSE = 0.09, P < 0.01, ηp² = 0.38). Planned comparisons revealed no differences between the groups in endorsement rates for the prototype items (t(43) = 0.82, se = 0.10, P = 0.42), the low distortions (t(43) = 0.94, se = 0.06, P = 0.35), or the non-members (t(43) = 0.68, se = 0.06, P = 0.53). However, the pre-exposed group endorsed reliably more high distortions than the Control group (t(43) = 3.25, se = 0.04, P < 0.01). This pattern of results is consistent with the view that the prototype enhancement effect following incidental exposure to exemplars is largely a product of transfer at test rather than abstraction from the study exemplars. 
However, there is some evidence for the episodic model in that Pre-exposed participants exhibited a preference for the high distortions over and above the Control participants. If this account is correct, we would expect that endorsements would be predicted by the similarity of the test items to the prototype in both the Pre-exposed condition and the Control condition, since this information is likely to be acquired at test. However, endorsements in only the Pre-exposed condition should be predicted by the similarity of the test items to the study exemplars, since this information can only be acquired from study. To test this, we conducted separate regression analyses for each participant using the method described by Lorch and Myers (1990) in which both sets of observed similarities were simultaneously used to predict the probability that each item would be endorsed. The mean standardized regression coefficients for each condition are shown in Fig. 3. The similarity of the test items to the prototype reliably predicted endorsement rates in both the Pre-exposed and the Control conditions (t(23) = 3.10, se = 0.05, P < 0.01, and t(20) = 2.67, se = 0.07, P = 0.02, respectively). Consistent with our hypothesis, the similarity of the test items to the study items predicted endorsement rates in the Pre-exposed condition but not in the Control condition (t(23) = 3.05, se = 0.03, P < 0.01, and t(20) = 0.99, se = 0.03, P = 0.34, respectively).

[2] Alpha was set to α = 0.05 for all tests.
[3] Degrees of freedom have been adjusted using the Greenhouse-Geisser correction in cases where the assumption of sphericity is violated.

Fig. 2 Mean endorsement rates by condition in Experiment 1. Error bars are standard errors of the mean. P prototype items, LD low distortion items, HD high distortion items, NM category non-members

Awareness

The question of whether categorization is implicit, as suggested by the prototype model, was addressed using the zero-correlation criterion (Dienes et al. 1995). 
If the knowledge that participants use to make their decisions is available to awareness (i.e., explicit), then they should be more confident in correct than incorrect decisions. This is best expressed as a difference score. Since the confidence ratings ranged from 50 to 100%, the resulting difference score ranges from 0 to 50. Scores that do not differ from zero, in cases where there is a clear behavioral effect, indicate implicit knowledge. Scores that differ reliably from zero reflect explicit knowledge. The mean confidence in both endorsements and rejections, and resulting difference scores are shown in Table 2. The difference scores in

the pre-exposed condition, computed over all test items, were greater than zero (t(23) = 2.04, P = 0.05). In the control condition, the difference scores were also reliably greater than zero (t(20) = 2.12, P < 0.05). There were no reliable differences in the difference scores between the two conditions (t < 1). The conclusion from this analysis is that the knowledge used to categorize the test items was, even in the case of the untrained controls, available to awareness. Although this result does not necessarily confirm that the participants have fully explicit structural knowledge, it does imply that they have explicit judgement knowledge (Scott and Dienes 2010).

Fig. 3 Mean standardized regression coefficients for the observed similarity ratings used to predict endorsement rates. a refers to categorization in Experiment 1, b refers to recognition in Experiment 2, c refers to categorization and recognition in Experiment 4

Experiment 2

Evidence that categorization can proceed independently of episodic memory comes from task and functional dissociations between categorization of new items on the one hand and recognition of old items on the other (e.g., Reber et al. 2003; Squire and Knowlton 1995). In other tasks, notably artificial grammar learning, these dissociations have been criticized as unfairly advantaging abstraction models (Kinder and Shanks 2003; Kinder et al. 2003). Experiment 2 is similar to Experiment 1 with the exception that instead of making decisions about category membership, the participants were asked whether they recognized the test items. If the high distortions used in the study and test periods are similar to one another, then we should see a higher endorsement rate for these items in the pre-exposed group than in the control group. Alternatively, participants, when asked whether they recognize study items, could in principle show a prototype enhancement effect. 
This pattern would question the validity of the abstraction model because recognition judgements should not be affected by the similarity of the test items to the prototype. This experiment provides a fairer test of the dissociation between recognition and categorization than those reported previously because the same set of items is used in each case.

Table 2 Mean confidence ratings for endorsements and rejections, and resulting difference scores for each experiment. Cells show M (se)

                                Experiment 1                  Experiment 2                  Experiment 3                  Experiment 4
Item              Response      Control       Pre-exposed     Control       Pre-exposed     Control       Pre-exposed     Recognition   Categorization
Prototype         Endorsements  68.16 (3.38)  68.12 (2.79)    68.93 (2.46)  73.45 (2.92)    68.05 (3.45)  71.00 (2.72)    71.38 (2.89)  82.02 (3.21)
                  Rejections    62.01 (4.31)  64.03 (2.52)    65.06 (2.72)  63.30 (3.35)    60.04 (2.84)  61.56 (2.93)    70.99 (5.01)  68.11 (7.29)
Low distortions   Endorsements  67.22 (2.40)  69.47 (1.80)    65.49 (1.88)  70.03 (1.87)    69.08 (3.24)  73.54 (2.67)    69.64 (1.86)  78.55 (2.16)
                  Rejections    64.24 (1.61)  64.69 (1.69)    61.60 (1.31)  67.87 (1.78)    66.38 (4.07)  64.86 (1.73)    66.45 (2.28)  69.23 (2.74)
High distortions  Endorsements  66.31 (2.67)  68.34 (1.82)    62.68 (1.55)  68.40 (1.17)    63.56 (1.77)  74.38 (3.99)    66.49 (1.70)  72.95 (1.80)
                  Rejections    65.58 (2.07)  66.22 (1.81)    60.13 (1.26)  66.21 (1.55)    63.78 (2.05)  66.35 (1.69)    65.12 (1.91)  68.86 (2.57)
Non-members       Endorsements  64.52 (2.32)  67.28 (1.54)    62.42 (1.51)  66.74 (1.68)    63.24 (1.68)  66.81 (1.55)    65.37 (2.08)  70.78 (2.12)
                  Rejections    67.74 (2.26)  68.69 (1.69)    62.55 (1.59)  73.76 (1.91)    67.64 (2.51)  73.89 (1.55)    72.82 (2.27)  75.71 (3.05)
Difference score                2.45 (1.16)   2.50 (1.23)     3.09 (1.03)   4.74 (1.20)     2.85 (1.55)   7.54 (1.22)     3.80 (1.35)   7.53 (1.64)

Method

Participants

Forty-eight members of the University of Nottingham community took part in this study in return for a payment of 7.00 (approx 8.80). Eighteen were men and thirty were women. Their mean age was 21.88 years (sd = 3.30).

Stimuli

These were the same as in Experiment 1.

Design and procedure

These were the same as in Experiment 1 with the exception that participants were not informed that the exemplars belonged to a category and, instead of making categorization decisions, participants were asked whether they recognized the items from the study period.

Results

The mean endorsement rates for each item type for each condition are shown in Fig. 4. All the participants met the criterion for inclusion (i.e., none made more than 95% stereotyped responses). These data were entered into a split-plot ANOVA with Exposure (Pre-exposed vs. Control) as the between-subjects factor and Item (Prototype, Low distortion, High distortion, and Non-members) as the within-subjects factor. There was a main effect of Item (F(1.84, 84.53) = 62.47, MSE = 0.04, P < 0.01, ηp² = 0.58), but no effect of Exposure (F(1, 46) = 0.05, MSE = 0.06, P = 0.82, ηp² < 0.01). There was, however, a reliable interaction between the two (F(1.84, 84.53) = 7.21, MSE = 0.04, P < 0.01, ηp² = 0.14). As in Experiment 1, the linear contrast across all levels of Item was reliable and indicates a prototype enhancement effect (F(1, 46) = 104.05, MSE = 0.04, P < 0.01, ηp² = 0.69). The linear interaction between Item and Exposure was also reliable (F(1, 46) = 8.85, MSE = 0.04, P < 0.01, ηp² = 0.16). Planned comparisons revealed no reliable differences in endorsements between conditions for the prototype items (t(46) = 1.27, se = 0.07, P = 0.21), the low distortions (t(46) = 0.26, se = 0.05, P = 0.80), or the high distortions (t(46) = 0.99, se = 0.04, P = 0.33). However, the Pre-exposed condition rejected reliably more non-members than the Controls (t(46) = 4.51, se = 0.04, P < 0.01). 
Regression analyses

At first glance, this pattern of results appears to be inconsistent with the exemplar model since that would predict a preference for high distortions in the pre-exposed condition. However, this follows only if one assumes that the high distortions are more similar to the study items than the other items. In fact, the similarity ratings of the high distortions to the study items are lower than those of the low distortions. The best evidence in favor of the exemplar model would be obtained if the similarity of the test items to the study items predicts endorsements in the Pre-exposed condition but not in the Control condition. The mean standardized regression coefficients are shown in Fig. 3 panel B. The similarity of the test items to the study items reliably predicted endorsements in the Pre-exposed condition (t(23) = 6.00, se = 0.03, P < 0.01) but not in the Control condition (t(23) = 0.06, se = 0.03, P = 0.95). The similarity of the test items to the prototype reliably predicted endorsements in both the Pre-exposed condition and the Controls (t(23) = 6.39, se = 0.04, P < 0.01, and t(23) = 4.36, se = 0.04, P < 0.01, respectively). Moreover, the similarity of the test items to the study items was a better predictor of endorsements in the Pre-exposed condition than in the Controls (t(46) = 4.44, se = 0.04, P < 0.01). The similarity of the test items to the prototype was no better at predicting endorsements in the Pre-exposed than in the Control condition (t(46) = 1.50, se = 0.05, P = 0.14).

Fig. 4 Mean endorsement rates by condition in Experiment 2. Error bars are standard errors of the mean. P prototype items, LD low distortion items, HD high distortion items, NM category non-members

Awareness

The mean confidence in both endorsements and rejections, and resulting difference scores are shown in Table 2.
The mean difference computed over all test items was reliably different from zero in the pre-exposed condition (t(23) = 3.94, P < 0.01), and, independently, when computed over the non-members and high distortions alone (mean = 4.66, se = 0.69, t(23) = 6.74, P < 0.01). The difference scores for the untrained controls also differed reliably from zero when computed over all items (t(23) = 3.01, P < 0.01), and over the high distortions and non-members (mean = 1.34, se = 0.63, t(23) = 2.12, P < 0.05). Between conditions, the difference scores were reliably higher in the pre-exposed condition for the high distortion items (t(46) = 3.35, P < 0.01), but not when computed over all the items (t(46) = 1.05, P > 0.05).

Comparisons with Experiment 1

Comparisons of the endorsement behavior of the Pre-exposed groups in Experiments 1 and 2 show no difference in endorsements to the prototype items (t(46) = 0.67, se = 0.08, P = 0.51) or the low distortions (t(46) = 1.00, se = 0.05, P = 0.32). However, participants in Experiment 1 who had been instructed to categorize the test items endorsed more high distortions than those in Experiment 2 who were instructed to make recognition judgements (t(46) = 2.05, se = 0.04, P < 0.05). By contrast, recognition instructions resulted in more rejections and fewer endorsements to category non-members than categorization instructions (t(46) = 3.01, se = 0.05, P < 0.01). Computational simulations of dissociations between categorization and recognition in other paradigms suggest that the two decisions differ not in the underlying processes recruited but in the similarities of the items and response bias. That is, recognition decisions have a stricter criterion than categorization. Indeed, following pre-exposure, recognition instructions resulted in fewer endorsements overall than categorization instructions (mean = 0.45, se = 0.03, and mean = 0.55, se = 0.03, respectively: t(46) = 2.88, se = 0.04, P < 0.01).
Thus, it would seem that the reason why participants in earlier studies did not endorse or categorize high distortions is not that the underlying process is one of abstraction rather than memory, but that the high distortions are no more similar to the study items than are the low distortions. The prototype enhancement effect observed in these experiments does not appear to be a product of abstraction at study, but may be the result of transfer at test (Palmeri and Flanery 1999).

Experiment 3

Experiment 3 tests the hypothesis that the reason that participants failed to endorse the high distortions as old in Experiment 2 is that they are not sufficiently familiar or similar to the study items. To test this, we simply changed the study items so that they are the same as the high distortion test items. This in effect makes the high distortion test items actually old, so that recognition judgements would be correct.

Method

Participants

Forty-eight members of the University of Nottingham community took part in this study in return for a payment of 7.00 (approx 8.80). Twenty were men and twenty-eight were women. Their mean age was 21.83 years (sd = 4.74).

Stimuli

These were the same as in Experiments 1 and 2 except that a random selection of half the study items used in Experiment 1 were replaced with the high distortions used in the test. That is, in Experiments 2 and 3 participants made recognition decisions about the same items, but in Experiment 3, the high distortions had actually appeared during the study period.

Design and procedure

These were the same as in Experiment 2.

Results

The mean endorsements for each item are shown for each condition in Fig. 5. Three participants were excluded from the Control condition for making more than 95% stereotyped responses. The remaining data were entered into a split-plot ANOVA with Item as the within-subjects factor and Exposure as the between-subjects factor.
There was a main effect of Item (F(2.02, 86.88) = 30.53, MSE = 0.04, P < 0.01, ηp² = 0.42), but no main effect of Exposure (F(1, 43) = 1.00, MSE = 0.06, P = 0.32, ηp² = 0.02), indicating no difference in the overall proportion of endorsements between conditions. There was a reliable interaction between the two (F(2.02, 86.88) = 6.41, MSE = 0.04, P < 0.01, ηp² = 0.13). The linear contrast across items was reliable, indicating a prototype enhancement effect (F(1, 43) = 43.68, MSE = 0.06, P < 0.01, ηp² = 0.50), and there was a reliable linear interaction between Item and Exposure, indicating that the effect differed between conditions (F(1, 43) = 4.00, MSE = 0.06, P < 0.05, ηp² = 0.09). Planned comparisons revealed no differences between conditions in endorsements to either the prototype or low distortions (t(43) = 0.91, se = 0.09, P = 0.37, and t(43) = 1.30, se = 0.05, P = 0.20, respectively). However, participants in the pre-exposed condition endorsed reliably more old high distortions than controls (t(43) = 3.72, se = 0.04, P < 0.01), and reliably fewer non-members (t(43) = 4.20, se = 0.04, P < 0.01). This is consistent with the episodic model.

Fig. 5 Mean endorsement rates by condition in Experiment 3. Error bars are standard errors of the mean. P prototype items, LD low distortion items, HD old high distortion items, NM category non-members

Awareness

The mean confidence in both endorsements and rejections, and resulting difference scores are shown in Table 2. The difference scores were reliably greater than zero in the pre-exposed condition (all items, t(23) = 6.18, P < 0.01, and high distortions mean = 7.55, se = 1.67, t(23) = 4.52, P < 0.01). The control condition had difference scores greater than zero for the high distortions (mean = 2.13, se = 0.67, t(20) = 3.10, P < 0.01), but not over all items (t(20) = 1.83, P > 0.05). The difference scores were reliably higher in the pre-exposed condition when computed over all the test items (t(43) = 2.41, P < 0.05), and the high distortions (t(43) = 2.88, P < 0.01).

Experiment 4

A series of studies have reported that different brain regions are involved in the categorization and recognition of prototype distortions (Aizenstein et al. 2000; Little et al. 2006; Little and Thurlborn 2006; Reber et al. 2003; Reber et al. 1998a, b; Vogels et al. 2002). In general, these studies a priori favor the prototype model, largely it seems on the assumption that categorization is preserved in amnesic patients. Of particular relevance here are the reports of direct comparisons between recognition and categorization. For instance, Reber et al. (1998a) observed that when participants were asked to make categorization decisions there was decreased activity in the posterior occipital cortex. By contrast, when participants were given recognition instructions, stimuli that had previously been seen in the study list (i.e., old items) were associated with increased activity in the posterior occipital cortex (P. J. Reber et al. 1998a). Both Aizenstein et al. (2000) and Reber et al.
(2003) report differences in cortical activity between participants who acquired their category knowledge incidentally (and by inference implicitly) and those who intentionally attempted to learn the category. In short, Reber et al. observed that incidental learning resulted in decreased occipital activity (for category members); by contrast, recognition resulted in increased activity in areas that included the hippocampus. Despite the involvement of different regions, the two encoding manipulations actually resulted in behavioral categorization performance that was qualitatively similar. Indeed, a recent study has argued that these differences reflect the instructions given to participants at the time of study rather than differences that arise from separate cognitive processes (Gureckis et al. 2010).

Experiment 4 was designed to test directly whether instructions to learn the category structure from study items result in an explicit representation of the prototype. A second group of participants were instructed to memorize the study items and were given a recognition test for the same set of items. The prototype model predicts a robust prototype enhancement effect (i.e., linear effect of item) in the Categorization condition that should, in the regression analyses, be strongly related to the observed similarity of the test items to the prototype. In the Recognition condition, there should be a reduced prototype enhancement effect (i.e., a condition by item linear interaction), and endorsements should be relatively more closely related to the observed similarity of the test items to the study exemplars. On the other hand, if instructions to learn the category structure do not result in abstraction, and participants instead memorize the study items by default, and if the decision processes behind categorization and recognition are the same, we might instead see an overall difference in bias, but not in qualitative behavior.
That is, participants in the memory condition may, as predicted by the episodic model, merely be more reluctant to endorse items as old than participants in the categorization condition are to endorse items as category members.

Method

Participants

Forty-eight members of the University of Nottingham community took part in this study in return for a payment of 7.00 (approx 8.80). Twenty-three were men and twenty-five were women. Their mean age was 23.40 years (sd = 5.13).

Stimuli

These were the same as in Experiments 1 and 2.

Design and procedure

These were the same as in Experiment 1, with the exception that in the Recognition condition the participants were instructed to memorize the study items and at test to make recognition decisions. In the Categorization condition, the participants were instructed to learn the category structure from the study items and subsequently made categorization decisions at test.

Results

The mean endorsement rates for each item type and each condition are shown in Fig. 6. These data were entered into a split-plot ANOVA with Item as the within-subjects factor and condition as the between-subjects factor. There was a main effect of Item (F(2.24, 100.60) = 58.97, MSE = 0.05, P < 0.01, ηp² = 0.57). There was a main effect of condition (F(1, 45) = 5.18, MSE = 0.06, P < 0.05, ηp² = 0.10), but no interaction between the two (F(2.24, 100.60) = 0.41, MSE = 0.05, P > 0.05, ηp² < 0.01). The linear contrast over Items was reliable (F(1, 45) = 88.62, MSE = 0.06, P < 0.01, ηp² = 0.66) but the linear contrast interaction between conditions and items was not (F(1, 45) = 0.14, MSE = 0.06, P > 0.05, ηp² < 0.01). Overall, participants in the Categorization condition made more endorsements than participants in the Recognition condition (t(45) = 2.31, P < 0.05). We interpret these data in terms of a more liberal response criterion following categorization instructions than recognition instructions, but see no difference in terms of the underlying representations. Indeed, contrary to what one might expect from a prototype representation, participants in the Categorization condition were more likely to endorse the high distortions as category members than participants in the Recognition condition were to endorse them as old (t(45) = 2.54, P < 0.05). There were no reliable differences in endorsements to the other items (Prototype, t(45) = 1.02, P > 0.05; Low distortions, t(45) = 1.60, P > 0.05; Non-members, t(45) = 0.67, P > 0.05).
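The idea that two conditions can share the same underlying sensitivity while differing only in response criterion can be made concrete with standard equal-variance signal detection indices. The sketch below is illustrative only: the paper does not report these indices, and the function name and the hit and false-alarm rates are hypothetical. Here a "hit" is assumed to be an endorsement of a category member (or old item) and a "false alarm" an endorsement of a non-member.

```python
from statistics import NormalDist

def sdt_indices(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Both rates must lie strictly between 0 and 1.
    A higher criterion means a stricter (more conservative) response rule.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion

# Hypothetical rates: same sensitivity, different willingness to endorse.
d_cat, c_cat = sdt_indices(0.70, 0.40)  # categorization instructions
d_rec, c_rec = sdt_indices(0.60, 0.30)  # recognition instructions

print(f"categorization: d' = {d_cat:.2f}, c = {c_cat:.2f}")
print(f"recognition:    d' = {d_rec:.2f}, c = {c_rec:.2f}")
```

In this hypothetical case the two conditions have equal sensitivity, and only the criterion differs, with the recognition condition being the stricter of the two. That is the qualitative pattern the interpretation above describes.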
Regression analyses

The mean standardized regression coefficients are shown in Fig. 3 Panel C. In the Categorization condition, both similarity to the study exemplars and similarity to the prototype were reliable predictors of endorsements (t(22) = 4.98, P < 0.01, and t(22) = 5.17, P < 0.01, respectively). There was no difference between the two (t(22) = 1.64, P > 0.05). The same pattern was obtained in the Recognition condition (similarity to the study items, t(23) = 4.65, P < 0.01, and similarity to the prototype, t(22) = 6.01, P < 0.01), with no reliable difference between the two (t(23) = 1.97, P > 0.05). Individual regression analyses revealed no difference in slopes between the two conditions for either the prototype or the exemplar similarity ratings (both t values < 1).

Awareness

The mean difference scores in each condition are shown in Table 2. Difference scores were reliably greater than zero in both conditions, suggesting that knowledge was explicit (Categorization: t(22) = 4.58, P < 0.01; Recognition: t(23) = 2.82, P = 0.01). There was no reliable difference between the two conditions (t(45) = 1.76, P > 0.05).

Fig. 6 Mean endorsement rates by condition in Experiment 4. Error bars are standard errors of the mean. P prototype items, LD low distortion items, HD old high distortion items, NM category non-members

Comparisons with Experiment 1

Are there any differences in behavior, representation, and awareness between participants who in Experiment 4 were asked to intentionally learn the category structure and those who in Experiment 1 did so incidentally? A cross-experiment comparison of the endorsement rates for each item type shows no differences for the prototype (t(45) < 1.0), the low distortions (t(45) = 1.09, P > 0.05), or the high distortions (t(45) = 1.78, P > 0.05). Intentionally trained participants in Experiment 4 endorsed fewer category non-members than incidentally trained participants in Experiment 1.
Overall, behavior in the two conditions would seem to be qualitatively similar. There were no reliable differences between the regression coefficients for the two similarity ratings (similarity to prototype, t(45) = 1.41, P > 0.05; similarity to exemplars, t(45) = 1.38, P > 0.05). We conclude from this that the same sort of underlying representation of the category structure results from both intentional and incidental learning. Finally, do the study instructions result in different levels of awareness? The difference scores for confidence ratings did indeed differ: the difference score obtained in intentionally trained participants was higher than in incidentally trained participants (t(45) = 2.46, P < 0.05). Note, however, that the difference score obtained in Experiment 1 was greater than zero, indicating that this knowledge was in any case explicit.

General discussion

The experiments reported here were designed to discriminate between three models of categorization in the classic Posner and Keele (1968) prototype distortion task. To discriminate between the prototype and episodic models, we first obtained two sets of similarity ratings for each test item. One set measured the similarity of the test items to the prototype and the other set measured the similarity of the test items to the study exemplars. These ratings were then used to predict the endorsement behavior of participants in subsequent experiments in order to determine whether category learning in this task is based on prototype abstraction or memory for the study items. We also used untrained control groups to examine the extent to which transfer at test could account for the prototype enhancement effect.

In Experiment 1, the categorization behavior of pre-exposed participants and untrained controls was compared directly in order to test the transfer at test hypothesis (Palmeri and Flanery 1999; Zaki and Nosofsky 2004). Both groups exhibited a similar prototype enhancement effect, which in the case of the untrained controls cannot be the result of abstraction during the study period. However, there was a benefit from pre-exposure in the sense that participants in this condition were more likely to endorse the high distortions as category members relative to the untrained controls.
Regression analyses revealed that categorization in both conditions was reliably predicted by the similarity ratings of the test items to the prototype. This, in the case of the untrained controls at least, is supportive of the transfer at test model. However, in the pre-exposed condition categorization was also related to the similarity ratings of the test items to the study items. The conclusion from these data is that the prototype enhancement effect is the result of a combination of learning at test and categorization based on some episodic knowledge of the study items. Moreover, this knowledge was to some extent explicit in the sense that the participants were more confident when they were correct than when they were not. Although this implies a degree of awareness of judgement knowledge, it does not necessarily imply that the structural knowledge was fully conscious (cf. Scott and Dienes 2010). We conclude from Experiment 1 that the representation of category structure derived from study exemplars involves explicit episodic memory.

Experiments 2 and 3 examined whether instructions to recognize study items at test would eliminate the prototype enhancement effect. We ensured that the recognition test was identical to the categorization test used in Experiment 1 in all respects other than the instructions. If we assume that the high distortions used at test are similar to their counterparts used as study exemplars, then we might predict that participants should be more willing to accept high distortions as old. However, recognition instructions resulted in behavior that appears qualitatively similar to categorization. When, in Experiment 3, the high distortions used at test were the same as the study items, and all other aspects of the design were identical, participants' behavior was consistent with an episodic model. One previous study has intentionally manipulated the familiarity of the high distortion test items (Zaki and Nosofsky 2007).
This study also reported a preference for high distortion items that was also accounted for by a simple episodic plus transfer at test model.

Experiment 4 was designed to produce conditions that would result in different representations if recognition and categorization were truly dissociable processes. If prototype abstraction occurs at all, then we would expect that instructions to learn the category followed by instructions to categorize items would result in behavior qualitatively different from that of participants who are instructed to memorize the study items and who subsequently make recognition decisions. On the other hand, if categorization and recognition are predicated on the same representations and processes, and differ only in response criterion, then the only difference should be one of bias or willingness to accept items. We observed the latter pattern of results. We conclude that even when participants are instructed to learn the category, their strategy is to memorize the study items rather than to abstract the prototype. Recognition and categorization decisions differ only in response criterion.

It is not the case that we have eliminated prototype abstraction as a candidate model of categorization. In each experiment, we observed that the similarity of the test items to the prototype was a reliable predictor of endorsements. However, this was observed even when the participants were instructed to make recognition decisions, and when participants received no pre-exposure. So any prototype abstraction that did occur most likely happened during the test and not the study period. Transfer at test does not, however, explain all of the results obtained here. In each experiment, the similarity of the test items to the study items is a reliable predictor of endorsements even