Necessity, possibility and belief: A study of syllogistic reasoning


THE QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2001, 54A (3), 935-958

Necessity, possibility and belief: A study of syllogistic reasoning

Jonathan St. B.T. Evans, Simon J. Handley, and Catherine N.J. Harper
University of Plymouth, Plymouth, UK

The present study extended the investigation of the belief bias effect in syllogistic reasoning in two ways: (1) The effect was studied under instructions to decide whether conclusions were possible, as well as necessary, given the premises; and (2) the effect was studied for types of syllogism where people rarely endorse the conclusions as well as those (valid and fallacious) where endorsements are common. Three experiments are reported, which show first that there is a marked tendency to reject unbelievable conclusions relative to abstract or neutral controls on all kinds of syllogism and under both types of instruction. There was also significant evidence of positive belief bias (increased acceptance of believable conclusions) and of interactions between belief bias effects and logical form. The results are discussed with particular respect to accounts of belief bias offered by theorists in the mental-model tradition.

Requests for reprints should be sent to Jonathan St. B.T. Evans, Centre for Thinking and Language, Department of Psychology, University of Plymouth, Plymouth PL4 8AA, UK. Email: J.Evans@plym.ac.uk

The research reported in this paper was supported by a research grant from the Economic and Social Research Council of the United Kingdom (R000221742).

© 2001 The Experimental Psychology Society
http://www.tandf.co.uk/journals/pp/02724987.html  DOI: 10.1080/02724980042000417

The study of human reasoning is a topic of central interest to cognitive psychology. Deductive reasoning is the process by which we bring together different pieces of knowledge and infer conclusions that were only latent or implicit in what we already believed. Without deduction, we would have to store massive amounts of redundant information. With it, we can store our beliefs in the form of rules and generalizations, whose consequences can be inferred when required in a given context. It is not surprising, then, that both philosophers and psychologists have been concerned by the mass of evidence of logical error and bias that has been accumulated in psychological experiments on deduction (see Evans, Newstead, & Byrne, 1993; Manktelow, 1999, for recent reviews of this field).

As a result, there has been considerable debate about the implications of such research for human rationality (see Evans & Over, 1996; Stanovich, 1999). Two recurring themes in the debate about rationality and reasoning are the following: (1) Does formal logic provide an adequate normative model against which to assess the accuracy of reasoning? And (2) do psychological experiments provide an externally valid measure of real-world reasoning ability? For example, psychological experiments may have underestimated people's everyday reasoning abilities by expecting participants to be able to respond to instructions that require them to disregard prior beliefs and reason only on the basis of the information presented. If the adaptive mechanisms that underlie effective everyday reasoning are to some extent implicit and beyond conscious control, the ability of participants to respond to such laboratory experiments will be highly constrained (see Evans & Over, 1996, for extended discussion of this problem).

A good example of the problem is illustrated by the so-called belief bias effect in syllogistic reasoning (see Evans, Newstead, & Byrne, 1993, chap. 8, for a review of the relevant literature). This effect, which has been known for many years, is usually described as a tendency for people to produce or endorse conclusions that they believe to be true, regardless of their logical validity. The effect is strikingly demonstrated in studies that require people to decide whether a presented conclusion follows from some premises (the syllogistic evaluation task; e.g., Evans, Barston, & Pollard, 1983; Newstead, Pollard, Evans, & Allen, 1992), but has also been shown to influence responding on tasks where people are given some premises and asked to generate their own conclusions (the production task; e.g., Cherubini, Garnham, Oakhill, & Morley, 1998; Oakhill, Johnson-Laird, & Garnham, 1989). The classic pattern of findings is well illustrated by the evaluation task data of Evans et al. (1983). Before we examine their data, a brief introduction to the form and logic of syllogisms is required.

A syllogism consists of two categorical premises followed by a conclusion. Premises and conclusions can each have one of four moods, known as A, E, I, O:

A  All A are B         Universal affirmative
E  No A are B          Universal negative
I  Some A are B        Particular affirmative
O  Some A are not B    Particular negative

The validity of a syllogism depends both upon the mood of the premises and conclusion and upon the figure.
Figure refers to the order in which terms are arranged. In this paper we follow the convention of referring to the three terms in the syllogisms as A, B, and C, where A is linked to B in the first premise, B to C in the second, and A to C in the conclusion. Classically, syllogistic figure is described by the arrangement of terms in the premises and given conclusion. However, we follow here the convention of Johnson-Laird and Bara (1984) in describing four figures of premise arrangement as follows:

Figure 1: AB-BC    Figure 2: BA-CB    Figure 3: AB-CB    Figure 4: BA-BC

Each of these can be paired with a conclusion in the form AC or CA, making eight combinations. As there are 64 possible moods of a syllogism (a choice of 4 for each premise and conclusion), there are thus 512 possible syllogisms counting both orders of premises (256 logically distinct). The great majority of these are invalid. For example, the following syllogism combines Figure 3 premises with an AC conclusion and the mood OAO:

Some A are not B
All C are B
Therefore, some A are not C

This syllogism is valid. However, if we change the conclusion mood to I (Some A are C), it becomes a fallacy. Research on syllogistic reasoning has shown that reasoning can be biased both by the mood of the three statements and by the figure in which the syllogism is arranged (see Evans, Newstead, & Byrne, 1993, for detailed review). These factors were controlled in the syllogisms used by Evans et al. (1983), as shown in Table 1. There were four categories of syllogism: valid believable, valid unbelievable, invalid believable, or invalid unbelievable. If we examine the four examples shown, we see that they all have the same mood (EIO) and the same figure of premises (Figure 3: AB-CB) in all cases. However, conclusion order had to be changed to produce both valid (CA) and invalid (AC) conclusions. This is not important, as Figure 3 premises are known not to cause biases. The so-called figural bias (Johnson-Laird & Bara, 1984) consists of a preference for AC conclusions with Figure 1 (AB-BC) and for CA conclusions with Figure 2 (BA-CB).

The thematic terms have been entered in such a way as to make one of the valid syllogisms yield a believable conclusion and one an unbelievable conclusion. A similar method yields both believable and unbelievable conclusions for invalid syllogisms. The rates of endorsement of syllogisms of each type shown in Table 1 are accumulated across three experiments (n = 120). Three clear findings emerged in this study: (1) People endorsed more valid than invalid conclusions; (2) they endorsed more believable than unbelievable conclusions (the belief bias effect); (3) the extent of the belief bias was more marked on invalid than on valid syllogisms. All three effects were highly statistically significant. All have been replicated in several subsequent studies using evaluation task methodology (e.g., Evans, Allen, Newstead, & Pollard, 1994; Lambell, 1998; Newstead et al., 1992).
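The counting argument and the validity claims above can be checked mechanically. The sketch below is our own Python illustration, not part of the original study; the function names are invented, and the search is restricted to finite models of a few individuals, which is enough here because an invalid categorical syllogism always admits a small counter-example.

```python
from itertools import product

# Set-based semantics for the four syllogistic moods (A, E, I, O).
def holds(mood, xs, ys):
    if mood == 'A':                  # All X are Y
        return xs <= ys
    if mood == 'E':                  # No X are Y
        return not (xs & ys)
    if mood == 'I':                  # Some X are Y
        return bool(xs & ys)
    if mood == 'O':                  # Some X are not Y
        return bool(xs - ys)
    raise ValueError(mood)

def valid_figure3(p1, p2, conc, n=4):
    """Check a Figure 3 (AB-CB) syllogism with an AC conclusion by
    searching every assignment of n individuals to the categories
    A, B, C for a counter-example (premises true, conclusion false)."""
    for world in product(range(8), repeat=n):
        A = {i for i, w in enumerate(world) if w & 1}
        B = {i for i, w in enumerate(world) if w & 2}
        C = {i for i, w in enumerate(world) if w & 4}
        if holds(p1, A, B) and holds(p2, C, B) and not holds(conc, A, C):
            return False             # counter-example found
    return True                      # no counter-example in any small model

# Some A are not B / All C are B / Therefore, some A are not C
print(valid_figure3('O', 'A', 'O'))   # True: the OAO syllogism is valid
# Changing the conclusion mood to I yields a fallacy
print(valid_figure3('O', 'A', 'I'))   # False
# 4 moods for each premise and the conclusion, 4 figures, 2 conclusion
# orders: 4**3 * 4 * 2 == 512 syllogisms in all
```

The same brute-force check distinguishes any valid form from any fallacy discussed below; only the premise-figure wiring changes.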
TABLE 1
Examples of the four kinds of syllogism used by Evans, Barston, and Pollard (1983), together with the overall acceptance rate for each type

Valid believable (89% accepted):
  No police dogs are vicious
  Some highly trained dogs are vicious
  Therefore, some highly trained dogs are not police dogs

Valid unbelievable (56% accepted):
  No nutritional things are inexpensive
  Some vitamin tablets are inexpensive
  Therefore, some vitamin tablets are not nutritional

Invalid believable (71% accepted):
  No addictive things are inexpensive
  Some cigarettes are inexpensive
  Therefore, some addictive things are not cigarettes

Invalid unbelievable (10% accepted):
  No millionaires are hard workers
  Some rich people are hard workers
  Therefore, some millionaires are not rich people

There are two reasons why the belief bias effect may not be as irrational as it at first appears. First, there is an a priori argument (see Evans & Over, 1996; Evans, Over, & Manktelow, 1993) that in real life it is adaptive to reason from all relevant belief. That is, given some information, it is effective for the individual to retrieve from memory associated and relevant knowledge and add it to the given premises. It is also arguably adaptive to disregard premises that are unbelievable given one's background beliefs, and recent studies have shown that people do this. For example, straightforward conditional inferences may be withheld if people disbelieve

the conditional premise (George, 1995; Stevenson & Over, 1995). The combination of these effects could certainly explain why unbelievable conclusions lead to lower inference rates. For example, if the unbelievable conclusion is based on a valid argument, then at least one premise or combination of premises will itself be unbelievable.

There is also a case for arguing that belief bias is not really a bias at all, but actually a debias. This is based upon the observation that when syllogistic arguments that are abstract or neutral in nature are presented to people, very high rates of endorsement of fallacious arguments are usually observed. A fallacy is an argument whose conclusion could be true given the premises, but does not need to be true, such as the invalid arguments shown in Table 1. Most studies of belief bias have not included control conditions with belief-neutral conclusions. Those that have (e.g., Evans & Pollard, 1990; Newstead et al., 1992) have shown that belief bias is primarily a negative effect. That is to say, people do not draw more conclusions when they are believable; rather, they draw fewer conclusions when they are unbelievable.

If we reconsider the data of Evans et al. (1983) shown in Table 1, we note that there was a significant belief by logic interaction. This reflects the fact that the main influence of belief is to suppress fallacies. Invalid conclusions were accepted 71% of the time when believable, but only 10% of the time when unbelievable. It is true that there was a belief bias on valid arguments as well in this study, but it was significantly weaker and has generally been observed to be small or absent in replication studies. Why should people have a strong tendency to draw fallacious conclusions unless these are unbelievable?
Before answering this question directly, we need to consider what we know about the mental processes involved in syllogistic reasoning, and in particular to discuss in some detail a recent study that provides the rationale for several of the manipulations in the current study. First, Evans and Over (1996) have argued that our habitual method of reasoning is inductive, rather than deductive, and that we tend to focus on a single model of the world in hypothetical reasoning. This tendency persists in the laboratory even when instructions to reason deductively are given. This argument is strongly supported by a recent study of syllogistic reasoning reported by Evans, Handley, Harper, and Johnson-Laird (1999). This paper contains a very large syllogistic reasoning experiment (their Experiment 2) in which participants were asked to evaluate four conclusions for each of the 64 possible premise pairs. All problems used arbitrary problem content, so that prior beliefs were not involved in this experiment. One group evaluated conclusions in AC order and the other in CA order, so that all 512 syllogisms were evaluated. Moreover, a further two groups were constructed according to the type of instruction. Some participants received instructions to make judgements of logical necessity; that is, they were required to endorse conclusions that must be true if the premises were true. The others were asked to make judgements of possibility; that is, to endorse conclusions that could be true if the premises were true.

The experiment was designed to test several predictions derived from the mental-model theory of reasoning (Johnson-Laird & Byrne, 1991). These predictions were confirmed but are not relevant for our present purposes. Of particular interest here is an unexpected finding that emerged as a result of asking participants to evaluate all possible syllogisms (to our knowledge, this is the first study to have done this).
We discovered that some fallacies are consistently endorsed. We termed these possible strong problems, as their conclusions are possible (but not necessary) given the premises, and there is a strong tendency for people to endorse them. However, we also discovered a set of potential fallacies that were very rarely made, which we termed possible weak. In fact, possible strong problems

were endorsed almost as often as necessary (valid) problems, and possible weak almost as infrequently as impossible problems, those where the conclusion must be false if the premises are true. Consider the following syllogism:

All B are A
Some C are B
Therefore, some C are not A

The majority of participants asked to assess the validity of this syllogism say that the conclusion necessarily follows from the premises. In fact, it is a fallacy and hence a syllogism of the type we call possible strong. The same premises can be paired with an alternative conclusion: Therefore, all C are A. This too is a potential fallacy: It could be true given the premises. However, few reasoners endorse this conclusion. Hence, it is of the type we call possible weak.

This finding has an important implication for the mental-model theory of syllogistic reasoning (see Johnson-Laird & Bara, 1984; Johnson-Laird & Byrne, 1991). The theory proposes that reasoning proceeds in three stages:

1. Model formation. The reasoner forms an initial model from the premises.
2. Conclusion formation. The reasoner derives a putative conclusion from the model, which is informative (e.g., not a repetition of a premise).
3. Conclusion validation. The reasoner searches for a counter-example; that is, a model in which the premises are true but the conclusion is false. If no such model is found, the conclusion is valid.

We frame our discussion here in terms of the mental-models account, following the same practice as Evans et al. (1999). In terms of this theory, the findings of Evans et al. suggest that there may normally be rather little searching for counter-examples (Stage 3) going on. All fallacies yield at least two mental models of the premises: one in which the conclusion is true and one in which it is false. Suppose that reasoners normally consider only one model of the premises and that they endorse the conclusion if and only if it is supported in this model.
In this case, possible strong problems would be those in which the model that occurs to people supports the conclusion, and possible weak those in which the model that comes to mind does not support the conclusion. Let us consider the earlier examples in more detail to illustrate this. First consider the possible strong syllogism:

All B are A
Some C are B
Therefore, some C are not A

Reasoners probably imagine a situation (mental model) of the premises which looks like this:

1    [b]   a
     [b]   a   c
           a
               c

where a, b, and c are tokens representing exemplars of the categories A, B, and C. The square brackets around b mean that it is exhausted with respect to a's. That is, it cannot occur in any model where a is absent. In this representation, Some C are B is interpreted as meaning that c's occur sometimes (but not always) when b is present and sometimes when it is absent. On this basis, the conclusion Some C are not A would follow. However, the premises are also logically consistent with other situations such as:

2    [b]   a   c
     [b]   a   c
           a   c

in which the conclusion does not follow. Now consider the same premises with the alternative (possible weak) conclusion:

All B are A
Some C are B
Therefore, all C are A

This conclusion is also possible given the premises, but is rarely endorsed by participants. The explanation for the difference between these two cases is that the premises suggest Model 1 rather than Model 2. As Model 1 comes to mind, the conclusion (PS) that it supports is endorsed. Model 2 is not generally considered, so the fallacious conclusion (PW) that it supports is generally avoided. Evans et al. (1999) report that Johnson-Laird's computational model fits with this analysis. The program produces the alternative mental models in a particular order. In nearly every case, the first model generated supports the conclusion for possible strong but not for possible weak syllogisms.

Let us now return to the case of belief bias. The invalid syllogisms used by Evans et al. (1983) were of the type that we now know to be possible strong; that is, fallacies that would normally be made with abstract content. The same is true of subsequent studies that have demonstrated similar effects, including those reported by Evans et al. (1994), Lambell (1998), and Newstead et al. (1992).
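The logical side of this classification can be made concrete. The following sketch is our own Python illustration (the function names and encoding are invented, not code from the paper): it labels a conclusion necessary, possible, or impossible by enumerating small models of the premises, and shows that the possible strong and possible weak conclusions discussed above are logically on a par, both merely possible.

```python
from itertools import product

# Set-based semantics for the four moods; statements are
# (mood, subject, predicate) triples over the terms 'A', 'B', 'C'.
def holds(mood, xs, ys):
    return {'A': xs <= ys,           # All X are Y
            'E': not (xs & ys),      # No X are Y
            'I': bool(xs & ys),      # Some X are Y
            'O': bool(xs - ys)}[mood]

def classify(premises, conclusion, n=4):
    """Classify a conclusion relative to the premises by enumerating
    every assignment of n individuals to the categories A, B, C.
    Assumes the premises are jointly satisfiable."""
    supported = refuted = False
    for world in product(range(8), repeat=n):
        ext = {'A': {i for i, w in enumerate(world) if w & 1},
               'B': {i for i, w in enumerate(world) if w & 2},
               'C': {i for i, w in enumerate(world) if w & 4}}
        if all(holds(m, ext[s], ext[p]) for m, s, p in premises):
            m, s, p = conclusion
            if holds(m, ext[s], ext[p]):
                supported = True     # a model supports the conclusion
            else:
                refuted = True       # a counter-example model exists
    if supported and not refuted:
        return 'necessary'
    if refuted and not supported:
        return 'impossible'
    return 'possible'

prem = [('A', 'B', 'A'), ('I', 'C', 'B')]   # All B are A; Some C are B
print(classify(prem, ('O', 'C', 'A')))      # the PS conclusion: possible
print(classify(prem, ('A', 'C', 'A')))      # the PW conclusion: possible
print(classify(prem, ('I', 'C', 'A')))      # Some C are A: necessary
```

Since PS and PW conclusions are indistinguishable to such a logic check, the strong/weak split must come from which model reasoners happen to construct first, which is exactly the point of the analysis above.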
Now, the explanation of belief bias originally advanced by mental-model theorists (see Oakhill & Johnson-Laird, 1985) embodied the belief that the effect is due to endorsement of believable conclusions. Hence, it was proposed that the presence of a believable conclusion would reduce the motivation to search for counter-examples, leading to a fallacy. However, as stated earlier, we now know that the belief bias demonstrated in the literature with possible strong fallacies is largely a negative effect. That is, fallacies that would normally be made are withheld when the conclusion is unbelievable. Hence, it would be more appropriate to say that people do not normally search for counter-examples (also consistent with Evans et al., 1999), but that an unbelievable conclusion may motivate them to do so. This account is lent credence by the evidence in the literature that people can search for counter-examples to syllogisms when specifically instructed to do so (see, in particular, Bucciarelli & Johnson-Laird, 1999, Experiment 3).

In the present study we extend the experiments of Evans et al. (1999) to the study of belief effects. First, we test for belief bias under instructions for possible as well as necessary inference; this has not been done previously in the literature. We term these the necessity and

possibility conditions. Second, we investigate reasoning with examples of the four syllogism types identified by Evans et al.: necessary (valid conclusion follows), possible strong, possible weak, and impossible (conclusion cannot be true if the premises are true).

It is not clear that the revised model theory account of belief bias given earlier could predict the presence of belief bias under possibility instructions for possible strong problems. To decide that a conclusion is possible requires only that one discover a model that supports the conclusion; no search for counter-examples is required. Hence, there is no basis to defeat an unbelievable conclusion supported by the model that initially comes to mind. The first model considered establishes the possibility of the conclusion regardless of any other model that may be found. For this reason, it is of considerable interest to see whether or not the belief bias that is normally observed on possible strong syllogisms is maintained under instructions to decide the possibility as opposed to the necessity of the conclusion.

The present study also includes, for the first time to our knowledge, the opportunity to test for belief bias on possible weak syllogisms, that is, fallacies that would not normally be made with abstract materials. Acceptance rates are so low that the normal negative belief bias is unlikely to be found. Might there, however, be a positive belief bias? That is, could we find that a fallacy that is normally withheld with an abstract conclusion is made when its conclusion is believable?

Evans et al. (1999) found overall that substantially more conclusions were endorsed under possibility than necessity instructions, but that this trend interacted with the logical type of the argument. The elevation of acceptance rates was particularly marked on problems whose conclusions were only possible, and in particular those classified as possible weak.
What this finding suggests is that, in contrast to the lack of search for counter-examples to disprove the necessity of conclusions on possible strong problems, there was some tendency to search for alternative models to establish the possibility of conclusions on possible weak problems. Such a trend might well be elevated when the conclusion to be proved is believable. The model theory predictions that follow from this analysis can be summarized as follows:

1. There should be a negative belief bias on possible strong problems, but only under instructions for necessity.
2. There should be a positive belief bias on possible weak problems, but only under instructions for possibility.

We should note that these predictions, and the earlier discussion of possible strong and weak fallacies, are predicated on the assumption that people reason forwards by constructing a model of the premises and attempting to derive a conclusion, and then endorsing a stated conclusion if it matches. The mental-model theory has, however, been principally tested with production tasks, where conclusions are not given and need to be derived by the reasoner, rather than evaluation tasks in which a conclusion is stated for evaluation, as in the study of Evans et al. (1999) and in the current paper. The possibility that presentation of a conclusion changes the process of reasoning is considered in detail later in the paper.

EXPERIMENT 1

In order to achieve the purposes of this research, it was necessary first to select syllogisms that fall into the four categories: necessary (N), possible strong (PS), possible weak (PW), and

impossible (I). We show the full list of selected syllogisms in Table 2. We chose a pair of premises in each of the four figures that could be combined with conclusions of all four types in the AC order. The procedure was repeated for conclusions in CA order, making a total of 32 distinct syllogisms. The classification of N and I conclusions is strictly logical: On N problems, the conclusion is necessary given the truth of the premises, and on I problems the conclusion is impossible given the truth of the premises. PS and PW problems are potential fallacies: that is, syllogisms where the conclusion could be true given the premises but need not be true. Under instructions to judge the necessity of conclusions, participants should logically reject both PS and PW problems. However, those categorized as PS are those that were frequently endorsed in the study of Evans et al. (1999) under such instructions, whereas those labelled PW were mostly rejected in that study.
The acceptance rates in the earlier study are shown in Table 2.

TABLE 2
Syllogistic forms used in Experiments 1 and 2, with acceptance rates (in percentages) from Evans et al. (1999), Experiment 2

Figure 1
  Premises: Some A are B / All B are C (both conclusion orders)
              Conclusion AC        Rate    Conclusion CA        Rate
  N           Some A are C          87     Some C are A          83
  I           No A are C            10     No C are A             7
  PS          Some A are not C      80     Some C are not A      67
  PW          All A are C            7     All C are A           13

Figure 2
  Premises: All B are A / Some C are B (both conclusion orders)
  N           Some A are C          87     Some C are A          83
  I           No A are C             7     No C are A            13
  PS          Some A are not C      73     Some C are not A      53
  PW          All A are C            3     All C are A           10

Figure 3
  Premises: Some A are not B / All C are B (AC order);
            All A are B / Some C are not B (CA order)
  N           Some A are not C      90     Some C are not A      80
  I           All A are C            3     All C are A            0
  PS          Some A are C          77     Some C are A          57
  PW          No A are C             7     No C are A            17

Figure 4
  Premises: Some B are A / All B are C (both conclusion orders)
  N           Some A are C          77     Some C are A          73
  I           No A are C             7     No C are A            10
  PS          Some A are not C      73     Some C are not A      67
  PW          All A are C            7     All C are A           13

N = necessary, I = impossible, PS = possible strong, PW = possible weak.

Under instructions to judge possibility, of course, conclusions to both PS and PW problems should be endorsed. However, in this case again, the PS problems were much more frequently endorsed in the Evans et al. study.

Readers will note a confounding between the category of the syllogism and the mood of the conclusion. The necessary and possible strong conclusions are all in the particular (some or some not) form, whereas the possible weak and impossible conclusions are in the universal (all or no) forms. This confounding is inescapable. In the case of the determinate syllogisms, most valid syllogisms have particular conclusions, and most invalid ones have universal conclusions. This is because particular statements make weaker claims; that is, they assert less information. In the case of the possible syllogisms, our main source of interest, we are obliged by definition to classify these as strong or weak according to the preferences of our participants, which are heavily in favour of particular conclusions. One can argue that this preference reflects the relative information value of the conclusions, and a detailed theory of syllogistic inference has recently been presented along these lines (Chater & Oaksford, 1999). This confounding does not, however, interfere with the main purpose of our study, which is to examine the effects of believability on fallacious conclusions that generally are or are not endorsed by participants (for whatever reason).

Although our hypotheses have been framed in terms of possible problems only, we have also included the logically determinate necessary and impossible syllogism types in Experiments 1 and 2. The rationale in terms of the mental-model theory presented earlier would not lead to the prediction of either positive or negative belief bias effects on these types.
No belief bias is predicted for these determinate types because there are no counter-example models to be found in the case of unbelievable but necessary conclusions, and no example models to be found in the case of believable but impossible conclusions. Inclusion of these cases nevertheless provides some control for the confounding problem discussed earlier: if the pattern of positive and negative belief bias found on possible strong and possible weak problems differs because of the difference between particular and universal conclusions, then parallel patterns should be found when reasoning with necessary and impossible syllogisms is compared. Indeed, such parallel trends would have to be predicted by an information-based theory such as that of Chater and Oaksford (1999), which posits no deductive component for distinguishing valid from invalid syllogisms.

EXPERIMENT 1

Experiment 1 investigated reasoning with the problems shown in Table 2 using abstract materials. Its purpose was, first, to provide a manipulation check: to confirm that the different conclusion types were rated in a similar manner to that observed by Evans et al. (1999). Its second purpose was to provide a baseline comparison for the thematic syllogisms used in Experiment 2, so that any belief bias effects observed in the second experiment could be judged positive or negative in nature. With this second purpose in mind, Experiments 1 and 2 were run simultaneously and sampled from the same participant population.

Method

Design

Participants were run in two groups, necessity and possibility. The former group were required to decide whether the conclusion presented must be true given the premises; the latter had to decide whether it could be true given the premises. Participants in each group rated all 32 syllogisms in an individually randomized order.

944 EVANS, HANDLEY, HARPER

Participants

A total of 50 undergraduate students of the University of Plymouth were recruited as paid volunteers; 25 were allocated to each of the two groups.

Materials and procedure

All participants were tested individually at computers running a custom-written program. Instructions were presented in written form on sheets placed by each terminal in the laboratory. Participants were asked to read these and to start as soon as they were ready. They were told that the experiment was designed to find out how people solve logical reasoning problems. It was explained that the problems would be presented one at a time on separate screens. The problem layouts were described, and participants were instructed to click in the YES or NO boxes to make their responses. Necessity participants were instructed to click YES if the conclusion must follow from the premises and NO if it did not follow. Possibility participants were instructed to click YES if the conclusion could follow from the premises and NO if it could not. Following the instructions and training in use of the mouse as a response device, each of the 32 syllogisms was presented on a single screen with the following example layout:

GIVEN THAT
Some P are not J
All J are T
IS IT NECESSARY [POSSIBLE] THAT
Some P are T

The YES and NO response boxes were displayed on the screen underneath this information.

Results and discussion

There was no evidence in this experiment, nor in those that followed, that the figure of the syllogistic premises affected responding or interacted with the conclusion order used. Although this finding appears to conflict with the reports of figural bias by Johnson-Laird and Bara (1984), it is important to note that they used the syllogistic production task (participants generate their own conclusion, given the premises), whereas we used the conclusion evaluation method.
Recently, Lambell (1998) has demonstrated directly that figural bias is largely restricted to the production task, whereas belief bias is predominantly found with the evaluation task. This makes sense from a model-theory perspective, because in the production task models are generated from the premises and then lead to conclusion construction. If people instead try to construct models to fit the conclusions given in the evaluation task, as our introduction implies, then the shape of the premises will have little relevance. In any event, as figural effects were absent, we report our analyses with the data pooled over the four figures used. The factors for the analysis of variance (ANOVA) were thus instruction type (necessity vs. possibility, between groups), conclusion type (N, PS, PW, I, within participants), and conclusion order (AC, CA, within participants). Collapsing across figure yielded acceptance scores in the range 0 to 4 in each cell; these were expressed as proportions for the purposes of the ANOVA.

The frequency of acceptance of conclusions in Experiment 1 is shown in Figure 1. As in the study of Evans et al. (1999), more conclusions were accepted under instructions for possibility

(58%) than necessity (47%), F(1, 48) = 18.69, MSE = 0.080, p < .001, and acceptance rates were significantly influenced by the type of conclusion, F(3, 144) = 89.54, MSE = 0.099, p < .001.

Figure 1. Percentage acceptance of conclusions in Experiment 1 (abstract syllogisms).

Again as in Evans et al., the N (84%) and PS (73%) conclusions were accepted substantially more frequently than the PW (27%) and I (27%) conclusions under both types of instruction. There does appear to be a trend for endorsement of PS conclusions to fall relative to N conclusions under necessity instructions only; no such trend was observed by Evans et al., who found that acceptance rates of N and PS conclusions were very similar throughout. However, the interaction between instruction and conclusion type did not approach significance (F < 1) in the current experiment. There was no effect of conclusion order, nor were there any other significant effects in the ANOVA.

The analysis of Experiment 1 shows that the selected syllogisms produced the expected trends with abstract materials. We now consider the effect of introducing thematic content associated with prior beliefs into the same syllogistic structures.

EXPERIMENT 2

Experiment 2 employed the same logical structures as those used in Experiment 1, but introduced thematic material in such a way that the conclusions offered for evaluation were either believable or unbelievable. The purpose of this was to test for the presence of belief bias effects. The experiment extends previous research by (1) testing for belief bias under instructions for possibility as well as necessity, and (2) testing for belief bias across all four categories of syllogism: necessary, possible strong, possible weak, and impossible.
The specific hypotheses concerning positive and negative effects of belief bias under different instructions, derived earlier from the mental-model theory, can also be assessed by comparison with the abstract data collected in Experiment 1.

Method

Design

In contrast to Experiment 1, conclusion order (AC, CA) was manipulated as a between-groups factor. The reason for this was that the introduction of conclusion believability as a new factor doubles the number of syllogisms to be considered; manipulating conclusion order between participants kept the number of syllogisms presented to each individual down to 32. There were hence four groups in this experiment: necessity AC, necessity CA, possibility AC, and possibility CA. The within-participant factors were conclusion believability (believable vs. unbelievable) and conclusion type (N, PS, PW, I).

Participants

A total of 100 undergraduate students of the University of Plymouth were recruited as paid volunteers; 25 were allocated to each of the four groups.

Materials and procedure

In order to construct believable and unbelievable conclusions, the A and C terms of the syllogisms were drawn from four animal classification categories: reptiles, birds, mammals, and fish, along with two typical members of each category: snakes and lizards, robins and sparrows, dogs and cats, and cod and tuna. The linking (B) terms were all nonsense terms taken from Newstead et al. (1992): hemophods, bictoids, juanrics, zaphods, phylones, enculions, glissomae, and cryptoids. The advantage of using made-up terms for B is that each individual premise lacks obvious prior believability. These terms were used to construct 16 believable and 16 unbelievable versions of both the AC and CA syllogisms shown in Table 2.
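The logic of the believability manipulation can be sketched in a few lines. The code below is our own illustrative reconstruction (all function and variable names are ours, not the authors' materials-generation procedure): pairing a typical member with its own superordinate category yields believable conclusions, and pairing it with a different category yields unbelievable ones.

```python
# Category structure used to build conclusions (terms from the paper;
# the pairing logic is an illustrative reconstruction, not the authors').
CATEGORIES = {
    "reptiles": ["snakes", "lizards"],
    "birds": ["robins", "sparrows"],
    "mammals": ["dogs", "cats"],
    "fish": ["cod", "tuna"],
}

TEMPLATES = {
    "A": "All {s} are {p}",        # universal affirmative
    "E": "No {s} are {p}",         # universal negative
    "I": "Some {s} are {p}",       # particular affirmative
    "O": "Some {s} are not {p}",   # particular negative
}

def conclusion(mood, subject, predicate):
    """Render one of the four syllogistic moods as an English sentence."""
    return TEMPLATES[mood].format(s=subject, p=predicate)

def believable_pairs():
    """Member with its own superordinate category, e.g. lizards/reptiles."""
    return [(m, cat) for cat, members in CATEGORIES.items() for m in members]

def unbelievable_pairs():
    """Member with a different category, e.g. cats/birds."""
    return [(m, other) for cat, members in CATEGORIES.items()
            for m in members for other in CATEGORIES if other != cat]

print(conclusion("A", "lizards", "reptiles"))  # believable conclusion
print(conclusion("A", "cats", "birds"))        # unbelievable conclusion
```

The same subject/predicate pair can be slotted into either mood and either term order (AC or CA), which is how the full set of believable and unbelievable versions of each logical form in Table 2 can be generated.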
An example of a syllogism with a believable, necessary conclusion is:

Some reptiles are not bictoids
All lizards are bictoids
Therefore, some reptiles are not lizards

An example of a syllogism with an unbelievable, possible weak conclusion is:

Some birds are haemophods
All haemophods are cats
Therefore, all cats are birds

Each participant received 32 syllogisms in an individually randomized order. Apart from the change to the materials, the procedure was identical to that described for Experiment 1.

Results and discussion

The percentage acceptance rates under necessity and possibility instructions are shown in Figures 2 and 3 respectively. Abstract response rates (Experiment 1) have been included for comparison. An analysis of variance was carried out initially on the frequency of acceptance of conclusions in Experiment 2 only. There were two between-group factors: conclusion order

(AC, CA) and instruction type (necessity, possibility), and two within-participant factors: conclusion type (N, PS, PW, I) and conclusion believability (believable, unbelievable). For the purpose of this analysis, only the black and white bars in the figures are relevant.

Figure 2. Percentage acceptance of conclusions in Experiment 2, necessity group. Data for abstract syllogisms under necessity instructions (Experiment 1) also shown.

Figure 3. Percentage acceptance of conclusions in Experiment 2, possibility group. Data for abstract syllogisms under possibility instructions (Experiment 1) also shown.

All four main effects were statistically significant. First, as expected, more conclusions were accepted under instructions for possibility (52%) than for necessity (44%), F(1, 96) = 13.36, MSE = 0.081, p < .001. Second, in line with the belief bias effect, substantially more conclusions were accepted when they were believable (64%) than when they were unbelievable (32%), F(1, 96) = 20.93, MSE = 0.201, p < .001. Third, the number of acceptances was affected by the type of conclusion (N 77%, PS 59%, PW 34%, I 21%), F(3, 288) = 12.59, MSE = 0.079, p < .001. Finally, there was a much smaller but significant effect of conclusion order, such that CA (50%) conclusions were accepted more often than AC (46%)

conclusions, F(1, 96) = 5.85, MSE = 0.081, p < .05. This last effect was unexpected (it was not observed by Evans et al., 1999). It does not indicate figural bias, which would require an interaction between figure and conclusion order; as it is relatively small and produced no interactions with other factors, we do not discuss it further.

The analysis produced one significant interaction, between conclusion believability and conclusion type, F(3, 288) = 13.66, MSE = 0.032, p < .001. This interaction is to be expected on the basis of previous research on belief bias. For example, as noted earlier, a significant belief by logic interaction was observed by Evans et al. (1983) and in subsequent replications, as shown in Table 1. The typical pattern is that belief bias is more marked on invalid than on valid syllogisms; these correspond to the possible strong and necessary conclusion types, respectively, in the present experiment. The difference between believable and unbelievable acceptances was 26% for N and 45% for PS. The possible weak category has not, to our knowledge, previously been investigated with the belief bias manipulation. Here, the difference between believable and unbelievable was 34%, compared with 24% for impossible problems. In terms of the mental-model theory, both trends make sense: on the possible problems (PS, PW) there exists at least one model that supports the conclusion and at least one that does not, providing an opportunity for people to discover a model that supports prior belief in either case. On determinate problems (N and I), the available models either all support the conclusion (N) or all fail to (I). The fact that any belief bias occurs on such determinate problems, however, requires an explanation beyond the concept of searching for alternative models, and we return to this issue in the General Discussion.
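The model-based account can be made concrete with two hand-picked set models (our own illustration, not the authors' code). Both models satisfy the Figure 1 premises, yet the possible-strong conclusion "Some A are not C" holds in one and fails in the other; on the mental-models account, this is exactly the situation in which prior belief can influence which model a reasoner settles on.

```python
# Two set models of the Figure 1 premises (Some A are B; All B are C),
# hand-picked to show that the possible-strong conclusion "Some A are
# not C" is supported by one model and refuted by the other.
def satisfies_premises(A, B, C):
    """Figure 1 premises: Some A are B; All B are C."""
    return bool(A & B) and B <= C

def some_a_not_c(A, B, C):
    """The possible-strong conclusion: Some A are not C."""
    return bool(A - C)

supporting = ({1, 2}, {1}, {1})   # element 2 is an A that is not a C
refuting = ({1}, {1}, {1, 2})     # here every A is a C

for model in (supporting, refuting):
    assert satisfies_premises(*model)

print(some_a_not_c(*supporting))  # True: an example model exists
print(some_a_not_c(*refuting))    # False: a counter-example model exists
```

For necessary conclusions no such refuting model exists, and for impossible conclusions no supporting model exists, which is why the search-for-models account alone does not predict belief bias on the determinate problems.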
For theoretical reasons explained earlier, it is important to decide whether the belief bias effects are positive (increased acceptance of believable conclusions) or negative (decreased acceptance of unbelievable conclusions) in nature. This requires a baseline measure, which we take for the time being from the abstract response levels of Experiment 1. Our first hypothesis was that negative belief bias effects should be observed for PS problems, but only under instructions for necessity, because it only makes sense to search for counter-examples to a conclusion supported by the current model when trying to prove that the conclusion is necessary. Inspection of Figure 2 shows that the influence of belief on PS problems was primarily negative under necessity instructions: the abstract response level is close to that for believable conclusions and much higher than that for unbelievable conclusions. However, contrary to the hypothesis, exactly the same trend appears under possibility instructions. Looking at Figure 3, we see indications of a positive belief bias for PW under possibility instructions, as predicted (acceptance of believable conclusions is well above abstract levels). However, again contrary to the hypothesis, Figure 2 suggests a similar trend under necessity instructions.

We assessed the statistical evidence for these trends by running four separate ANOVA comparisons between the findings of Experiments 1 and 2. This was necessitated by the fact that conclusion order was manipulated within participants in Experiment 1 but between participants in Experiment 2. In the first analysis, we included the data for all abstract-task participants (Experiment 1), but only the 16 trials on which they assessed problems with an AC conclusion order. This analysis also included the AC groups from Experiment 2, but only the 16 trials on which they received the syllogisms with believable conclusions.
Combining the data in this way, we were able to produce a balanced ANOVA with conclusion believability and instruction (necessity, possibility) as between-group factors and conclusion

type (N, PS, PW, I) as a within-participant factor. The same procedure was repeated with the CA problems to produce a parallel second ANOVA that also tested for positive belief bias. Two similar analyses were performed to test for negative belief bias, by combining the data for unbelievable conclusions collected in Experiment 2 with the appropriate comparisons from Experiment 1.

The results of all four ANOVAs are summarized in Table 3; there were no significant effects other than those shown in the table. All three main effects were significant in all analyses: more conclusions were accepted under instructions for possibility than necessity, and there were significant effects of conclusion type (N, PS, PW, I). It is interesting to note that the conclusion believability factor was significant throughout, thus demonstrating both positive and negative belief bias effects, and that, contrary to expectation, in no case did the belief effect interact with the instruction type. We have already noted, in the comments on Figures 2 and 3, that the trend for negative belief bias seemed most marked on possible strong problems, and that for positive belief bias most marked on possible weak problems. The corresponding interaction between conclusion believability and conclusion type was significant in three of the four analyses; it did not appear in the test of positive belief bias when only AC order problems were considered. The nature of the interaction of belief and conclusion type is not consistent with an account in terms of the mood of the syllogistic conclusion. Recall (see Table 2) that, for unavoidable reasons, necessary and possible strong syllogisms had particular conclusions (some and some not), whereas possible weak and impossible syllogisms had universal conclusions (all and no).
TABLE 3
Summary of analyses of variance testing for positive and negative belief bias effects by selective comparisons of the data in Experiments 1 and 2

Analysis           Effect                                        df      F        MSE     p
Positive belief,   Conclusion believability                      1, 96   3.96     0.095   < .05
AC order           Instruction                                   1, 96   12.49    0.095   < .001
                   Conclusion type                               3, 288  148.35   0.056   < .001
                   Conclusion believability × Conclusion type    3, 288  1.09     0.056   n.s.

Positive belief,   Conclusion believability                      1, 96   26.71    0.087   < .001
CA order           Instruction                                   1, 96   12.66    0.087   < .001
                   Conclusion type                               3, 288  118.70   0.064   < .001
                   Conclusion believability × Conclusion type    3, 288  4.03     0.064   < .001

Negative belief,   Conclusion believability                      1, 96   58.45    0.095   < .001
AC order           Instruction                                   1, 96   17.57    0.095   < .001
                   Conclusion type                               3, 288  120.47   0.057   < .001
                   Conclusion believability × Conclusion type    3, 288  8.59     0.057   < .001

Negative belief,   Conclusion believability                      1, 96   30.94    0.112   < .001
CA order           Instruction                                   1, 96   4.22     0.112   < .05
                   Conclusion type                               3, 288  113.29   0.065   < .001
                   Conclusion believability × Conclusion type    3, 288  2.98     0.065   < .05

If this were the cause of the interaction, then we would see trends for the former two types

differing from the latter two. In fact, negative belief bias is stronger for PS than for N problems, and positive belief bias is stronger for PW than for I problems. These trends cannot be accounted for in terms of the confounding of conclusion mood with conclusion type.

Although we have found clear effects of belief in these experiments, the pattern evidently deviates in a number of ways from that hypothesized in our Introduction. Theoretical discussion of these trends, together with more detailed analysis of positive and negative belief bias effects, is provided in the General Discussion.

EXPERIMENT 3

The comparative analyses of Experiments 1 and 2 produced clear evidence of both positive and negative belief bias throughout, regardless of whether instructions called for judgements of possibility or necessity. It also appears that positive belief bias is particularly marked on possible weak problems, and negative belief bias mostly present on possible strong problems (see the General Discussion for further analysis of these trends). These findings have potentially important consequences for the mental-model theory of reasoning. Before discussing them, we report a further experiment to replicate and confirm the most theoretically significant of these findings.

In Experiment 3, we improved on the design of the previous experiments in two ways. First, we introduced syllogisms with neutral conclusions, constructed from terms similar to those used to make the believable and unbelievable ones. This provides a more accurate baseline from which to infer positive and negative belief bias effects than the use of abstract syllogisms. Second, we brought believable, unbelievable, and neutral syllogisms together in a single experiment of balanced design, avoiding the complications of the kind of comparative analysis reported in Table 3.
In Experiment 3, we concentrated on providing a powerful test of the hypotheses of most interest: namely, that there will be a positive belief bias on possible weak problems and a negative belief bias on possible strong problems. For this reason, we also simplified the design by (1) dropping the impossible problems from the experiment and (2) confining the study to the use of necessity instructions only. The reasons for the latter restriction are, first, that these instructions correspond to the validity judgements with which all other belief bias experiments in the literature are concerned and, second, that Experiments 1 and 2 gave clear evidence of no interaction between the effects of belief (positive or negative) and the type of instruction used.

Method

Design

Participants were tested in two separate groups, according to whether they received conclusions to syllogisms in the AC or CA order. Within-participant factors were conclusion believability, on three levels (believable, unbelievable, neutral), and conclusion type, on three levels (N, PS, PW). There were four repetitions of each syllogism type, making 36 problems for each individual participant.

Participants

A total of 60 undergraduate students of the University of Plymouth were recruited as paid volunteers; 30 were allocated to each of the two groups.
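The trial structure just described can be sketched as a small design grid. This is our own illustrative reconstruction (the names are ours): 3 believability levels crossed with 3 conclusion types, each repeated 4 times, gives the 36 problems per participant, presented in an individually randomized order.

```python
import random
from itertools import product

# Illustrative reconstruction of the Experiment 3 trial grid (names are
# ours): 3 believability levels x 3 conclusion types x 4 repetitions
# = 36 problems, individually randomized per participant.
BELIEVABILITY = ("believable", "unbelievable", "neutral")
CONCLUSION_TYPE = ("N", "PS", "PW")
REPETITIONS = 4

def trial_list(seed):
    trials = [(b, c, r)
              for b, c in product(BELIEVABILITY, CONCLUSION_TYPE)
              for r in range(REPETITIONS)]
    random.Random(seed).shuffle(trials)  # per-participant randomization
    return trials

print(len(trial_list(seed=1)))  # 36
```

Seeding a separate `random.Random` per participant keeps each individual's order reproducible while still randomizing independently across participants.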

Materials and procedure

The believable and unbelievable syllogisms were the same as those employed in Experiment 2, excluding the I problem types. Syllogisms with belief-neutral conclusions were constructed from the same logical types (Table 2) by the following means: a class member was used for the linking (B) term in each syllogism, and the other terms were taken from the previous set of nonsense terms, plus four new ones acting as supposed class types: macata, panphids, cirblones, and catraphedon. An example of a neutral syllogism with a necessary (valid) conclusion is:

Some junarics are lizards
All lizards are panphids
Therefore, some panphids are junarics

Instructions and procedure were similar to those used in Experiments 1 and 2. The 36 problems were presented in an individually randomized order.

Results and discussion

The results of Experiment 3 are shown in Figure 4. As before, acceptance rates were highest for necessary syllogisms, next highest for possible strong, and lowest for possible weak. Inspection of the effects of conclusion believability suggests a small negative belief bias on N syllogisms, a much larger, also negative, belief bias on PS syllogisms, and a largely positive belief bias effect on PW syllogisms. An analysis of variance was carried out to assess the significance of these trends. The between-groups factor was conclusion order (AC, CA); within-groups factors were conclusion believability and conclusion type (N, PS, PW). In the ANOVA there was no significant effect (F < 1) of the group factor, conclusion order.
There were, however, highly significant effects of conclusion believability (believable 68%, neutral 58%, unbelievable 41%), F(2, 116) = 53.17, MSE = 0.063, p < .001, and of conclusion type (necessary 80%, possible strong 60%, possible weak 26%), F(2, 116) = 158.53, MSE = 0.087, p < .001, and a significant interaction between these two factors, F(4, 232) = 5.67, MSE = 0.045, p < .001.

Figure 4. Percentage acceptance of conclusions in Experiment 3 (necessity instructions only).

The interaction reflects the fact that the nature and the extent of the belief